maldil/ICSE2023_PyEvolve_Artifacts

PYEVOLVE: Automating Frequent Code Changes in Python ML Systems

We have made all the tools and data used in this research publicly available in order to claim the "Available" and "Reusable" badges. This artifact consists of the open-source version of the tool (PyEvolve) and two kinds of datasets used to evaluate it. For the reviewers' convenience, we have pre-installed the tool on a VirtualBox VM image and included the evaluation subjects. The git repositories of the tools contain detailed instructions for building and using them, allowing the public to reuse them.

Note 1: The VirtualBox VM image is a 19GB file, so the download time depends on your internet connection speed. We kindly ask reviewers to start downloading this file before beginning the review process (see Step 1.2 of Section 1). Reviewers also need at least 40GB of free disk space to load the file into VirtualBox.

Note 2: We generated the VirtualBox image on a Mac computer with an Intel chip. The evaluation depends on being able to load and run the virtual machine, and VirtualBox is known to have compatibility issues with certain operating systems. To mitigate this, we recommend that reviewers use a Mac computer with an Intel chip to run the attached image.

Note 3: We tested the compatibility of the VM image and evaluation instructions on a different machine (Mac-Intel chip), not the one used for creating the image, in order to ensure that all steps will run without interruption.

About the artifact

Section 1 presents the steps to execute the tool in the VirtualBox VM image, which can be done in under 30 minutes. Section 2 describes our public datasets. The public may access all of these resources (tool and data) through our primary website or the archived repositories at Zenodo.

We release one tool

  • PyEvolve - a tool that automatically transplants code change patterns to Python software systems

We made two types of large datasets available.

  • More than 40,000 code transformation trials were used to evaluate PyEvolve. These trials used a dataset of change patterns collected from real repositories. The dataset is publicly available.
  • PyEvolve automatically transplanted code change patterns, and the resulting patches were submitted as 40 pull requests to open-source repositories. We have made the list of these pull requests public.

1. Tool - PyEvolve

a. Initial setup

Step 1.1: If you do not already have VirtualBox installed, please install it by following the instructions at https://www.virtualbox.org.

Step 1.2: We offer a VM image with all the tools and data pre-installed. The image (named pyevolve.ova) can be downloaded from the links provided below. The image file is 19GB in size and may take some time to download, depending on your internet connection speed. We have provided several links in case the first one does not work.

Step 1.3: Open VirtualBox. Click on the Tools tab and then import the downloaded image file (pyevolve.ova) into VirtualBox, as shown below. Loading the image into VirtualBox requires at least 40GB of free space. Without it, you are likely to see an error with the error code NS_ERROR_INVALID_ARG.

Import1

You will be taken to the window shown below, where the settings should appear as depicted. Click the 'Import' button, as indicated in the image. This may take a few minutes to import the image into VirtualBox.
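Before importing, it can help to confirm the 40GB free-space requirement from Step 1.3. The following is a minimal sketch using only the Python standard library; the helper name `has_enough_space` is hypothetical, and it assumes binary gigabytes (1024^3 bytes), which the instructions above do not specify.

```python
import shutil

REQUIRED_GB = 40  # free space requested in Step 1.3


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the disk that holds `path`.

    Hypothetical helper: assumes 1 GB = 1024**3 bytes.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb, round(free_gb, 1)


if __name__ == "__main__":
    ok, free_gb = has_enough_space(".")
    status = "enough" if ok else "NOT enough"
    print(f"{free_gb} GB free: {status} space to import pyevolve.ova")
```

Pass the path of the disk where VirtualBox stores its machines if that differs from the current directory.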

Step 1.4: To complete the configuration, you need to disable the USB port of the virtual machine. To do this, follow the steps shown below. If this is not done properly, you will receive an error in Step 1.5 with the error code NS_ERROR_FAILURE.

disableUSB.mov

Step 1.5: Start the virtual machine by pressing the "Start" button at the top of the window.

Step 1.6: The virtual machine should now start. However, it may sometimes drop to a shell prompt instead, as shown below.

[Screenshot: shell prompt shown at startup]

If this is the case, you will need to manually run the following two commands one after the other.

  • FS1:
  • System\Library\CoreServices\boot.efi

The following video illustrates the steps described above (see Step 1.7 for the login password).

load.mov

Step 1.7: The machine may take several minutes to start, after which you should see the startup screen shown below. Use the password abc@123 to log in to the machine. Once logged in, you will find a folder named PYEVOLVE_FILES, which contains the executables, source code, and data for PyEvolve.

startup

You have now completed all the setup needed to execute PyEvolve.

b. Executing PyEvolve

In this evaluation, we demonstrate the transplantation of the following pattern to the Keras project. Please refer to the technical paper for further information about the pattern.

                                     \
:[[l1]] = open(:[[l2]], "r")      ----\    with open(:[[l2]], "r") as :[[l1]]:
:[l4] = :[[l1]].readlines()       ----/        :[[l4]] = :[[l1]].readlines()
:[[l1]].close()                      /
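To make the rule concrete, the sketch below shows one possible instance of each side as ordinary Python. The function names and the temporary file are illustrative assumptions and are not taken from Keras or from the PyEvolve pattern files.

```python
import os
import tempfile


def read_lines_before(path):
    # Left-hand side of the pattern: manual open / readlines / close.
    # If readlines() raises, close() is never reached.
    f = open(path, "r")
    lines = f.readlines()
    f.close()
    return lines


def read_lines_after(path):
    # Right-hand side: the context manager closes the file on exit,
    # including when readlines() raises.
    with open(path, "r") as f:
        lines = f.readlines()
    return lines


def demo():
    # Write a small throwaway file and read it back both ways.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("alpha\nbeta\n")
        path = tmp.name
    try:
        return read_lines_before(path), read_lines_after(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    before, after = demo()
    print(before == after)
```

Both functions return the same list of lines; the difference is only in how reliably the file handle is released.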

These changes were submitted to the Keras project and were accepted through this pull request (keras-team/keras#16874).

Step 2.1: Open the terminal application in the virtual machine and navigate to the folder PYEVOLVE_FILES using the command cd ~/Desktop/PYEVOLVE_FILES.

Step 2.2: Execute ls to view the PyEvolve executable pyevolve-1.0-SNAPSHOT.jar and other data.

For your convenience, we have included all the commands needed in the following steps in the file ~/Desktop/commands.txt, so that you do not have to type the long commands.

Step 2.3: Run the command java --enable-preview -jar pyevolve-1.0-SNAPSHOT.jar to view the input arguments required to run the tool (your working directory must be ~/Desktop/PYEVOLVE_FILES). The arguments are described below for reference; no action is required for this step.

  • -p,--patterns The folder containing the code change patterns. It holds two types of filenames: those that begin with 'l_' and those that begin with 'r_'. These prefixes mark the rule for the code before and after the change. The names following the prefixes must be identical so that the tool can identify the files that belong to the same change. For example, see the folder ~/Desktop/PYEVOLVE_FILES/PATTERNS/, which contains the patterns used in this evaluation.
  • -r,--repositories The path to the folder that houses all of the project repositories to which the code change is to be transplanted.
  • -f,--files A file listing the project files that must be reviewed and modified. Each path has to be relative to the folder indicated by the argument -r.
  • -t,--types The folder holding the "type" information of the program elements of the projects found in the project repository (-r). This information needs to be inferred and stored in this folder before running PyEvolve. Instructions for inferring type information can be found in this repository: https://github.com/mlcodepatterns/PythonTypeInformation. For your convenience, the type information has already been included in the VirtualBox image.
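The naming rule for the -p folder can be checked with a short script. This is a sketch of the convention as described above, not PyEvolve's own loading code; the helper `pair_pattern_files` and the file names used in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def pair_pattern_files(pattern_dir):
    """Group 'l_<name>' and 'r_<name>' files by their shared <name>.

    Hypothetical helper. Returns (paired, unpaired): `paired` maps each
    shared name to its (l_ file, r_ file) tuple, and `unpaired` lists
    names that have only one of the two files.
    """
    left, right = {}, {}
    for path in Path(pattern_dir).iterdir():
        if not path.is_file():
            continue
        if path.name.startswith("l_"):
            left[path.name[2:]] = path
        elif path.name.startswith("r_"):
            right[path.name[2:]] = path
    paired = {name: (left[name], right[name]) for name in left.keys() & right.keys()}
    unpaired = sorted(left.keys() ^ right.keys())
    return paired, unpaired


def demo():
    # Build a throwaway folder with one complete pair and one orphan file.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("l_with_open.txt", "r_with_open.txt", "l_orphan.txt"):
            (Path(folder) / name).write_text("")
        paired, unpaired = pair_pattern_files(folder)
        return sorted(paired), unpaired


if __name__ == "__main__":
    print(demo())
```

Running `pair_pattern_files` on ~/Desktop/PYEVOLVE_FILES/PATTERNS/ inside the VM would list any pattern file that lacks its counterpart.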

Step 2.4: To clone the Keras project into the folder ~/Desktop/PYEVOLVE_FILES/PROJECTS/keras-team/, execute the following commands.

  • Navigate to the target folder: cd ~/Desktop/PYEVOLVE_FILES/PROJECTS/keras-team/
  • Clone the repository: git clone https://github.com/keras-team/keras.git ./keras/
  • Navigate to the folder keras: cd keras/
  • Retrieve the snapshot from before the pull request was merged: git checkout f49e66c72ea5fe337c5292ee42f61cd75bc74727

Step 2.5: To apply the patterns in the folder ./PATTERNS/, execute the following commands. The arguments are described in Step 2.3.

  • First navigate to the folder PYEVOLVE_FILES: cd ~/Desktop/PYEVOLVE_FILES/
  • Apply the patterns: java --enable-preview -jar pyevolve-1.0-SNAPSHOT.jar -r ./PROJECTS/ -f ./refactoring_files.txt -p ./PATTERNS/ -t ./TYPE_REPO/
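A missing input folder is an easy mistake to make before launching the tool. The sketch below checks that the four inputs named in the command above exist under the working directory and assembles the same command as an argument list. The helper names are hypothetical, and the script does not inspect the contents of any input.

```python
import tempfile
from pathlib import Path

# Flags and relative paths copied from the Step 2.5 command.
EXPECTED_INPUTS = {
    "-r": "PROJECTS",
    "-f": "refactoring_files.txt",
    "-p": "PATTERNS",
    "-t": "TYPE_REPO",
}


def missing_inputs(base_dir):
    """Return the flags whose path does not exist under base_dir."""
    base = Path(base_dir)
    return [flag for flag, rel in EXPECTED_INPUTS.items() if not (base / rel).exists()]


def build_command(jar="pyevolve-1.0-SNAPSHOT.jar"):
    """Assemble the Step 2.5 command as a list suitable for subprocess.run."""
    command = ["java", "--enable-preview", "-jar", jar]
    for flag, rel in EXPECTED_INPUTS.items():
        command += [flag, f"./{rel}"]
    return command


def demo():
    # A throwaway folder that has only two of the four inputs.
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "PROJECTS").mkdir()
        (Path(folder) / "PATTERNS").mkdir()
        return missing_inputs(folder)


if __name__ == "__main__":
    print(demo())
    print(" ".join(build_command()))
```

Inside the VM, calling `missing_inputs` on ~/Desktop/PYEVOLVE_FILES should return an empty list before the java command is run.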

The above command executes the main function of PyEvolve given in this link and applies the code changes to the projects in the folder ~/Desktop/PYEVOLVE_FILES/PROJECTS/.

Step 2.6: To check the changed files, navigate to the folder ~/Desktop/PYEVOLVE_FILES/PROJECTS/keras-team/keras and execute git diff, scrolling down to see all the changes. You should observe a successful transplantation of the pattern shown at the beginning of this section.

You have successfully executed PyEvolve and transplanted patterns to the Keras project.

Steps 2.4, 2.5, and 2.6 are demonstrated in the video below.

evaluation4.mp4

2. Dataset

We have made two distinct types of large datasets available. First, we evaluated PyEvolve with over 40,000 code transformation trials, using a dataset of change patterns collected from real repositories. This dataset is publicly accessible. Second, the patches that PyEvolve produced by automatically transplanting code change patterns were submitted as 40 pull requests to open-source repositories. We have also made the list of these pull requests publicly available.

Cross validation dataset

PyEvolve was evaluated using the cross-validation dataset. This dataset contains frequent code changes made in open-source Python projects. We evaluated PyEvolve with over 40,000 transformation trials. To provide easy access to the data, we released it on a website hosted on GitHub, as described above. We have also archived all tools and data in accordance with the ICSE 2023 Open Science Policies.

Pull request patches generated by PyEvolve

# Created at Url State Merged
1 2022-06-26T18:21:33Z HazyResearch/pdftotree#122 closed True
2 2022-06-26T19:55:38Z brightmart/text_classification#149 closed True
3 2022-06-28T08:12:21Z tensorflow/lattice#73 closed True
4 2022-06-29T07:44:12Z quadrismegistus/prosodic#37 closed True
5 2022-07-01T01:31:54Z idaholab/raven#1877 closed True
6 2022-07-01T08:00:50Z erikbern/ann-benchmarks#303 closed True
7 2022-07-01T08:13:58Z david-abel/simple_rl#61 closed True
8 2022-07-04T09:41:28Z microsoft/nni#4982 closed True
9 2022-07-04T20:33:39Z ray-project/ray#26284 closed True
10 2022-07-05T00:03:01Z jindongwang/transferlearning#341 closed True
11 2022-07-06T06:48:04Z dipy/dipy#2618 closed True
12 2022-07-06T08:45:22Z pgmpy/pgmpy#1551 closed True
13 2022-07-14T08:44:44Z reframe-hpc/reframe#2565 closed True
14 2022-07-14T08:52:19Z DeepLabCut/DeepLabCut#1905 closed True
15 2022-08-06T06:16:59Z pytorch/pytorch#82929 closed True
16 2022-08-06T06:46:26Z ray-project/ray#27600 closed True
17 2022-08-06T08:04:47Z keras-team/keras#16874 closed True
18 2022-08-06T08:22:04Z GoogleCloudDataproc/cloud-dataproc#152 closed True
19 2022-08-06T08:29:03Z facebookresearch/ParlAI#4718 closed True
20 2022-08-06T09:35:13Z idaholab/raven#1930 closed True
21 2022-08-07T05:24:26Z BindsNET/bindsnet#570 closed True
22 2022-08-07T05:49:51Z CellProfiler/CellProfiler#4610 closed True
23 2022-08-07T06:32:24Z daniellerch/aletheia#21 closed True
24 2022-08-07T06:51:40Z deepinsight/insightface#2070 closed True
25 2022-08-06T07:12:28Z scikit-image/scikit-image#6458 open True
26 2022-08-07T06:14:46Z danforthcenter/plantcv#932 open True
27 2022-08-07T00:50:05Z aws/sagemaker-python-sdk#3286 closed True
28 2022-08-07T05:56:14Z cesium-ml/cesium#309 closed True
29 2022-07-05T01:40:15Z LCAV/pyroomacoustics#271 closed True
30 2022-06-27T02:46:23Z Pinafore/qb#107 open False
31 2022-06-29T02:33:54Z tensorflow/transform#280 open False
32 2022-06-29T05:18:34Z tensorflow/ranking#325 open False
33 2022-07-01T01:47:41Z google-research/google-research#1189 open False
34 2022-07-05T08:05:16Z bnpy/bnpy#42 open False
35 2022-07-05T08:23:56Z brainiak/brainiak#516 open False
36 2022-07-13T20:16:05Z LxMLS/lxmls-toolkit#176 open False
37 2022-06-26T22:09:38Z cornellius-gp/gpytorch#2049 closed False
38 2022-06-30T08:12:36Z lmcinnes/pynndescent#192 closed False
39 2022-07-05T05:41:14Z pyRiemann/pyRiemann#185 closed False
40 2022-07-11T06:56:15Z calico/basenji#125 closed False
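The rows above are whitespace-separated, so they can be tallied with a few lines of code. The sketch below parses a four-row excerpt copied from the table; the function names are hypothetical, and the counts it prints describe only the excerpt, not the full list.

```python
# Excerpt of four rows copied verbatim from the table above.
ROWS = """\
17 2022-08-06T08:04:47Z keras-team/keras#16874 closed True
25 2022-08-06T07:12:28Z scikit-image/scikit-image#6458 open True
30 2022-06-27T02:46:23Z Pinafore/qb#107 open False
40 2022-07-11T06:56:15Z calico/basenji#125 closed False
"""


def parse_rows(text):
    """Turn each '# Created-at Url State Merged' row into a dict."""
    records = []
    for line in text.splitlines():
        number, created_at, url, state, merged = line.split()
        records.append({
            "number": int(number),
            "created_at": created_at,
            "url": url,
            "state": state,
            "merged": merged == "True",
        })
    return records


def summarize(records):
    """Count merged and unmerged pull requests in the given records."""
    merged = sum(1 for record in records if record["merged"])
    return {
        "total": len(records),
        "merged": merged,
        "not_merged": len(records) - merged,
    }


if __name__ == "__main__":
    print(summarize(parse_rows(ROWS)))
```

Pasting all 40 rows into `ROWS` gives the totals for the complete list.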
