JuICe Dataset

Code for the paper. This repository produces the dataset from the collected Jupyter notebooks. The dataset is available here if you don't want to run this pipeline. Modeling code is here.

Leaderboard

Model	Dev BLEU	Dev EM	Test BLEU	Test EM
LSTM Baseline	21.66	5.57	20.92	5.71
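For readers unfamiliar with the EM column, here is a minimal sketch of an exact-match score. It assumes EM is the percentage of predictions whose token sequence equals the target token sequence; the function name and the sample data are hypothetical, and the paper's evaluation script may differ in its details.

```python
def exact_match(predictions, targets):
    """Percentage of predictions whose token list equals the target token list.

    Assumption: a prediction counts only if every token matches in order.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for pred, gold in zip(predictions, targets) if pred == gold)
    return 100.0 * hits / len(targets)


# Hand-written toy data, not taken from the dataset.
preds = [["x", "=", "1"], ["y", "=", "2"]]
golds = [["x", "=", "1"], ["y", "=", "3"]]
score = exact_match(preds, golds)
```

In this toy case one of the two predictions matches its target, so the score is 50.0.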

Download notebooks

Get the notebooks here.

Setup

conda create -n {name} python=3.6
conda activate {name}

pip install -r requirements.txt

# decompress the downloaded notebooks file
unzip juice_notebooks.zip

Running pipelines

This step produces the dataset. The pipeline requires around 254 GB of disk space and takes about 12 hours to complete on a 12-core machine.

./run_all.sh {downloaded notebooks directory} {pipeline directory}

The datasets will be created under {pipeline directory}/final-dataset
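Because the run is long and disk-hungry, it may be worth checking free space before starting. The sketch below is not part of the repository; the helper name is hypothetical, and it assumes the quoted figure means decimal gigabytes (10^9 bytes).

```python
import shutil

REQUIRED_GB = 254  # approximate figure quoted for the pipeline


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least required_gb free.

    Assumption: 1 GB is treated as 10**9 bytes.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


# Example: check the directory you plan to pass as {pipeline directory}.
enough = has_enough_space(".")
```

If `enough` is False, point the pipeline directory at a larger volume before launching the run.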

Dataset Format

Each dataset record contains the following keys:

code_tokens_clean: The tokenized target code to generate.

context: A list of the cells above the target cell, starting with the cell directly above it. If a cell is markdown, its nl key stores the tokenized markdown. If it is a code cell, its code_tokens_clean key stores the tokenized code.
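The keys above can be read as in the following sketch. It assumes each record is a JSON object and that a context cell holds either nl or code_tokens_clean; the sample record is hand-written for illustration, and the on-disk file layout is not specified here.

```python
import json

# Hypothetical record; real records may carry additional keys.
sample_line = json.dumps({
    "code_tokens_clean": ["df", ".", "head", "(", ")"],
    "context": [
        {"nl": ["show", "the", "first", "rows"]},
        {"code_tokens_clean": ["df", "=", "load", "(", "path", ")"]},
    ],
})


def describe_context(record):
    """Return (cell_type, tokens) pairs, nearest cell first."""
    cells = []
    for cell in record["context"]:
        if "nl" in cell:
            # Assumption: the presence of nl marks a markdown cell.
            cells.append(("markdown", cell["nl"]))
        else:
            cells.append(("code", cell["code_tokens_clean"]))
    return cells


record = json.loads(sample_line)
target = " ".join(record["code_tokens_clean"])
cells = describe_context(record)
```

Here `cells[0]` is the cell directly above the target, which matches the ordering described for context.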
