This repository contains the source code of our ICML 2021 paper "How could Neural Networks understand Programs?".

Run the following commands to build a Docker image for the environment:

    cd docker
    sudo docker build -t oscar:latest .

You can then launch a container with the nvidia-docker command. The command bind-mounts the current directory at /oscar, so run it from the repository root:

    sudo nvidia-docker run -it --shm-size=100g --mount type=bind,source="$(pwd)",target=/oscar oscar:latest

To compile the binaries for processing the data:

    cd /oscar/bin
    make

This compiles the OSCAR LLVM analyzer pass (located in analyzer), the IR Lexer (located in irlexer), and FastBPE (located in fastBPE).
First, please visit https://1drv.ms/f/c/0cb8cc76ec823036/EjYwgux2zLgggAwIBAAAAAAB7290YKG35xKA4FX5_eqvEg?e=b5XdV7 to download the data for pretraining and downstream tasks. Extract the downloaded tarballs to the data-raw directory.
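The extraction step can be scripted. The following is a minimal sketch, not part of the repository: the function name `extract_all` is hypothetical, and it assumes the downloaded archives end in `.tar.gz` or `.tar`. Both directory arguments are placeholders for wherever you saved the downloads and wherever your data-raw directory lives.

```shell
# Sketch: extract every tarball in a download directory into a target directory.
# Assumptions: archives end in .tar.gz or .tar; extract_all is a hypothetical helper.
extract_all() {
    local src_dir="$1"    # directory holding the downloaded tarballs (placeholder)
    local dest_dir="$2"   # path to your data-raw directory (placeholder)
    local archive

    mkdir -p "$dest_dir"
    for archive in "$src_dir"/*.tar.gz "$src_dir"/*.tar; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$archive" ] || continue
        # tar detects gzip compression automatically when reading.
        tar -xf "$archive" -C "$dest_dir"
    done
}
```

For example, `extract_all ~/Downloads data-raw` would unpack every matching archive from `~/Downloads` into `data-raw`, assuming those are the paths you use.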
To process the data for pretraining and the downstream tasks, enter the corresponding directories and execute ./process.sh. Raw data must be placed in the data-raw directory; processed data will be written to the data-bin directory.
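If you need to process several tasks in one go, the per-directory step can be wrapped in a loop. This is only a sketch: `run_process` is a hypothetical helper, and the directories you pass to it are whichever ones in your checkout contain a `process.sh`. It assumes each script is meant to be run from inside its own directory, as described above.

```shell
# Sketch: run ./process.sh inside each directory given as an argument.
# Assumption: run_process is a hypothetical helper, not part of the repository.
run_process() {
    local dir
    for dir in "$@"; do
        if [ ! -x "$dir/process.sh" ]; then
            echo "skipping $dir: no executable process.sh" >&2
            continue
        fi
        # Use a subshell so the caller's working directory is unchanged;
        # stop at the first script that fails.
        (cd "$dir" && ./process.sh) || return 1
    done
}
```

Call it with the paths of the directories you want to process, for example `run_process path/to/task_a path/to/task_b`, where both paths are placeholders.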
Use the following commands to pretrain the model:

    cd /oscar/model
    ./scripts/pretrain.sh

The procedure for the downstream tasks is similar.