This repository contains the implementations of our experiments and our approach presented in the paper: CoNCRA: A Convolutional Neural Network Code Retrieval Approach

mrezende/concra

CoNCRA: A Convolutional Neural Network Code Retrieval Approach

This repository is the official implementation of CoNCRA: A Convolutional Neural Network Code Retrieval Approach.

Our source code is an adaptation of: https://github.com/codekansas/keras-language-modeling

We propose a technique for semantic code search: a Convolutional Neural Network approach to code retrieval (CoNCRA). Our technique aims to find the code snippet that most closely matches the developer's intent, expressed in natural language. We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow. Our preliminary results showed that our technique, which prioritizes local interactions (words nearby), improved on the state of the art (SOTA) by 5% on average, retrieving the most relevant code snippet in the top 3 (three) positions almost 80% of the time.

Illustration of the joint embedding technique for code retrieval.
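
As a rough illustration of the joint embedding idea, the sketch below encodes a question and a few candidate snippets with a convolution over adjacent tokens followed by max pooling, then ranks the candidates by cosine similarity. It is a minimal pure-Python sketch under stated assumptions, not the code in this repository: the function names (`conv_max_pool`, `cosine`, `rank_snippets`), the tanh activation, the shared filters for questions and code, and the random toy vectors are all hypothetical choices made for illustration.

```python
import math
import random


def conv_max_pool(token_vectors, filters, kernel_size=2):
    """Encode a token sequence: 1D convolution over adjacent tokens, then max pooling.

    Each filter is a flat weight list of length kernel_size * embedding_dim
    (a layout chosen for this sketch only).
    """
    n_windows = len(token_vectors) - kernel_size + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the kernel size")
    pooled = []
    for weights in filters:
        best = -math.inf
        for start in range(n_windows):
            # Concatenate the vectors of neighbouring tokens (the local context).
            window = [x for vec in token_vectors[start:start + kernel_size] for x in vec]
            activation = math.tanh(sum(w * x for w, x in zip(weights, window)))
            best = max(best, activation)
        pooled.append(best)  # keep the strongest response of this filter
    return pooled


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def rank_snippets(question_vec, snippet_vecs):
    """Return snippet indices ordered from most to least similar to the question."""
    scores = [cosine(question_vec, s) for s in snippet_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: random "word embeddings" stand in for real question and code tokens.
rng = random.Random(0)
dim, n_filters = 4, 8
filters = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(n_filters)]
question = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
snippets = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)] for _ in range(3)]

question_vec = conv_max_pool(question, filters)
snippet_vecs = [conv_max_pool(s, filters) for s in snippets]
ranking = rank_snippets(question_vec, snippet_vecs)
```

With a kernel size of 2, each filter only sees pairs of neighbouring tokens, which is one way to read "prioritizes local interactions" above. In a trained model the filters and embeddings would be learned rather than random.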

Requirements

We ran our experiments on Google Colab. The notebooks and source code to run our models are available in the notebooks folder.

Training

To train the model(s) in the paper, execute the following notebooks:

  • train_cnn_stack_over_flow_qa.ipynb
  • train_shared_cnn_stack_over_flow_qa.ipynb
  • train_unif_embedding_stack_over_flow_qa.ipynb

Evaluation

To evaluate the model on the StaQC dataset, run the following notebooks:

  • evaluate_best_stack_over_flow_qa.ipynb
  • evaluate_stack_over_flow_qa.ipynb

Pre-trained Models

You can download pretrained models here:

  • CoNCRA Model trained on StaQC Dataset using margin 0.05, 4000 filters and kernel size 2.
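
The margin mentioned above is typically the hyperparameter of a ranking loss. This README does not spell out the loss, so the following is only a sketch of one common formulation for joint embedding models: a hinge loss that asks the correct snippet to score at least `margin` higher (in cosine similarity) than an incorrect one. The function names and toy vectors are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def margin_ranking_loss(question, good_code, bad_code, margin=0.05):
    """Hinge loss: zero once the good snippet beats the bad one by at least `margin`."""
    return max(0.0, margin - cosine(question, good_code) + cosine(question, bad_code))


# Toy 2-D embeddings (hypothetical values).
q = [1.0, 0.0]
matching = [1.0, 0.0]
unrelated = [0.0, 1.0]

loss_correct_order = margin_ranking_loss(q, matching, unrelated)  # good snippet ranked first
loss_wrong_order = margin_ranking_loss(q, unrelated, matching)    # bad snippet ranked first
```

A small margin such as 0.05 only penalizes triples where the incorrect snippet scores nearly as high as, or higher than, the correct one.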

Results

Our model achieves the following performance on the StaQC dataset:

| Model name      | MRR   | Top 1 Accuracy |
|-----------------|-------|----------------|
| CoNCRA Model    | 0.701 | 57.7%          |
| Unif            | 0.675 | 53.9%          |
| Embedding Model | 0.637 | 49.3%          |
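
For readers unfamiliar with the metrics, the sketch below shows how MRR (mean reciprocal rank) and top-k accuracy are conventionally computed from the 1-based rank of the correct snippet for each query. The function names and the example ranks are made up for illustration; they are not results from the paper or code from the evaluation notebooks.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the correct snippet."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def top_k_accuracy(ranks, k):
    """Fraction of queries whose correct snippet appears in the first k positions."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct snippet for four queries.
example_ranks = [1, 2, 1, 4]

mrr = mean_reciprocal_rank(example_ranks)   # (1 + 1/2 + 1 + 1/4) / 4
top1 = top_k_accuracy(example_ranks, k=1)   # 2 of 4 queries
top3 = top_k_accuracy(example_ranks, k=3)   # 3 of 4 queries
```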

Cite

If you are using our code, please cite the following paper:

@inproceedings{de-rezende-martins-concra-2020,
author = {de Rezende Martins, Marcelo and Gerosa, Marco Aur\'{e}lio},
title = {CoNCRA: A Convolutional Neural Networks Code Retrieval Approach},
year = {2020},
isbn = {9781450387538},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3422392.3422462},
doi = {10.1145/3422392.3422462},
booktitle = {Proceedings of the 34th Brazilian Symposium on Software Engineering},
pages = {526–531},
numpages = {6},
keywords = {joint embedding, neural networks, code search},
location = {Natal, Brazil},
series = {SBES '20}
}
