MLonCode workshop

Understand your code with Machine Learning

Workshop given at DevFest Nantes 2019.

Slides: on gDrive

OSS tools covered:

Abstract

Machine Learning on Source Code (MLonCode) is an emerging and exciting research domain which stands at the sweet spot between deep learning, natural language processing, social science, and programming.

During this two-hour workshop, we are going to show you how to extract insights from codebases, step by step, by shedding light on these crucial aspects:

  • What information is available in your code
  • How to extract this information
  • What you can do with this knowledge: which tasks MLonCode can solve
  • Which models can be used to solve them

To get our hands dirty, we will solve several example tasks, using source{d}, an open source stack to gain insights from codebases:

  • Suggest function names automatically
  • Cluster developers
  • Search projects by similarity

Prerequisites: a laptop with Docker installed. We will provide an image to all participants.

Prerequisites

  • Docker

Dependencies

Import Docker images (works offline):

docker load -i images/jupyter.tgz
docker load -i images/gitbase.tgz
docker load -i images/bblfshd-with-drivers.tgz

docker images

Run bblfsh

docker run \
    --detach \
    --rm \
    --name devfest_bblfshd \
    --privileged \
    --publish 9432:9432 \
    bblfsh/bblfshd:v2.15.0-drivers \
    --log-level DEBUG

Run gitbase

docker run \
    --detach \
    --rm \
    --name devfest_gitbase \
    --publish 3306:3306 \
    --link devfest_bblfshd:devfest_bblfshd \
    --env BBLFSH_ENDPOINT=devfest_bblfshd:9432 \
    --env MAX_MEMORY=1024 \
    --volume $(pwd)/repos/git-data:/opt/repos \
    srcd/gitbase:v0.24.0-rc2

Run the jupyter image

docker run \
    --rm \
    --name devfest_jupyter \
    --publish 8888:8888 \
    --link devfest_bblfshd:devfest_bblfshd \
    --link devfest_gitbase:devfest_gitbase \
    --volume $(pwd)/notebooks:/devfest/notebooks \
    --volume $(pwd)/repos:/devfest/repos \
    mloncode/devfest

With make

To build the workshop image and launch the three required containers:

make build-and-run

To launch only the three required containers:

make
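
Whichever way the containers were launched, it can help to confirm that the three published ports (9432 for bblfshd, 3306 for gitbase, 8888 for Jupyter) accept connections before opening the notebooks. The following is a minimal sketch using only the Python standard library. The helper names `is_port_open` and `check_services` are hypothetical and not part of the workshop code, and an open port only shows that something is listening, not that the service is fully initialised.

```python
import socket

# Ports published by the `docker run` commands in this README.
SERVICES = {"bblfshd": 9432, "gitbase": 3306, "jupyter": 8888}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_services(host: str = "localhost", services=None) -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    services = SERVICES if services is None else services
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:8s} {'up' if up else 'DOWN'}")
```

If one of the services is reported as down, `docker ps` shows which containers are actually running.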

Workflow

1. Download the data

We are going to use the top 50 repositories from the Apache Software Foundation throughout this workshop.

Notebook 1: data collection pipeline
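
After the data collection step, a quick sanity check is to count the repositories under `repos/git-data`, the directory mounted into the gitbase container as `/opt/repos`. The sketch below assumes that each repository is a direct child of that directory, either as a regular clone or as a bare repository. That layout is an assumption and not something this README states, and the function names are hypothetical.

```python
from pathlib import Path


def looks_like_git_repo(path: Path) -> bool:
    """True for a regular clone (has .git) or a bare repository (has HEAD and objects/)."""
    if (path / ".git").exists():
        return True
    return (path / "HEAD").is_file() and (path / "objects").is_dir()


def list_repositories(root: Path) -> list:
    """Return the sorted names of direct children of `root` that look like git repositories."""
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and looks_like_git_repo(p)
    )


if __name__ == "__main__":
    # Path taken from the --volume flag of the gitbase container.
    repos = list_repositories(Path("repos/git-data"))
    print(f"{len(repos)} repositories found")
```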

2. Project and Developer Similarities

Build a vector model for projects and developers using Topic Modelling of code identifiers.

Notebook 2: project and developer similarities
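
To give an intuition for what a vector built from code identifiers looks like, here is a deliberately simplified sketch. It is not topic modelling and not the notebook's code: it splits identifiers into subtokens, counts them, and compares two bags of subtokens with cosine similarity. A topic model would replace the raw counts with a lower-dimensional topic distribution. All function names and the toy identifiers are hypothetical.

```python
import math
import re
from collections import Counter

# Matches acronyms (HTTP), capitalised or lowercase words (Response, parse), and digit runs.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier: str) -> list:
    """Split a camelCase or snake_case identifier into lowercase subtokens."""
    return [m.group(0).lower() for m in _SUBTOKEN.finditer(identifier)]


def bag_of_subtokens(identifiers) -> Counter:
    """Count the subtokens over all identifiers of one project or developer."""
    bag = Counter()
    for identifier in identifiers:
        bag.update(split_identifier(identifier))
    return bag


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy identifiers standing in for two projects.
    project_a = bag_of_subtokens(["parseHTTPResponse", "http_client", "readHeader"])
    project_b = bag_of_subtokens(["HttpRequest", "sendRequest", "response_code"])
    print(f"similarity: {cosine_similarity(project_a, project_b):.3f}")
```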

3. Function Name Suggestion

Train an NMT seq2seq model for predicting method names based on identifiers in method bodies.

Notebook 3: function name suggestion
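
Framing name suggestion as translation means each method becomes a pair of token sequences: the identifiers found in the body are the source, and the method name is the target. The sketch below shows one plausible way to build such a pair. Splitting identifiers into subtokens and truncating the source at `max_source_len` are assumptions made for illustration, not a description of the notebook's preprocessing.

```python
import re

# Matches acronyms (HTTP), capitalised or lowercase words (Response, parse), and digit runs.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier: str) -> list:
    """Split a camelCase or snake_case identifier into lowercase subtokens."""
    return [m.group(0).lower() for m in _SUBTOKEN.finditer(identifier)]


def make_training_pair(method_name: str, body_identifiers, max_source_len: int = 50):
    """Build one (source, target) pair for a seq2seq model.

    source: subtokens of the identifiers in the method body, truncated to max_source_len
    target: subtokens of the method name
    """
    source = [tok for ident in body_identifiers for tok in split_identifier(ident)]
    target = split_identifier(method_name)
    return source[:max_source_len], target


if __name__ == "__main__":
    # Hypothetical method: its body mentions `user`, `firstName` and `lastName`.
    source, target = make_training_pair("getUserName", ["user", "firstName", "lastName"])
    print("source:", " ".join(source))
    print("target:", " ".join(target))
```

At prediction time the model would be given only the source sequence and asked to generate the target subtokens, which are then joined back into a method name.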
