What is this?

This library turns Java source code (.java files) into Augmented ASTs (.gml (graphml) files), as described in the paper "Open Vocabulary Learning on Source Code with a Graph-Structured Cache".

More specifically, you list the Java repos from the Maven Repository that you'd like to convert into a dataset, and the library automatically downloads them and generates an Augmented AST for each of their source files, one .gml file per .java file.

How do I install it?

You'll need Apache Maven installed, along with the basic Linux command-line utilities.

Then run

cd <root directory of this repo>
mvn install -DskipTests
cd <root directory of this repo>/javaparser-dloc
mvn install -DskipTests

How do I use it?

1. Create a list of Maven repositories

There is a file called repositories.txt in javaparser-dloc/scripts. Edit this file so that it lists the repos from the Maven Repository that you'd like to process into datapoints. The format is one repo per line, each line reading <org name>:<repo name>:<version number>. At the moment, repositories.txt contains the names of the 18 Maven repos used in "Deep Learning On Code With A Graph Vocabulary".
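Since a malformed line would only surface later, during the download step, it may help to check the file's format first. The following is a minimal sketch, not part of this library: the function names `is_valid_coordinate` and `check_repositories_file` are hypothetical, and the set of characters allowed in each field is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): succeeds only if the argument
# looks like <org name>:<repo name>:<version number>.
# The allowed character set below is an assumption.
is_valid_coordinate() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_.-]+:[A-Za-z0-9_.-]+:[A-Za-z0-9_.-]+$'
}

# Print every malformed, non-empty line of the file given as $1 to stderr.
# Returns non-zero if at least one malformed line was found.
check_repositories_file() {
    bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines.
        [ -z "$line" ] && continue
        if ! is_valid_coordinate "$line"; then
            printf 'malformed entry: %s\n' "$line" >&2
            bad=1
        fi
    done < "$1"
    return "$bad"
}
```

After sourcing these functions, `check_repositories_file repositories.txt` should exit with status 0 when every non-empty line has three colon-separated fields. A line such as `org.apache.commons:commons-lang3:3.7` (an illustrative coordinate, not necessarily one of the 18 in the file) would pass.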

Once you've edited repositories.txt, run the createDatasets.sh script from the javaparser-dloc/scripts directory (where repositories.txt lives) as follows:

export dataset=<path to where you'd like the dataset to go>
cat repositories.txt | xargs -I{} <root directory of this repo>/javaparser-dloc/scripts/createDatasets.sh {} $dataset
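The `xargs -I{}` pipeline above calls the script once per line of repositories.txt, substituting the line for `{}`. If you'd rather stop at the first repo that fails and see which one it was, an explicit loop is one option. This is only a sketch: `run_for_each_repo` is a hypothetical name, and stopping on the first failure is a design choice of this example, not the behaviour of the command above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repo): run a per-repository script
# once for each non-empty line of a list file.
#   $1 = list file (e.g. repositories.txt)
#   $2 = script to run (e.g. createDatasets.sh)
#   $3 = dataset directory
run_for_each_repo() {
    list_file=$1
    script=$2
    dataset_dir=$3
    while IFS= read -r repo || [ -n "$repo" ]; do
        # Skip blank lines.
        [ -z "$repo" ] && continue
        # Redirect stdin so the script cannot consume the rest of the list.
        if ! "$script" "$repo" "$dataset_dir" < /dev/null; then
            printf 'failed: %s\n' "$repo" >&2
            return 1
        fi
    done < "$list_file"
}
```

With the function sourced, the equivalent of the command above would be `run_for_each_repo repositories.txt <root directory of this repo>/javaparser-dloc/scripts/createDatasets.sh "$dataset"`.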

2. Process all files in all repositories

Now that you've downloaded and built the repos, process them all into graphml-formatted files:

ls $dataset | xargs -I{} <root directory of this repo>/javaparser-dloc/scripts/processDataset.sh {} $dataset
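Because the library is meant to produce one .gml file per .java file, comparing the two counts is a quick sanity check after processing. The sketch below assumes the .gml files are written somewhere under `$dataset`, which this README does not state; `count_files_with_extension` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): count regular files with a
# given extension anywhere under a directory.
#   $1 = directory to search
#   $2 = extension without the leading dot (e.g. java or gml)
count_files_with_extension() {
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}
```

For example, `count_files_with_extension "$dataset" java` and `count_files_with_extension "$dataset" gml` should print the same number if every source file was processed. A lower .gml count would suggest that some files were skipped or failed.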

Questions?

Feel free to get in touch with Milan Cvitkovic or any of the other paper authors. We'd love to hear from you!

mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache--Code-Preprocessor
