This assumes that Elasticsearch is already running.
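A quick way to confirm that a node is reachable before starting is sketched below. It assumes Elasticsearch listens on its default address http://localhost:9200; the function name `elasticsearch_is_up` is made up for this example and is not part of the repository.

```python
import json
import urllib.request


def elasticsearch_is_up(base_url="http://localhost:9200", timeout=2.0):
    """Return True if an Elasticsearch node answers at base_url.

    The default URL is an assumption (Elasticsearch's default HTTP port).
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    # The root endpoint of a node reports its version in the JSON body.
    return isinstance(info, dict) and "version" in info
```

Calling `elasticsearch_is_up()` returns False when nothing answers on that port, which is a hint to start Elasticsearch first.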
- Make sure Oracle JDK (version >= 8) is installed.
- Install Python dependencies:
pip install -r requirements.txt
- Run the script and specify the number of minutes to crawl (1 is enough for testing). The script runs the crawler if the directory download_repo is not present, then runs the indexer, and finally launches the interface.
./run_all 1
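The order of steps that run_all follows can be sketched in Python. This is not the actual script, only an illustration of the described control flow. The function name `plan_run_all` is hypothetical, passing the minutes to the crawler as a command-line argument is an assumption, and the jar path is the one given in the interface section of this README.

```python
import os


def plan_run_all(minutes, repo_dir="download_repo"):
    """Return the commands a run_all-style wrapper would execute, in order."""
    steps = []
    if not os.path.isdir(repo_dir):
        # Crawl only when nothing has been downloaded yet.
        # Assumption: the crawler accepts the minutes as an argument.
        steps.append(["python", "-m", "src.Crawler", str(minutes)])
    # The indexer always runs, whether or not a crawl happened.
    steps.append(["python", "-m", "src.Indexer"])
    # Finally launch the interface from the prebuilt jar.
    steps.append([
        "java", "-jar",
        "./src/GithubSearchInterface/out/artifacts/"
        "GithubSearchInterface/GithubSearchInterface.jar",
    ])
    return steps
```

Each returned entry is an argument list that could be handed to `subprocess.run` one after another; the crawl step is skipped whenever download_repo already exists.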
The crawler and indexer use Python 3. Source files are located in src/
Install Python dependencies with:
pip install -r requirements.txt
The interface uses Java with JavaFX. Source files are located in src/GithubSearchInterface/
python -m src.Crawler
python -m src.Indexer
First run RelevanceScoring.py to manually rank documents with:
python -m src.RelevanceScoring
Enter the search phrase to be evaluated, e.g. "quick sort", and then manually rank the documents based on perceived relevance.
The results will be saved in the folder evaluation_results/relevance_scoring_results/, in the format: DOC_ID,RANK.
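For reference, a result file in the DOC_ID,RANK format could be read back with a few lines of Python. This is a minimal sketch, not code from src/: the function name `parse_relevance_scores` is hypothetical, and it assumes that RANK is an integer and that DOC_ID contains no commas.

```python
import csv


def parse_relevance_scores(lines):
    """Parse DOC_ID,RANK rows into a {doc_id: rank} dictionary."""
    scores = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            # Skip blank or incomplete lines.
            continue
        doc_id, rank = row[0].strip(), row[1].strip()
        # Assumption: ranks are stored as integers.
        scores[doc_id] = int(rank)
    return scores
```

The function accepts any iterable of lines, so an open file handle can be passed directly.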
To have Elasticsearch evaluate the search results against the relevance scores, run:
python -m src.Evaluater
This reads the files in ./evaluation_results/relevance_scoring_results/ and saves the JSON response to evaluation_results/.
Source files in src/GithubSearchInterface/
Java 8 with JavaFX is used.
The dependencies are included in this repository; they were originally downloaded from the following sources:
- richtextfx-0.9.0.jar
- org.json.jar (zipped)
cd src\GithubSearchInterface\
if not exist classes mkdir classes
javac -cp "imports/*" -d ./classes ./src/dd2476/project/*.java
java -cp "classes;imports/*;src" dd2476.project.Main
java -jar ./src/GithubSearchInterface/out/artifacts/GithubSearchInterface/GithubSearchInterface.jar
work in progress
To download and extract:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz.sha512
shasum -a 512 -c elasticsearch-6.2.4.tar.gz.sha512
tar -xzf elasticsearch-6.2.4.tar.gz
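On systems without shasum, the same checksum comparison can be approximated in Python. This is a sketch under the assumption that the .sha512 file starts with the hex digest followed by the file name, which is the layout `shasum -c` expects; the function names are made up for this example.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(archive_path, checksum_path):
    """Compare an archive against the digest stored in a .sha512 file."""
    with open(checksum_path, "r", encoding="utf-8") as handle:
        # Assumption: the first whitespace-separated token is the digest.
        expected = handle.read().split()[0].lower()
    return sha512_of(archive_path) == expected
```

For the download above, `matches_checksum_file("elasticsearch-6.2.4.tar.gz", "elasticsearch-6.2.4.tar.gz.sha512")` should return True before the archive is extracted.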
To start:
cd elasticsearch-6.2.4/bin
./elasticsearch
To delete the index (in case you want to re-index):
rm -rf elasticsearch-6.2.4/data/nodes/0
- Crawl (a part of) the publicly available GitHub code.
- Filter out one programming language that you feel comfortable with. (Java)
- Process the files and separate class names, method names, modifiers (for example public, private, static, final etc.), variable names – things that you may want to search and filter!
- Index it into elasticsearch (https://github.com/elastic/elasticsearch), or another search engine of your choice.
- Create an interface where you can search and filter methods or classes based on the metadata you have created.
- A sample query could be methodName:quicksort AND returnType:List, i.e. search for quicksort and filter by methods with returnType List. What would you want to search for?
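As an illustration of the sample query, the sketch below builds a request body for Elasticsearch's query_string query from field/value pairs. It is not taken from the interface code: the function name `build_search_body` and the `size` default are assumptions, and the field names are the ones used in the sample query.

```python
def build_search_body(filters, size=10):
    """Build a search request body from (field, value) pairs.

    The pairs are joined with AND inside a query_string query.
    """
    clauses = ["{}:{}".format(field, value) for field, value in filters]
    return {
        "size": size,
        "query": {"query_string": {"query": " AND ".join(clauses)}},
    }
```

Calling `build_search_body([("methodName", "quicksort"), ("returnType", "List")])` yields a body whose query string is methodName:quicksort AND returnType:List. Values containing spaces or reserved characters would need quoting or escaping, which this sketch does not handle.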
- The solution to the problem is novel in some respect (i.e., it has not been published before in a book, report, article or paper). A novel combination of known techniques is fine.
- The results are evaluated, preferably on realistic data, preferably using methods from the literature.
- The poster presentation is clear and understandable to another student who has not read the report or references in it.
- The report is clear, complete, technically correct, and written in grammatically correct English.