Github search

Run using run script

This assumes that Elasticsearch is already running.
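One way to confirm this before starting is a small reachability check. The following is a minimal sketch, assuming Elasticsearch listens on its default HTTP address http://localhost:9200. The function name is_elasticsearch_up is hypothetical and not part of this repository.

```python
import urllib.error
import urllib.request


def is_elasticsearch_up(base_url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET on base_url answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling is_elasticsearch_up() with no arguments checks the assumed default address. Pass a different base_url if your node is configured otherwise.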

Install dependencies:

Make sure Oracle JDK (version >= 8) is installed.
Install python dependencies:

pip install -r requirements.txt
Run the script and specify the number of minutes to crawl (1 is enough for testing). The script runs the crawler if the directory download_repo is not present, then runs the indexer, and finally launches the interface.

./run_all.sh 1
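The order of steps described above can be sketched in Python. This only illustrates the decision logic and is not the content of the actual run script. The function name plan_steps and the step labels are hypothetical.

```python
import os


def plan_steps(project_root, minutes):
    """Return the ordered steps a full run would perform.

    The crawler step is included only when the download_repo
    directory does not exist yet.
    """
    steps = []
    if not os.path.isdir(os.path.join(project_root, "download_repo")):
        steps.append("crawl for {} minute(s)".format(minutes))
    steps.append("index")
    steps.append("launch interface")
    return steps
```

To force a fresh crawl under this logic, remove the download_repo directory before running the script again.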

Run individual components

The crawler and indexer use Python 3. Source files are located in src/

Install python dependencies with:

pip install -r requirements.txt

The interface uses Java with JavaFX. Source files are located in src/GithubSearchInterface/

Crawler

python -m src.Crawler

Indexer

python -m src.Indexer

To evaluate

First run RelevanceScoring.py to manually rank documents with:

python -m src.RelevanceScoring

Enter the search phrase to be evaluated, e.g. "quick sort", and then manually rank the documents by perceived relevance.

The results will be saved in the folder evaluation_results/relevance_scoring_results/, in the format: DOC_ID,RANK.
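A file in this format can be read back with a few lines of Python. This is a minimal sketch, assuming one DOC_ID,RANK pair per line, an integer RANK, and no header row. The function name read_relevance_scores and the sample IDs are hypothetical.

```python
import csv
import io


def read_relevance_scores(lines):
    """Parse DOC_ID,RANK rows into a {doc_id: rank} dictionary."""
    scores = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        doc_id, rank = row[0].strip(), int(row[1])
        scores[doc_id] = rank
    return scores


# Hypothetical sample content; real files are written to
# evaluation_results/relevance_scoring_results/.
sample = io.StringIO("doc_a,3\ndoc_b,0\n")
print(read_relevance_scores(sample))
```

The function accepts any iterable of lines, so an open file object works in place of the in-memory sample.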

To have Elasticsearch use the relevance scores for evaluation, run:

python -m src.Evaluater

This reads the files in ./evaluation_results/relevance_scoring_results/ and saves the JSON response to evaluation_results/.
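How the scores reach Elasticsearch is not described here. One plausible route is Elasticsearch's Ranking Evaluation API (the _rank_eval endpoint), and the sketch below builds a request body in that shape. This is an assumption, not a description of src.Evaluater: the function name build_rank_eval_body, the index name, the choice of a precision metric, and the value of k are all hypothetical, and the field names should be checked against the documentation for the Elasticsearch version in use.

```python
import json


def build_rank_eval_body(search_phrase, scores, index_name):
    """Build a request body in the shape used by the _rank_eval endpoint.

    scores maps document IDs to manual relevance ratings.
    """
    ratings = [
        {"_index": index_name, "_id": doc_id, "rating": rating}
        for doc_id, rating in sorted(scores.items())
    ]
    return {
        "requests": [
            {
                "id": search_phrase,
                "request": {"query": {"query_string": {"query": search_phrase}}},
                "ratings": ratings,
            }
        ],
        # Assumed metric: precision over the top 10 hits, where a rating
        # of 1 or more counts as relevant.
        "metric": {"precision": {"k": 10, "relevant_rating_threshold": 1}},
    }


# Hypothetical index name and document IDs.
body = build_rank_eval_body("quick sort", {"doc_a": 3, "doc_b": 0}, "my_index")
print(json.dumps(body, indent=2))
```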

Interface

Source files in src/GithubSearchInterface/

Java 8 with JavaFX is used.

Dependencies:

The dependencies are included in this repository but they were downloaded from the following places.

richtextfx-0.9.0.jar
org.json.jar (zipped)

Running the interface

Compiling and running on Windows

cd src\GithubSearchInterface\
if not exist classes mkdir classes
javac -cp "imports/*" -d ./classes ./src/dd2476/project/*.java
java -cp "classes;imports/*;src" dd2476.project.Main

Running Jar-file on Linux

java -jar ./src/GithubSearchInterface/out/artifacts/GithubSearchInterface/GithubSearchInterface.jar

OS X

Work in progress.

Elasticsearch

To download and extract:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz.sha512
shasum -a 512 -c elasticsearch-6.2.4.tar.gz.sha512 
tar -xzf elasticsearch-6.2.4.tar.gz

To start:

cd elasticsearch-6.2.4/bin
./elasticsearch

To delete the index (in case you want to re-index):

rm -rf elasticsearch-6.2.4/data/nodes/0

Project Specification

Crawl (a part of) the publicly available GitHub code.
Filter out one programming language that you feel comfortable with. (Java)
Process the files and separate class names, method names, modifiers (for example public, private, static, final etc.), variable names – things that you may want to search and filter!
Index it into elasticsearch (https://github.com/elastic/elasticsearch), or another search engine of your choice.
Create an interface where you can search and filter methods or classes based on the metadata you have created.
A sample query could be methodName:quicksort AND returnType:List, i.e. search for quicksort and filter by methods with returnType List. What would you want to search for?
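The sample query above maps directly onto Elasticsearch's query_string query, which accepts field:value terms joined with AND. The sketch below builds such a request body. It assumes the indexed documents have fields named methodName and returnType as in the sample query. The function name build_search_body is hypothetical.

```python
import json


def build_search_body(filters):
    """Combine (field, value) pairs with AND into a query_string query."""
    query = " AND ".join("{}:{}".format(field, value) for field, value in filters)
    return {"query": {"query_string": {"query": query}}}


body = build_search_body([("methodName", "quicksort"), ("returnType", "List")])
print(json.dumps(body))
```

Values containing characters that are reserved in the query_string syntax (for example : or parentheses) would need escaping, which this sketch does not do.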

Additional criteria (2 needed for C, 3 needed for B, etc.)

The solution to the problem is novel in some respect (i.e., it has not been published before in a book, report, article or paper). A novel combination of known techniques is fine.
The results are evaluated, preferably on realistic data, preferably using methods from the literature.
The poster presentation is clear and understandable to another student who has not read the report or references in it.
The report is clear, complete, technically correct, and written in grammatically correct English.


oeng/DD2476_Github_Search
