Octomender

GitHub Repo Recommender System.

Octomender = Octocat + Recommender

Get repo recommendation based on your GitHub star history.

End of Service

The recommendation algorithm was deployed and tested on octomend.com, where visitors could help improve the recommendations.

The service has ended since GitHub published its own "Discover Repositories" service.

Dependencies

  • redis: An in-memory database that persists on disk

Core

  • hiredis: Minimalistic C client for Redis >= 1.2
  • OpenMP >= 4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming

Preprocessing

Website

  • Flask: A microframework for Python based on Werkzeug, Jinja 2 and good intentions
  • GitHub-Flask: Flask extension for authenticating users with GitHub and making requests to the API
  • gunicorn: A Python WSGI HTTP Server for UNIX
  • google-cloud-datastore: Low-level Java and Python client libraries for Google Cloud Datastore

Dataset

GitHub Archive

Build Core

cd core; make

Preprocessing

parse.py

Parse raw JSON data files into three pickle data files:

  • output-data-basename.user: map of user id (str) to user name (str)
  • output-data-basename.repo: map of repo id (int) to repo name (str)
  • output-data-basename.edge: list of tuples of user-repo edge (str, int)
Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05

For the raw JSON data format, refer to the GitHub API v3 documentation.
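
The parsing step described above can be sketched roughly as follows. This is not the actual parse.py, only a minimal illustration: it assumes each input line is one JSON event carrying `type`, `actor.id`, `actor.login`, `repo.id` and `repo.name` fields, and it covers WatchEvent only. The function names `parse_watch_events` and `dump_outputs` are hypothetical.

```python
import json
import pickle


def parse_watch_events(lines):
    """Collect users, repos and user-repo edges from JSON event lines."""
    users, repos, edges = {}, {}, []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "WatchEvent":
            continue
        # Assumed field names: actor.id, actor.login, repo.id, repo.name.
        user_id = str(event["actor"]["id"])
        repo_id = int(event["repo"]["id"])
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


def dump_outputs(basename, users, repos, edges):
    """Write the three pickle files named after the output basename."""
    for suffix, data in (("user", users), ("repo", repos), ("edge", edges)):
        with open(f"{basename}.{suffix}", "wb") as fh:
            pickle.dump(data, fh)
```

Calling `dump_outputs("data", users, repos, edges)` would produce `data.user`, `data.repo` and `data.edge`, matching the three output files listed above.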

parse_mp.py

Same as parse.py, but runs with multiprocessing. The default number of processes is 16.

Usage: parse_mp.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
  n-process         number of processes when multiprocessing.
Ex:    parse_mp.py -m 2017-06-01-0.json data
Ex:    parse_mp.py --watch json/2017-05/ data/2017-05 32
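
One plausible way to spread the parsing over several processes is to give each input file to a worker and concatenate the per-file results. The sketch below is an assumption about the approach, not the script's actual code; `parse_file` and `parse_parallel` are hypothetical names, and the event field names are assumed as well.

```python
import json
import os
import sys
from multiprocessing import Pool


def parse_file(path):
    """Worker: return the (user id, repo id) edges of WatchEvents in one file."""
    edges = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") == "WatchEvent":
                # Assumed field names: actor.id and repo.id.
                edges.append((str(event["actor"]["id"]), int(event["repo"]["id"])))
    return edges


def parse_parallel(paths, n_process=16):
    """Parse each file as a separate task and concatenate the results."""
    with Pool(processes=n_process) as pool:
        per_file = pool.map(parse_file, paths)
    return [edge for edges in per_file for edge in edges]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python sketch.py <input-json-directory> [n-process]
    directory = sys.argv[1]
    n_process = int(sys.argv[2]) if len(sys.argv) > 2 else 16
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".json")
    )
    print(len(parse_parallel(paths, n_process)), "edges")
```

Splitting by file keeps the workers independent, so no locking is needed; the only shared step is the final concatenation in the parent process.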

mergedata.py

Merge multiple pickle data files into one.

Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex:    mergedata.py data/2016-010203/ data/2016-Q1
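
As a rough illustration of the merge step, the sketch below loads every `<basename>.user`, `<basename>.repo` and `<basename>.edge` triple found in a directory and combines them. The helper names are hypothetical, and dropping duplicate edges is an assumption of this sketch; the source does not say how mergedata.py treats duplicates.

```python
import os
import pickle


def find_basenames(directory):
    """Return the basename of every part that has an .edge file in directory."""
    suffix = ".edge"
    return sorted(
        os.path.join(directory, name[: -len(suffix)])
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )


def load_part(basename):
    """Load the (users, repos, edges) triple stored under one basename."""
    part = []
    for suffix in ("user", "repo", "edge"):
        with open(f"{basename}.{suffix}", "rb") as fh:
            part.append(pickle.load(fh))
    return tuple(part)


def merge_parts(parts):
    """Merge the maps and concatenate the edge lists, skipping repeated edges."""
    users, repos, edges, seen = {}, {}, [], set()
    for part_users, part_repos, part_edges in parts:
        users.update(part_users)
        repos.update(part_repos)
        for edge in part_edges:
            if edge not in seen:  # assumption: duplicate edges are dropped
                seen.add(edge)
                edges.append(edge)
    return users, repos, edges
```

A merged data set could then be obtained with `merge_parts(load_part(b) for b in find_basenames("data/2016-010203/"))`.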

graph2redis.py

Insert graph data into a Redis database.

Usage: graph2redis.py <input-edgelist> <redis-port>
Ex:    graph2redis.py data/2016-Q1.edge 6379
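
The loading step might look something like the sketch below, which stores each edge in two Redis sets so the graph can be walked in both directions. The key layout (`user:<id>` and `repo:<id>`), the batch size, and the function name `load_edges` are assumptions made for illustration; the actual schema used by graph2redis.py is not documented here. The client is expected to behave like a redis-py `Redis` object.

```python
import pickle
import sys


def load_edges(client, edges, batch_size=1000):
    """Insert user-repo edges through a pipeline, flushing every batch_size edges."""
    pipe = client.pipeline()
    total = 0
    for user_id, repo_id in edges:
        # Assumed key layout: one set of repos per user, one set of users per repo.
        pipe.sadd(f"user:{user_id}", repo_id)
        pipe.sadd(f"repo:{repo_id}", user_id)
        total += 1
        if total % batch_size == 0:
            pipe.execute()
    pipe.execute()  # flush whatever remains in the last partial batch
    return total


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical invocation: python sketch.py <input-edgelist> <redis-port>
    import redis  # requires the redis-py package

    with open(sys.argv[1], "rb") as fh:
        edge_list = pickle.load(fh)
    connection = redis.Redis(host="localhost", port=int(sys.argv[2]))
    print(load_edges(connection, edge_list), "edges inserted")
```

Batching the commands through a pipeline avoids one network round trip per edge, which matters when the edge list is large.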

License

MIT