yilinjuang/Octomender

Octomender

GitHub repo recommender system.

Octomender = Octocat + Recommender

Get repo recommendation based on your GitHub star history.

The recommendation algorithm is deployed and being tested on octomend.com.

Visit octomend.com to help improve the recommendation.

The service has been discontinued (End of Service) since GitHub published its own "Discover Repositories" service.

Dependencies

  • redis: An in-memory database that persists on disk

Core

  • hiredis: Minimalistic C client for Redis >= 1.2
  • OpenMP>=4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming

Preprocessing

Website

  • Flask: A microframework for Python based on Werkzeug, Jinja 2 and good intentions
  • GitHub-Flask: Flask extension for authenticating users with GitHub and making requests to the API
  • gunicorn: A Python WSGI HTTP Server for UNIX
  • google-cloud-datastore: Low-level Java and Python client libraries for Google Cloud Datastore

Dataset

GitHub Archive

Build Core

cd core; make

Preprocessing

Parse raw JSON data files into three pickle data files:

  • output-data-basename.user: map of user id (str) to user name (str)
  • output-data-basename.repo: map of repo id (int) to repo name (str)
  • output-data-basename.edge: list of tuples of user-repo edge (str, int)
Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05

For the raw JSON data format, refer to GitHub API v3.
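The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual parse.py: it assumes each input line is one JSON event with `type`, `actor.id`, `actor.login`, `repo.id` and `repo.name` fields, and the function name `parse_watch_events` is hypothetical.

```python
import json


def parse_watch_events(lines):
    """Collect user, repo and edge data from an iterable of JSON lines."""
    users, repos, edges = {}, {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "WatchEvent":
            continue
        user_id = str(event["actor"]["id"])  # assumption: user id kept as str
        repo_id = int(event["repo"]["id"])   # assumption: repo id kept as int
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


# Two made-up events; only the WatchEvent is kept.
sample = [
    '{"type": "WatchEvent", "actor": {"id": 1, "login": "alice"}, '
    '"repo": {"id": 42, "name": "octo/demo"}}',
    '{"type": "PushEvent", "actor": {"id": 2, "login": "bob"}, '
    '"repo": {"id": 43, "name": "octo/other"}}',
]
users, repos, edges = parse_watch_events(sample)
print(users, repos, edges)
```

Each of the three results could then be written with `pickle.dump` to a file named after the output basename plus the `.user`, `.repo` or `.edge` suffix.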

parse_mp.py does the same as parse.py, but runs with multiprocessing. The default number of processes is 16.

Usage: parse_mp.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
  n-process         number of processes when multiprocessing.
Ex:    parse_mp.py -m 2017-06-01-0.json data
Ex:    parse_mp.py --watch json/2017-05/ data/2017-05 32
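One way the multiprocessing variant might be organised is sketched below. How parse_mp.py actually distributes its work is not shown in this README, so splitting the input lines into chunks is an assumption, and `parse_chunk`, `split_chunks` and `parse_parallel` are hypothetical names.

```python
import json
from multiprocessing import Pool


def parse_chunk(lines):
    """Worker: return (user id, repo id) edges for the WatchEvents in one chunk."""
    edges = []
    for line in lines:
        event = json.loads(line)
        if event.get("type") == "WatchEvent":
            edges.append((str(event["actor"]["id"]), int(event["repo"]["id"])))
    return edges


def split_chunks(items, n):
    """Split items into at most n roughly equal chunks, preserving order."""
    n = max(1, min(n, len(items)))
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parse_parallel(lines, n_process=16):
    """Parse chunks in a process pool and concatenate the partial edge lists."""
    chunks = split_chunks(list(lines), n_process)
    with Pool(processes=len(chunks)) as pool:
        results = pool.map(parse_chunk, chunks)
    return [edge for part in results for edge in part]
```

When using this from a script, call `parse_parallel` under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.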

Merge multiple pickle data files into one.

Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex:    mergedata.py data/2016-010203/ data/2016-Q1
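The merge step can be illustrated with a small in-memory sketch. It assumes each input is a `(users, repos, edges)` triple as produced by the parsing step, and that duplicate edges should be dropped; whether mergedata.py deduplicates is not stated here, and `merge_data` is a hypothetical name.

```python
def merge_data(parts):
    """Merge several (users, repos, edges) triples into one."""
    users, repos, edges = {}, {}, set()
    for part_users, part_repos, part_edges in parts:
        users.update(part_users)   # later files win on conflicting names
        repos.update(part_repos)
        edges.update(part_edges)   # assumption: duplicate edges are dropped
    return users, repos, sorted(edges)


january = ({"1": "alice"}, {42: "octo/demo"}, [("1", 42)])
february = ({"2": "bob"}, {42: "octo/demo"}, [("2", 42), ("1", 42)])
users, repos, edges = merge_data([january, february])
print(users, repos, edges)
```

In practice each triple would first be loaded with `pickle.load` from the `.user`, `.repo` and `.edge` files found in the input directory.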

Insert graph data into redis database.

Usage: graph2redis.py <input-edgelist> <redis-port>
Ex:    graph2redis.py data/2016-Q1.edge 6379
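A possible shape for the insertion step is sketched below. The key layout (`user:<id>` and `repo:<id>` sets) is an assumption made for illustration, not the schema graph2redis.py actually uses, and `InMemoryClient` is a hypothetical stand-in so the example runs without a Redis server.

```python
class InMemoryClient:
    """Stand-in exposing only sadd(), so the sketch runs without Redis."""

    def __init__(self):
        self.sets = {}

    def sadd(self, key, *values):
        self.sets.setdefault(key, set()).update(values)


def load_edges(client, edges):
    """Store each edge twice: repos starred per user, and users per repo."""
    for user_id, repo_id in edges:
        client.sadd(f"user:{user_id}", repo_id)  # hypothetical key scheme
        client.sadd(f"repo:{repo_id}", user_id)  # hypothetical key scheme


client = InMemoryClient()
load_edges(client, [("1", 42), ("1", 43), ("2", 42)])
print(client.sets["user:1"], client.sets["repo:42"])
```

With the redis-py package installed and a server listening on the given port, `redis.Redis(port=6379)` could be passed in place of `InMemoryClient()`, since it provides a compatible `sadd(name, *values)` method.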

Thanks

importpython and reddit.

License

MIT