Octomender

GitHub repo recommender system.

Octomender = Octocat + Recommender

Get repo recommendation based on your GitHub star history.

End of Service

The recommendation algorithm was deployed and tested on octomend.com, where visitors could help improve the recommendations.

The service has ended since GitHub published its "Discover Repositories" service.

Dependencies

  • redis: An in-memory database that persists on disk

Core

  • hiredis: Minimalistic C client for Redis >= 1.2
  • OpenMP>=4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming

Preprocessing

Website

  • Flask: A microframework for Python based on Werkzeug, Jinja 2 and good intentions
  • GitHub-Flask: Flask extension for authenticating users with GitHub and making requests to the API
  • gunicorn: A Python WSGI HTTP Server for UNIX
  • google-cloud-datastore: Low-level Java and Python client libraries for Google Cloud Datastore

Dataset

GitHub Archive

Build Core

cd core; make

Preprocessing

parse.py

Parse raw JSON data files into three pickle data files:

  • output-data-basename.user: map of user id (str) to user name (str)
  • output-data-basename.repo: map of repo id (int) to repo name (str)
  • output-data-basename.edge: list of tuples of user-repo edge (str, int)

Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05

For the raw JSON data format, refer to the GitHub API v3 documentation.
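
The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual parse.py: it assumes each input line is one JSON event with `type`, `actor.id`, `actor.login`, `repo.id`, and `repo.name` fields, and the function names `parse_watch_events` and `dump_data` are hypothetical.

```python
import json
import pickle


def parse_watch_events(lines):
    """Collect user, repo, and edge data from newline-delimited JSON events."""
    users, repos, edges = {}, {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "WatchEvent":
            continue
        # Assumed field layout: actor.id / actor.login and repo.id / repo.name.
        user_id = str(event["actor"]["id"])  # user id kept as str
        repo_id = int(event["repo"]["id"])   # repo id kept as int
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


def dump_data(basename, users, repos, edges):
    """Write the three pickle files named after the output basename."""
    for suffix, obj in ((".user", users), (".repo", repos), (".edge", edges)):
        with open(basename + suffix, "wb") as f:
            pickle.dump(obj, f)
```

Handling MemberEvent would follow the same pattern with a different `type` filter; which fields the real script reads for that event is not stated here.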

parse_mp.py

Same as parse.py, but runs with multiprocessing. The default number of processes is 16.

Usage: parse_mp.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
  n-process         number of processes when multiprocessing.
Ex:    parse_mp.py -m 2017-06-01-0.json data
Ex:    parse_mp.py --watch json/2017-05/ data/2017-05 32
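
One way to parallelize the parsing is to hand each input file to a worker process and combine the partial results afterwards. The sketch below uses `multiprocessing.Pool` for this; it is an assumption about the approach, not the actual parse_mp.py, and the names `parse_file`, `combine`, and `parse_parallel` are hypothetical.

```python
import json
from multiprocessing import Pool


def parse_file(path):
    """Parse one JSON-lines file; return (users, repos, edges) for WatchEvent."""
    users, repos, edges = {}, {}, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") != "WatchEvent":
                continue
            # Assumed field layout, as in the GitHub event JSON.
            user_id = str(event["actor"]["id"])
            repo_id = int(event["repo"]["id"])
            users[user_id] = event["actor"]["login"]
            repos[repo_id] = event["repo"]["name"]
            edges.append((user_id, repo_id))
    return users, repos, edges


def combine(results):
    """Merge per-file results into one (users, repos, edges) triple."""
    users, repos, edges = {}, {}, []
    for file_users, file_repos, file_edges in results:
        users.update(file_users)
        repos.update(file_repos)
        edges.extend(file_edges)
    return users, repos, edges


def parse_parallel(paths, n_process=16):
    """Parse files in worker processes; 16 mirrors the documented default."""
    with Pool(processes=n_process) as pool:
        return combine(pool.map(parse_file, paths))
```

When run as a script, `parse_parallel` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.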

mergedata.py

Merge multiple pickle data files into one.

Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex:    mergedata.py data/2016-010203/ data/2016-Q1
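
A merge of this kind might look like the sketch below. It assumes the input directory holds `.user`, `.repo`, and `.edge` pickle files in the format produced by parse.py; the function names are hypothetical, and dropping duplicate edges is an assumption of this sketch rather than documented behavior.

```python
import glob
import os
import pickle


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def merge_data(input_dir, output_basename):
    """Merge every .user/.repo/.edge pickle in input_dir into one set of files."""
    users, repos, edges = {}, {}, set()
    for path in sorted(glob.glob(os.path.join(input_dir, "*.user"))):
        users.update(load_pickle(path))
    for path in sorted(glob.glob(os.path.join(input_dir, "*.repo"))):
        repos.update(load_pickle(path))
    for path in sorted(glob.glob(os.path.join(input_dir, "*.edge"))):
        # Assumption: duplicate (user, repo) edges are collapsed.
        edges.update(load_pickle(path))
    merged = ((".user", users), (".repo", repos), (".edge", sorted(edges)))
    for suffix, obj in merged:
        with open(output_basename + suffix, "wb") as f:
            pickle.dump(obj, f)
    return users, repos, sorted(edges)
```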

graph2redis.py

Insert graph data into a Redis database.

Usage: graph2redis.py <input-edgelist> <redis-port>
Ex:    graph2redis.py data/2016-Q1.edge 6379
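
As a rough illustration of loading an edge list into Redis, the sketch below stores each edge in two sets through a pipeline. The key layout (`user:<id>` and `repo:<id>`) is purely hypothetical and is not the schema used by graph2redis.py or the core; the sketch also assumes the redis-py client and a Redis server on localhost.

```python
import pickle


def load_edges(path):
    """Load the pickled list of (user id, repo id) tuples."""
    with open(path, "rb") as f:
        return pickle.load(f)


def insert_edges(client, edges, batch_size=10000):
    """Add each edge to a per-user and a per-repo set, flushing in batches."""
    pipe = client.pipeline()
    count = 0
    for user_id, repo_id in edges:
        # Hypothetical key names, chosen only for this example.
        pipe.sadd(f"user:{user_id}", repo_id)
        pipe.sadd(f"repo:{repo_id}", user_id)
        count += 1
        if count % batch_size == 0:
            pipe.execute()
    pipe.execute()  # flush whatever is left in the last batch
    return count


def main(edge_path, port):
    # redis-py is imported here so the helpers above carry no hard dependency.
    import redis

    client = redis.Redis(host="localhost", port=port)
    return insert_edges(client, load_edges(edge_path))
```

Batching through a pipeline keeps the number of network round trips low for large edge lists, for example `main("data/2016-Q1.edge", 6379)`.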

Thanks

importpython and reddit.

License

MIT
