Real-time collaborative repository recommendations based on GitHub stars.
The application shows related GitHub projects by analysing GitHub stars.
- List of GitHub Repositories and Users (stored in PostgreSQL)
- List of starred Repositories of each User (stored in Redis)
We use an approach similar to predictor, with some important differences:
- Instead of computing the intersection of stars between the given repository and every repository related to it, similarities are computed in bulk using the Redis ZUNIONSTORE command.
- Similarities are computed and cached by a Lua script executed directly in the Redis instance.
- For repositories with thousands of stars, a representative sample of 100-5000 users is taken.
- Computations are optimized by operating on sets of integers instead of sets of strings.
- Redis is used in 32-bit mode with an increased shared integer pool to reduce memory usage.
- A "popularity penalty factor" is used for discovering less popular repositories. The penalty factor can be provided by the user.
- For real-time recommendations, users who have more than 1000 stars are ignored.
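As a rough illustration of the points above, the following in-memory Ruby sketch mimics what a ZUNIONSTORE with SUM aggregation over the starred-repository sets of a sample of users would produce: for each repository, the number of sampled users who starred it. It is not the code from redis_recommender.rb and does not talk to Redis. The method name `co_star_counts`, the `stars_by_user` hash, and the single sample cap are assumptions made for this example; the 1000-star cutoff and the use of integer IDs come from the list above.

```ruby
require "set"

MAX_USER_STARS = 1000 # users with more stars are ignored (from the list above)
MAX_SAMPLE     = 5000 # assumed upper bound on the user sample

# Hypothetical stand-in for the Redis data:
#   stars_by_user maps an integer user ID to a Set of integer repository IDs.
# Returns a Hash of repository ID => number of sampled starrers who also
# starred that repository, which is what summing scores over a union of
# the users' sets would yield.
def co_star_counts(subject, starrers, stars_by_user,
                   max_sample: MAX_SAMPLE,
                   max_user_stars: MAX_USER_STARS,
                   rng: Random.new(42))
  stars_of = ->(user) { stars_by_user.fetch(user, Set.new) }

  # Skip users who star almost everything; they add noise and cost.
  eligible = starrers.select { |user| stars_of.call(user).size <= max_user_stars }

  # Take a random sample when there are too many starrers.
  sample = eligible.size > max_sample ? eligible.sample(max_sample, random: rng) : eligible

  counts = Hash.new(0)
  sample.each do |user|
    stars_of.call(user).each do |repo|
      counts[repo] += 1 unless repo == subject
    end
  end
  counts
end

stars_by_user = {
  1 => Set[10, 20, 30],
  2 => Set[10, 20],
  3 => Set[10, 40],
  4 => Set.new(10..1500) # more than 1000 stars, so this user is ignored
}
counts = co_star_counts(10, [1, 2, 3, 4], stars_by_user)
```

Here `counts` holds the overlap sizes needed for the numerator of the similarity score. In the real application the same tally would be produced inside Redis, which avoids shipping the sets to Ruby.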
The similarity formula reads as follows:
$$S(A, B) = \frac{|U(A) \cap U(B)|}{|U(A)| + P \cdot |U(B)|}$$
where A is the subject repository, B is a related repository, U(x) is the set of users who starred repository x, and P is the "popularity penalty factor" provided by the user in the UI.
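To make the role of P concrete, here is a minimal Ruby sketch of S(A, B) computed on plain sets of user IDs. The method name `similarity` and the sample data are assumptions for illustration; the actual implementation runs inside Redis.

```ruby
require "set"

# S(A, B) = |U(A) ∩ U(B)| / (|U(A)| + P * |U(B)|)
# users_a, users_b: Sets of integer user IDs; penalty: the factor P.
def similarity(users_a, users_b, penalty)
  denominator = users_a.size + penalty * users_b.size
  return 0.0 if denominator.zero?

  (users_a & users_b).size / denominator.to_f
end

subject = Set.new(1..100)    # 100 users starred A
popular = Set.new(51..1050)  # 1000 users, 50 shared with A
niche   = Set.new(91..110)   # 20 users, 10 shared with A

no_penalty   = [similarity(subject, popular, 0), similarity(subject, niche, 0)]
with_penalty = [similarity(subject, popular, 1), similarity(subject, niche, 1)]
```

With P = 0 the popular repository scores 0.5 against 0.1 for the niche one. With P = 1 the scores become 50/1100 and 10/120, so the niche repository ranks higher, which is how the penalty surfaces less popular projects.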
The algorithm is implemented in redis_recommender.rb.
The algorithm can analyse hundreds of thousands of stars in well under 1 second while keeping memory usage below 1 GB on the GitHub dataset. One Redis database with caching is enough to handle a GitHub-size dataset.
Recommendation speed can be improved by adding more Redis slaves.
- Ruby 2.1.0
- PostgreSQL 9.x
- Redis 3.0.0, preferably 32-bit
- Ruby & Rails 4
- CoffeeScript & Angular 1.2
- Rails Assets
- SCSS, SLIM
The application requires Redis and PostgreSQL database dumps. They can be downloaded using the bin/download script. Please download them only if you really need to test with live data.
curl -o db/dump.rdb http://sheerun.net/dump.rdb
curl -o db/dump.sql.gz http://sheerun.net/dump.sql.gz
You'll also need a Redis instance compiled in 32-bit mode with an increased shared integer count:
#define REDIS_SHARED_INTEGERS 15000000
Once your Redis instance is up and running with the downloaded dump.rdb, and PostgreSQL has the imported dump.sql.gz, you can bundle the application:
bundle install
bin/rake db:create
bin/rake db:migrate
You also need to create a GitHub application with the callback set to:
Create a .env file with the following configuration:
The application and Sidekiq worker can be started with:
We need help with the following:
- Making the recommendation engine even more performant
- Better front-end design and interaction (the author is a Ruby developer)
- Improvements to the recommendation algorithm to get better suggestions
- Testing, fixing and maintaining the application
If you think you could help, please open an issue or pull request on this repository.
This project is MIT-licensed.