Github Social

Real-time collaborative repository recommendations based on GitHub stars.


The application shows related GitHub projects by analysing GitHub stars.

The application uses offline data that is continuously updated from the GitHub API. The seed database was extracted from the GitHub Archive and GHTorrent websites. Specifically:

  • List of GitHub Repositories and Users (stored in PostgreSQL)
  • List of starred Repositories of each User (stored in Redis)

Used algorithm

The application uses a memory-based, item-based collaborative filtering algorithm, with a modified Sørensen–Dice coefficient to measure the similarity between two given repositories.

We use an approach similar to predictor, with some important differences, among others:

  • Instead of computing the intersection of stars between the given repository and every repository related to it, similarities are computed in bulk with the Redis zunionstore command.
  • Similarities are computed and cached by a Lua script executed directly in the Redis instance.
  • For repositories with thousands of stars, a representative sample of 100-5000 users is taken.
  • Computations operate on sets of integers instead of sets of strings.
  • Redis runs in 32bit mode with an increased shared integer pool to reduce memory usage.
  • A "popularity penalty factor" is used for discovering less popular repositories. The penalty factor can be provided by the user.
  • For real-time recommendations, users with more than 1000 stars are ignored.
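
The bulk computation, sampling, and heavy-user filter described above can be illustrated in plain Ruby, without Redis. This is only a minimal sketch, not the code from redis_recommender.rb: the hashes `users_by_repo` and `stars_by_user`, the method `co_star_counts`, and the constants `MAX_USER_STARS` and `SAMPLE_SIZE` are hypothetical names chosen for illustration. It mimics the effect of a union with summed scores over the star sets of the sampled users: each repository ends up with a score equal to the number of sampled users who starred both it and the subject repository.

```ruby
MAX_USER_STARS = 1000 # assumption: users with more stars than this are ignored
SAMPLE_SIZE    = 5000 # assumption: upper bound of the user sample

# Returns { repo_id => number of sampled users who starred both
# repo_id and the subject repository }.
# users_by_repo: { repo_id => [user_id, ...] }  (hypothetical layout)
# stars_by_user: { user_id => [repo_id, ...] }  (hypothetical layout)
def co_star_counts(subject, users_by_repo, stars_by_user,
                   sample_size: SAMPLE_SIZE, rng: Random.new)
  users = users_by_repo.fetch(subject, [])
  # Drop users who star almost everything; they add noise and cost.
  users = users.reject { |u| stars_by_user.fetch(u, []).size > MAX_USER_STARS }
  # For popular repositories, work on a random sample of stargazers.
  users = users.sample(sample_size, random: rng) if users.size > sample_size

  # Union of the users' star lists, summing 1 per occurrence.
  counts = Hash.new(0)
  users.each do |user|
    stars_by_user.fetch(user, []).each do |repo|
      counts[repo] += 1 unless repo == subject
    end
  end
  counts
end

stars_by_user = { 1 => [10, 20], 2 => [10, 20, 30], 3 => [10, 30], 4 => [20] }
users_by_repo = { 10 => [1, 2, 3], 20 => [1, 2, 4], 30 => [2, 3] }

# Repositories 20 and 30 each share two stargazers with repository 10.
counts = co_star_counts(10, users_by_repo, stars_by_user)
```

One pass over the sampled users yields the overlap with every related repository at once, which is why a union with summed scores can replace many pairwise intersections.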

The similarity formula reads as follows:

             |U(A) ∩ U(B)|
S(A, B) = -------------------
          |U(A)| + P * |U(B)|

Here A is the subject repository, B is the related repository, U(x) is the set of users who starred repository x, and P is the "popularity penalty factor" provided by the user in the UI.
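
The formula can be written directly in Ruby. The sketch below is illustrative only and is not taken from redis_recommender.rb; the method name `similarity` and the sample data are assumptions. It reads the numerator as the number of users who starred both repositories, and takes U(A) and U(B) as arrays of unique user ids.

```ruby
# Modified Sørensen–Dice similarity, following the formula above.
# users_a: ids of users starring the subject repository A
# users_b: ids of users starring the related repository B
# penalty: the popularity penalty factor P
def similarity(users_a, users_b, penalty = 1.0)
  denominator = users_a.size + penalty * users_b.size
  return 0.0 if denominator.zero?

  # Array#& returns the intersection of the two arrays.
  (users_a & users_b).size / denominator.to_f
end

subject = [1, 2, 3, 4]                  # U(A)
niche   = [1, 2]                        # small repository, 2 stargazers
popular = [1, 2, 3] + (100...197).to_a  # large repository, 100 stargazers

similarity(subject, niche, 1.0)   # 2 / (4 + 1.0 * 2)
similarity(subject, popular, 1.0) # 3 / (4 + 1.0 * 100)
similarity(subject, popular, 0.0) # 3 / 4, popularity is not penalised
```

With P = 1 the score is half the standard Sørensen–Dice coefficient. Larger values of P shrink the score of repositories with many stargazers, so less popular repositories rank higher.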

The algorithm is implemented in redis_recommender.rb.


The algorithm can analyse hundreds of thousands of stars in well under 1 second while keeping memory usage below 1GB on the GitHub dataset. One Redis database with caching is enough to handle a GitHub-size dataset.

Recommendation speed can be improved by introducing more Redis slaves.


  • Ruby 2.1.0
  • PostgreSQL 9.x
  • Redis 3.0.0, preferably 32bit


  • Ruby & Rails 4
  • CoffeeScript & Angular 1.2
  • Rails Assets
  • Sidekiq

Production installation

The application requires Redis and PostgreSQL database dumps. They can be downloaded using the bin/download script. Please download them only if you really need to test live data.

curl -o db/dump.rdb
curl -o db/dump.sql.gz

You'll also need a Redis instance compiled in 32bit mode, with an increased shared integer count:

#define REDIS_SHARED_INTEGERS 15000000
make 32bit

Once your Redis instance is up and running with the downloaded dump.rdb, and PostgreSQL has the imported dump.sql.gz, you can bundle the application:

bundle install
bin/rake db:create
bin/rake db:migrate

You also need to create a GitHub application with the callback set to:


And add a .env file with the following configuration:


The application and the Sidekiq worker can be started with:

bin/foreman start


We need help with the following:

  1. Making the recommendation engine even more performant
  2. Better front-end design and interaction (the author is a Ruby developer)
  3. Improvements to the recommendation algorithm to get better suggestions
  4. Testing, fixing, and maintaining the application

If you think you could help, please open an issue or pull request on this repository.


This project is MIT-licensed.

