## Simple GitHub Recommender System
In this notebook we will try to find new repos by looking at the stars from the original repo.
At first we will list users who starred the repo and the look at their starred repos. For simplicity we will print the most common repos.

Lets get started by importing our modules.

In [108]:
import json
import urllib.request, urllib.error, urllib.parse
import datetime
import time
import pandas as pd
import random

#### 1. Enter Token
Get your token from <a href='https://github.com/settings/tokens'>the github settings</a>

In [132]:
access_token = "xxx"

#### 2. Fetch users who starred the repo of choice
The API call goes through the pages of stargazers. It saves their name in a list. At the end the total list of Stargazers is returned.
The parameters are repo and limit. The repo will be the source repo which will be matched and the limit is the page limit.

In [91]:
#Source: fetching stargazers forked by https://github.com/minimaxir/get-profile-data-of-repo-stargazers

def get_stargazers(repo, limit = 4):
    page_number = 0
    stars_remaining = True
    list_stars = []
    print("Gathering Stargazers for %s..." % repo)
    while stars_remaining and page_number < limit:
        query_url = "https://api.github.com/repos/%s/stargazers?page=%s&access_token=%s" % (repo, page_number, access_token)

        req = urllib.request.Request(query_url)
        req.add_header('Accept', 'application/vnd.github.v3.star+json')
        response = urllib.request.urlopen(req)
        data = json.loads(response.read())

        for user in data:
            username = user['user']['login']
            list_stars.append(username)
            
        if len(data) < 25:
            stars_remaining = False

        page_number += 1

    print(("Done Gathering Stargazers for %s!" % repo))    

    list_stars = list(set(list_stars)) # remove dupes
    
    return list_stars

#### 3. Fetch repos the users starred
Same procedure but this time it create a new list of usernames and what they liked. Also we will shuffle our user sample.

In [116]:
def get_their_stars(lst,limit=50,rnd=True):
    print("Looking on %s for starred repos on their GitHub Profiles..." % str(len(lst)))
    allstars = []
    random.shuffle(lst)
    for username in lst[:limit]:
        try:
            query_url = "https://api.github.com/users/%s/starred?access_token=%s" % (username, access_token)

            req = urllib.request.Request(query_url)
            response = urllib.request.urlopen(req)
            data = json.loads(response.read())
            datalist = []
            for dat in data:
                allstars.append([username, dat['full_name']])
            time.sleep(1) # stay within API rate limit of 5000 requests / hour + buffer
        except Exception as e:
            time.sleep(1)
            print(e)
    return allstars

#### 4. Convert to a DataFrame and display most named repos
As said a super simple recommender. No blackbox or voodoo magic :)

In [129]:
def get_df(lst, num):
    df = pd.DataFrame(lst,columns=['user','repo'])
    d0 =  pd.DataFrame(df['repo'].value_counts().head(num),columns=['repo'])[1:]
    d0.columns = ['count']
    d0['repourl'] = 'https://github.com/' + d0.index.astype(str)
    return d0

#### 5. Run the script
Lets run it with a limit of 50 users :)
In my test it took 50 seconds (1 second for a user). It's very slow to make sure not hit the API rate limit.

In [118]:
stargazers = get_stargazers('facebook/prophet')
total_stars = get_their_stars(stargazers,40)
get_df(total_stars, 10)

Gathering Stargazers for facebook/prophet...
Done Gathering Stargazers for facebook/prophet!
Looking on 60 for starred repos on their GitHub Profiles...


Unnamed: 0,repo
facebook/prophet,12
dnouri/skorch,5
aksnzhy/xlearn,4
lmcinnes/umap,4
uber/pyro,4
ricequant/rqalpha,3
apple/turicreate,3
Cadene/pretrained-models.pytorch,3
quantopian/zipline,3
Microsoft/DMTK,3


Unnamed: 0,count,repourl
dnouri/skorch,5,https://github.com/dnouri/skorch
aksnzhy/xlearn,4,https://github.com/aksnzhy/xlearn
lmcinnes/umap,4,https://github.com/lmcinnes/umap
uber/pyro,4,https://github.com/uber/pyro
ricequant/rqalpha,3,https://github.com/ricequant/rqalpha
apple/turicreate,3,https://github.com/apple/turicreate
Cadene/pretrained-models.pytorch,3,https://github.com/Cadene/pretrained-models.py...
quantopian/zipline,3,https://github.com/quantopian/zipline
Microsoft/DMTK,3,https://github.com/Microsoft/DMTK


# 🎉🎉🎉🎉