# GitHub data integration fetch - **final stage**

As part of the **final stage**, relevant GitHub data is integrated for the tools using [PyGithub](https://pygithub.readthedocs.io/en/stable/).

This notebook fetches that data from the GitHub API.


## Imports

In [None]:
from github import Github
import pandas as pd
from snippets import *
import time

## Create GitHub Instance

Set `GITHUB_TOKEN` to a valid GitHub API access token before running the cells below.

In [None]:
# Use a valid GitHub API access token
GITHUB_TOKEN=""
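
As an alternative to pasting the token into the notebook, it could be read from the environment so it never ends up in version control. The following is a minimal sketch under that assumption; the helper name `load_github_token` and the choice of environment variable are hypothetical and not part of the original notebook.

```python
import os


def load_github_token(env_var="GITHUB_TOKEN"):
    """Return the access token stored in the given environment variable.

    Raises RuntimeError if the variable is unset or contains only whitespace.
    (Hypothetical helper; the notebook itself assigns the token directly.)
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a GitHub access token"
        )
    return token
```

With this helper, the cell above could become `GITHUB_TOKEN = load_github_token()`, which fails early with a clear message if no token is configured.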

In [None]:

g = Github(GITHUB_TOKEN)

## Load tools from consolidated stage

Load the consolidated tool table; GitHub data is fetched for every tool it contains.

In [None]:
df = pd.read_csv("data/04_consolidated/tools.csv")

## Fetch GitHub data for tools

Iterate over all tools and try to fetch the GitHub project information considered as part of this thesis.
Because the number of API requests is limited per time period, this might take multiple hours: the implementation is very naive and blocking.
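
Instead of sleeping for a fixed interval, the wait could be derived from the time at which the rate-limit window resets. The sketch below only does the arithmetic; it assumes the reset time is available as a Unix timestamp, which PyGithub exposes as `g.rate_limiting_resettime`. The function name and the `buffer_seconds` parameter are hypothetical.

```python
import time


def seconds_until_reset(reset_epoch, now=None, buffer_seconds=5):
    """Return how long to sleep until the rate-limit window resets.

    reset_epoch    -- Unix timestamp of the reset (assumed input)
    now            -- current Unix time; defaults to time.time()
    buffer_seconds -- small safety margin added on top of the wait
    The result is never negative, even if the reset already happened.
    """
    if now is None:
        now = time.time()
    return max(0.0, reset_epoch - now) + buffer_seconds


# Possible use inside the fetch loop (not executed here):
# time.sleep(seconds_until_reset(g.rate_limiting_resettime))
```

This keeps the loop blocking, as in the notebook, but avoids waiting longer than the reset actually requires.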

In [None]:
results=[]
errors = []

In [None]:
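for tool_id, repo_url in zip(df["id"].to_list(), df["repo_url"].to_list()):

    # Pause while fewer than 50 requests remain in the current rate-limit window.
    while g.rate_limiting[0] < 50:
        print("wait for rate limit")
        time.sleep(600)
        # rate_limiting reflects the most recent API response, so query the
        # rate-limit endpoint again to refresh it after sleeping.
        g.get_rate_limit()

    is_github, pr_name = get_github_project_name_from_url(repo_url)
    if not is_github:
        # Skip tools whose repository is not hosted on GitHub.
        continue

    try:
        pr_info = get_project_infromation(g, pr_name)
        result = {"status": "okay",
                  "id": tool_id,
                  "repo_url": repo_url,
                  "data": pr_info}
        results.append(result)

    except Exception as e:
        # Record the failure and continue with the remaining tools.
        result = {"status": "error",
                  "id": tool_id,
                  "repo_url": repo_url,
                  "data": e}
        results.append(result)
        print("ERROR:", pr_name)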
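
The loop relies on `get_github_project_name_from_url` from the `snippets` module, which is not shown in this notebook. Judging only from how it is called, it returns a flag and a project name. The following is a hypothetical stand-in, assuming the name has the form `owner/repo`; the real helper may behave differently.

```python
from urllib.parse import urlparse


def parse_github_project_name(repo_url):
    """Return (is_github, "owner/repo") for a repository URL.

    Hypothetical sketch, not the implementation from the snippets module.
    Non-string input (for example NaN from an empty CSV cell) and URLs
    that do not point at github.com yield (False, None).
    """
    if not isinstance(repo_url, str):
        return False, None
    parsed = urlparse(repo_url.strip())
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return False, None
    # Keep only non-empty path segments: /owner/repo/tree/main -> owner, repo, ...
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return False, None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return True, f"{owner}/{repo}"
```

Returning a flag rather than raising lets the loop skip non-GitHub repositories with a plain `continue`, which matches how the notebook uses the result.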


### Save result of GitHub data integration

In [None]:
import pickle
with open('data/04_consolidated/github_scrape_results.pickle', 'wb') as handle:
    pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
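
A later stage would need to read the pickle back and separate successful fetches from failures. The sketch below assumes the result dictionaries have the `status` key used in the loop above; the function names `load_results` and `count_by_status` are hypothetical.

```python
import pickle


def load_results(path):
    """Load the list of result dictionaries written by pickle.dump."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def count_by_status(results):
    """Count results per status value, e.g. 'okay' and 'error'."""
    counts = {}
    for result in results:
        status = result["status"]
        counts[status] = counts.get(status, 0) + 1
    return counts
```

For example, `count_by_status(load_results('data/04_consolidated/github_scrape_results.pickle'))` would give a quick overview of how many tools failed. Because the error entries store exception objects, the classes of those exceptions must be importable in the environment that loads the file.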