## 03 Challenge

- Extract all repositories from the GitHub API for a specific user, and calculate the total number of stars they received. 

- Use pagination when fetching the data from the API. 

- Load the result into a new table in the local PostgreSQL database.

Import Libraries and load environment variables

In [46]:
import requests
from psycopg2 import connect, sql
import os
from dotenv import load_dotenv #pip install python-dotenv
from os import environ as env
import pandas as pd

# Load environment variables
# Connection string stored in .env file
load_dotenv()
conn_string = os.getenv('conn_string')
github_token = os.getenv('github_token')
github_username = os.getenv('github_username')

if 'conn_string' in env:
    print(env['conn_string'][:35])

dbname='etl_bites' user='joemiller'
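The cell above reads three variables from the `.env` file, and a missing one only shows up later as a confusing `None`. As a minimal sketch, a small helper could check them up front. The name `require_env` and its `environ` parameter are hypothetical and not part of the original notebook.

```python
import os


def require_env(names, environ=None):
    """Return the requested variables as a dict, raising if any are missing or empty.

    `environ` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Usage after load_dotenv(), with the variable names used in this notebook:
# config = require_env(["conn_string", "github_token", "github_username"])
```

Failing early like this keeps the later cells from sending a request with empty credentials.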


Request data from the GitHub API for a specific user, in this case 'freecodecamp'.

In [30]:
def get_data_from_api(url):
    response = requests.get(url, auth=(github_username, github_token))
    return response.json()

github_url = "https://api.github.com/users/freecodecamp"

github_data = get_data_from_api(github_url)
github_data

{'login': 'freeCodeCamp',
 'id': 9892522,
 'node_id': 'MDEyOk9yZ2FuaXphdGlvbjk4OTI1MjI=',
 'avatar_url': 'https://avatars.githubusercontent.com/u/9892522?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/freeCodeCamp',
 'html_url': 'https://github.com/freeCodeCamp',
 'followers_url': 'https://api.github.com/users/freeCodeCamp/followers',
 'following_url': 'https://api.github.com/users/freeCodeCamp/following{/other_user}',
 'gists_url': 'https://api.github.com/users/freeCodeCamp/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/freeCodeCamp/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/freeCodeCamp/subscriptions',
 'organizations_url': 'https://api.github.com/users/freeCodeCamp/orgs',
 'repos_url': 'https://api.github.com/users/freeCodeCamp/repos',
 'events_url': 'https://api.github.com/users/freeCodeCamp/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/freeCodeCamp/received_events',
 'type': 'Organizatio

We can see this user has 219 public repos.

In [31]:
github_data['public_repos']

219
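The request function above returns `response.json()` without checking the status code, so a failed request (bad token, rate limit) would hand back an error payload instead of the data we expect. A possible variant is sketched below. The name `get_json` and the 10 second timeout are assumptions, and the function takes a session-like object so it does not depend on a global.

```python
def get_json(session, url, timeout=10):
    """Fetch a URL and return the parsed JSON, raising on HTTP error statuses.

    `session` is expected to behave like requests.Session (hypothetical wiring shown below).
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises for 4xx and 5xx responses
    return response.json()


# Usage sketch with the variables defined earlier in this notebook:
# import requests
# session = requests.Session()
# session.auth = (github_username, github_token)
# github_data = get_json(session, "https://api.github.com/users/freecodecamp")
```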

Let's take a look at the repos.

In [47]:
def get_data_from_api(url):
    response = requests.get(url, auth=(github_username, github_token))
    return response.json()

repos_url = "https://api.github.com/users/freecodecamp/repos"

repos_url = get_data_from_api(repos_url)  # note: repos_url now holds the list of repos, not the URL
repos_url

[{'id': 349435099,
  'node_id': 'MDEwOlJlcG9zaXRvcnkzNDk0MzUwOTk=',
  'name': '.github',
  'full_name': 'freeCodeCamp/.github',
  'private': False,
  'owner': {'login': 'freeCodeCamp',
   'id': 9892522,
   'node_id': 'MDEyOk9yZ2FuaXphdGlvbjk4OTI1MjI=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/9892522?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/freeCodeCamp',
   'html_url': 'https://github.com/freeCodeCamp',
   'followers_url': 'https://api.github.com/users/freeCodeCamp/followers',
   'following_url': 'https://api.github.com/users/freeCodeCamp/following{/other_user}',
   'gists_url': 'https://api.github.com/users/freeCodeCamp/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/freeCodeCamp/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/freeCodeCamp/subscriptions',
   'organizations_url': 'https://api.github.com/users/freeCodeCamp/orgs',
   'repos_url': 'https://api.github.com/users/freeCodeCamp/repos'

When we request the repos, we only get 30 back because the API paginates its results.

Each page returns 30 repos by default.

In [33]:
len(repos_url)

30

219 repos divided by 30 is 7.3, so we round up: we need 8 separate API calls to retrieve all of the pages.

In [56]:
219/30

7.3
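The rounding step can be written out with `math.ceil`. This is just a small sketch of the arithmetic, and `pages_needed` is a hypothetical helper name. The GitHub API also accepts a `per_page` parameter of up to 100, which would reduce the number of calls.

```python
import math


def pages_needed(total_items, per_page=30):
    """Number of pages required to cover total_items at per_page items each."""
    return math.ceil(total_items / per_page)


print(pages_needed(219))       # 8 pages at the default page size of 30
print(pages_needed(219, 100))  # 3 pages if we ask for 100 repos per page
```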

Let's look at the 8th page of repos.

In [68]:
params = {'page': 8}

def get_data_from_api(url):
    response = requests.get(url, auth=(github_username, github_token), params=params)
    return response.json()

repos_url = "https://api.github.com/users/freecodecamp/repos"

repos_url = get_data_from_api(repos_url)
repos_url

[{'id': 435589727,
  'node_id': 'R_kgDOGfaSXw',
  'name': 'top-contributor-tool',
  'full_name': 'freeCodeCamp/top-contributor-tool',
  'private': False,
  'owner': {'login': 'freeCodeCamp',
   'id': 9892522,
   'node_id': 'MDEyOk9yZ2FuaXphdGlvbjk4OTI1MjI=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/9892522?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/freeCodeCamp',
   'html_url': 'https://github.com/freeCodeCamp',
   'followers_url': 'https://api.github.com/users/freeCodeCamp/followers',
   'following_url': 'https://api.github.com/users/freeCodeCamp/following{/other_user}',
   'gists_url': 'https://api.github.com/users/freeCodeCamp/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/freeCodeCamp/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/freeCodeCamp/subscriptions',
   'organizations_url': 'https://api.github.com/users/freeCodeCamp/orgs',
   'repos_url': 'https://api.github.com/users/freeCodeCamp/

We can see this page has only 9 repos, not 30, because it is the last page (219 - 7 × 30 = 9).

In [57]:
len(repos_url)

9
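A short last page also gives us a way to stop without knowing the page count in advance. The sketch below keeps requesting pages until one comes back with fewer items than the page size. `fetch_all_pages` and the `fetch_page` callable are hypothetical names used only for illustration.

```python
def fetch_all_pages(fetch_page, per_page=30):
    """Collect items page by page until a page comes back short or empty.

    `fetch_page` is any callable that takes a 1-based page number and returns a list.
    """
    items = []
    page = 1
    while True:
        chunk = fetch_page(page)
        items.extend(chunk)
        if len(chunk) < per_page:  # a short (or empty) page means we reached the end
            break
        page += 1
    return items


# Usage sketch with the request pattern from this notebook:
# def fetch_page(page):
#     response = requests.get(url, auth=(github_username, github_token), params={'page': page})
#     return response.json()
# all_repos = fetch_all_pages(fetch_page)
```

If the total happens to be an exact multiple of the page size, this makes one extra request that returns an empty list, which then ends the loop.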

We want to get the number of stars for each repo, so let's find the key/value pair containing the star info.

In [59]:
for key in repos_url[0].keys():
    if 'star' in key:
        print(key)

stargazers_url
stargazers_count


In [70]:
repos_url[0]['stargazers_count']

5

We now loop through all of the repos in all of the pages and calculate the total number of stars for this user

In [76]:
def get_data_from_api(url):
    response = requests.get(url, auth=(github_username, github_token), params=params)
    return response.json()

repos_url = "https://api.github.com/users/freecodecamp/repos"


params = {'page': 1}
total_stars = 0

for page in range(1, 9):  # pages 1 through 8
    params['page'] = page
    repo_chunk = get_data_from_api(repos_url)
    for repo in repo_chunk:
        total_stars += repo['stargazers_count']

total_stars


# repos_url = get_data_from_api(repos_url)
# repos_url

436779
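The loop above hardcodes 8 pages, which only holds while the user has 219 repos. One alternative, sketched here under the assumption that the API returns a `Link` header with a `next` entry (GitHub does for paginated endpoints), is to follow that link until it disappears. `requests` exposes the parsed header as `response.links`. The name `sum_stars` is hypothetical.

```python
def sum_stars(session, url, params=None):
    """Add up stargazers_count across all pages by following the "next" link.

    `session` is expected to behave like requests.Session.
    """
    total = 0
    while url:
        response = session.get(url, params=params, timeout=10)
        response.raise_for_status()
        total += sum(repo["stargazers_count"] for repo in response.json())
        # requests parses the Link header into a dict keyed by rel ("next", "last", ...)
        url = response.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries its own query string
    return total


# Usage sketch:
# import requests
# session = requests.Session()
# session.auth = (github_username, github_token)
# sum_stars(session, "https://api.github.com/users/freecodecamp/repos", {"per_page": 100})
```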

Let's create a new dataframe to store the results.

In [86]:
data = {
    'user_name' : github_data['login'],
    'total_stars' : total_stars
}

print(data)

stars_df = pd.DataFrame([data])
stars_df

{'user_name': 'freeCodeCamp', 'total_stars': 436779}


Unnamed: 0,user_name,total_stars
0,freeCodeCamp,436779


We now want to upload the new dataframe to our SQL database.

First we must create a new SQL table.

In [88]:
# Create tables in analytical DB
# This could also be done manually via a GUI (e.g. TablePlus) or with a SQL script
def execute_query_postgresql(conn_string, query):
    with connect(conn_string) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            conn.commit()

create_api_data_table = '''
DROP TABLE IF EXISTS total_stars CASCADE;
CREATE TABLE total_stars (
user_name TEXT NOT NULL,
total_stars INTEGER NOT NULL
);
'''

execute_query_postgresql(conn_string, create_api_data_table)

In [89]:
%load_ext sql

In [90]:
%sql postgresql+psycopg2://joemiller:@localhost:5432/etl_bites

We now have an empty SQL table ready to be populated

In [91]:
%%sql

SELECT * FROM total_stars;

 * postgresql+psycopg2://joemiller:***@localhost:5432/etl_bites
0 rows affected.


user_name,total_stars


We now populate our SQL table with our results

In [93]:
def insert_data_to_postgresql(conn_string, table_name, data):
    with connect(conn_string) as conn:
        with conn.cursor() as cur:
            for row in data.index:
                query = sql.SQL("INSERT INTO {} (user_name, total_stars) VALUES (%s, %s)").format(sql.Identifier(table_name))
                cur.execute(query, (data['user_name'][row], int(data['total_stars'][row])))
        conn.commit()

table_name = "total_stars"
insert_data_to_postgresql(conn_string, table_name, stars_df)
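The insert above runs one `execute` per dataframe row. With more rows, a possible alternative is to build the tuples first and pass them to the cursor's `executemany` method in one call. This is only a sketch: `rows_from_records` and `insert_rows` are hypothetical names, and the table name is written directly into the query instead of going through `sql.Identifier`.

```python
INSERT_QUERY = "INSERT INTO total_stars (user_name, total_stars) VALUES (%s, %s)"


def rows_from_records(records):
    """Turn a list of dicts into (user_name, total_stars) tuples with plain Python ints."""
    return [(record["user_name"], int(record["total_stars"])) for record in records]


def insert_rows(cur, rows):
    """Insert all rows through a DB-API cursor in a single executemany call."""
    cur.executemany(INSERT_QUERY, rows)


# Usage sketch with the objects from this notebook:
# rows = rows_from_records(stars_df.to_dict("records"))
# with connect(conn_string) as conn:
#     with conn.cursor() as cur:
#         insert_rows(cur, rows)
```

The `int(...)` conversion mirrors the original cell, which casts the pandas integer to a plain Python integer before handing it to the database driver.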

We can now select all from our new SQL table and see the results have been added

In [94]:
%%sql

SELECT * FROM total_stars;

 * postgresql+psycopg2://joemiller:***@localhost:5432/etl_bites
1 rows affected.


user_name,total_stars
freeCodeCamp,436779
