## Challenge

The `JSONPlaceholder` API does not support pagination out of the box, but you could
simulate it by slicing the results you receive from the API (e.g. limiting each
page to a fixed number of posts).

However, that would not be much of a challenge, as it is not how you would
typically paginate in practice. It is still a useful exercise if you'd like to
give it a go at some point and play with some Python algorithms!
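
For anyone who does want to try the slicing exercise, here is a minimal sketch of the idea. The `get_page` helper and the `posts` list are hypothetical stand-ins (not part of any API): `posts` plays the role of the full list returned by a single request.

```python
def get_page(items, page, per_page):
    """Return the slice of `items` for a 1-indexed `page`."""
    start = (page - 1) * per_page
    return items[start:start + per_page]

# Stand-in data; in practice this would be the list returned by the API
posts = [{"id": n} for n in range(1, 8)]

page = 1
while True:
    chunk = get_page(posts, page, per_page=3)
    if not chunk:
        # An empty slice means we have walked past the last page
        break
    print(page, [post["id"] for post in chunk])
    page += 1
```

Slicing past the end of a list returns an empty list rather than raising an error, which is what makes the empty-page check a convenient stop condition.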

We will use an alternative API, the GitHub API, which supports true pagination
through query parameters.
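
As a rough illustration of those query parameters, the sketch below builds a paginated URL for a user's repositories. The `build_repos_url` helper is a hypothetical name introduced here; the URL pattern and the `page` and `per_page` parameters are the ones used later in this section.

```python
from urllib.parse import urlencode

def build_repos_url(username, page=1, per_page=50):
    """Build the repos URL for one page of results (helper name is illustrative)."""
    query = urlencode({"page": page, "per_page": per_page})
    return f"https://api.github.com/users/{username}/repos?{query}"

print(build_repos_url("makersacademy", page=2))
```

With `requests`, the same effect can be had by passing the dictionary through the `params` argument of `requests.get` instead of building the string by hand.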

**Extract all repositories from the GitHub API for a specific user, and
calculate the total number of stars they received. Use pagination when fetching
the data from the API. Load the result into a new table in the local PostgreSQL
database.**

<details>
  <summary>Hint!</summary>

  To fetch paginated data from the GitHub API, use the `page` and `per_page` query
  parameters in the API request.
  
  <details>
    <summary>Another hint!</summary>

    Loop through the pages until there are no more repositories to fetch. Sum the
    `stargazers_count` attribute for all repositories to calculate the total number
    of stars. Load the results into a new table in the local PostgreSQL database.
  </details>
</details>
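
The loop described in the hints can be sketched without touching the network. In the example below, `fetch_all_pages`, `fake_fetch_page` and `fake_repos` are all hypothetical names made up for illustration; `fake_fetch_page` stands in for a real request that would pass `page` and `per_page` to the GitHub API.

```python
def fetch_all_pages(fetch_page, per_page=50):
    """Collect items from successive pages until an empty page is returned."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break
        items.extend(batch)
        page += 1
    return items

# Hypothetical stand-in for the API: 5 fake repos served from memory
fake_repos = [{"name": f"repo-{n}", "stargazers_count": n} for n in range(5)]

def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return fake_repos[start:start + per_page]

all_repos = fetch_all_pages(fake_fetch_page, per_page=2)
total_stars = sum(repo["stargazers_count"] for repo in all_repos)
print(len(all_repos), total_stars)
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so the loop can be tried out on fake data before pointing it at the real API.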


In [35]:
import requests
from psycopg2 import connect, sql

# Configure your PostgreSQL connection string
conn_string = "dbname='etl_bites' user='jackdench' host='localhost' port='5432'"

# The initial URL sets the number of repos per page to 50 and requests the first page
repos_url = "https://api.github.com/users/makersacademy/repos?page=1&per_page=50"

def get_data_from_api(url):
    repo_data = []
    while url:
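        # Request one page of API data; raise on HTTP errors (e.g. rate limiting)
        # so that an error payload is never added to the results
        result = requests.get(url, timeout=10)
        result.raise_for_status()
        # Convert the result to JSON and add each repo to the repo_data list
        json_data = result.json()
        repo_data.extend(json_data)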
        # requests parses the 'Link' header into result.links; the 'next' entry
        # is missing on the last page, so url becomes None and the loop ends
        url = result.links.get('next', {}).get('url')
    
    return repo_data

repo_data = get_data_from_api(repos_url)
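
To make the `Link` header handling easier to follow, here is a small offline sketch that parses a header of that shape with a regular expression. The header value and its URLs are made up for illustration, and `parse_link_header` is a hypothetical helper rather than part of `requests`.

```python
import re

# Illustrative header value; the URLs are invented for this sketch
link_header = (
    '<https://api.github.com/user/1/repos?page=2&per_page=50>; rel="next", '
    '<https://api.github.com/user/1/repos?page=4&per_page=50>; rel="last"'
)

def parse_link_header(header):
    """Map each rel value (e.g. 'next', 'last') to its URL."""
    pairs = re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', header)
    return {rel: url for url, rel in pairs}

links = parse_link_header(link_header)
print(links.get("next"))  # URL of the following page
print(links.get("prev"))  # None, because this header has no rel="prev" entry
```

On the final page the header carries no `rel="next"` entry, so looking up `"next"` returns `None`, which is the signal to stop requesting pages.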



In [36]:
# Function to count the total number of stars across all repos for this user
def count_stars(data):
    stars_count = 0
    for repo in data:
        repo_stars = repo['stargazers_count']
        stars_count += repo_stars
    return stars_count
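
The same total can be written more compactly with the built-in `sum`. This is only an alternative sketch: `count_stars_compact` is a hypothetical name, the sample data is invented, and treating a missing `stargazers_count` as 0 is an assumption rather than documented API behaviour.

```python
def count_stars_compact(data):
    # Assumption: a repo without 'stargazers_count' contributes 0 stars
    return sum(repo.get("stargazers_count", 0) for repo in data)

# Invented sample data shaped like the relevant part of the API response
sample = [{"stargazers_count": 3}, {"stargazers_count": 0}, {"name": "no-count"}]
print(count_stars_compact(sample))
```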

In [37]:
# Create the table in the analytical DB
# This could also be done manually via a GUI (e.g. TablePlus) or with a SQL script
def execute_query_postgresql(conn_string, query):
    conn = connect(conn_string)
    try:
        # 'with conn' commits on success and rolls back on error, but does not close
        with conn:
            with conn.cursor() as cur:
                cur.execute(query)
    finally:
        conn.close()

create_api_data_table = '''
CREATE TABLE IF NOT EXISTS repo_stars_count (
    owner_name TEXT,
    repo_count INTEGER,
    repo_stars_count INTEGER
);
'''

execute_query_postgresql(conn_string, create_api_data_table)

In [38]:
def insert_data_to_postgresql(conn_string, table_data):
    conn = connect(conn_string)
    try:
        # 'with conn' commits on success and rolls back on error, but does not close
        with conn:
            with conn.cursor() as cur:
                # The table name is composed as an identifier; values are passed as parameters
                query = sql.SQL("INSERT INTO {} (owner_name, repo_count, repo_stars_count) VALUES (%s, %s, %s)").format(sql.Identifier(table_data['table_name']))
                cur.execute(query, (table_data['owner_name'], table_data['repo_count'], table_data['star_count']))
    finally:
        conn.close()

# Build the summary row; repo_data[0] assumes the user has at least one public repo
table_name = 'repo_stars_count'
owner_name = repo_data[0]['owner']['login']
repo_count = len(repo_data)
star_count = count_stars(repo_data)
table_data = {'table_name': table_name, 'owner_name': owner_name, 'repo_count': repo_count, 'star_count': star_count}
insert_data_to_postgresql(conn_string, table_data)