# Deliverable #1

The first deliverable for CODER HOUSE must extract data from a public API (here, the Napster API) and create a table in Redshift. Once the table has been created, the data must be loaded into Redshift.

![arquitectura_propuesta](images/_home_marm1984_github_data_engineering_coder_house_entregable_uno_entrega_1.png)


In [1]:
from colorama import Back, Fore, Style
import pandas as pd
import requests
import psycopg2
import random
import os

In [2]:
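# Function that requests a JSON object over HTTP from a given URL

JSON = int | str | float | bool | None | dict[str, "JSON"] | list["JSON"]
JSONObject = dict[str, JSON]

def http_get_sync(url: str, timeout: float = 10.0) -> JSONObject:
    """Synchronously performs an HTTP GET request and returns the JSON response ({} on failure)."""
    try:
        print(Back.BLACK + Fore.CYAN + "GET: " + url + Style.RESET_ALL)
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as errors
        return response.json()
    except (requests.RequestException, ValueError) as e:
        # RequestException covers network/HTTP errors; ValueError covers an invalid JSON body
        print(Back.BLACK + Fore.RED + "ERROR: " + url + f" ({e})" + Style.RESET_ALL)
        return {}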

In [3]:
# Parameters for the API calls

API_KEY = os.environ.get("NAPSTER_API_KEY")
OFFSET = 1000  # Upper bound for the random offset into the top-artists list

In [4]:
# The URL for the API call

# artist_page_offset = 116  # fixed offsets used for testing: 116 = Johnny Cash, 212 = Bob Dylan
artist_page_offset = random.randint(0, OFFSET)
napster_url = f'https://napi-v2-2-cloud-run-b3gtd5nmxq-uw.a.run.app/v2.2/artists/top?apikey={API_KEY}&limit=1&offset={artist_page_offset}'
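Assembling the query string by hand makes it easy to lose the `?apikey=` separator. A minimal sketch, assuming the same base URL and the `apikey`, `limit` and `offset` query parameters shown in the GET log of the next cell, lets `urllib.parse.urlencode` join and escape the parameters. `build_top_artists_url` is a hypothetical helper, not part of the original notebook.

```python
import random
from urllib.parse import urlencode

# Base endpoint taken from the notebook; the query string is appended below
BASE_URL = "https://napi-v2-2-cloud-run-b3gtd5nmxq-uw.a.run.app/v2.2/artists/top"


def build_top_artists_url(api_key: str, limit: int = 1, offset: int = 0) -> str:
    """Hypothetical helper: return the top-artists URL with an encoded query string."""
    query = urlencode({"apikey": api_key, "limit": limit, "offset": offset})
    return f"{BASE_URL}?{query}"


# random.randint includes both ends, so every offset from 0 to 1000 is possible
offset = random.randint(0, 1000)
print(build_top_artists_url("YOUR_API_KEY", limit=1, offset=offset))
```

The same dictionary could instead be passed as `params=` to `requests.get`, which performs the same encoding.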

In [5]:
# Get the JSON object from the URL

napster_json = http_get_sync(napster_url)

[40m[36mGET: https://napi-v2-2-cloud-run-b3gtd5nmxq-uw.a.run.app/v2.2/artists/top?apikey=MjZkYmFhZTctMjFkZi00NjY3LWEwNGMtZDYzNmQ4YmM3OThi&limit=1&offset=890[0m


In [6]:
napster_json

{'artists': [{'type': 'artist',
   'id': 'art.62824078',
   'href': 'https://api.napster.com/v2.2/artists/art.62824078',
   'name': 'Lukas Graham',
   'shortcut': 'lukas-graham',
   'blurbs': [],
   'bios': [{'title': 'Bebop Digital',
     'author': 'Bebop Digital',
     'publishDate': '',
     'bio': 'El  grupo danés de sonoridad soul, funk, rock y pop, es liderado por el vocalista Lukas Graham Forchhammer, cuenta con Mark “Lovestick” Falgren en la batería, Magnus Larsson en el bajo y Kasper Daugaard en los teclados. La trayectoria inició de forma natural en 2011, lanzando dos músicas en YouTube. La repercusión fue tan buena que dio origen a una gira con más de 30 mil ingresos vendidos, eso antes, hasta del lanzamiento del primer álbum. <I>Lukas Graham </I>  salió en 2012 y tuvo todo su éxito en su país natal, con los hits “Drunk in the Morning”, “Ordinary Things” y “Better Than Yourself (Criminal Mind Pt.2)”. El trabajo alcanzó una buena repercusión también en Alemania, Noruega y Sue

In [7]:
# Get all the keys inside artist object
for key in napster_json['artists'][0].keys():
    print(key)

type
id
href
name
shortcut
blurbs
bios
albumGroups
links


In [8]:
# Check shortcut for artist name
napster_json['artists'][0]['shortcut']

'lukas-graham'

# Creating Table

The `artist` table stores one row per artist. Each artist has an `id`, `type`, `name`, `href` and `shortcut`; the `href` value is stored in the `url` column.

## Artist Table 

![Artist Table](images/artist_table.png)

# Data Structures

- `id`: string
- `type`: string
- `href`: string
- `name`: string
- `shortcut`: string
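The mapping from one entry of `napster_json['artists']` to a row of the `artist` table can be written down explicitly. A minimal sketch, assuming the JSON keys listed earlier; `ArtistRow` and `from_json` are hypothetical names, and `href` is renamed to `url` to match the column in the `CREATE TABLE` statement.

```python
from dataclasses import astuple, dataclass
from typing import Any


@dataclass(frozen=True)
class ArtistRow:
    """Hypothetical record mirroring the columns of the artist table."""
    id: str
    name: str
    shortcut: str
    url: str  # filled from the API's "href" field
    type: str

    @classmethod
    def from_json(cls, artist: dict[str, Any]) -> "ArtistRow":
        # Keep only the fields stored in the table; bios, blurbs, links, etc. are ignored
        return cls(
            id=artist["id"],
            name=artist["name"],
            shortcut=artist["shortcut"],
            url=artist["href"],
            type=artist["type"],
        )


sample = {
    "type": "artist",
    "id": "art.62824078",
    "href": "https://api.napster.com/v2.2/artists/art.62824078",
    "name": "Lukas Graham",
    "shortcut": "lukas-graham",
    "blurbs": [],
}
row = ArtistRow.from_json(sample)
print(astuple(row))  # tuple in the column order used by the INSERT statement
```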


In [9]:
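# Import environment variables
# Note: most Unix shells already set USER to the login name, so make sure it holds the Redshift user

host = os.environ['HOST']
port = os.environ['PORT']
user = os.environ['USER']
password = os.environ['PASSWORD']
database = os.environ['DATABASE']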
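When one of these variables is missing, `os.environ[...]` raises a `KeyError` for the first missing name only. A minimal sketch of checking all of them up front, using a hypothetical `load_db_config` helper that takes the environment mapping as a parameter so it can be exercised with a plain dict; the variable names are the ones used in this cell.

```python
import os
from collections.abc import Mapping

# Variable names used by the notebook
REQUIRED_VARS = ("HOST", "PORT", "USER", "PASSWORD", "DATABASE")


def load_db_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: return the connection settings or report every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Lower-case keys match the keyword arguments passed to psycopg2.connect below
    return {name.lower(): env[name] for name in REQUIRED_VARS}
```

With this helper, the connection cell could call `psycopg2.connect(**load_db_config())`.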

In [10]:
# Create a connection to the database

try:
    conn = psycopg2.connect(
        host=host,
        port=port,
        user=user,
        password=password,
        database=database
    )
    print(Back.BLACK + Fore.GREEN + "SUCCESS: Connection to database" + Style.RESET_ALL)
except psycopg2.Error as e:
    print(Back.BLACK + Fore.RED + f"ERROR: Connection to database ({e})" + Style.RESET_ALL)

[40m[32mSUCCESS: Connection to database[0m


In [11]:
# Create a cursor to execute SQL commands

try:
    cur = conn.cursor()
    print(Back.BLACK + Fore.GREEN + "SUCCESS: Cursor created" + Style.RESET_ALL)
except psycopg2.Error as e:
    print(Back.BLACK + Fore.RED + f"ERROR: Cursor not created ({e})" + Style.RESET_ALL)

[40m[32mSUCCESS: Cursor created[0m


In [12]:
# Check the connection by listing the tables in the personal schema
personal_schema = os.environ['PERSONAL_SCHEMA']

# Check that the schema provided by Coderhouse exists (the value is passed as a query parameter)
cur.execute("SELECT * FROM information_schema.tables WHERE table_schema = %s;", (personal_schema,))
print(cur.fetchall())

[('data-engineer-database', 'marm1984_coderhouse', 'artist_napster', 'BASE TABLE', None, None, None, None, None), ('data-engineer-database', 'marm1984_coderhouse', 'artist', 'BASE TABLE', None, None, None, None, None)]


In [13]:
# Drop table if exists

cur.execute(f"DROP TABLE IF EXISTS {personal_schema}.artist;")
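The schema name is interpolated into the SQL with an f-string. Identifiers cannot be sent as query parameters, so the usual fix is to quote them, and psycopg2 provides `psycopg2.sql.Identifier` for this. As a stdlib-only sketch of the underlying PostgreSQL rule (wrap the name in double quotes and double any embedded double quote), with hypothetical helper names:

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: quote a PostgreSQL/Redshift identifier."""
    # Double any embedded double quote, then wrap the whole name in double quotes
    return '"' + name.replace('"', '""') + '"'


def drop_table_sql(schema: str, table: str) -> str:
    """Build the DROP statement with both identifiers quoted."""
    return f"DROP TABLE IF EXISTS {quote_ident(schema)}.{quote_ident(table)};"


print(drop_table_sql("marm1984_coderhouse", "artist"))
```

With psycopg2 itself, the equivalent would be `sql.SQL("DROP TABLE IF EXISTS {}.{}").format(sql.Identifier(personal_schema), sql.Identifier("artist"))` after `from psycopg2 import sql`. Quoting preserves the name exactly, so it should match the stored lower-case schema name.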

In [14]:
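# Create a table for artists
# Note: Redshift accepts PRIMARY KEY but does not enforce it (the constraint is informational only)

try:
    cur.execute(f"""
                CREATE TABLE IF NOT EXISTS {personal_schema}.artist (
                    id VARCHAR(255) PRIMARY KEY,
                    name VARCHAR(255),
                    shortcut VARCHAR(255),
                    url VARCHAR(255),
                    type VARCHAR(255)
                );
                """)
    print(Back.BLACK + Fore.GREEN + "SUCCESS: Table created" + Style.RESET_ALL)
except psycopg2.Error as e:
    conn.rollback()  # leave the aborted transaction so later cells can run
    print(Back.BLACK + Fore.RED + f"ERROR: Table not created ({e})" + Style.RESET_ALL)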

[40m[32mSUCCESS: Table created[0m


In [15]:
# Create dataframe from JSON object

df_artist = pd.DataFrame.from_dict(napster_json['artists'])
df_artist.head()

| | type | id | href | name | shortcut | blurbs | bios | albumGroups | links |
|---|---|---|---|---|---|---|---|---|---|
| 0 | artist | art.62824078 | https://api.napster.com/v2.2/artists/art.62824078 | Lukas Graham | lukas-graham | [] | [{'title': 'Bebop Digital', 'author': 'Bebop D... | {'compilations': ['alb.754287876', 'alb.707295... | {'albums': {'href': 'https://api.napster.com/v... |


In [16]:
# Columns to keep (the ones that map to the artist table)

columns_table = ['id', 'name', 'shortcut', 'href', 'type']

# Drop every other column, preserving the original column order
df_artist = df_artist.drop(columns=[column for column in df_artist.columns if column not in columns_table])

In [17]:
df_artist.head()

| | type | id | href | name | shortcut |
|---|---|---|---|---|---|
| 0 | artist | art.62824078 | https://api.napster.com/v2.2/artists/art.62824078 | Lukas Graham | lukas-graham |


In [18]:
# Deliver the dataframe to the database
# Values are passed as query parameters so quotes in names (e.g. an apostrophe) cannot break the SQL

try:
    for _, row in df_artist.iterrows():
        cur.execute(
            f"INSERT INTO {personal_schema}.artist (id, name, shortcut, url, type) VALUES (%s, %s, %s, %s, %s);",
            (row['id'], row['name'], row['shortcut'], row['href'], row['type']),
        )
    conn.commit()  # commit once, after all rows are inserted
    print(Back.BLACK + Fore.GREEN + "SUCCESS: Dataframe delivered to database" + Style.RESET_ALL)
except psycopg2.Error as e:
    conn.rollback()
    print(e)
    print(Back.BLACK + Fore.RED + "ERROR: Dataframe not delivered to database" + Style.RESET_ALL)

[40m[32mSUCCESS: Dataframe delivered to database[0m


In [19]:
# Test and bring back the data from the database

try:
    cur.execute(f"SELECT * FROM {personal_schema}.artist;")
    print(Back.BLACK + Fore.GREEN + "SUCCESS: Dataframe delivered to database" + Style.RESET_ALL)
    for row in cur.fetchall():
        print(row)
except psycopg2.Error as e:
    print(Back.BLACK + Fore.RED + f"ERROR: Dataframe not delivered to database ({e})" + Style.RESET_ALL)

[40m[32mSUCCESS: Dataframe delivered to database[0m
('art.62824078', 'Lukas Graham', 'lukas-graham', 'https://api.napster.com/v2.2/artists/art.62824078', 'artist')
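Passing values as query parameters, rather than formatting them into the SQL text, keeps names such as `Guns N' Roses` from breaking the statement, and a single batch call avoids one round trip per row. A minimal sketch of a parameterized batch insert followed by a read-back, assuming an in-memory `sqlite3` database as a stand-in for Redshift so it runs without credentials; sqlite3 uses `?` placeholders where psycopg2 uses `%s`, and the second row is made-up sample data.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption for this sketch)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE artist (id TEXT PRIMARY KEY, name TEXT, shortcut TEXT, url TEXT, type TEXT)"
)

# Rows in the column order of the INSERT statement; the apostrophe in the
# second (sample) name would break a query built with an f-string
rows = [
    ("art.62824078", "Lukas Graham", "lukas-graham",
     "https://api.napster.com/v2.2/artists/art.62824078", "artist"),
    ("art.sample", "Guns N' Roses", "guns-n-roses", "https://example.com/art.sample", "artist"),
]

try:
    # One parameterized statement executed for every row, committed once
    cur.executemany(
        "INSERT INTO artist (id, name, shortcut, url, type) VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
except sqlite3.Error:
    conn.rollback()
    raise

# Read the data back, as the last notebook cell does
cur.execute("SELECT id, name FROM artist ORDER BY id")
print(cur.fetchall())
```

With psycopg2, `cur.executemany` takes the same arguments with `%s` placeholders, and `psycopg2.extras.execute_values` is an alternative that packs many rows into fewer statements.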
