# Retention Redshift Insert (API Method)
* StellarAlgo Data Science
* Ryan Kazmerik & Nakisa Rad
* Mar 7, 2022

This notebook provides example code for executing CRUD operations against an Amazon Redshift database using the Redshift Data API. The data and databases used are for demonstration purposes only.

In [1]:
import boto3
import pandas as pd
import awswrangler as wr

### Let's create a dummy dataset to write into our Redshift database:

In [37]:
players = [
    {"dob":"1988-08-04","gamesplayed":"20","injured":"false","position":"RW","name":"Dale","numassists":"24","numgoals":"21","pointpercentage":"2.1"},
    {"dob":"1985-06-05","gamesplayed":"20","injured":"false","position":"C","name":"Skip","numassists":"15","numgoals":"36","pointpercentage":"2.5"},
    {"dob":"1985-03-15","gamesplayed":"15","injured":"true","position":"LW","name":"Sanders","numassists":"20","numgoals":"30","pointpercentage":"1.9"},
    {"dob":"1983-02-20","gamesplayed":"20","injured":"false","position":"LD","name":"Patty","numassists":"38","numgoals":"12","pointpercentage":"1.5"},
    {"dob":"1987-08-04","gamesplayed":"18","injured":"false","position":"RD","name":"Reynolds","numassists":"16","numgoals":"6","pointpercentage":"0.8"}
]

df_players = pd.DataFrame(data=players)

df_players.head()

Unnamed: 0,dob,gamesplayed,injured,position,name,numassists,numgoals,pointpercentage
0,1988-08-04,20,False,RW,Dale,24,21,2.1
1,1985-06-05,20,False,C,Skip,15,36,2.5
2,1985-03-15,15,True,LW,Sanders,20,30,1.9
3,1983-02-20,20,False,LD,Patty,38,12,1.5
4,1987-08-04,18,False,RD,Reynolds,16,6,0.8
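Every value in the `players` records above is a string, so every DataFrame column holds strings too. A minimal sketch of converting one record to native Python types before loading it is shown here. `coerce_player` is a hypothetical helper, and the target types (date, integer, boolean, float) are assumptions about how the table's columns are defined.

```python
from datetime import date


def coerce_player(record: dict) -> dict:
    """Return a copy of one raw player record with native Python types."""
    return {
        "dob": date.fromisoformat(record["dob"]),
        "gamesplayed": int(record["gamesplayed"]),
        # Only the literal text "true" (any case) counts as injured
        "injured": record["injured"].strip().lower() == "true",
        "position": record["position"],
        "name": record["name"],
        "numassists": int(record["numassists"]),
        "numgoals": int(record["numgoals"]),
        "pointpercentage": float(record["pointpercentage"]),
    }


raw = {
    "dob": "1985-03-15", "gamesplayed": "15", "injured": "true",
    "position": "LW", "name": "Sanders", "numassists": "20",
    "numgoals": "30", "pointpercentage": "1.9",
}
typed = coerce_player(raw)
```

Building the DataFrame from `[coerce_player(p) for p in players]` would then give numeric and boolean columns instead of strings.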


### To connect to Redshift, we first tell AWS which profile to log in with. This opens a browser window for SSO authentication:

In [29]:
! aws sso login --profile Stellaralgo-DataScienceAdmin

Attempting to automatically open the SSO authorization page in your default browser.
If the browser does not open or you wish to use a different device to authorize this request, open the following URL:

https://device.sso.us-east-1.amazonaws.com/

Then enter the code:

DMZB-PXLG
Successully logged into Start URL: https://stellaralgo.awsapps.com/start


### Now we can create a session and client for Redshift (QA), and open a new connection using awswrangler:

In [4]:
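# setup_default_session() configures the default session and returns None,
# so its result is not assigned to a variable
boto3.setup_default_session(profile_name='Stellaralgo-DataScienceAdmin')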
client = boto3.client('redshift')

dbname = 'datascience'
schema = 'ds'
table = 'dummytable'
    
conn = wr.data_api.redshift.connect(
    cluster_id = "qa-app",
    database = dbname,
    db_user = "admin"
)

print("CREDENTIALS RETRIEVED SUCCESSFULLY!")

CREDENTIALS RETRIEVED SUCCESSFULLY!


### Let's insert our DataFrame of players into the dummy table:

In [38]:
print(f"INSERTING {len(df_players)} PLAYERS INTO DUMMY TABLE:")

fields = f"""
    INSERT INTO {dbname}.{schema}.{table} (
        dob,
        gamesplayed,
        injured,
        position,
        name,
        numassists,
        numgoals,
        pointpercentage
    ) VALUES """


values_list = []
for i, player in df_players.iterrows():
    
    values = f"""(
        '{player["dob"]}',
        {player["gamesplayed"]},
        {player["injured"]},
        '{player["position"]}',
        '{player["name"]}',
        {player["numassists"]},
        {player["numgoals"]},
        {player["pointpercentage"]}
    )"""
    
    values_list.append(values)
    print(f" > ADDED PLAYER {i+1} TO ROSTER")

insert_statement = fields + ",".join(values_list)+";"
 
wr.data_api.redshift.read_sql_query(
    sql = insert_statement, 
    con = conn
)
    
print(f"INSERTED {len(df_players)} PLAYERS INTO: {dbname}.{schema}.{table}")

INSERTING 5 PLAYERS INTO DUMMY TABLE:
 > ADDED PLAYER 1 TO ROSTER
 > ADDED PLAYER 2 TO ROSTER
 > ADDED PLAYER 3 TO ROSTER
 > ADDED PLAYER 4 TO ROSTER
 > ADDED PLAYER 5 TO ROSTER
INSERTED 5 INTO: datascience.ds.dummytable
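The insert cell above interpolates values straight into the SQL text, which breaks as soon as a name contains an apostrophe. A minimal sketch of rendering Python values as SQL literals follows. `sql_literal` and `values_row` are hypothetical helpers, not part of awswrangler, and the sketch does not cover every edge case (backslashes, for instance), so bound parameters remain the safer choice for untrusted input.

```python
from datetime import date


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal for an inline VALUES clause."""
    if value is None:
        return "NULL"
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, date):
        return f"'{value.isoformat()}'"
    # Strings: double each single quote so it cannot close the literal early
    return "'" + str(value).replace("'", "''") + "'"


def values_row(values) -> str:
    """Join the rendered literals into one parenthesised VALUES row."""
    return "(" + ", ".join(sql_literal(v) for v in values) + ")"


row = values_row([date(1985, 6, 5), 20, False, "C", "O'Reilly", 15, 36, 2.5])
```

Here `row` can be appended to a list and joined with commas in the same way as `values_list` in the cell above.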


### Now we can query the table directly to get our records back. Let's fetch only the players who are not injured:

In [39]:
select_statement = f"""
    SELECT *
    FROM {dbname}.{schema}.{table}
    WHERE injured = False
"""

df_healthy = wr.data_api.redshift.read_sql_query(
    sql = select_statement, 
    con = conn
)

print(f"HEALTHY PLAYERS: {df_healthy.shape[0]}")
df_healthy.head()

HEALTHY PLAYERS: 4


Unnamed: 0,playerid,dob,gamesplayed,injured,position,name,numassists,numgoals,pointpercentage
0,1,1988-08-04,20,False,RW,Dale,24,21,2.1
1,2,1985-06-05,20,False,C,Skip,15,36,2.5
2,4,1983-02-20,20,False,LD,Patty,38,12,1.5
3,5,1987-08-04,18,False,RD,Reynolds,16,6,0.8
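When the filter value comes from user input, the Redshift Data API can bind it as a named parameter instead of formatting it into the SQL text. The sketch below calls boto3's `redshift-data` client directly rather than going through awswrangler. `build_select_request` and `submit` are hypothetical helpers, and the cluster, database, and user values simply repeat the ones used earlier in this notebook.

```python
def build_select_request(cluster_id, database, db_user, qualified_table, position):
    """Assemble keyword arguments for the redshift-data ExecuteStatement call."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "DbUser": db_user,
        # :position is a named placeholder filled from Parameters below
        "Sql": f"SELECT * FROM {qualified_table} WHERE position = :position",
        # Parameter values are always passed as strings
        "Parameters": [{"name": "position", "value": str(position)}],
    }


def submit(request, profile_name=None):
    """Submit the request and return the statement Id.

    Needs valid AWS credentials, so it is defined but not called here.
    """
    import boto3  # imported lazily so the builder works without boto3 installed

    session = boto3.Session(profile_name=profile_name)
    client = session.client("redshift-data")
    return client.execute_statement(**request)["Id"]


request = build_select_request(
    "qa-app", "datascience", "admin", "datascience.ds.dummytable", "LW"
)
```

The table name is still formatted into the string because identifiers cannot be bound as parameters; only values can.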


### Now let's update the player stats by adding one to each player's games played:

In [45]:
update_statement = f"""
    UPDATE {dbname}.{schema}.{table}
    SET gamesplayed = gamesplayed + 1
"""

wr.data_api.redshift.read_sql_query(
    sql = update_statement, 
    con = conn
)

print(f"UPDATED {len(df_players)} PLAYERS IN: {dbname}.{schema}.{table}")

UPDATED 5 PLAYERS IN: datascience.ds.dummytable
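The message above reports `len(df_players)`, which is the size of the local DataFrame rather than the number of rows the database changed. If the statement is submitted through boto3's `redshift-data` client, the `describe_statement` response carries a `Status` field and a `ResultRows` count. The following is a sketch of polling for completion under that assumption. `wait_for_statement` and `FakeClient` are hypothetical names, and the fake client only exists so the example runs without AWS access.

```python
import time

FAILED_STATES = {"FAILED", "ABORTED"}


def wait_for_statement(client, statement_id, poll_seconds=0.5, timeout_seconds=30.0):
    """Poll DescribeStatement until the statement finishes, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status == "FINISHED":
            return description
        if status in FAILED_STATES:
            error = description.get("Error", "no error message")
            raise RuntimeError(f"Statement {statement_id} {status}: {error}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Statement {statement_id} still {status} at timeout")
        time.sleep(poll_seconds)


class FakeClient:
    """Stand-in for boto3.client('redshift-data') that replays canned responses."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def describe_statement(self, Id):
        return next(self._responses)


fake = FakeClient([{"Status": "STARTED"}, {"Status": "FINISHED", "ResultRows": 5}])
result = wait_for_statement(fake, "example-id", poll_seconds=0)
rows_updated = result["ResultRows"]
```

With a real client, `statement_id` would be the `Id` returned by `execute_statement`.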


### We can also delete all of the players from our table:

In [32]:
delete_statement = f"""
    DELETE 
    FROM {dbname}.{schema}.{table}
"""

wr.data_api.redshift.read_sql_query(
    sql = delete_statement, 
    con = conn
)
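A `DELETE` without a `WHERE` clause removes every row in the table, which is what the cell above intends. As a sketch of a narrower delete, the hypothetical helper below requires either a predicate or an explicit opt-in before it will produce a statement that clears the whole table.

```python
def build_delete(qualified_table, where=None, allow_all=False):
    """Build a DELETE statement, refusing to clear the table by accident."""
    if where is None:
        if not allow_all:
            raise ValueError("Pass a WHERE predicate or set allow_all=True")
        return f"DELETE FROM {qualified_table};"
    return f"DELETE FROM {qualified_table} WHERE {where};"


# Remove only the injured players
delete_injured = build_delete("datascience.ds.dummytable", where="injured = TRUE")

# Clear the whole table, as the cell above does
delete_everything = build_delete("datascience.ds.dummytable", allow_all=True)
```

Either string can be passed as `sql` to the same `read_sql_query` call used in the cell above.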

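### Let's query the entire table again to see if it's empty:

Note: the output below still lists five players because it was captured with the cells run out of order (the delete cell is In [32], the insert cell is In [38]). When the notebook is run top to bottom, this query returns no rows.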

In [46]:
sql_statement = f"""
    SELECT *
    FROM {dbname}.{schema}.{table}
"""

df_result = wr.data_api.redshift.read_sql_query(
    sql = sql_statement, 
    con = conn
)

print(f"TOTAL PLAYERS: {df_result.shape[0]}")
df_result.head()

TOTAL PLAYERS: 5


Unnamed: 0,playerid,dob,gamesplayed,injured,position,name,numassists,numgoals,pointpercentage
0,1,1988-08-04,22,False,RW,Dale,24,21,2.1
1,2,1985-06-05,22,False,C,Skip,15,36,2.5
2,3,1985-03-15,17,True,LW,Sanders,20,30,1.9
3,4,1983-02-20,22,False,LD,Patty,38,12,1.5
4,5,1987-08-04,20,False,RD,Reynolds,16,6,0.8


### This notebook demonstrated some simple SQL statements for creating, reading, updating and deleting records in an Amazon Redshift table.

### Full documentation can be found at: https://docs.aws.amazon.com/redshift/index.html