# Retention RedShift (DB Method)
* StellarAlgo Data Science
* Ryan Kazmerik & Nakisa Rad
* Mar 7, 2022

This notebook shows how to run CRUD operations against a RedShift database by connecting to it directly with the psycopg2 package. The data and databases used are for demonstration purposes only.

In [4]:
import boto3
import pandas as pd
import psycopg2

### Let's create a dummy dataset to write into our RedShift database:

In [7]:
players = [
    {"dob":"1988-08-04","gamesplayed":"20","injured":"false","position":"RW","name":"Dale","numassists":"24","numgoals":"21","pointpercentage":"2.1"},
    {"dob":"1985-06-05","gamesplayed":"20","injured":"false","position":"C","name":"Skip","numassists":"15","numgoals":"36","pointpercentage":"2.5"},
    {"dob":"1985-03-15","gamesplayed":"15","injured":"true","position":"LW","name":"Sanders","numassists":"20","numgoals":"30","pointpercentage":"1.9"},
    {"dob":"1983-02-20","gamesplayed":"20","injured":"false","position":"LD","name":"Patty","numassists":"38","numgoals":"12","pointpercentage":"1.5"},
    {"dob":"1987-08-04","gamesplayed":"18","injured":"false","position":"RD","name":"Reynolds","numassists":"16","numgoals":"6","pointpercentage":"0.8"}
]

df_players = pd.DataFrame(data=players)

df_players.head()

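| | dob | gamesplayed | injured | position | name | numassists | numgoals | pointpercentage |
|---|---|---|---|---|---|---|---|---|
| 0 | 1988-08-04 | 20 | False | RW | Dale | 24 | 21 | 2.1 |
| 1 | 1985-06-05 | 20 | False | C | Skip | 15 | 36 | 2.5 |
| 2 | 1985-03-15 | 15 | True | LW | Sanders | 20 | 30 | 1.9 |
| 3 | 1983-02-20 | 20 | False | LD | Patty | 38 | 12 | 1.5 |
| 4 | 1987-08-04 | 18 | False | RD | Reynolds | 16 | 6 | 0.8 |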
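
Every value in `players` is stored as a string. Here is a minimal sketch of converting one record to native Python types before writing it anywhere. The helper name `to_typed` is hypothetical, and the target types are assumptions inferred from the values, not taken from the table's actual schema:

```python
from datetime import date

def to_typed(player):
    """Convert one all-string player record into native Python types (assumed types)."""
    return {
        "dob": date.fromisoformat(player["dob"]),
        "gamesplayed": int(player["gamesplayed"]),
        "injured": player["injured"].lower() == "true",
        "position": player["position"],
        "name": player["name"],
        "numassists": int(player["numassists"]),
        "numgoals": int(player["numgoals"]),
        "pointpercentage": float(player["pointpercentage"]),
    }

# One record copied from the dummy dataset above
example = {"dob": "1985-03-15", "gamesplayed": "15", "injured": "true", "position": "LW",
           "name": "Sanders", "numassists": "20", "numgoals": "30", "pointpercentage": "1.9"}
typed = to_typed(example)
print(typed["dob"], typed["injured"], typed["numgoals"])
```

Typed values make it harder for a stray string such as `"false"` to be treated as truthy later on.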


### To connect to RedShift, we first tell AWS which profile to log in with; this opens a browser window for SSO authentication:

In [3]:
! aws sso login --profile Stellaralgo-DataScienceAdmin

Attempting to automatically open the SSO authorization page in your default browser.
If the browser does not open or you wish to use a different device to authorize this request, open the following URL:

https://device.sso.us-east-1.amazonaws.com/

Then enter the code:

MWWV-ZTJH
Successully logged into Start URL: https://stellaralgo.awsapps.com/start


### Now we can create a boto3 session and RedShift client (QA), request temporary credentials, and open a new connection using psycopg2:

In [8]:
session = boto3.session.Session(profile_name='Stellaralgo-DataScienceAdmin')
client = session.client('redshift')

cluster = 'qa-app'
dbname = 'datascience'
schema = 'ds'
table = 'dummytable'

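# Request temporary database credentials for the cluster through IAM
creds = client.get_cluster_credentials(
    ClusterIdentifier = cluster,
    DbUser = 'admin',
    DbName = dbname,
    DbGroups = ['admin_group'],
    AutoCreate=True
)

# Open a direct connection to the cluster endpoint with those credentials
conn = psycopg2.connect(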
    host = 'qa-app.ctjussvyafp4.us-east-1.redshift.amazonaws.com',
    port = 5439,
    user = creds['DbUser'],
    password = creds['DbPassword'],
    database = dbname
)

print('CREATED CONNECTION TO DATABASE')

CREATED CONNECTION TO DATABASE
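
The credentials returned by `get_cluster_credentials` are temporary: the response carries an `Expiration` timestamp alongside `DbUser` and `DbPassword`, and the default lifetime is 900 seconds unless `DurationSeconds` is set. Here is a small sketch for checking whether new credentials are needed before reconnecting. The helper name `credentials_expired`, the 60-second safety margin, and the placeholder dictionary are all assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def credentials_expired(creds, now=None, margin=timedelta(seconds=60)):
    """Return True if the temporary credentials expire within `margin` of `now`."""
    now = now or datetime.now(timezone.utc)
    return now >= creds["Expiration"] - margin

# Placeholder shaped like the get_cluster_credentials response (not real values)
fake_creds = {
    "DbUser": "admin",
    "DbPassword": "not-a-real-password",
    "Expiration": datetime(2022, 3, 7, 12, 15, tzinfo=timezone.utc),
}

early = credentials_expired(fake_creds, now=datetime(2022, 3, 7, 12, 0, tzinfo=timezone.utc))
late = credentials_expired(fake_creds, now=datetime(2022, 3, 7, 12, 14, 30, tzinfo=timezone.utc))
print(early, late)  # False True
```

When the check returns `True`, calling `get_cluster_credentials` again and opening a fresh connection is the simplest recovery.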


### Let's insert our dataframe of players into the dummy table:

In [9]:
print(f"INSERTING {len(df_players)} PLAYERS INTO DUMMY TABLE:")

fields = f"""
    INSERT INTO {dbname}.{schema}.{table} (
        dob,
        gamesplayed,
        injured,
        position,
        name,
        numassists,
        numgoals,
        pointpercentage
    ) VALUES """

values_list = []
for i, player in df_players.iterrows():
    
    values = f"""(
        '{player["dob"]}',
        {player["gamesplayed"]},
        {player["injured"]},
        '{player["position"]}',
        '{player["name"]}',
        {player["numassists"]},
        {player["numgoals"]},
        {player["pointpercentage"]}
    )"""
    
    values_list.append(values)
    print(f" > ADDED PLAYER {i+1} TO ROSTER")

insert_statement = fields + ",".join(values_list)+";"

cursor = conn.cursor()
cursor.execute(insert_statement)
conn.commit()

count = cursor.rowcount

print(f"INSERTED {count} PLAYERS INTO: {dbname}.{schema}.{table}")

INSERTING 5 PLAYERS INTO DUMMY TABLE:
 > ADDED PLAYER 1 TO ROSTER
 > ADDED PLAYER 2 TO ROSTER
 > ADDED PLAYER 3 TO ROSTER
 > ADDED PLAYER 4 TO ROSTER
 > ADDED PLAYER 5 TO ROSTER
INSERTED 5 PLAYERS INTO: datascience.ds.dummytable
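
The cell above interpolates values straight into the SQL text with f-strings. That works for this dummy data, but it breaks on a value containing a quote (a name like `O'Reilly`) and is open to SQL injection. psycopg2's `cursor.execute(query, vars)` accepts `%s` placeholders and handles the quoting itself. Here is a sketch of building the same kind of multi-row `INSERT` that way. The helper name `build_insert` is hypothetical, and because table and column names cannot be passed as parameters, they are assumed to come from trusted code:

```python
def build_insert(qualified_table, columns, rows):
    """Return (sql, params) for one multi-row INSERT using %s placeholders."""
    row_template = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {qualified_table} ({', '.join(columns)}) VALUES "
        + ", ".join([row_template] * len(rows))
    )
    # Flatten the rows into one parameter list, in the same order as the placeholders
    params = [row[col] for row in rows for col in columns]
    return sql, params

columns = ["name", "numgoals"]
rows = [{"name": "Dale", "numgoals": 21}, {"name": "Skip", "numgoals": 36}]
sql, params = build_insert("datascience.ds.dummytable", columns, rows)
print(sql)
print(params)

# With a live connection, the statement would be run as:
# cursor.execute(sql, params)
# conn.commit()
```

The values never become part of the SQL string, so a quote inside a name is passed through as data rather than ending the literal.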


### Now we can query the table directly to get back our records. Let's retrieve only the players who are not injured:

In [15]:
select_statement = f"""
    SELECT *
    FROM {dbname}.{schema}.{table}
    WHERE injured = False
"""

cursor.execute(select_statement)
records = cursor.fetchall()

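# The table held two copies of each player when this cell ran (ids 1-10 in the output), so the count is halved
print(f"HEALTHY PLAYERS: {len(records)/2}")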
print(records)

HEALTHY PLAYERS: 4.0
[(1, datetime.date(1988, 8, 4), 23, False, 'RW', 'Dale', 24, 21, 2.1), (2, datetime.date(1985, 6, 5), 23, False, 'C', 'Skip', 15, 36, 2.5), (4, datetime.date(1983, 2, 20), 23, False, 'LD', 'Patty', 38, 12, 1.5), (5, datetime.date(1987, 8, 4), 21, False, 'RD', 'Reynolds', 16, 6, 0.8), (6, datetime.date(1988, 8, 4), 21, False, 'RW', 'Dale', 24, 21, 2.1), (7, datetime.date(1985, 6, 5), 21, False, 'C', 'Skip', 15, 36, 2.5), (9, datetime.date(1983, 2, 20), 21, False, 'LD', 'Patty', 38, 12, 1.5), (10, datetime.date(1987, 8, 4), 19, False, 'RD', 'Reynolds', 16, 6, 0.8)]
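
`fetchall()` returns plain tuples, so each value has to be identified by position. The DB-API `cursor.description` attribute lists the result columns, with the column name as the first item of each entry, which makes it possible to label the values. Here is a sketch, with `rows_to_dicts` as a hypothetical helper and the column names below (including `id`) assumed for illustration:

```python
def rows_to_dicts(description, rows):
    """Pair each value in a row with its column name from cursor.description."""
    names = [column[0] for column in description]
    return [dict(zip(names, row)) for row in rows]

# Stand-in for cursor.description; only the first item of each entry (the name) is used
description = [("id",), ("name",), ("numgoals",)]
rows = [(1, "Dale", 21), (2, "Skip", 36)]
labelled = rows_to_dicts(description, rows)
print(labelled)

# With a live cursor this would be:
# labelled = rows_to_dicts(cursor.description, cursor.fetchall())
```

The resulting list of dictionaries can also be passed to `pd.DataFrame(...)` to get the records back into a dataframe.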


### Now let's update the player stats by adding one game played for every player:

In [20]:
update_statement = f"""
    UPDATE {dbname}.{schema}.{table}
    SET gamesplayed = gamesplayed + 1
"""

cursor.execute(update_statement)
conn.commit()

count = cursor.rowcount
# rowcount covers both copies of each player in the table, so it is halved here
print(count/2, "RECORDS UPDATED SUCCESSFULLY")

5.0 RECORDS UPDATED SUCCESSFULLY
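
If a statement fails partway through, psycopg2 leaves the connection in an aborted transaction, and later statements keep failing until `rollback()` is called. Here is a sketch of a wrapper that commits on success and rolls back on error. The name `execute_in_transaction` is hypothetical, and it assumes only the standard DB-API methods used elsewhere in this notebook:

```python
def execute_in_transaction(conn, sql, params=None):
    """Run one statement; commit on success, roll back and re-raise on failure."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        conn.commit()
        return cursor.rowcount  # number of rows affected by the statement
    except Exception:
        conn.rollback()  # clear the failed transaction so the connection stays usable
        raise
    finally:
        cursor.close()

# With a live connection, the update above could be run as:
# updated = execute_in_transaction(conn, update_statement)
```

The same wrapper applies to any write statement, which keeps a single failed cell from blocking the rest of the session.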


### We can also delete all of the players from our table:

In [24]:
delete_statement = f"""
    DELETE 
    FROM {dbname}.{schema}.{table}
"""

cursor.execute(delete_statement)
conn.commit()

count = cursor.rowcount
print("RECORDS DELETED SUCCESSFULLY")

RECORDS DELETED SUCCESSFULLY


### Let's query the entire table again to see if it's empty:

In [27]:
sql_statement = f"""
    SELECT *
    FROM {dbname}.{schema}.{table}
"""

cursor.execute(sql_statement)
conn.commit()

records = cursor.fetchall()
count = cursor.rowcount

print(f"TOTAL PLAYERS: {count/2}")
print(records)

TOTAL PLAYERS: 0.0
[]
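
The cells in this notebook never close the cursor or the connection. In psycopg2, `with conn:` only ends the transaction and does not close the connection, so `contextlib.closing` is one way to make sure both are released. This is a sketch: `fetch_all` is a hypothetical helper, and `connect` is assumed to be any zero-argument callable that returns a new connection:

```python
from contextlib import closing

def fetch_all(connect, sql):
    """Open a connection, run one query, and close cursor and connection afterwards."""
    with closing(connect()) as conn, closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()

# With real credentials this could be called as:
# records = fetch_all(
#     lambda: psycopg2.connect(host=..., port=5439, user=..., password=..., database=...),
#     "SELECT * FROM datascience.ds.dummytable",
# )
```

Both `close()` calls run even if the query raises, so a failed cell does not leave a connection open against the cluster.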


### This notebook demonstrated some simple SQL statements for creating, reading, updating and deleting records from an AWS RedShift table. 

### Full documentation can be found at: https://www.psycopg.org/docs/