# Calling a Redshift Stored Procedure
* StellarAlgo Data Science
* Ryan Kazmerik & Nakisa Rad
* May 15, 2022

This notebook shows how to execute a Redshift stored procedure and retrieve its result set by connecting directly to the Redshift database with the psycopg2 package. The data and databases used are for demonstration purposes only.

In [10]:
import boto3
import pandas as pd
import psycopg2

pd.options.display.max_columns = 100
pd.options.display.max_rows = 100

### To connect to Redshift, we first tell AWS which profile we'd like to log in with. This opens a browser window for SSO authentication:

In [11]:
! aws sso login --profile Stellaralgo-DataScienceAdmin

Attempting to automatically open the SSO authorization page in your default browser.
If the browser does not open or you wish to use a different device to authorize this request, open the following URL:

https://device.sso.us-east-1.amazonaws.com/

Then enter the code:

NTNP-FGRL
Successfully logged into Start URL: https://stellaralgo.awsapps.com/start#/


### Now we can create a boto3 session and a Redshift client (QA), request temporary database credentials, and open a new connection using psycopg2:

In [12]:
session = boto3.session.Session(profile_name='Stellaralgo-DataScienceAdmin')
client = session.client('redshift')

CLUSTER = 'qa-app'
DBNAME = 'stlrnflvikings'

# request temporary database credentials for the cluster
creds = client.get_cluster_credentials(
    ClusterIdentifier = CLUSTER,
    DbUser = 'admin',
    DbName = DBNAME,
    DbGroups = ['admin_group'],
    AutoCreate=True
)

# create a connection using the temporary credentials
conn = psycopg2.connect(
    host = 'qa-app.ctjussvyafp4.us-east-1.redshift.amazonaws.com',
    port = 5439,
    user = creds['DbUser'],
    password = creds['DbPassword'],
    database = DBNAME
)

print(f"GOT CONNECTION TO DATABASE: {CLUSTER} {DBNAME}")

GOT CONNECTION TO DATABASE: qa-app stlrnflvikings
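
The credentials returned by `get_cluster_credentials` are temporary: the response carries an `Expiration` timestamp alongside `DbUser` and `DbPassword`, so a notebook that stays open for a while may need fresh credentials before reconnecting. The following is a minimal sketch rather than part of the original notebook. The helper names `connection_kwargs` and `credentials_expired`, the 60-second safety margin, and the example values are all assumptions made for illustration, and it assumes `Expiration` arrives as a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone


def connection_kwargs(creds, host, dbname, port=5439):
    """Build keyword arguments for psycopg2.connect from a
    get_cluster_credentials response (hypothetical helper)."""
    return {
        "host": host,
        "port": port,
        "user": creds["DbUser"],
        "password": creds["DbPassword"],
        "database": dbname,
    }


def credentials_expired(creds, now=None, margin_seconds=60):
    """Return True if the temporary password expires within the margin.

    Assumes creds["Expiration"] is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return creds["Expiration"] <= now + timedelta(seconds=margin_seconds)


# Made-up response for illustration; real values come from boto3.
example_creds = {
    "DbUser": "IAMA:admin",
    "DbPassword": "not-a-real-password",
    "Expiration": datetime.now(timezone.utc) + timedelta(minutes=15),
}
kwargs = connection_kwargs(example_creds, "example-host", "exampledb")
print(kwargs["user"], credentials_expired(example_creds))
```

In the notebook this could be used as `psycopg2.connect(**connection_kwargs(creds, host, DBNAME))`, calling `get_cluster_credentials` again whenever `credentials_expired(creds)` returns `True`.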


### Now we can call our stored procedure, passing the name of a cursor as the last parameter. We then open a named cursor with that same name, fetch all of the rows returned by the stored procedure, and load them into a dataframe:

In [16]:
# call the stored proc to get data for the retention model
cur = conn.cursor()
cur.execute("CALL ds.getretentionmodeldata(112, 2019, 2022, 'rkcursor')")

# create a named cursor based on the cursor name passed in above
named_cursor = conn.cursor('rkcursor')
data = named_cursor.fetchall()

# load the data and columns into a data frame
cols = [row[0] for row in named_cursor.description]
df = pd.DataFrame(data=data, columns=cols)

# end the transaction; the procedure's cursor is only valid inside it
conn.commit()
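
The call-then-fetch pattern above can be wrapped in a small reusable function. This is a sketch under stated assumptions, not the notebook's own code: `fetch_proc_results` is a hypothetical helper name, it assumes the procedure takes the cursor name as its last argument (as `ds.getretentionmodeldata` does here), and it passes the arguments as bound parameters instead of formatting them into the SQL string.

```python
def fetch_proc_results(conn, proc_name, args, cursor_name):
    """Call a procedure whose last argument names a cursor, then
    fetch every row from that cursor on the same connection.

    proc_name is interpolated directly into the SQL, so it must be a
    trusted constant; args and cursor_name are bound as parameters.
    Returns (column_names, rows).
    """
    # one %s per procedure argument, plus one for the cursor name
    placeholders = ", ".join(["%s"] * (len(args) + 1))
    cur = conn.cursor()
    cur.execute(f"CALL {proc_name}({placeholders})", (*args, cursor_name))

    # a named cursor with the same name reads from the cursor that
    # the procedure opened in the current transaction
    named_cursor = conn.cursor(cursor_name)
    rows = named_cursor.fetchall()
    cols = [col[0] for col in named_cursor.description]
    return cols, rows
```

With the notebook's `conn`, the call would look like `cols, rows = fetch_proc_results(conn, "ds.getretentionmodeldata", [112, 2019, 2022], "rkcursor")`, and the returned pair can go straight into `pd.DataFrame(data=rows, columns=cols)`. The function does not commit, so the caller still decides when to end the transaction.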

In [19]:
df.info()
df.to_csv("/Users/grantdonst/Downloads/vikings_out.csv")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20020 entries, 0 to 20019
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   lkupclientid         20020 non-null  int64         
 1   dimcustomermasterid  20020 non-null  int64         
 2   year                 20020 non-null  int64         
 3   productgrouping      20020 non-null  object        
 4   totalspent           20020 non-null  float64       
 5   recentdate           20020 non-null  datetime64[ns]
 6   attendancepercent    20020 non-null  float64       
 7   renewedbeforedays    20020 non-null  int64         
 8   source_tenure        20020 non-null  object        
 9   tenure               20020 non-null  int64         
 10  disttovenue          20020 non-null  float64       
 11  recency              20020 non-null  int64         
 12  missed_games_1       20020 non-null  object        
 13  missed_games_2       20020 non-

In [18]:
df.head()


Unnamed: 0,lkupclientid,dimcustomermasterid,year,productgrouping,totalspent,recentdate,attendancepercent,renewedbeforedays,source_tenure,tenure,disttovenue,recency,missed_games_1,missed_games_2,missed_games_over_2,forward_records,opentosendratio,clicktosendratio,clicktoopenratio,gender,phonecall,inperson_contact,isnextyear_buyer
0,112,274570,2019,Full Season,51000.0,2019-12-29,0.75,205,4380,338,405.6,0,0,0,1,262,0.0,0.0,0.0,Male,0,0,1
1,112,312701,2021,Full Season,1120.0,1970-01-01,0.0,213,1825,1031,107.15,0,0,0,1,20,0.0,0.0,0.0,Female,0,0,0
2,112,329351,2021,Full Season,1520.0,2022-01-09,0.44,213,5475,1080,1.33,0,0,0,1,10,0.0,0.0,0.0,Male,0,0,1
3,112,1508148,2019,Full Season,4880.0,2019-09-08,0.11,205,2920,338,1652.95,6,0,1,1,12,0.0,0.0,0.0,Male,0,0,1
4,112,2476090,2021,Full Season,4040.0,1970-01-01,0.0,213,2920,1080,2529.4,0,0,0,1,40,0.0,0.0,0.0,Male,0,0,1
