# Calling RedShift Stored Procedure
* StellarAlgo Data Science
* Ryan Kazmerik & Nakisa Rad
* May 15, 2022

This notebook provides example code showing how to execute a RedShift stored procedure and get a result set back by connecting to the RedShift database directly with the psycopg2 package. The data and databases used are for demonstration purposes only.

In [2]:
import boto3
import pandas as pd
import psycopg2

pd.options.display.max_columns = 100
pd.options.display.max_rows = 100

### To connect to RedShift, we first have to tell AWS which profile we'd like to log in with. This sends us to the browser to authenticate:

In [3]:
! aws sso login --profile Stellaralgo-DataScienceAdmin

Attempting to automatically open the SSO authorization page in your default browser.
If the browser does not open or you wish to use a different device to authorize this request, open the following URL:

https://device.sso.us-east-1.amazonaws.com/

Then enter the code:

TZRH-GDTF
Successfully logged into Start URL: https://stellaralgo.awsapps.com/start#/
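
Before requesting cluster credentials, it can be useful to confirm that the SSO login actually produced usable credentials for the profile. The sketch below is one possible check, assuming the standard STS `get_caller_identity` call; the helper name `whoami` is hypothetical and not part of the original notebook. It accepts any client object that exposes that method.

```python
def whoami(sts_client):
    """Return the ARN of the identity behind the given STS client."""
    identity = sts_client.get_caller_identity()
    return identity["Arn"]


# Usage in the notebook (requires a completed `aws sso login`):
#   import boto3
#   session = boto3.session.Session(profile_name="Stellaralgo-DataScienceAdmin")
#   print(whoami(session.client("sts")))
```

If the SSO session has expired, the call is expected to raise an error rather than return an ARN, which is a cheaper failure than a stalled database connection.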


### Now we can create a session and a RedShift client (QA), request temporary database credentials, and open a new connection using psycopg2:

In [4]:
session = boto3.session.Session(profile_name='Stellaralgo-DataScienceAdmin')
client = session.client('redshift')

CLUSTER = 'qa-app'
DBNAME = 'stlrcfl'

# request temporary database credentials for the cluster
creds = client.get_cluster_credentials(
    ClusterIdentifier=CLUSTER,
    DbUser='admin',
    DbName=DBNAME,
    DbGroups=['admin_group'],
    AutoCreate=True
)

# create a connection using the temporary credentials
conn = psycopg2.connect(
    host='qa-app.ctjussvyafp4.us-east-1.redshift.amazonaws.com',
    port=5439,
    user=creds['DbUser'],
    password=creds['DbPassword'],
    database=DBNAME
)

print(f"GOT CONNECTION TO DATABASE: {CLUSTER} {DBNAME}")

OperationalError: connection to server at "qa-app.ctjussvyafp4.us-east-1.redshift.amazonaws.com" (54.162.199.152), port 5439 failed: Operation timed out
	Is the server running on that host and accepting TCP/IP connections?
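
A timeout like the one in the output above generally means the host could not be reached over the network (for example, a VPN or security group issue), not that the credentials were rejected. The sketch below shows one way to assemble the connection arguments so that the host and port come from `describe_clusters` instead of being hard-coded, and so that a `connect_timeout` makes an unreachable cluster fail quickly. The helper name `build_connect_kwargs` and the 10-second default are assumptions for illustration, not part of the original notebook.

```python
def build_connect_kwargs(redshift_client, cluster, dbname, db_user, db_groups,
                         connect_timeout=10):
    """Assemble psycopg2.connect() keyword arguments for a RedShift cluster."""
    # Look up the endpoint instead of hard-coding the host name.
    cluster_info = redshift_client.describe_clusters(
        ClusterIdentifier=cluster
    )["Clusters"][0]
    endpoint = cluster_info["Endpoint"]

    # Request temporary database credentials, as in the cell above.
    creds = redshift_client.get_cluster_credentials(
        ClusterIdentifier=cluster,
        DbUser=db_user,
        DbName=dbname,
        DbGroups=db_groups,
        AutoCreate=True,
    )

    return {
        "host": endpoint["Address"],
        "port": endpoint["Port"],
        "user": creds["DbUser"],
        "password": creds["DbPassword"],
        "dbname": dbname,
        "connect_timeout": connect_timeout,  # seconds before giving up
    }


# Usage in the notebook (assumes `client`, CLUSTER and DBNAME from the cell above):
#   kwargs = build_connect_kwargs(client, CLUSTER, DBNAME, "admin", ["admin_group"])
#   conn = psycopg2.connect(**kwargs)
```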


### Now we can call our stored procedure, passing a cursor name as the last parameter. We then open a named cursor with that same name to fetch all of the data returned by the stored proc, and load it into a dataframe:

In [33]:
# call the stored proc to get data for the retention model
cur = conn.cursor()
cur.execute("CALL dw.getretentionmodeldata(35, 2020, 2021, 'rkcursor')")

# create a named cursor based on the cursor name passed in above
named_cursor = conn.cursor('rkcursor')
data = named_cursor.fetchall()

# load the data and columns into a data frame
cols = [row[0] for row in named_cursor.description]
df = pd.DataFrame(data=data, columns=cols)

conn.commit()
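
The call-then-fetch pattern above can be wrapped in a small reusable function. The sketch below is one way to do that, assuming a psycopg2-style connection: it binds the procedure arguments as query parameters, reads the named cursor in the same transaction as the `CALL`, and rolls back if anything fails. The function name `fetch_proc_result` is hypothetical; the procedure name and arguments in the usage comment are the ones from the cell above.

```python
def fetch_proc_result(conn, call_sql, params, cursor_name):
    """Run a stored procedure that opens a cursor and return (columns, rows)."""
    try:
        with conn.cursor() as cur:
            # Parameters are bound by the driver, not formatted into the SQL.
            cur.execute(call_sql, params)

        # Read the cursor opened by the procedure, within the same transaction.
        named_cursor = conn.cursor(cursor_name)
        rows = named_cursor.fetchall()
        columns = [col[0] for col in named_cursor.description]

        conn.commit()
        return columns, rows
    except Exception:
        # Leave the connection usable for the next attempt.
        conn.rollback()
        raise


# Usage in the notebook (assumes `conn` and `pd` from the cells above):
#   cols, data = fetch_proc_result(
#       conn,
#       "CALL dw.getretentionmodeldata(%s, %s, %s, %s)",
#       (35, 2020, 2021, "rkcursor"),
#       "rkcursor",
#   )
#   df = pd.DataFrame(data=data, columns=cols)
```

Returning plain columns and rows keeps the helper independent of pandas, so the dataframe step stays in the notebook where the column names can be inspected.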

In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5645 entries, 0 to 5644
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   lkupclientid         5645 non-null   int64         
 1   dimcustomermasterid  5645 non-null   int64         
 2   year                 5645 non-null   int64         
 3   productgrouping      5645 non-null   object        
 4   totalspent           5645 non-null   float64       
 5   recentdate           5645 non-null   datetime64[ns]
 6   attendancepercent    5645 non-null   float64       
 7   renewedbeforedays    5645 non-null   int64         
 8   source_tenure        5645 non-null   object        
 9   tenure               5645 non-null   int64         
 10  disttovenue          5645 non-null   float64       
 11  recency              5645 non-null   int64         
 12  missed_games_1       5645 non-null   object        
 13  missed_games_2       5645 non-nul

In [9]:
df.head()


Unnamed: 0,lkupclientid,dimcustomermasterid,year,productgrouping,totalspent,recentdate,attendancepercent,renewedbeforedays,source_tenure,tenure,disttovenue,recency,missed_games_1,missed_games_2,missed_games_over_2,forward_records,opentosendratio,clicktosendratio,clicktoopenratio,gender,phonecall,inperson_contact,isnextyear_buyer
0,11,272867517,2014,Group,254.83,1970-01-01,0.76,5.0,5,5.0,24.07,0,1,0,0,0,0.0,0.0,0.0,Unknown,0,0,0
1,11,272808838,2014,Group,80.0,1970-01-01,1.0,6.0,6,6.0,0.87,0,1,0,0,0,0.0,0.0,0.0,Unknown,0,0,1
2,11,272918909,2014,Mini Plan,72.0,2014-08-25,0.67,35.0,1825,132.0,19.28,0,0,0,9,0,0.0,0.0,0.0,Unknown,0,0,1
3,11,304301466,2014,Mini Plan,64.0,1970-01-01,1.0,8.0,1825,81.0,12.12,0,0,1,0,0,0.0,0.0,0.0,Unknown,0,0,1
4,11,272883990,2014,Group,170.81,1970-01-01,0.84,11.0,11,11.0,30.64,0,1,0,0,0,0.0,0.0,0.0,Unknown,0,0,0
