# PostgreSQL Workflow

In [1]:
import psycopg2 as pg
import pandas as pd

My idea for this workflow is that we can separate feature engineering and modeling into more modular pieces. First, we can all connect to the same Postgres database so that we have a single source of truth. Note: if you just want to explore the data in a GUI, download pgAdmin 4 and connect with these same credentials.

In [22]:
# establish connection to postgres
conn = pg.connect(database='postgres',
                  user='postgres',
                  password='w207final',
                  host='35.185.225.167')
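
If you would rather not paste the credentials into every notebook, one option is to read them from environment variables. The sketch below is only an illustration: `connection_params` is a hypothetical helper (not part of psycopg2), the variable names `PGDATABASE`, `PGUSER`, `PGPASSWORD`, and `PGHOST` follow the usual libpq convention, and the fallback values are simply the ones used in the cell above.

```python
import os


def connection_params(env=None):
    """Build keyword arguments for pg.connect() from environment variables.

    Hypothetical helper: database, user, and host fall back to the values
    used in this notebook; the password has to be set explicitly.
    """
    env = os.environ if env is None else env
    password = env.get("PGPASSWORD")
    if not password:
        raise RuntimeError("Set PGPASSWORD before connecting")
    return {
        "database": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": password,
        "host": env.get("PGHOST", "35.185.225.167"),
    }
```

With that in place, the connection cell could become `conn = pg.connect(**connection_params())`.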

Here's an example of quickly pulling the raw data that was loaded. There are also several other ways to pull data that don't require using pandas directly.

In [6]:
query1 = '''SELECT * FROM "Teams"'''
example1 = pd.read_sql_query(query1, conn)

In [9]:
example1.head()

Unnamed: 0,TeamID,TeamName,FirstD1Season,LastD1Season
0,1101,Abilene Chr,2014,2018
1,1102,Air Force,1985,2018
2,1103,Akron,1985,2018
3,1104,Alabama,1985,2018
4,1105,Alabama A&M,2000,2018
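
As one example of pulling data without pandas, the same query can be run through a plain database cursor. This is a minimal sketch, assuming `conn` is the open psycopg2 connection from above; `fetch_rows` is a hypothetical helper name, and it relies only on the standard DB-API cursor methods (`execute`, `description`, `fetchall`).

```python
def fetch_rows(conn, sql, params=None):
    """Run a query on a DB-API connection and return (column_names, rows)."""
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # The first item of each description entry is the column name
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()  # list of tuples, one per row
    finally:
        cur.close()
    return columns, rows
```

For instance, `columns, rows = fetch_rows(conn, 'SELECT * FROM "Teams"')` would return the column names and a list of row tuples rather than a DataFrame.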


Let's take a look at our features table. Keep in mind this is just an example table with a subset of the data for now.

In [10]:
query2 = '''SELECT * FROM features_example LIMIT 20'''
example2 = pd.read_sql_query(query2, conn)
example2.head()

Unnamed: 0,Season,DayNum,Team,Opponent,Outcome,Score,OpponentScore,NumOT,WLoc,holdout
0,1985,136,1116,1234,1,63,54,0,N,0
1,1985,136,1120,1345,1,59,58,0,N,0
2,1985,136,1207,1250,1,68,43,0,N,0
3,1985,136,1229,1425,1,58,55,0,N,0
4,1985,136,1242,1325,1,49,38,0,N,0


Next, say one of us comes up with a good idea for a feature. We could create the feature in Python and push it up to the features table in Postgres. For this particular example, I'm going to generate a feature from columns in the features_example table, but more realistically this data would come from some other source, either loaded or external.

In [16]:
query3 = '''
SELECT "Season","DayNum", "Team", "Score", "OpponentScore"
FROM features_example'''

example3 = pd.read_sql_query(query3, conn)
example3['new_feature'] = example3['Score'] - example3['OpponentScore']
example3.head()

Unnamed: 0,Season,DayNum,Team,Score,OpponentScore,new_feature
0,1985,136,1116,63,54,9
1,1985,136,1120,59,58,1
2,1985,136,1207,68,43,25
3,1985,136,1229,58,55,3
4,1985,136,1242,49,38,11


Now that I've created a new feature, I want to push it to Postgres so any of us can use it for modeling.

In [25]:
# establish connection to postgres
conn = pg.connect(database='postgres',
                  user='postgres',
                  password='w207final',
                  host='35.185.225.167')

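# add the new column if it isn't there yet (this only changes the schema; no values are written)
query1 = '''ALTER TABLE features_example ADD COLUMN IF NOT EXISTS "NewFeature" INT'''
# placeholder: the UPDATE that fills "NewFeature" still needs to be written, so query2 is never executed
query2 = '''Code to update column here...'''
c = conn.cursor()
c.execute(query1)
conn.commit()
c.close()
conn.close()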
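
The cell above adds the column but leaves the update step as a placeholder. One possible way to fill it in is sketched below. It is an illustration, not the final code: it assumes that `("Season", "DayNum", "Team")` identifies a single row in features_example, and `UPDATE_SQL`, `build_update_params`, and `push_new_feature` are hypothetical names. Values are cast to plain `int` because psycopg2 may not know how to adapt NumPy integer types coming out of a DataFrame.

```python
# Parameterized UPDATE; %s is psycopg2's placeholder style.
UPDATE_SQL = '''
UPDATE features_example
SET "NewFeature" = %s
WHERE "Season" = %s AND "DayNum" = %s AND "Team" = %s
'''


def build_update_params(records):
    """Turn row dicts into (new_feature, Season, DayNum, Team) tuples of plain ints."""
    return [
        (int(r["new_feature"]), int(r["Season"]), int(r["DayNum"]), int(r["Team"]))
        for r in records
    ]


def push_new_feature(conn, records):
    """Write new_feature into "NewFeature" for each row; return the number of rows sent."""
    params = build_update_params(records)
    cur = conn.cursor()
    try:
        cur.executemany(UPDATE_SQL, params)
        conn.commit()
    except Exception:
        conn.rollback()  # leave the table untouched if anything fails
        raise
    finally:
        cur.close()
    return len(params)
```

It would be called as `push_new_feature(conn, example3.to_dict("records"))` before closing the connection. For a large table, `executemany` sends one statement per row, so a batched approach such as `psycopg2.extras.execute_batch` may be worth trying instead.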