# SQL Database Construction
The purpose of this notebook is to use the combined .csv files created in Initial_Kaggle_Dataset_Construction.ipynb to build a PostgreSQL database and to begin running initial queries on MLB pitch data from 2015-2019.

First, loading in needed packages:

In [7]:
from sqlalchemy import create_engine
import pandas as pd

Created the mlb_pitches PostgreSQL database in the psql terminal with the following SQL:
```sql
patrickbovard=# CREATE DATABASE mlb_pitches
patrickbovard-# ;
```

In [8]:
# Create a SQLAlchemy engine for the local mlb_pitches PostgreSQL database.
engine = create_engine('postgresql://patrickbovard:localhost@localhost:5432/mlb_pitches')
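Hard-coding credentials in the connection string works locally. As an alternative, the sketch below builds the same URL with SQLAlchemy's `URL.create`, which escapes special characters in the password and lets it come from the environment. The `build_mlb_url` helper and the `MLB_PG_PASSWORD` environment variable are hypothetical names. The engine itself is not created here because that requires a PostgreSQL driver such as psycopg2.

```python
import os

from sqlalchemy.engine import URL


def build_mlb_url(password=None):
    """Build the mlb_pitches connection URL (hypothetical helper).

    Falls back to the hypothetical MLB_PG_PASSWORD environment variable
    when no password is passed explicitly.
    """
    if password is None:
        password = os.environ.get("MLB_PG_PASSWORD")
    return URL.create(
        drivername="postgresql",
        username="patrickbovard",
        password=password,
        host="localhost",
        port=5432,
        database="mlb_pitches",
    )


url = build_mlb_url(password="example-password")
print(url.render_as_string(hide_password=True))  # masks the password when printing
# engine = create_engine(url)  # needs a PostgreSQL DBAPI driver such as psycopg2
```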

Checking working directory:

In [9]:
pwd

'/Users/patrickbovard/Documents/GitHub/metis_final_project/MLB_Pitch_Data_Setup_SQL'
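`pwd` is an IPython magic, so it does not work in a plain Python script. Below is a minimal sketch of the same check using `pathlib`. It also includes a hypothetical `resolve_csv_paths` helper that fails early if any of the combined CSV files is missing from the assumed `Data/Kaggle_Files` directory.

```python
from pathlib import Path

# File names taken from the read_csv calls in this notebook.
CSV_NAMES = [
    "combined_pitches.csv",
    "combined_atbats.csv",
    "combined_games.csv",
    "player_names.csv",
]


def resolve_csv_paths(project_dir=None):
    """Map each CSV name to its path under Data/Kaggle_Files (hypothetical helper).

    Raises FileNotFoundError listing every missing file.
    """
    base = Path(project_dir) if project_dir is not None else Path.cwd()
    data_dir = base / "Data" / "Kaggle_Files"
    paths = {name: data_dir / name for name in CSV_NAMES}
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files: {missing}")
    return paths


print(Path.cwd())  # plain-Python equivalent of the IPython `pwd` magic
```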

Loading the combined .csv files into pandas:

In [4]:
pitches = pd.read_csv('./Data/Kaggle_Files/combined_pitches.csv')


In [5]:
atbats = pd.read_csv('./Data/Kaggle_Files/combined_atbats.csv')
games = pd.read_csv('./Data/Kaggle_Files/combined_games.csv')
players = pd.read_csv('./Data/Kaggle_Files/player_names.csv')
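The four `read_csv` calls follow the same pattern. The sketch below loads them into a dictionary keyed by the SQL table names used later; the `load_tables` helper and `CSV_FILES` mapping are hypothetical. Passing `low_memory=False` makes pandas read each file in one pass, so it infers a single dtype per column. This can avoid pandas' mixed-type `DtypeWarning` on large files.

```python
import pandas as pd

# Hypothetical mapping from SQL table name to CSV file name.
CSV_FILES = {
    "pitches": "combined_pitches.csv",
    "atbats": "combined_atbats.csv",
    "games": "combined_games.csv",
    "players": "player_names.csv",
}


def load_tables(data_dir="./Data/Kaggle_Files"):
    """Read every CSV into a DataFrame keyed by its table name."""
    return {
        table: pd.read_csv(f"{data_dir}/{filename}", low_memory=False)
        for table, filename in CSV_FILES.items()
    }


# Usage: tables = load_tables(); pitches = tables["pitches"]
```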

In [35]:
pitches.drop(columns=['Unnamed: 0'], inplace=True)
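`Unnamed: 0` is the name pandas assigns to a CSV column with a blank header. That is exactly what `to_csv()` writes for the index by default. Assuming the combined files were produced that way, the small sketch below (with made-up values) shows where the column comes from and two ways to avoid dropping it by hand.

```python
import io

import pandas as pd

df = pd.DataFrame({"px": [0.416, -0.191], "pz": [2.963, 2.347]})

# to_csv() writes the index by default, under a blank header...
csv_text = df.to_csv()
# ...which read_csv labels 'Unnamed: 0'.
print(pd.read_csv(io.StringIO(csv_text)).columns.tolist())

# Option 1: don't write the index at all.
no_index_csv = df.to_csv(index=False)

# Option 2: read the blank-header column back in as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
```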

In [36]:
pitches.head()

| | px | pz | start_speed | end_speed | spin_rate | spin_dir | break_angle | break_length | break_y | ax | ... | event_num | b_score | ab_id | b_count | s_count | outs | pitch_num | on_1b | on_2b | on_3b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.416 | 2.963 | 92.9 | 84.1 | 2305.05 | 159.235 | -25.0 | 3.2 | 23.7 | 7.665 | ... | 3 | 0.0 | 2015000000.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | -0.191 | 2.347 | 92.8 | 84.1 | 2689.93 | 151.402 | -40.7 | 3.4 | 23.7 | 12.043 | ... | 4 | 0.0 | 2015000000.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 |
| 2 | -0.518 | 3.284 | 94.1 | 85.2 | 2647.97 | 145.125 | -43.7 | 3.7 | 23.7 | 14.368 | ... | 5 | 0.0 | 2015000000.0 | 0.0 | 2.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 3 | -0.641 | 1.221 | 91.0 | 84.0 | 1289.59 | 169.751 | -1.3 | 5.0 | 23.8 | 2.104 | ... | 6 | 0.0 | 2015000000.0 | 0.0 | 2.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 |
| 4 | -1.821 | 2.083 | 75.4 | 69.6 | 1374.57 | 280.671 | 18.4 | 12.0 | 23.8 | -10.28 | ... | 7 | 0.0 | 2015000000.0 | 1.0 | 2.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 |


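Adding these tables to the mlb_pitches SQL database. These cells are commented out after the first run because the tables already exist. By default (`if_exists='fail'`), `to_sql` raises a `ValueError` when the target table is already present.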

In [2]:
#atbats.to_sql('atbats', engine, index=False)

In [3]:
#games.to_sql('games', engine, index=False)

In [4]:
#players.to_sql('players', engine, index=False)

In [5]:
#pitches.to_sql('pitches', engine, index=False)
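If these cells need to run again, the `if_exists` parameter controls what happens to tables that already exist. The sketch below is a re-runnable version that uses an in-memory SQLite connection and tiny hypothetical frames as stand-ins for the PostgreSQL engine and the real data. With the notebook's `engine`, the `to_sql` call is the same apart from the connection argument. `write_tables` is a hypothetical helper. `chunksize` batches the inserts, which may help with a table as large as `pitches`; the value used here is arbitrary. Note that `if_exists="append"` would instead duplicate rows on every run.

```python
import sqlite3

import pandas as pd


def write_tables(con, tables):
    """Write each DataFrame to a table of the same name (hypothetical helper)."""
    for name, frame in tables.items():
        # "replace" drops and recreates an existing table, so re-running is safe;
        # the default, "fail", raises ValueError once the table exists.
        frame.to_sql(name, con, index=False, if_exists="replace", chunksize=10_000)


# In-memory SQLite stands in for the PostgreSQL engine.
conn = sqlite3.connect(":memory:")

# Tiny hypothetical frames standing in for the real games and players data.
tables = {
    "games": pd.DataFrame({"g_id": [201500001, 201500002]}),
    "players": pd.DataFrame({"id": [1, 2, 3], "last_name": ["A", "B", "C"]}),
}

write_tables(conn, tables)
write_tables(conn, tables)  # a second run replaces the tables instead of raising
```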

Testing a query against the new pitches table:

In [10]:
query = '''
SELECT * 
FROM pitches
LIMIT 5
;
'''
df = pd.read_sql(query, engine)

df.tail()

| | px | pz | start_speed | end_speed | spin_rate | spin_dir | break_angle | break_length | break_y | ax | ... | event_num | b_score | ab_id | b_count | s_count | outs | pitch_num | on_1b | on_2b | on_3b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.416 | 2.963 | 92.9 | 84.1 | 2305.052 | 159.235 | -25.0 | 3.2 | 23.7 | 7.665 | ... | 3 | 0.0 | 2015000000.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | -0.191 | 2.347 | 92.8 | 84.1 | 2689.935 | 151.40200000000004 | -40.7 | 3.4 | 23.7 | 12.043 | ... | 4 | 0.0 | 2015000000.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 |
| 2 | -0.518 | 3.284 | 94.1 | 85.2 | 2647.972 | 145.125 | -43.7 | 3.7 | 23.7 | 14.368 | ... | 5 | 0.0 | 2015000000.0 | 0.0 | 2.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 3 | -0.641 | 1.221 | 91.0 | 84.0 | 1289.59 | 169.75099999999995 | -1.3 | 5.0 | 23.8 | 2.104 | ... | 6 | 0.0 | 2015000000.0 | 0.0 | 2.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 |
| 4 | -1.821 | 2.083 | 75.4 | 69.6 | 1374.569 | 280.671 | 18.4 | 12.0 | 23.8 | -10.28 | ... | 7 | 0.0 | 2015000000.0 | 1.0 | 2.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 |


The query returns the same first five pitches as the `pitches` DataFrame, so the SQL database setup looks good.
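`LIMIT 5` only shows that the first rows arrived intact. A stronger check compares each table's row count with its source DataFrame. The sketch below does this with a hypothetical `row_count_mismatches` helper and an in-memory SQLite database standing in for PostgreSQL.

```python
import sqlite3

import pandas as pd


def row_count_mismatches(conn, frames):
    """Return {table: (rows_in_dataframe, rows_in_database)} for tables that differ."""
    mismatches = {}
    for name, frame in frames.items():
        # Table names come from our own dict keys, not user input.
        db_rows = pd.read_sql(f"SELECT COUNT(*) AS n FROM {name}", conn)["n"].iloc[0]
        if db_rows != len(frame):
            mismatches[name] = (len(frame), int(db_rows))
    return mismatches


conn = sqlite3.connect(":memory:")
frames = {"games": pd.DataFrame({"g_id": [1, 2, 3]})}
frames["games"].to_sql("games", conn, index=False)
print(row_count_mismatches(conn, frames))  # {} when every table matches
```

With the real data, `frames` would map `pitches`, `atbats`, `games`, and `players` to the DataFrames loaded earlier, and `engine` would take the place of `conn`.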

Verifying the table setup in my mlb_pitches database in psql:
```sql
patrickbovard=# \connect mlb_pitches
You are now connected to database "mlb_pitches" as user "patrickbovard".
mlb_pitches=# \dt
            List of relations
 Schema |  Name   | Type  |     Owner     
--------+---------+-------+---------------
 public | atbats  | table | patrickbovard
 public | games   | table | patrickbovard
 public | pitches | table | patrickbovard
 public | players | table | patrickbovard
(4 rows)
```
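`\dt` is a psql meta-command, so it is not available from Python. SQLAlchemy's `inspect()` gives a comparable listing. The sketch below creates four empty placeholder tables in an in-memory SQLite database as a stand-in for mlb_pitches. Against PostgreSQL, `get_table_names(schema="public")` would restrict the list to the schema that `\dt` shows above.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite stands in for the PostgreSQL engine.
engine = create_engine("sqlite://")

# Placeholder tables with the same names as the real ones.
with engine.begin() as conn:
    for name in ["atbats", "games", "pitches", "players"]:
        conn.execute(text(f"CREATE TABLE {name} (id INTEGER)"))

# Roughly what \dt lists: the table names visible in the default schema.
table_names = sorted(inspect(engine).get_table_names())
print(table_names)
engine.dispose()
```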

Great, all four tables are now set up in the mlb_pitches database and ready for querying.
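When querying, values such as a speed threshold can be passed as parameters instead of being formatted into the SQL string. The sketch below uses an in-memory SQLite stand-in, hypothetical sample rows, and SQLite's `?` placeholders. With the notebook's SQLAlchemy engine and PostgreSQL, the equivalent would wrap the query in `sqlalchemy.text` and use named parameters such as `:min_speed`, passed as a dict.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; the real pitches table has many more columns.
pd.DataFrame({
    "ab_id": [2015000001, 2015000001, 2015000001],
    "pitch_num": [1, 2, 3],
    "start_speed": [92.9, 92.8, 94.1],
}).to_sql("pitches", conn, index=False)

# The driver binds the ? placeholders, so values are never spliced into the SQL text.
query = """
SELECT ab_id, pitch_num, start_speed
FROM pitches
WHERE start_speed >= ?
ORDER BY start_speed DESC
LIMIT ?
"""
fast = pd.read_sql(query, conn, params=(92.85, 5))
print(fast)
```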

In [41]:
engine.dispose()
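`engine.dispose()` closes the connections held in the engine's connection pool. The sketch below shows a pattern that returns each connection to the pool when its block exits and still disposes of the engine if a query raises. It again uses an in-memory SQLite engine as a stand-in for PostgreSQL.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # stand-in for the PostgreSQL engine
try:
    # The connection is returned to the pool when the with-block exits.
    with engine.connect() as conn:
        result = conn.execute(text("SELECT 1 + 1")).scalar_one()
finally:
    # Close all pooled connections, even if the query above raised.
    engine.dispose()
```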

# Next: initial_sql_queries.ipynb