# Lab 1 - Creating the SQL Tables

In this lab, you will use `sqlalchemy` to create, populate, and query a table from the baseball database, and then do the same for the data in `super_hero_powers.csv`.

In [1]:
import pandas as pd
artwork = pd.read_csv("./data/Artworks.csv")

## Part 1 - Baseball Managers

In this part of the lab, you will walk through the process of creating a manager table from [Lahman’s Baseball Database](http://www.seanlahman.com/baseball-archive/statistics/).

#### Task 1 - Download, unzip, rename

1. Download the baseball database linked above and save it to your desktop.
2. Unzip the file and rename the resulting folder to `baseball`.
3. Load the `core/Managers.csv` file into a pandas `DataFrame` using `read_csv`.
4. Inspect the column names (`columns`) and the `dtypes`.

In [5]:
import pandas as pd
managers = pd.read_csv('~/Desktop/baseball/core/Managers.csv')
managers.head()

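|   | playerID | yearID | teamID | lgID | inseason | G | W | L | rank | plyrMgr |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wrighha01 | 1871 | BS1 |  | 1 | 31 | 20 | 10 | 3.0 | Y |
| 1 | woodji01 | 1871 | CH1 |  | 1 | 28 | 19 | 9 | 2.0 | Y |
| 2 | paborch01 | 1871 | CL1 |  | 1 | 29 | 10 | 19 | 8.0 | Y |
| 3 | lennobi01 | 1871 | FW1 |  | 1 | 14 | 5 | 9 | 8.0 | Y |
| 4 | deaneha01 | 1871 | FW1 |  | 2 | 5 | 2 | 3 | 8.0 | Y |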


In [6]:
managers.columns

Index(['playerID', 'yearID', 'teamID', 'lgID', 'inseason', 'G', 'W', 'L',
       'rank', 'plyrMgr'],
      dtype='object')

In [17]:
managers.dtypes

playerID     object
yearID        int64
teamID       object
lgID         object
inseason      int64
G             int64
W             int64
L             int64
rank        float64
plyrMgr      object
dtype: object

In [18]:
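# reset_index returns a new DataFrame, so assign the result back;
# drop=True discards the old index instead of adding it as a column
managers = managers.reset_index(drop=True)
managers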

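|   | playerID | yearID | teamID | lgID | inseason | G | W | L | rank | plyrMgr |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | wrighha01 | 1871 | BS1 |  | 1 | 31 | 20 | 10 | 3.0 | Y |
| 1 | woodji01 | 1871 | CH1 |  | 1 | 28 | 19 | 9 | 2.0 | Y |
| 2 | paborch01 | 1871 | CL1 |  | 1 | 29 | 10 | 19 | 8.0 | Y |
| 3 | lennobi01 | 1871 | FW1 |  | 1 | 14 | 5 | 9 | 8.0 | Y |
| 4 | deaneha01 | 1871 | FW1 |  | 2 | 5 | 2 | 3 | 8.0 | Y |
| 5 | fergubo01 | 1871 | NY2 |  | 1 | 33 | 16 | 17 | 5.0 | Y |
| 6 | mcbridi01 | 1871 | PH1 |  | 1 | 28 | 21 | 7 | 1.0 | Y |
| 7 | hastisc01 | 1871 | RC1 |  | 1 | 25 | 4 | 21 | 9.0 | Y |
| 8 | pikeli01 | 1871 | TRO |  | 1 | 4 | 1 | 3 | 6.0 | Y |
| 9 | cravebi01 | 1871 | TRO |  | 2 | 25 | 12 | 12 | 6.0 | Y |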


#### Task 2 - Create a `sqlalchemy` types `dict`

In [9]:
from sqlalchemy import String, Integer
sql_types = {'playerID': String,
            'plyrMgr': String,
            'teamID': String,
            'lgID': String,
            'yearID': Integer,
            'inseason': Integer,
            'G': Integer,
            'W': Integer,
            'L': Integer,
            'rank': Integer}
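
A types `dict` like the one above can also be derived from the `dtypes` instead of being typed by hand. The following is a minimal sketch, not part of the lab solution: the helper `infer_sql_types` is a hypothetical name, and the sample frame holds only three columns from two of the rows shown earlier.

```python
import pandas as pd
from pandas.api import types as ptypes
from sqlalchemy import Float, Integer, String


def infer_sql_types(frame):
    """Map each column's pandas dtype to a SQLAlchemy column type."""
    mapping = {}
    for name, dtype in frame.dtypes.items():
        if ptypes.is_integer_dtype(dtype):
            mapping[name] = Integer
        elif ptypes.is_float_dtype(dtype):
            mapping[name] = Float
        else:
            # String columns (and anything unrecognized) fall back to String
            mapping[name] = String
    return mapping


# Three columns from two of the rows displayed earlier in this lab
sample = pd.DataFrame({
    "playerID": ["wrighha01", "woodji01"],
    "yearID": [1871, 1871],
    "rank": [3.0, 2.0],
})
inferred = infer_sql_types(sample)
```

Note that this sketch maps `rank` to `Float` because pandas read it as `float64`, whereas the hand-written `dict` declares it as `Integer`. Either choice is a judgment call about how the column should be stored.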

#### Task 4 - Create an `engine` and `schema`

In [10]:
!rm databases/baseball.db

rm: databases/baseball.db: No such file or directory


In [14]:
from sqlalchemy import create_engine
mang_eng = create_engine("sqlite:///databases/baseball.db")
mang_eng.echo = True
schema = pd.io.sql.get_schema(managers, 'manager', keys= 'playerID', con= mang_eng, dtype= sql_types)
print(schema)


CREATE TABLE manager (
	"playerID" VARCHAR NOT NULL, 
	"yearID" INTEGER, 
	"teamID" VARCHAR, 
	"lgID" VARCHAR, 
	inseason INTEGER, 
	"G" INTEGER, 
	"W" INTEGER, 
	"L" INTEGER, 
	rank INTEGER, 
	"plyrMgr" VARCHAR, 
	CONSTRAINT manager_pk PRIMARY KEY ("playerID")
)




In [15]:
#Execute the Schema
mang_eng.execute(schema)

2019-01-25 08:48:40,310 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-01-25 08:48:40,314 INFO sqlalchemy.engine.base.Engine ()
2019-01-25 08:48:40,318 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-01-25 08:48:40,322 INFO sqlalchemy.engine.base.Engine ()
2019-01-25 08:48:40,326 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE manager (
	"playerID" VARCHAR NOT NULL, 
	"yearID" INTEGER, 
	"teamID" VARCHAR, 
	"lgID" VARCHAR, 
	inseason INTEGER, 
	"G" INTEGER, 
	"W" INTEGER, 
	"L" INTEGER, 
	rank INTEGER, 
	"plyrMgr" VARCHAR, 
	CONSTRAINT manager_pk PRIMARY KEY ("playerID")
)


2019-01-25 08:48:40,328 INFO sqlalchemy.engine.base.Engine ()
2019-01-25 08:48:40,334 INFO sqlalchemy.engine.base.Engine COMMIT


<sqlalchemy.engine.result.ResultProxy at 0x10bd07940>
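
The call `mang_eng.execute(schema)` relies on `Engine.execute`, which was deprecated in SQLAlchemy 1.4 and removed in 2.0. On newer versions the same step can be sketched as follows. The in-memory engine and the shortened `ddl` string are stand-ins chosen for illustration, not the lab's actual database or full schema.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database so the sketch leaves no file behind
engine = create_engine("sqlite://")

# Shortened stand-in for the schema string printed above
ddl = """
CREATE TABLE manager (
    "playerID" VARCHAR NOT NULL,
    "yearID" INTEGER,
    "teamID" VARCHAR
)
"""

# engine.begin() opens a transaction and commits it when the block exits
with engine.begin() as conn:
    conn.execute(text(ddl))

# Confirm the table now exists
table_names = inspect(engine).get_table_names()
```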

#### Task 5 - Use `to_sql` with `if_exists='append'` to insert the data

In [16]:
managers.to_sql('manager',
               con=mang_eng,
               dtype=sql_types,
               index=False,
               if_exists='append')

2019-01-25 08:49:26,989 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("manager")
2019-01-25 08:49:26,992 INFO sqlalchemy.engine.base.Engine ()
2019-01-25 08:49:26,999 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-01-25 08:49:27,043 INFO sqlalchemy.engine.base.Engine INSERT INTO manager ("playerID", "yearID", "teamID", "lgID", inseason, "G", "W", "L", rank, "plyrMgr") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2019-01-25 08:49:27,045 INFO sqlalchemy.engine.base.Engine (('wrighha01', 1871, 'BS1', None, 1, 31, 20, 10, 3.0, 'Y'), ('woodji01', 1871, 'CH1', None, 1, 28, 19, 9, 2.0, 'Y'), ('paborch01', 1871, 'CL1', None, 1, 29, 10, 19, 8.0, 'Y'), ('lennobi01', 1871, 'FW1', None, 1, 14, 5, 9, 8.0, 'Y'), ('deaneha01', 1871, 'FW1', None, 2, 5, 2, 3, 8.0, 'Y'), ('fergubo01', 1871, 'NY2', None, 1, 33, 16, 17, 5.0, 'Y'), ('mcbridi01', 1871, 'PH1', None, 1, 28, 21, 7, 1.0, 'Y'), ('hastisc01', 1871, 'RC1', None, 1, 25, 4, 21, 9.0, 'Y')  ... displaying 10 of 3469 total bound parameter set

IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: manager.playerID [SQL: 'INSERT INTO manager ("playerID", "yearID", "teamID", "lgID", inseason, "G", "W", "L", rank, "plyrMgr") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (('wrighha01', 1871, 'BS1', None, 1, 31, 20, 10, 3.0, 'Y'), ('woodji01', 1871, 'CH1', None, 1, 28, 19, 9, 2.0, 'Y'), ('paborch01', 1871, 'CL1', None, 1, 29, 10, 19, 8.0, 'Y'), ('lennobi01', 1871, 'FW1', None, 1, 14, 5, 9, 8.0, 'Y'), ('deaneha01', 1871, 'FW1', None, 2, 5, 2, 3, 8.0, 'Y'), ('fergubo01', 1871, 'NY2', None, 1, 33, 16, 17, 5.0, 'Y'), ('mcbridi01', 1871, 'PH1', None, 1, 28, 21, 7, 1.0, 'Y'), ('hastisc01', 1871, 'RC1', None, 1, 25, 4, 21, 9.0, 'Y')  ... displaying 10 of 3469 total bound parameter sets ...  ('bakerdu01', 2017, 'WAS', 'NL', 1, 160, 95, 65, 1.0, 'N'), ('speiech01', 2017, 'WAS', 'NL', 2, 2, 2, 0, 1.0, 'N'))]
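
The `IntegrityError` above reports that `playerID` values repeat, so `playerID` alone cannot serve as the primary key declared in the schema. One way to investigate is to test candidate keys for duplicates before creating the table. The sketch below uses a subset of the columns from five of the rows shown earlier; the helper `is_unique_key` and the composite key `yearID`, `teamID`, `inseason` are assumptions that should be checked against the full `Managers.csv` file.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Five of the rows displayed earlier (subset of columns)
columns = ["playerID", "yearID", "teamID", "inseason", "G", "W", "L"]
rows = [
    ("wrighha01", 1871, "BS1", 1, 31, 20, 10),
    ("lennobi01", 1871, "FW1", 1, 14, 5, 9),
    ("deaneha01", 1871, "FW1", 2, 5, 2, 3),
    ("pikeli01", 1871, "TRO", 1, 4, 1, 3),
    ("cravebi01", 1871, "TRO", 2, 25, 12, 12),
]
sample = pd.DataFrame(rows, columns=columns)


def is_unique_key(frame, key_columns):
    """Return True when no two rows share the same values in key_columns."""
    return not frame.duplicated(subset=key_columns).any()


# FW1 and TRO each had two managers in 1871, so (yearID, teamID) repeats
team_year_ok = is_unique_key(sample, ["yearID", "teamID"])

# Adding inseason distinguishes the managers within a team's season
composite = ["yearID", "teamID", "inseason"]
composite_ok = is_unique_key(sample, composite)

# Build the schema with the composite key, create the table, then append
engine = create_engine("sqlite://")
schema = pd.io.sql.get_schema(sample, "manager", keys=composite, con=engine)
with engine.begin() as conn:
    conn.execute(text(schema))
sample.to_sql("manager", con=engine, index=False, if_exists="append")

inserted = pd.read_sql("SELECT COUNT(*) AS n FROM manager", con=engine)["n"].iloc[0]
```

Running the same uniqueness check on the full `managers` frame before calling `get_schema` would show whether a chosen key is safe to declare as `PRIMARY KEY`.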

#### Task 6 - Query the table to make sure it all worked
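
A minimal sketch of such a check, assuming the insert succeeded, is to read the row count and a simple aggregate back with `pd.read_sql`. The in-memory engine and the five-row sample (taken from the rows shown earlier) are stand-ins so the example runs on its own; in the lab you would query `mang_eng` instead.

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in database holding five of the rows displayed earlier
engine = create_engine("sqlite://")
sample = pd.DataFrame({
    "playerID": ["wrighha01", "lennobi01", "deaneha01", "pikeli01", "cravebi01"],
    "teamID": ["BS1", "FW1", "FW1", "TRO", "TRO"],
    "W": [20, 5, 2, 1, 12],
})
sample.to_sql("manager", con=engine, index=False)

# The row count should match the number of rows in the DataFrame
row_count = pd.read_sql("SELECT COUNT(*) AS n FROM manager", con=engine)

# Total wins per team, as a spot check on the stored values
wins_by_team = pd.read_sql(
    'SELECT "teamID", SUM("W") AS wins FROM manager '
    'GROUP BY "teamID" ORDER BY "teamID"',
    con=engine,
)
```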

## Part 2 - Super Hero Powers

Now make a database and table for the super hero powers.

## Problem 1
    
**Task:** Open `super_hero_powers.csv` and verify that the contents of the columns are all Boolean. In this problem, you need to:

1. Create a `dict` that defines the `pandas` column type
2. Read the file in using `pd.read_csv`.
3. Clean up all the column labels.
    
**Be sure to write clean code!**
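
The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the solution: the CSV text is a made-up three-column excerpt, the column labels (`hero_names`, `Super Strength`, `Super Speed`) and the helper `clean_label` are hypothetical, and the real file may need different handling.

```python
import io
import re

import pandas as pd

# Hypothetical excerpt; the real file's labels and contents may differ
csv_text = """hero_names,Super Strength,Super Speed
Hero A,True,False
Hero B,False,False
"""


def clean_label(label):
    """Lower-case a label and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", label.strip().lower()).strip("_")


# Step 1: read only the header, then build the column-type dict from it
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
col_types = {name: (str if name == "hero_names" else bool) for name in header}

# Step 2: read the file with the declared types
powers = pd.read_csv(io.StringIO(csv_text), dtype=col_types)

# Step 3: clean up the column labels
powers.columns = [clean_label(c) for c in powers.columns]

# Verify that every column other than the name column is Boolean
all_boolean = all(
    powers[c].dtype == bool for c in powers.columns if c != "hero_names"
)
```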


## Problem 2
    
Now define a `sqlalchemy` table for these data using the `pandas` `DataFrame.to_sql` method. You can use the `sqlalchemy.String` and `sqlalchemy.Boolean` column types, which are [documented here](https://docs.sqlalchemy.org/en/latest/core/type_basics.html).
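
As a rough sketch of the pattern, the `dtype` argument of `to_sql` accepts a `dict` that maps column names to SQLAlchemy types. The two-row frame, the table name `powers`, and the column names are hypothetical placeholders, and the in-memory engine is used only so the example is self-contained.

```python
import pandas as pd
from sqlalchemy import Boolean, String, create_engine, inspect

# Hypothetical stand-in for the cleaned super hero powers DataFrame
powers = pd.DataFrame({
    "hero_names": ["Hero A", "Hero B"],
    "super_strength": [True, False],
})

# The name column is a String; every other column is a Boolean
sql_types = {
    name: (String if name == "hero_names" else Boolean)
    for name in powers.columns
}

engine = create_engine("sqlite://")
powers.to_sql("powers", con=engine, dtype=sql_types, index=False, if_exists="replace")

# Read the column types back from the database to confirm them
column_types = {
    col["name"]: str(col["type"])
    for col in inspect(engine).get_columns("powers")
}
```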

## Problem 3
    
Now you need to make a new `engine`, `inspect` your database, and make a `session` to query the database.
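
One possible shape for these three steps is sketched below, assuming SQLAlchemy 1.4 or later. The table name `powers`, its columns, and the two sample rows are hypothetical; in the lab the engine would point at your own database file rather than an in-memory one.

```python
import pandas as pd
from sqlalchemy import MetaData, Table, create_engine, func, inspect, select
from sqlalchemy.orm import sessionmaker

# Stand-in database with a hypothetical two-row powers table
engine = create_engine("sqlite://")
pd.DataFrame({
    "hero_names": ["Hero A", "Hero B"],
    "super_strength": [True, False],
}).to_sql("powers", con=engine, index=False)

# Inspect the database: list its tables and the columns of one table
inspector = inspect(engine)
tables = inspector.get_table_names()
column_names = [col["name"] for col in inspector.get_columns("powers")]

# Reflect the existing table so it can be used in queries
powers = Table("powers", MetaData(), autoload_with=engine)

# Make a session and run a simple query through it
Session = sessionmaker(bind=engine)
with Session() as session:
    n_rows = session.execute(
        select(func.count()).select_from(powers)
    ).scalar_one()
```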

## Problem 4
    
Perform `sqlalchemy` queries to answer each of the following questions.

1. How many heroes have both Super Strength and Super Speed?
2. How many heroes have names that start with the word *Black*?
3. Are heroes with Agility more likely to have Stealth?
4. What fraction of all heroes that can fly also have Super Strength?
5. Consider heroes that have names that contain `"girl"`, `"boy"`, `"woman"`, or `"man"`.  Compute the following ratio

$$\frac{N(\text{boy or man})}{N(\text{girl or woman})}$$

**Hint:** You will need to use some combination of `where`, `group_by`, and `count` for each part.
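
To illustrate how `where`, `group_by`, and `count` fit together, here is a minimal sketch on a toy table, assuming SQLAlchemy 1.4 or later. The table, its column names, and the four rows are invented for the example and do not come from `super_hero_powers.csv`, so the numbers say nothing about the real answers.

```python
from sqlalchemy import (
    Boolean, Column, MetaData, String, Table,
    create_engine, func, insert, select,
)

metadata = MetaData()
# Hypothetical table layout with two Boolean power columns
powers = Table(
    "powers", metadata,
    Column("hero_names", String, primary_key=True),
    Column("super_strength", Boolean),
    Column("super_speed", Boolean),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

# Invented rows used only to exercise the queries
with engine.begin() as conn:
    conn.execute(insert(powers), [
        {"hero_names": "Hero A", "super_strength": True, "super_speed": True},
        {"hero_names": "Hero B", "super_strength": True, "super_speed": False},
        {"hero_names": "Hero C", "super_strength": False, "super_speed": False},
        {"hero_names": "Hero D", "super_strength": True, "super_speed": True},
    ])

# count + where: rows where both Boolean columns are true
both_query = (
    select(func.count())
    .select_from(powers)
    .where(powers.c.super_strength.is_(True), powers.c.super_speed.is_(True))
)

# group_by + count: number of rows for each value of super_speed
grouped_query = (
    select(powers.c.super_speed, func.count())
    .group_by(powers.c.super_speed)
)

with engine.connect() as conn:
    n_both = conn.execute(both_query).scalar_one()
    speed_counts = {flag: n for flag, n in conn.execute(grouped_query)}
```

Name-based questions follow the same pattern with a string condition such as `powers.c.hero_names.like(...)` inside `where`.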

## Problem 5

Tell me another cool fact about the super powers.