<img src="https://github.com/christopherhuntley/DATA6510/blob/master/img/Dolan.png?raw=true" width="180px" align="right">

# **DATA 6510**
# **Homework 4: The NBA PlayLog DB Challenge** 
_From ERD to Live Database._


In this assignment you will build the NBA PlayLog database from nothing but the source data and an ERD. The source data covers two games, with some of the same players appearing in both, to make sure the design can handle a full season of games.

## **1. Study the ERD.**

![NBA PlayLog ERD](https://github.com/christopherhuntley/DATA6510/raw/master/img/L9_NBA_PlayLog_ERD.png)





## **2. Create a fresh NBA PlayLog DB.**





In [1]:
# download the source data from GitHub
!wget https://raw.githubusercontent.com/christopherhuntley/DATA6510/master/data/NBA/PlayLog21900001-NOP%40TOR.csv
!wget https://raw.githubusercontent.com/christopherhuntley/DATA6510/master/data/NBA/PlayLog21900017-TOR%40BOS.csv

--2021-12-03 01:35:23--  https://raw.githubusercontent.com/christopherhuntley/DATA6510/master/data/NBA/PlayLog21900001-NOP%40TOR.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 167211 (163K) [text/plain]
Saving to: ‘PlayLog21900001-NOP@TOR.csv’


2021-12-03 01:35:23 (11.6 MB/s) - ‘PlayLog21900001-NOP@TOR.csv’ saved [167211/167211]

--2021-12-03 01:35:23--  https://raw.githubusercontent.com/christopherhuntley/DATA6510/master/data/NBA/PlayLog21900017-TOR%40BOS.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 143318 (140K) [te

In [2]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create the DATA6510/data/NBA folder in Google Drive
from pathlib import Path

data_root = Path("./drive/My Drive/Colab Notebooks/DATA6510")
if not data_root.exists():
  print(
      '''
      Warning! The folder '/Colab Notebooks/DATA6510' could not be found in the connected Google Drive. 
      Please make 100% sure that both Colab and Chrome are set up to use your @student.fairfield.edu account. 
      For now, a new folder with the correct path has been created in whatever Google Drive it found. 
      ''')
data_root = data_root / 'data' / 'NBA'
data_root.mkdir(parents=True, exist_ok=True)



Mounted at /content/drive


In [3]:
%%bash
# create (or refresh) the symlink
rm -rf data6510
ln -s drive/My\ Drive/Colab\ Notebooks/DATA6510 data6510

# delete old copy of the database file
rm -rf data6510/data/NBA/NBA_PlayLog.db

In [4]:
# Load %%sql magic
%load_ext sql

# Standard Imports
import sqlite3
import pandas as pd

# Database connection
%sql sqlite:///data6510/data/NBA/NBA_PlayLog.db

'Connected: @data6510/data/NBA/NBA_PlayLog.db'

In [5]:
# Load the data from csv files
# ONLY RUN THIS ONCE TO AVOID DATA DUPLICATION

# data_conf configures the rest so we can easily add files
data_conf = [('PlayLog_import','PlayLog21900001-NOP@TOR.csv'),('PlayLog_import','PlayLog21900017-TOR@BOS.csv')]

# connect to the database (kept in Google Drive)
conn = sqlite3.connect('data6510/data/NBA/NBA_PlayLog.db') 
with conn:

  # handles each CSV file we have configured at top
  for tbl,fname in data_conf:

    # tbl is the import table in the database; fname is the CSV file name
    print(tbl,fname)

    # Load the CSV file into a DataFrame
    df = pd.read_csv(fname)

    # determine home and away from the fname
    # (NBA convention is AWAY@HOME, so 'NOP@TOR' means NOP playing at TOR)
    df['home_team'] = fname.split('@')[1][0:3]
    df['away_team'] = fname.split('@')[0][-3:]

    # Load into the import table
    df.to_sql(tbl,conn,if_exists='append',index=False)

PlayLog_import PlayLog21900001-NOP@TOR.csv
PlayLog_import PlayLog21900017-TOR@BOS.csv


In [6]:
%%sql
-- A quick check to make sure the data loaded
SELECT distinct home_team,away_team 
FROM PlayLog_import 
LIMIT 2;

 * sqlite:///data6510/data/NBA/NBA_PlayLog.db
Done.


home_team,away_team
TOR,NOP
BOS,TOR


## **3. Write and debug SQL DDL** 
Write SQL DDL for your tables. 
- Remember, we're using SQLite, not MySQL, so use SQLite's type names (`INTEGER`, `REAL`, `TEXT`, ...). 
- Name each table in the plural form, using lowercase letters and underscores.
- Create the tables in the right order (referenced tables before the tables that reference them); otherwise the FK constraints won't work.
- Put a `DROP TABLE IF EXISTS ...` statement before each `CREATE TABLE ...` statement to clear out any old copy of the table. 
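A minimal sketch of the drop-then-create pattern, run through Python's `sqlite3` module against a throwaway in-memory database. The two tables (`teams`, `games`) and their columns are hypothetical stand-ins, not the tables in the ERD. The point is the ordering: drop child tables before their parents, create parents before their children, and switch on `PRAGMA foreign_keys`, because SQLite does not enforce FK constraints unless that pragma is enabled on the connection.

```python
import sqlite3

# Hypothetical two-table schema (NOT the ERD's actual tables), used only to
# illustrate DROP/CREATE ordering and SQLite type names.
DDL = """
DROP TABLE IF EXISTS games;   -- child table dropped first
DROP TABLE IF EXISTS teams;   -- then its parent

CREATE TABLE teams (
    team_id   INTEGER PRIMARY KEY,
    team_abbr TEXT NOT NULL UNIQUE
);

CREATE TABLE games (
    game_id      INTEGER PRIMARY KEY,
    home_team_id INTEGER NOT NULL REFERENCES teams (team_id),
    away_team_id INTEGER NOT NULL REFERENCES teams (team_id)
);
"""


def build_schema(conn: sqlite3.Connection) -> None:
    """Run the DDL script; FK enforcement must be switched on per connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)


conn = sqlite3.connect(":memory:")
build_schema(conn)
build_schema(conn)  # safe to rerun because of DROP TABLE IF EXISTS

conn.execute("INSERT INTO teams (team_abbr) VALUES ('TOR'), ('NOP')")
try:
    # team_id 99 does not exist, so the FK constraint rejects this row
    conn.execute(
        "INSERT INTO games (game_id, home_team_id, away_team_id) VALUES (1, 1, 99)"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(fk_enforced)
```

In the notebook itself the same DDL would normally go in a `%%sql` cell; the Python wrapper here only makes the example self-contained.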


## **4. Populate tables from `PlayLog_import`.**
Remember that the order matters here too: populate each referenced (parent) table before any table whose FKs point to it. 
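One way to fill a lookup table from the import table is `INSERT INTO ... SELECT`. A minimal sketch in an in-memory database: the three-row `PlayLog_import` stand-in and the `teams` table are hypothetical, though `home_team` and `away_team` are the columns the loading cell above adds from the file names. `UNION` (unlike `UNION ALL`) removes duplicates, so each team is inserted once even when it appears as both home and away team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical stand-in for PlayLog_import: only the two columns the loading
# cell derives from the file name; the real table has many more columns.
conn.execute("CREATE TABLE PlayLog_import (home_team TEXT, away_team TEXT)")
conn.executemany(
    "INSERT INTO PlayLog_import VALUES (?, ?)",
    [("TOR", "NOP"), ("TOR", "NOP"), ("BOS", "TOR")],
)

# Hypothetical parent table; fill it before any table that references it.
conn.execute("""
CREATE TABLE teams (
    team_id   INTEGER PRIMARY KEY,
    team_abbr TEXT NOT NULL UNIQUE
)""")

# UNION de-duplicates across both columns, so every team appears exactly once.
conn.execute("""
INSERT INTO teams (team_abbr)
SELECT home_team FROM PlayLog_import
UNION
SELECT away_team FROM PlayLog_import
""")
conn.commit()

teams = [row[0] for row in conn.execute("SELECT team_abbr FROM teams ORDER BY team_abbr")]
print(teams)  # ['BOS', 'NOP', 'TOR']
```

Child tables (games, plays, and so on) would then look up the generated `team_id` values with a join back to `teams`.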

## **5. Test with `SELECT` queries.**
These are up to you. However, you should at least be able to recreate the results from Homework 2A _without_ using the `PlayLog_import` table. 
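Besides recreating the Homework 2A results, a cheap sanity check is to confirm that no rows were lost between `PlayLog_import` and the normalized tables. A sketch, assuming a hypothetical `plays` table that holds one row per row of the import table (the `play` and `description` columns are made up too; substitute your own names). Run a check like this before dropping the import table.

```python
import sqlite3


def counts_match(conn: sqlite3.Connection, source: str, target: str) -> bool:
    """Compare the row counts of two tables (table names are trusted, not user input)."""
    (n_source,) = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()
    (n_target,) = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()
    print(f"{source}: {n_source} rows, {target}: {n_target} rows")
    return n_source == n_target


# Tiny in-memory demo with hypothetical tables and columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PlayLog_import (home_team TEXT, away_team TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO PlayLog_import VALUES (?, ?, ?)",
    [("TOR", "NOP", "jump ball"), ("TOR", "NOP", "shot")],
)
conn.execute("CREATE TABLE plays (play_id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO plays (description) SELECT play FROM PlayLog_import")

print(counts_match(conn, "PlayLog_import", "plays"))  # True
```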

## **6. Drop the `PlayLog_import` table.**

Then run the cell below. `VACUUM` makes SQLite rebuild the database file, reclaiming the space left behind by the dropped table and minimizing the file size. 

In [None]:
%%sql
vacuum;
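The effect of `VACUUM` is easy to see on disk. A minimal sketch against a throwaway file in a temporary directory (the file name and filler data are made up): with SQLite's default `auto_vacuum = NONE`, dropping a table only moves its pages to the free list, and the file shrinks only after `VACUUM` rebuilds it.

```python
import os
import sqlite3
import tempfile

# Throwaway database file; the name and the filler data are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "vacuum_demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE PlayLog_import (payload TEXT)")
conn.executemany(
    "INSERT INTO PlayLog_import VALUES (?)",
    [("x" * 1000,) for _ in range(2000)],
)
conn.commit()
size_loaded = os.path.getsize(path)

conn.execute("DROP TABLE PlayLog_import")
conn.commit()
size_dropped = os.path.getsize(path)   # freed pages stay inside the file...

conn.execute("VACUUM")                 # ...until VACUUM rebuilds it
size_vacuumed = os.path.getsize(path)
conn.close()

print(size_loaded, size_dropped, size_vacuumed)
```

`VACUUM` cannot run inside an open transaction, which is why the sketch commits before calling it.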

## **7. Do it all again, from the top.**
From the Colab menus select `Runtime` $\rightarrow$ `Restart and run all`. If you get an error, then debug it. 

## **Turn it in!**