# Creating an SQLite database

In this notebook, we will
- import the New York Times COVID-19 us-counties dataset
- store the data in an SQLite database using pandas

In [1]:
# Import packages
import pandas as pd
import sqlite3

Below we import the county-level cases dataset from the New York Times GitHub repository.

In [79]:
df = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')
df.head()

,date,county,state,fips,cases,deaths
0,2020-01-21,Snohomish,Washington,53061.0,1,0.0
1,2020-01-22,Snohomish,Washington,53061.0,1,0.0
2,2020-01-23,Snohomish,Washington,53061.0,1,0.0
3,2020-01-24,Cook,Illinois,17031.0,1,0.0
4,2020-01-24,Snohomish,Washington,53061.0,1,0.0


In [83]:
def create_location_table(cursor):
    sql = '''  CREATE TABLE IF NOT EXISTS location (
               fips INTEGER PRIMARY KEY,
               county TEXT,
               state TEXT
        );'''
    
    cursor.execute(sql)
    print('location table created successfully....')


def create_cases_table(cursor):
    sql = '''CREATE TABLE IF NOT EXISTS cases (
             cases_id INTEGER PRIMARY KEY AUTOINCREMENT,
             fips INTEGER,
             date TEXT,
             cases INTEGER,
             deaths INTEGER,
            FOREIGN KEY (fips) 
                REFERENCES location (fips)
         )'''
    
    cursor.execute(sql)
    print('cases table created successfully....')
    
def create_tables(db_path):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    # SQLite does not enforce foreign keys by default; the setting applies to this connection only.
    cursor.execute('''PRAGMA foreign_keys = ON;''')
    create_location_table(cursor)
    create_cases_table(cursor)
    conn.commit()
    conn.close()

In [84]:
create_tables('county-cases.db')

location table created successfully....
cases table created successfully....
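
The printed messages only confirm that the `CREATE TABLE` statements ran. A minimal sketch of checking which tables actually exist is shown below; it queries SQLite's `sqlite_master` catalog. The helper `list_tables` and the in-memory database `demo_conn` are hypothetical names used for illustration and are not part of the notebook.

```python
import sqlite3


def list_tables(conn):
    """Return the names of user-defined tables, skipping SQLite's internal ones."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Illustration on an in-memory database that uses the same schema as above.
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('''CREATE TABLE IF NOT EXISTS location (
                         fips INTEGER PRIMARY KEY,
                         county TEXT,
                         state TEXT)''')
demo_conn.execute('''CREATE TABLE IF NOT EXISTS cases (
                         cases_id INTEGER PRIMARY KEY AUTOINCREMENT,
                         fips INTEGER,
                         date TEXT,
                         cases INTEGER,
                         deaths INTEGER,
                         FOREIGN KEY (fips) REFERENCES location (fips))''')

tables = list_tables(demo_conn)
print(tables)
```

The same helper could be pointed at `sqlite3.connect('county-cases.db')` to inspect the file created above.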


## Insert records using pandas

Open a connection to the database.

In [116]:
conn = sqlite3.connect('county-cases.db')
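
Note that `create_tables` ran `PRAGMA foreign_keys = ON` on its own connection, which was then closed. In SQLite this setting applies per connection, so a newly opened connection does not enforce the `FOREIGN KEY` constraint unless the pragma is issued again. The sketch below illustrates the effect on an in-memory database; `demo_conn` and the fips value `99999` are made up for the example.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
# Enable enforcement before any inserts; the pragma has no effect inside an open transaction.
demo_conn.execute('PRAGMA foreign_keys = ON')
demo_conn.execute('''CREATE TABLE location (
                         fips INTEGER PRIMARY KEY,
                         county TEXT,
                         state TEXT)''')
demo_conn.execute('''CREATE TABLE cases (
                         cases_id INTEGER PRIMARY KEY AUTOINCREMENT,
                         fips INTEGER,
                         date TEXT,
                         cases INTEGER,
                         deaths INTEGER,
                         FOREIGN KEY (fips) REFERENCES location (fips))''')
demo_conn.execute("INSERT INTO location VALUES (53061, 'Snohomish', 'Washington')")

# Reading the pragma back returns 1 when enforcement is active.
fk_enabled = demo_conn.execute('PRAGMA foreign_keys').fetchone()[0]

# A case row that points at a fips code missing from location should be rejected.
try:
    demo_conn.execute(
        "INSERT INTO cases (fips, date, cases, deaths) VALUES (99999, '2020-01-21', 1, 0)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(fk_enabled, rejected)
```

Whether to enable enforcement for the bulk load in this notebook is a design choice: with it on, `location` must be populated before `cases`, and any case row whose `fips` is absent from `location` would fail to insert.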

Format the DataFrame so that it matches the schema of the SQL table.

In [117]:
location = df[['fips', 'county', 'state']]\
.drop_duplicates(subset=['fips'])\
.set_index('fips')
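
The `fips` values print as floats (`53061.0`) in the preview above, which in pandas usually means the column contains missing values. If that is the case here, `drop_duplicates` keeps one row with a missing `fips`, and SQLite assigns an automatically generated integer to a `NULL` inserted into an `INTEGER PRIMARY KEY` column. One possible way to avoid that is sketched below on a small hand-made frame; the `'Unknown'` row is hypothetical sample data, and dropping such rows is an assumption about what is wanted rather than part of the original workflow.

```python
import pandas as pd

# Hand-made sample; the last row imitates a record without a fips code.
sample = pd.DataFrame({
    'county': ['Snohomish', 'Cook', 'Snohomish', 'Unknown'],
    'state': ['Washington', 'Illinois', 'Washington', 'Rhode Island'],
    'fips': [53061.0, 17031.0, 53061.0, None],
})

location_clean = (
    sample[['fips', 'county', 'state']]
    .dropna(subset=['fips'])           # remove rows without a fips code
    .drop_duplicates(subset=['fips'])  # one row per county
    .astype({'fips': 'int64'})         # safe once missing values are gone
    .set_index('fips')
)

print(location_clean)
```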

Insert records into the database using `.to_sql` with `if_exists` set to `'append'`.

In [None]:
location.to_sql('location', conn, if_exists='append')
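
To illustrate what `if_exists='append'` does, here is a small self-contained sketch on an in-memory database (the names `demo_conn` and `demo_location` are made up for the example). The DataFrame index is written as the `fips` column, and the existing table definition, including its primary key, is left untouched. By contrast, `if_exists='replace'` drops the table first and lets pandas recreate it with an inferred schema.

```python
import sqlite3

import pandas as pd

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('''CREATE TABLE location (
                         fips INTEGER PRIMARY KEY,
                         county TEXT,
                         state TEXT)''')

demo_location = pd.DataFrame({
    'fips': [53061, 17031],
    'county': ['Snohomish', 'Cook'],
    'state': ['Washington', 'Illinois'],
}).set_index('fips')

# The index name ('fips') becomes the column name in the INSERT statement.
demo_location.to_sql('location', demo_conn, if_exists='append')

stored = demo_conn.execute(
    'SELECT fips, county, state FROM location ORDER BY fips'
).fetchall()
schema = demo_conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'location'"
).fetchone()[0]

print(stored)
print(schema)
```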

Repeat the same process for the `cases` table, selecting only the columns that table needs.

In [87]:
# Use the DataFrame's row index as the cases_id primary key.
cases = (
    df.reset_index()
    .rename({'index': 'cases_id'}, axis=1)
    .set_index('cases_id')[['fips', 'date', 'cases', 'deaths']]
)

cases.to_sql('cases', conn, if_exists='append')

Commit the changes to the database.

In [119]:
conn.commit()
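
As an alternative to calling `commit()` by hand, a `sqlite3` connection can be used as a context manager: the block commits if it finishes normally and rolls back if an exception is raised (it does not close the connection). The sketch below shows both outcomes on an in-memory database; `demo_conn` and the duplicate row are made up for the example.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('''CREATE TABLE location (
                         fips INTEGER PRIMARY KEY,
                         county TEXT,
                         state TEXT)''')

# Committed automatically when the block exits without an error.
with demo_conn:
    demo_conn.execute("INSERT INTO location VALUES (53061, 'Snohomish', 'Washington')")

# The second insert violates the primary key, so the whole block is rolled back.
try:
    with demo_conn:
        demo_conn.execute("INSERT INTO location VALUES (17031, 'Cook', 'Illinois')")
        demo_conn.execute("INSERT INTO location VALUES (53061, 'Duplicate', 'Row')")
except sqlite3.IntegrityError:
    pass

stored_fips = [
    fips for (fips,) in demo_conn.execute('SELECT fips FROM location ORDER BY fips')
]
print(stored_fips)
```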

Now we can read data back from the database.

In [121]:
len(pd.read_sql('''SELECT * FROM cases''', conn))

813417
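
Loading every row just to count them works, but the count can also be computed inside SQLite so that only a single value is returned. A minimal sketch, using an in-memory table with three sample rows (`demo_conn` is a made-up name):

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('''CREATE TABLE cases (
                         cases_id INTEGER PRIMARY KEY AUTOINCREMENT,
                         fips INTEGER,
                         date TEXT,
                         cases INTEGER,
                         deaths INTEGER)''')
demo_conn.executemany(
    'INSERT INTO cases (fips, date, cases, deaths) VALUES (?, ?, ?, ?)',
    [
        (53061, '2020-01-21', 1, 0),
        (53061, '2020-01-22', 1, 0),
        (17031, '2020-01-24', 1, 0),
    ],
)

# COUNT(*) is evaluated by SQLite; only one row with one column comes back.
row_count = demo_conn.execute('SELECT COUNT(*) FROM cases').fetchone()[0]
print(row_count)
```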

And we can see that the number of records matches between the initial dataset and the database.

In [122]:
len(df)

813417
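
With the data split across two tables, county and state names can be recovered by joining `cases` to `location` on `fips`. The sketch below shows one way to do this with `pd.read_sql` and a bound parameter; it runs against a small in-memory copy of the schema, and the names `demo_conn`, `query`, and `washington` are made up for the example.

```python
import sqlite3

import pandas as pd

demo_conn = sqlite3.connect(':memory:')
demo_conn.executescript('''
    CREATE TABLE location (
        fips INTEGER PRIMARY KEY,
        county TEXT,
        state TEXT
    );
    CREATE TABLE cases (
        cases_id INTEGER PRIMARY KEY AUTOINCREMENT,
        fips INTEGER,
        date TEXT,
        cases INTEGER,
        deaths INTEGER,
        FOREIGN KEY (fips) REFERENCES location (fips)
    );
    INSERT INTO location VALUES
        (53061, 'Snohomish', 'Washington'),
        (17031, 'Cook', 'Illinois');
    INSERT INTO cases (fips, date, cases, deaths) VALUES
        (53061, '2020-01-21', 1, 0),
        (53061, '2020-01-22', 1, 0),
        (17031, '2020-01-24', 1, 0);
''')

# The ? placeholder is filled from params, which avoids building SQL by string formatting.
query = '''
    SELECT c.date, l.county, l.state, c.cases, c.deaths
    FROM cases AS c
    JOIN location AS l ON l.fips = c.fips
    WHERE l.state = ?
    ORDER BY c.date
'''
washington = pd.read_sql(query, demo_conn, params=('Washington',))
print(washington)
```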