# Final Project Part I


## Source SQLite Database

* Dataset URL: **/dsa/data/DSA-7030/cc0122dbv2.sqlite.db**
* Data Dictionary: [pdf](./ChicagoData-Description.pdf)
* [Chicago Crimes 2001-Present Dashboard](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g)

This SQLite database consists of a set of normalized relations populated with publicly available Chicago crime data for the years 2001 to 2022.

## Database exploration

The cells below provide SQL DML statements for examining the metadata in the SQLite database that describes its tables, columns, and relationships. An initial connection and subsequent SQL statements are provided for acquiring the information needed to reconstruct the table and relational structure in your postgres SSO database.

In [1]:
#Load extension and connect to database
%load_ext sql
%sql sqlite:////dsa/data/DSA-7030/cc0122dbv2.sqlite.db

'Connected: @/dsa/data/DSA-7030/cc0122dbv2.sqlite.db'

## Explore the SQLite Tables List

This query simply lists the names of the database tables.

In [2]:
%%sql
SELECT distinct m.type, m.tbl_name --m.sql
FROM sqlite_master AS m,
     pragma_table_info(m.name) AS t
WHERE m.type = 'table'
order by m.name, t.pk DESC

 * sqlite:////dsa/data/DSA-7030/cc0122dbv2.sqlite.db
Done.


type,tbl_name
table,cc_case_location
table,cc_cases
table,cc_iucr_codes
table,cc_iucr_codes_primary_descriptions
table,cc_iucr_codes_secondary_descriptions
table,cc_nibrs_categories
table,cc_nibrs_crimes_against
table,cc_nibrs_fbicode_offenses
table,cc_nibrs_offenses_crimes_aginst


## Explore Column Details

The query below provides the complete list of tables and their columns with important details.

* **tbl_name** = Name of the table
* **name** = column name
* **type** = declared data type
* **notnull** = indicates column declared as NOT NULL
* **pk** = position of the column within the primary key (0 if the column is not part of the primary key)

In [3]:
%%sql 
SELECT m.tbl_name, t.* --m.sql
 FROM pragma_table_info(m.tbl_name) t, sqlite_master m WHERE m.type='table';

 * sqlite:////dsa/data/DSA-7030/cc0122dbv2.sqlite.db
Done.


tbl_name,cid,name,type,notnull,dflt_value,pk
cc_case_location,0,case_number,varchar(20),0,,1
cc_case_location,1,block,varchar(100),0,,0
cc_case_location,2,location_description,varchar(100),0,,0
cc_case_location,3,community_area,integer,0,,0
cc_case_location,4,ward,integer,0,,0
cc_case_location,5,district,integer,0,,0
cc_case_location,6,beat,integer,0,,0
cc_case_location,7,latitude,real,0,,0
cc_case_location,8,longitude,real,0,,0
cc_iucr_codes,0,iucr_code,varchar(10),0,,1
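
As a rough illustration of the row shape that `pragma_table_info` returns, the sketch below builds a throwaway in-memory SQLite database and reads the column metadata with Python's standard `sqlite3` module. The table `demo_location` and its two columns are hypothetical and exist only for this example; they are not part of the course database.

```python
import sqlite3

# Throwaway in-memory database; demo_location is a hypothetical table
# used only to show the shape of the metadata rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_location ("
    " case_number varchar(20) PRIMARY KEY,"
    " ward integer NOT NULL)"
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(demo_location)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(cid, name, col_type, "notnull=%d" % notnull, "pk=%d" % pk)

conn.close()
```

Reading the rows into a Python list like this can make it easier to turn the metadata into column definitions later on.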


## Below query provides the list of columns that are declared "unique" for referential integrity enforcement.

<u>Query Output Descriptions</u>
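* **name** = the index name; the table name begins at the "cc_" -- for sqlite_autoindex_cc_case_location_1, cc_case_location is the table name.
* **unique** = indicates the index enforces uniqueness on its column(s)
* **origin** = indicates how the index was created; "pk" means it comes from a primary key declaration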
* **name_1** = column name

In [4]:
%%sql 
select il.*,ii.* --,m.sql
    from sqlite_master m, 
    pragma_index_list( m.name ) as il,
    pragma_index_info(il.name) as ii

 * sqlite:////dsa/data/DSA-7030/cc0122dbv2.sqlite.db
Done.


seq,name,unique,origin,partial,seqno,cid,name_1
0,sqlite_autoindex_cc_case_location_1,1,pk,0,0,0,case_number
0,sqlite_autoindex_cc_iucr_codes_1,1,pk,0,0,0,iucr_code
0,sqlite_autoindex_cc_iucr_codes_primary_descriptions_1,1,pk,0,0,0,iucr_code
0,sqlite_autoindex_cc_iucr_codes_secondary_descriptions_1,1,pk,0,0,0,iucr_code
0,sqlite_autoindex_cc_nibrs_categories_1,1,pk,0,0,0,nibrs_offense_code
0,sqlite_autoindex_cc_nibrs_crimes_against_1,1,pk,0,0,0,nibrs_crime_against
0,sqlite_autoindex_cc_nibrs_fbicode_offenses_1,1,pk,0,0,0,nibrs_offense_code
0,sqlite_autoindex_cc_nibrs_offenses_crimes_aginst_1,1,pk,0,0,0,nibrs_crime_against
0,sqlite_autoindex_cc_nibrs_offenses_crimes_aginst_1,1,pk,0,1,1,nibrs_offense_code
0,sqlite_autoindex_cc_cases_1,1,pk,0,0,0,case_number
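
An index that appears on more than one output row, as `sqlite_autoindex_cc_nibrs_offenses_crimes_aginst_1` does above, covers a multi-column key. A minimal sketch of grouping the index columns by index name follows, using the standard `sqlite3` module against a hypothetical in-memory table `demo_pairs` (an assumption for illustration, not a table in the course database).

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# demo_pairs is a hypothetical table with a two-column primary key.
conn.execute(
    "CREATE TABLE demo_pairs ("
    " crime_against varchar(20),"
    " offense_code varchar(20),"
    " PRIMARY KEY (crime_against, offense_code))"
)

key_columns = defaultdict(list)
# index_list rows: (seq, name, unique, origin, partial)
index_rows = conn.execute("PRAGMA index_list(demo_pairs)").fetchall()
for _seq, index_name, _unique, origin, _partial in index_rows:
    # index_info rows: (seqno, cid, name), one row per indexed column
    info_rows = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
    for _seqno, _cid, column in info_rows:
        key_columns[(index_name, origin)].append(column)

for (index_name, origin), cols in key_columns.items():
    print(index_name, origin, cols)

conn.close()
```

Any key that collects more than one column here would need a composite `PRIMARY KEY (col_a, col_b)` clause in the destination table.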


## Explore Relationship Details (get foreign key references)

The query below extracts the details describing the foreign key references between tables.

* **from_table** = the name of the referencing (child) table, which declares the foreign key
* **from_column** = the name of the foreign key column in the referencing table
* **to_table** = the name of the referenced (parent) table
* **to_column** = the name of the referenced column in the parent table

These metadata can be translated to the necessary SQL statement to establish a relationship between tables:

```SQL
FOREIGN KEY (<from_column>) REFERENCES <to_table>(<to_column>)
```

In [5]:
%%sql
SELECT 
    m.name as from_table, f."from" as from_column, f."table" as to_table, f."to" as to_column --, m.sql
FROM
    sqlite_master m
    JOIN pragma_foreign_key_list(m.name) f ON m.name != f."table"
WHERE m.type = 'table'
ORDER BY m.name
;

 * sqlite:////dsa/data/DSA-7030/cc0122dbv2.sqlite.db
Done.


from_table,from_column,to_table,to_column
cc_case_location,case_number,cc_cases,case_number
cc_cases,iucr_code,cc_iucr_codes,iucr_code
cc_iucr_codes,iucr_code,cc_cases,iucr_code
cc_iucr_codes_primary_descriptions,iucr_code,cc_iucr_codes,iucr_code
cc_iucr_codes_secondary_descriptions,iucr_code,cc_iucr_codes,iucr_code
cc_nibrs_fbicode_offenses,nibrs_offense_code,cc_cases,nibrs_fbi_offense_code
cc_nibrs_fbicode_offenses,nibrs_offense_code,cc_nibrs_categories,nibrs_offense_code
cc_nibrs_offenses_crimes_aginst,nibrs_offense_code,cc_nibrs_fbicode_offenses,nibrs_offense_code
cc_nibrs_offenses_crimes_aginst,nibrs_crime_against,cc_nibrs_crimes_against,nibrs_crime_against
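
The translation from a metadata row to a `FOREIGN KEY` clause can be sketched in a few lines of Python. The two rows below are copied from the query output above; the helper name `foreign_key_clause` and the `SSO` schema placeholder are assumptions made for this example.

```python
# Rows copied from the output above:
# (from_table, from_column, to_table, to_column)
fk_rows = [
    ("cc_case_location", "case_number", "cc_cases", "case_number"),
    ("cc_cases", "iucr_code", "cc_iucr_codes", "iucr_code"),
]


def foreign_key_clause(row, schema="SSO"):
    """Render one metadata row as a FOREIGN KEY table constraint."""
    _from_table, from_column, to_table, to_column = row
    return (
        f"FOREIGN KEY ({from_column}) "
        f"REFERENCES {schema}.{to_table}({to_column})"
    )


for row in fk_rows:
    # The clause belongs inside the CREATE TABLE of the from_table.
    print(f"-- inside CREATE TABLE {row[0]}")
    print(foreign_key_clause(row))
```

Replace `SSO` with your own schema name when adapting the output.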


## Using the metadata from above:

## Implement the required CREATE TABLE statements for establishing the Chicago Crime Database in your SSO dsa_student database.  

The SQL statement takes this form:

```SQL
CREATE TABLE SSO.tbl_name (
 column_name_1 data_type <unique, not null>,
 column_name_N data_type <unique, not null>,
 PRIMARY KEY (<column_name>),
 <FOREIGN KEY (from_column_name) REFERENCES SSO.to_table_name(to_column_name)>
 );
```

**The database tables and column names created in your SSO postgres server dsa_student database should be named exactly as they appear in the ```cc0122dbv2.sqlite.db``` SQLite database.**

Use as many cells as desired.
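
One possible way to assemble a statement of this form from the metadata is sketched below. The column tuples are a three-column subset of the `cc_case_location` rows shown in the column details output, and the foreign key comes from the relationship output; the function `build_create_table` is a hypothetical helper written for this example, and `SSO` stands in for your schema name.

```python
def build_create_table(schema, table, columns, foreign_keys=()):
    """Build a CREATE TABLE statement.

    columns: (cid, name, type, notnull, dflt_value, pk) tuples,
             the row shape returned by pragma_table_info.
    foreign_keys: (from_column, to_table, to_column) tuples.
    """
    lines = []
    for _cid, name, col_type, notnull, _default, _pk in columns:
        lines.append(f"{name} {col_type}" + (" NOT NULL" if notnull else ""))
    # pk holds the 1-based position inside the primary key (0 = not a key column).
    pk_cols = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0]
    if pk_cols:
        lines.append(f"PRIMARY KEY ({', '.join(pk_cols)})")
    for from_col, to_table, to_col in foreign_keys:
        lines.append(
            f"FOREIGN KEY ({from_col}) REFERENCES {schema}.{to_table}({to_col})"
        )
    body = ",\n    ".join(lines)
    return f"CREATE TABLE {schema}.{table} (\n    {body}\n);"


# Subset of the cc_case_location metadata rows shown earlier.
columns = [
    (0, "case_number", "varchar(20)", 0, None, 1),
    (1, "block", "varchar(100)", 0, None, 0),
    (7, "latitude", "real", 0, None, 0),
]
ddl = build_create_table(
    "SSO", "cc_case_location", columns,
    foreign_keys=[("case_number", "cc_cases", "case_number")],
)
print(ddl)
```

The generated text is only a starting point; review each statement before running it against the postgres server.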

# Connect to your SSO database using a SQLAlchemy connection and implement your database structure

In [8]:
#implement tables in SSO database
import psycopg2
import sqlalchemy
import getpass

user = "jwj8c8"
host = "pgsql.dsa.lan"
database = "dsa_student"
password = getpass.getpass()
connectionstring = "postgresql://" + user + ":" + password + "@" + host + "/" + database

engine = sqlalchemy.create_engine(connectionstring)

connection = None

try:
    connection = engine.connect()
except Exception as err:
    print("An error has occurred trying to connect: {}".format(err))
    
del password

········
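
A connection string built by plain concatenation, as above, can be misparsed when the password contains characters such as `@`, `/`, or `:`. One common remedy is to percent-encode the credentials with the standard library before building the URL. The sketch below assumes a hypothetical helper `build_postgres_url` and made-up credentials; only the host and database names come from the cell above.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, database):
    """Percent-encode the credentials so '@' or '/' cannot break URL parsing."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


# Hypothetical credentials, for illustration only.
url = build_postgres_url("student", "p@ss/word", "pgsql.dsa.lan", "dsa_student")
print(url)
```

The resulting string can be passed to `sqlalchemy.create_engine` in place of the concatenated one.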


In [20]:
##Deletes Tables
with engine.connect() as cursor:
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_case_location CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_iucr_codes_primary_descriptions CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_iucr_codes_secondary_descriptions CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_nibrs_crimes_against CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_nibrs_categories CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_iucr_codes CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_nibrs_fbicode_offenses CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_nibrs_offenses_crimes_aginst CASCADE")
    cursor.execute("DROP TABLE IF EXISTS jwj8c8.cc_cases CASCADE")

In [22]:
with engine.connect() as connection:
    connection.execute('''CREATE TABLE jwj8c8.cc_iucr_codes (
    iucr_code varchar(10) not null unique,
    iucr_index_code char,
    PRIMARY KEY (iucr_code)    
);''')

In [23]:
with engine.connect() as connection:
    connection.execute("""CREATE TABLE jwj8c8.cc_iucr_codes_primary_descriptions (
    iucr_code varchar(10) not null unique,
    iucr_primary_desc varchar(100),  
    PRIMARY KEY (iucr_code),
    FOREIGN KEY (iucr_code) REFERENCES jwj8c8.cc_iucr_codes(iucr_code)
);""")

In [24]:
with engine.connect() as connection:
    connection.execute('''
CREATE TABLE jwj8c8.cc_iucr_codes_secondary_descriptions (
    iucr_code varchar(10) not null unique,
    iucr_secondary_desc varchar(100),
    PRIMARY KEY (iucr_code),
    FOREIGN KEY (iucr_code) REFERENCES jwj8c8.cc_iucr_codes(iucr_code)
);''')

In [25]:
with engine.connect() as connection:
    connection.execute('''CREATE TABLE jwj8c8.cc_nibrs_crimes_against (
    nibrs_crime_against varchar(20) not null unique,   
    PRIMARY KEY (nibrs_crime_against)
);''')

In [26]:
with engine.connect() as connection:
    connection.execute('''CREATE TABLE jwj8c8.cc_nibrs_categories (
    nibrs_offense_code varchar(10) not null unique,
    nibrs_offense_category_name varchar(50), 
    PRIMARY KEY (nibrs_offense_code)
);''')

In [27]:
with engine.connect() as connection:
    connection.execute('''CREATE TABLE jwj8c8.cc_nibrs_fbicode_offenses (
    nibrs_offense_code varchar(20) not null unique,
    nibrs_offense_name varchar(100) not null,
    PRIMARY KEY (nibrs_offense_code),
    FOREIGN KEY (nibrs_offense_code) REFERENCES jwj8c8.cc_nibrs_categories (nibrs_offense_code)
);''')

In [28]:
with engine.connect() as connection:
    connection.execute('''CREATE TABLE jwj8c8.cc_nibrs_offenses_crimes_aginst (
    nibrs_crime_against varchar(20),
    nibrs_offense_code varchar(20),
    PRIMARY KEY (nibrs_crime_against,nibrs_offense_code),
    FOREIGN KEY(nibrs_offense_code) REFERENCES jwj8c8.cc_nibrs_fbicode_offenses(nibrs_offense_code),
    FOREIGN KEY (nibrs_crime_against) REFERENCES jwj8c8.cc_nibrs_crimes_against(nibrs_crime_against)
);''')

In [29]:
with engine.connect() as connection:
    connection.execute('''CREATE TABLE jwj8c8.cc_cases (
    case_number varchar(20) not null unique,
    incident_date timestamp,
    iucr_code varchar(10),
    nibrs_fbi_offense_code varchar(10),
    arrest boolean,
    domestic boolean,
    updated_on timestamp,
    PRIMARY KEY (case_number),
    FOREIGN KEY (iucr_code) REFERENCES jwj8c8.cc_iucr_codes(iucr_code),
    FOREIGN KEY (nibrs_fbi_offense_code) REFERENCES jwj8c8.cc_nibrs_fbicode_offenses (nibrs_offense_code)
);''')

In [30]:
with engine.connect() as connection:
    connection.execute("""CREATE TABLE jwj8c8.cc_case_location (
   case_number varchar(20) not null unique,
   block varchar(100),
   location_description varchar(100),
   community_area integer,
   ward integer,
   district integer,
   beat integer,
   latitude real,
   longitude real, 
   PRIMARY KEY (case_number),
   FOREIGN KEY (case_number) REFERENCES jwj8c8.cc_cases(case_number)
);""")
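
The CREATE TABLE cells above only succeed when every referenced table already exists, so the order of the cells matters. A workable order can be derived rather than worked out by hand; the sketch below does so with `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later). The dependency map is transcribed from the FOREIGN KEY clauses in the cells above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# table -> set of tables it references, transcribed from the
# FOREIGN KEY clauses in the CREATE TABLE cells above
references = {
    "cc_iucr_codes": set(),
    "cc_nibrs_crimes_against": set(),
    "cc_nibrs_categories": set(),
    "cc_iucr_codes_primary_descriptions": {"cc_iucr_codes"},
    "cc_iucr_codes_secondary_descriptions": {"cc_iucr_codes"},
    "cc_nibrs_fbicode_offenses": {"cc_nibrs_categories"},
    "cc_nibrs_offenses_crimes_aginst": {
        "cc_nibrs_fbicode_offenses",
        "cc_nibrs_crimes_against",
    },
    "cc_cases": {"cc_iucr_codes", "cc_nibrs_fbicode_offenses"},
    "cc_case_location": {"cc_cases"},
}

# static_order() yields referenced (parent) tables before their children.
create_order = list(TopologicalSorter(references).static_order())
for table in create_order:
    print(table)
```

The same order works for loading data, and its reverse is a safe order for dropping tables without `CASCADE`.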

## Construct and embed your Entity Relationship Diagram

Upload your ERD image to the "final_project" folder and update the markdown below to display it here:

![ERD-HERE](chicagocrimeERD.png)


# Perform the ETL of the source data to your SSO dsa_student Chicago Crime Database

* Establish a connection to the SQLite source database using SQLAlchemy
* Establish a connection to your SSO dsa_student postgres server destination database using SQLAlchemy
* Perform ETL of the source data tables to the destination data tables incrementally.
  * You may want to consider using pandas as the medium between the two databases
    * it can easily read sql table data
    * hold data in a data frame
    * make any necessary transformations to data values
    * write to sql table data
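
The incremental read, transform, write loop described in the list can be sketched with the standard library alone. In the example below, two in-memory SQLite databases stand in for the real source and the postgres destination, and `cursor.fetchmany` plays the role that a chunked pandas read would play. The table `demo_cases`, the helper `copy_table_in_chunks`, and the generated rows are all assumptions for illustration.

```python
import sqlite3


def copy_table_in_chunks(src, dst, table, chunk_size=1000, transform=None):
    """Copy all rows of `table` from src to dst, chunk_size rows at a time."""
    cursor = src.execute(f"SELECT * FROM {table}")
    n_cols = len(cursor.description)
    placeholders = ", ".join("?" * n_cols)
    insert_sql = f"INSERT INTO {table} VALUES ({placeholders})"
    copied = 0
    while True:
        rows = cursor.fetchmany(chunk_size)   # extract one chunk
        if not rows:
            break
        if transform is not None:             # optional transform step
            rows = [transform(row) for row in rows]
        dst.executemany(insert_sql, rows)      # load the chunk
        dst.commit()
        copied += len(rows)
    return copied


# Two in-memory databases stand in for the source and the destination.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute(
        "CREATE TABLE demo_cases (case_number TEXT PRIMARY KEY, arrest INTEGER)"
    )
src.executemany(
    "INSERT INTO demo_cases VALUES (?, ?)",
    [(f"C{i:05d}", i % 2) for i in range(2500)],
)
src.commit()

# Convert the 0/1 flag to a Python bool on the way through.
copied = copy_table_in_chunks(
    src, dst, "demo_cases", chunk_size=1000,
    transform=lambda row: (row[0], bool(row[1])),
)
print(copied)
```

Working in chunks keeps memory use bounded, which matters for the larger tables in this database.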
    

In [1]:
# ETL Here
##establish source connection
import psycopg2
import sqlalchemy
import getpass

source_engine = sqlalchemy.create_engine("sqlite:////dsa/data/DSA-7030/cc0122dbv2.sqlite.db")

source_connection = None

try:
    source_connection = source_engine.connect()
except Exception as err:
    print("An error has occurred trying to connect: {}".format(err))

In [2]:
##establish destination connection
user = "jwj8c8"
host = "pgsql.dsa.lan"
database = "dsa_student"
password = getpass.getpass()
connectionstring = "postgresql://" + user + ":" + password + "@" + host + "/" + database

engine = sqlalchemy.create_engine(connectionstring)

connection = None

try:
    connection = engine.connect()
except Exception as err:
    print("An error has occurred trying to connect: {}".format(err))
    
del password

········


In [4]:
import numpy as np
import pandas as pd

In [31]:
##Create cc_nibrs_categories

df = pd.read_sql_table("cc_nibrs_categories",con=source_connection)
df.to_sql("cc_nibrs_categories",
         engine,
         schema=user,
         if_exists= 'append',
         index = False)

In [33]:
## Create cc_nibrs_crimes_against

df = pd.read_sql_table("cc_nibrs_crimes_against", con=source_connection)
df.to_sql("cc_nibrs_crimes_against",
         engine,
         schema = user,
         if_exists='append',
         index= False)

In [34]:
## Create cc_nibrs_fbicode_offenses

df = pd.read_sql_table("cc_nibrs_fbicode_offenses", con=source_connection)
df.to_sql("cc_nibrs_fbicode_offenses",
         engine,
         schema = user,
         if_exists='append',
         index= False)

In [35]:
## Create cc_nibrs_offenses_crimes_aginst

df = pd.read_sql_table("cc_nibrs_offenses_crimes_aginst", con=source_connection)
df.to_sql("cc_nibrs_offenses_crimes_aginst",
         engine,
         schema = user,
         if_exists='append',
         index= False)

In [36]:
## Create cc_iucr_codes

df = pd.read_sql_table("cc_iucr_codes", con=source_connection)
df.to_sql("cc_iucr_codes",
         engine,
         schema = user,
         if_exists='append',
         index= False)


In [37]:
##Create cc_iucr_codes_primary_descriptions
df = pd.read_sql_table("cc_iucr_codes_primary_descriptions",con=source_connection)
df.to_sql("cc_iucr_codes_primary_descriptions",
         engine,
         schema=user,
         if_exists='append',
         index=False)

In [38]:
##Create cc_iucr_codes_secondary_descriptions
df = pd.read_sql_table("cc_iucr_codes_secondary_descriptions",con=source_connection)
df.to_sql("cc_iucr_codes_secondary_descriptions",
         engine,
         schema = user,
         if_exists='append',
         index=False)

In [49]:
## Create cc_cases

for df in pd.read_sql_query("SELECT * FROM cc_cases", con=source_connection, chunksize=1000):
     
    df["arrest"] = df["arrest"].astype("bool")
    df["domestic"] = df["domestic"].astype("bool")    
        
    
    df.to_sql("cc_cases",
             engine,
             schema = user,
             if_exists='append',
             index=False)


In [7]:
##Create cc_case_location

for df in pd.read_sql_table("cc_case_location",con=source_connection, chunksize=100000):
    
    df.to_sql("cc_case_location",
             engine,
             schema = user,
             if_exists='append',
             index=False)


# Execute SQL DML commands to confirm the table record counts for the destination database tables are consistent with the source database table record counts

In [12]:
# Confirm counts here
## cc_nibrs_categories

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_categories", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_categories", con= engine.connect()))

   COUNT(*)
0        90
   count
0     90


In [13]:
# Confirm counts here
## cc_nibrs_crimes_against

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_crimes_against", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_crimes_against", con= engine.connect()))

   COUNT(*)
0         4
   count
0      4


In [14]:
# Confirm counts here
##cc_nibrs_fbicode_offenses

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_fbicode_offenses", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_fbicode_offenses", con= engine.connect()))

   COUNT(*)
0        90
   count
0     90


In [15]:
# Confirm counts here
## cc_nibrs_offenses_crimes_aginst

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_offenses_crimes_aginst", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_nibrs_offenses_crimes_aginst", con= engine.connect()))

   COUNT(*)
0        64
   count
0     64


In [16]:
# Confirm counts here
## cc_iucr_codes

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_iucr_codes", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_iucr_codes", con= engine.connect()))

   COUNT(*)
0       520
   count
0    520


In [17]:
# Confirm counts here
## cc_iucr_codes_primary_descriptions

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_iucr_codes_primary_descriptions", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_iucr_codes_primary_descriptions", con= engine.connect()))

   COUNT(*)
0       401
   count
0    401


In [18]:
# Confirm counts here
## cc_iucr_codes_secondary_descriptions

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_iucr_codes_secondary_descriptions", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_iucr_codes_secondary_descriptions", con= engine.connect()))

   COUNT(*)
0       401
   count
0    401


In [19]:
# Confirm counts here
## cc_cases

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_cases", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_cases", con= engine.connect()))

   COUNT(*)
0   7676541
     count
0  7676541


In [8]:
# Confirm counts here
## cc_case_location

print(pd.read_sql_query("SELECT COUNT(*) FROM cc_case_location", con= source_engine.connect()))
print(pd.read_sql_query("SELECT COUNT(*) FROM cc_case_location", con= engine.connect()))

   COUNT(*)
0   7676541
     count
0  7676541
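
The per-table checks above could also be run as a single loop that reports only the tables whose counts differ. The sketch below shows the idea with two in-memory SQLite databases standing in for the source and destination; the tables `demo_a` and `demo_b` and the helpers `count_rows` and `compare_counts` are hypothetical names used only for this example.

```python
import sqlite3


def count_rows(conn, table):
    """Return the number of rows in `table`."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def compare_counts(src, dst, tables):
    """Return {table: (source_count, destination_count)} where counts differ."""
    mismatches = {}
    for table in tables:
        s, d = count_rows(src, table), count_rows(dst, table)
        if s != d:
            mismatches[table] = (s, d)
    return mismatches


# Stand-in databases: demo_a matches, demo_b is one row short in dst.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE demo_a (id INTEGER)")
    conn.execute("CREATE TABLE demo_b (id INTEGER)")
src.executemany("INSERT INTO demo_a VALUES (?)", [(1,), (2,)])
dst.executemany("INSERT INTO demo_a VALUES (?)", [(1,), (2,)])
src.executemany("INSERT INTO demo_b VALUES (?)", [(1,), (2,), (3,)])
dst.executemany("INSERT INTO demo_b VALUES (?)", [(1,), (2,)])

mismatches = compare_counts(src, dst, ["demo_a", "demo_b"])
print(mismatches)
```

An empty result means every listed table has the same row count on both sides.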
