This analysis involves 3 datasets for the city of Chicago obtained from the Chicago Data Portal:

1. Chicago Socioeconomic Indicators

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” by Chicago community area, for the years 2008 – 2012.

2. Chicago Public Schools

This dataset shows all school-level performance data used to create CPS School Report Cards for the 2011-2012 school year.

3. Chicago Crime Data

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.

The datasets are stored in a Db2 on Cloud database.

Connect to the database

In [None]:
%reload_ext sql

In [None]:
# Enter the connection string for your Db2 on Cloud database instance below,
# i.e. copy the part after db2:// from the URI string in the Service Credentials of your Db2 instance and remove the double quotes at the end.
# The connection string has the format: ibm_db_sa://username:password@hostname:port/BLUDB
%sql ibm_db_sa://xxxxxxxxxxxxxxxxxxxx:xxx/BLUDB
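Passwords from the Service Credentials can contain characters such as `$`, `@` or `:` that break the `user:password@host:port` layout of the URL. A minimal sketch, assuming the hypothetical helper name `build_ibm_db_sa_url` and placeholder credentials, percent-encodes the user name and password with `urllib.parse.quote_plus` (the escaping SQLAlchemy URLs expect) before the string is pasted after `%sql`.

```python
from urllib.parse import quote_plus


def build_ibm_db_sa_url(user, password, host, port, database="BLUDB"):
    """Assemble an ibm_db_sa URL for the %sql magic (hypothetical helper).

    quote_plus() percent-encodes characters such as '$', '@' or ':' that
    would otherwise be read as URL delimiters.
    """
    return "ibm_db_sa://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Placeholder values only; substitute your own Service Credentials.
url = build_ibm_db_sa_url("abc12345", "p@ss$word", "example-host.example.com", 50000)
print(url)  # copy this string after %sql in the cell above
```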


In [None]:
import ibm_db

In [None]:
import pandas
import ibm_db_dbi

In [None]:
# Replace the placeholder values with the actual values from your Db2 Service Credentials
dsn_driver = "IBM DB2 ODBC DRIVER"
dsn_database = "BLUDB"            # e.g. "BLUDB"
dsn_hostname = ""  # e.g.: "dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net"
dsn_port = ""                    # e.g. "50000" 
dsn_protocol = ""            # i.e. "TCPIP"
dsn_uid = ""                 # e.g. "abc12345"
dsn_pwd = ""                 # e.g. "7dBZ3wWt9XN6$o0J"

In [None]:
# Create the database connection
# DO NOT MODIFY THIS CELL. Just RUN it with Shift + Enter
dsn = (
    "DRIVER={0};"
    "DATABASE={1};"
    "HOSTNAME={2};"
    "PORT={3};"
    "PROTOCOL={4};"
    "UID={5};"
    "PWD={6};").format(dsn_driver, dsn_database, dsn_hostname, dsn_port, dsn_protocol, dsn_uid, dsn_pwd)

try:
    conn = ibm_db.connect(dsn, "", "")
    print("Connected to database:", dsn_database, "as user:", dsn_uid, "on host:", dsn_hostname)
except Exception:
    # Catch Exception rather than using a bare except, so Ctrl+C still interrupts the cell
    print("Unable to connect:", ibm_db.conn_errormsg())
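With the placeholder cell left blank, the connection attempt only fails with a generic driver error. A small sketch, assuming a hypothetical helper `build_dsn` and placeholder values, assembles the same `KEY=value;` DSN in the same key order as the cell above and raises a clear error naming any field that is still empty.

```python
DSN_KEYS = ["DRIVER", "DATABASE", "HOSTNAME", "PORT", "PROTOCOL", "UID", "PWD"]


def build_dsn(**parts):
    """Build an ibm_db DSN string, failing early on blank fields (hypothetical helper)."""
    missing = [key for key in DSN_KEYS if not str(parts.get(key, "")).strip()]
    if missing:
        raise ValueError("Missing DSN values: " + ", ".join(missing))
    # Same KEY=value; layout as the .format() call in the notebook cell
    return "".join("{}={};".format(key, parts[key]) for key in DSN_KEYS)


# Placeholder values only; use the ones from your Service Credentials.
dsn = build_dsn(DRIVER="IBM DB2 ODBC DRIVER", DATABASE="BLUDB",
                HOSTNAME="example-host.example.com", PORT=50000,
                PROTOCOL="TCPIP", UID="abc12345", PWD="secret")
print(dsn)
```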


In [None]:
# Wrap the ibm_db connection in a DB-API connection object that pandas can use
pconn = ibm_db_dbi.Connection(conn)

In [None]:
#query statement to retrieve all rows in CENSUS_DATA table
selectQuery = "select * from CENSUS_DATA"

#retrieve the query results into a pandas dataframe
df_CENSUS_DATA = pandas.read_sql(selectQuery, pconn)

#print the entire data frame
df_CENSUS_DATA

In [None]:
#query statement to retrieve all rows in CHICAGO_PUBLIC_SCHOOLS table
selectQuery = "select * from CHICAGO_PUBLIC_SCHOOLS"

#retrieve the query results into a pandas dataframe
df_CHICAGO_PUBLIC_SCHOOLS = pandas.read_sql(selectQuery, pconn)

#print the entire data frame
df_CHICAGO_PUBLIC_SCHOOLS

In [None]:
#query statement to retrieve all rows in CHICAGO_CRIME_DATA table
selectQuery = "select * from CHICAGO_CRIME_DATA"

#retrieve the query results into a pandas dataframe
df_CHICAGO_CRIME_DATA = pandas.read_sql(selectQuery, pconn)

#print the entire data frame
df_CHICAGO_CRIME_DATA
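The three cells above repeat one pattern with a different table name. Table names cannot be passed as bound query parameters, so a reusable helper should check them against an allow-list before building the SQL text. The sketch below uses a hypothetical `fetch_table` helper over any DB-API 2.0 connection; `sqlite3` is only a local stand-in for the Db2 `pconn` connection, and the one-row table is made up.

```python
import sqlite3

# The three tables used in this notebook; anything else is rejected.
ALLOWED_TABLES = {"CENSUS_DATA", "CHICAGO_PUBLIC_SCHOOLS", "CHICAGO_CRIME_DATA"}


def fetch_table(conn, table):
    """Return (column_names, rows) for a whole table (hypothetical helper)."""
    if table not in ALLOWED_TABLES:
        raise ValueError("Unexpected table name: " + table)
    cur = conn.cursor()
    cur.execute("SELECT * FROM " + table)
    columns = [desc[0] for desc in cur.description]
    rows = cur.fetchall()
    cur.close()
    return columns, rows


# sqlite3 stands in for Db2 here; the row is illustrative, not real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CENSUS_DATA (COMMUNITY_AREA_NAME TEXT, PER_CAPITA_INCOME INTEGER)")
conn.execute("INSERT INTO CENSUS_DATA VALUES ('Example Area', 12345)")
columns, rows = fetch_table(conn, "CENSUS_DATA")
print(columns, rows)
```

In the notebook itself, `pandas.read_sql` already performs the cursor and column handling; the allow-list check is the part worth keeping if the three cells are folded into a loop.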

In [None]:
%reload_ext sql

Analysis

In [None]:
# 1: Find the total number of crimes recorded in the crime table.
%sql SELECT COUNT(*) FROM CHICAGO_CRIME_DATA;

In [None]:
# 2: Retrieve the first 10 rows from the CRIME table.
%sql SELECT * FROM CHICAGO_CRIME_DATA limit 10;

In [None]:
# 3: How many crimes involve an arrest?
%sql SELECT COUNT(arrest) FROM CHICAGO_CRIME_DATA where arrest = true;
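`COUNT(column)` counts only non-NULL values, so with the `arrest = true` filter it returns the same number as `COUNT(*)`. A single conditional aggregate can return the total and the arrest count in one scan. This sketch runs on `sqlite3` as a stand-in for Db2 with made-up rows; because SQLite has no BOOLEAN type, `ARREST` is stored as 1/0 here, whereas the notebook query assumes a Db2 BOOLEAN column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy rows; ARREST uses 1/0 because SQLite lacks a BOOLEAN type.
conn.execute("CREATE TABLE CHICAGO_CRIME_DATA (ID INTEGER, PRIMARY_TYPE TEXT, ARREST INTEGER)")
conn.executemany(
    "INSERT INTO CHICAGO_CRIME_DATA VALUES (?, ?, ?)",
    [(1, "THEFT", 1), (2, "BATTERY", 0), (3, "THEFT", 0), (4, "NARCOTICS", 1)],
)

# One pass returns both the total number of crimes and the number with an arrest.
total, arrests = conn.execute(
    """SELECT COUNT(*),
              SUM(CASE WHEN ARREST = 1 THEN 1 ELSE 0 END)
       FROM CHICAGO_CRIME_DATA"""
).fetchone()
print(total, arrests, round(100.0 * arrests / total, 1))  # share of crimes with an arrest, in %
```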

In [None]:
# 4: Which unique types of crimes (e.g. THEFT) have been recorded at GAS STATION locations?
%sql SELECT DISTINCT primary_type FROM CHICAGO_CRIME_DATA where location_description = 'GAS STATION';
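Listing the unique crime types at a location calls for `SELECT DISTINCT` on `PRIMARY_TYPE`, with the only filter on `LOCATION_DESCRIPTION`; adding a condition on `PRIMARY_TYPE` would restrict the answer to that one type. A sketch on `sqlite3` (a stand-in for Db2) with hypothetical rows, passing the location as a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up rows for illustration only.
conn.execute("CREATE TABLE CHICAGO_CRIME_DATA (PRIMARY_TYPE TEXT, LOCATION_DESCRIPTION TEXT)")
conn.executemany(
    "INSERT INTO CHICAGO_CRIME_DATA VALUES (?, ?)",
    [("THEFT", "GAS STATION"), ("THEFT", "GAS STATION"),
     ("ROBBERY", "GAS STATION"), ("BATTERY", "STREET")],
)

# DISTINCT collapses the repeated THEFT rows into one result row.
types = [row[0] for row in conn.execute(
    "SELECT DISTINCT PRIMARY_TYPE FROM CHICAGO_CRIME_DATA "
    "WHERE LOCATION_DESCRIPTION = ? ORDER BY PRIMARY_TYPE",
    ("GAS STATION",),
)]
print(types)
```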


In [None]:
# 5: In the CENSUS_DATA table, list all community areas whose names start with the letter ‘B’.
%sql SELECT community_area_name FROM CENSUS_DATA where community_area_name like 'B%';

In [None]:
# 6: List the schools in community areas 10 to 15 that are healthy school certified.
%sql SELECT NAME_OF_SCHOOL, HEALTHY_SCHOOL_CERTIFIED, COMMUNITY_AREA_NUMBER FROM CHICAGO_PUBLIC_SCHOOLS where \
HEALTHY_SCHOOL_CERTIFIED = 'Yes' and COMMUNITY_AREA_NUMBER between 10 and 15;
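`BETWEEN 10 AND 15` is inclusive at both ends, equivalent to `>= 10 AND <= 15`. The sketch below checks that equivalence on `sqlite3` (standing in for Db2) with hypothetical schools placed just inside and just outside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up schools: areas 9 and 16 fall outside the range, 10 and 15 sit on its edges.
conn.execute(
    "CREATE TABLE CHICAGO_PUBLIC_SCHOOLS "
    "(NAME_OF_SCHOOL TEXT, HEALTHY_SCHOOL_CERTIFIED TEXT, COMMUNITY_AREA_NUMBER INTEGER)"
)
conn.executemany(
    "INSERT INTO CHICAGO_PUBLIC_SCHOOLS VALUES (?, ?, ?)",
    [("School A", "Yes", 9), ("School B", "Yes", 10), ("School C", "No", 12),
     ("School D", "Yes", 15), ("School E", "Yes", 16)],
)

between = [r[0] for r in conn.execute(
    "SELECT NAME_OF_SCHOOL FROM CHICAGO_PUBLIC_SCHOOLS "
    "WHERE HEALTHY_SCHOOL_CERTIFIED = 'Yes' AND COMMUNITY_AREA_NUMBER BETWEEN 10 AND 15 "
    "ORDER BY NAME_OF_SCHOOL"
)]
explicit = [r[0] for r in conn.execute(
    "SELECT NAME_OF_SCHOOL FROM CHICAGO_PUBLIC_SCHOOLS "
    "WHERE HEALTHY_SCHOOL_CERTIFIED = 'Yes' "
    "AND COMMUNITY_AREA_NUMBER >= 10 AND COMMUNITY_AREA_NUMBER <= 15 "
    "ORDER BY NAME_OF_SCHOOL"
)]
print(between, between == explicit)  # both endpoints (10 and 15) are kept
```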

In [None]:
# 7: What is the average school Safety Score?
%sql SELECT AVG(SAFETY_SCORE) AS AVERAGE_SAFETY_SCORE from CHICAGO_PUBLIC_SCHOOLS;
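When the average is computed with `AVG(SAFETY_SCORE)`, schools without a score are skipped rather than counted as zero, so the result can differ from `SUM(...) / COUNT(*)`. The sketch shows this on `sqlite3` (a stand-in for Db2) with three made-up schools, one of which has a NULL score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; School C has no Safety Score (NULL).
conn.execute("CREATE TABLE CHICAGO_PUBLIC_SCHOOLS (NAME_OF_SCHOOL TEXT, SAFETY_SCORE INTEGER)")
conn.executemany(
    "INSERT INTO CHICAGO_PUBLIC_SCHOOLS VALUES (?, ?)",
    [("School A", 50), ("School B", 70), ("School C", None)],
)

# AVG and COUNT(column) ignore NULLs; COUNT(*) counts every row.
avg_score, scored, total = conn.execute(
    "SELECT AVG(SAFETY_SCORE), COUNT(SAFETY_SCORE), COUNT(*) FROM CHICAGO_PUBLIC_SCHOOLS"
).fetchone()
print(avg_score, scored, total)
```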

In [None]:
# 8: Find the top 5 Community Areas by average College Enrollment [number of students].
%sql SELECT COMMUNITY_AREA_NAME, AVG(COLLEGE_ENROLLMENT) AS AVG_ENROLLMENT from CHICAGO_PUBLIC_SCHOOLS \
group by COMMUNITY_AREA_NAME order by AVG_ENROLLMENT desc limit 5;
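Ranking community areas by their average enrollment requires grouping the schools by area, averaging within each group, and only then sorting and limiting; sorting individual schools ranks schools, not areas. A sketch on `sqlite3` (standing in for Db2) with six made-up areas, so that `LIMIT 5` visibly drops one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up enrollment figures; some areas have more than one school.
conn.execute("CREATE TABLE CHICAGO_PUBLIC_SCHOOLS (COMMUNITY_AREA_NAME TEXT, COLLEGE_ENROLLMENT INTEGER)")
conn.executemany(
    "INSERT INTO CHICAGO_PUBLIC_SCHOOLS VALUES (?, ?)",
    [("AREA A", 100), ("AREA A", 300), ("AREA B", 500), ("AREA C", 50),
     ("AREA C", 150), ("AREA D", 400), ("AREA E", 250), ("AREA E", 350),
     ("AREA F", 10)],
)

# Aggregate per area first, then rank the areas by that average.
top5 = conn.execute(
    "SELECT COMMUNITY_AREA_NAME, AVG(COLLEGE_ENROLLMENT) AS AVG_ENROLLMENT "
    "FROM CHICAGO_PUBLIC_SCHOOLS GROUP BY COMMUNITY_AREA_NAME "
    "ORDER BY AVG_ENROLLMENT DESC LIMIT 5"
).fetchall()
for name, avg_enrollment in top5:
    print(name, avg_enrollment)
```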

In [None]:
# 9: Use a sub-query to determine which Community Area has the lowest school Safety Score.
%sql SELECT COMMUNITY_AREA_NAME, SAFETY_SCORE from CHICAGO_PUBLIC_SCHOOLS where SAFETY_SCORE = \
(select MIN(SAFETY_SCORE) from CHICAGO_PUBLIC_SCHOOLS);
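One property of the `= (SELECT MIN(...))` sub-query is that it returns every row tied at the minimum, whereas `ORDER BY SAFETY_SCORE LIMIT 1` would silently keep only one of them. A sketch on `sqlite3` (a stand-in for Db2) with hypothetical areas, two of which share the lowest score:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up rows; AREA A and AREA C tie for the lowest score.
conn.execute("CREATE TABLE CHICAGO_PUBLIC_SCHOOLS (COMMUNITY_AREA_NAME TEXT, SAFETY_SCORE INTEGER)")
conn.executemany(
    "INSERT INTO CHICAGO_PUBLIC_SCHOOLS VALUES (?, ?)",
    [("AREA A", 1), ("AREA B", 45), ("AREA C", 1), ("AREA D", 80)],
)

# The scalar sub-query yields the minimum; the outer query keeps all rows equal to it.
lowest = conn.execute(
    "SELECT COMMUNITY_AREA_NAME, SAFETY_SCORE FROM CHICAGO_PUBLIC_SCHOOLS "
    "WHERE SAFETY_SCORE = (SELECT MIN(SAFETY_SCORE) FROM CHICAGO_PUBLIC_SCHOOLS) "
    "ORDER BY COMMUNITY_AREA_NAME"
).fetchall()
single = conn.execute(
    "SELECT COMMUNITY_AREA_NAME, SAFETY_SCORE FROM CHICAGO_PUBLIC_SCHOOLS "
    "ORDER BY SAFETY_SCORE LIMIT 1"
).fetchall()
print(lowest, len(single))
```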

In [None]:
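# 10: [Without using an explicit JOIN operator] Find the Per Capita Income of the Community Area which has a school Safety Score of 1.
# The sub-query finds the matching community area (Washington Park in this data) instead of hard-coding its name.
%sql SELECT community_area_name, per_capita_income from CENSUS_DATA where community_area_number in \
(select community_area_number from CHICAGO_PUBLIC_SCHOOLS where safety_score = 1);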
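Without an explicit JOIN, an `IN` sub-query can link `CENSUS_DATA` and `CHICAGO_PUBLIC_SCHOOLS` through `COMMUNITY_AREA_NUMBER`. Matching on the number rather than the area name avoids depending on names that may be written with different letter case in the two tables. A sketch on `sqlite3` (a stand-in for Db2); the areas, incomes and schools are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CENSUS_DATA "
             "(COMMUNITY_AREA_NUMBER INTEGER, COMMUNITY_AREA_NAME TEXT, PER_CAPITA_INCOME INTEGER)")
conn.execute("CREATE TABLE CHICAGO_PUBLIC_SCHOOLS "
             "(NAME_OF_SCHOOL TEXT, COMMUNITY_AREA_NUMBER INTEGER, SAFETY_SCORE INTEGER)")
# Hypothetical rows only.
conn.executemany("INSERT INTO CENSUS_DATA VALUES (?, ?, ?)",
                 [(1, "Area One", 20000), (2, "Area Two", 30000), (3, "Area Three", 15000)])
conn.executemany("INSERT INTO CHICAGO_PUBLIC_SCHOOLS VALUES (?, ?, ?)",
                 [("School X", 3, 1), ("School Y", 1, 55), ("School Z", 2, 70)])

# The inner query returns the area numbers of schools scoring 1;
# the outer query looks up those areas in the census table.
result = conn.execute(
    "SELECT COMMUNITY_AREA_NAME, PER_CAPITA_INCOME FROM CENSUS_DATA "
    "WHERE COMMUNITY_AREA_NUMBER IN "
    "(SELECT COMMUNITY_AREA_NUMBER FROM CHICAGO_PUBLIC_SCHOOLS WHERE SAFETY_SCORE = 1)"
).fetchall()
print(result)
```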


In [None]:
# Close connection
ibm_db.close(conn)

Note: run this analysis in a Jupyter Notebook.