**How to Query the Chicago Crime Dataset (BigQuery)**

In [1]:
from bq_helper import BigQueryHelper
# Introduction to the helper package:
# https://www.kaggle.com/sohier/introduction-to-the-bq-helper-package
# Helper bound to the public Chicago crime dataset; used to run the queries below.
chicago_crime = BigQueryHelper(active_project="bigquery-public-data",
                               dataset_name="chicago_crime")
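
The queries later in this notebook go through `query_to_pandas_safe`, which in bq_helper declines to run a query whose estimated scan size exceeds `max_gb_scanned` and returns `None` in that case. The following is a minimal sketch of making that limit explicit by raising an error rather than passing `None` along. `run_if_small` and `FakeHelper` are hypothetical names introduced here for illustration; `FakeHelper` is only a stand-in so the sketch runs without BigQuery credentials.

```python
def run_if_small(helper, query, max_gb=1.0):
    """Run `query` only if its estimated scan size is at most `max_gb` GB."""
    size_gb = helper.estimate_query_size(query)  # bq_helper reports the estimate in GB
    if size_gb > max_gb:
        raise ValueError(
            f"query would scan about {size_gb:.2f} GB (limit {max_gb} GB)"
        )
    return helper.query_to_pandas_safe(query, max_gb_scanned=max_gb)


class FakeHelper:
    """Hypothetical offline stand-in that mimics the two helper methods used above."""

    def __init__(self, size_gb):
        self.size_gb = size_gb

    def estimate_query_size(self, query):
        return self.size_gb

    def query_to_pandas_safe(self, query, max_gb_scanned=1):
        # Assumed behaviour: return a result if under the limit, otherwise None.
        return "rows" if self.size_gb <= max_gb_scanned else None


small = run_if_small(FakeHelper(0.2), "SELECT 1")
```

With the real package, the same function could be called as `run_if_small(chicago_crime, query1)`.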

In [2]:
# Reuse the helper from In [1] for metadata inspection instead of creating a second client.
bq_assistant = chicago_crime
bq_assistant.list_tables()

['crime']

In [3]:
bq_assistant.head("crime", num_rows=3)

,unique_key,case_number,date,block,iucr,primary_type,description,location_description,arrest,domestic,...,ward,community_area,fbi_code,x_coordinate,y_coordinate,year,updated_on,latitude,longitude,location
0,6445963,HP518816,2008-08-17 04:00:00+00:00,060XX S NORMANDY AVE,1365,CRIMINAL TRESPASS,TO RESIDENCE,RESIDENCE,False,False,...,23,64,26,1132775.0,1863699.0,2008,2018-02-28 15:56:25+00:00,41.782212,-87.788763,"(41.782211969, -87.788763243)"
1,3563797,HK648922,2004-09-25 03:30:00+00:00,091XX S SAGINAW AVE,497,BATTERY,AGGRAVATED DOMESTIC BATTERY: OTHER DANG WEAPON,RESIDENCE,False,True,...,7,48,04B,1195470.0,1844867.0,2004,2018-02-28 15:56:25+00:00,41.729216,-87.559529,"(41.729216041, -87.559528902)"
2,3024782,HJ730100,2003-08-13 08:45:00+00:00,091XX S PHILLIPS AVE,1242,DECEPTIVE PRACTICE,COMPUTER FRAUD,RESIDENCE,False,False,...,7,48,11,1194092.0,1844784.0,2003,2018-02-10 15:50:01+00:00,41.729022,-87.564579,"(41.729022164, -87.564579481)"


In [4]:
bq_assistant.table_schema("crime")

[SchemaField('unique_key', 'INTEGER', 'REQUIRED', 'Unique identifier for the record.', ()),
 SchemaField('case_number', 'STRING', 'NULLABLE', 'The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.', ()),
 SchemaField('date', 'TIMESTAMP', 'NULLABLE', 'Date when the incident occurred. this is sometimes a best estimate.', ()),
 SchemaField('block', 'STRING', 'NULLABLE', 'The partially redacted address where the incident occurred, placing it on the same block as the actual address.', ()),
 SchemaField('iucr', 'STRING', 'NULLABLE', 'The Illinois Unifrom Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.', ()),
 SchemaField('primary_type', 'STRING', 'NULLABLE', 'The primary description of the IUCR code.', ()),
 SchemaField('description', 'STRING', 'NULLABLE', 'The secondary description of the IUCR code, a subcategory of the primary descriptio

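What categories of crime exhibited the greatest year-over-year increase in arrests between 2015 and 2016? (The query counts only incidents with `arrest = TRUE` and keeps categories with more than 100 arrests in 2015.)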


In [5]:
query1 = """SELECT
  primary_type,
  description,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016,
  FORMAT('%3.2f', (COUNTIF(year = 2016) - COUNTIF(year = 2015)) / COUNTIF(year = 2015)*100) AS pct_change_2015_to_2016
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2015, 2016)
GROUP BY
  primary_type,
  description
HAVING
  COUNTIF(year = 2015) > 100
ORDER BY
  (COUNTIF(year = 2016) - COUNTIF(year = 2015)) / COUNTIF(year = 2015) DESC
        """
response1 = chicago_crime.query_to_pandas_safe(query1)
response1.head(10)

,primary_type,description,arrests_2015,arrests_2016,pct_change_2015_to_2016
0,OTHER OFFENSE,VEHICLE TITLE/REG OFFENSE,288,418,45.14
1,OTHER OFFENSE,FALSE/STOLEN/ALTERED TRP,299,418,39.8
2,HOMICIDE,FIRST DEGREE MURDER,171,230,34.5
3,NARCOTICS,FOUND SUSPECT NARCOTICS,651,842,29.34
4,ASSAULT,AGGRAVATED: HANDGUN,468,523,11.75
5,MOTOR VEHICLE THEFT,AUTOMOBILE,779,864,10.91
6,ROBBERY,ARMED: HANDGUN,231,249,7.79
7,LIQUOR LAW VIOLATION,LIQUOR LICENSE VIOLATION,134,144,7.46
8,MOTOR VEHICLE THEFT,THEFT/RECOVERY: AUTOMOBILE,164,171,4.27
9,OTHER OFFENSE,OTHER VEHICLE OFFENSE,194,202,4.12
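
As a sanity check on the `pct_change_2015_to_2016` column, the percentage can be recomputed from the two count columns. This is a minimal sketch in plain Python using the first three rows of the output above; `pct_change` is a hypothetical helper name, and the formula mirrors the `(2016 - 2015) / 2015 * 100` expression in the query.

```python
# (description, arrests_2015, arrests_2016) copied from the first rows of the result
rows = [
    ("VEHICLE TITLE/REG OFFENSE", 288, 418),
    ("FALSE/STOLEN/ALTERED TRP", 299, 418),
    ("FIRST DEGREE MURDER", 171, 230),
]


def pct_change(old, new):
    """Percentage change from `old` to `new`, relative to `old`."""
    return (new - old) / old * 100


# Format to two decimals, as the query's FORMAT('%3.2f', ...) does.
changes = {name: f"{pct_change(a, b):.2f}" for name, a, b in rows}
print(changes)
```

The recomputed values come out as 45.14, 39.80 and 34.50; the table shows the last two as 39.8 and 34.5, presumably because trailing zeros were dropped when the output was exported.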


Which month generally has the greatest number of motor vehicle thefts?


In [6]:
query2 = """SELECT
  year,
  month,
  incidents
FROM (
  SELECT
    year,
    EXTRACT(MONTH FROM date) AS month,
    COUNT(1) AS incidents,
    RANK() OVER (PARTITION BY year ORDER BY COUNT(1) DESC) AS ranking
  FROM
    `bigquery-public-data.chicago_crime.crime`
  WHERE
    primary_type = 'MOTOR VEHICLE THEFT'
    AND year <= 2016
  GROUP BY
    year,
    month )
WHERE
  ranking = 1
ORDER BY
  year DESC
        """
response2 = chicago_crime.query_to_pandas_safe(query2)
response2.head(10)

,year,month,incidents
0,2016,12,1109
1,2015,8,968
2,2014,10,922
3,2013,1,1470
4,2012,6,1469
5,2011,1,1862
6,2010,12,1880
7,2009,12,1539
8,2008,7,2015
9,2007,10,1709
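
The query returns the peak month for each year, so answering which month "generally" leads still requires a tally across years. The following is a minimal sketch of that tally using only the ten rows shown above; since `head(10)` hides earlier years, the result is indicative rather than conclusive. The variable names are introduced here for illustration.

```python
from collections import Counter

# Peak month per year, copied from the ten rows of output shown above.
top_month_by_year = {
    2016: 12, 2015: 8, 2014: 10, 2013: 1, 2012: 6,
    2011: 1, 2010: 12, 2009: 12, 2008: 7, 2007: 10,
}

# Count how many years each month was the peak month.
month_counts = Counter(top_month_by_year.values())
most_common_month, n_years = month_counts.most_common(1)[0]
print(most_common_month, n_years)
```

Over these ten years, December is the peak month most often (three times), followed by January and October with two each.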
