**[SQL Micro-Course Home Page](https://www.kaggle.com/learn/intro-to-sql)**

---


# Introduction

The first test of your new data exploration skills uses data describing crime in the city of Chicago.

Before you get started, run the following cell. It sets up the automated feedback system to review your answers.

In [18]:
# Set up feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.sql.ex1 import *
print("Setup Complete")

Setup Complete


Use the next code cell to fetch the dataset.

In [19]:
from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client()

# Construct a reference to the "chicago_crime" dataset
dataset_ref = client.dataset("chicago_crime", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

Using Kaggle's public dataset BigQuery integration.


# Exercises

### 1) Count tables in the dataset

How many tables are in the Chicago Crime dataset?

In [20]:
# Write the code you need here to figure out the answer

In [21]:
# Number of tables in the "chicago_crime" dataset, stored as num_tables
num_tables = len(list(client.list_tables(dataset)))

q_1.check()

<span style="color:#33cc33">Correct</span>

For a hint or the solution, uncomment the appropriate line below.

In [22]:
#q_1.hint()
#q_1.solution()
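
If you also want to see which tables are being counted, a small helper along these lines may be useful. This is only a sketch: the name `table_names` is hypothetical, and it assumes the `client` and `dataset` objects created in the cells above.

```python
def table_names(client, dataset):
    """Return the ID of every table that list_tables() reports for the dataset."""
    # Each item yielded by list_tables() exposes the table's name as `table_id`
    return [table.table_id for table in client.list_tables(dataset)]

# In the notebook, with the objects defined earlier:
# names = table_names(client, dataset)
# print(names, len(names))
```

The length of the returned list should match `num_tables`.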

### 2) Explore the table schema

How many columns in the `crime` table have `TIMESTAMP` data?

In [23]:
print(table.schema)

[SchemaField('unique_key', 'INTEGER', 'REQUIRED', 'Unique identifier for the record.', ()), SchemaField('case_number', 'STRING', 'NULLABLE', 'The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.', ()), SchemaField('date', 'TIMESTAMP', 'NULLABLE', 'Date when the incident occurred. this is sometimes a best estimate.', ()), SchemaField('block', 'STRING', 'NULLABLE', 'The partially redacted address where the incident occurred, placing it on the same block as the actual address.', ()), SchemaField('iucr', 'STRING', 'NULLABLE', 'The Illinois Unifrom Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.', ()), SchemaField('primary_type', 'STRING', 'NULLABLE', 'The primary description of the IUCR code.', ()), SchemaField('description', 'STRING', 'NULLABLE', 'The secondary description of the IUCR code, a subcategory of the primary description.', (

In [24]:
# Construct a reference to the "crime" table
table_ref = dataset_ref.table("crime")

# API request - fetch the table
table = client.get_table(table_ref)

# Count the columns whose field type is 'TIMESTAMP'
num_timestamp_fields = 0
for col in table.schema:
    if col.field_type == 'TIMESTAMP':
        num_timestamp_fields += 1

q_2.check()

<span style="color:#33cc33">Correct</span>

For a hint or the solution, uncomment the appropriate line below.

In [25]:
q_2.hint()
q_2.solution()

<span style="color:#3366cc">Hint:</span> Begin by fetching the `crime` table. Then take a look at the table schema, and check the field type of each column.  How many times does `'TIMESTAMP'` appear?

<span style="color:#33cc99">Solution:</span> 
```python

# Construct a reference to the "crime" table
table_ref = dataset_ref.table("crime")

# API request - fetch the table
table = client.get_table(table_ref)

# Print information on all the columns in the "crime" table in the "chicago_crime" dataset
print(table.schema)

num_timestamp_fields = 2

```
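
Rather than reading the printed schema by eye, you could tally the field types programmatically. The sketch below is one possible approach, not the course's solution: `count_field_types` is a hypothetical helper, and it relies only on each schema entry having a `field_type` attribute.

```python
from collections import Counter

def count_field_types(schema):
    """Tally how many columns in a schema have each field type."""
    return Counter(field.field_type for field in schema)

# In the notebook, with `table` fetched as above:
# type_counts = count_field_types(table.schema)
# num_timestamp_fields = type_counts["TIMESTAMP"]
```

A `Counter` returns 0 for a type that never appears, so looking up a missing type does not raise an error.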

### 3) Create a crime map

If you wanted to create a map with a dot at the location of each crime, which two fields would you most likely need to pull out of the `crime` table to plot the crimes on a map?

In [26]:
print(table.schema)

[SchemaField('unique_key', 'INTEGER', 'REQUIRED', 'Unique identifier for the record.', ()), SchemaField('case_number', 'STRING', 'NULLABLE', 'The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.', ()), SchemaField('date', 'TIMESTAMP', 'NULLABLE', 'Date when the incident occurred. this is sometimes a best estimate.', ()), SchemaField('block', 'STRING', 'NULLABLE', 'The partially redacted address where the incident occurred, placing it on the same block as the actual address.', ()), SchemaField('iucr', 'STRING', 'NULLABLE', 'The Illinois Unifrom Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.', ()), SchemaField('primary_type', 'STRING', 'NULLABLE', 'The primary description of the IUCR code.', ()), SchemaField('description', 'STRING', 'NULLABLE', 'The secondary description of the IUCR code, a subcategory of the primary description.', (

In [27]:
# Standard answer
fields_for_plotting = ['latitude', 'longitude']
# Another correct answer
# fields_for_plotting = ['x_coordinate', 'y_coordinate']

q_3.check()

<span style="color:#33cc33">Correct</span>

For a hint or the solution, uncomment the appropriate line below.

In [28]:
#q_3.hint()
#q_3.solution()

Several columns in the table appear to hold geographic data. Look at a few rows (with the `list_rows()` command) to see if you can work out how those columns relate to one another. Two of them will still be hard to interpret, but it should be clear how the `location` column relates to `latitude` and `longitude`.

In [29]:
# Scratch space for your code
client.list_rows(table, max_results=5).to_dataframe()
# Each `location` value is a (latitude, longitude) pair

Unnamed: 0,unique_key,case_number,date,block,iucr,primary_type,description,location_description,arrest,domestic,beat,district,ward,community_area,fbi_code,x_coordinate,y_coordinate,year,updated_on,latitude,longitude,location
0,11737035,JC323893,2019-06-27 06:00:00+00:00,009XX E 132ND ST,915,MOTOR VEHICLE THEFT,"TRUCK, BUS, MOTOR HOME",PARKING LOT/GARAGE(NON.RESID.),False,False,533,5,9,54,07,1184964.0,1818080.0,2019,2019-07-04 16:09:46+00:00,41.655962,-87.598851,"(41.655961775, -87.598851218)"
1,11726456,JC311014,2019-06-18 01:30:00+00:00,049XX N NORDICA AVE,554,ASSAULT,AGG PO HANDS NO/MIN INJURY,SIDEWALK,False,False,1613,16,41,10,08A,1128606.0,1932315.0,2019,2019-06-30 15:56:27+00:00,41.970576,-87.802492,"(41.970575996, -87.802492422)"
2,11714827,JC291836,2019-06-03 20:30:00+00:00,087XX S STONY ISLAND AVE,281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,OTHER,False,False,412,4,8,48,02,1188320.0,1847443.0,2019,2019-06-30 15:56:27+00:00,41.736458,-87.585639,"(41.736458248, -87.585638764)"
3,11712205,JC293497,2019-05-31 11:00:00+00:00,136XX S BRAINARD AVE,1780,OFFENSE INVOLVING CHILDREN,OTHER OFFENSE,RESIDENCE,False,True,433,4,10,55,20,1199956.0,1814972.0,2019,2019-06-30 15:56:27+00:00,41.647069,-87.544099,"(41.647069431, -87.544099244)"
4,11697172,JC275374,2019-05-23 14:50:00+00:00,113XX S FORRESTVILLE AVE,545,ASSAULT,PRO EMP HANDS NO/MIN INJURY,"SCHOOL, PUBLIC, BUILDING",False,False,531,5,9,50,08A,1181944.0,1829952.0,2019,2019-06-30 15:56:27+00:00,41.68861,-87.609537,"(41.688610361, -87.609536712)"
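
To confirm the relationship between `location` and the `latitude`/`longitude` columns, one option is to parse a `location` value and compare it with the other two. The sketch below assumes `location` is stored as a string such as `"(41.655961775, -87.598851218)"`, which is how it appears in the rows above. The helper name `location_matches` and the tolerance are hypothetical choices.

```python
import ast
import math

def location_matches(location, latitude, longitude, tol=1e-6):
    """Check whether a "(lat, lon)" string agrees with separate lat/lon values."""
    # literal_eval safely turns the text "(a, b)" into a tuple of two floats
    lat, lon = ast.literal_eval(location)
    return (math.isclose(lat, latitude, abs_tol=tol)
            and math.isclose(lon, longitude, abs_tol=tol))

# Values taken from the first row shown above
print(location_matches("(41.655961775, -87.598851218)", 41.655962, -87.598851))
```

The tolerance allows for the rounding visible in the displayed `latitude` and `longitude` values, which show fewer decimal places than `location`.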


# Keep going

You've looked at the schema, but you haven't yet done anything exciting with the data itself. Things get more interesting when you get to the data, so keep going to **[write your first SQL query](https://www.kaggle.com/dansbecker/select-from-where).**


