# Kaggle Intro to SQL (and BigQuery)
- https://www.kaggle.com/learn/intro-to-sql

## 1. Exercise: Getting Started With SQL and BigQuery
- Learn the workflow for handling big datasets with BigQuery and SQL

### Intro
- The first test of your new data exploration skills uses data describing crime in the city of Chicago.

In [20]:
# Fetch the dataset and store it in the `dataset` variable
from google.cloud import bigquery

# Create a 'Client' object
client = bigquery.Client('jmpro2')

# Construct a reference to the 'chicago_crime' dataset
dataset_ref = client.dataset('chicago_crime', project='bigquery-public-data')

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

### Ex. 1) Count tables in the dataset

In [21]:
# List all the tables in the "chicago_crime" dataset
tables = list(client.list_tables(dataset))
print(len(tables))
for tbl in tables:
    print(tbl.table_id)

1
crime


In [22]:
num_tables = len(list(client.list_tables(dataset)))
num_tables

1

### Ex. 2) Explore the table schema
- How many columns in the `crime` table have `TIMESTAMP` data?

In [23]:
# Construct a reference to the "crime" table
table_ref = dataset_ref.table("crime")
# API request - fetch the table
table = client.get_table(table_ref)
print(len(table.schema))
table.schema

22


[SchemaField('unique_key', 'INTEGER', 'REQUIRED', None, (), None),
 SchemaField('case_number', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('date', 'TIMESTAMP', 'NULLABLE', None, (), None),
 SchemaField('block', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('iucr', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('primary_type', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('description', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('location_description', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('arrest', 'BOOLEAN', 'NULLABLE', None, (), None),
 SchemaField('domestic', 'BOOLEAN', 'NULLABLE', None, (), None),
 SchemaField('beat', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('district', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('ward', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('community_area', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('fbi_code', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('x_coord

In [24]:
num_timestamp_fields = 2
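
Rather than counting by eye, the `TIMESTAMP` fields can be tallied programmatically. The sketch below is a minimal illustration, assuming each schema entry exposes `name` and `field_type` attributes, as BigQuery's `SchemaField` objects do. The `Field` namedtuple and the `count_fields_of_type` helper are hypothetical stand-ins so the snippet runs without credentials, and the sample list holds only a few fields copied from the printed schema above, not the full 22-field schema.

```python
from collections import namedtuple

# Hypothetical stand-in for bigquery.SchemaField (illustration only).
Field = namedtuple("Field", ["name", "field_type"])

# A few fields copied from the printed schema above, not the full table.
sample_schema = [
    Field("unique_key", "INTEGER"),
    Field("case_number", "STRING"),
    Field("date", "TIMESTAMP"),
    Field("arrest", "BOOLEAN"),
]


def count_fields_of_type(schema, field_type):
    """Return how many fields in `schema` have the given BigQuery type."""
    return sum(1 for field in schema if field.field_type == field_type)


sample_count = count_fields_of_type(sample_schema, "TIMESTAMP")
print(sample_count)
```

With a live client, passing `table.schema` instead of `sample_schema` would count across every column of the `crime` table.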

### Ex. 3) Create a crime map
- To draw a map with a dot at the location of each crime, which two fields would you most likely pull from the `crime` table?

In [25]:
fields_for_plotting = ['latitude', 'longitude']

Several columns appear to hold geographic data. Look at a few rows (with the `list_rows()` method) to work out how they relate. Two of the columns will still be hard to interpret, but it should be clear how the `location` column relates to `latitude` and `longitude`.

In [26]:
client.list_rows(table, max_results=5).to_dataframe()



,unique_key,case_number,date,block,iucr,primary_type,description,location_description,arrest,domestic,...,ward,community_area,fbi_code,x_coordinate,y_coordinate,year,updated_on,latitude,longitude,location
0,26735,JF225985,2022-04-30 11:19:00+00:00,005XX E 106TH ST,110,HOMICIDE,FIRST DEGREE MURDER,ALLEY,True,False,...,9.0,49.0,01A,1181917.0,1834793.0,2022,2022-09-18 04:45:51+00:00,41.701895,-87.609487,"(41.701895341, -87.60948662)"
1,10524241,HZ266809,2016-04-17 07:15:00+00:00,103XX S STATE ST,263,CRIM SEXUAL ASSAULT,AGGRAVATED: KNIFE/CUT INSTR,STREET,False,False,...,34.0,49.0,02,1178071.0,1836598.0,2016,2018-02-10 03:50:01+00:00,41.706936,-87.623515,"(41.706936355, -87.623514952)"
2,5393782,HN234262,2007-03-18 06:00:00+00:00,003XX W 106TH ST,265,CRIM SEXUAL ASSAULT,AGGRAVATED: OTHER,RESIDENCE,False,False,...,34.0,49.0,02,1175835.0,1834633.0,2007,2018-02-28 03:56:25+00:00,41.701594,-87.631762,"(41.701594397, -87.63176174)"
3,11297090,JB239539,2018-04-21 10:00:00+00:00,009XX E 104TH ST,281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,APARTMENT,False,True,...,9.0,50.0,02,1184238.0,1836191.0,2018,2018-05-04 03:51:04+00:00,41.705678,-87.600944,"(41.705677782, -87.600944364)"
4,1895933,G749374,2001-12-15 05:48:42+00:00,009XX E 104 ST,312,ROBBERY,ARMED:KNIFE/CUTTING INSTRUMENT,VEHICLE NON-COMMERCIAL,False,False,...,,,03,1184202.0,1836150.0,2001,2015-08-17 03:03:40+00:00,41.705566,-87.601077,"(41.705566113, -87.601077468)"
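
The sample rows suggest that `location` is simply the `(latitude, longitude)` pair stored as text. Here is a minimal sketch of checking that, using values copied from the first row above. `parse_location` is a hypothetical helper written for this illustration, not part of the BigQuery client.

```python
def parse_location(text):
    """Parse a string like "(41.70, -87.60)" into a (lat, lon) float tuple."""
    lat_str, lon_str = text.strip().strip("()").split(",")
    return float(lat_str), float(lon_str)


# Values copied from the first sample row above.
row_latitude = 41.701895
row_longitude = -87.609487
row_location = "(41.701895341, -87.60948662)"

lat, lon = parse_location(row_location)

# The displayed latitude/longitude are rounded, so compare with a tolerance.
matches = abs(lat - row_latitude) < 1e-6 and abs(lon - row_longitude) < 1e-6
print(matches)
```

Because `location` duplicates the other two columns, the numeric `latitude` and `longitude` fields are the more convenient choice for plotting.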
