## Querying Vancouver crimes

#### **Instructions**

Using the given sqlite3 connection:

* Select `TYPE`, `MONTH`, `DAY` and `NEIGHBOURHOOD` from the `van_crimes` table, but only the crime observations from `Stanley Park` or `West End`. 
* Store the information in a `van_crimes_df` DataFrame.
* Store the count of crimes per `TYPE` in a `crime_types_count` variable.

In [230]:
import pandas as pd
import sqlite3

In [231]:
conn = sqlite3.connect(':memory:') # In-memory database for testing

In [232]:
c = conn.cursor() # creating a cursor object to execute SQL commands

In [233]:
c.executescript(open('files/van_crime_2003.sql', 'r').read()) # restoring the given database dump

<sqlite3.Cursor at 0x1c9b37d4840>
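
As a rough illustration of what restoring a dump involves, the sketch below runs a tiny multi-statement script with `executescript`. The table contents and the `demo_conn` and `dump` names are made up for this example; the real dump in `files/van_crime_2003.sql` is assumed to contain similar `CREATE TABLE` and `INSERT` statements.

```python
import sqlite3

# Hypothetical miniature dump, standing in for files/van_crime_2003.sql.
dump = """
CREATE TABLE van_crimes (TYPE TEXT, MONTH INTEGER, NEIGHBOURHOOD TEXT);
INSERT INTO van_crimes VALUES ('Mischief', 3, 'West End');
INSERT INTO van_crimes VALUES ('Theft of Vehicle', 1, 'Oakridge');
"""

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript(dump)  # runs every statement in the string, in order

# Confirm that both INSERT statements were applied.
row_count = demo_conn.execute("SELECT COUNT(*) FROM van_crimes").fetchone()[0]
print(row_count)
demo_conn.close()
```

Unlike `execute`, which accepts a single statement, `executescript` accepts several statements separated by semicolons, which is why it suits a dump file.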

In [234]:
c.description # column metadata of the last SELECT (None until one has been executed)
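
A small sketch of how `description` behaves, using a throwaway table `t` and a separate `demo_conn` connection that are not part of the exercise: the attribute is `None` on a fresh cursor and is filled in once a `SELECT` has run.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
cur = demo_conn.cursor()

before = cur.description  # no query has run on this cursor yet

cur.execute("SELECT a, b FROM t")
# Each entry of description is a 7-item tuple; the first item is the column name.
column_names = [col[0] for col in cur.description]

print(before, column_names)
demo_conn.close()
```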

First, we check the records by running a query against the `van_crimes` table with the `Cursor` object.

In [235]:
c.execute('SELECT * FROM van_crimes;') # selecting every row and column of the table

<sqlite3.Cursor at 0x1c9b37d4840>

In [236]:
results = c.fetchall() # fetching the results  

In [237]:
results

[('Theft from Vehicle',
  2003,
  6,
  28.0,
  13.0,
  30.0,
  '8XX EXPO BLVD',
  'Central Business District',
  491771.63,
  5458295.01),
 ('Theft from Vehicle',
  2003,
  11,
  17.0,
  16.0,
  0.0,
  '56XX OAK ST',
  'South Cambie',
  490682.32,
  5453536.96),
 ('Theft from Vehicle',
  2003,
  12,
  30.0,
  14.0,
  0.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft of Vehicle',
  2003,
  1,
  15.0,
  14.0,
  45.0,
  '6XX W 41ST AVE',
  'Oakridge',
  491372.94,
  5453422.83),
 ('Theft from Vehicle',
  2003,
  12,
  28.0,
  16.0,
  45.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft from Vehicle',
  2003,
  12,
  12.0,
  15.0,
  30.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft from Vehicle',
  2003,
  12,
  12.0,
  13.0,
  0.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft of Vehicle',
  2003,
  1,
  19.0,
  14.0,
  30.0,
  '6XX W 41ST AVE',
  ...

Since it returns a list of tuples, let's convert it into a `DataFrame` for easier viewing and exploration of the data.

In [238]:
van_crimes_df = pd.DataFrame(data=results, columns=[column[0] for column in c.description]) # creating a DataFrame from the structure of the table with 'c.description'
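
The same pattern on a self-contained toy table, as a sketch: the `crimes` table, its two rows and the `demo_*` names are invented for illustration. The rows come from `fetchall()` and the column labels from `description`.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE crimes (TYPE TEXT, MONTH INTEGER);
INSERT INTO crimes VALUES ('Mischief', 3), ('Theft of Vehicle', 1);
""")

cur = demo_conn.execute("SELECT * FROM crimes")
rows = cur.fetchall()                        # list of tuples, one per row
names = [col[0] for col in cur.description]  # column names of the query result
demo_df = pd.DataFrame(rows, columns=names)

print(demo_df)
demo_conn.close()
```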

Let's keep only the columns we are interested in (`TYPE`, `MONTH`, `DAY` and `NEIGHBOURHOOD`) and only the crime observations from `Stanley Park` or `West End`.

In [239]:
van_crimes_df = van_crimes_df[["TYPE", "MONTH", "DAY", "NEIGHBOURHOOD"]].loc[
    (van_crimes_df["NEIGHBOURHOOD"] == "Stanley Park") | (van_crimes_df["NEIGHBOURHOOD"] == "West End")
]
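
An alternative way to write this filter, shown here as a sketch on made-up data, is `Series.isin`, which avoids repeating one comparison per neighbourhood. The `demo_df` frame and its values are assumptions for the example only.

```python
import pandas as pd

# Invented sample data with one extra column (YEAR) that should be dropped.
demo_df = pd.DataFrame({
    "TYPE": ["Mischief", "Theft of Vehicle", "Mischief"],
    "MONTH": [3, 1, 7],
    "DAY": [2.0, 15.0, 9.0],
    "NEIGHBOURHOOD": ["West End", "Oakridge", "Stanley Park"],
    "YEAR": [2003, 2003, 2003],
})

wanted = ["Stanley Park", "West End"]
mask = demo_df["NEIGHBOURHOOD"].isin(wanted)  # True where the value is in the list

# Select rows and columns in a single .loc call.
subset = demo_df.loc[mask, ["TYPE", "MONTH", "DAY", "NEIGHBOURHOOD"]]
print(subset)
```

The original row labels are preserved, which is why the filtered DataFrame above keeps gaps in its index.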

In [240]:
van_crimes_df

| | TYPE | MONTH | DAY | NEIGHBOURHOOD |
|---|---|---|---|---|
| 2 | Theft from Vehicle | 12 | 30.0 | Stanley Park |
| 4 | Theft from Vehicle | 12 | 28.0 | Stanley Park |
| 5 | Theft from Vehicle | 12 | 12.0 | Stanley Park |
| 6 | Theft from Vehicle | 12 | 12.0 | Stanley Park |
| 9 | Theft from Vehicle | 11 | 5.0 | Stanley Park |
| ... | ... | ... | ... | ... |
| 241 | Break and Enter Residential/Other | 1 | 23.0 | West End |
| 242 | Break and Enter Residential/Other | 2 | 2.0 | West End |
| 243 | Break and Enter Residential/Other | 2 | 12.0 | West End |
| 245 | Break and Enter Residential/Other | 2 | 15.0 | West End |


Let's count the crimes by `TYPE` and store the result in a variable called `crime_types_count`.

In [241]:
crime_types_count = van_crimes_df.loc[:,"TYPE"].value_counts().to_frame()

In [242]:
crime_types_count

| TYPE | count |
|---|---|
| Theft from Vehicle | 31 |
| Break and Enter Residential/Other | 15 |
| Theft of Vehicle | 11 |
| Mischief | 7 |
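
A minimal sketch of what `value_counts` does, on an invented three-element Series: it tallies each distinct value and sorts the tallies in descending order.

```python
import pandas as pd

# Made-up values; the real column comes from van_crimes_df["TYPE"].
types = pd.Series(["Mischief", "Theft of Vehicle", "Mischief"], name="TYPE")

counts = types.value_counts()  # most frequent value first
print(counts.to_dict())
```

The name of the column produced by `to_frame()` depends on the pandas version; the `count` header shown above is what recent pandas releases (2.x) produce.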


---

### **With the `read_sql_query` function**

The pandas `read_sql_query` function executes a SQL query and returns the result as a **DataFrame**, so the column selection and filtering can be written in SQL rather than in pandas.

We can get the same result as above by querying the `van_crimes` table with the `IN` operator, which keeps only the rows whose `NEIGHBOURHOOD` is one of a set of values: `Stanley Park` or `West End`.

In [243]:
# SQL string literals use single quotes; double quotes are meant for identifiers
van_crimes_df = pd.read_sql_query("SELECT TYPE, MONTH, DAY, NEIGHBOURHOOD FROM van_crimes WHERE NEIGHBOURHOOD IN ('Stanley Park', 'West End')", conn)

In [244]:
van_crimes_df

| | TYPE | MONTH | DAY | NEIGHBOURHOOD |
|---|---|---|---|---|
| 0 | Theft from Vehicle | 12 | 30.0 | Stanley Park |
| 1 | Theft from Vehicle | 12 | 28.0 | Stanley Park |
| 2 | Theft from Vehicle | 12 | 12.0 | Stanley Park |
| 3 | Theft from Vehicle | 12 | 12.0 | Stanley Park |
| 4 | Theft from Vehicle | 11 | 5.0 | Stanley Park |
| ... | ... | ... | ... | ... |
| 59 | Break and Enter Residential/Other | 1 | 23.0 | West End |
| 60 | Break and Enter Residential/Other | 2 | 2.0 | West End |
| 61 | Break and Enter Residential/Other | 2 | 12.0 | West End |
| 62 | Break and Enter Residential/Other | 2 | 15.0 | West End |
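
When the neighbourhood names come from a variable rather than being typed into the query, they can be passed through the `params` argument of `read_sql_query` instead of being pasted into the SQL text. The sketch below assumes a toy `van_crimes` table with invented rows and a separate `demo_conn` connection; `?` is the placeholder style used by `sqlite3`.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE van_crimes (TYPE TEXT, NEIGHBOURHOOD TEXT);
INSERT INTO van_crimes VALUES
    ('Mischief', 'West End'),
    ('Theft of Vehicle', 'Oakridge'),
    ('Theft from Vehicle', 'Stanley Park');
""")

wanted = ["Stanley Park", "West End"]
placeholders = ", ".join("?" for _ in wanted)  # one "?" per value, e.g. "?, ?"
sql = (
    "SELECT TYPE, NEIGHBOURHOOD FROM van_crimes "
    f"WHERE NEIGHBOURHOOD IN ({placeholders})"
)

# The values are bound by the driver, not interpolated into the SQL string.
param_df = pd.read_sql_query(sql, demo_conn, params=wanted)
print(param_df)
demo_conn.close()
```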


Finally, we can count the crimes per `TYPE` directly in SQL, combining the `IN` operator with a `GROUP BY` clause, and store the result in a variable called `crime_types_count`.

In [245]:
crime_types_count = pd.read_sql_query("SELECT TYPE, COUNT(*) AS COUNT FROM van_crimes WHERE NEIGHBOURHOOD IN ('Stanley Park', 'West End') GROUP BY TYPE", conn).sort_values(by='COUNT', ascending=False)

In [246]:
crime_types_count

| | TYPE | COUNT |
|---|---|---|
| 2 | Theft from Vehicle | 31 |
| 0 | Break and Enter Residential/Other | 15 |
| 3 | Theft of Vehicle | 11 |
| 1 | Mischief | 7 |
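
The sorting can also be left to the database with an `ORDER BY` clause, which returns the rows already ordered and with a fresh 0-based index. This is a sketch on an invented three-row table; the alias `N` and the `demo_conn` name are assumptions for the example.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE van_crimes (TYPE TEXT, NEIGHBOURHOOD TEXT);
INSERT INTO van_crimes VALUES
    ('Mischief', 'West End'),
    ('Theft from Vehicle', 'West End'),
    ('Theft from Vehicle', 'Stanley Park');
""")

sql = """
SELECT TYPE, COUNT(*) AS N
FROM van_crimes
GROUP BY TYPE
ORDER BY N DESC
"""
sorted_counts = pd.read_sql_query(sql, demo_conn)  # rows arrive already sorted
print(sorted_counts)
demo_conn.close()
```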


In [247]:
# Good practice: always close the cursor and the connection when done, to free up resources.
c.close() # closing the cursor
conn.close() # closing the connection
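
Closing can also be made automatic. The sketch below uses `contextlib.closing` from the standard library on a throwaway connection (`demo_conn` is a name chosen for this example). Note that using a `sqlite3` connection directly in a `with` statement only manages the transaction; it does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, even if an exception is raised inside.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    with closing(demo_conn.cursor()) as cur:
        cur.execute("SELECT 1 + 1")
        value = cur.fetchone()[0]

# After the block, the connection can no longer be used.
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)
```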