## Vancouver crime information

#### **Instructions**

* Store all the crimes committed after 18:00 h in a `late_crimes` variable.
* Store the number of crimes committed in the month with the most crimes in a `dangerous_month_crimes` variable.

In [2]:
import pandas as pd
import sqlite3

In [3]:
conn = sqlite3.connect(':memory:') # In-memory database for testing

In [4]:
c = conn.cursor() # creating a cursor object to execute SQL commands

In [5]:
c.executescript(open('files/van_crime_2003.sql', 'r').read()) # restoring the given database dump

<sqlite3.Cursor at 0x23e275a1dc0>
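As a side note, the dump can also be loaded without leaving a file handle open. The following is a minimal sketch, assuming a tiny made-up dump (`demo_dump.sql`, `dump_sql`, and `demo_conn` are hypothetical names, and the table layout is abbreviated, not the real `van_crime_2003.sql` schema):

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical two-row dump standing in for files/van_crime_2003.sql
dump_sql = """
CREATE TABLE van_crimes (TYPE TEXT, YEAR INTEGER, MONTH INTEGER, HOUR REAL);
INSERT INTO van_crimes VALUES ('Mischief', 2003, 5, 21.0);
INSERT INTO van_crimes VALUES ('Theft of Vehicle', 2003, 1, 14.0);
"""

with tempfile.TemporaryDirectory() as tmp:
    dump_path = Path(tmp) / "demo_dump.sql"
    dump_path.write_text(dump_sql, encoding="utf-8")

    demo_conn = sqlite3.connect(":memory:")
    # read_text() opens, reads, and closes the file in one call
    demo_conn.executescript(dump_path.read_text(encoding="utf-8"))

# Confirm that the restore created the table and its rows
row_count = demo_conn.execute("SELECT COUNT(*) FROM van_crimes;").fetchone()[0]
print(row_count)
demo_conn.close()
```

`Connection.executescript` runs every statement in the script in order, which is why a single call is enough to rebuild the table and insert its rows.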

To store all the crimes committed after 18:00, we first inspect the records by running a query against the `van_crimes` table with the `Cursor` object.

In [6]:
c.execute('SELECT * FROM van_crimes;') # selecting every row of the table

<sqlite3.Cursor at 0x23e275a1dc0>

Let's fetch all the results of the previous query with the `fetchall()` method. We store them in a variable called `results` and display it to inspect the records.

In [7]:
results = c.fetchall() # fetching the results
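`fetchall()` returns every row of the result set. When only a quick preview is wanted, the query can be capped with `LIMIT`, or a few rows can be pulled with `fetchmany()`. This is a small sketch on a made-up table (the `demo_*` names and the three-column layout are assumptions for illustration only):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE van_crimes (TYPE TEXT, MONTH INTEGER, HOUR REAL)")
demo_cur.executemany(
    "INSERT INTO van_crimes VALUES (?, ?, ?)",
    [("Mischief", month, float(hour)) for month, hour in zip(range(1, 9), range(10, 18))],
)

# Option 1: let SQLite cap the result set at 5 rows
demo_cur.execute("SELECT * FROM van_crimes LIMIT 5;")
first_five = demo_cur.fetchall()

# Option 2: run the full query but pull only 5 rows from the cursor
demo_cur.execute("SELECT * FROM van_crimes;")
also_five = demo_cur.fetchmany(5)

# cursor.description holds one entry per column; the name is its first item
column_names = [col[0] for col in demo_cur.description]
print(len(first_five), len(also_five), column_names)
demo_conn.close()
```

The column names from `cursor.description` are handy here because the plain tuples returned by the cursor carry no labels.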

In [8]:
results

[('Theft from Vehicle',
  2003,
  6,
  28.0,
  13.0,
  30.0,
  '8XX EXPO BLVD',
  'Central Business District',
  491771.63,
  5458295.01),
 ('Theft from Vehicle',
  2003,
  11,
  17.0,
  16.0,
  0.0,
  '56XX OAK ST',
  'South Cambie',
  490682.32,
  5453536.96),
 ('Theft from Vehicle',
  2003,
  12,
  30.0,
  14.0,
  0.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft of Vehicle',
  2003,
  1,
  15.0,
  14.0,
  45.0,
  '6XX W 41ST AVE',
  'Oakridge',
  491372.94,
  5453422.83),
 ('Theft from Vehicle',
  2003,
  12,
  28.0,
  16.0,
  45.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft from Vehicle',
  2003,
  12,
  12.0,
  15.0,
  30.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft from Vehicle',
  2003,
  12,
  12.0,
  13.0,
  0.0,
  '85XX STANLEY PARK DR',
  'Stanley Park',
  489104.19,
  5460347.36),
 ('Theft of Vehicle',
  2003,
  1,
  19.0,
  14.0,
  30.0,
  '6XX W 41ST AVE',
  '

Since `fetchall()` returns a list of tuples, let's load the data into a `DataFrame` for easier viewing and exploration. We will use `pd.read_sql_query`, which runs the SQL query on the connection and returns the result as a `DataFrame`.

In [9]:
van_crimes_df = pd.read_sql_query('SELECT * FROM van_crimes;', conn) # loading the table into a pandas DataFrame

In [10]:
van_crimes_df.head() # checking the first 5 rows of the DataFrame

| | TYPE | YEAR | MONTH | DAY | HOUR | MINUTE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Theft from Vehicle | 2003 | 6 | 28.0 | 13.0 | 30.0 | 8XX EXPO BLVD | Central Business District | 491771.63 | 5458295.01 |
| 1 | Theft from Vehicle | 2003 | 11 | 17.0 | 16.0 | 0.0 | 56XX OAK ST | South Cambie | 490682.32 | 5453536.96 |
| 2 | Theft from Vehicle | 2003 | 12 | 30.0 | 14.0 | 0.0 | 85XX STANLEY PARK DR | Stanley Park | 489104.19 | 5460347.36 |
| 3 | Theft of Vehicle | 2003 | 1 | 15.0 | 14.0 | 45.0 | 6XX W 41ST AVE | Oakridge | 491372.94 | 5453422.83 |
| 4 | Theft from Vehicle | 2003 | 12 | 28.0 | 16.0 | 45.0 | 85XX STANLEY PARK DR | Stanley Park | 489104.19 | 5460347.36 |


In [11]:
van_crimes_df.info() # checking the info of the DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   TYPE           250 non-null    object 
 1   YEAR           250 non-null    int64  
 2   MONTH          250 non-null    int64  
 3   DAY            250 non-null    float64
 4   HOUR           245 non-null    float64
 5   MINUTE         245 non-null    float64
 6   HUNDRED_BLOCK  250 non-null    object 
 7   NEIGHBOURHOOD  245 non-null    object 
 8   X              250 non-null    float64
 9   Y              250 non-null    float64
dtypes: float64(5), int64(2), object(3)
memory usage: 19.7+ KB
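The `info()` output reports 245 non-null values out of 250 for `HOUR`, so five records have no recorded hour. The sketch below, built on a made-up four-row frame (`demo_df` and its values are assumptions, not rows from the dataset), shows how such missing values can be counted and how they behave in a comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one record has no recorded hour
demo_df = pd.DataFrame(
    {
        "TYPE": ["Mischief", "Theft of Vehicle", "Mischief", "Theft from Vehicle"],
        "HOUR": [13.0, np.nan, 21.0, 18.0],
    }
)

# Count the rows where HOUR is missing
missing_hours = int(demo_df["HOUR"].isna().sum())

# Any comparison against NaN evaluates to False,
# so rows with a missing HOUR never pass an hour-based filter
mask = demo_df["HOUR"] > 18
kept = demo_df[mask]

print(missing_hours)
print(mask.tolist())
```

This means an hour-based filter silently leaves out records with an unknown time, which is worth keeping in mind when reading the counts.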


Let's filter the records into a `late_crimes` variable by keeping the rows whose `HOUR` value is greater than 18. Note that this condition selects crimes from 19:00 onwards: rows with `HOUR` equal to 18 and rows with a missing `HOUR` are left out.

In [12]:
late_crimes = van_crimes_df[van_crimes_df['HOUR'] > 18] 

In [13]:
late_crimes

| | TYPE | YEAR | MONTH | DAY | HOUR | MINUTE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Theft of Vehicle | 2003 | 9 | 2.0 | 21.0 | 0.0 | 20XX E 28TH AVE | Kensington-Cedar Cottage | 495267.03 | 5454779.05 |
| 13 | Theft from Vehicle | 2003 | 9 | 27.0 | 22.0 | 30.0 | 85XX STANLEY PARK DR | Stanley Park | 489104.19 | 5460347.36 |
| 14 | Theft from Vehicle | 2003 | 12 | 17.0 | 21.0 | 0.0 | 31XX WILLOW ST | Fairview | 491115.72 | 5456039.96 |
| 18 | Theft from Vehicle | 2003 | 9 | 1.0 | 20.0 | 10.0 | 85XX STANLEY PARK DR | Stanley Park | 489104.19 | 5460347.36 |
| 19 | Theft from Vehicle | 2003 | 8 | 17.0 | 19.0 | 0.0 | 85XX STANLEY PARK DR | Stanley Park | 489104.19 | 5460347.36 |
| 29 | Theft of Vehicle | 2003 | 1 | 8.0 | 23.0 | 30.0 | 20XX E 20TH AVE | Kensington-Cedar Cottage | 495323.79 | 5455541.93 |
| 31 | Mischief | 2003 | 10 | 9.0 | 21.0 | 30.0 | 9XX BEATTY ST | Central Business District | 491591.17 | 5458195.67 |
| 34 | Theft from Vehicle | 2003 | 5 | 21.0 | 19.0 | 0.0 | 56XX OAK ST | South Cambie | 490682.32 | 5453536.96 |
| 39 | Theft of Vehicle | 2003 | 2 | 13.0 | 21.0 | 45.0 | 6XX W 41ST AVE | Oakridge | 491372.94 | 5453422.83 |
| 41 | Theft of Vehicle | 2003 | 12 | 30.0 | 21.0 | 0.0 | 20XX E 13TH AVE | Kensington-Cedar Cottage | 495346.79 | 5456204.45 |
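If "after 18:00" is read strictly, crimes at 18:01 to 18:59 would also qualify. This is one possible way to express that with the `MINUTE` column, shown on a made-up frame (`demo_df` and its values are hypothetical; whether the exercise expects this stricter reading is an assumption):

```python
import pandas as pd

# Hypothetical sample covering the boundary cases around 18:00
demo_df = pd.DataFrame(
    {
        "HOUR": [17.0, 18.0, 18.0, 19.0, float("nan")],
        "MINUTE": [59.0, 0.0, 30.0, 0.0, float("nan")],
    }
)

# Hour-only filter: keeps 19:00 onwards
hour_only = demo_df[demo_df["HOUR"] > 18]

# Strict filter: also keeps 18:01 to 18:59
after_six = (demo_df["HOUR"] > 18) | (
    (demo_df["HOUR"] == 18) & (demo_df["MINUTE"] > 0)
)
strictly_after = demo_df[after_six]

print(hour_only.index.tolist())
print(strictly_after.index.tolist())
```

The parentheses around each condition are required because `&` and `|` bind more tightly than the comparison operators in Python.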


Now let's find the month with the most crimes. To do that, we count the number of crimes in each month; the top month will then go into a variable called `dangerous_month_crimes`.

In [14]:
month_crimes = pd.read_sql_query(
    'SELECT MONTH, COUNT(*) AS CRIME_COUNT FROM van_crimes GROUP BY MONTH;',
    conn
)

The query groups the rows with `GROUP BY MONTH` and counts each group with `COUNT(*)`, so `month_crimes` holds one row per month together with its number of crimes. Let's display it.

In [15]:
month_crimes

| | MONTH | CRIME_COUNT |
|---|---|---|
| 0 | 1 | 23 |
| 1 | 2 | 19 |
| 2 | 3 | 17 |
| 3 | 4 | 19 |
| 4 | 5 | 31 |
| 5 | 6 | 27 |
| 6 | 7 | 17 |
| 7 | 8 | 14 |
| 8 | 9 | 24 |
| 9 | 10 | 19 |
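The same per-month tally can also be computed on the pandas side with `groupby`. Here is a minimal sketch on a made-up `MONTH` column (`demo_df` and its values are assumptions; on the real data the frame would be `van_crimes_df`):

```python
import pandas as pd

# Hypothetical sample: six crimes spread over three months
demo_df = pd.DataFrame({"MONTH": [5, 1, 5, 6, 5, 1]})

# size() counts the rows in each group, like COUNT(*) in SQL;
# reset_index(name=...) turns the result back into a two-column DataFrame
month_counts = demo_df.groupby("MONTH").size().reset_index(name="CRIME_COUNT")

print(month_counts)
```

`size()` is used rather than `count()` because it counts rows regardless of missing values, which matches what `COUNT(*)` does.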


With the per-month counts in a `DataFrame`, we sort by `CRIME_COUNT` in descending order and keep the first row. This stores the top month as a one-row `DataFrame` in `dangerous_month_crimes`, which we then display to check the result.

In [16]:
dangerous_month_crimes = month_crimes.sort_values('CRIME_COUNT', ascending=False).head(1)

In [17]:
dangerous_month_crimes

| | MONTH | CRIME_COUNT |
|---|---|---|
| 4 | 5 | 31 |


Month `5` (May) is the month with the most crimes, with 31 recorded.
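The instructions ask for the number of crimes, whereas the result above is a one-row `DataFrame`. If a plain integer is preferred, it could be pulled out as in the following sketch, which uses three of the counts shown in the table above (`demo_counts`, `top_month`, and `top_count` are hypothetical names):

```python
import pandas as pd

# Three rows copied from the per-month counts shown earlier
demo_counts = pd.DataFrame({"MONTH": [1, 2, 5], "CRIME_COUNT": [23, 19, 31]})

# idxmax() returns the index label of the largest CRIME_COUNT
top_row = demo_counts.loc[demo_counts["CRIME_COUNT"].idxmax()]

# Convert the NumPy integers to plain Python ints
top_month = int(top_row["MONTH"])
top_count = int(top_row["CRIME_COUNT"])

print(top_month, top_count)
```

If several months tie for the maximum, `idxmax()` returns the first one it encounters.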

In [18]:
c.close() # closing the cursor
conn.close() # closing the connection