# Get Missing Latitudes and Longitudes

In [1]:
%load_ext lab_black
%load_ext autoreload
%autoreload 2

In [2]:
import configparser
from datetime import datetime
from glob import glob

import pandas as pd
from dotenv import find_dotenv, load_dotenv
from sqlalchemy import create_engine

Import the custom helper function for geocoding (this function is explained in detail later in this notebook)

In [3]:
%aimport src.geopy_helpers
from src.geopy_helpers import geocode_missing_lat_lon

In [4]:
# Access `../.env` (Bing Maps API key) as an environment variable
load_dotenv(find_dotenv())

True

## About

This notebook will retrieve missing `latitude` and `longitude` co-ordinates for establishments that were inspected and have missing values in these two columns.

These co-ordinates are needed in order to aggregate statistics (such as crimes committed, population, etc.) for the neighbourhood containing each establishment that was inspected (see `4_get_stats_by_neighbourhood.ipynb`). These aggregated counts could then be used as features by an ML model.

## Connect to the MySQL Database

In [5]:
# Read `../sql.ini` (database connection details) using `configparser`
config = configparser.ConfigParser()
config.read("../sql.ini")
default_cfg = config["default"]
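
For reference, `configparser` expects an INI file with a `[default]` section holding the keys read in the next cell. The sketch below parses a hypothetical `sql.ini` from a string. Every value shown (driver, user name, password, host, port) is a placeholder assumption, not the project's real configuration.

```python
import configparser

# Hypothetical contents of `sql.ini`; all values are placeholders
SAMPLE_INI = """
[default]
DB_TYPE = mysql
DB_DRIVER = pymysql
DB_USER = my_user
DB_PASS = my_password
DB_HOST = localhost
DB_PORT = 3306
DB_NAME = dinesafe
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(SAMPLE_INI)
sample_cfg = sample_config["default"]

# Values are always returned as strings, including the port
print(sample_cfg["DB_PORT"], type(sample_cfg["DB_PORT"]).__name__)
# Option names are case-insensitive by default
print(sample_cfg["db_name"])
```

Because every value is a string, the port can be interpolated directly into a connection URI without conversion.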

In [6]:
DB_TYPE = default_cfg["DB_TYPE"]
DB_DRIVER = default_cfg["DB_DRIVER"]
DB_USER = default_cfg["DB_USER"]
DB_PASS = default_cfg["DB_PASS"]
DB_HOST = default_cfg["DB_HOST"]
DB_PORT = default_cfg["DB_PORT"]
DB_NAME = default_cfg["DB_NAME"]

In [7]:
# Build the database connection URI (required to perform CRUD operations and submit queries)
URI = f"{DB_TYPE}+{DB_DRIVER}://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
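
One caveat with building the URI through plain string interpolation is that a password containing characters such as `@` or `/` would be misread as part of the URI structure. A minimal sketch of one way to guard against this, using `urllib.parse.quote_plus` from the standard library, is shown below. The `build_uri()` helper and the credentials are hypothetical and are not part of this project.

```python
from urllib.parse import quote_plus


def build_uri(db_type, driver, user, password, host, port, name):
    """Build a database URI, percent-encoding the user name and password."""
    return (
        f"{db_type}+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


# Placeholder credentials; the "@" and "/" would otherwise break URI parsing
example_uri = build_uri(
    "mysql", "pymysql", "my_user", "p@ss/word", "localhost", "3306", "dinesafe"
)
print(example_uri)
```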

In [8]:
engine = create_engine(URI)
conn = engine.connect()

## Geocoding Latitude and Longitude

### Add latitude and longitude to Filtered and Aggregated Data

We'll load the transformed (filtered and aggregated) data (which contains one inspection per row)

In [9]:
%%time
# `glob()` does not guarantee ordering, so sort the matches before picking the last file
df = pd.read_csv(sorted(glob("data/processed/filtered_transformed_data__*.csv"))[-1])
df

CPU times: user 248 ms, sys: 13.6 ms, total: 262 ms
Wall time: 262 ms


Unnamed: 0,establishment_id,establishmenttype,establishment_address,inspection_id,inspection_date,infractions_summary,num_significant,num_crucial,num_minor,num_infractions,is_crucial
0,1222579,Food Take Out,870 MARKHAM RD,103015258,2013-06-26,FAIL TO PROVIDE TOWELS IN FOOD PREPARATION ARE...,4,4,8,16,1
1,1222579,Food Take Out,870 MARKHAM RD,103133558,2013-12-20,Food handler fail to wear headgear. Operator f...,0,0,6,6,0
2,1222579,Food Take Out,870 MARKHAM RD,103329697,2014-09-09,FAIL TO PROVIDE TOWELS IN FOOD PREPARATION ARE...,3,0,12,15,0
3,1222579,Food Take Out,870 MARKHAM RD,103420091,2015-01-08,Operator fail to properly wash equipment. Oper...,3,0,6,9,0
4,1222579,Food Take Out,870 MARKHAM RD,103868579,2016-12-21,Operator fail to properly wash equipment,0,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...
83697,10690581,Restaurant,3560 VICTORIA PARK AVE,104594294,2019-10-22,FAIL TO ENSURE EQUIPMENT SURFACE SANITIZED AS ...,0,0,3,3,0
83698,10690642,Bake Shop,20 ST PATRICK ST,104594681,2019-10-23,FAIL TO PROVIDE THERMOMETER IN REFRIGERATION E...,1,0,0,1,0
83699,10690660,Restaurant,549 BLOOR ST W,104594800,2019-10-23,FAIL TO MAINTAIN HANDWASHING STATIONS (LIQUID ...,1,0,1,2,0
83700,10690679,Food Take Out,1175 ST CLAIR AVE W,104594954,2019-10-23,SANITIZE UTENSILS IN WATER FOR LESS THAN 45 SE...,1,0,0,1,0


**Notes**
1. For each grouping of `establishment_id`, `establishmenttype` and `establishment_address`, we have the aggregated number of each type of infraction and the combined text of the details of all infractions. This dataset was created using a MySQL `GROUP BY` over these three columns in `2_sql_filter_transform.ipynb`.
2. This dataset does not contain the `latitude` and `longitude` columns for each inspection. We'll query the MySQL database to get the latitude and longitude for each grouping of `establishment_id`, `establishmenttype` and `establishment_address`. We can then merge that query output with this loaded dataset, on these three columns, and get the corresponding latitude and longitude for each row (inspection) in this data.

Write a MySQL query to get the latitude and longitude for each establishment address in the database

In [10]:
%%time
df_query = pd.read_sql(
    """
    SELECT establishment_id,
           establishmenttype,
           establishment_address,
           MAX(latitude) AS latitude,
           MAX(longitude) AS longitude
    FROM inspections
    GROUP BY establishment_id, establishmenttype, establishment_address
    """,
    con=conn,
)
df_query

CPU times: user 211 ms, sys: 22.6 ms, total: 233 ms
Wall time: 1.75 s


Unnamed: 0,establishment_id,establishmenttype,establishment_address,latitude,longitude
0,1222579,Food Take Out,870 MARKHAM RD,43.7680,-79.2290
1,1222580,Supermarket,1550 JANE ST,,
2,1222807,Restaurant,1635 LAWRENCE AVE W,43.7046,-79.4922
3,1223056,Restaurant,606 BROWNS LINE,43.6053,-79.5473
4,1223438,Food Take Out,500 REXDALE BLVD,43.7204,-79.6001
...,...,...,...,...,...
30285,10690642,Bake Shop,20 ST PATRICK ST,43.6509,-79.3890
30286,10690660,Restaurant,549 BLOOR ST W,43.6652,-79.4102
30287,10690679,Food Take Out,1175 ST CLAIR AVE W,43.6777,-79.4434
30288,10690680,Food Store (Convenience / Variety),155 WELLINGTON ST W,43.6458,-79.3858


**Notes**
1. This output has a single combination of `latitude` and `longitude` for each grouping of `establishment_id`, `establishmenttype` and `establishment_address`.
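
The reason `MAX()` yields a usable value here is that SQL aggregate functions skip `NULL`s: a group with at least one known coordinate returns it, while a group with only `NULL`s returns `NULL`. The sketch below illustrates this with toy rows. It uses an in-memory SQLite database as a stand-in for MySQL (an assumption made so that the example is self-contained), and a simplified `inspections` table.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE inspections (establishment_id INTEGER, latitude REAL, longitude REAL)"
)
# Establishment 1 has one known location; establishment 2 has none
conn_demo.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        (1, 43.7680, -79.2290),
        (1, None, None),
        (2, None, None),
        (2, None, None),
    ],
)
rows = conn_demo.execute(
    """
    SELECT establishment_id, MAX(latitude), MAX(longitude)
    FROM inspections
    GROUP BY establishment_id
    ORDER BY establishment_id
    """
).fetchall()
conn_demo.close()
print(rows)
```

Note that the two `MAX()` calls are evaluated independently, so this approach only returns a true co-ordinate pair when each group holds at most one distinct location.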

We will now merge these two datasets (`df_query` and `df`) using the `establishment_id`, `establishmenttype` and `establishment_address` columns in order to get the `latitude` and `longitude` for each row of the transformed data (`df`)

In [11]:
%%time
df_with_lat_lon = df.merge(
    df_query,
    on=["establishment_id", "establishmenttype", "establishment_address"],
    how="left",
)
df_with_lat_lon

CPU times: user 30.6 ms, sys: 6.12 ms, total: 36.7 ms
Wall time: 36 ms


Unnamed: 0,establishment_id,establishmenttype,establishment_address,inspection_id,inspection_date,infractions_summary,num_significant,num_crucial,num_minor,num_infractions,is_crucial,latitude,longitude
0,1222579,Food Take Out,870 MARKHAM RD,103015258,2013-06-26,FAIL TO PROVIDE TOWELS IN FOOD PREPARATION ARE...,4,4,8,16,1,43.7680,-79.2290
1,1222579,Food Take Out,870 MARKHAM RD,103133558,2013-12-20,Food handler fail to wear headgear. Operator f...,0,0,6,6,0,43.7680,-79.2290
2,1222579,Food Take Out,870 MARKHAM RD,103329697,2014-09-09,FAIL TO PROVIDE TOWELS IN FOOD PREPARATION ARE...,3,0,12,15,0,43.7680,-79.2290
3,1222579,Food Take Out,870 MARKHAM RD,103420091,2015-01-08,Operator fail to properly wash equipment. Oper...,3,0,6,9,0,43.7680,-79.2290
4,1222579,Food Take Out,870 MARKHAM RD,103868579,2016-12-21,Operator fail to properly wash equipment,0,0,1,1,0,43.7680,-79.2290
...,...,...,...,...,...,...,...,...,...,...,...,...,...
83697,10690581,Restaurant,3560 VICTORIA PARK AVE,104594294,2019-10-22,FAIL TO ENSURE EQUIPMENT SURFACE SANITIZED AS ...,0,0,3,3,0,43.8060,-79.3375
83698,10690642,Bake Shop,20 ST PATRICK ST,104594681,2019-10-23,FAIL TO PROVIDE THERMOMETER IN REFRIGERATION E...,1,0,0,1,0,43.6509,-79.3890
83699,10690660,Restaurant,549 BLOOR ST W,104594800,2019-10-23,FAIL TO MAINTAIN HANDWASHING STATIONS (LIQUID ...,1,0,1,2,0,43.6652,-79.4102
83700,10690679,Food Take Out,1175 ST CLAIR AVE W,104594954,2019-10-23,SANITIZE UTENSILS IN WATER FOR LESS THAN 45 SE...,1,0,0,1,0,43.6777,-79.4434
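
A left merge like the one above should keep exactly one output row per inspection, which holds only if the right-hand table has unique keys. A minimal sketch of how this could be checked with the `validate` argument of `DataFrame.merge()` follows. The toy frames and their column subset are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for `df` (one row per inspection) and `df_query`
# (one row per establishment); values are made up
inspections = pd.DataFrame(
    {
        "establishment_id": [1, 1, 2],
        "inspection_id": [101, 102, 103],
    }
)
coordinates = pd.DataFrame(
    {
        "establishment_id": [1, 2],
        "latitude": [43.7680, None],
        "longitude": [-79.2290, None],
    }
)

merged = inspections.merge(
    coordinates,
    on="establishment_id",
    how="left",
    validate="many_to_one",  # raises MergeError if the right side has duplicate keys
)

# A left merge on a unique right-hand key preserves the row count
assert len(merged) == len(inspections)
print(merged)
```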


### Get Addresses with a Missing Latitude or Longitude

Some but not all inspections are missing information in these two columns. Instead of geocoding all the addresses in the above data (which will involve unnecessary calls to an API), we will now get the unique addresses for which the `latitude` and `longitude` are missing

In [12]:
df_addr_lat_lon = (
    df_with_lat_lon.query("latitude.isnull() | longitude.isnull()")
    .groupby("establishment_address", as_index=False)[["latitude", "longitude"]]
    .max()
)
df_addr_lat_lon

Unnamed: 0,establishment_address,latitude,longitude
0,1 BALMORAL AVE,,
1,1 BAXTER ST,,
2,1 BLUE JAYS WAY,,
3,1 BYNG AVE,,
4,1 CENTRE ISLAND PK,,
...,...,...,...
3989,995 PAPE AVE,,
3990,996 QUEEN ST W,,
3991,997 EGLINTON AVE W,,
3992,998 ST CLAIR AVE W,,


**Notes**
1. We only need to geocode these addresses and then join back with `df_with_lat_lon` in order to fill in the missing `latitude`s and `longitude`s there.

The addresses above are missing the name of the city, province and country, which are needed to allow for accurate geocoding. We'll now append a suffix to the `establishment_address` column with this information

In [13]:
unique_addresses_missing_lat_lon = (
    df_addr_lat_lon["establishment_address"].str.title() + ", Toronto, ON, Canada"
)
unique_addresses_missing_lat_lon.rename("address").to_frame()

Unnamed: 0,address
0,"1 Balmoral Ave, Toronto, ON, Canada"
1,"1 Baxter St, Toronto, ON, Canada"
2,"1 Blue Jays Way, Toronto, ON, Canada"
3,"1 Byng Ave, Toronto, ON, Canada"
4,"1 Centre Island Pk, Toronto, ON, Canada"
...,...
3989,"995 Pape Ave, Toronto, ON, Canada"
3990,"996 Queen St W, Toronto, ON, Canada"
3991,"997 Eglinton Ave W, Toronto, ON, Canada"
3992,"998 St Clair Ave W, Toronto, ON, Canada"
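
Since the geocoded results later need to be joined back on the original upper-case `establishment_address`, it helps to confirm that this transformation is reversible. The sketch below mirrors the title-casing and suffix in plain Python (pandas' `.str.title()` follows the same rules as `str.title()`). The two helper functions are hypothetical names introduced only for this illustration.

```python
SUFFIX = ", Toronto, ON, Canada"


def to_geocoding_address(establishment_address):
    """Title-case an address and append the city, province and country."""
    return establishment_address.title() + SUFFIX


def to_establishment_address(geocoding_address):
    """Reverse `to_geocoding_address()` for an upper-case source address."""
    return geocoding_address.replace(SUFFIX, "").upper()


original = "998 ST CLAIR AVE W"
full = to_geocoding_address(original)
print(full)
print(to_establishment_address(full) == original)

# `str.title()` capitalizes any letter that follows a non-letter
print("21ST AVE".title())
```

The round trip only recovers the original when the source address is entirely upper-case, as it is in this dataset's output shown above.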


### Prepare Database Table to Append Geocoded Data

The geocoded data will be stored locally in a database. We'll now create the `addressinfo` table in the `dinesafe` database

In [14]:
# _ = conn.execute("DROP TABLE IF EXISTS addressinfo")

In [15]:
# create_table_query = """
#                      CREATE TABLE IF NOT EXISTS addressinfo (
#                          address TEXT,
#                          neighbourhood TEXT,
#                          locality TEXT,
#                          formattedAddress TEXT,
#                          postalCode TEXT,
#                          latitude FLOAT,
#                          longitude FLOAT
#                      )
#                      """
# _ = conn.execute(create_table_query)

Close the database connection and dispose of the SQLAlchemy engine

In [16]:
conn.close()
engine.dispose()

**Note**
1. Geocoding is done with the Bing Maps API. Per [Bing Maps FAQ](https://www.microsoft.com/en-us/maps/faq/) (see *What is the policy on caching data?*), the geocoded attributes will **only** be stored locally in this database so they can be used in this analysis. After completion of this analysis, the entire database with the geocoded data will be deleted. Geocoded data will not be stored elsewhere.

### Geocode Addresses

Next, the addresses with a missing `latitude` or `longitude` will be geocoded using the `geopy` Python library with the [Bing Geocoder](https://geopy.readthedocs.io/en/stable/#bing). This is done using the helper function `geocode_missing_lat_lon()` from `src/geopy_helpers.py`. The full contents of this module are shown below

```python
import os
from random import randint
from time import sleep

import pandas as pd
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Bing
from sqlalchemy import create_engine


def run_bing_geocoder(row_number, street_address):
    """Geocode a single street address."""
    # Set up the Bing Geocoder
    geolocator = Bing(os.getenv("BING_MAPS_KEY"))

    # Perform geocoding
    try:
        # Geocode a single street address
        location = geolocator.geocode(
            street_address, include_neighborhood=True, exactly_one=True
        )
        # Get the street address key from the .raw attribute of the geocoded
        # output
        address_components = location.raw["address"]
        # Get the neighbourhood (if available)
        neighbourhood = address_components.get("neighborhood")
        # Get the locality (if available)
        locality = address_components.get("locality")
        # Get the latitude and longitude coordinates
        lat, lon = location.raw["point"]["coordinates"]
        # Store geocoded output in a dictionary
        record = {
            "address": street_address,
            "neighbourhood": neighbourhood,
            "locality": locality,
            "formattedAddress": address_components["formattedAddress"]
            if "formattedAddress" in address_components
            else None,
            "postalCode": address_components["postalCode"]
            if "postalCode" in address_components
            else None,
            "latitude": lat,
            "longitude": lon,
        }
        print(f"{row_number}: Geocode completed for {street_address}", end="")
    except GeocoderTimedOut as e:
        # If geocoding did not work, create dictionary with None for each key
        # in the dictionary where geocoding was successful
        print(
            "{} - Error: geocode failed on input {} with message {}".format(
                row_number, street_address, e
            )
        )
        record = {
            "address": street_address,
            "neighbourhood": None,
            "locality": None,
            "formattedAddress": None,
            "postalCode": None,
            "latitude": None,
            "longitude": None,
        }
    return record


def geocode_missing_lat_lon(
    unique_addresses_missing_lat_lon,
    db_table_name=None,
    uri=None,
    min_delay_seconds=5,
    max_delay_seconds=10,
):
    """Geocode a column with one or more street addresses."""
    engine = create_engine(uri)
    conn = engine.connect()
    # Iterate over all street addresses to be geocoded
    for row_num, street_address in unique_addresses_missing_lat_lon.items():
        # Clean the street address
        street_address_clean = street_address.replace("'", "\\'")
        # Query local database for existing record with street address
        df_query = pd.read_sql(
            f"""
            SELECT COUNT(*) AS num_matching_street_addresses
            FROM {db_table_name}
            WHERE address = '{street_address_clean}'
            """,
            con=conn,
        )
        # If geocoded output is not available in local database, then perform
        # geocoding for street address
        if df_query["num_matching_street_addresses"].iloc[0] == 0:
            # Geocode
            geocoded_output = run_bing_geocoder(row_num, street_address)
            # Pause
            print("...Pausing...", end="")
            sleep(randint(min_delay_seconds, max_delay_seconds))
            print("Done.")
            # Convert dictionary of geocoded outputs to DataFrame
            df_geocoded = pd.DataFrame.from_dict(
                geocoded_output, orient="index"
            ).T.astype({"latitude": float, "longitude": float})
            # Append DataFrame of geocoded outputs to database
            df_geocoded.to_sql(
                name=db_table_name, con=conn, index=False, if_exists="append"
            )
        else:
            # If geocoded output is available in local database, then do not
            # geocode the same street address
            print(
                f"{row_num}: Found existing record for {street_address}. "
                "Did nothing."
            )
    conn.close()
    engine.dispose()
```

**Notes**
1. `geocode_missing_lat_lon()` iterates over every unique address to be geocoded. For each address, it first checks the specified table (here, `addressinfo`) in the local `dinesafe` database. If the address has been previously geocoded, then `geocode_missing_lat_lon()` skips it in order to prevent unnecessary calls to the Bing Maps API. Otherwise, it calls `run_bing_geocoder()`, which performs the geocoding and returns a dictionary of location attributes including the latitude and longitude. `geocode_missing_lat_lon()` then pauses for a random delay, converts the returned dictionary into a single-row `DataFrame` and appends this `DataFrame` to the table.
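
The check-then-geocode pattern described above can be sketched in isolation as follows. This is a simplified illustration and not the project's implementation: it uses an in-memory SQLite table with fewer columns instead of MySQL, a hypothetical `fake_geocoder()` in place of the Bing geocoder, and bound query parameters instead of manual quote escaping.

```python
import sqlite3


def fake_geocoder(address):
    """Hypothetical stand-in for the Bing geocoder; returns fixed values."""
    return {"address": address, "latitude": 43.65, "longitude": -79.38}


def geocode_if_missing(conn, address, geocoder, calls):
    """Geocode `address` only if it is not already stored in `addressinfo`."""
    # A bound parameter (?) avoids manual escaping of quotes in the address
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM addressinfo WHERE address = ?", (address,)
    ).fetchone()
    if count > 0:
        return False
    record = geocoder(address)
    calls.append(address)
    conn.execute(
        "INSERT INTO addressinfo VALUES (:address, :latitude, :longitude)", record
    )
    return True


conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE addressinfo (address TEXT, latitude REAL, longitude REAL)"
)
calls = []
# The second address is made up to include an apostrophe
addresses = ["1 Baxter St, Toronto, ON, Canada", "1 O'Connor Dr, Toronto, ON, Canada"]
# The second pass over the same addresses should hit the table, not the geocoder
for address in addresses + addresses:
    geocode_if_missing(conn_demo, address, fake_geocoder, calls)
n_rows = conn_demo.execute("SELECT COUNT(*) FROM addressinfo").fetchone()[0]
conn_demo.close()
print(len(calls), n_rows)
```

Storing each result as soon as it is retrieved means an interrupted run can be resumed without repeating API calls for addresses that were already processed.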

In [17]:
%%time
geocode_missing_lat_lon(unique_addresses_missing_lat_lon, "addressinfo", URI, 1, 3)

0: Found existing record for 1 Balmoral Ave, Toronto, ON, Canada. Did nothing.
1: Found existing record for 1 Baxter St, Toronto, ON, Canada. Did nothing.
2: Found existing record for 1 Blue Jays Way, Toronto, ON, Canada. Did nothing.
3: Found existing record for 1 Byng Ave, Toronto, ON, Canada. Did nothing.
4: Found existing record for 1 Centre Island Pk, Toronto, ON, Canada. Did nothing.
5: Found existing record for 1 De Boers Dr, Toronto, ON, Canada. Did nothing.
6: Found existing record for 1 Dundas St W, Toronto, ON, Canada. Did nothing.
7: Found existing record for 1 Eastdale Ave, Toronto, ON, Canada. Did nothing.
8: Found existing record for 1 Eglinton Sq, Toronto, ON, Canada. Did nothing.
9: Found existing record for 1 Ellesmere Rd, Toronto, ON, Canada. Did nothing.
10: Found existing record for 1 Harbour Sq, Toronto, ON, Canada. Did nothing.
11: Found existing record for 1 High Meadow Pl, Toronto, ON, Canada. Did nothing.
12: Found existing record for 1 Mount Pleasant Rd, Toro

Connect to the MySQL database

In [18]:
engine = create_engine(URI)
conn = engine.connect()

Show all records where the geocoder failed to retrieve one or more of the following attributes: postal code, locality, formatted address, latitude or longitude

In [19]:
%%time
df_query = pd.read_sql(
    """
    SELECT *
    FROM addressinfo
    WHERE postalCode IS NULL
    OR locality IS NULL
    OR formattedAddress IS NULL
    OR latitude IS NULL
    OR longitude IS NULL
    """,
    con=conn,
)
df_query

CPU times: user 0 ns, sys: 2.17 ms, total: 2.17 ms
Wall time: 4.5 ms


Unnamed: 0,address,neighbourhood,locality,formattedAddress,postalCode,latitude,longitude


**Observations**
1. There are no records in this table with a missing value in any of the specified geocoding attribute columns.

**Notes**
1. During the initial run of geocoding, a few addresses could not be geocoded completely and so had to be retried using the steps below (see the comments for explanatory details)
   ```python
   # 0. Create list of incompletely geocoded addresses
   incomplete_addresses = [
       '1922 Queen St E, Toronto, ON, Canada'
   ]

   # 1. Delete rows from database table with incompletely geocoded addresses
   for incomplete_address in incomplete_addresses:
       _ = conn.execute(
           f"DELETE FROM addressinfo WHERE address = '{incomplete_address}'"
       )

   # 2. Create a new data structure with addresses for which geocoding will be retried
   unique_addresses_missing_lat_lon = pd.Series(incomplete_addresses)
   print(unique_addresses_missing_lat_lon)
   # > 0          1922 Queen St E, Toronto, ON, Canada
   #   dtype: object

   # 3. Re-run geocoding
   geocode_missing_lat_lon(unique_addresses_missing_lat_lon, "addressinfo", URI, 1, 3)
   ```
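
Steps 0 and 1 of the retry procedure could also be driven by the database itself rather than a hand-written list. The sketch below is one possible approach under stated assumptions: it uses an in-memory SQLite table with a reduced set of columns as a stand-in for the MySQL `addressinfo` table, and the postal code shown is a placeholder.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE addressinfo (address TEXT, postalCode TEXT, latitude REAL, longitude REAL)"
)
# One complete record (placeholder postal code) and one incomplete record
conn_demo.executemany(
    "INSERT INTO addressinfo VALUES (?, ?, ?, ?)",
    [
        ("1 Baxter St, Toronto, ON, Canada", "A1A 1A1", 43.6753, -79.3886),
        ("1922 Queen St E, Toronto, ON, Canada", None, None, None),
    ],
)

# 0. Find incompletely geocoded addresses
incomplete_filter = "postalCode IS NULL OR latitude IS NULL OR longitude IS NULL"
incomplete_addresses = [
    row[0]
    for row in conn_demo.execute(
        f"SELECT address FROM addressinfo WHERE {incomplete_filter}"
    )
]

# 1. Delete those rows; bound parameters (?) keep quotes from breaking the SQL
conn_demo.executemany(
    "DELETE FROM addressinfo WHERE address = ?",
    [(address,) for address in incomplete_addresses],
)
remaining = conn_demo.execute("SELECT COUNT(*) FROM addressinfo").fetchone()[0]
conn_demo.close()
print(incomplete_addresses, remaining)
```

The resulting `incomplete_addresses` list could then be wrapped in a `pd.Series` and passed to `geocode_missing_lat_lon()` as in steps 2 and 3.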

## Replace Missing Latitude and Longitude with Geocoded Values

Finally, we can replace the missing `latitude` and `longitude` with the values retrieved from geocoding.

First, we'll query the database table with the geocoding records to get the unique geocoded addresses and their latitude and longitude

In [20]:
%%time
df_query = pd.read_sql(
    """
    SELECT UCASE(REPLACE(address, ', Toronto, ON, Canada', '')) AS establishment_address,
           latitude AS latitude_geo,
           longitude AS longitude_geo
    FROM addressinfo
    """,
    con=conn,
)
df_query

CPU times: user 55.6 ms, sys: 0 ns, total: 55.6 ms
Wall time: 56.5 ms


Unnamed: 0,establishment_address,latitude_geo,longitude_geo
0,1 BALMORAL AVE,43.6856,-79.3932
1,1 BAXTER ST,43.6753,-79.3886
2,1 BLUE JAYS WAY,43.6417,-79.3892
3,1 BYNG AVE,43.7766,-79.4141
4,1 CENTRE ISLAND PK,43.6152,-79.3773
...,...,...,...
3989,996 QUEEN ST W,43.6444,-79.4185
3990,997 EGLINTON AVE W,43.6996,-79.4313
3991,998 ST CLAIR AVE W,43.6795,-79.4372
3992,999 EGLINTON AVE W,43.6996,-79.4314


**Notes**
1. The `latitude` and `longitude` columns contain the suffix `_geo` to indicate they came from geocoding.

Now, we'll merge this with the transformed data containing the latitude and longitude columns (`df_with_lat_lon`)

In [21]:
df_with_lat_lon_filled = df_with_lat_lon.merge(
    df_query, on=["establishment_address"], how="left"
)

We'll now replace missing values in the `latitude` and `longitude` columns with the geocoded values (respective columns ending with the suffix `_geo`)

In [22]:
df_with_lat_lon_filled["latitude"] = df_with_lat_lon_filled["latitude"].fillna(
    df_with_lat_lon_filled["latitude_geo"]
)
df_with_lat_lon_filled["longitude"] = df_with_lat_lon_filled["longitude"].fillna(
    df_with_lat_lon_filled["longitude_geo"]
)

Now, we will drop the unwanted geocoded `latitude` and `longitude` columns (ending with the suffix `_geo`)

In [23]:
df_with_lat_lon_filled = df_with_lat_lon_filled.drop(
    columns=["latitude_geo", "longitude_geo"]
)
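
The merge, fill and drop steps above can be seen end to end on a two-row toy example. This is a minimal sketch in which the frame names `toy` and `toy_geo` are introduced only for illustration. It shows that `fillna()` leaves existing co-ordinates untouched and only fills the gaps.

```python
import pandas as pd

# Toy data; the second row is missing its co-ordinates
toy = pd.DataFrame(
    {
        "establishment_address": ["870 MARKHAM RD", "1 BAXTER ST"],
        "latitude": [43.7680, None],
        "longitude": [-79.2290, None],
    }
)
toy_geo = pd.DataFrame(
    {
        "establishment_address": ["1 BAXTER ST"],
        "latitude_geo": [43.6753],
        "longitude_geo": [-79.3886],
    }
)

filled = toy.merge(toy_geo, on="establishment_address", how="left")
for col in ["latitude", "longitude"]:
    # Existing values win; only NaN entries take the geocoded value
    filled[col] = filled[col].fillna(filled[f"{col}_geo"])
filled = filled.drop(columns=["latitude_geo", "longitude_geo"])
print(filled)
```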

As we can see, there are now no missing values in the `latitude` and `longitude` columns

In [24]:
display(df_with_lat_lon_filled)
display(df_with_lat_lon_filled.isna().sum().rename("missing_values").to_frame())

Unnamed: 0,establishment_id,establishmenttype,establishment_address,inspection_id,inspection_date,infractions_summary,num_significant,num_crucial,num_minor,num_infractions,is_crucial,latitude,longitude
0,1222579,Food Take Out,870 MARKHAM RD,103015258,2013-06-26,FAIL TO PROVIDE TOWELS IN FOOD PREPARATION ARE...,4,4,8,16,1,43.7680,-79.2290
1,1222579,Food Take Out,870 MARKHAM RD,103133558,2013-12-20,Food handler fail to wear headgear. Operator f...,0,0,6,6,0,43.7680,-79.2290
2,1222579,Food Take Out,870 MARKHAM RD,103329697,2014-09-09,FAIL TO PROVIDE TOWELS IN FOOD PREPARATION ARE...,3,0,12,15,0,43.7680,-79.2290
3,1222579,Food Take Out,870 MARKHAM RD,103420091,2015-01-08,Operator fail to properly wash equipment. Oper...,3,0,6,9,0,43.7680,-79.2290
4,1222579,Food Take Out,870 MARKHAM RD,103868579,2016-12-21,Operator fail to properly wash equipment,0,0,1,1,0,43.7680,-79.2290
...,...,...,...,...,...,...,...,...,...,...,...,...,...
83697,10690581,Restaurant,3560 VICTORIA PARK AVE,104594294,2019-10-22,FAIL TO ENSURE EQUIPMENT SURFACE SANITIZED AS ...,0,0,3,3,0,43.8060,-79.3375
83698,10690642,Bake Shop,20 ST PATRICK ST,104594681,2019-10-23,FAIL TO PROVIDE THERMOMETER IN REFRIGERATION E...,1,0,0,1,0,43.6509,-79.3890
83699,10690660,Restaurant,549 BLOOR ST W,104594800,2019-10-23,FAIL TO MAINTAIN HANDWASHING STATIONS (LIQUID ...,1,0,1,2,0,43.6652,-79.4102
83700,10690679,Food Take Out,1175 ST CLAIR AVE W,104594954,2019-10-23,SANITIZE UTENSILS IN WATER FOR LESS THAN 45 SE...,1,0,0,1,0,43.6777,-79.4434


Unnamed: 0,missing_values
establishment_id,0
establishmenttype,0
establishment_address,0
inspection_id,0
inspection_date,0
infractions_summary,0
num_significant,0
num_crucial,0
num_minor,0
num_infractions,0


We'll now export this to a CSV file so that we have access to the transformed data, with latitude and longitude columns that don't contain missing values, for further analysis

In [25]:
%%time
time_now = datetime.now().strftime("%Y%m%d_%H%M%S")
df_with_lat_lon_filled.to_csv(
    f"data/processed/filtered_transformed_filledmissing_data__{time_now}.csv",
    index=False,
)

CPU times: user 594 ms, sys: 15.9 ms, total: 610 ms
Wall time: 611 ms
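
A timestamp in `%Y%m%d_%H%M%S` form sorts lexicographically in chronological order, which lets a later step pick the newest export by sorting the matching file names. The sketch below demonstrates this with made-up timestamps. No files are read or written.

```python
from datetime import datetime

# Made-up export times, deliberately out of order
stamps = [
    datetime(2020, 1, 5, 9, 30, 0),
    datetime(2019, 12, 31, 23, 59, 59),
    datetime(2020, 1, 5, 14, 0, 0),
]
filenames = [
    f"filtered_transformed_filledmissing_data__{s.strftime('%Y%m%d_%H%M%S')}.csv"
    for s in stamps
]
# Sorting the names as plain strings puts the most recent export last
latest = sorted(filenames)[-1]
print(latest)
```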


In the next notebook (`4_get_stats_by_neighbourhood.ipynb`), we will
- use the `geopandas` library to determine the name of the neighbourhood containing each establishment in the above exported inspections data
- aggregate population, crimes and land area by neighbourhood and append these columns of aggregated counts to each inspection

## Disconnect from the MySQL Database

Close the database connection and dispose of the SQLAlchemy engine

In [26]:
conn.close()
engine.dispose()