# [Get the data](#get-the-data)

In [None]:
%load_ext nb_black
%load_ext autoreload
%autoreload 2

In [None]:
from pathlib import Path

import pandas as pd

In [None]:
%aimport src.data_helpers
from src.data_helpers import get_data, get_shapefiles, get_geojson_files

<a id="toc"></a>

## [Table of Contents](#table-of-contents)
0. [About](#about)
1. [User Inputs](#user-inputs)
2. [Get crime listings data](#get-crime-listings-data)
3. [Get weather data](#get-weather-data)
4. [Get Chicago boundary shapefiles and `geojson` files](#get-chicago-boundary-shapefiles-and-geojson-files)

<a id="about"></a>

## 0. [About](#about)

In this notebook, we will retrieve the following data:
- crime listings data from the Chicago open data portal
- weather data for the city of Chicago from the NOAA database, as recorded at the [O'Hare Airport station in the city](https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USW00094846/detail)
- [boundary files](https://en.wikipedia.org/wiki/GIS_file_formats#Vector) for the city of Chicago, which allow us to produce [choropleth maps](https://en.wikipedia.org/wiki/Choropleth_map)

<a id="user-inputs"></a>

## 1. [User Inputs](#user-inputs)

Below, we define the variables that are used throughout the notebook. The helper functions were imported from `src.data_helpers` above.

In [None]:
data_dir = str(Path.cwd() / "data" / "raw")
crime_data_urls = {"2018": "3i3m-jwuy", "2019": "w98m-zvie"}
crime_data_prefix = (
    "https://data.cityofchicago.org/api/views/{}/rows.csv?accessType=DOWNLOAD"
)
shapefiles = {
    "Boundaries - Police Beats (current).zip": "https://data.cityofchicago.org/api/geospatial/aerh-rz74?method=export&format=Shapefile",
    "Boundaries - Community Areas (current).zip": "https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=Shapefile",
    "Boundaries - Neighborhoods.zip": "https://data.cityofchicago.org/api/geospatial/bbvz-uum9?method=export&format=Shapefile",
}

geojsonfiles = {
    "Boundaries - Community Areas (current).geojson": "https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON",
    "Boundaries - Neighborhoods.geojson": "https://data.cityofchicago.org/api/geospatial/bbvz-uum9?method=export&format=GeoJSON",
    "Boundaries - CPD districts.geojson": "https://data.cityofchicago.org/api/geospatial/7hhi-ktqw?method=export&format=GeoJSON",
}
force_download_crime_data = True
force_download_shape_files = True
force_download_geojson_files = True

In [None]:
data_dir = Path(data_dir)

<a id="get-crime-listings-data"></a>

## 2. [Get crime listings data](#get-crime-listings-data)

We will begin by retrieving crime listings data from the [Chicago open data portal](https://data.cityofchicago.org/browse?limitTo=datasets) for the years [2018](https://data.cityofchicago.org/Public-Safety/Crimes-2018/3i3m-jwuy) and [2019](https://data.cityofchicago.org/Public-Safety/Crimes-2019/w98m-zvie).

In [None]:
for year, dataset_id in crime_data_urls.items():
    # The URL template has a single placeholder, which takes the dataset ID
    file_url = crime_data_prefix.format(dataset_id)
    get_data(
        file_path=data_dir / f"Crimes_-_{year}.csv",
        url=file_url,
        msg=f"Downloading crime data for {year} from {file_url}...",
        force_download=force_download_crime_data,
    )
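
`get_data` is imported from `src.data_helpers`, whose source is not shown in this notebook. As a rough illustration of what the `force_download` flag suggests, the sketch below skips the download when the target file already exists. The name `download_if_missing` and its `fetch` parameter are hypothetical, and the real helper may behave differently.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(file_path, url, force_download=False, fetch=urlretrieve):
    """Hypothetical stand-in for get_data; returns True if a download ran."""
    file_path = Path(file_path)
    # Reuse an existing file unless the caller explicitly forces a refresh
    if file_path.exists() and not force_download:
        print(f"Found {file_path.name}; skipping download.")
        return False
    # Make sure the target folder (e.g. data/raw) exists before writing
    file_path.parent.mkdir(parents=True, exist_ok=True)
    print(f"Downloading {url} to {file_path}...")
    fetch(url, str(file_path))
    return True
```

Passing the download function in as `fetch` is only a convenience here: it lets the skip logic be exercised without network access.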

<a id="get-weather-data"></a>

## 3. [Get weather data](#get-weather-data)

Next, we'll retrieve weather data for the city of Chicago, as recorded at the Chicago O'Hare weather station, from the NOAA website. This is done manually through the website.

1. First, we'll load the NOAA page for the [Chicago O'Hare weather station](https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USW00094846/detail).
2. Next, we'll request the station's GHCN (Global Historical Climatology Network) daily weather data.
   - As we are making a (free) purchase from the NOAA website, we first add the station record to our shopping cart by clicking the **ADD TO CART** button. We then open the shopping cart, make the following selections from the drop-down menus and text input box, and click **Continue**
     - Output format
       - select "Custom GHCN-Daily CSV"
     - Date Range (click on the dropdown menu)
       - select all dates from January 1, 2018 to the current date of October 12, 2019
3. Next, from the Custom Options available, we'll tick the following checkboxes and then click **Continue**
   - Precipitation
   - Air Temperature
   - Wind
   - Weather Type
4. Finally, on the Review Order page, we'll enter our email address into the two required fields and click **Submit Order**.
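
Once the order has been delivered, the CSV can be read with `pandas`. The sketch below is a minimal example under stated assumptions: the column names (`STATION`, `NAME`, `DATE`, `PRCP`, `TMAX`, `TMIN`) are assumed to match the export, the rows are made-up placeholders rather than real observations, and `load_weather` is a hypothetical helper that is not part of `src.data_helpers`.

```python
import io

import pandas as pd

# Made-up placeholder rows in the layout assumed for the GHCN-Daily CSV export
sample_csv = io.StringIO(
    "STATION,NAME,DATE,PRCP,TMAX,TMIN\n"
    'USW00094846,"CHICAGO OHARE INTERNATIONAL AIRPORT, IL US",2018-01-02,0.00,10,-5\n'
    'USW00094846,"CHICAGO OHARE INTERNATIONAL AIRPORT, IL US",2018-01-01,0.00,1,-9\n'
)


def load_weather(source):
    """Read a daily weather CSV and index it by date (hypothetical helper)."""
    df = pd.read_csv(source, parse_dates=["DATE"])
    # Sort by date so that later joins against daily crime counts are predictable
    return df.set_index("DATE").sort_index()


weather = load_weather(sample_csv)
```

In practice, `source` would be the path of the downloaded file inside `data_dir` instead of the in-memory sample.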

<a id="get-chicago-boundary-shapefiles-and-geojson-files"></a>

## 4. [Get Chicago boundary shapefiles and `geojson` files](#get-chicago-boundary-shapefiles-and-geojson-files)

Finally, we'll download various boundary files for the city of Chicago from the city's open data portal.

To do this, we'll loop over all boundary files listed in the `shapefiles` and `geojsonfiles` dictionaries and
1. download the zipped shapefile or the `geojson` file
2. (for `shapefiles`) create a dedicated folder for each type of boundary file
3. (for `shapefiles`) [unzip with Python](https://stackoverflow.com/a/3451150/4057186) into that folder

We'll start by retrieving the `shapefiles`.

In [None]:
get_shapefiles(
    data_dir=data_dir, shapefiles=shapefiles, force_download=force_download_shape_files
)
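
The folder creation and unzip steps described for `shapefiles` can be sketched with the standard-library `zipfile` module. This is an illustration only: `unzip_to_folder` is a hypothetical name, and naming the folder after the archive is an assumption, so `get_shapefiles` may organize its output differently.

```python
import zipfile
from pathlib import Path


def unzip_to_folder(zip_path):
    """Extract an archive into a folder named after it (hypothetical helper)."""
    zip_path = Path(zip_path)
    # "Boundaries - Neighborhoods.zip" -> folder "Boundaries - Neighborhoods"
    target_dir = zip_path.with_suffix("")
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir
```

Keeping each archive in its own folder avoids mixing the files that make up a shapefile across different boundary types.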

Next, we'll retrieve the `geojson` files.

In [None]:
get_geojson_files(
    data_dir=data_dir,
    geojsonfiles=geojsonfiles,
    force_download=force_download_geojson_files,
)
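
As a quick sanity check after the download, a `geojson` file can be opened with the standard-library `json` module. The sketch below assumes that each downloaded file is a GeoJSON `FeatureCollection`, which should be verified against the actual files; `count_features` is a hypothetical helper.

```python
import json


def count_features(geojson_path):
    """Return the number of features in a GeoJSON FeatureCollection file."""
    with open(geojson_path, encoding="utf-8") as f:
        data = json.load(f)
    # Fail loudly if the file is not the kind of GeoJSON object we assumed
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"{geojson_path} is not a FeatureCollection")
    return len(data["features"])
```

For example, `count_features(data_dir / "Boundaries - Neighborhoods.geojson")` would report how many boundary polygons were downloaded for that file.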