# Prepare Data for Analysis and Visualization 
- Download data sources 
- Clean data 
- Integrate into final dataset for exploration

In [1]:
import sys
import os
import pandas as pd
import time

In [2]:
# add the parent directory to sys.path so the wacollisions package can be imported
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [4]:
from wacollisions.read_clean_integrate_data import *

## Collision Data 
- The collision data for Seattle is from [Seattle GIS Open Data](https://data-seattlecitygis.opendata.arcgis.com/datasets/collisions/data?geometry=-122.526%2C47.676%2C-122.198%2C47.717).
- The data is fairly clean at the source. We added some indicator variables for our visualizations and dropped columns to reduce the size of the data.
- The following code downloads the data from Seattle Open Data and writes it to the wa_collisions/wacollisions/data folder. Collisions.csv is listed in the .gitignore and is not included in the repository.
- We do include a test file, wa_collisions/wacollisions/Collisions_test.csv, which is a subset of the file downloaded from Seattle Open Data and can be used for testing.
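
Re-running the download on every execution may be slow, so it can help to skip it when the file is already on disk. The following is a minimal sketch, not part of the `wacollisions` package: `download_if_missing` is a hypothetical helper, and its `fetch` parameter is an assumption added so the logic can be exercised without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url, dest, fetch=urlretrieve):
    """Download url to dest unless dest already exists.

    Returns True if a download was performed, False if the file was reused.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    # Create the target folder (for example the data folder) if needed.
    dest.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(dest))
    return True
```

Called with the CSV URL and the Collisions.csv path used in the next cell, this would fetch the file only on the first run.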

In [5]:
# download the data and save it to the data folder
collision_data_download = pd.read_csv('https://opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv')
collision_path = '../wacollisions/data/Collisions.csv'
collision_data_download.to_csv(collision_path)

## Seattle Neighborhood Data 
- Download the Seattle neighborhood shapefile from https://data.seattle.gov/dataset/Neighborhoods/2mbt-aqqx as a zip file.
- Use https://ogre.adc4gis.com/ to convert the downloaded zip file into a GeoJSON file.
- Save the .json file as 'wacollisions/data/Neighborhoods/Neighborhoods.json'.

*Note: This repository already contains the converted JSON file.*
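
After converting, a quick check that the file parses as GeoJSON can catch a bad conversion early. This is a minimal sketch using only the standard library; `summarize_geojson` is a hypothetical helper, and expecting a top-level `FeatureCollection` is an assumption about the converter's output.

```python
import json


def summarize_geojson(path):
    """Return (top-level type, number of features) for a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # A converted shapefile is normally a FeatureCollection holding a list of features.
    features = data.get("features", [])
    return data.get("type"), len(features)
```

A result whose type is not `FeatureCollection`, or whose feature count is zero, suggests the conversion should be repeated.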

In [None]:
geo_json_path = '../wacollisions/data/Neighborhoods/Neighborhoods.json'

## Weather Data 
- The weather data is sourced from [Iowa State University ASOS](https://mesonet.agron.iastate.edu/request/download.phtml?network=WA_ASOS) ('automated airport weather observations').
- Download the data from Iowa State using the following prompts:
    - Select the Network "Washington ASOS"
    - Select data from the following 4 airport weather stations near the city of Seattle:
         1. [BFI] SEATTLE/BOEING FIELD
         2. [PAE] EVERETT/PAINE FIELD
         3. [RNT] RENTON MUNICIPAL
         4. [SEA] SEATTLE-TACOMA INTL
    - Select all available attributes
    - Select date range of data: Jan 1, 2014 - May 14, 2018
    - Select timezone: America/Los_Angeles(WST/WDT)
    - Select download options: 
         1. Comma Delimited format
         2. Include Latitude and Longitude
         3. Save result data to file on computer
    - Select Limit Report Types to include all types
    - Download the data
- We saved the data as "asos.txt" in the wa_collisions/wacollisions/data folder. The "asos.txt" file is listed in the .gitignore and is not included in the repository.
- We also included a test file: wa_collisions/wacollisions/Weather_test.csv which is a subset of the original dataset. This can be used for testing purposes.
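
Because the download involves several manual selections, it may be worth confirming that all four stations made it into the file. The sketch below is illustrative only: `count_station_rows` and `missing_stations` are hypothetical helpers, and both the `station` column name and the handling of leading `#` comment lines are assumptions about the export format that should be checked against the actual file.

```python
import csv
from collections import Counter

# The four airport stations selected in the download steps above.
EXPECTED_STATIONS = {"BFI", "PAE", "RNT", "SEA"}


def count_station_rows(path, station_column="station"):
    """Count data rows per station in a comma-delimited ASOS export."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        # Skip any comment lines that start with '#' (assumed, may not be present).
        rows = (line for line in f if not line.startswith("#"))
        for record in csv.DictReader(rows):
            counts[record[station_column].strip()] += 1
    return counts


def missing_stations(counts):
    """Return the expected stations that have no rows at all."""
    return EXPECTED_STATIONS - set(counts)
```

An empty set from `missing_stations` indicates every selected station is present.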

In [None]:
weather_path = '../wacollisions/data/asos.txt'

## Create the Final Dataset
- Now that the collision, neighborhood, and weather data are downloaded, you can create an integrated dataset. This dataset is used in the visualization and analysis modules.
- A single call to `integrate_data` builds it. The final dataset is written to wacollisions/data/.
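
Since the full collision and weather files are not in the repository, it may be worth confirming that all three inputs exist before starting the integration. A minimal sketch; `find_missing_inputs` is a hypothetical helper, not part of `wacollisions`.

```python
from pathlib import Path


def find_missing_inputs(paths):
    """Return the paths that do not exist as files, in input order."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

Passing `[collision_path, weather_path, geo_json_path]` returns an empty list when everything is in place; otherwise it names the files that still need to be downloaded.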

In [None]:
t0 = time.time()
final_data = integrate_data(collision_path, 2014, weather_path, geo_json_path)
t1 = time.time()
print(f"time to prepare the dataset: {t1 - t0:.1f} seconds")
final_path = '../wacollisions/data/Collisions_Combined.csv'
final_data.to_csv(final_path)