<IMG SRC="https://github.com/jacquesroy/byte-size-data-science/raw/master/images/Banner.png" ALT="BSDS Banner" WIDTH=1195 HEIGHT=200>

<table align="left">
    <tr><td>
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a></td><td>This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.</td>
    </tr>
    <tr><td>Jacques Roy, Byte Size Data Science</td><td> </td></tr>
    </table>

# Displaying Spatial Data
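Here we use the Chicago traffic crash data and display the accident locations in several ways: as a scatter plot, as k-means clusters on a Folium map, and with PixieDust.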


### 044-Introduction to Spatial Data
Execute the next cell if you want to watch the related video on the `Byte Size Data Science` YouTube channel.

In [None]:
from IPython.display import IFrame

IFrame(src="https://www.youtube.com/embed/A0rjUgDGo88?rel=0&controls=0&showinfo=0", width=560, height=315)


## Read the data

In [None]:
import urllib.request
import zipfile

import pandas as pd

url = 'https://github.com/jacquesroy/byte-size-data-science/raw/master/data/ChicagoTrafficCrashes20180917.csv.zip'
# Name of the downloaded archive: "ChicagoTrafficCrashes20180917.csv.zip"
zip_filename = url.rsplit('/', 1)[-1]
# Name of the CSV file inside the archive: "ChicagoTrafficCrashes20180917.csv"
csv_filename = zip_filename.rsplit('.', 1)[0]

# Download the archive, then read the CSV straight out of it
urllib.request.urlretrieve(url, zip_filename)
with zipfile.ZipFile(zip_filename) as compressed_file:
    with compressed_file.open(csv_filename) as csv_file:
        collisions_pd = pd.read_csv(csv_file)

print("Number of records: {}".format(collisions_pd['RD_NO'].count()))
collisions_pd.head(1)
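As an alternative to opening the archive with `zipfile`, pandas can decompress a ZIP archive that contains a single CSV file by itself: with the default `compression='infer'`, `pd.read_csv` picks ZIP from a `.zip` extension. The sketch below builds a tiny throwaway archive so that it runs offline; the file name `sample.csv.zip` and its rows are made up for illustration. With the real data, the same call should work on the downloaded archive.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical sample rows standing in for the real crash records
sample_csv = "RD_NO,LATITUDE,LONGITUDE\nA1,41.88,-87.63\nA2,41.79,-87.60\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "sample.csv.zip")
    # Write a ZIP archive containing exactly one CSV member
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("sample.csv", sample_csv)
    # compression='infer' (the default) detects ZIP from the .zip extension
    sample_pd = pd.read_csv(zip_path)

print("Number of records: {}".format(sample_pd['RD_NO'].count()))
```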

In [None]:
# Extract the spatial information
location_pd = collisions_pd[['LATITUDE', 'LONGITUDE']].dropna()
print('Number of accidents with location: ' + str(location_pd.LATITUDE.count()))
location_pd.head(10)
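`dropna()` only removes missing coordinates. If some records carry coordinates outside Chicago, a bounding-box filter built with `Series.between` keeps only the points inside a given rectangle. This is a sketch under assumptions: the bounds are simply the axis limits used in the scatter plot of this notebook, and the small DataFrame (including its `(0, 0)` row) is made-up example data, not a statement about what the real file contains.

```python
import pandas as pd

# Bounds borrowed from the xlim/ylim of the scatter plot (roughly the Chicago area)
LON_MIN, LON_MAX = -87.92, -87.52
LAT_MIN, LAT_MAX = 41.64, 42.03

# Hypothetical example rows: the third mimics a (0, 0) placeholder, the fourth lacks a latitude
example_pd = pd.DataFrame({
    'LATITUDE': [41.88, 41.79, 0.0, None],
    'LONGITUDE': [-87.63, -87.60, 0.0, -87.70],
})

def in_chicago_box(df):
    """Drop missing coordinates, then keep rows inside the bounding box (bounds inclusive)."""
    df = df[['LATITUDE', 'LONGITUDE']].dropna()
    mask = (df.LATITUDE.between(LAT_MIN, LAT_MAX)
            & df.LONGITUDE.between(LON_MIN, LON_MAX))
    return df[mask]

filtered_pd = in_chicago_box(example_pd)
print('Rows kept: ' + str(len(filtered_pd)))
```

Applied to `collisions_pd`, `in_chicago_box` would replace the plain `dropna()` call.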

## Display the accident locations using matplotlib
We plot longitude and latitude as (X,Y) coordinates on an ordinary graph. This is not a map: there is no base layer and no map projection behind the points.

In [None]:
import matplotlib.pyplot as plt

%matplotlib inline

# Scatter plot of every accident location; small, transparent points reveal density
plt.figure(figsize=(15,10))
plt.scatter(location_pd.LONGITUDE, location_pd.LATITUDE, alpha=0.05, s=4, color='darkseagreen')

# Title, axis limits that frame the Chicago area, and axis labels
plt.title('Motor Vehicle Collisions in Chicago', size=25)
plt.xlim((-87.92,-87.52))
plt.ylim((41.64,42.03))
plt.xlabel('Longitude',size=20)
plt.ylabel('Latitude',size=20)

plt.show()
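Because the graph treats degrees as plain (X,Y) units, east-west distances come out stretched: at Chicago's latitude (about 41.8°N) one degree of longitude covers only about cos(41.8°) ≈ 0.75 of the ground distance of one degree of latitude. A common quick fix, sketched below, is to set the axes aspect ratio to 1/cos(latitude), which approximates an equirectangular projection around the mean latitude of the data. The random points are placeholders for `location_pd`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder points in the Chicago area standing in for location_pd
rng = np.random.default_rng(0)
lats = rng.uniform(41.64, 42.03, 500)
lons = rng.uniform(-87.92, -87.52, 500)

# One degree of longitude spans cos(latitude) times the ground distance of one degree of latitude
mean_lat = lats.mean()
aspect = 1 / np.cos(np.radians(mean_lat))

fig, ax = plt.subplots(figsize=(8, 10))
ax.scatter(lons, lats, alpha=0.3, s=4, color='darkseagreen')
# Display one latitude unit `aspect` times longer than one longitude unit,
# so equal screen distances mean roughly equal ground distances
ax.set_aspect(aspect)
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
print('Aspect ratio used: {:.2f}'.format(aspect))
```

Over an area the size of a city this approximation is usually good enough; a real map library such as Folium handles the projection properly.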

## Cluster accidents
We group the accident locations into 100 clusters with k-means and count how many accidents fall into each cluster.

In [None]:
from sklearn.cluster import KMeans

In [None]:
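# This takes a while to execute since k-means has to go through the 220 thousand points multiple times
import numpy as np

# K-means clustering on (longitude, latitude)
k = 100
model = KMeans(n_clusters=k, random_state=0)  # fixed seed so the clusters are reproducible
kmeans = model.fit(location_pd[['LONGITUDE', 'LATITUDE']])
# Number of accidents assigned to each cluster
vals = np.bincount(kmeans.labels_, minlength=k)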
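The reason k-means has to go through all the points multiple times lies in the structure of the classic algorithm (Lloyd's algorithm): it alternates an assignment step, where every point is matched to its nearest center, and an update step, where every center moves to the mean of its points, until the centers stop moving. The NumPy sketch below illustrates only that loop; scikit-learn's `KMeans` adds k-means++ initialization and other optimizations, so this is not a reimplementation of the library. The function name `lloyd_kmeans` and the toy points are hypothetical.

```python
import numpy as np

def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on an (n, 2) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: squared distance from every point to every center, shape (n, k)
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy points forming two well-separated groups
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
toy_centers, toy_labels = lloyd_kmeans(toy_points, k=2)
toy_counts = np.bincount(toy_labels, minlength=2)
print(sorted(toy_counts.tolist()))
```

Each iteration touches every point once per center, which is why the run time grows with both the number of points and `k`. The broadcasting step also allocates an n × k × 2 array, so this toy version is not meant for the full data set.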

In [None]:
# Create a pandas DataFrame for display: cluster center coordinates and accident count
d = {'longitude': kmeans.cluster_centers_[:,0], 'latitude': kmeans.cluster_centers_[:,1], 'total' : vals}
k_pd = pd.DataFrame(data=d)

In [None]:
k_pd.head()
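The same per-cluster table can also be built with a pandas `groupby` on the cluster labels, which makes it easy to rank the clusters by accident count and find the busiest areas. In this sketch the points and labels are made-up stand-ins for `location_pd` and `kmeans.labels_`. The mean of each group is the centroid of that cluster, which should closely match `kmeans.cluster_centers_` once k-means has converged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for location_pd and kmeans.labels_
points_pd = pd.DataFrame({
    'LATITUDE': [41.88, 41.89, 41.87, 41.75, 41.76],
    'LONGITUDE': [-87.63, -87.62, -87.64, -87.60, -87.61],
})
example_labels = np.array([0, 0, 0, 1, 1])

# One row per cluster: centroid coordinates and number of accidents
clusters_pd = (
    points_pd.assign(cluster=example_labels)
    .groupby('cluster')
    .agg(latitude=('LATITUDE', 'mean'),
         longitude=('LONGITUDE', 'mean'),
         total=('LATITUDE', 'size'))
)

# Clusters with the most accidents first
ranked_pd = clusters_pd.sort_values('total', ascending=False)
print(ranked_pd)
```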

## Use a map
We use Folium to display the cluster centers on an interactive map, with the number of accidents in each cluster shown as a tooltip.

In [None]:
!pip install folium
import folium

In [None]:
# Center the map on the average accident location (look up the mean by column name)
latlong = location_pd.mean()
chi_map = folium.Map(location=[latlong['LATITUDE'], latlong['LONGITUDE']], zoom_start=11, width="80%", height="80%")
incidents = folium.map.FeatureGroup()
for lat, lng, tot in zip(k_pd.latitude, k_pd.longitude, k_pd.total):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            tooltip=str(tot),
            fill_opacity=0.6
        )
    )
chi_map.add_child(incidents)
chi_map
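All markers above share the same radius, so the accident count is visible only in the tooltip. One option, sketched here as an assumption rather than part of the original notebook, is to scale each marker so that its *area* is proportional to the cluster total, which means the radius grows with the square root of the count. `marker_radius`, `max_radius` and `min_radius` are hypothetical names.

```python
import math

def marker_radius(total, max_total, max_radius=20.0, min_radius=2.0):
    """Radius in pixels so that circle area is proportional to `total`."""
    if max_total <= 0:
        return min_radius
    # Area grows with radius**2, so radius grows with sqrt(total / max_total)
    radius = max_radius * math.sqrt(total / max_total)
    # Keep very small clusters visible on the map
    return max(min_radius, radius)

print([marker_radius(t, 100) for t in (100, 25, 0)])
```

In the Folium cell, the fixed `radius=5` would then become `radius=marker_radius(tot, k_pd.total.max())`. Scaling by area rather than radius avoids making large clusters look disproportionately bigger than they are.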

## Use PixieDust
PixieDust is another library that can display data on maps.

**It requires at least a free Mapbox or Google account.**

In [None]:
# PixieDust is an open source library that was contributed by IBM
!pip install --user --upgrade pixiedust

In [None]:
import pixiedust

In [None]:
display(k_pd)