# LEGALST-123 Folium Heatmaps Lab

---

In this lab, students will learn how to construct a heatmap, as well as an interactive heatmap. This will also be a component of the take-home problem set. This builds on top of the folium labs from last week.


In [1]:
# dependencies
# from datascience import *
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import folium
import json
import os

In [2]:
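!pip install folium --upgrade
# Note: if folium was already imported before upgrading, the imports below may fail
# until you restart the kernel and rerun the notebook.
import folium.plugins # The Folium Javascript Map Library
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime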

Collecting folium
  Downloading https://files.pythonhosted.org/packages/72/ac/59de211624b5c89337a79e118f886072d47f195745df15a39c6c9933beba/folium-0.8.0-py2.py3-none-any.whl (87kB)
    100% |████████████████████████████████| 92kB 3.4MB/s
Installing collected packages: folium
  Found existing installation: folium 0.7.0
    Uninstalling folium-0.7.0:
      Successfully uninstalled folium-0.7.0
Successfully installed folium-0.8.0


ImportError: cannot import name 'deep_copy'

---

## The Data <a id='data'></a>
---

Today we'll be working with data on Berkeley crime calls, courtesy of the Berkeley Police Department. Take a look at the metadata [here](https://data.cityofberkeley.info/Public-Safety/Berkeley-PD-Calls-for-Service/k2nh-s5h5).

Note: this data set has already undergone a fair amount of cleaning to format it for our purposes (e.g. extracting the longitude and latitude, removing null values, and dropping irrelevant columns). You can see the original data at the source website.

Then, run the cell below to load the data into a pandas DataFrame.

### need to get out of Tables and into Pandas syntax

In [None]:
calls = pd.read_csv('data/berkeley_crime_0218.csv', index_col=0)
calls.head(5)

When working with any new data set, it's a good idea to get to know it first. Use the following cell and the information on data.cityofberkeley.info (linked above) to answer some basic questions:
- What information does this table contain? What are the different columns?
- How large is the data set? 
- What kinds of questions could we answer using this data set?

In [None]:
# what are dimensions of dataframe
print('shape of dataframe as rows, columns is ',calls.shape)
# what are the columns
print('variables: ', calls.columns)

## Heatmap <a id='data'></a>

Let's see if we can figure out what `HeatMap` does and why it is useful. But first, we're going to quickly review how to use `folium.Map`. Again, you should consult the [folium quickstart guide](https://python-visualization.github.io/folium/quickstart.html) for a refresher in case you forget how folium works!

Plot a map of the United States again using folium.Map.

<b>Reminder</b>: Coordinates are given in the order lat, lon, and the larger `zoom_start` is, the closer the map is zoomed in.

In [None]:
# First, we create a folium Map
example_map1 = folium.Map([39.83, -98.59], zoom_start=6, tiles='Stamen Toner')
example_map1

### Key Note <br> _(we need to de- Table this part as well)_

Heatmaps do not take Tables so you will need to provide a list of lat, lons, i.e. a list of lists. 

Imagine that it looks something like this: `[[lat, lon],[lat, lon],[lat, lon],[lat, lon],[lat, lon]]`. This means if you were given a Table, there are a few steps you'd have to take.

1. Make sure the lat and lon are floats.
2. Filter the Table for the correct rows and columns.

What is something else you believe you'll need to check for to make sure that Heatmap will work?

**check for missing data!**

Our data set today has already had the NaNs filtered out, but that might not be true for data you work with in the future...
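
The checks listed above can be sketched in pandas roughly as follows. This is only a minimal illustration on a small made-up DataFrame: the name `toy`, its values, and the extra `Other` column are hypothetical, and the column names `Lat` and `Lon` are assumed to match the ones used later in this lab.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Berkeley calls): one row is missing its
# latitude, and the coordinates are stored as strings, as in a messy CSV.
toy = pd.DataFrame({
    'Lat': ['37.87', np.nan, '37.86'],
    'Lon': ['-122.27', '-122.26', '-122.25'],
    'Other': ['a', 'b', 'c'],
})

# 1. Keep only the coordinate columns and convert them to floats.
#    Anything that cannot be parsed becomes NaN.
coords = toy[['Lat', 'Lon']].apply(pd.to_numeric, errors='coerce')

# 2. Drop every row with a missing latitude or longitude.
coords = coords.dropna()

# 3. Convert to a list of [lat, lon] lists.
toy_points = coords.values.tolist()
print(toy_points)
```

Only the two complete rows survive, and each one is a `[lat, lon]` pair of floats.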

Run the next cell to generate a set of dummy `[[lat, lon]]` pairs for the HeatMap. Don't worry about the information itself. Instead, note how the array is formatted.

In [None]:
# The first two lines generate an array of small random numbers.
# The third line adds the random numbers to the pair [39.83, -98.59] to get 100 latitude, longitude pairs near [39.83, -98.59]
data = (np.random.normal(size=(100, 2)) *
        np.array([[1, 1]]) +
        np.array([[39.83, -98.59]])).tolist()
data

Then we can plot it on the map! The process is pretty simple: 
1. Create a HeatMap using the constructor `HeatMap(your_lat_lon_data)`
2. Add that HeatMap to your existing map with `.add_to(your_map)`

In [None]:
# Add the HeatMap to the map
HeatMap(data).add_to(example_map1)

example_map1

Play around with your new Heatmap. What is it plotting? What kinds of things would a Heatmap be useful for?

**This heatmap is plotting a set of points normally distributed around the geographic center of the United States.**

### Try It Out

Now, try making your own Heatmap using the Berkeley PD call data. First, plot a Folium Map of the Bay Area, just like you did last week.

In [None]:
#Plot the map of Berkeley
berk_coords = [37.8716, -122.2727]
berk_map = folium.Map(berk_coords, zoom_start=13, tiles='Stamen Toner')
berk_map

Next, extract your latitude and longitude data from the `calls` DataFrame and save each to the variables `lat` and `lon`. Index the DataFrame by the correct column name (e.g. `calls["Column_I_Want"]`); this returns a pandas Series, which NumPy functions such as `np.vstack` accept just like an array.

In [None]:
lat = calls['Lat']
lon = calls['Lon']
lat

We have the right data, but it isn't in the right shape: we want an array of arrays, where the first column is latitudes, the second column is longitudes, and each row is a `[lat, lon]` pair (see the example above). We can do this by:
1. **Stacking** the `lat` array on top of the `lon` array into one larger array with `np.vstack`
2. **Transposing** our stacked array so the latitude and longitude are vertical columns, not horizontal rows.

Hint 1: the stacking function call looks something like `np.vstack((top_array, bottom_array))`

Hint 2: you can transpose an array by calling `.transpose()` on the array

In [None]:
# it seems like since we are using Pandas we could just do this from the original dataframe
call_locs = np.vstack((lat, lon)).transpose()
call_locs[:4]
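
As the comment in the cell above suggests, the same pairs can probably be built straight from the DataFrame. The sketch below compares the two approaches on a made-up stand-in for `calls` (the name `toy_calls` and its coordinates are hypothetical); it is one possible shortcut, not a required part of the lab.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `calls` DataFrame (made-up coordinates).
toy_calls = pd.DataFrame({
    'Lat': [37.87, 37.86, 37.88],
    'Lon': [-122.27, -122.26, -122.28],
})

# Approach used in the lab: stack the two columns, then transpose.
stacked = np.vstack((toy_calls['Lat'], toy_calls['Lon'])).transpose()

# Pandas shortcut: select both columns at once and take the values.
direct = toy_calls[['Lat', 'Lon']].values

# Both give one [lat, lon] row per call.
print(np.array_equal(stacked, direct))
print(direct.tolist())
```

Either array can be turned into a plain list of lists with `.tolist()`.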

Now, you have everything you need to make your HeatMap! Do so in the cell below.

In [None]:
# Create a HeatMap from the call data and add it to your Berkeley map.
heatmap = HeatMap(call_locs).add_to(berk_map)

# Display the map.
berk_map

What conclusions can you draw from this Heatmap?

**The heatmap shows a bunch of things. First, the calls seem to be recorded by intersection location, more or less. Second, the calls are concentrated along the main streets, especially south of campus, and at major intersections along University and San Pablo. The more residential parts of Berkeley both north and south are pretty quiet, except for a few hot spots (like California at Derby-Ward or thereabouts). Third, there are very few calls in North Berkeley and the Hills, except at Solano and Colusa and Marin and Euclid--traffic calls?**

## HeatMapWithTime <a id='data'></a>

Now what do you think is different with HeatMapWithTime?

**I suppose it shows some kind of animation that shows frequency with color changes. Fancy!**

In this example, we'll again use dummy data to show how it works. It follows a similar process to HeatMap. First, create another Folium Map centered at the geographical center of the USA.<br><br>
## We don't really need this second USA example map since the example heatmap with time is of Western Europe

In [None]:
# Create a folium Map at the USA's center
example_map2 = folium.Map([39.83, -98.59], zoom_start=6, tiles='Stamen Toner')
example_map2

Next, we will create more dummy location data to simulate locations associated with different dates. Don't worry too much about the code here, but you do need to understand how the output is shaped and why it needs to be shaped like that.

In [None]:
# This cell builds an array of initial data to display on our HeatMapWithTime. Just as before, these are dummy
# variables: 100 random points scattered around [48, 5] in Western Europe, meant to simulate different locations in the area.
# Again, we have to use lat and lon, this time in addition to time.
np.random.seed(3141592)
initial_data = (
    np.random.normal(size=(100, 2)) * np.array([[1, 1]]) +
    np.array([[48, 5]])
)

# Create even more random lat/lon pairs and group into 100 lists
# You don't need to know how to write this code
move_data = np.random.normal(size=(100, 2)) * 0.01

data = [(initial_data + move_data * i).tolist() for i in range(100)]
data[1]
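
To make the shape of this output concrete, the sketch below rebuilds the same kind of dummy data and inspects each level of nesting. The names `start`, `drift`, `frames`, `n_points` and `n_steps` are illustrative choices for this example only.

```python
import numpy as np

np.random.seed(3141592)
n_points, n_steps = 100, 100  # same sizes as the cell above

# Starting positions near [48, 5], plus a small per-step drift.
start = np.random.normal(size=(n_points, 2)) + np.array([[48, 5]])
drift = np.random.normal(size=(n_points, 2)) * 0.01
frames = [(start + drift * i).tolist() for i in range(n_steps)]

# Outer list: one entry per time step.
print(len(frames))
# Middle list: one entry per point within that time step.
print(len(frames[0]))
# Inner list: a single [lat, lon] pair.
print(len(frames[0][0]))
```

In other words, the data is a list of time steps, each time step is a list of points, and each point is a `[lat, lon]` pair. HeatMapWithTime draws one frame per time step, which is why the locations have to be grouped this way.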

Since we're using HeatMapWithTime, we need an extra parameter: the dates for each list of lat/lon pairs. Run the next cell to create one.

In [None]:
# Generate a set of dates for this dummy data.
# Luckily for us, when you try this out yourself, the dates will come with your data set.
# You don't need to write out this code, but do look it over and see if you can understand it.
from datetime import datetime, timedelta

time_index = [
    (datetime.now() + k * timedelta(1)).strftime('%Y-%m-%d') for
    k in range(len(data))
]
print('first 5 elements of time_index: ', time_index[:5])
type(time_index[1])

Finally, create the HeatMapWithTime by calling the constructor function on the data and setting the index to the set of dates you generated. Then, add it to your Map.

In [None]:
# This is how to run HeatMapWithTime. It looks similar to the code we saw above, right?
m = folium.Map([48., 5.], zoom_start=6)

hm = HeatMapWithTime(
    data,
    index=time_index,
    auto_play=True,
)

hm.add_to(m)

m

Now try for yourself using the Berkeley `calls` data set.

## _The following needs to be taken out of Tables and put into Pandas_

_The first step is to get the data into the correct format. Create a new Table with two columns: Date, containing the data in the calls "timestamp" column, and Location, containing the call location data you used to make your HeatMap (the stacked and transposed latitudes and longitudes)._

*Hint: check your 1-18 lab, or the Datascience Table documentation for [creating](http://data8.org/datascience/_autosummary/datascience.tables.Table.with_columns.html#datascience.tables.Table.with_columns) and [grouping](http://data8.org/datascience/_autosummary/datascience.tables.Table.group.html) Tables. You're going to want to call `group` with the `list` function as the aggregator.*


In [None]:
# Create a new dataframe with the locations and timestamps of calls, grouped by date

locs_and_dates = calls.loc[:,['Lat', 'Lon', 'timestamp']]
dti = pd.to_datetime(locs_and_dates['timestamp'])
locs_and_dates['dti'] = dti

locs_and_dates.head()

In [None]:
# what data type is the timestamp?
print(type(locs_and_dates['dti'].iloc[0]))

In [None]:
# now order the locations and timestamps by timestamp
# in pandas, we sort the rows by the datetime column, using inplace=True to modify the dataframe directly
locs_and_dates.sort_values('dti', inplace=True)
locs_and_dates.head()

## This needs to be Pandas-ized <br>
Next, extract the dates and the grouped locations into two variables to put in your HeatMapWithTime. Note:

* HeatMapWithTime needs lists, so you'll need to convert your dates to a list using `.tolist()`
* The Table Group function converts everything to arrays, and each array needs to be converted to a list. This is super annoying, so we've given you the code to do it. Just extract the grouped locations from the correct column and put the extracted data in the ellipses on the second line.

In [None]:
# I think we can skip this step by using what is in the dataframe with the .tolist() method on Series data
# berk_dates = ...
# berk_loc_by_date = [[x.tolist() for x in y] for y in ...]
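
One possible pandas version of this grouping step is sketched below on made-up data. The DataFrame `toy` and the names `toy_dates` and `toy_locs_by_date` are hypothetical stand-ins, and the column names are assumed to match `locs_and_dates`. Treat it as an illustration of the idea, not as the intended solution.

```python
import pandas as pd

# Hypothetical toy data standing in for `locs_and_dates`.
toy = pd.DataFrame({
    'Lat': [37.87, 37.86, 37.88, 37.85],
    'Lon': [-122.27, -122.26, -122.28, -122.25],
    'timestamp': ['2018-02-01 08:00:00', '2018-02-01 17:30:00',
                  '2018-02-02 09:15:00', '2018-02-03 23:45:00'],
})
toy['dti'] = pd.to_datetime(toy['timestamp'])

# Reduce each timestamp to its calendar date (time of day is dropped).
day = toy['dti'].dt.strftime('%Y-%m-%d')

toy_dates = []          # one date string per time step
toy_locs_by_date = []   # one list of [lat, lon] pairs per time step
for date, group in toy.groupby(day):
    toy_dates.append(date)
    toy_locs_by_date.append(group[['Lat', 'Lon']].values.tolist())

print(toy_dates)
print(toy_locs_by_date)
```

Building both lists in the same loop keeps each date aligned with its list of locations, which is the pairing HeatMapWithTime relies on.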

Finally, create a Folium map of Berkeley, then create a [HeatMapWithTime](https://python-visualization.github.io/folium/docs-v0.5.0/plugins.html) and add it to your Berkeley map. The call looks like `HeatMapWithTime(<grouped locations>, index=<dates>)`. Click the link for more documentation. Also try adding the argument `auto_play=True`.

In [None]:
# Plot the heatmap of Berkeley crime
berk_coords = [37.8716, -122.2727]
berk_map2 = folium.Map(berk_coords, zoom_start=13, tiles='OpenStreetMap')

# HeatMapWithTime expects one list of [lat, lon] pairs per time step,
# so group the calls by calendar date (dropping %H:%M:%S for now)
day = locs_and_dates['dti'].dt.strftime('%Y-%m-%d')

time_index2 = []   # one date string per time step
data = []          # one list of [lat, lon] pairs per time step
for date, group in locs_and_dates.groupby(day):
    time_index2.append(date)
    data.append(group[['Lat', 'Lon']].values.tolist())

print('data is of type ', type(data))
print('first five coordinate pairs on the first date: ', data[0][:5])
print('first five elements of time_index2: ', time_index2[:5])
print('time_index2 type: ', type(time_index2[0]))
print('length of data list: ', len(data), ' length of time_index2 list: ', len(time_index2))

# Finally, pass the lists to HeatMapWithTime and add that to the basemap
hmwt_berk = HeatMapWithTime(
    data,
    index=time_index2,
    auto_play=True,
    max_speed=100
)
hmwt_berk.add_to(berk_map2)

berk_map2

What conclusions can you draw from this Heatmap?