# Part 5: Heatmaps of geo-data

In [1]:
import numpy as np
import pandas as pd
import folium
from folium.plugins import HeatMap, HeatMapWithTime

### Exercise: A new take on geospatial data using Folium (see the Week 4 exercises for full info and tutorials).

In [2]:
df = pd.read_csv('data/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv')

Now we study geospatial data by plotting raw data points as well as heatmaps on top of actual maps.

Start by plotting a map of San Francisco with a nice tight zoom. Simply use the command folium.Map([lat, lon], zoom_start=13), where you'll have to look up San Francisco's latitude and longitude.

In [3]:
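san_francisco = None  # module-level map object, rebuilt by create_empty_map()
lat_start, lon_start = 37.7749, -122.4194  # approximate centre of San Francisco

def create_empty_map():
    """Reset the global map to a fresh, empty view of San Francisco."""
    global san_francisco
    san_francisco = folium.Map([lat_start, lon_start], zoom_start=13, tiles='Stamen Toner')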

Next, use the coordinates for SF City Hall, 37.77919, -122.41914, to indicate its location on the map with a nice, pop-up enabled marker. (In the screenshot below, I used the black & white Stamen tiles, because they look cool).

In [4]:
create_empty_map()
lat_city_hall, lon_city_hall = 37.77919, -122.41914
folium.Marker([lat_city_hall, lon_city_hall], popup='City Hall').add_to(san_francisco)
san_francisco

![caption](screenshots/map_marker.png)

Now, let's plot some more data (no need for popups this time). Select a couple of months of data for 'DRUG/NARCOTIC' and draw a little dot for each arrest in those two months. You could, for example, choose June-July 2016, but you can choose anything you like. The main concern is to not have too many points, as this uses a lot of memory and makes Folium behave non-optimally. We can call this kind of visualization a point scatter plot.

In [5]:
df_drugs = df.loc[df['Category'] == 'DRUG/NARCOTIC'].copy()
df_drugs['datetime'] = pd.to_datetime(df_drugs['Date'] + ' ' + df_drugs['Time'])
df_drugs = df_drugs.set_index('datetime')
df_drugs = df_drugs.loc['2016-06-01':'2016-07-31']

create_empty_map()

for lon, lat in zip(df_drugs['X'], df_drugs['Y']):
    folium.CircleMarker(
        location=[lat, lon],
        radius=5,
        popup='Drug/narcotic crime',
        color='#FFA028',
        fill=True,
        fill_color='#FFA028'
    ).add_to(san_francisco)
    
san_francisco

![caption](screenshots/map_scatter.png)
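
Since the main concern above is keeping the number of markers manageable, one option is to cap the point count before drawing. The sketch below is only an illustration: `cap_points`, `max_points` and `seed` are hypothetical names that are not part of the exercise, and the synthetic coordinates stand in for the real `Y`/`X` columns.

```python
import random


def cap_points(points, max_points=1000, seed=0):
    """Return at most max_points items, sampled uniformly without replacement."""
    points = list(points)
    if len(points) <= max_points:
        return points
    rng = random.Random(seed)  # fixed seed keeps the resulting map reproducible
    return rng.sample(points, max_points)


# Synthetic (lat, lon) pairs standing in for zip(df_drugs['Y'], df_drugs['X'])
demo_points = [(37.77 + i * 1e-5, -122.42 + i * 1e-5) for i in range(5000)]
subset = cap_points(demo_points, max_points=1000)
print(len(subset), "of", len(demo_points), "points kept")
```

The sampled subset can then be passed to the same CircleMarker loop as before. Random sampling thins dense areas and sparse areas proportionally, so the overall spatial pattern should stay recognisable.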

Next, let's play with heatmaps. You can figure out the appropriate commands by grabbing code from the main tutorial and modifying it to suit your needs.

To create your first heatmap, grab all arrests for the category 'SEX OFFENSES, NON FORCIBLE' across all time. Play with parameters to get plots you like.

In [6]:
sex_offenses = df.loc[df['Category'] == 'SEX OFFENSES, NON FORCIBLE']

create_empty_map()

# Build a list of [lat, lon] pairs directly from the two coordinate columns.
heat_data = sex_offenses[['Y', 'X']].values.tolist()

HeatMap(
    heat_data,
    min_opacity=0.4,  # The minimum opacity the heat will start at.
    max_zoom=18,
    max_val=0.1,
    radius=30,
    blur=15,
    gradient={
        0.20: '#0000FF',
        0.40: '#0C41E8',
        0.60: '#007CFF',
        0.80: '#0CABE8',
        1.00: '#0DF7FF',
    },
).add_to(san_francisco)

san_francisco

![caption](screenshots/map_heat.png)

Now, comment on the differences between scatter plots and heatmaps.

What can you see using the scatter-plots that you can't see using the heatmaps?

> With scatter-plots, one can see the exact places where the crimes occurred. The accuracy depends entirely on the coordinates recorded in the dataset: in the scatter-plot created above, each point lies either at an intersection or in the middle of a street.

And vice versa: what do the heatmaps help you see that's difficult to distinguish in the scatter-plots?

> In scatter-plots, it is hard to tell how often crimes were committed at a given place, especially when the points overlap. Heatmaps, however, allow one to assess the areas where crimes took place much more quickly, thanks to the intensity, colour, and shape of the cloud of heat.
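
The overlap problem mentioned above can be quantified by counting how many records share the same coordinates. This is a minimal sketch on made-up points; `overlap_counts` and its `precision` argument are hypothetical helpers, not part of the exercise.

```python
from collections import Counter


def overlap_counts(points, precision=5):
    """Count how many (lat, lon) records fall on the same rounded coordinate."""
    rounded = [(round(lat, precision), round(lon, precision)) for lat, lon in points]
    return Counter(rounded)


# Made-up data: three incidents at one location, one incident at another.
demo = [(37.77919, -122.41914)] * 3 + [(37.7749, -122.4194)]
counts = overlap_counts(demo)

# Markers drawn exactly on top of another marker are invisible in a scatter plot.
hidden = sum(n - 1 for n in counts.values())
print(f"{len(counts)} distinct locations, {hidden} markers hidden by overlap")
```

A heatmap accumulates these stacked records into a more intense spot, which is why it conveys frequency better than the scatter-plot.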

Play around with the various parameters for heatmaps. You can find a list here: https://python-visualization.github.io/folium/plugins.html

Comment on the effect of the various parameters for the heatmaps. How do they change the picture? (At least talk about the radius and max_zoom.) For one combination of settings, my heatmap plot looks like this.

> Folium heatmaps actually consist of circular points (just like a scatter-plot), but they can be transformed (e.g. blurred) using the arguments passed to the constructor:
> * min_opacity: the minimum opacity at which the heat starts. The higher the value, the sooner a spot reaches its maximum colour,
> * max_zoom: defines how far the map has to be zoomed in before each point is shown with the maximum colour of heat. If the value is too small, every cloud reaches its maximum quickly and it is not possible to distinguish areas of medium heat,
> * max_val: smaller values increase the intensity of the maximum colours. In this case, the changes become visible when going below 0.1,
> * radius: the radius of each circle. A low value creates smaller, denser clouds of heat,
> * blur: a low value creates sharper edges on the circles, while a high value smooths the points that make up the cloud,
> * gradient: defines the colours of the gradient for steps in the interval (0, 1).
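
To make the effect of the radius more concrete, here is a toy one-dimensional model. It is an assumption-laden illustration, not the way Folium or its underlying heat layer actually renders: each point contributes a triangular bump of width `radius`, and `count_clouds` (a hypothetical helper) counts how many separate clouds remain.

```python
def count_clouds(positions, radius, step=1.0):
    """Count contiguous runs of non-zero intensity along a 1D line."""
    lo = min(positions) - radius
    hi = max(positions) + radius
    n_steps = int((hi - lo) / step) + 1
    clouds = 0
    inside = False
    for k in range(n_steps):
        x = lo + k * step
        # Each point adds a triangular bump that fades to zero at distance `radius`.
        intensity = sum(max(0.0, 1.0 - abs(x - p) / radius) for p in positions)
        if intensity > 0 and not inside:
            clouds += 1  # entered a new cloud
        inside = intensity > 0
    return clouds


positions = [0, 20, 100]  # two nearby points and one far away (arbitrary units)
for radius in (5, 15, 60):
    print(radius, count_clouds(positions, radius))
```

In this toy model a small radius keeps every point as its own cloud, while a larger radius first merges the two neighbouring points and eventually merges everything into a single cloud.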

In that screenshot, I've (manually) highlighted a specific hotspot for this type of crime. Use your detective skills to find out what's going on in that building on the 800 block of Bryant street ... and explain in your own words.

> The Sex Offender Unit was established at this address. Probably, a lot of offenses were registered there instead of at the exact place of the crime. The source below states that "[The Unit] ensures that sex offenders are properly registered".
> Source: https://sfgov.org/policecommission/investigations-1-sex-offender-unit


### Exercise: Heat map movies. This exercise is a bit more independent than above - you get to make all the choices.


Start by choosing your favorite crime type. Preferably one with spatial patterns that change over time (use your data exploration from the previous lectures to choose a good one).

In [7]:
category_df = df.loc[df['Category'] == 'DRUNKENNESS'].copy()

Now, choose a time resolution. You could use daily, weekly, or monthly slices of the data as the frames of your movie. Again, the goal is to find interesting temporal patterns to display. We want at least 20 frames though.

In [8]:
category_df['datetime'] = pd.to_datetime(category_df['Date'] + ' ' + category_df['Time'])
# Hour of day (0-23); each hour becomes one frame of the movie.
category_df['Weight'] = category_df['datetime'].dt.hour
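
Before committing to a resolution, it can help to check how many frames each candidate would produce. The following sketch uses synthetic timestamps rather than the real dataset; `RESOLUTIONS` and `frame_counts` are hypothetical names introduced only for this illustration.

```python
from datetime import datetime, timedelta

# Candidate groupings: each maps a timestamp to the frame it would belong to.
RESOLUTIONS = {
    'hour of day': lambda t: t.hour,
    'day of week': lambda t: t.weekday(),
    'month of year': lambda t: t.month,
    'calendar month': lambda t: (t.year, t.month),
}


def frame_counts(timestamps):
    """Return the number of distinct frames per candidate resolution."""
    return {name: len({key(t) for t in timestamps})
            for name, key in RESOLUTIONS.items()}


# Synthetic timestamps, one every 7 hours starting on 1 January 2016.
start = datetime(2016, 1, 1)
timestamps = [start + timedelta(hours=7 * i) for i in range(2000)]

counts = frame_counts(timestamps)
for name, n in counts.items():
    verdict = 'enough' if n >= 20 else 'too few'
    print(f'{name}: {n} frames ({verdict})')
```

Grouping by hour of day always yields at most 24 frames, which satisfies the 20-frame requirement, whereas day of week or month of year cannot.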

Create the movie using HeatMapWithTime.

In [9]:
# One frame per hour of day: each frame is a list of [lat, lon] pairs.
heat_data = [category_df.loc[category_df['Weight'] == i, ['Y', 'X']].values.tolist() for i in range(24)]

create_empty_map()
HeatMapWithTime(heat_data, auto_play=True, max_opacity=0.8).add_to(san_francisco)
san_francisco

![caption](screenshots/map_movie.png)
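
The frame structure used for the movie is a list with one entry per time step, each entry being a list of [lat, lon] pairs. As a hedged sketch of that structure on made-up records (the `build_frames` helper and the `labels` list are assumptions, not part of the exercise):

```python
def build_frames(records, n_frames=24):
    """Group (hour, lat, lon) records into one list of [lat, lon] per hour."""
    frames = [[] for _ in range(n_frames)]
    for hour, lat, lon in records:
        frames[hour].append([lat, lon])
    return frames


# Made-up records: (hour of day, latitude, longitude)
records = [
    (0, 37.77, -122.42),
    (0, 37.78, -122.41),
    (23, 37.76, -122.43),
]
frames = build_frames(records)

# Human-readable labels, one per frame, e.g. '00:00', '01:00', ...
labels = [f'{h:02d}:00' for h in range(24)]
print(len(frames), len(labels))
```

Passing such labels through the `index` argument of HeatMapWithTime, as in `HeatMapWithTime(frames, index=labels)`, should show the hour next to the time slider instead of a bare frame number.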

Comment on your results:
What patterns does your movie reveal?

> The movie reveals multiple areas with drunkenness incidents, such as the Haight District, the Mission District, Broadway Street and Market Street. Those places are popular because of their atmosphere. For example, the Haight and Mission Districts are culturally diverse neighbourhoods with Mexican influences. They are artistically independent places with murals, clubs, and restaurants. Moreover, Broadway Street was full of strip clubs and bars, which may also attract a large number of people. Market Street is in the Civic Centre area, where there are also a lot of art venues.
> In addition, the number of drunkenness incidents increases from 6 PM, drops dramatically at 4 AM, and remains constant throughout the day. A likely explanation is that people start going out after work, especially at weekends, and come back home late at night.

Motivate/explain the reasoning behind your choice of crimetype and time-resolution.

> The drunkenness crime type and hourly sampling were chosen because of the visible patterns throughout the day. As mentioned above, far fewer people drink during the day, so consequently fewer crimes are committed during working hours. What is more, this crime type was chosen to reveal the places where the crime is committed regularly, such as city centres and culturally active areas.

> Sources:
> * https://en.wikipedia.org/wiki/Mission_District,_San_Francisco
> * https://en.wikipedia.org/wiki/Broadway_(San_Francisco)
> * https://theculturetrip.com/north-america/usa/california/articles/looking-at-the-mission-district-through-precita-eyes/