# Vacant Buildings: Escape from LA

_Version 1: 05/03/18_

## Introduction

We present a visualization of information compiled about vacant buildings in Los Angeles by the LA Times.  The data is available [here](https://www.kaggle.com/la-times/la-vacant-building-complaints).

I wanted to learn how to plot interesting things on maps.  In particular, I wanted to make a heat map of the data.  I played around with `basemap`, `GeoPandas`, and some GIS shapefiles originally.  While I was able to produce a heatmap, I wanted something that looked more polished.  I then decided to try out folium. 

In [9]:
import numpy as np
import pandas as pd

import ast
import time
import datetime

import folium
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime

In [10]:
vacant = pd.read_csv('../input/combined_complaints.csv')

## Latitude and Longitude

The data set includes coordinates of each complaint in the column `Latitude/Longitude`.  We need to turn this column into a usable format.  First we check to see how many missing coordinates there are.

In [8]:
vacant['Latitude/Longitude'].isnull().sum()

We will end up simply dropping the null rows.

The Python function `ast.literal_eval()` will convert the latitude/longitude pair, currently stored as a string, into a tuple of floats.  We opt to make two new columns in the DataFrame, storing the longitude and latitude.
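To make the conversion concrete, here is a minimal sketch of `ast.literal_eval()` applied to a coordinate string. The sample string is made up for illustration and assumes the column stores values shaped like `'(lat, lon)'`.

```python
import ast

# Hypothetical sample value; the real column entries may be formatted differently
sample = "(34.0522, -118.2437)"

# literal_eval parses the string into a Python tuple of floats
point = ast.literal_eval(sample)
lat_value, lon_value = point

print(type(point).__name__, lat_value, lon_value)
```

Indexing the result with `[0]` and `[1]` then gives the latitude and longitude separately.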

In [12]:
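def lat(point):
    # first element of the parsed (lat, lon) tuple; NaN if missing or malformed
    try:
        return ast.literal_eval(point)[0]
    except (ValueError, SyntaxError, TypeError, IndexError):
        return np.nan

def lon(point):
    # second element of the parsed (lat, lon) tuple; NaN if missing or malformed
    try:
        return ast.literal_eval(point)[1]
    except (ValueError, SyntaxError, TypeError, IndexError):
        return np.nan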

In [14]:
vacant['lon'] = vacant['Latitude/Longitude'].transform(lon)
vacant['lat'] = vacant['Latitude/Longitude'].transform(lat)

Now we remove all rows for which latitude or longitude is null.

In [16]:
# keep a row only if both coordinates are present
clean = vacant[vacant['lon'].notnull() & vacant['lat'].notnull()]

The data includes a column `CSR Priority`. Most entries are numerical values between 1 and 4, with '1' the highest priority and '4' the lowest.  A handful of entries in `CSR Priority` are not classified this way, so we remove them.

In [17]:
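# keep only the four expected priority codes (the column holds strings at this point)
clean = clean[clean['CSR Priority'].isin(['1', '2', '3', '4'])]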
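One thing to keep in mind when filtering string-valued priorities: string comparison is lexicographic, so a value such as `'10'` compares as less than `'4'`. The sketch below uses made-up values (not taken from the data set) to contrast a `<=` comparison with an explicit membership test.

```python
# Hypothetical priority values, stored as strings
raw = ['1', '4', '10', '2', 'N/A', '3']

# Lexicographic comparison: '10' sorts before '4' because '1' < '4'
by_comparison = [p for p in raw if p <= '4']

# An explicit membership test keeps only the four expected codes
valid = {'1', '2', '3', '4'}
by_membership = [p for p in raw if p in valid]

print(by_comparison)
print(by_membership)
```

Whether this matters depends on which unexpected values actually occur in the column.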

Now we convert the `CSR Priority` column to integer type.

In [18]:
clean['CSR Priority'] = clean['CSR Priority'].astype(int)

## Heat maps

We will use the folium library to make a heat map of the complaint coordinates.  Each point will have the format `[lat, lon, intensity]`.  Here, 'intensity' will be determined by the column `CSR Priority`.  First we create a map with the rough coordinates we want.

In [11]:
hmap = folium.Map(location=[34, -118],tiles='Stamen Toner', zoom_start=10)
hmap

folium makes this look really nice, without having to do much work.

The HeatMap function expects a list of lists, with each sublist having at least latitude and longitude. We add an additional column to represent the classification severity.

In [30]:
heat_data = [[row['lat'],row['lon'],5-row['CSR Priority']] for index, row in clean.iterrows()]
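The expression `5 - priority` flips the scale so that the most urgent complaints carry the largest weight. A small sketch with made-up records (plain dictionaries standing in for DataFrame rows) shows the mapping:

```python
# Hypothetical records standing in for rows of the cleaned DataFrame
records = [
    {'lat': 34.05, 'lon': -118.25, 'CSR Priority': 1},
    {'lat': 34.10, 'lon': -118.30, 'CSR Priority': 4},
]

# Priority 1 (highest) maps to weight 4; priority 4 (lowest) maps to weight 1
heat_rows = [[r['lat'], r['lon'], 5 - r['CSR Priority']] for r in records]

print(heat_rows)
```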

Jupyter has a problem displaying the heatmap inline, so we'll store the heatmap as an HTML file.

In [32]:
h = HeatMap(heat_data, radius=10, max_val=3, min_opacity=0.2).add_to(hmap)
h.save('Heatmap.html')

## Heat maps with time

folium can also do heat maps with time, but it's a little more complicated than just a normal heat map. Each element of the data passed to `HeatMapWithTime` should be a list of lists representing a single layer of the heat map. For example, a heat map with 3 time layers could look like
```
[[[lat, lon],[lat,lon]], [[lat,lon]], [[lat,lon],[lat,lon],[lat,lon]]]
```
Here, the first time layer has 2 points, the second has 1 point, and the third has 3 points.
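As a rough sketch of how such a nested list can be assembled, the snippet below groups points by date using only the standard library. The `records` list is made up for illustration and is not taken from the data set.

```python
from collections import defaultdict

# Hypothetical (date, lat, lon) records
records = [
    ('2013-06-12', 34.05, -118.25),
    ('2013-06-13', 34.10, -118.30),
    ('2013-06-12', 34.20, -118.40),
]

# Collect the points that share a date into the same layer
by_date = defaultdict(list)
for day, lat_value, lon_value in records:
    by_date[day].append([lat_value, lon_value])

# One label and one layer per date, in chronological order
labels = sorted(by_date)
layers = [by_date[day] for day in labels]

print(labels)
print(layers)
```

Here `layers` has two time layers: the first holds 2 points and the second holds 1.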

For this data set, our times will be determined by the `Date Received` column in the DataFrame. First we make sure we're not missing any values.

In [12]:
clean[clean['Date Received'].isnull()]

We make a new `Begin_date` column that stores the `Date Received` values as a local timestamp.

In [33]:
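def datest_to_timestamp(series):
    # parse each 'MM/DD/YYYY' string into a datetime.datetime
    res = []
    for st in series:
        res.append(datetime.datetime.strptime(st, "%m/%d/%Y"))
    return res

In [34]:
# call the helper on the whole column; it returns one datetime per row, in order
clean['Begin_date'] = datest_to_timestamp(clean['Date Received'])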
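For reference, here is what the `"%m/%d/%Y"` format string does to a single value. The date string is a made-up example in the month/day/year layout that the format expects.

```python
import datetime

# Hypothetical date string in MM/DD/YYYY layout
stamp = datetime.datetime.strptime("06/12/2013", "%m/%d/%Y")

# The first ten characters of the string form give a compact YYYY-MM-DD label
label = str(stamp)[:10]

print(stamp, label)
```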

Next, we make an ordered list of the unique time stamps.

In [35]:
all_dates = clean['Begin_date'].values
all_dates = list(set(all_dates))
all_dates.sort()

Here's what the first few entries of `all_dates` look like.

In [36]:
all_dates[:4]

These entries aren't that nice for display, so we'll make a separate list `time_labels` that the heat map will show us in a nice format.

In [37]:
time_labels = [str(x)[:10] for x in all_dates]

Now we create the data that can be passed to HeatMapWithTime.  This takes a little while, but once it's done, we're going to use it in a custom heat map function that will be fast.

In [38]:
data = [[[row['lat'], row['lon'], 5-row['CSR Priority']] for index, row in clean[clean['Begin_date'] == i].iterrows()] for i in all_dates]
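If the nested comprehension is too slow, one possible alternative is a single `groupby` pass over the dates instead of filtering the DataFrame once per date. The sketch below runs on a tiny made-up DataFrame (`demo`) that only mimics the columns used here; it is an illustration under those assumptions and has not been timed against the real data.

```python
import pandas as pd

# Hypothetical miniature stand-in for the cleaned DataFrame
demo = pd.DataFrame({
    'lat': [34.05, 34.10, 34.20],
    'lon': [-118.25, -118.30, -118.40],
    'CSR Priority': [1, 4, 2],
    'Begin_date': pd.to_datetime(['2013-06-12', '2013-06-13', '2013-06-12']),
})

# Same weighting as above: priority 1 -> 4, priority 4 -> 1
demo['weight'] = 5 - demo['CSR Priority']

# groupby sorts by key by default, so the layers come out in date order
demo_layers = []
demo_labels = []
for day, group in demo.groupby('Begin_date'):
    demo_layers.append(group[['lat', 'lon', 'weight']].values.tolist())
    demo_labels.append(str(day.date()))

print(demo_labels)
print(demo_layers)
```

Note that `.values.tolist()` converts the integer weights to floats, because the three columns are combined into one floating-point array.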

A custom function will display the heat map with time between `start_date` and `end_date`.

In [39]:
import bisect

def heat_map_time(start_date, end_date):
    # convert the bounds to the 'YYYY-MM-DD' format used by time_labels
    s_label = datetime.datetime.strptime(start_date, "%m/%d/%Y").strftime("%Y-%m-%d")
    e_label = datetime.datetime.strptime(end_date, "%m/%d/%Y").strftime("%Y-%m-%d")

    # time_labels is sorted, so a binary search gives the positions of the range;
    # unlike a day-count offset, this stays correct if some days have no complaints
    s_idx = bisect.bisect_left(time_labels, s_label)
    e_idx = bisect.bisect_right(time_labels, e_label)

    # the data and times passed to HeatMapWithTime
    d = data[s_idx:e_idx]
    times = time_labels[s_idx:e_idx]

    # the Stamen Toner tile looks cool
    hmaptime = folium.Map(location=[34, -118], tiles='Stamen Toner',zoom_start=10)

    HeatMapWithTime(data=d,index=times).add_to(hmaptime)
    display(hmaptime)
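Selecting a date range from a sorted list of labels can be sanity-checked on a toy list. The sketch below, with made-up `labels`, uses `bisect` to find the slice bounds, which works whether or not every calendar day is present in the list.

```python
import bisect

# Hypothetical sorted labels with a gap: there is no entry for 2013-06-14
labels = ['2013-06-12', '2013-06-13', '2013-06-15', '2013-06-16']

start, end = '2013-06-13', '2013-06-15'

# ISO-formatted dates sort the same way as the dates themselves,
# so binary search on the strings finds the inclusive range
lo = bisect.bisect_left(labels, start)
hi = bisect.bisect_right(labels, end)

selected = labels[lo:hi]
print(selected)
```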

Try it out.

In [41]:
heat_map_time('06/12/2013','08/11/2013')

Kaggle, for some reason, does not render the left toolbar correctly, so the dates and buttons aren't shown as they would be in a native Jupyter notebook.  If you hover over the toolbar, though, you'll find the "Play" button.  Try it out.