## How to make a time lapse heatmap with Folium using NYC Bike Share Data

The following is an exercise in working with time series data from <a href="https://s3.amazonaws.com/tripdata/index.html" target=blank>Citibike</a>.
<br>
I chose to work with one month; however, a web scraper could be built to continually scrape data as it's released monthly.
<br>
It will take _Feb 2021_ data and return a time lapse heat map of each station's activity, aggregated by hour of day across that month. Activity is displayed on a color spectrum, with colors toward the red end indicating higher activity.

In [2]:
#import the packages
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
import lxml
import os
import zipfile
import folium
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime

Write a function that generates a Folium base map. It has default values that can be changed if needed; the lat/long location is the only required argument.

In [20]:
# function to generate base map, has default values for zoom and tiles
def generateBaseMap(loc, zoom=12, tiles='Stamen Toner', crs='EPSG2263'):
    '''
    Function that generates a Folium base map
    Input location lat/long
    Zoom level default 12
    Tiles default to Stamen Toner
    CRS default 2263 for NYC (kept for reference only; it is not passed to folium.Map)
    '''
    return folium.Map(location=loc, 
                      control_scale=True, 
                      zoom_start=zoom,
                      tiles=tiles)

### Generate Base map

Generate base map with custom function. Pass in list with NYC lat/long.

In [21]:
nyc = [40.7400, -73.985880]
base_map = generateBaseMap(nyc)
base_map
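
The centre point above is hard-coded. As a rough alternative, the centre could be derived from the station coordinates themselves. The sketch below is only an illustration: `map_center` is a hypothetical helper, not part of folium, and the three coordinates are start stations copied from the sample rows printed further down.

```python
from statistics import fmean

def map_center(points):
    """Return [lat, long] averaged over an iterable of (lat, long) pairs."""
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    return [fmean(p[0] for p in points), fmean(p[1] for p in points)]

# start-station coordinates of the first three rides in the data preview
stations = [
    (40.777480, -73.982886),
    (40.773142, -73.958562),
    (40.714211, -73.981095),
]
center = map_center(stations)
```

The resulting list has the same `[lat, long]` shape as `nyc`, so it could be passed to `generateBaseMap` in its place.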

### Web Scrape 1 Month of Data

Read in the latest month of bike share data by scraping the file listing from the web.

In [3]:
# define url parameter
url = 'https://s3.amazonaws.com/tripdata/'

r = requests.get(url) # send request
soup = BeautifulSoup(r.text, 'xml') # instantiate beautiful soup object

# extract file names from soup (every <Key> element except the last one)
files = soup.find_all('Key')
clean_files = [f.get_text() for f in files[:-1]]

# create list of file names only for nyc (Jersey City files start with 'JC')
nyc_files = [file for file in clean_files if not file.startswith('JC')]

In [4]:
nyc_files[-1]

'202102-citibike-tripdata.csv.zip'
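
Taking the last element works as long as the listing keeps its current order. A slightly more defensive sketch is to match the monthly file names explicitly and take the largest `YYYYMM` prefix. The regular expression and the `latest_nyc_file` helper are assumptions based on the one file name shown above, and the sample keys other than that file are made up for illustration.

```python
import re

# assumed naming pattern, inferred from '202102-citibike-tripdata.csv.zip'
NYC_MONTHLY = re.compile(r"^(\d{6})-citibike-tripdata\.csv\.zip$")

def latest_nyc_file(keys):
    """Return the NYC archive with the most recent YYYYMM prefix, or None."""
    monthly = [k for k in keys if NYC_MONTHLY.match(k)]
    # YYYYMM prefixes sort chronologically as plain strings
    return max(monthly, default=None)

# illustrative listing; only the 202102 name is taken from the output above
sample_keys = [
    "202101-citibike-tripdata.csv.zip",
    "202102-citibike-tripdata.csv.zip",
    "JC-202102-citibike-tripdata.csv.zip",
    "index.html",
]
latest = latest_nyc_file(sample_keys)
```

Because the pattern is anchored at the start, Jersey City files and non-data keys are skipped without a separate filtering step.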

In [5]:
# isolate latest month of data file name
last_month = nyc_files[-1]

# create file url
file_url = url + last_month
file_url

'https://s3.amazonaws.com/tripdata/202102-citibike-tripdata.csv.zip'

In [6]:
# download file
with open(last_month, "wb") as f:
    response = requests.get(file_url)
    f.write(response.content)
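
The cell above opens the output file before sending the request and holds the whole response in memory. A possible variation, sketched here with the standard library's `urllib` rather than `requests`, streams the body to a temporary `.part` file and renames it only once the copy has finished. The `download` helper and the `.part` suffix are hypothetical choices for this example.

```python
import shutil
import urllib.request
from pathlib import Path

def download(url, dest):
    """Stream url to dest via a temporary file; return the final Path."""
    dest = Path(dest)
    tmp = dest.with_name(dest.name + ".part")
    # urlopen raises for a failed request, so the file is not opened in that case
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as f:
        shutil.copyfileobj(response, f)  # copy in chunks instead of all at once
    tmp.replace(dest)  # only a completed copy gets the final name
    return dest
```

With the names from this notebook, the call would look like `download(file_url, last_month)`.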

In [7]:
# unzip file
with zipfile.ZipFile(last_month, "r") as zip_ref:
    zip_ref.extractall("tripdata")
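
The next cell hard-codes the name of the extracted CSV. One way to avoid that, sketched below, is to ask the archive which CSV members it contains and return their paths. `extract_csvs` is a hypothetical helper, and the `__MACOSX/` filter is an assumption that guards against metadata entries that archives created on macOS sometimes carry.

```python
import zipfile
from pathlib import Path

def extract_csvs(zip_path, out_dir="tripdata"):
    """Extract the CSV members of zip_path into out_dir; return their paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        names = [
            n for n in zf.namelist()
            if n.lower().endswith(".csv") and not n.startswith("__MACOSX/")
        ]
        zf.extractall(out_dir, members=names)  # extract only the selected members
    return [out_dir / n for n in names]
```

The first returned path could then be handed to `pd.read_csv` instead of a fixed string.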

In [8]:
df = pd.read_csv('tripdata/202102-citibike-tripdata.csv')
df

,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender
0,304,2021-02-01 00:04:23.0780,2021-02-01 00:09:27.7920,3175,W 70 St & Amsterdam Ave,40.777480,-73.982886,4045,West End Ave & W 60 St,40.772370,-73.990050,27451,Subscriber,1996,2
1,370,2021-02-01 00:07:08.8080,2021-02-01 00:13:19.4670,3154,E 77 St & 3 Ave,40.773142,-73.958562,3725,2 Ave & E 72 St,40.768762,-73.958408,35000,Subscriber,1991,1
2,635,2021-02-01 00:07:55.9390,2021-02-01 00:18:31.0390,502,Henry St & Grand St,40.714211,-73.981095,411,E 6 St & Avenue D,40.722281,-73.976687,49319,Subscriber,1980,2
3,758,2021-02-01 00:08:42.0960,2021-02-01 00:21:20.7820,3136,5 Ave & E 63 St,40.766368,-73.971518,3284,E 88 St & Park Ave,40.781411,-73.955959,48091,Customer,1969,0
4,522,2021-02-01 00:09:32.6820,2021-02-01 00:18:15.4100,505,6 Ave & W 33 St,40.749013,-73.988484,3687,E 33 St & 1 Ave,40.743227,-73.974498,48596,Subscriber,1988,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
634626,135,2021-02-28 23:57:55.5560,2021-03-01 00:00:10.8970,3555,28 St & 41 Ave,40.751047,-73.937970,3129,Queens Plaza North & Crescent St,40.751102,-73.940737,47853,Subscriber,1988,1
634627,358,2021-02-28 23:58:44.3410,2021-03-01 00:04:42.9870,526,E 33 St & 5 Ave,40.747659,-73.984907,540,Lexington Ave & E 29 St,40.743116,-73.982154,45496,Customer,1969,0
634628,289,2021-02-28 23:59:12.6970,2021-03-01 00:04:02.4910,519,Pershing Square North,40.751873,-73.977706,367,E 53 St & Lexington Ave,40.758281,-73.970694,41038,Customer,1969,0
634629,166,2021-02-28 23:59:17.8860,2021-03-01 00:02:04.4920,3134,3 Ave & E 62 St,40.763126,-73.965269,3141,1 Ave & E 68 St,40.765005,-73.958185,37383,Subscriber,1986,1


In [9]:
# replace all space in column headers with underscore
df.columns = [col.replace(' ', '_') for col in df.columns]

In [10]:
df.shape

(634631, 15)

Need to turn `starttime` into a datetime object so that I can pull an hour column from it. 

In [11]:
# the raw values carry fractional seconds (e.g. 00:04:23.0780), so the format includes %f
df['starttime'] = pd.to_datetime(df['starttime'], format='%Y-%m-%d %H:%M:%S.%f')

Extract hours from datetime column

In [12]:
df['hour'] = df['starttime'].dt.hour
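
The timestamps in this file include fractional seconds, which the format string has to account for. As a small illustration outside pandas, here is the same parse on a single value using the standard library; the value is the `starttime` of the first row in the preview above.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # %f matches the digits after the decimal point

raw = "2021-02-01 00:04:23.0780"
parsed = datetime.strptime(raw, FORMAT)
hour = parsed.hour  # the value that ends up in the 'hour' column

# the same string without %f in the format is rejected by strptime
try:
    datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    parses_without_fraction = True
except ValueError:
    parses_without_fraction = False
```

How strictly pandas enforces the format depends on the installed version, so including `%f` is the safer choice.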

In [36]:
df['hour'].value_counts()

17    59981
16    55498
15    51849
18    51707
14    47985
13    43282
12    39969
19    38112
11    34359
8     33240
9     31274
10    30595
20    24829
7     22793
21    15940
6     13217
22    12949
23     9272
0      5453
5      4334
1      3234
2      2066
4      1402
3      1291
Name: hour, dtype: int64

Add a `count` column set to 1 for every ride, so that summing it within a group gives the number of rides taken from a given station (and, later, from a given station during each hour).

In [13]:
df['count'] = 1

Create new df with groupby `start_station_id`, `start_station_latitude`, `start_station_longitude` and sum up `count` column.

In [14]:
df2 = pd.DataFrame(df.groupby(['start_station_id', 'start_station_latitude', 'start_station_longitude'])['count']\
                        .sum().sort_values(ascending=False))

df2.head()

start_station_id,start_station_latitude,start_station_longitude,count
3141,40.765005,-73.958185,4251
435,40.74174,-73.994156,3762
497,40.73705,-73.990093,3230
492,40.7502,-73.990931,3213
3711,40.729667,-73.98068,3044
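
To make explicit what the `count = 1` plus group-by-and-sum pattern computes, here is a plain-Python sketch of the same tally with `collections.Counter`. The station ids and coordinates are copied from the table above; the ride list itself is made up.

```python
from collections import Counter

# one tuple per ride: (start_station_id, latitude, longitude); made-up rides
rides = [
    (3141, 40.765005, -73.958185),
    (435, 40.741740, -73.994156),
    (3141, 40.765005, -73.958185),
]

# counting identical tuples is equivalent to summing a column of ones per group
rides_per_station = Counter(rides)
busiest_station, busiest_count = rides_per_station.most_common(1)[0]
```

In pandas, calling `.size()` on the grouped frame should give the same numbers without needing the helper column.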


In [28]:
# create list of lat/long and count (as weight)
lst = df2.groupby(['start_station_latitude', 'start_station_longitude']).sum().reset_index().values.tolist()

In [29]:
lst

[[40.644512, -74.021506, 75.0],
 [40.645921, -74.005708, 2.0],
 [40.646377, -74.023087, 43.0],
 [40.647105, -74.004483, 5.0],
 [40.649983, -74.005144, 23.0],
 [40.651354, -74.007168, 10.0],
 [40.651654, -73.981231, 13.0],
 [40.652502, -74.013587, 2.0],
 [40.652512, -74.008906, 3.0],
 [40.652657, -74.002356, 10.0],
 [40.653368, -73.976291, 46.0],
 [40.654098, -74.001131, 20.0],
 [40.654798, -74.014372, 8.0],
 [40.655278, -74.003101, 34.0],
 [40.65539977447831, -74.01062786579132, 86.0],
 [40.65629, -73.977335, 82.0],
 [40.656326, -74.009627, 11.0],
 [40.656633, -73.983864, 56.0],
 [40.656986, -73.998194, 25.0],
 [40.65708866668485, -74.00870203971863, 205.0],
 [40.657743, -74.001141, 22.0],
 [40.658029, -73.989605, 3.0],
 [40.659053, -73.98854, 21.0],
 [40.659176, -74.006584, 12.0],
 [40.659555, -73.995068, 24.0],
 [40.66016, -73.990974, 64.0],
 [40.660906, -73.983074, 57.0],
 [40.6610633719006, -73.97945255041122, 1108.0],
 [40.662584, -73.995554, 57.0],
 [40.662611, -73.998623, 10.0],

### Create Heat Map

In [31]:
# add data to basemap 
HeatMap(data=lst, radius=8).add_to(base_map);

# save base map as .html (create the output folder first if it is missing)
os.makedirs('./images', exist_ok=True)
base_map.save('./images/bike_station_HeatMap.html')

# call map 
base_map

### Create Heat Map with Time

The data passed to `HeatMapWithTime` needs to be a list of lists, with each inner list representing one hour. Currently this is not happening; I need to figure out why and reformat the code.

In [55]:
# debugging check: new_df is overwritten on each pass, so after the loop
# it holds only the rows for the last hour
for hour in df['hour'].sort_values().unique():
    new_df = df.loc[df['hour'] == hour, 
    ['start_station_latitude', 'start_station_longitude', 'count']]\
    .groupby(['start_station_latitude', 'start_station_longitude'])\
    .sum().reset_index().values.tolist()
new_df[0][1]

-74.023087

In [34]:
df_hour_list = []
for hour in df['hour'].sort_values().unique():
    df_hour_list.append(df.loc[df['hour'] == hour, 
    ['start_station_latitude', 'start_station_longitude', 'count']]\
    .groupby(['start_station_latitude', 'start_station_longitude'])\
    .sum().reset_index().values.tolist())
df_hour_list

[[[40.65629, -73.977335, 2.0],
  [40.656633, -73.983864, 1.0],
  [40.65708866668485, -74.00870203971863, 1.0],
  [40.659555, -73.995068, 1.0],
  [40.660906, -73.983074, 1.0],
  [40.6610633719006, -73.97945255041122, 4.0],
  [40.662584, -73.995554, 2.0],
  [40.6627059, -73.9569115, 3.0],
  [40.6630619, -73.9538746, 2.0],
  [40.66407983678161, -73.96025128666679, 3.0],
  [40.66514681533792, -73.97637605667114, 10.0],
  [40.665816, -73.956934, 1.0],
  [40.6662078, -73.98199886, 3.0],
  [40.6663181, -73.9854617, 1.0],
  [40.6679411, -73.9588, 2.0],
  [40.668127, -73.98377641, 2.0],
  [40.668132, -73.97363831, 1.0],
  [40.668603, -73.9904394, 2.0],
  [40.6686273, -73.98700053, 1.0],
  [40.6691783, -73.9554162, 1.0],
  [40.6703837, -73.97839676, 2.0],
  [40.6705135, -73.98876585, 1.0],
  [40.6707767, -73.9576801, 8.0],
  [40.6711978, -73.97484126, 5.0],
  [40.6716493, -73.9631145, 3.0],
  [40.671907, -73.993612, 1.0],
  [40.6721683, -73.9609, 1.0],
  [40.672695, -73.954131, 3.0],
  [40.67281
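
The weights in the list above are raw ride counts. As far as I can tell, folium's documentation describes the weight for `HeatMapWithTime` as lying in the (0, 1] range, so scaling the counts may be worth trying. The sketch below builds the same nested structure in plain Python and divides every count by the busiest station-hour. `hourly_frames` is a hypothetical helper, the sample rides are made up, and scaling by the global peak rather than per hour is a design choice, not something the library requires.

```python
from collections import Counter

def hourly_frames(rides, hours=range(24)):
    """rides: iterable of (hour, lat, long) tuples.

    Returns one list per hour, each holding [lat, long, weight] entries,
    with weight scaled into (0, 1] by the busiest station-hour.
    """
    counts = Counter(rides)
    if not counts:
        return [[] for _ in hours]
    peak = max(counts.values())
    frames = []
    for h in hours:
        frame = [
            [lat, lon, n / peak]
            for (hour, lat, lon), n in counts.items()
            if hour == h
        ]
        frames.append(sorted(frame))  # sorted by latitude, like the groupby output
    return frames

# made-up rides: two at midnight from one station, four at 17:00 from another
sample = [(0, 40.1, -74.0)] * 2 + [(17, 40.2, -73.9)] * 4
frames = hourly_frames(sample)
```

Hours without rides get an empty list here, so the position in the outer list always equals the hour of day.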

In [33]:
# instantiate HeatMapWithTime
HeatMapWithTime(df_hour_list,radius=8,
                gradient={0.1: 'blue', 0.5: 'lime', 0.7: 'orange', 1: 'red'}, 
                min_opacity=0.4, 
                max_opacity=0.8, 
                use_local_extrema=True).add_to(base_map)

# save as html (create the output folder first if it is missing)
os.makedirs('./images', exist_ok=True)
base_map.save('./images/heatmapwithtime_bikeshare.html')

# call result
base_map
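
By default the time slider shows a simple counter for each frame. `HeatMapWithTime` also accepts an `index` argument with one label per frame, so readable hour labels could be supplied. A minimal sketch of building them follows; `hour_labels` is a hypothetical helper, and it assumes the frames are ordered by sorted unique hour, as in the loop that builds `df_hour_list`.

```python
def hour_labels(hours):
    """Return 'HH:00' labels for the sorted unique hours in the input."""
    return [f"{int(h):02d}:00" for h in sorted(set(hours))]

labels = hour_labels(range(24))

# possible use, assuming df_hour_list holds one entry per label:
# HeatMapWithTime(df_hour_list, index=labels, radius=8).add_to(base_map)
```

The label list has to be the same length as the data list, which holds here because every hour from 0 to 23 appears in the `value_counts()` output.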