## How to make a time lapse heatmap with Folium using NYC Bike Share Data

The following is an exercise in working with time series data from <a href="https://s3.amazonaws.com/tripdata/index.html" target=blank>Citibike</a>.
<br>
I chose to work with one month; however, a web scraper could be built to pull new data continually as it is released each month.
<br>
It takes _Feb 2020_ data and returns a time-lapse heat map of each station's activity, aggregated by hour of day across that month. Activity is displayed on a color spectrum in which certain colors correspond to higher activity.

In [1]:
#import the packages
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
import lxml
import os
import zipfile
import folium
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime

Write a function that generates a Folium base map. It has default values that can be changed if needed. The lat/long location is the only required argument.

In [2]:
# function to generate base map, has default values for zoom and tiles
def generateBaseMap(loc, zoom=12, tiles='Stamen Toner', crs='EPSG2263'):
    '''
    Function that generates a Folium base map
    loc: [lat, long] of the map center (required)
    zoom: starting zoom level, default 12
    tiles: tile set name, default Stamen Toner
    crs: EPSG code for NYC (2263); kept for reference, not passed to folium.Map
    '''
    return folium.Map(location=loc,
                      control_scale=True,
                      zoom_start=zoom,
                      tiles=tiles)

### Generate Base map

Generate base map with custom function. Pass in list with NYC lat/long.

In [3]:
nyc = [40.7400, -73.985880]
base_map = generateBaseMap(nyc)
base_map

### Web Scrape 1 Month of Data

Read in the latest month of bike share data by scraping the file listing from the web.

In [4]:
# define url parameter
url = 'https://s3.amazonaws.com/tripdata/'

r = requests.get(url) # send request
soup = BeautifulSoup(r.text, 'xml') # instantiate beautiful soup object

# extract file names from soup, skipping the last key in the listing
files = soup.find_all('Key')
clean_files = [f.get_text() for f in files[:-1]]

# create list of file names only for nyc
nyc_files = [file for file in clean_files if not file.startswith('JC')]
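
To see what the scraping cell is working with, here is a minimal offline sketch of the same filtering applied to a hand-written bucket listing. The XML snippet and the file names in it are made up for illustration. The sketch also swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup so that it runs without a network connection, and it keeps only keys ending in `.zip` rather than dropping the last entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written listing shaped like an S3 bucket index.
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Contents><Key>202101-citibike-tripdata.csv.zip</Key></Contents>
  <Contents><Key>202102-citibike-tripdata.csv.zip</Key></Contents>
  <Contents><Key>JC-202102-citibike-tripdata.csv.zip</Key></Contents>
  <Contents><Key>index.html</Key></Contents>
</ListBucketResult>"""

# Namespace used by the sample listing above.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_nyc_archives(listing_xml):
    """Return the .zip keys that do not start with the 'JC' prefix."""
    root = ET.fromstring(listing_xml)
    keys = [el.text for el in root.findall(".//s3:Key", S3_NS)]
    return [k for k in keys if k.endswith(".zip") and not k.startswith("JC")]


nyc_archives = list_nyc_archives(SAMPLE_LISTING)
print(nyc_archives)
```

Taking the last element of such a list as "the latest month" relies on the keys being returned in sorted order, which is worth checking against the real listing.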

In [5]:
# isolate latest month of data file name
last_month = nyc_files[-1]

# create file url
file_url = url + last_month

# download file (request first so a failed download does not leave an empty file)
response = requests.get(file_url)
response.raise_for_status()
with open(last_month, "wb") as f:
    f.write(response.content)

# unzip file
with zipfile.ZipFile(last_month, "r") as zip_ref:
    zip_ref.extractall("tripdata")
    
# rename file
directory = 'tripdata/'
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith('.csv'):
        new_filename = filename.replace(' ', '').lower().split('ci',1)[0]\
        .strip('-').replace('-','_')
        os.rename(os.path.join(directory, filename), os.path.join(directory, new_filename + '.csv'))

# alternative: load every extracted file into a dict of DataFrames keyed by file name
# dfs = {}
# for file in os.listdir(directory):
#     filename = os.fsdecode(file)
#     if filename.endswith('.csv'):
#         dfs[filename.split('.')[0]] = pd.read_csv(os.path.join(directory, filename))
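
The unzip-and-rename steps can be tried without downloading anything. The sketch below builds a tiny stand-in archive in memory, extracts it into a temporary directory, and applies the same string operations as the rename loop. The archive name and contents are hypothetical, and `clean_name` is a helper introduced here only for illustration.

```python
import io
import os
import tempfile
import zipfile


def clean_name(filename):
    """Apply the rename loop's string steps: keep what precedes the first 'ci'."""
    stem = filename.replace(" ", "").lower().split("ci", 1)[0]
    return stem.strip("-").replace("-", "_") + ".csv"


# Build a small stand-in archive in memory (hypothetical name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("202102-citibike-tripdata.csv", "starttime\n2021-02-01 00:00:00\n")
buffer.seek(0)

with tempfile.TemporaryDirectory() as directory:
    # Extract, then rename every CSV in place.
    with zipfile.ZipFile(buffer, "r") as zip_ref:
        zip_ref.extractall(directory)
    for filename in os.listdir(directory):
        if filename.endswith(".csv"):
            os.rename(os.path.join(directory, filename),
                      os.path.join(directory, clean_name(filename)))
    renamed = sorted(os.listdir(directory))

print(renamed)
```

Because the rule cuts the name at the first occurrence of `ci`, it depends on the archive following the `YYYYMM-citibike-...` naming pattern.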
 

In [6]:
df = pd.read_csv('tripdata/202102.csv')
df

FileNotFoundError: [Errno 2] No such file or directory: 'tripdata/202102.csv'
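
One way to avoid depending on a hard-coded file name is to look up whichever CSV is present in the extraction directory. The following is a sketch under that assumption; `newest_csv` is a hypothetical helper, and the file names in the demonstration are placeholders.

```python
import tempfile
from pathlib import Path


def newest_csv(directory):
    """Return the CSV whose name sorts last, or raise if none was extracted."""
    candidates = sorted(Path(directory).glob("*.csv"))
    if not candidates:
        raise FileNotFoundError(f"no .csv files found in {directory}")
    return candidates[-1]


# Demonstrate on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("202101.csv", "202102.csv", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder\n")
    chosen = newest_csv(tmp).name

print(chosen)
```

In the notebook this could be used as `df = pd.read_csv(newest_csv('tripdata'))`, assuming the renamed files keep a sortable `YYYYMM` prefix.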

In [None]:
# replace all space in column headers with underscore
df.columns = [col.replace(' ', '_') for col in df.columns]

In [None]:
df.shape

Need to turn `starttime` into a datetime object so that I can pull an hour column from it. 

In [None]:
df['starttime'] = pd.to_datetime(df['starttime'], format='%Y-%m-%d %H:%M:%S')

Extract the hour from the datetime column.

In [None]:
df['hour'] = df['starttime'].dt.hour
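
If the timestamps in a given file carry fractional seconds, a format string ending in `%S` may be rejected, depending on the pandas version. The sketch below shows the same two steps on made-up rows, with `%f` added to the format. The sample values and the presence of fractional seconds are assumptions, not something taken from the actual file.

```python
import pandas as pd

# Hypothetical sample rows; the fractional seconds are an assumption about the file.
sample = pd.DataFrame({
    "starttime": ["2020-02-01 00:00:43.1160", "2020-02-01 17:45:02.0030"],
})

# '%f' matches the fractional part of the seconds explicitly.
sample["starttime"] = pd.to_datetime(sample["starttime"], format="%Y-%m-%d %H:%M:%S.%f")

# Pull the hour of day (0-23) out of each parsed timestamp.
sample["hour"] = sample["starttime"].dt.hour

hours = sample["hour"].tolist()
print(hours)
```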

Add a `count` column so that the rides taken from a given station during each hour can be summed.

In [None]:
df['count'] = 1

Create a new DataFrame grouped by `start_station_id`, `start_station_latitude` and `start_station_longitude`, summing the `count` column.

In [None]:
df2 = pd.DataFrame(df.groupby(['start_station_id', 'start_station_latitude', 'start_station_longitude'])['count']\
                        .sum().sort_values(ascending=False))

df2.head()

In [None]:
# create list of lat/long and count (as weight)
lst = df2.groupby(['start_station_latitude', 'start_station_longitude']).sum().reset_index().values.tolist()
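
The grouping and list conversion above can be checked on a toy frame. This sketch uses made-up trips and `groupby(...).size()`, which counts rows per group directly. That is an alternative to the helper `count` column, not the approach taken in the notebook.

```python
import pandas as pd

# Hypothetical trips: two from one station, one from another.
trips = pd.DataFrame({
    "start_station_id": [72, 72, 79],
    "start_station_latitude": [40.7673, 40.7673, 40.7191],
    "start_station_longitude": [-73.9939, -73.9939, -74.0067],
})

# size() counts rows per group, so no helper 'count' column is needed.
per_station = (
    trips.groupby(["start_station_latitude", "start_station_longitude"])
    .size()
    .reset_index(name="count")
)

# Each inner list is [lat, long, weight], the layout HeatMap takes as data.
points = per_station.values.tolist()
print(points)
```

Stations that share identical coordinates are merged into a single point by this grouping, just as in the notebook's version.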

### Create Heat Map

In [None]:
# add data to basemap 
HeatMap(data=lst, radius=12).add_to(base_map);

# save base map as .html
base_map.save('./images/bike_station_HeatMap.html')

# call map 
base_map

### Create Heat Map with Time

In [None]:
df_hour_list = []
for hour in df['hour'].sort_values().unique():
    # one frame per hour: [lat, long, ride count] for every start station
    hourly = df.loc[df['hour'] == hour,
                    ['start_station_latitude', 'start_station_longitude', 'count']]
    df_hour_list.append(hourly.groupby(['start_station_latitude', 'start_station_longitude'])
                              .sum().reset_index().values.tolist())
df_hour_list
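
Folium's documentation for `HeatMapWithTime` describes point weights as lying in the range (0, 1], whereas the list built above carries raw ride counts. One option, sketched here on made-up frames, is to divide every count by the largest count seen in any hour. `normalize_weights` is a hypothetical helper, and scaling by a single global peak (rather than per frame) is a design choice, not something the notebook prescribes.

```python
def normalize_weights(frames):
    """Scale the weight of every point by the largest weight seen in any frame."""
    peak = max((point[2] for frame in frames for point in frame), default=0)
    if peak <= 0:
        # Nothing to scale by; return a copy unchanged.
        return [[list(point) for point in frame] for frame in frames]
    return [[[lat, lon, weight / peak] for lat, lon, weight in frame] for frame in frames]


# Hypothetical two-frame input in the same [lat, long, count] layout.
frames = [
    [[40.7673, -73.9939, 2], [40.7191, -74.0067, 1]],
    [[40.7673, -73.9939, 4]],
]
scaled = normalize_weights(frames)
print(scaled)
```

A global peak keeps hours comparable with each other, so a quiet hour stays visibly dimmer than a busy one. How this interacts with `use_local_extrema=True` is worth checking visually.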

In [None]:
# instantiate HeatMapWithTime
HeatMapWithTime(df_hour_list,radius=8,
                gradient={0.1: 'blue', 0.5: 'lime', 0.7: 'orange', 1: 'red'}, 
                min_opacity=0.4, 
                max_opacity=0.8, 
                use_local_extrema=True).add_to(base_map)

# save as html
base_map.save('./images/heatmapwithtime_bikeshare.html')

# call result
base_map
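
`HeatMapWithTime` also accepts an `index` argument holding one label per frame. A small sketch for building readable hour labels follows. `hour_labels` is a hypothetical helper, and it assumes the frames correspond to integer hours of the day.

```python
def hour_labels(hours):
    """Format integer hours (0-23) as zero-padded 'HH:00' strings."""
    return [f"{int(hour):02d}:00" for hour in hours]


labels = hour_labels(range(24))
print(labels[:3], labels[-1])
```

Passing `index=labels` should then show these strings in the time slider in place of the default frame numbering, provided the list has exactly one label per frame in `df_hour_list`.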