# Prediction of hourly bike usage for a bike station

This notebook analyses the hourly bike usage of a single high-traffic bike station (station number 31245). The results of this analysis are then used to predict future bike usage per hour.

The analysis will first focus on the bikes that leave the bike station.

In [32]:
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime

## Prepare the data for analysis

The data is prepared for analysis by making sure every column has the right type. New columns are then added, such as the hour and the day of the week at which each trip starts.

In [2]:
# Open the data file
trips = pd.read_csv("../data/trips.csv", index_col=0)

In [3]:
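# Convert start_date and end_date to datetime
# (pd.to_datetime is used because recent pandas versions reject the unit-less "datetime64" dtype in astype)
trips["start_date"] = pd.to_datetime(trips["start_date"])
trips["end_date"] = pd.to_datetime(trips["end_date"])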
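The date columns could also be parsed while the file is read. The block below is a minimal sketch of that alternative, assuming the same column names as `trips.csv`; the in-memory CSV is a stand-in built from two of the station 31245 rows shown in this notebook, and the name `sample` is only illustrative.

```python
import io
import pandas as pd

# Small in-memory stand-in for trips.csv (two rows copied from this notebook's output)
csv_text = """,duration,start_date,end_date,start_station_number
132,594,2017-01-01 01:37:53,2017-01-01 01:47:47,31245
137,521,2017-01-01 01:42:59,2017-01-01 01:51:40,31245
"""

# parse_dates converts the listed columns to datetime during loading
sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=["start_date", "end_date"],
)

print(sample.dtypes)
```

With this approach no separate conversion cell is needed, at the cost of a slightly slower read on large files.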

In [4]:
# Check the types of the trips dataframe
trips.dtypes

duration                         int64
start_date              datetime64[ns]
end_date                datetime64[ns]
start_station_number             int64
start_station                   object
end_station_number               int64
end_station                     object
bike_number                     object
member_type                     object
dtype: object

In [9]:
# Create new columns with the day of the week and the hour of the day using start_date.
# dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0.
trips["start_day"] = trips["start_date"].dt.day_name()

In [10]:
trips["start_hour"] = trips["start_date"].dt.hour
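The two derived columns can be sanity-checked on a handful of timestamps. The following is a small sketch, using three start times taken from the station 31245 rows in this notebook; the variable names are illustrative only.

```python
import pandas as pd

# Start times copied from the station 31245 sample rows in this notebook
starts = pd.Series(pd.to_datetime([
    "2017-01-01 01:37:53",
    "2017-01-01 02:58:05",
    "2017-01-01 04:08:27",
]))

day_names = starts.dt.day_name()  # full English weekday name
hours = starts.dt.hour            # integer hour of the day, 0 to 23

print(day_names.tolist())
print(hours.tolist())
```

Both accessors work element-wise on a datetime column, so no `apply` call is needed.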

In [11]:
# Select the trips that start at bike station number 31245
trips_31245 = trips[trips["start_station_number"] == 31245]

In [12]:
# Check the head of trips_31245
trips_31245.head()

Unnamed: 0,duration,start_date,end_date,start_station_number,start_station,end_station_number,end_station,bike_number,member_type,start_day,start_hour
132,594,2017-01-01 01:37:53,2017-01-01 01:47:47,31245,7th & R St NW / Shaw Library,31600,5th & K St NW,W22093,Member,Sunday,1
137,521,2017-01-01 01:42:59,2017-01-01 01:51:40,31245,7th & R St NW / Shaw Library,31201,15th & P St NW,W22403,Member,Sunday,1
142,373,2017-01-01 01:46:55,2017-01-01 01:53:08,31245,7th & R St NW / Shaw Library,31214,17th & Corcoran St NW,W01005,Member,Sunday,1
215,280,2017-01-01 02:58:05,2017-01-01 03:02:46,31245,7th & R St NW / Shaw Library,31266,11th & M St NW,W22523,Member,Sunday,2
239,493,2017-01-01 04:08:27,2017-01-01 04:16:40,31245,7th & R St NW / Shaw Library,31102,11th & Kenyon St NW,W00247,Member,Sunday,4


In [33]:
# Keep only the columns that are relevant for the analysis
# (.copy() avoids a SettingWithCopyWarning when new columns are added later)
trips_31245 = trips_31245[["duration", "start_date", "start_station_number", "start_day"]].copy()

In [35]:
# Add a column holding the start date truncated to the hour (minutes and seconds dropped),
# because the usage will be predicted per hour. The result is a string at this stage.
trips_31245["start_date_hour"] = trips_31245["start_date"].dt.strftime("%Y-%m-%d %H:00")

In [36]:
trips_31245.head()

Unnamed: 0,duration,start_date,start_station_number,start_day,start_date_hour
132,594,2017-01-01 01:37:53,31245,Sunday,2017-01-01 01:00
137,521,2017-01-01 01:42:59,31245,Sunday,2017-01-01 01:00
142,373,2017-01-01 01:46:55,31245,Sunday,2017-01-01 01:00
215,280,2017-01-01 02:58:05,31245,Sunday,2017-01-01 02:00
239,493,2017-01-01 04:08:27,31245,Sunday,2017-01-01 04:00


In [37]:
# start_date_hour is still a string, so convert it to datetime
trips_31245["start_date_hour"] = pd.to_datetime(trips_31245["start_date_hour"])
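Formatting to a string and converting back to datetime can likely be replaced by a single truncation step. The sketch below compares the two routes on a few start times from this notebook; `dt.floor("h")` is a swapped-in alternative, not the method used in the cells above, and the variable names are illustrative.

```python
import pandas as pd

# Start times copied from the station 31245 sample rows in this notebook
starts = pd.Series(pd.to_datetime([
    "2017-01-01 01:37:53",
    "2017-01-01 01:42:59",
    "2017-01-01 02:58:05",
]))

# Route used in this notebook: format as a string, then parse back to datetime
via_string = pd.to_datetime(starts.dt.strftime("%Y-%m-%d %H:00"))

# Alternative: truncate each timestamp to the start of its hour
via_floor = starts.dt.floor("h")

# True when both routes give the same hourly timestamps
same = bool((via_string == via_floor).all())
print(same)
```

The floor variant keeps the column as datetime throughout, which avoids the intermediate string column.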

In [38]:
# Check the types of the dataframe
trips_31245.dtypes

duration                         int64
start_date              datetime64[ns]
start_station_number             int64
start_day                       object
start_date_hour         datetime64[ns]
dtype: object
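Since the prediction is made per hour, one possible next step is to count the departures in each hour. The block below is a minimal sketch of that aggregation on the five sample rows shown earlier; the names `counts` and `hourly` are hypothetical, and filling hours without departures with zero is an assumption about how empty hours should be treated.

```python
import pandas as pd

# Start times of the five station 31245 sample rows shown in this notebook
sample = pd.DataFrame({
    "start_date": pd.to_datetime([
        "2017-01-01 01:37:53",
        "2017-01-01 01:42:59",
        "2017-01-01 01:46:55",
        "2017-01-01 02:58:05",
        "2017-01-01 04:08:27",
    ])
})
sample["start_date_hour"] = sample["start_date"].dt.floor("h")

# Number of trips that start in each hour that has at least one trip
counts = sample.groupby("start_date_hour").size()

# Reindex on a complete hourly range so that hours without trips appear as 0
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="h")
hourly = counts.reindex(full_range, fill_value=0)

print(hourly)
```

Making the empty hours explicit matters for a time-series model, which would otherwise see gaps instead of zero usage.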