# Feb-2019 Ford GoBike Trip Dataset Exploration
## by Walid Ismail

## Introduction

Our dataset includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area. In the following sections, we analyze the dataset to see what insights it offers about rides and cyclist demographics.

## Import Modules and Load Dataset

In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
plt.style.use('default')
%matplotlib inline

In [None]:
df_trips = pd.read_csv("/kaggle/input/201902fordgobiketripdataprocessed/201902-fordgobike-tripdata-processed.csv")
df_trips.head()

In [None]:
# how many rows and columns
df_trips.shape

In [None]:
# what are the columns data types and do we have columns with null values
df_trips.info()

## Data Wrangling
### Create New Columns

In [None]:
# Convert date / time fields to datetime data type
df_trips['start_time'] = pd.to_datetime(df_trips['start_time'])
df_trips['end_time'] = pd.to_datetime(df_trips['end_time'])

In [None]:
# Convert the day column to an ordered categorical so that it sorts and plots in calendar order
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

weekdays_classes = pd.api.types.CategoricalDtype(ordered=True, categories=weekdays)

df_trips['day'] = df_trips['day'].astype(weekdays_classes);
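
The effect of the ordered categorical conversion can be seen in a minimal sketch. The toy values below are made up for illustration and are not taken from the dataset; the point is only that plain strings sort alphabetically, while an ordered `CategoricalDtype` sorts (and plots) in calendar order.

```python
import pandas as pd

weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekday_dtype = pd.api.types.CategoricalDtype(categories=weekdays, ordered=True)

# Toy values (assumption: invented for illustration, not real trips)
days = pd.Series(['Sunday', 'Monday', 'Thursday', 'Monday'])

# Plain strings sort alphabetically
alphabetical = days.sort_values().tolist()

# Ordered categories sort in the order given by `weekdays`
calendar = days.astype(weekday_dtype).sort_values().tolist()

print(alphabetical)
print(calendar)
```

The same ordering is what seaborn and pandas use for the x-axis when a categorical column is plotted.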

In [None]:
# Create age groups column
age_groups = ['Under 12', '12-17 years old', '18-24 years old', '25-34 years old', '35-44 years old', '45-54 years old', '55-64 years old', '65-74 years old', '75 years or older']

age_groups_type = pd.api.types.CategoricalDtype(ordered=True, categories=age_groups)

df_trips['age_group'] = df_trips['age_group'].astype(age_groups_type)
df_trips.head()

In [None]:
# What are the start / end cities in our dataset?
df_trips['start_city'].unique(), df_trips['end_city'].unique()

In [None]:
df_trips.info()

### Summary Statistics / Outliers Detection

In [None]:
df_trips.describe()

From the above descriptive statistics, it appears that the trip duration and cyclist age columns have outliers. Let's investigate those columns in more depth.

In [None]:
# Visualize duration_min column
plt.figure(figsize=(20, 4))

plt.subplot(1, 2, 1)
bins = np.arange(0, df_trips.duration_min.max()+1, 5)
plt.hist(data = df_trips, x = 'duration_min', bins = bins);
plt.xlabel("duration_min")
plt.yscale("log")
plt.axhline(10, linestyle = '--')

plt.subplot(1, 2, 2)
plt.boxplot(df_trips['duration_min'], vert=False)
plt.xlabel("duration_min")
plt.xscale('log')

In [None]:
# How many points are considered outliers by the boxplot? top_whisker = Q3 + 1.5 * IQR
q1, q3 = df_trips['duration_min'].quantile([0.25, 0.75])
duration_min_top_whisker = q3 + 1.5 * (q3 - q1)
duration_min_top_whisker

In [None]:
(df_trips['duration_min'] > duration_min_top_whisker).sum()
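
The whisker calculation above can be wrapped in a small helper so that it can be reused for other columns. This is a minimal sketch: the function name `upper_whisker` and the toy durations are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

def upper_whisker(values: pd.Series, k: float = 1.5) -> float:
    """Return Q3 + k * IQR; pandas ignores NaN values when computing quantiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)

# Toy durations in minutes (assumption: invented for illustration)
toy = pd.Series([5, 6, 7, 8, 9, 10, 11, 12, 300])

limit = upper_whisker(toy)
n_outliers = int((toy > limit).sum())

print(limit, n_outliers)
```

With the real data, the call would be `upper_whisker(df_trips['duration_min'])` or `upper_whisker(df_trips['age'])`.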

In [None]:
# Drop rows with duration_min outliers (left disabled)
#df_trips = df_trips[df_trips['duration_min'] <= duration_min_top_whisker]

In [None]:
plt.figure(figsize=(20, 4))

plt.subplot(1, 2, 1)
bins = np.arange(0, df_trips.age.max()+1, 2)
plt.hist(data = df_trips, x = 'age', bins = bins);
plt.xlabel("age")
#plt.yscale("log")

plt.subplot(1, 2, 2)
plt.boxplot(df_trips['age'].dropna(), vert=False)  # the age column contains NaN values, which must be dropped for the boxplot to render correctly
plt.xlabel("age");
#plt.xscale('log')

In [None]:
# How many age points are outliers?
age_q1, age_q3 = df_trips['age'].quantile([0.25, 0.75])
age_top_whisker = age_q3 + 1.5 * (age_q3 - age_q1)
print("Age top whisker = {}".format(age_top_whisker))
(df_trips['age'] > age_top_whisker).sum()

In [None]:
# Drop rows with age outliers
#df_trips = df_trips[df_trips['age'] <= age_top_whisker]

In [None]:
# Check for NaN values
df_trips.isna().sum()

In [None]:
# Check how many "Other" as gender type and drop if relatively few
df_group = df_trips.groupby("member_gender").size()
df_group

In [None]:
#df_trips.drop(index = df_trips.index[df_trips['member_gender'] == 'Other'], inplace = True)

df_group = df_trips.groupby("member_gender").size()
df_group

### Summary Statistics

In [None]:
df_trips.info()

In [None]:
df_trips.describe()

In [None]:
df_trips.shape

### What is the structure of your dataset?

The dataset consists of approx 180k observations and 23 columns (including derived columns).

### What is/are the main feature(s) of interest in your dataset?

Each observation includes information about trip start/end time as well as start/end station. In addition we get information about the bike used and the gender/birth year of the cyclist taking the trip.

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

The following features are good subjects for analysis:
- Trip start time
- Trip day
- Trip city
- Trip duration (in minutes)
- Cyclist age/age group
- Cyclist gender
- Stations latitude / longitude

## Univariate Exploration
In this section we use histograms / bar charts to discover distributions of numeric / categorical features.

In [None]:
plt.figure(figsize=(24, 4))
ax = plt.subplot(1, 4, 1)
sb.countplot(data=df_trips, x='member_gender', color=sb.color_palette()[0])
plt.title("Trips by Gender")

plt.subplot(1, 4, 2)
sb.countplot(data=df_trips, x='user_type', color=sb.color_palette()[0])
plt.title("Trips by User Type")

plt.subplot(1, 4, 3)
sb.countplot(data=df_trips, x='bike_share_for_all_trip', color=sb.color_palette()[0])
plt.title("Trips by Bike Sharing Membership");

The above charts show that our cyclists are mostly males and mostly subscribers. Also note that most trips are taken by cyclists not enrolled in the <b>Bike Share for All</b> program for low-income residents.

In [None]:
plt.figure(figsize=(24, 4))

# What is the number of trips by age group?
plt.subplot(1, 3, 1)
sb.countplot(data=df_trips, x='age_group', color=sb.color_palette()[0])
plt.xticks(rotation=90);
plt.title("Trips by Age Group");

plt.subplot(1, 3, 2)
sb.countplot(data=df_trips, x='day', color=sb.color_palette()[0])
plt.xticks(rotation=30);
plt.title("Trips by Day of the Week");

plt.subplot(1, 3, 3)
sb.countplot(data=df_trips, x='start_city', color=sb.color_palette()[0])
plt.title("Trips by City");

The above charts show that most of our cyclists are in <b>San Francisco</b> and in the <b>25-34 age group</b>. We can also see that most trips are taken on <b>Thursdays</b>, with a sharp decline in the number of trips during the weekend. Maybe our cyclists are using their rides to commute to work?

In [None]:
plt.figure(figsize=(24, 4))

plt.subplot(1, 3, 1)
bins = np.arange(0, df_trips.duration_min.max()+1, 1)
plt.hist(data = df_trips, x = 'duration_min', bins = bins);
plt.xlabel("Duration of Trip (minutes)")
plt.yscale("log")
plt.title("Trips by Duration");
plt.axhline(10, linestyle = '--')
print("Mean trip duration = {} minutes".format(df_trips['duration_min'].mean()))
print("Number of trips above 50 minutes = {}".format((df_trips['duration_min'] > 50).sum()))
print("Number of trips above 100 minutes = {}".format((df_trips['duration_min'] > 100).sum()))

We can see from the above distribution that most trips fall in the <b>5-10 minute range</b>, with a <b>mean trip duration of about 12 minutes</b> and a long right tail of individual long trips.

In [None]:
def get_city_coordinates(name):
    """Look up the latitude and longitude of a place name with geopy's Nominatim geocoder."""
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="to_explorer")
    location = geolocator.geocode(name)
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are {}, {}.'.format(name, latitude, longitude))

    return latitude, longitude

In [None]:
df_start_stations = df_trips[['start_station_id', 'start_station_name', 'start_station_latitude', 'start_station_longitude']]
df_start_stations = df_start_stations.drop_duplicates()
len(df_start_stations)

In [None]:
color_map = {0: '#FFC300', 1: '#FF5733', 2: '#C70039', 3: '#900C3F', 4: '#581845'}
#color_map = {0: '#AED6F1', 1: '#85C1E9', 2: '#3498DB', 3: '#2E86C1', 4: '#2874A6'}

In [None]:
df_start_stations = df_trips.groupby(['start_station_id', 'start_station_name', 'start_station_latitude', 'start_station_longitude']).size().to_frame('trips')
df_start_stations = df_start_stations.reset_index()
df_start_stations['trip_cat'] = pd.cut(df_start_stations['trips'], bins = 5, labels = [0, 1, 2, 3, 4])
df_start_stations['color'] = df_start_stations['trip_cat'].apply(lambda x: color_map[x])
df_start_stations
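
As a side note on the binning step above: `pd.cut` with `bins = 5` splits the range of trip counts into five equal-width intervals, so a few very busy stations can push most stations into the lowest category. The sketch below uses made-up trip counts to show this, and contrasts it with `pd.qcut`, which is an alternative that this notebook does not use and is shown only for comparison.

```python
import pandas as pd

# Toy trip counts per station (assumption: invented for illustration)
trips = pd.Series([10, 20, 30, 40, 1000])

# Equal-width bins: the range 10..1000 is split into five intervals of equal length
equal_width = pd.cut(trips, bins=5, labels=[0, 1, 2, 3, 4])

# Equal-count bins: each bin receives roughly the same number of stations
equal_count = pd.qcut(trips, q=5, labels=[0, 1, 2, 3, 4])

print(equal_width.tolist())
print(equal_count.tolist())
```

Equal-width bins keep the marker size proportional to the absolute trip volume, which suits a map meant to highlight the busiest stations.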

In [None]:
import folium # map rendering library

f = folium.Figure(width=1200, height=450)

address = 'Bay Area, California'

#geolocator = Nominatim(user_agent="to_explorer")
#location = geolocator.geocode(address)
latitude = 37.773972
longitude = -122.431297
print('The geographical coordinates of the San Francisco Bay Area are {}, {}.'.format(latitude, longitude))

# create map of San Francisco using latitude and longitude values
map_sanfrancisco = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, id, name, color, radius in zip(df_start_stations['start_station_latitude'], df_start_stations['start_station_longitude'], df_start_stations['start_station_id'], df_start_stations['start_station_name'], df_start_stations['color'], df_start_stations['trip_cat']):
    label = '{}, {}'.format(id, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=(radius+2)*2,
        popup=label,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        parse_html=False).add_to(map_sanfrancisco)  
    
f.add_child(map_sanfrancisco)
f

By zooming in on the map we can see that most trips start from the stations on <b>Market Street</b> and <b>Townsend Street</b> in San Francisco. For Berkeley most trips start on <b>Bancroft Way</b>.

In [None]:
df_end_stations = df_trips.groupby(['end_station_id', 'end_station_name', 'end_station_latitude', 'end_station_longitude']).size().to_frame('trips')
df_end_stations = df_end_stations.reset_index()
df_end_stations['trip_cat'] = pd.cut(df_end_stations['trips'], bins = 5, labels = [0, 1, 2, 3, 4])
df_end_stations['color'] = df_end_stations['trip_cat'].apply(lambda x: color_map[x])
df_end_stations

In [None]:
f = folium.Figure(width=1200, height=450)

# create map of San Francisco using latitude and longitude values
map_sanfrancisco = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, id, name, color, radius in zip(df_end_stations['end_station_latitude'], df_end_stations['end_station_longitude'], df_end_stations['end_station_id'], df_end_stations['end_station_name'], df_end_stations['color'], df_end_stations['trip_cat']):
    label = '{}, {}'.format(id, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=(radius+2)*2,
        popup=label,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        parse_html=False).add_to(map_sanfrancisco)  
    
f.add_child(map_sanfrancisco)
f

The above map shows that most trips end at stations on <b>Market Street</b> and <b>Townsend Street</b> too. For Berkeley most trips end on <b>Bancroft Way</b>.

#### We can summarize the findings from our univariate plots as follows:
- Our cyclists are mostly males, and mostly subscribers.
- The vast majority of trips are for cyclists not enrolled in the <b>Bike Share for All</b> program for low-income residents.
- The majority of trips are in San Francisco followed by Oakland and Berkeley
- Most of our cyclists are in the  <b>25-34 age group</b>.
- Most trips are taken on <b>Thursdays</b>.
- We see a sharp decline in the number of trip during the weekend.
- The average trip duration is 12 minutes, with the vast majority of trips falling in the 5-10 minute range.


#### We observed the following unusual characteristics in the dataset:
- All rows in the dataset contain the most important trip data: start/end station, start/end datetime and trip duration.
- Trip durations range from low single digits to about 1,400 minutes. However, only approx 900 of the more than 180K trips in our dataset last longer than 200 minutes. We decided against removing these and the other outliers above 25 minutes: together they amount to approx 10,000 trips (about 5% of the observations), they carry valuable trip data, and many of them span more than one day, so they may simply be cyclists returning bikes late.
- There are approx 8,265 observations with missing birth year values. We did not remove those rows because the main trip data (start/end station and timestamps) is present for the entire dataset, so these trips still provide valuable information even though their age value is missing.
- There are 197 stations with missing ID and name. However, the station latitude/longitude values are not missing, so we could still determine the station location and city. These rows were not removed either.
- We derived a column to store trip duration in minutes for more practical analysis of trip duration.
- We divided the age of cyclists into 9 age groups to identify which age group uses our bikes the most.
- Using station latitude and longitude information, we created start_city and end_city columns to see where most of the trips take place.
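
The derived columns described above are already present in the processed CSV, so the code that created them is not part of this notebook. A minimal sketch of how the duration and age columns could be derived is shown below. It assumes raw columns named `duration_sec` and `member_birth_year`, a reference year of 2019, and age-group boundaries inferred from the group labels; all of these are assumptions and may differ from the preprocessing actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; the column names are assumptions about the unprocessed file
raw = pd.DataFrame({
    'duration_sec': [300, 615, 86400],
    'member_birth_year': [1990.0, 1955.0, np.nan],
})

age_labels = ['Under 12', '12-17 years old', '18-24 years old', '25-34 years old',
              '35-44 years old', '45-54 years old', '55-64 years old',
              '65-74 years old', '75 years or older']
# Left-closed bin edges inferred from the labels (assumption)
age_edges = [0, 12, 18, 25, 35, 45, 55, 65, 75, np.inf]

# Trip duration in minutes
raw['duration_min'] = raw['duration_sec'] / 60

# Age at the time of the trip; missing birth years stay NaN
raw['age'] = 2019 - raw['member_birth_year']

# Ordered age groups; rows with a missing age get a missing group
raw['age_group'] = pd.cut(raw['age'], bins=age_edges, labels=age_labels, right=False)

print(raw)
```

Rows with a missing birth year keep their trip data and simply carry a missing age group, which matches the decision above not to drop them.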

## Bivariate Exploration
In this section we explore relationships between different sets of two variables.

In [None]:
plt.figure(figsize=(24, 4))
#plt.subplots(2, 3)

ax = plt.subplot(1, 3, 1)
# Number of "Bike Share for All" trips by user type
df_trips.groupby(['user_type', 'bike_share_for_all_trip']).size().unstack()['Yes'].plot(kind='bar', ax=ax, title = '"Bike Share for All" Trips by User Type', rot=0)

ax = plt.subplot(1, 3, 2)
# Number of "Bike Share for All" trips by gender
df_trips.groupby(['member_gender', 'bike_share_for_all_trip']).size().unstack()['Yes'].plot(kind='bar', ax=ax, title = '"Bike Share for All" Trips by Gender', rot=0)

ax = plt.subplot(1, 3, 3)
# In which city are most "Bike Share for All" rides taken?
df_trips.groupby(['start_city', 'bike_share_for_all_trip']).size().unstack()['Yes'].plot(kind='bar', ax=ax, title = '"Bike Share for All" Trips by City', rot=0);

We can see from charts above:
- All <b>"Bike Share for All"</b> trips are taken by <b>Subscribers</b>.
- Also, the vast majority of them are <b>males</b> and their trips are mainly in <b>San Jose, San Francisco and Berkeley</b>.
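
The counts behind these bar charts come from `groupby(...).size().unstack()`. The sketch below, on made-up rows, shows the resulting table and one detail worth knowing: a combination that never occurs (here, Customers with "Yes") is missing from the grouped result, so `fill_value=0` is passed to make it an explicit zero. `pd.crosstab` is shown as an equivalent alternative; neither `fill_value` nor `crosstab` is used in the original cells.

```python
import pandas as pd

# Toy rows (assumption: invented for illustration)
toy = pd.DataFrame({
    'user_type': ['Subscriber', 'Subscriber', 'Customer', 'Subscriber'],
    'bike_share_for_all_trip': ['Yes', 'No', 'No', 'Yes'],
})

# Count rows per (user_type, bike_share_for_all_trip) pair and pivot the second key into columns
counts = toy.groupby(['user_type', 'bike_share_for_all_trip']).size().unstack(fill_value=0)

# Equivalent table built in one call
same_counts = pd.crosstab(toy['user_type'], toy['bike_share_for_all_trip'])

print(counts)
```

Selecting `counts['Yes']` then gives the series that is plotted as a bar chart.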

In [None]:
plt.figure(figsize=(24, 4))

ax = plt.subplot(1, 3, 1)
# Number of trips by gender and user_type
df_trips.groupby(['member_gender', 'user_type']).size().unstack()['Customer'].plot(kind='bar', ax=ax, title = 'Customer Trips by Gender', rot=90)

ax = plt.subplot(1, 3, 2)
# In which city are most Customer rides taken?
df_trips.groupby(['start_city', 'user_type']).size().unstack()['Customer'].plot(kind='bar', ax=ax, title = 'Customer Trips by City', rot=0);

We can see from charts above:
- The vast majority of Customer trips are taken by males.
- The vast majority of Customer trips are in San Francisco.

In [None]:
plt.figure(figsize=(24, 4))

plt.subplot(1,4,1)
sb.barplot(data=df_trips, x='member_gender', y='duration_min', color=sb.color_palette()[0])
#plt.xticks(rotation=90);
plt.title("Mean Trip Duration");

plt.subplot(1,4,2)
sb.barplot(data=df_trips, x='user_type', y='duration_min', color=sb.color_palette()[0])
#plt.xticks(rotation=90);
plt.title("Mean Trip Duration");

plt.subplot(1,4,3)
sb.barplot(data=df_trips, x='bike_share_for_all_trip', y='duration_min', color=sb.color_palette()[0])
#plt.xticks(rotation=90);
plt.title("Mean Trip Duration");

We observe the following insights with regards to mean duration of trips:
- Average trip duration is highest for "Other" gender category. 
- Trip duration is slightly higher for females than males.
- Average trip duration for Customers is more than double that for Subscribers.
- Cyclists in the "Bike Share for All" program take slightly shorter trips than other Cyclists.
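
The bar heights in the charts above are group means, which can also be read off numerically with a `groupby` aggregation. The sketch below uses made-up durations and adds the median next to the mean, since a few very long trips can pull a group mean upward. Reporting the median is a suggestion, not something the original analysis does.

```python
import pandas as pd

# Toy durations in minutes (assumption: invented for illustration)
toy = pd.DataFrame({
    'user_type': ['Customer'] * 3 + ['Subscriber'] * 3,
    'duration_min': [10, 20, 120, 8, 10, 12],
})

# Mean, median and number of trips per user type
summary = toy.groupby('user_type')['duration_min'].agg(['mean', 'median', 'count'])

print(summary)
```

On the real data, replacing `toy` with `df_trips` and `'user_type'` with any of the categorical columns gives the matching table for each chart.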

In [None]:
plt.figure(figsize=(24, 4))

plt.subplot(1,3,1)
sb.barplot(data=df_trips, x='day', y='duration_min', color=sb.color_palette()[0])
plt.title("Mean Trip Duration");
plt.xticks(rotation=90);

plt.subplot(1,3,2)
sb.barplot(data=df_trips, x='age_group', y='duration_min', color=sb.color_palette()[0])
plt.xticks(rotation=90);
plt.title("Mean Trip Duration");

plt.subplot(1,3,3)
sb.barplot(data=df_trips, x='start_city', y='duration_min', color=sb.color_palette()[0])
plt.title("Mean Trip Duration");
plt.xticks(rotation=90);

Note the following:
- Mean trip duration is approx the same across weekdays, with a notable increase on the weekend.
- Trip duration is practically the same across age groups, except that cyclists in the 75 years or older group take shorter trips on average.
- The average trip duration in Santa Clara is almost double the trip duration in other cities.

In [None]:
plt.figure(figsize=(24, 4))

ax = plt.subplot(1, 4, 1)
df_trips.groupby(['day', 'member_gender']).size().unstack().plot(ax = ax)
plt.title("Trips by Gender");
plt.xticks(rotation = 30)
plt.yscale('log')
ticks = [10, 30, 100, 300, 1000, 3000, 10000, 30000]
# Convert ticks into string values, to be displayed along the y-axis
labels = ['{}'.format(v) for v in ticks]
plt.yticks(ticks, labels)

ax = plt.subplot(1, 4, 2)
df_trips.groupby(['day', 'user_type']).size().unstack().plot(ax = ax)
plt.title("Trips by User Type");
plt.xticks(rotation = 30);

ax = plt.subplot(1, 4, 3)
df_trips.groupby(['day', 'bike_share_for_all_trip']).size().unstack().plot(ax = ax)
plt.title("Trips by Bike Share for All Membership");
plt.xticks(rotation = 30);

The above plots confirm that the sharp decline in the number of trips on Saturdays and Sundays holds across gender, user type and "Bike Share for All" membership. We can also see that Thursday is the busiest day in all three breakdowns.

In [None]:
fig = plt.figure(figsize = [20,5])

ax = plt.subplot(1, 2, 1)
df_trips.groupby([df_trips['start_time'].dt.hour]).size().plot()
plt.title("Trips by Hour of Day");

plt.axvline(8, linestyle = '--', color='r')
plt.xticks([_ for _ in range(24)])
plt.axvline(17,linestyle = '--', color='r');

The line plot above shows a very interesting trend. We can clearly see that <b>most trips start around 8 AM and 5 PM</b>, which is a strong sign that cyclists use our bikes to commute to and from work. We also see a smaller, flat peak of around 10,000 trips between 10 AM and 3 PM, counted across all days of the week.
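
The hourly counts behind the line plot can be computed directly from the start timestamps. The sketch below uses a handful of made-up timestamps and `reindex(range(24), fill_value=0)` so that hours without any trips appear as zeros instead of being skipped. The `reindex` step is an addition for illustration; the original cell relies on every hour being present in the data.

```python
import pandas as pd

# Toy start times (assumption: invented for illustration)
start = pd.to_datetime(pd.Series([
    '2019-02-01 08:05', '2019-02-01 08:40',
    '2019-02-01 17:15', '2019-02-02 13:00',
]))

# Count trips per start hour and make sure all 24 hours are present
by_hour = start.dt.hour.value_counts().reindex(range(24), fill_value=0)

print(by_hour[by_hour > 0])
```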

#### We can summarize our bivariate findings as follows:
- Average trip duration is highest for the "Other" gender category.
- Trip duration is slightly higher for females than males.
- Average trip duration for Customers is more than double that for Subscribers.
- Cyclists in the "Bike Share for All" program take slightly shorter trips than other cyclists.
- Mean trip duration is approx the same across weekdays, with a notable increase on the weekend days.
- Trip duration is practically the same across age groups, except that cyclists in the 75 years or older group take shorter trips on average.
- The vast majority of "Bike Share for All" trips are taken by males, mainly in San Jose, San Francisco and Berkeley.

#### The following relationships are interesting:
- The vast majority of Customer trips are in San Francisco.
- The average trip duration in Santa Clara is almost double the trip duration in other cities.
- All <b>"Bike Share for All"</b> trips are taken by <b>Subscribers</b>.
- Most trips start around 8 AM and 5 PM, which is a strong sign that cyclists use our bikes to commute to and from work.

## Multivariate Exploration
In this section we explore relationships between different sets of three or more variables.

In [None]:
fig, axes = plt.subplots(ncols = 3, figsize = [24,4])

df_trips[df_trips['member_gender'] == "Male"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Number of Trips by Hour of Day for Males", ax = axes[0]);

df_trips[df_trips['member_gender'] == "Female"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Number of Trips by Hour of Day for Females", ax = axes[1]);

df_trips[df_trips['member_gender'] == "Other"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Number of Trips by Hour of Day for Others", ax = axes[2]);

Note that the aggregate hourly trend for trips holds for both males and females, but it is different for the "Other" category, which has one big peak at 5 PM, a smaller peak at 8 AM and further peaks in between.

In [None]:
fig, axes = plt.subplots(ncols = 2, figsize = [16,4])

df_trips[df_trips['user_type'] == "Customer"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Number of Trips by Hour of Day for Customer User Type", ax = axes[0]);

df_trips[df_trips['user_type'] == "Subscriber"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Number of Trips by Hour of Day for Subscriber User Type", ax = axes[1]);

Note the difference in the trend for Customers: unlike Subscribers, their weekend curves are not flat and show a clear peak.

In [None]:
fig, axes = plt.subplots(ncols = 2, figsize = [16,4])

df_trips[df_trips['bike_share_for_all_trip'] == "No"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Trips by Hour of Day, Bike Share = No", ax = axes[0]);

df_trips[df_trips['bike_share_for_all_trip'] == "Yes"].pivot_table(index=df_trips['start_time'].dt.hour, 
                     columns='day', 
                     values='user_type', 
                     aggfunc='count').plot(title = "Trips by Hour of Day, Bike Share = Yes", ax = axes[1]);

The hourly trend for cyclists enrolled in the "Bike Share for All" program looks nothing like the one for other cyclists. It appears cyclists in the program use their bike more evenly through the day unlike the bi-modal pattern of other cyclists.
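
The hour-by-day tables in this section repeat the same `pivot_table` call for each subgroup. A minimal sketch of a reusable helper is shown below. The function name `trips_by_hour_and_day` and the toy rows are assumptions for illustration, and `pd.crosstab` is used here as a compact way to count rows instead of the notebook's `pivot_table` with `aggfunc='count'`.

```python
import pandas as pd

def trips_by_hour_and_day(df: pd.DataFrame) -> pd.DataFrame:
    """Count trips per start hour (rows) and day of the week (columns)."""
    hours = df['start_time'].dt.hour.rename('hour')
    return pd.crosstab(hours, df['day'])

# Toy rows (assumption: invented for illustration)
toy = pd.DataFrame({
    'start_time': pd.to_datetime(['2019-02-04 08:10', '2019-02-04 08:50',
                                  '2019-02-04 17:05', '2019-02-09 13:30']),
    'day': ['Monday', 'Monday', 'Monday', 'Saturday'],
    'user_type': ['Subscriber', 'Subscriber', 'Subscriber', 'Customer'],
})

# Filter first, then build the table for the subgroup of interest
subscribers = trips_by_hour_and_day(toy[toy['user_type'] == 'Subscriber'])

print(subscribers)
```

Calling `.plot(title=..., ax=...)` on the returned table would draw one line per day, as in the cells above.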

In [None]:
plt.figure(figsize=(24, 4))

plt.subplot(1,4,1)
sb.pointplot(data=df_trips, x='day', y='duration_min', hue='member_gender')
plt.title("Mean Trip Duration");
plt.xticks(rotation=90);

plt.subplot(1,4,2)
sb.pointplot(data=df_trips, x='day', y='duration_min', hue='user_type')
plt.title("Mean Trip Duration");
plt.xticks(rotation=90);

plt.subplot(1,4,3)
sb.pointplot(data=df_trips, x='day', y='duration_min', hue='bike_share_for_all_trip')
plt.title("Mean Trip Duration");
plt.xticks(rotation=90);

plt.subplot(1,4,4)
sb.pointplot(data=df_trips, x='day', y='duration_min', hue='start_city')
plt.title("Mean Trip Duration");
plt.xticks(rotation=90);

The above plots show the mean trip duration across the days of the week by gender, user type, Bike Share program membership and city in the Bay Area. Notice the jump in mean duration during the weekend for Customers and Bike Share members.

#### We can summarize our multivariate findings as follows:
- The aggregate hourly trend for trips continues across gender (male/female).
- The aggregate hourly trend for trips continues for Subscribers.
- The aggregate hourly trend for trips continues for those not enrolled in the "Bike Share for All" program.
- The high average duration of trips continues for "Other" gender across the days of the week compared to males and females.
- "Customers" continue to take longer trips across the days of the week.
- The trip duration is practically the same for those enrolled in the "Bike Share" program across the week, while it increases considerably for those not enrolled in the program during the weekend compared to the weekdays.

#### The following relationships are interesting:
- There is a noticeable difference in the hourly trend for the "Other" gender category.
- The hourly trend is noticeably different for the Customer user type too, with a relatively high number of trips on the weekend compared to the trend observed for Subscribers.
- The hourly trend is also different for those enrolled in the "Bike Share for All" program, with trips relatively high across the day.

## Conclusions
The analysis of the dataset provides the following insights:
- The vast majority of trips are taken by males.
- The vast majority of trips are for cyclists not enrolled in the <b>Bike Share for All</b> program.
- The majority of trips are in San Francisco.
- Most of our cyclists are in the 25-34 age group.
- Most trips are taken on Thursdays.
- We see a sharp decline in the number of trips during the weekend.
- The average trip duration is 12 minutes.
- All "Bike Share for All" trips are taken by Subscribers. Most of their trips are taken in San Jose, followed by San Francisco and then Berkeley.
- Most trips start around 8 AM and 5 PM, which is a strong sign that cyclists use our bikes to commute to and from work.

## Thank you!