# Project: 2019 Ford GoBike (Bay Wheels) Data Visualization

## Table of Contents 
<a name="top"></a>
<ul>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#q">Question</a></li>
    <li><a href="#dw">Data Wrangling</a></li>
    <li><a href="#eda">Exploratory Data Analysis</a></li>
    <li><a href="#edas">Explanatory Data Analysis</a></li>
    <li><a href="#limit">Limitation</a></li>
    <li><a href="#conclu">Conclusion</a></li>
</ul>

<a id='intro'></a>
## Introduction
<a href="#top">Back to Table of Contents</a>
> I have chosen the 2019 Ford GoBike (Bay Wheels) trip dataset. I am interested in trends such as customer segmentation (for example, by age range), the start and end points of rentals, and how frequently the service is used. I will visualize the data with seaborn and matplotlib.

<a id='dw'></a>
## Data Wrangling
<a href="#top">Back to Table of Contents</a>

In [69]:
#import libraries
import pandas as pd
import glob
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
sns.set(style="darkgrid")

In [87]:
#combine all the year 2019 files in the dataset folder 
path = r'C:\Users\isaph\Desktop\Dataset' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df) 

df = pd.concat(li, axis=0, ignore_index=True)
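
The loading step above can also be written as a small helper. This is a minimal sketch, not the notebook's own code: the function name `load_trip_csvs`, the file names, and the sample rows are made up for illustration. It assumes the warnings printed after this cell come from pandas guessing column types chunk by chunk, which `low_memory=False` usually avoids by reading each file in one pass.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_trip_csvs(folder):
    """Read every CSV in `folder` and stack them into one DataFrame."""
    # sorted() makes the row order reproducible across runs and machines.
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    # low_memory=False lets pandas infer one dtype per column per file.
    frames = [pd.read_csv(f, low_memory=False) for f in files]
    return pd.concat(frames, axis=0, ignore_index=True)


# Demonstration with two throwaway files (hypothetical names and rows).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "201901.csv").write_text(
        "duration_sec,user_type\n395,Customer\n750,Subscriber\n"
    )
    Path(tmp, "201902.csv").write_text(
        "duration_sec,user_type\n540,Subscriber\n"
    )
    combined = load_trip_csvs(tmp)

print(combined.shape)
```

Using `pathlib` keeps the pattern independent of the path separator, so the same helper should work for a Windows folder such as the one used above.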



In [80]:
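#save the combined data; index=False avoids an extra 'Unnamed: 0' column when the file is read back
df.to_csv('2019bikedata.csv', index=False)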

In [81]:
df = pd.read_csv('2019bikedata.csv')



In [88]:
df.head()

Unnamed: 0,duration_sec,start_time,end_time,start_station_id,start_station_name,start_station_latitude,start_station_longitude,end_station_id,end_station_name,end_station_latitude,end_station_longitude,bike_id,user_type,bike_share_for_all_trip,rental_access_method
0,80825,2019-01-31 17:57:44.6130,2019-02-01 16:24:49.8640,229.0,Foothill Blvd at 42nd Ave,37.775745,-122.213037,196.0,Grand Ave at Perkins St,37.808894,-122.25646,4861,Subscriber,No,
1,65900,2019-01-31 20:58:33.8860,2019-02-01 15:16:54.1730,4.0,Cyril Magnin St at Ellis St,37.785881,-122.408915,134.0,Valencia St at 24th St,37.752428,-122.420628,5506,Subscriber,No,
2,62633,2019-01-31 18:06:52.9240,2019-02-01 11:30:46.5300,245.0,Downtown Berkeley BART,37.870139,-122.268422,157.0,65th St at Hollis St,37.846784,-122.291376,2717,Customer,No,
3,44680,2019-01-31 19:46:09.7190,2019-02-01 08:10:50.3180,85.0,Church St at Duboce Ave,37.770083,-122.429156,53.0,Grove St at Divisadero,37.775946,-122.437777,4557,Customer,No,
4,60709,2019-01-31 14:19:01.5410,2019-02-01 07:10:51.0650,16.0,Steuart St at Market St,37.79413,-122.39443,28.0,The Embarcadero at Bryant St,37.787168,-122.388098,2100,Customer,No,


In [89]:
#split start_time into a date column and a time-of-day column (parse the timestamps once)
start = pd.to_datetime(df['start_time'])
df['start_date'] = start.dt.date
df['start_time'] = start.dt.time

In [90]:
#split end_time into a date column and a time-of-day column (parse the timestamps once)
end = pd.to_datetime(df['end_time'])
df['end_date'] = end.dt.date
df['end_time'] = end.dt.time
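
As a minimal sketch of the same idea on a toy frame, the date parts can be read straight from the parsed column through the `.dt` accessor. The two timestamps are copied from the `df.head()` output above; the column name `start_clock` is hypothetical and is used only so the raw `start_time` text stays available.

```python
import pandas as pd

trips = pd.DataFrame({
    "start_time": [
        "2019-01-31 17:57:44.6130",
        "2019-01-31 20:58:33.8860",
    ],
})

# Parse the text once and reuse the datetime64 result.
start = pd.to_datetime(trips["start_time"])

trips["start_date"] = start.dt.date   # datetime.date objects
trips["start_clock"] = start.dt.time  # datetime.time objects (hypothetical column name)
trips["day"] = start.dt.day           # integer day of month
trips["month"] = start.dt.month       # integer month

print(trips)
```

Keeping a real datetime column around, instead of only `date` and `time` objects, tends to make later grouping by hour or weekday easier.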

In [92]:
#convert duration from seconds to minutes and hours (division already returns float64)
df['duration_min'] = df['duration_sec'] / 60
df['duration_hr'] = df['duration_min'] / 60
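
One way to sanity-check `duration_sec` is to recompute the elapsed time from the raw timestamps. The sketch below uses the first two rows of the `df.head()` output above and assumes the full timestamps are still available, that is, it would run before `start_time` and `end_time` are reduced to time of day. The names `elapsed_sec` and `matches` are introduced here for illustration.

```python
import pandas as pd

trips = pd.DataFrame({
    "duration_sec": [80825, 65900],
    "start_time": ["2019-01-31 17:57:44.6130", "2019-01-31 20:58:33.8860"],
    "end_time": ["2019-02-01 16:24:49.8640", "2019-02-01 15:16:54.1730"],
})

# Subtracting two datetime columns yields a timedelta column.
elapsed = pd.to_datetime(trips["end_time"]) - pd.to_datetime(trips["start_time"])
trips["elapsed_sec"] = elapsed.dt.total_seconds()

# Same unit conversions as in the cell above.
trips["duration_min"] = trips["duration_sec"] / 60
trips["duration_hr"] = trips["duration_min"] / 60

# For these two rows, duration_sec is the elapsed time with the fraction dropped.
matches = trips["elapsed_sec"].astype(int) == trips["duration_sec"]
print(trips[["duration_sec", "elapsed_sec", "duration_hr"]])
print(matches.all())
```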

In [94]:
#drop unnecessary columns
df.drop(columns=['start_station_latitude', 'start_station_longitude', 'end_station_latitude', 'end_station_longitude', 'bike_share_for_all_trip'], inplace=True)

In [95]:
#add day and month columns extracted from start_date
df['day'] = df['start_date'].apply(lambda r:r.day).astype(int)
df['month'] = df['start_date'].apply(lambda r:r.month).astype(int)

In [98]:
#drop every row that has a NaN/missing value in any column
df.dropna(axis=0,how='any',inplace=True)
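
Because `how='any'` removes a row as soon as a single column is missing, it can help to count missing values per column first. The toy frame below is a hedged illustration with made-up rows: it mirrors the `df.head()` output above, where `rental_access_method` is empty for the January trips, and contrasts dropping on any column with dropping on a chosen subset.

```python
import numpy as np
import pandas as pd

trips = pd.DataFrame({
    "duration_sec": [80825, 395, 750],
    "start_station_id": [229.0, 425.0, np.nan],
    "rental_access_method": [np.nan, "app", "app"],
})

# Count missing values per column before deciding what to drop.
missing_per_column = trips.isnull().sum()
print(missing_per_column)

# how='any' keeps only rows that are complete in every column.
dropped_any = trips.dropna(axis=0, how="any")

# subset= restricts the check to the columns the analysis actually needs.
dropped_subset = trips.dropna(subset=["start_station_id"])

print(len(trips), len(dropped_any), len(dropped_subset))
```

If a mostly empty column is not needed for the analysis, dropping that column or using `subset=` may keep far more trips than a blanket `dropna`.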

In [107]:
#convert data type from float to str
df['start_station_id'] = df['start_station_id'].astype(str)
df['end_station_id'] = df['end_station_id'].astype(str)
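
Converting a float column directly to `str` keeps the decimal part, which is why the station IDs appear as `425.0` in the table further down. A small sketch of one alternative, assuming missing values were already removed (casting NaN to `int` would fail): go through `int` first. The variable names are illustrative only.

```python
import pandas as pd

station_ids = pd.Series([425.0, 312.0, 296.0])

# Direct cast: the float formatting is carried into the string.
as_float_text = station_ids.astype(str)

# Cast to int first to get plain integer labels.
as_int_text = station_ids.astype(int).astype(str)

print(as_float_text.tolist())
print(as_int_text.tolist())
```

Either form works as a categorical label in seaborn; the integer form is just easier to read on axis ticks.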

In [108]:
df.isnull().sum()

duration_sec            0
start_time              0
end_time                0
start_station_id        0
start_station_name      0
end_station_id          0
end_station_name        0
bike_id                 0
user_type               0
rental_access_method    0
start_date              0
end_date                0
duration_min            0
duration_hr             0
day                     0
month                   0
dtype: int64

In [109]:
#df overview
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21859 entries, 1241040 to 2506660
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   duration_sec          21859 non-null  int64  
 1   start_time            21859 non-null  object 
 2   end_time              21859 non-null  object 
 3   start_station_id      21859 non-null  object 
 4   start_station_name    21859 non-null  object 
 5   end_station_id        21859 non-null  object 
 6   end_station_name      21859 non-null  object 
 7   bike_id               21859 non-null  int64  
 8   user_type             21859 non-null  object 
 9   rental_access_method  21859 non-null  object 
 10  start_date            21859 non-null  object 
 11  end_date              21859 non-null  object 
 12  duration_min          21859 non-null  float64
 13  duration_hr           21859 non-null  float64
 14  day                   21859 non-null  int32  
 15  month      

In [110]:
#rows and columns 
df.shape

(21859, 16)

In [111]:
df.head(10)

Unnamed: 0,duration_sec,start_time,end_time,start_station_id,start_station_name,end_station_id,end_station_name,bike_id,user_type,rental_access_method,start_date,end_date,duration_min,duration_hr,day,month
1241040,395,12:19:51,12:26:27,425.0,Bird Ave at Willow St,312.0,San Jose Diridon Station,102949,Customer,app,2019-06-17,2019-06-17,6.583333,0.109722,17,6
1241041,750,13:27:27,13:39:57,425.0,Bird Ave at Willow St,296.0,5th St at Virginia St,655088,Subscriber,app,2019-06-27,2019-06-27,12.5,0.208333,27,6
1241128,540,20:09:37,20:18:38,406.0,Parkmoor Ave at Race St,300.0,Palm St at Willow St,382355,Subscriber,app,2019-06-19,2019-06-19,9.0,0.15,19,6
1241147,446,07:59:58,08:07:24,300.0,Palm St at Willow St,313.0,Almaden Blvd at San Fernando St,295032,Subscriber,app,2019-06-17,2019-06-17,7.433333,0.123889,17,6
1241165,458,08:06:32,08:14:11,301.0,Willow St at Vine St,313.0,Almaden Blvd at San Fernando St,314345,Subscriber,app,2019-06-18,2019-06-18,7.633333,0.127222,18,6
1241200,81,11:47:27,11:48:49,423.0,South San Jose State (7th St at Humboldt St),423.0,South San Jose State (7th St at Humboldt St),861099,Customer,app,2019-06-26,2019-06-26,1.35,0.0225,26,6
1241201,1516,20:46:14,21:11:30,423.0,South San Jose State (7th St at Humboldt St),417.0,Park Ave at Race St,295032,Subscriber,clipper,2019-06-20,2019-06-20,25.266667,0.421111,20,6
1241207,457,20:25:06,20:32:43,423.0,South San Jose State (7th St at Humboldt St),327.0,5th St at San Salvador St,163217,Subscriber,app,2019-06-20,2019-06-20,7.616667,0.126944,20,6
1241208,429,20:25:23,20:32:32,423.0,South San Jose State (7th St at Humboldt St),327.0,5th St at San Salvador St,638483,Subscriber,app,2019-06-20,2019-06-20,7.15,0.119167,20,6
1241214,829,16:18:15,16:32:05,427.0,Auzerais Ave at Lincoln Ave,282.0,Market St at Park St,363691,Subscriber,app,2019-06-23,2019-06-23,13.816667,0.230278,23,6


<a id='eda'></a>
## Exploratory Data Analysis
<a href="#top">Back to Table of Contents</a>


<a id='edas'></a>
## Explanatory Data Analysis
<a href="#top">Back to Table of Contents</a>

<a id='limit'></a>
## Limitation
<a href="#top">Back to Table of Contents</a>

<a id='conclu'></a>
## Conclusion
<a href="#top">Back to Table of Contents</a>