# Part II - Ford GoBike Dataset Explanatory Analysis
## By Rellika Kisyula

## Investigation Overview
For this analysis, I want to see how users utilize the bike sharing system. The main focus is on when people ride the bikes, in terms of hours of the day. Is there a peak time during the day when people use the bikes?

I would also like to see the most popular starting points for bike rides.

Finally, I would like to explore the characteristics of the users of the bike sharing system, and to understand when certain categories of users (by gender, user type, and age) ride.

## Dataset Overview
The Ford GoBike dataset contains anonymized trip data for the bike-sharing system from June 2017 to April 2019. **However, I decided to only use the data in the year 2018 (January 2018 to December 2018).** The data includes information on individual bike rides such as trip duration, start and end time, start and end station, bike ID, and user type. Additionally, demographic data such as age, gender, and membership type is provided for some users.

I manually downloaded the datasets from the [System Data | Bay Wheels | Lyft](https://www.lyft.com/bikes/bay-wheels/system-data) page. The datasets come as zip files. I extracted them and saved the csv files in the `data` folder, which sits in the same directory as this notebook. The zip files themselves are in the `data/zip_files` folder.

## Preparation of Data
I unzipped the files using `zipfile` and saved them in the `data/data_files` folder. I then read the csv files into pandas dataframes, concatenated them into one dataframe, and saved it as `bike_data.csv` in the `data` folder.
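
The unzip-and-concatenate step can be sketched roughly as follows. This is a minimal illustration, not the exact code from the wrangling notebook: the helper names `extract_archives` and `combine_csv_files` are hypothetical, and the sketch assumes each archive holds its csv files at the top level.

```python
import zipfile
from pathlib import Path

import pandas as pd


def extract_archives(zip_dir, out_dir):
    """Extract every zip archive found in zip_dir into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(out_dir)


def combine_csv_files(csv_dir):
    """Read every csv file in csv_dir and stack them into one dataframe."""
    # Assumption: the csv files sit directly in csv_dir (no nested folders).
    csv_paths = sorted(Path(csv_dir).glob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"no csv files found in {csv_dir}")
    frames = [pd.read_csv(path) for path in csv_paths]
    # ignore_index=True gives the combined dataframe one continuous index.
    return pd.concat(frames, ignore_index=True)
```

With the folder layout described above, the calls would be `extract_archives('data/zip_files', 'data/data_files')` followed by `combine_csv_files('data/data_files').to_csv('data/bike_data.csv', index=False)`. Passing `index=False` is a choice made for this sketch; it keeps the row index out of the saved file.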

After the **wrangling** and **data preparation** steps, such as adding new columns and **filtering** out outlier ages, I saved the dataframe as `part_II_bike_data.csv` in the `data` folder. This is the dataset that I use for the analysis.
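
As a rough illustration of that preparation step, the sketch below derives a few of the added columns (`month_of_year`, `day_of_week`, `hour`, `member_age`) and drops outlier ages. The function name `add_time_and_age_columns` and the `max_age=80` cutoff are assumptions made for this example; the actual threshold used in the wrangling notebook may differ.

```python
import pandas as pd


def add_time_and_age_columns(df, reference_year=2018, max_age=80):
    """Return a copy of df with time-of-ride and age columns added.

    max_age is a hypothetical cutoff used only for this sketch.
    """
    df = df.copy()
    start = pd.to_datetime(df["start_time"])
    df["month_of_year"] = start.dt.month_name()  # e.g. "December"
    df["day_of_week"] = start.dt.day_name()      # e.g. "Saturday"
    df["hour"] = start.dt.hour                   # 0-23
    df["member_age"] = reference_year - df["member_birth_year"]
    # Keep riders at or below the cutoff. Rows with a missing birth year
    # have a NaN age, so this comparison drops them as well.
    return df[df["member_age"] <= max_age]
```

Computing the age against a fixed `reference_year` of 2018 matches the decision to analyze only the 2018 rides.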

In [21]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [22]:
# Use the first color of the default palette as the base color for plots
base_color = sb.color_palette()[0]

### Load the dataset prepared for this analysis: `part_II_bike_data.csv`

In [23]:
# Read the prepared part_II_bike_data.csv file into a dataframe.
bike_data = pd.read_csv('data/part_II_bike_data.csv')

# If you get an error, make sure that you have run the code in the Part I notebook.

In [24]:
bike_data.sample(5)

| Unnamed: 0 | duration_sec | start_time | end_time | start_station_id | start_station_name | start_station_latitude | start_station_longitude | end_station_id | end_station_name | end_station_latitude | ... | member_birth_year | member_gender | bike_share_for_all_trip | distance | member_age | month_of_year | day_of_week | hour | period_of_day | member_age_group |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 749109 | 667 | 2018-12-29 22:11:33.302 | 2018-12-29 22:22:40.395 | 368.0 | Myrtle St at Polk St | 37.785434 | -122.419622 | 87.0 | Folsom St at 13th St | 37.769757 | ... | 1993.0 | Male | No | 1.777378 | 25.0 | December | Saturday | 22 | Late Night | 20-30 |
| 642845 | 356 | 2018-05-19 11:46:32.961 | 2018-05-19 11:52:29.835 | 195.0 | Bay Pl at Vernon St | 37.812314 | -122.260779 | 178.0 | Broadway at 30th St | 37.819381 | ... | 1991.0 | Female | No | 0.792315 | 27.0 | May | Saturday | 11 | Morning | 20-30 |
| 192613 | 2414 | 2018-11-06 15:21:24.722 | 2018-11-06 16:01:38.845 | 6.0 | The Embarcadero at Sansome St | 37.80477 | -122.403234 | 108.0 | 16th St Mission BART | 37.76471 | ... | 1987.0 | Male | No | 4.690628 | 31.0 | November | Tuesday | 15 | Evening | 30-40 |
| 556341 | 573 | 2018-06-05 06:39:51.418 | 2018-06-05 06:49:25.065 | 109.0 | 17th St at Valencia St | 37.763316 | -122.421904 | 58.0 | Market St at 10th St | 37.776619 | ... | 1983.0 | Male | No | 1.531649 | 35.0 | June | Tuesday | 6 | Morning | 30-40 |
| 1191703 | 372 | 2018-08-22 13:09:23.579 | 2018-08-22 13:15:35.875 | 53.0 | Grove St at Divisadero | 37.775946 | -122.437777 | 74.0 | Laguna St at Hayes St | 37.776435 | ... | 1992.0 | Male | No | 1.01513 | 26.0 | August | Wednesday | 13 | Afternoon | 20-30 |
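
The `Unnamed: 0` column in this sample is what pandas typically produces when a dataframe is saved with its row index and then read back with default settings, which is likely what happened here. A minimal sketch of one way to avoid it is to treat the first column as the index when loading; the helper name `read_without_saved_index` is hypothetical.

```python
import pandas as pd


def read_without_saved_index(source):
    """Read a csv whose first column is a previously saved dataframe index."""
    # index_col=0 uses the first column as the index instead of keeping it
    # as a regular data column named "Unnamed: 0".
    return pd.read_csv(source, index_col=0)
```

For this notebook the call would be `read_without_saved_index('data/part_II_bike_data.csv')`. The analysis does not depend on this change, since the extra column is simply ignored.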
