# Insights Of The Ford GoBike System Data Exploration
## by Michal Chiagoziem Ezeh

## Investigation Overview

In this exploration of the Ford GoBike dataset, I examined the relationships between different features to answer my main question: how does trip duration depend on variables such as user type, birth year, and the time and day of the trip?


## Dataset Overview

This dataset includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area. The data was obtained from the dataset options presented by Udacity in section 1.1. The dataset contains 183,412 records and 16 features:

- `duration_sec`
- `start_time`
- `end_time`
- `start_station_id`
- `start_station_name`
- `start_station_latitude`
- `start_station_longitude`
- `end_station_id`
- `end_station_name`
- `end_station_latitude`
- `end_station_longitude`
- `bike_id`
- `user_type`
- `member_birth_year`
- `member_gender`
- `bike_share_for_all_trip`
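
Because the overview fixes the expected schema, it can be checked in code right after loading. The sketch below is one way to do that, not part of the original analysis: the helper name `prepare_trips` is hypothetical, and converting `user_type` to a categorical is an optional choice.

```python
import pandas as pd

# The 16 columns of the February 2019 Ford GoBike trip data.
EXPECTED_COLUMNS = [
    'duration_sec', 'start_time', 'end_time',
    'start_station_id', 'start_station_name',
    'start_station_latitude', 'start_station_longitude',
    'end_station_id', 'end_station_name',
    'end_station_latitude', 'end_station_longitude',
    'bike_id', 'user_type', 'member_birth_year',
    'member_gender', 'bike_share_for_all_trip',
]


def prepare_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate the columns and parse timestamps.

    Returns a copy so the raw frame loaded from the CSV is left untouched.
    """
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    trips = df.copy()
    # start_time and end_time are read from the CSV as text.
    trips['start_time'] = pd.to_datetime(trips['start_time'])
    trips['end_time'] = pd.to_datetime(trips['end_time'])
    # user_type has only a few distinct values, so a categorical saves memory.
    trips['user_type'] = trips['user_type'].astype('category')
    return trips
```

With the real file loaded into `df`, `prepare_trips(df).shape` should be `(183412, 16)`, matching the record and feature counts stated above.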

In [None]:
# Run this cell if you encounter errors with seaborn later on
#!pip install --upgrade seaborn



In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

%matplotlib inline

In [None]:
# load the dataset into a pandas DataFrame
df = pd.read_csv('201902-fordgobike-tripdata.csv')
df.head()

> Note that the cells above have been set as "Skip"-type slides. This means that when the notebook is rendered as HTML slides, those cells won't show up.

## Visualization 1: Univariate Exploration of User Age

Univariate exploration investigates the distributions of individual variables. The histogram below shows the distribution of rider ages, where age is approximated as 2019 minus the member's birth year. Most trips are taken by riders between about 20 and 45 years old.

In [None]:
# Plot the distribution of user age, derived from each member's birth year.
# Age is approximated as 2019 minus birth year (the trips are from February 2019).
birth_years = df['member_birth_year'].dropna()
binsize = 1
# Offset the edges by half a bin so each birth year gets its own bar.
bins = np.arange(birth_years.min() - binsize / 2, birth_years.max() + binsize, binsize)
ticks = [1939, 1949, 1959, 1969, 1979, 1989, 1999, 2009]
labels = [2019 - year for year in ticks]

plt.figure(figsize=[8, 5])
plt.hist(birth_years, bins=bins)
plt.axis([1939, 2009, 0, 12000])
plt.xticks(ticks, labels)
plt.gca().invert_xaxis()  # older riders on the right, so age increases left to right
plt.title('Distribution of User Age')
plt.xlabel('Age (years)')
plt.ylabel('Number of Trips');
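
To put a number on the "20 to 45 years old" reading of the histogram, the share of trips in an age band can be computed directly. This is a minimal sketch: the function name `share_of_trips_in_age_range` is hypothetical, and it assumes age is approximated as 2019 minus birth year, as in the plot.

```python
import pandas as pd


def share_of_trips_in_age_range(birth_years: pd.Series, low: int = 20,
                                high: int = 45, ref_year: int = 2019) -> float:
    """Fraction of trips whose rider age (ref_year - birth year) is in [low, high].

    Rows with a missing birth year are excluded from both the numerator
    and the denominator.
    """
    ages = ref_year - birth_years.dropna()
    # between() is inclusive of both bounds by default.
    return float(ages.between(low, high).mean())
```

For example, `share_of_trips_in_age_range(df['member_birth_year'])` returns the fraction of trips with a known birth year whose rider is aged 20 to 45 inclusive.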

## Visualization 2: Univariate Exploration of Start Stations

The histogram below shows the number of trips that start at each station ID. Some start stations see more activity than others.

In [None]:
# Plot the number of trips that start at each station ID.
station_ids = df['start_station_id'].dropna()
binsize = 1
# Offset the edges by half a bin so each station ID gets its own bar.
bins = np.arange(-binsize / 2, station_ids.max() + binsize, binsize)

plt.figure(figsize=[20, 8])
plt.hist(station_ids, bins=bins)
plt.xticks(range(0, 401, 10))
plt.title('Distribution of Start Stations')
plt.xlabel('Start Station ID')
plt.ylabel('Number of Trips');
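
The histogram shows which station IDs are busy but not which stations they are. A minimal sketch that ranks start stations by trip count, with names attached; the function name `busiest_start_stations` is hypothetical, and grouping on both ID and name assumes each ID maps to a single name.

```python
import pandas as pd


def busiest_start_stations(trips: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n start stations with the most departures.

    Counts rows (trips) per start station; groupby drops trips whose
    start station ID or name is missing.
    """
    counts = (
        trips.groupby(['start_station_id', 'start_station_name'])
        .size()
        .sort_values(ascending=False)
        .head(n)
    )
    return counts.rename('trips').reset_index()
```

Calling `busiest_start_stations(df)` lists the ten busiest start stations with their trip counts.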

## Visualization 3: Bivariate Exploration of Trip Duration by User Type

Bivariate exploration investigates the relationships between pairs of variables. The box plot below compares trip duration for the two user types, Customer and Subscriber. A higher percentage of customers take longer trips than subscribers do. The y-axis is limited to 2,500 seconds so the boxes stay readable, which means the longest outlier trips are cut off from view.

In [None]:
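# Compare trip duration (seconds) across user types with a box plot.
plt.figure(figsize=[8, 5])
base_color = sb.color_palette()[1]
sb.boxplot(data=df, x='user_type', y='duration_sec', color=base_color)
# Zoom in on the boxes; the box statistics still use all trips,
# but longer outlier trips extend beyond this visible range.
plt.ylim([-10, 2500])
plt.title('Trip Duration by User Type')
plt.xlabel('User Type')
plt.ylabel('Duration (sec)');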
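
The claim that customers take longer trips can also be checked numerically per user type. A minimal sketch: the function name `duration_summary` is hypothetical, and the 30-minute cutoff for a "long" trip (`threshold_sec=1800`) is an arbitrary assumption chosen for illustration.

```python
import pandas as pd


def duration_summary(trips: pd.DataFrame, threshold_sec: int = 1800) -> pd.DataFrame:
    """Median trip duration and share of trips longer than threshold_sec,
    computed separately for each user type."""
    long_trip = trips['duration_sec'] > threshold_sec
    return (
        trips.assign(long_trip=long_trip)
        .groupby('user_type')
        .agg(median_sec=('duration_sec', 'median'),
             share_long=('long_trip', 'mean'))
    )
```

The median is used rather than the mean because it is not pulled upward by the long outlier trips that the box plot cuts off.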

## Conclusion

Through univariate, bivariate, and multivariate exploration of the Ford GoBike dataset, I gained insight into how different features affect trip duration. In particular, trip duration depends on variables such as user type, birth year, and the time and day of the trip.
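
One way to examine the time-and-day dependence is to derive the start hour and weekday from `start_time` and tabulate the median duration for each combination. This is a minimal sketch, assuming `start_time` can be parsed by `pd.to_datetime`; the function name `median_duration_by_time` is hypothetical.

```python
import pandas as pd


def median_duration_by_time(trips: pd.DataFrame) -> pd.DataFrame:
    """Median trip duration for each (weekday, start hour) combination.

    Rows are weekday names, columns are start hours (0-23); combinations
    with no trips are NaN.
    """
    start = pd.to_datetime(trips['start_time'])
    return (
        trips.assign(day=start.dt.day_name(), hour=start.dt.hour)
        .groupby(['day', 'hour'])['duration_sec']
        .median()
        .unstack('hour')
    )
```

The resulting table can be passed to `sb.heatmap` to show at a glance which hours and days have the longest trips.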

### Generate Slideshow
Once you're ready to generate your slideshow, use the `jupyter nbconvert` command to generate the HTML slideshow.

In [None]:
# Use this command if you are running this file locally
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt

> In the classroom workspace, the generated HTML slideshow will be placed in the home folder. 

> On local machines, the command above should open a tab in your web browser where you can scroll through your presentation. Sub-slides can be accessed by pressing 'down' when viewing their parent slide. Make sure you remove all of the quote-formatted guide notes like this one before you finish your presentation! Finally, you can stop the kernel.

### Submission
If you are using the classroom workspace, you can submit in either of the following two ways:

1. **Submit from the workspace**. Make sure you have removed the example project from the /home/workspace directory. You must submit the following files:
   - Part_I_notebook.ipynb
   - Part_I_notebook.html or pdf
   - Part_II_notebook.ipynb
   - Part_I_slides.html
   - README.md
   - dataset (optional)


2. **Submit a zip file on the last page of this project lesson**. In this case, open the Jupyter terminal and run the command below to generate a ZIP file. 
```bash
zip -r my_project.zip .
```
When run from your /home/workspace directory, the command above zips every file in that directory, including subdirectories. Next, download the zip file to your local machine and follow the instructions on the last page of this project lesson.
