# Part II - Ford GoBike sharing system user habits
## by Fatima Brazao

## Investigation Overview

> This presentation provides insight into the habits of users of the Ford GoBike bike sharing system in February 2019.

>From the dataset exploration I observe that:

>This bike sharing system is used for short rides 

>Most bike rides are done by subscribers (91%)

>Males took roughly 75% of those rides, females 25%, and users of other genders 2%

> Most trips are made by users born between 1980 and 2000, that is, roughly 19 to 39 years old in 2019

>Throughout the day, most rides happen in the morning and afternoon, some in the evening, and few at night

> Most popular stations have ids between 50 and 100

> The most popular days in February 2019 were workdays, namely 5, 6, 7, 11, 12, 19, 20, 21, 22 and 28. Weekends are less popular.

> Users tend to pick bikes with a higher id more often, possibly because those bikes are newer.

>Hypothesis: The bike sharing system is used mostly for short-distance transport to and from work.


## Dataset Overview

> The Ford GoBike dataset contains attributes for 183,412 individual rides: the duration in seconds (NUMERIC variable), the start and end date and time, and the start and end location given by station id and name (ORDINAL variable) and by longitude and latitude (removed for this study). It also includes data about the user_type (customer or subscriber), member birth year and gender (CATEGORICAL variables).
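
As a minimal sketch of how the variable types described above could be made explicit in pandas, the example below casts the two categorical columns to the `category` dtype. The three-row `sample` frame is a hypothetical stand-in for the real CSV, and only the column names are taken from the dataset description.

```python
import pandas as pd

# Hypothetical miniature sample; the real notebook loads these columns from the CSV.
sample = pd.DataFrame({
    "duration_sec": [320, 845, 1210],
    "user_type": ["Subscriber", "Customer", "Subscriber"],
    "member_gender": ["Male", "Female", "Other"],
})

# Store the categorical variables with the pandas "category" dtype.
for column in ["user_type", "member_gender"]:
    sample[column] = sample[column].astype("category")

print(sample.dtypes)
```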


In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import calendar
import datetime

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [None]:
# load in the dataset into a pandas dataframe
df=pd.read_csv("final-fordgobike-tripdata.csv")


## Hypothesis: 
### The bike sharing system is used mostly by subscribers for short duration transport  

#### This hypothesis is supported by the data. The following visualizations show who the users of the bike system are and what their habits were in February 2019.

>>Most users are subscribers; how they split by gender

>>February usage, highlighting the workdays

>>Usage across the day, showing that morning and afternoon have the highest usage

>>Ride duration, which tends to be short


##  Who are the users? 

>Which user type?

>Distribution of gender?

>Distribution of gender by type of user?

>Age range of the frequent users?


In [None]:
# Show the user types with a donut plot
sorted_counts = df['user_type'].value_counts()
fig = plt.figure(figsize=[7, 7])
fig.patch.set_facecolor('white')
fontsize = 16  # title font size, reused by the later plots
plt.title("USER TYPE RELATIVE FREQUENCY", fontdict={'fontsize': fontsize})
# A wedge width below 1 leaves a hole in the middle, turning the pie into a donut
plt.pie(sorted_counts, labels=sorted_counts.index, startangle=90, counterclock=False, colors=['saddlebrown','orangered'], autopct='%1.0f%%', wedgeprops={'width':0.7})
plt.legend(bbox_to_anchor=(1, 1), loc='upper left', borderaxespad=2, title="User type frequency")
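
The percentages printed on the chart can also be checked numerically. The sketch below uses `value_counts(normalize=True)` on a hypothetical ten-row sample (nine subscribers, one customer), so the numbers it prints are illustrative and are not the dataset's results.

```python
import pandas as pd

# Hypothetical miniature sample, not the real ride data.
sample = pd.DataFrame({
    "user_type": ["Subscriber"] * 9 + ["Customer"],
})

# normalize=True returns shares that sum to 1 instead of raw counts.
shares = sample["user_type"].value_counts(normalize=True)
percentages = (shares * 100).round(1)
print(percentages)
```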



In [None]:
# Use a pie chart to visualize the relative frequency of the categorical variable member_gender
sorted_counts = df['member_gender'].value_counts()
fig=plt.figure(figsize=[7,7])
fig.patch.set_facecolor('white')
plt.title("USER GENDER RELATIVE FREQUENCY",fontdict={'fontsize': fontsize})
plt.pie(sorted_counts, labels=sorted_counts.index, startangle=90, counterclock=False, colors=['powderblue','beige','gold'],autopct='%1.0f%%')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left', borderaxespad=2,title="Gender frequency")



In [None]:
#Proportions between user_type and gender

CrosstabResult=pd.crosstab(index=df['user_type'],columns=df['member_gender'])
print(CrosstabResult)

# Grouped bar chart between user_type and member_gender

CrosstabResult.plot.bar(figsize=[10,8])
 
plt.title('PROPORTIONS BETWEEN USER TYPE AND GENDER',fontdict={'fontsize': fontsize})
plt.ylabel('Count')
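
The grouped bar chart above plots raw counts. If proportions within each user type are wanted, one option is the `normalize="index"` argument of `pd.crosstab`, sketched here on a hypothetical six-row sample (the values are made up for illustration).

```python
import pandas as pd

# Hypothetical miniature sample, not the real ride data.
sample = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Subscriber",
                  "Customer", "Customer"],
    "member_gender": ["Male", "Male", "Male", "Female", "Male", "Female"],
})

# normalize="index" divides each row by its total, so every user type sums to 1.
row_shares = pd.crosstab(index=sample["user_type"],
                         columns=sample["member_gender"],
                         normalize="index")
print(row_shares.round(2))
```

Calling `row_shares.plot.bar()` would then draw shares instead of counts, which makes the two user types comparable despite their very different sizes.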

In [None]:
# Show the age range of the users through the frequency of each birth year
base_color=sb.color_palette()[0]
fig=plt.figure(figsize=[12,10])
fig.patch.set_facecolor('white')
sb.countplot(data=df, x='member_birth_year', color=base_color)
plt.title("AGE RANGE BY USER BIRTH YEAR DISTRIBUTION",fontdict={'fontsize': fontsize} )
plt.xticks(rotation=90)
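
Birth years can be turned into approximate ages by subtracting them from the ride year. This is a rough sketch on hypothetical values: it ignores whether the birthday had already passed in February 2019 and assumes there are no missing birth years.

```python
import pandas as pd

RIDE_YEAR = 2019  # the dataset covers February 2019

# Hypothetical birth years; assumes no missing values in the column.
sample = pd.DataFrame({"member_birth_year": [1980.0, 1988.0, 1993.0, 2000.0]})

# Approximate age in the ride year (the exact birthday is not in the data).
sample["age"] = RIDE_YEAR - sample["member_birth_year"].astype(int)
print(sample)
```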


## What are the users' habits?

>What is the usual duration of the bike rides?

>At what part of the day?

>What's the weekly pattern of usage?

## Bike rides duration

> Rides typically last between 8 and 16 minutes

In [None]:
# What is the distribution of bike ride durations (in seconds)?
base_color=sb.color_palette()[0]
fig=plt.figure(figsize=[10,8])
fig.patch.set_facecolor('white')
plt.xscale('log')  # a log x-axis spreads out the short rides
plt.hist(data=df, x='duration_sec', bins=9000, color=base_color)
# no plt.legend() here: the histogram has no labeled series to list
plt.ylabel('Count')
plt.xlabel('Duration (sec)')
plt.title('DISTRIBUTION OF THE DURATION OF BIKE RIDES IN SECONDS',fontdict={'fontsize': fontsize})
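
When the x-axis is logarithmic, equal-width linear bins look increasingly narrow toward the right. A common alternative is to space the bin edges evenly in log10 units. The sketch below builds such edges with NumPy for a hypothetical set of durations; the `step` of 0.25 is an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical ride durations in seconds, not the real data.
durations = np.array([61, 120, 300, 480, 600, 960, 1800, 3600, 86000])

# Bin edges evenly spaced in log10 units, covering the full data range.
step = 0.25
log_min = np.floor(np.log10(durations.min()))
log_max = np.ceil(np.log10(durations.max()))
bin_edges = 10 ** np.arange(log_min, log_max + step, step)

counts, _ = np.histogram(durations, bins=bin_edges)
print(bin_edges.round(1))
print(counts)
```

The same `bin_edges` array can be passed as `bins=` to `plt.hist` together with `plt.xscale('log')`, so the bars appear with equal width on the log axis.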



In [None]:

# Derive the part of the day from the ride start hour and plot the counts
df['start_time'] = pd.to_datetime(df['start_time'])

def get_part_of_the_day(timestamp):
    """Map a timestamp to morning, afternoon, evening, or night."""
    h = timestamp.hour
    if 5 <= h <= 11:
        return "morning"
    if 12 <= h <= 17:
        return "afternoon"
    if 18 <= h <= 22:
        return "evening"
    return "night"

df['day_part'] = df['start_time'].apply(get_part_of_the_day)
plt.figure(figsize=[10,8])
# countplot handles categorical labels directly; order keeps the bars chronological
sb.countplot(data=df, x='day_part', color=base_color,
             order=['morning', 'afternoon', 'evening', 'night'])
plt.title('COUNTS OF BIKE RIDES AT DIFFERENT PARTS OF THE DAY',fontdict={'fontsize': fontsize})
plt.xlabel('Part of the day')
plt.ylabel('Count')
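
The same hour-to-label mapping can be written without a row-by-row `apply`, which may be faster on a dataset of this size. This is a sketch using `np.select` on six hypothetical timestamps; the hour boundaries match the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical start times, not the real data.
start_time = pd.to_datetime(pd.Series([
    "2019-02-01 08:15:00", "2019-02-01 12:00:00", "2019-02-01 17:45:00",
    "2019-02-01 19:30:00", "2019-02-01 23:10:00", "2019-02-02 03:00:00",
]))
hour = start_time.dt.hour

# Same boundaries as get_part_of_the_day; every other hour falls back to "night".
conditions = [hour.between(5, 11), hour.between(12, 17), hour.between(18, 22)]
labels = ["morning", "afternoon", "evening"]
day_part = pd.Series(np.select(conditions, labels, default="night"), index=hour.index)

# Report the counts in chronological order rather than by frequency.
order = ["morning", "afternoon", "evening", "night"]
counts = day_part.value_counts().reindex(order, fill_value=0)
print(counts)
```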

In [None]:
# Which days were most popular in February 2019, the period the dataset covers?
workdays=[1,4,5,6,7,8,11,12,13,14,15,18,19,20,21,22,25,26,27,28]
weekends=[2,3,9,10,16,17,23,24]

def get_weekday_type(day):
    """Classify a day of the month in February 2019 as workday or weekend."""
    # the parameter is named day so it does not shadow the built-in int
    if day in workdays:
        return 'workday'
    if day in weekends:
        return 'weekend'
    return ''

df['weekday_type']=df['day'].apply(get_weekday_type)

plt.figure(figsize=[10,8])

# dodge=False draws one full-width bar per day instead of reserving room for both hues
sb.countplot(data=df, x='day', hue='weekday_type', dodge=False)
plt.title('FEBRUARY 2019 DAILY BIKE RIDES',fontdict={'fontsize': fontsize})
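
The hard-coded day lists work for February 2019 only. A more general sketch derives both the day of the month and the weekday type from the timestamp itself through the pandas `.dt` accessor. The five dates are hypothetical samples, and public holidays are not treated specially.

```python
import pandas as pd

# Hypothetical start dates, not the real data.
start_time = pd.to_datetime(pd.Series([
    "2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04", "2019-02-28",
]))

day = start_time.dt.day

# dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend.
weekday_type = start_time.dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "workday"
)
print(pd.DataFrame({"day": day, "weekday_type": weekday_type}))
```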

In [None]:
# Use this command if you are running this notebook locally
!jupyter nbconvert Part_II_slide_deck_FordGoBikeSharingSystem.ipynb --to slides --post serve --no-input --no-prompt

### Submission
If you are using classroom workspace, you can choose from the following two ways of submission:

1. **Submit from the workspace**. Make sure you have removed the example project from the /home/workspace directory. You must submit the following files:
   - Part_I_notebook.ipynb
   - Part_I_notebook.html or pdf
   - Part_II_notebook.ipynb
   - Part_II_slides.html
   - README.md
   - dataset (optional)


2. **Submit a zip file on the last page of this project lesson**. In this case, open the Jupyter terminal and run the command below to generate a ZIP file. 
```bash
zip -r my_project.zip .
```
The command above will ZIP every file present in your /home/workspace directory. Next, you can download the zip file to your local machine and follow the instructions on the last page of this project lesson.


In [None]:
!zip -r my_project.zip .