# CSC8001: Assignment 2  (30%) [100 marks]

<img src="data/CitiBike_Logo_p.svg" width=225 height=100 align=left style="padding-right: 20px; padding-bottom: 20px;" />

New York City's Citi Bike program has 10,000 bikes and 600 stations across Manhattan, Brooklyn, Queens and Jersey City. It was designed for quick trips with convenience in mind, and provides a fun and affordable way for visitors and locals to get around New York City.

For this assignment, you will be analysing a dataset from the [**Citi Bike Trip Histories**](https://www.citibikenyc.com/system-data) data. The data file <sup>1</sup> is available in the assignments data folder on the course StudyDesk.  

As part of your analysis, you will be coding various functions. In addition to including the necessary code, be sure to **add useful comments** to your functions to explain what your code is doing and why.  Comments are most useful when they document non-obvious features of the code. It is reasonable to assume that the reader can figure out *what* the code does; it is more useful to explain *why*.  Comments were discussed in notebook *1.1 Intro Python 3*.

**Citi Bike Trip Histories** data includes:
- Trip Duration (seconds)
- Start Time and Date
- Stop Time and Date
- Start Station ID
- Start Station Name
- End Station Name
- End Station ID
- Station Lat/Long
- Bike ID
- User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
- Gender (Zero=unknown; 1=male; 2=female)
- Year of Birth

<sup>1</sup>Citi bike data is provided according to the [NYCBS Data Use Policy](https://www.citibikenyc.com/data-sharing-policy).

In [27]:
# Load required modules for notebook
import pandas as pd
import numpy as np

from datetime import datetime
from dateutil import parser

# Display all plots inline using the %matplotlib magic command
%matplotlib inline
import seaborn; seaborn.set()
import matplotlib.pyplot as plt

## Load the data set   [5 marks]
Load the Citi Bike data set provided:
- Load the data set into a DataFrame called `rides`.  
- Set the `starttime` column as the index and remember to parse the dates.
- Rename the `usertype` column to `User Type`.

**NOTE**: The data set may take a few minutes to load.
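
As a hedged illustration of the loading steps above, the sketch below reads a tiny in-memory CSV instead of the real data file. The sample rows, the `tripduration` column and the variable name `demo` are assumptions made for this example only; substitute the actual file from the data folder in your own answer.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file
sample_csv = io.StringIO(
    "tripduration,starttime,usertype\n"
    "300,2016-06-01 08:05:00,Subscriber\n"
    "900,2016-06-04 14:30:00,Customer\n"
    "420,2016-06-06 17:45:00,Subscriber\n"
)

# Parse starttime as dates and use it as the index in a single call
demo = pd.read_csv(sample_csv, parse_dates=["starttime"], index_col="starttime")

# rename() returns a new DataFrame, so the result is assigned back
demo = demo.rename(columns={"usertype": "User Type"})

print(demo)
```

Parsing the dates at load time gives a `DatetimeIndex`, which makes attributes such as `.dayofweek` and `.hour` available for the later questions.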

In [28]:
## Load Citi Bike Tripdata 
rides = "YOUR CODE HERE"

## Exploring Data 
Let's get familiar with the data set by asking it some questions.  

### Most popular stations for all riders [5 marks]
Each ride starts and ends at a bike sharing station. What are the five (5) most popular stations to end a trip? <br>Function `a1()` should return a Series object indexed by station names in descending order of popularity.  
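
A minimal sketch of counting and ranking values, using a toy DataFrame rather than the assignment data. The column name `end station name` and the station names are assumptions for illustration; check the column names in your own `rides` DataFrame.

```python
import pandas as pd

# Toy data: six rides ending at three hypothetical stations
toy = pd.DataFrame(
    {"end station name": ["A St", "B St", "A St", "C St", "A St", "B St"]}
)

# value_counts() sorts by frequency (descending) by default,
# so head(n) keeps the n most frequent values
top = toy["end station name"].value_counts().head(2)

print(top)
```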

In [29]:
def a1():
    """ YOUR CODE AND COMMENTS HERE """

In [30]:
a1()

### Most popular stations for Customers [5 marks]
What are the five (5) most popular destinations for Customers? <br>Function `a2()` should return a Series object indexed by station names in descending order of popularity.  

In [31]:
def a2():
    """ YOUR CODE AND COMMENTS HERE """

In [32]:
a2()

### Visiting Central Park [5 marks]
Central Park is a big draw for tourists.  How many *Customer* rides end at a **Central Park** bike sharing station?  <br>Function `a3()` should return a Series object indexed by station names in descending order of popularity. <br><br>NOTE: Many station names indicate that the station is located at the intersection of two streets: **E 17 St & Broadway** or **Broadway & E 14 St**.  Your answer should include any end station whose name contains *Central Park*. 
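
The substring matching described in the note can be sketched with `Series.str.contains`. The toy rows and the column name `end station name` below are assumptions for illustration, not values from the assignment data.

```python
import pandas as pd

# Toy data with hypothetical station names
toy = pd.DataFrame(
    {
        "User Type": ["Customer", "Customer", "Subscriber", "Customer"],
        "end station name": [
            "Central Park S & 6 Ave",
            "Broadway & E 14 St",
            "Central Park West & W 72 St",
            "Central Park S & 6 Ave",
        ],
    }
)

# Combine two boolean conditions; regex=False treats the pattern as plain text
is_customer = toy["User Type"] == "Customer"
in_park = toy["end station name"].str.contains("Central Park", regex=False)

counts = toy.loc[is_customer & in_park, "end station name"].value_counts()

print(counts)
```

If the station column can contain missing values, passing `na=False` to `str.contains` keeps the mask strictly boolean.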

In [33]:
def a3():
    """ YOUR CODE AND COMMENTS HERE """

In [34]:
a3()

### Average trip duration for Subscribers [5 marks]
Many subscribers use the Citi Bikes to commute to work.  What is the mean trip duration for Subscribers on any workday (Monday to Friday)? <br>Function `a4()` should return the mean value as a float rounded to two decimal places. 
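
One possible way to select workdays is through the `dayofweek` attribute of a `DatetimeIndex`, where Monday is 0 and Sunday is 6. The sketch below uses toy data; the column name `tripduration` and all values are assumptions for illustration.

```python
import pandas as pd

# Toy data: 2016-06-06 is a Monday, 2016-06-11 is a Saturday
toy = pd.DataFrame(
    {
        "tripduration": [600, 900, 1200, 300],
        "User Type": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
    },
    index=pd.to_datetime(
        ["2016-06-06 08:00", "2016-06-07 18:00", "2016-06-11 10:00", "2016-06-08 12:00"]
    ),
)

# Workdays have dayofweek 0-4; keep only Subscriber rides on those days
mask = (toy["User Type"] == "Subscriber") & (toy.index.dayofweek < 5)

# Convert to a plain float and round to two decimal places
mean_duration = round(float(toy.loc[mask, "tripduration"].mean()), 2)

print(mean_duration)
```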

In [35]:
def a4():
    """ YOUR CODE AND COMMENTS HERE """

In [36]:
a4()

### Longest trip duration of any rider [5 marks]
What is the longest trip duration for any rider?  <br>Function `a5()` should return an integer. 

In [37]:
def a5():
    """ YOUR CODE AND COMMENTS HERE """

In [38]:
a5()

### What is the breakdown of the rides in our data set by user type? [5 marks]
How many of the rides in our data set are for Customers and how many for Subscribers? 
<br>Function `a6()` should return a Series object indexed by the rider type.

In [39]:
def a6():
    """ YOUR CODE AND COMMENTS HERE """

In [40]:
a6()

## Weekday usage [10 marks]

Does Citi Bike usage vary by the day of the week?  Are there some days of the week that have more Citi Bike trips?  Is there a difference in usage between Customers and Subscribers? 

- Create a pandas DataFrame with the number of rides by User Type for each day of the week.  Use `starttime` to determine each ride's weekday.  Your DataFrame should be similar to the one shown below, but your data values will vary.

User Type<br>Week Day |	Customer<br>&nbsp; |	Subscriber<br>&nbsp;
---: | ---: | ---:
0  |	    1679  |	53569
1  |		1222  |		59323
2  |		1765  |		74118
3  |		1981  |		85190
4  |		5403  |		99728
5  |		7004  |	60580
6  |		5489  |		52427
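
A table of this shape can be built in several ways; the sketch below groups by weekday and user type, then moves the user type into the columns. It runs on toy data with made-up timestamps, and assumes the index holds the parsed `starttime` values.

```python
import pandas as pd

# Toy data: 2016-06-06 is a Monday, 2016-06-11 is a Saturday
toy = pd.DataFrame(
    {"User Type": ["Subscriber", "Customer", "Subscriber", "Customer", "Customer"]},
    index=pd.to_datetime(
        [
            "2016-06-06 08:00",
            "2016-06-06 09:00",
            "2016-06-07 08:30",
            "2016-06-11 11:00",
            "2016-06-11 15:00",
        ]
    ),
)

by_day = (
    toy.groupby([toy.index.dayofweek, "User Type"])
    .size()                            # number of rides per (weekday, user type)
    .unstack(fill_value=0)             # user types become columns
    .reindex(range(7), fill_value=0)   # keep weekdays that have no rides
)
by_day.index.name = "Week Day"

print(by_day)
```

The `reindex` step is a design choice for small samples; with a full month of data every weekday is likely to be present already.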

In [41]:
def a7():
    """ YOUR CODE AND COMMENTS HERE """


In [42]:
a7()

## Plotting weekday rider usage [10 marks]

Provide a plot which shows the weekday usage pattern by user type.  Include a plot line for All weekday rides.

Your plot should be similar to the example below but your values will vary.

<img src="data/plt-day_of_week.png" width=524 height=343 align=center style="padding-right: 20px; padding-bottom: 20px;" />
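
One way to obtain the extra All line is to add a row-wise total as a third column before plotting. The counts below are invented, and the axis labels are only a suggestion; this is a sketch, not the code that produced the example figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ride counts for three weekdays
by_day = pd.DataFrame(
    {"Customer": [2, 1, 4], "Subscriber": [10, 12, 6]},
    index=pd.Index([0, 1, 2], name="Week Day"),
)

# assign() returns a copy, so the original table keeps its two columns
with_total = by_day.assign(All=by_day.sum(axis=1))

# DataFrame.plot() draws one line per column and returns the Axes
ax = with_total.plot(marker="o", title="Rides by day of week")
ax.set_xlabel("Day of week (0 = Monday)")
ax.set_ylabel("Number of rides")
```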

In [43]:
def a8():
    """ YOUR CODE AND COMMENTS HERE """


In [44]:
a8()

## Time of day usage [10 marks]
What does the rider usage look like as a function of the time of day?  

- Create a pandas DataFrame with the number of rides by User Type for each hour of the day. Use `starttime` to determine each ride's hour. Your DataFrame should be similar to the one shown below, but your data values will vary.

User Type<br>Hour |	Customer<br>&nbsp; |	Subscriber<br>&nbsp;
---: | ---: | ---:
0  | 	276  | 	4073
1  | 	202  | 	2460
2  | 	98  | 	1434
3  | 	64  | 	951
4  | 	54  | 	1010
5  | 	37  | 	3280
... | ... | ...


In [45]:
def a9():
    """ YOUR CODE AND COMMENTS HERE """


In [46]:
a9()

## Plotting time of day usage [10 marks]

Provide a plot which shows the time of day usage pattern by user type.  Include a plot line for All hourly rides.

Your plot should be similar to the example below but your values will vary.

<img src="data/plt-hour_of_day.png" width=509 height=357 align=left style="padding-right: 20px; padding-bottom: 20px;" />

In [47]:
def a10():
    """ YOUR CODE AND COMMENTS HERE """


In [48]:
a10()


## Workdays vs Weekends [15 marks]
Hourly traffic appears strongly bimodal for Subscribers: it peaks around 8:00 in the morning and again around 5:00 in the evening.

- Create a pandas DataFrame with the number of rides by Hour and User Type for Workdays and Weekends. Use `starttime` to determine each ride's hour. Your DataFrame should be similar to the one shown below, but your data values will vary.

&nbsp;<br>&nbsp; | User Type<br>Hour |	Customer<br>&nbsp; |	Subscriber<br>&nbsp;
---: | ---: | ---: | ---:
Weekday | 0  | 	124  | 	2194
&nbsp; | 1  | 	120  | 	1238
&nbsp; | 2  | 	53  | 	716
&nbsp; | 3  | 	30  | 	520
.... | .... | .... | ....
Weekend | 0  | 	152  | 	1879
&nbsp; | 1  | 	82  | 	1222
&nbsp; | 2  | 	45  | 	718
&nbsp; | 3  | 	34  | 	431
&nbsp; | 4  | 	29  | 	288
.... | .... | .... | ....
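
A two-level row index like the one above can be sketched by grouping on a day-type label and the hour together. The toy timestamps and the level name `Day Type` are assumptions for illustration; the sample table leaves that header blank.

```python
import numpy as np
import pandas as pd

# Toy data: 6-7 June 2016 are Monday/Tuesday, 11-12 June are Saturday/Sunday
toy = pd.DataFrame(
    {"User Type": ["Subscriber", "Subscriber", "Customer", "Customer", "Subscriber"]},
    index=pd.to_datetime(
        [
            "2016-06-06 08:10",
            "2016-06-06 08:40",
            "2016-06-07 17:05",
            "2016-06-11 11:00",
            "2016-06-12 11:30",
        ]
    ),
)

# Label each ride: dayofweek 0-4 is Monday to Friday, 5-6 is the weekend
day_type = np.where(toy.index.dayofweek < 5, "Weekday", "Weekend")

by_hour = (
    toy.groupby([day_type, toy.index.hour, "User Type"])
    .size()
    .unstack(fill_value=0)   # user types become columns
)
by_hour.index.names = ["Day Type", "Hour"]

print(by_hour)
```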


In [49]:
def a11():
    """ YOUR CODE AND COMMENTS HERE """


In [50]:
a11()

## Plotting workdays vs weekends [10 marks]

Provide plots which show the number of rides by Hour and User Type: one plot for Workdays and a second for Weekends.
Include a plot line for All hourly rides in each plot.  Your plots should be similar to the examples below, but your values will vary.

<div style="float: left; "><img src="data/plt-weekday-rides.png" width=509 height=357 style="padding: 5px; "/></div>
<div style="float: left; "><img src="data/plt-weekend-rides.png" width=509 height=357  style="padding: 5px; "/></div>
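
Side-by-side plots can be drawn with `plt.subplots`, passing each Axes to `DataFrame.plot`. The sketch below uses an invented table with a hypothetical `Day Type` index level; the figure size, titles and labels are illustrative choices rather than the settings used for the example images.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts for two hours on each day type
index = pd.MultiIndex.from_product(
    [["Weekday", "Weekend"], [8, 17]], names=["Day Type", "Hour"]
)
by_hour = pd.DataFrame(
    {"Customer": [1, 3, 4, 5], "Subscriber": [20, 18, 6, 7]}, index=index
)

# One row, two columns of Axes
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

for ax, day_type in zip(axes, ["Weekday", "Weekend"]):
    panel = by_hour.loc[day_type]                 # rows for one day type, indexed by Hour
    panel = panel.assign(All=panel.sum(axis=1))   # total across user types
    panel.plot(ax=ax, title=f"{day_type} rides by hour")
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Number of rides")
```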

In [51]:
def a12():
    """ YOUR CODE AND COMMENTS HERE """


In [52]:
a12()