## ISOM5160 Exercise 02 - City Bike Rental

---
This dataset contains the hourly count of rental bikes in 2011 and 2012 in the Capital Bikeshare system (Washington, D.C.), with the corresponding weather and seasonal information. The raw dataset can be downloaded at [http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset](http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset).

The dataset contains 10886 rows and 12 columns. In this notebook, we will only work on the following columns.

* ```datetime``` - date and hour in the format m/d/yyyy h:00, without zero padding (e.g. 1/1/2011 15:00)
* ```season``` - spring, summer, fall or winter
* ```holiday``` - whether the day is a holiday (1) or not (0)
* ```workingday``` - whether the day is a working day (1) or not (0)
* ```weather``` - weather of the hour
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
* ```casual``` - rental count of casual users
* ```registered``` - rental count of registered users
* ```count``` - total bike rentals, including both casual and registered users
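As a quick sanity check of the `count` definition above, the sketch below builds a tiny frame from rows 15 to 19 of the Task 2.0 output and confirms that `count` equals `casual` + `registered` for every hour. The short labels in `weather_labels` are hypothetical names chosen for illustration; they are not part of the dataset.

```python
import pandas as pd

# Rows 15 to 19 copied from the Task 2.0 output, used as a small self-contained sample
sample = pd.DataFrame({
    'datetime': ['1/1/2011 15:00', '1/1/2011 16:00', '1/1/2011 17:00',
                 '1/1/2011 18:00', '1/1/2011 19:00'],
    'weather': [2, 2, 2, 3, 3],
    'casual': [40, 41, 15, 9, 6],
    'registered': [70, 52, 52, 26, 31],
    'count': [110, 93, 67, 35, 37],
})

# 'count' should equal casual + registered for every hour
assert (sample['casual'] + sample['registered'] == sample['count']).all()

# Hypothetical short labels for the four weather codes
weather_labels = {1: 'clear', 2: 'mist', 3: 'light rain/snow', 4: 'heavy rain/snow'}
sample['weather_label'] = sample['weather'].map(weather_labels)
print(sample[['datetime', 'weather', 'weather_label', 'count']])
```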


#### Task 2.0: Import packages and load the data file

```python
import numpy as np
import pandas as pd

# Print the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

rentals = pd.read_csv('bike_rentals.csv')

rentals.shape
rentals.dtypes

# Display rows 15 to 24 (the end of an iloc slice is exclusive)
rentals.iloc[15:25, :]
```

| | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1/1/2011 15:00 | spring | 0 | 0 | 2 | 18.04 | 21.97 | 77 | 19.9995 | 40 | 70 | 110 |
| 16 | 1/1/2011 16:00 | spring | 0 | 0 | 2 | 17.22 | 21.21 | 82 | 19.9995 | 41 | 52 | 93 |
| 17 | 1/1/2011 17:00 | spring | 0 | 0 | 2 | 18.04 | 21.97 | 82 | 19.0012 | 15 | 52 | 67 |
| 18 | 1/1/2011 18:00 | spring | 0 | 0 | 3 | 17.22 | 21.21 | 88 | 16.9979 | 9 | 26 | 35 |
| 19 | 1/1/2011 19:00 | spring | 0 | 0 | 3 | 17.22 | 21.21 | 88 | 16.9979 | 6 | 31 | 37 |
| 20 | 1/1/2011 20:00 | spring | 0 | 0 | 2 | 16.4 | 20.455 | 87 | 16.9979 | 11 | 25 | 36 |
| 21 | 1/1/2011 21:00 | spring | 0 | 0 | 2 | 16.4 | 20.455 | 87 | 12.998 | 3 | 31 | 34 |
| 22 | 1/1/2011 22:00 | spring | 0 | 0 | 2 | 16.4 | 20.455 | 94 | 15.0013 | 11 | 17 | 28 |
| 23 | 1/1/2011 23:00 | spring | 0 | 0 | 2 | 18.86 | 22.725 | 88 | 19.9995 | 15 | 24 | 39 |
| 24 | 1/2/2011 0:00 | spring | 0 | 0 | 2 | 18.86 | 22.725 | 88 | 19.9995 | 4 | 13 | 17 |
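Because `pd.read_csv` is called without `parse_dates`, the `datetime` column is loaded as plain strings (object dtype). If real timestamps are wanted instead, `pd.to_datetime` with an explicit format can parse them. The two values below are copied from rows 23 and 24 of the output above; the format string `'%m/%d/%Y %H:%M'` is an assumption based on the displayed values, not something the exercise specifies.

```python
import pandas as pd

# Two raw values from the 'datetime' column (rows 23 and 24 above)
raw = pd.Series(['1/1/2011 23:00', '1/2/2011 0:00'])

# An explicit, month-first format avoids any day-first/month-first ambiguity;
# %m, %d and %H also accept values without zero padding such as '1' or '0'
stamps = pd.to_datetime(raw, format='%m/%d/%Y %H:%M')

print(stamps.dt.date.tolist())   # calendar dates
print(stamps.dt.hour.tolist())   # hour of day as integers
```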


#### Task 2.1: Split the column 'datetime' into two columns (one for 'date', one for 'time'), and then find out the number of unique days in this dataset.

```python
date_list = []
time_list = []

# Split each 'datetime' string such as '1/14/2011 20:00' at the space
for i in range(len(rentals)):
    lst = rentals['datetime'].iloc[i].split(" ")
    date_list.append(lst[0])
    time_list.append(lst[1])

# Insert the new columns right after 'datetime'
rentals.insert(loc=1, column='date', value=date_list, allow_duplicates=True)
rentals.insert(loc=2, column='time', value=time_list, allow_duplicates=True)

# Verify by examining a subset
rentals.iloc[320:330, :]

# Count the distinct dates
print("The dataset contains %d days of data.\n" % rentals['date'].nunique())
```

| | datetime | date | time | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 320 | 1/14/2011 20:00 | 1/14/2011 | 20:00 | spring | 0 | 1 | 1 | 7.38 | 12.12 | 59 | 0.0 | 0 | 68 | 68 |
| 321 | 1/14/2011 21:00 | 1/14/2011 | 21:00 | spring | 0 | 1 | 1 | 6.56 | 11.365 | 69 | 0.0 | 4 | 48 | 52 |
| 322 | 1/14/2011 22:00 | 1/14/2011 | 22:00 | spring | 0 | 1 | 2 | 6.56 | 11.365 | 69 | 0.0 | 2 | 34 | 36 |
| 323 | 1/14/2011 23:00 | 1/14/2011 | 23:00 | spring | 0 | 1 | 2 | 7.38 | 12.12 | 55 | 0.0 | 1 | 26 | 27 |
| 324 | 1/15/2011 0:00 | 1/15/2011 | 0:00 | spring | 0 | 0 | 1 | 7.38 | 12.12 | 55 | 0.0 | 3 | 25 | 28 |
| 325 | 1/15/2011 1:00 | 1/15/2011 | 1:00 | spring | 0 | 0 | 2 | 6.56 | 9.85 | 59 | 6.0032 | 2 | 18 | 20 |
| 326 | 1/15/2011 2:00 | 1/15/2011 | 2:00 | spring | 0 | 0 | 2 | 6.56 | 9.85 | 59 | 6.0032 | 0 | 12 | 12 |
| 327 | 1/15/2011 3:00 | 1/15/2011 | 3:00 | spring | 0 | 0 | 2 | 6.56 | 11.365 | 59 | 0.0 | 1 | 7 | 8 |
| 328 | 1/15/2011 4:00 | 1/15/2011 | 4:00 | spring | 0 | 0 | 2 | 6.56 | 11.365 | 59 | 0.0 | 0 | 5 | 5 |
| 329 | 1/15/2011 5:00 | 1/15/2011 | 5:00 | spring | 0 | 0 | 1 | 6.56 | 11.365 | 59 | 0.0 | 0 | 1 | 1 |


The dataset contains 456 days of data.
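The row-by-row loop in Task 2.1 can be replaced by pandas' vectorized string methods: `Series.str.split` with `expand=True` returns one column per piece, and `Series.nunique` counts distinct values directly. A minimal sketch on a few hypothetical timestamps written in the dataset's format:

```python
import pandas as pd

# Hypothetical sample in the same 'm/d/yyyy h:00' style as the dataset
df = pd.DataFrame({'datetime': ['1/14/2011 23:00', '1/15/2011 0:00',
                                '1/15/2011 1:00', '1/16/2011 0:00']})

# Split on the single space; expand=True gives a DataFrame with columns 0 and 1
parts = df['datetime'].str.split(' ', expand=True)
df.insert(loc=1, column='date', value=parts[0])
df.insert(loc=2, column='time', value=parts[1])

print(df)
print("The sample contains %d days of data." % df['date'].nunique())
```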



#### Task 2.2: Add a new column 'daytype' to the dataframe showing each row's day type
 -  The new column 'daytype' will contain 3 possible values: 'holiday', 'workingday' and 'weekend'.
 -  A date is a weekend if it is neither a holiday nor a working day.

```python
daytype = []

# Classify each hour: working day first, then holiday, otherwise weekend
for ind in rentals.index:
    if rentals.at[ind, 'workingday'] == 1:
        daytype.append('workingday')
    elif rentals.at[ind, 'holiday'] == 1:
        daytype.append('holiday')
    else:
        daytype.append('weekend')

# Insert the list into the dataframe as a column (between 'date' and 'time')
rentals.insert(loc=2, column='daytype', value=daytype, allow_duplicates=True)

# Verify by examining a subset
rentals.iloc[320:329, :]
```


| | datetime | date | daytype | time | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 320 | 1/14/2011 20:00 | 1/14/2011 | workingday | 20:00 | spring | 0 | 1 | 1 | 7.38 | 12.12 | 59 | 0.0 | 0 | 68 | 68 |
| 321 | 1/14/2011 21:00 | 1/14/2011 | workingday | 21:00 | spring | 0 | 1 | 1 | 6.56 | 11.365 | 69 | 0.0 | 4 | 48 | 52 |
| 322 | 1/14/2011 22:00 | 1/14/2011 | workingday | 22:00 | spring | 0 | 1 | 2 | 6.56 | 11.365 | 69 | 0.0 | 2 | 34 | 36 |
| 323 | 1/14/2011 23:00 | 1/14/2011 | workingday | 23:00 | spring | 0 | 1 | 2 | 7.38 | 12.12 | 55 | 0.0 | 1 | 26 | 27 |
| 324 | 1/15/2011 0:00 | 1/15/2011 | weekend | 0:00 | spring | 0 | 0 | 1 | 7.38 | 12.12 | 55 | 0.0 | 3 | 25 | 28 |
| 325 | 1/15/2011 1:00 | 1/15/2011 | weekend | 1:00 | spring | 0 | 0 | 2 | 6.56 | 9.85 | 59 | 6.0032 | 2 | 18 | 20 |
| 326 | 1/15/2011 2:00 | 1/15/2011 | weekend | 2:00 | spring | 0 | 0 | 2 | 6.56 | 9.85 | 59 | 6.0032 | 0 | 12 | 12 |
| 327 | 1/15/2011 3:00 | 1/15/2011 | weekend | 3:00 | spring | 0 | 0 | 2 | 6.56 | 11.365 | 59 | 0.0 | 1 | 7 | 8 |
| 328 | 1/15/2011 4:00 | 1/15/2011 | weekend | 4:00 | spring | 0 | 0 | 2 | 6.56 | 11.365 | 59 | 0.0 | 0 | 5 | 5 |
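The classification loop in Task 2.2 can also be vectorized with `numpy.select`, which checks a list of conditions in order and falls back to a default, mirroring the `if`/`elif`/`else` chain. A minimal sketch on three hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical flags: one working-day hour, one holiday hour, one weekend hour
df = pd.DataFrame({'holiday': [0, 1, 0], 'workingday': [1, 0, 0]})

# The first matching condition wins; rows matching neither get 'weekend'
df['daytype'] = np.select(
    [df['workingday'] == 1, df['holiday'] == 1],
    ['workingday', 'holiday'],
    default='weekend',
)
print(df)
```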


#### Task 2.3: Find out the average daily rentals of holidays, workingdays and weekends respectively

```python
# Get the unique (date, daytype) pairs: one row per day
unique_days = rentals.loc[:, ['date', 'daytype']].drop_duplicates()

# Number of distinct days of each type
unique_workingday = sum(unique_days.daytype == 'workingday')
unique_holiday = sum(unique_days.daytype == 'holiday')
unique_weekend = sum(unique_days.daytype == 'weekend')

# Total rentals over all hours of each day type
# (selecting 'count' as a single label returns a scalar sum, not a Series)
workingday_rentals = rentals.loc[rentals.daytype == 'workingday', 'count'].sum()
weekend_rentals = rentals.loc[rentals.daytype == 'weekend', 'count'].sum()
holiday_rentals = rentals.loc[rentals.daytype == 'holiday', 'count'].sum()

# Display the average daily rentals of each day type
print("Average daily rentals for %s is %.1f" % ('workingday', workingday_rentals / unique_workingday))
print("Average daily rentals for %s is %.1f" % ('weekend', weekend_rentals / unique_weekend))
print("Average daily rentals for %s is %.1f" % ('holiday', holiday_rentals / unique_holiday))
```


```
Average daily rentals for workingday is 4600.0
Average daily rentals for weekend is 4523.2
Average daily rentals for holiday is 4446.8
```
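The averages in Task 2.3 divide each day type's total hourly rentals by its number of distinct days, which is the same as averaging the daily totals. A sketch of that calculation with `groupby`, on a small hypothetical frame (dates and counts are made up): first sum the hours of each day, then average those daily totals within each day type.

```python
import pandas as pd

# Hypothetical hourly rows: two hours for each of four days
df = pd.DataFrame({
    'date': ['1/14/2011', '1/14/2011', '1/15/2011', '1/15/2011',
             '1/17/2011', '1/17/2011', '1/18/2011', '1/18/2011'],
    'daytype': ['workingday', 'workingday', 'weekend', 'weekend',
                'holiday', 'holiday', 'workingday', 'workingday'],
    'count': [68, 52, 28, 20, 10, 30, 40, 40],
})

# Step 1: total rentals per day (each date has exactly one daytype)
daily = df.groupby(['date', 'daytype'], as_index=False)['count'].sum()

# Step 2: mean of those daily totals within each day type
avg_daily = daily.groupby('daytype')['count'].mean()
for daytype, avg in avg_daily.items():
    print("Average daily rentals for %s is %.1f" % (daytype, avg))
```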


#### Task 2.4: Define a function *num_of_rentals(date)* to return the number of bike rentals on a particular date

```python
def num_of_rentals(date):
    """Return the total number of rentals on `date`, a string such as '1/1/2011'."""
    return rentals.loc[rentals['date'] == date, 'count'].sum()

# To test the function
num_of_rentals('1/1/2011')
```

985
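`num_of_rentals` filters the whole dataframe on every call. That is fine for a single lookup, but the work is repeated when the function is called for every date, as in Task 2.5. One possible alternative, sketched on hypothetical data: aggregate once with `groupby` and look each date up in the resulting Series. `hourly`, `daily_totals` and `num_of_rentals_cached` are illustrative names, not part of the exercise.

```python
import pandas as pd

# Hypothetical hourly sample
hourly = pd.DataFrame({
    'date': ['1/1/2011', '1/1/2011', '1/2/2011'],
    'count': [16, 40, 17],
})

# Pre-aggregate once: a Series mapping each date to its daily total
daily_totals = hourly.groupby('date')['count'].sum()

def num_of_rentals_cached(date):
    """Hypothetical variant of num_of_rentals: a single index lookup.

    Returns 0 for a date that does not appear, matching what summing an
    empty selection gives in the original function.
    """
    return int(daily_totals.get(date, 0))

print(num_of_rentals_cached('1/1/2011'))
```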

#### Task 2.5: Find the date with the most rentals (must use the function defined in Task 2.4)

```python
# Renumber the rows 0..n-1 so each date can be looked up by position
unique_days = unique_days.reset_index(drop=True)

# Total rentals of each unique date, computed with num_of_rentals()
count_rentals = []
for i in range(len(unique_days)):
    count_rentals.append(num_of_rentals(unique_days['date'][i]))

# Position of the largest total (the first one, if there is a tie)
top_index = count_rentals.index(max(count_rentals))
top_rental_date = unique_days['date'][top_index]
top_rental = count_rentals[top_index]

print("%s has %d rentals, which is the highest among all dates" % (top_rental_date, top_rental))
```

9/15/2012 has 8714 rentals, which is the highest among all dates
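The loop in Task 2.5 can also be written with Python's built-in `max` and its `key` argument, which still calls `num_of_rentals` once per date. Like `list.index(max(...))`, `max` returns the first date among ties. A minimal sketch on a hypothetical frame named `hourly`, with a local copy of the Task 2.4 function so the block runs on its own:

```python
import pandas as pd

# Hypothetical hourly sample covering three dates
hourly = pd.DataFrame({
    'date': ['3/1/2012', '3/1/2012', '3/2/2012', '3/2/2012', '3/3/2012'],
    'count': [300, 250, 400, 380, 500],
})

def num_of_rentals(date):
    # Same logic as Task 2.4, applied to the sample frame
    return hourly.loc[hourly['date'] == date, 'count'].sum()

# max() evaluates num_of_rentals for each unique date and keeps the largest
unique_dates = hourly['date'].unique()
top_rental_date = max(unique_dates, key=num_of_rentals)
top_rental = num_of_rentals(top_rental_date)

print("%s has %d rentals, which is the highest among all dates" % (top_rental_date, top_rental))
```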
