## Question 1
Compute the average temperature by season ('season_desc'). (The temperatures are numbers between 0 and 1, but don't worry about that. Let's say that's the Shellman temperature scale.)

I get
```
season_desc
Fall           0.711445
Spring         0.321700
Summer         0.554557
Winter         0.419368
```
This clearly looks wrong: Fall comes out warmer than Summer, and Spring colder than Winter. Figure out what's wrong with the original data and fix it.

In [1]:
from pandas import DataFrame, Series

In [2]:
import pandas as pd

In [3]:
import numpy as np

In [4]:
weather_data = pd.read_table('data/daily_weather.tsv')

In [5]:
season_mapping = {'Spring': 'Winter', 'Winter': 'Fall', 'Fall': 'Summer', 'Summer': 'Spring'}

In [6]:
def fix_seasons(x):
    return season_mapping[x]

In [7]:
weather_data['season_desc'] = weather_data['season_desc'].apply(fix_seasons)

In [8]:
weather_data.pivot_table(index='season_desc', values='temp', aggfunc='mean')

season_desc
Fall      0.419368
Spring    0.554557
Summer    0.711445
Winter    0.321700
Name: temp, dtype: float64

A pivot table is not really needed here; a plain `groupby` followed by `mean()` gives the same result.

In [9]:
weather_data.groupby('season_desc')['temp'].mean()

season_desc
Fall      0.419368
Spring    0.554557
Summer    0.711445
Winter    0.321700
Name: temp, dtype: float64
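
The relabelling can also be done without a helper function. The sketch below uses `Series.map` with the same `season_mapping` dictionary; the `toy` frame and its values are made up for illustration and stand in for `weather_data`.

```python
import pandas as pd

# Toy stand-in for weather_data (values are invented, not from daily_weather.tsv).
toy = pd.DataFrame({
    "season_desc": ["Spring", "Winter", "Fall", "Summer"],
    "temp": [0.30, 0.40, 0.70, 0.55],
})

# Same mapping as in the notebook: each label is shifted back by one season.
season_mapping = {"Spring": "Winter", "Winter": "Fall", "Fall": "Summer", "Summer": "Spring"}

# Series.map looks each value up in the dict directly.
toy["season_desc"] = toy["season_desc"].map(season_mapping)

season_means = toy.groupby("season_desc")["temp"].mean()
print(season_means)
```

One difference to keep in mind: `map` silently yields `NaN` for a label missing from the dictionary, whereas the `apply` version raises a `KeyError`.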

## Question 2
Several of the columns represent dates or datetimes, but out of the box `pd.read_table` won't parse them as such. This makes it hard to (for example) compute the number of rentals by month. Fix the dates and compute the number of rentals by month.

In [13]:
weather_data['Month'] = pd.DatetimeIndex(weather_data.date).month

In [14]:
weather_data.groupby('Month')['total_riders'].sum()

Month
1      96744
2     103137
3     164875
4     174224
5     195865
6     202830
7     203607
8     214503
9     218573
10    198841
11    152664
12    123713
Name: total_riders, dtype: int64
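
An alternative is to convert the column itself to a datetime type, so that every date accessor becomes available through `.dt`. This is a minimal sketch on invented data; the ISO date strings in `toy` are an assumption, since the actual format in `daily_weather.tsv` is not shown here.

```python
import pandas as pd

# Toy stand-in for weather_data; the date format is assumed, not taken from the file.
toy = pd.DataFrame({
    "date": ["2012-01-01", "2012-01-15", "2012-02-03"],
    "total_riders": [100, 150, 200],
})

# Convert the string column to datetime64 in place.
toy["date"] = pd.to_datetime(toy["date"])

# With a real datetime column, the month is available through the .dt accessor.
toy["Month"] = toy["date"].dt.month

monthly = toy.groupby("Month")["total_riders"].sum()
print(monthly)
```

Note that grouping on the month number alone pools the same calendar month from different years, if the data spans more than one year.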

## Question 3

Investigate how the number of rentals varies with temperature. Is this trend constant across seasons? Across months?


In [16]:
pd.concat([weather_data['temp'], weather_data['total_riders']], axis=1).corr()

|              | temp     | total_riders |
|--------------|----------|--------------|
| temp         | 1.0      | 0.713793     |
| total_riders | 0.713793 | 1.0          |


Check how the correlation between temp and total riders varies across months.

In [44]:
weather_data[['total_riders', 'temp', 'Month']].groupby('Month').corr()

| Month |              | temp     | total_riders |
|-------|--------------|----------|--------------|
| 1     | temp         | 1.0      | 0.689495     |
| 1     | total_riders | 0.689495 | 1.0          |
| 2     | temp         | 1.0      | 0.716206     |
| 2     | total_riders | 0.716206 | 1.0          |
| 3     | temp         | 1.0      | 0.735575     |
| 3     | total_riders | 0.735575 | 1.0          |
| 4     | temp         | 1.0      | 0.533387     |
| 4     | total_riders | 0.533387 | 1.0          |
| 5     | temp         | 1.0      | 0.065599     |
| 5     | total_riders | 0.065599 | 1.0          |
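
The grouped `.corr()` returns a full 2x2 matrix per month, of which only the off-diagonal value is of interest. A possible way to get one number per month is to call `Series.corr` on each group, sketched here on invented data (`toy` and the name `temp_vs_riders` are illustrative only).

```python
import pandas as pd

# Toy stand-in for weather_data (values are invented).
toy = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "temp": [0.1, 0.2, 0.3, 0.3, 0.4, 0.5],
    "total_riders": [10, 20, 30, 30, 20, 10],
})

# One Pearson correlation per month, collected into a Series indexed by month.
corr_by_month = pd.Series(
    {month: grp["temp"].corr(grp["total_riders"]) for month, grp in toy.groupby("Month")},
    name="temp_vs_riders",
)
print(corr_by_month)
```

The resulting Series is easier to scan or plot than the stacked matrices when comparing months.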


Investigate total riders by month versus average monthly temp.

In [34]:
pd.concat([weather_data.groupby('Month')['total_riders'].sum(), weather_data.groupby('Month')['temp'].mean()], axis=1)

| Month | total_riders | temp     |
|-------|--------------|----------|
| 1     | 96744        | 0.275181 |
| 2     | 103137       | 0.315337 |
| 3     | 164875       | 0.449411 |
| 4     | 174224       | 0.468809 |
| 5     | 195865       | 0.612366 |
| 6     | 202830       | 0.675111 |
| 7     | 203607       | 0.752366 |
| 8     | 214503       | 0.711801 |
| 9     | 218573       | 0.620083 |
| 10    | 198841       | 0.500049 |


Investigate total riders by season versus average seasonal temp.

In [36]:
pd.concat([weather_data.groupby('season_desc')['total_riders'].sum(), weather_data.groupby('season_desc')['temp'].mean()], axis=1)

| season_desc | total_riders | temp     |
|-------------|--------------|----------|
| Fall        | 515476       | 0.419368 |
| Spring      | 571273       | 0.554557 |
| Summer      | 641479       | 0.711445 |
| Winter      | 321348       | 0.3217   |
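
The two `groupby` calls joined with `pd.concat` can be folded into a single named aggregation. The sketch below also adds a per-day average, on the assumption that seasons may not contain the same number of days, in which case raw totals are not directly comparable. The `toy` data and the `days` and `riders_per_day` column names are illustrative only.

```python
import pandas as pd

# Toy stand-in for weather_data (values are invented).
toy = pd.DataFrame({
    "season_desc": ["Summer", "Summer", "Summer", "Winter", "Winter"],
    "total_riders": [500, 600, 700, 200, 300],
    "temp": [0.7, 0.8, 0.6, 0.3, 0.2],
})

# Named aggregation: output column = (input column, aggregation function).
summary = toy.groupby("season_desc").agg(
    total_riders=("total_riders", "sum"),
    days=("total_riders", "count"),
    mean_temp=("temp", "mean"),
)

# Normalise by the number of days so seasons of different length are comparable.
summary["riders_per_day"] = summary["total_riders"] / summary["days"]
print(summary)
```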


## Question 4
There are various types of users in the usage data sets. What sorts of things can you say about how they use the bikes differently?

Investigate correlations between casual and registered riders on work days and holidays.

In [39]:
weather_data[['no_casual_riders', 'no_reg_riders', 'is_work_day', 'is_holiday']].corr()

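|                  | no_casual_riders | no_reg_riders | is_work_day | is_holiday |
|------------------|------------------|---------------|-------------|------------|
| no_casual_riders | 1.0              | 0.274984      | -0.539919   | 0.02972    |
| no_reg_riders    | 0.274984         | 1.0           | 0.437003    | -0.16419   |
| is_work_day      | -0.539919        | 0.437003      | 1.0         | -0.258418  |
| is_holiday       | 0.02972          | -0.16419      | -0.258418   | 1.0        |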


Investigate correlations between casual and registered riders and windspeed.

In [41]:
weather_data[['no_casual_riders', 'no_reg_riders', 'windspeed']].corr()

|                  | no_casual_riders | no_reg_riders | windspeed |
|------------------|------------------|---------------|-----------|
| no_casual_riders | 1.0              | 0.274984      | -0.158371 |
| no_reg_riders    | 0.274984         | 1.0           | -0.265985 |
| windspeed        | -0.158371        | -0.265985     | 1.0       |
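
Correlations show the direction of the relationship but not its size in riders. A complementary view is to compare average daily counts of each rider type on work days and non-work days, sketched here on invented data. The column names are those used above; the 0/1 coding of `is_work_day` is an assumption.

```python
import pandas as pd

# Toy stand-in for weather_data (values are invented; is_work_day assumed to be 0/1).
toy = pd.DataFrame({
    "is_work_day": [1, 1, 0, 0],
    "no_casual_riders": [100, 200, 600, 800],
    "no_reg_riders": [3000, 3400, 1500, 1700],
})

# Mean daily riders of each type, split by work day vs. non-work day.
by_day_type = toy.groupby("is_work_day")[["no_casual_riders", "no_reg_riders"]].mean()
print(by_day_type)
```

The same pattern applies with `is_holiday` as the grouping column.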
