In [1]:
import pandas as pd

As a more involved example of working with time series data, let's look at hourly bicycle counts on Seattle's [Fremont Bridge](http://www.openstreetmap.org/#map=17/47.64813/-122.34965).
The data comes from an automated bicycle counter, installed in late 2012, which has inductive sensors on the east and west sidewalks of the bridge.
The hourly counts can be downloaded from http://data.seattle.gov/; here is the [direct link to the dataset](https://data.seattle.gov/Transportation/Fremont-Bridge-Hourly-Bicycle-Counts-by-Month-Octo/65db-xm6k).


In [2]:
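# Load the hourly counts and give the columns short names
data = pd.read_csv('data/FremontBridge.csv')
data.columns = ['Date', 'West', 'East']
# Total crossings per hour across both sidewalks
data['Total'] = data['West'] + data['East']
data.head()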

,Date,West,East,Total
0,01/01/2019 12:00:00 AM,0.0,9.0,9.0
1,01/01/2019 01:00:00 AM,2.0,22.0,24.0
2,01/01/2019 02:00:00 AM,1.0,11.0,12.0
3,01/01/2019 03:00:00 AM,1.0,2.0,3.0
4,01/01/2019 04:00:00 AM,2.0,1.0,3.0


- Index the data by date

In [5]:
dates = pd.to_datetime(data['Date'])
data2 = data.set_index(dates)
data2.head()

Date (index),Date,West,East,Total
2019-01-01 00:00:00,01/01/2019 12:00:00 AM,0.0,9.0,9.0
2019-01-01 01:00:00,01/01/2019 01:00:00 AM,2.0,22.0,24.0
2019-01-01 02:00:00,01/01/2019 02:00:00 AM,1.0,11.0,12.0
2019-01-01 03:00:00,01/01/2019 03:00:00 AM,1.0,2.0,3.0
2019-01-01 04:00:00,01/01/2019 04:00:00 AM,2.0,1.0,3.0
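
The timestamps in this file use a 12-hour clock with an AM/PM marker. As a minimal sketch, assuming every row follows the `MM/DD/YYYY HH:MM:SS AM/PM` layout seen above, an explicit `format` makes the parsing rule visible; the three-row `sample` frame here is hypothetical and only stands in for the real file.

```python
import pandas as pd

# Hypothetical three-row sample in the same layout as the Date column above.
sample = pd.DataFrame({
    'Date': ['01/01/2019 12:00:00 AM',
             '01/01/2019 01:00:00 PM',
             '01/02/2019 11:00:00 PM'],
    'Total': [9.0, 24.0, 12.0],
})

# %I is the hour on a 12-hour clock and %p is the AM/PM marker.
parsed = pd.to_datetime(sample['Date'], format='%m/%d/%Y %I:%M:%S %p')

# Index by the parsed timestamps and drop the now-redundant string column.
indexed = sample.drop(columns='Date').set_index(parsed)
print(indexed)
```

Dropping the string column is a design choice for this sketch: it leaves only numeric columns behind, which keeps later aggregations simple.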


- Extract the observations for September 2019

In [6]:
# Partial-string indexing on rows should go through .loc
data2.loc['2019-09']

Date (index),Date,West,East,Total
2019-09-01 00:00:00,09/01/2019 12:00:00 AM,13.0,7.0,20.0
2019-09-01 01:00:00,09/01/2019 01:00:00 AM,3.0,2.0,5.0
2019-09-01 02:00:00,09/01/2019 02:00:00 AM,6.0,4.0,10.0
2019-09-01 03:00:00,09/01/2019 03:00:00 AM,1.0,8.0,9.0
2019-09-01 04:00:00,09/01/2019 04:00:00 AM,1.0,3.0,4.0
2019-09-01 05:00:00,09/01/2019 05:00:00 AM,2.0,11.0,13.0
2019-09-01 06:00:00,09/01/2019 06:00:00 AM,8.0,9.0,17.0
2019-09-01 07:00:00,09/01/2019 07:00:00 AM,14.0,25.0,39.0
2019-09-01 08:00:00,09/01/2019 08:00:00 AM,42.0,34.0,76.0
2019-09-01 09:00:00,09/01/2019 09:00:00 AM,60.0,61.0,121.0


In [10]:
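# Same selection using string operations on the raw Date column
# (raw string so that \d reaches the regex engine unchanged)
data[data['Date'].str.match(r'09/\d\d/2019')]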

,Date,West,East,Total
6414,09/01/2019 12:00:00 AM,13.0,7.0,20.0
6415,09/01/2019 01:00:00 AM,3.0,2.0,5.0
6417,09/01/2019 02:00:00 AM,6.0,4.0,10.0
6418,09/01/2019 03:00:00 AM,1.0,8.0,9.0
6420,09/01/2019 04:00:00 AM,1.0,3.0,4.0
6421,09/01/2019 05:00:00 AM,2.0,11.0,13.0
6422,09/01/2019 06:00:00 AM,8.0,9.0,17.0
6423,09/01/2019 07:00:00 AM,14.0,25.0,39.0
6424,09/01/2019 08:00:00 AM,42.0,34.0,76.0
6425,09/01/2019 09:00:00 AM,60.0,61.0,121.0
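
A third way to pick out one month is a boolean mask built from the attributes of a `DatetimeIndex`. The sketch below uses a hypothetical hourly frame named `counts` (not the bridge data) to check that the mask and partial-string indexing select the same rows.

```python
import pandas as pd

# Hypothetical hourly frame spanning late August to early October 2019.
idx = pd.date_range('2019-08-30', '2019-10-02 23:00', freq='h')
counts = pd.DataFrame({'Total': range(len(idx))}, index=idx)

# Partial-string indexing on the row labels.
by_label = counts.loc['2019-09']

# Equivalent boolean mask built from index attributes.
mask = (counts.index.year == 2019) & (counts.index.month == 9)
by_mask = counts[mask]

print(len(by_label), by_label.equals(by_mask))
```

The mask form is handy when the condition is not a contiguous range, for example every September across several years.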


- What is the average hourly bicycle count per year? Are people using bikes more?

In [15]:
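# Select the numeric column first so the string Date column is not averaged.
# 'Y' bins by calendar year; newer pandas releases spell this alias 'YE'.
data2['Total'].resample('Y').mean()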

Date
2012-12-31     70.699537
2013-12-31    105.992121
2014-12-31    114.875671
2015-12-31    112.659130
2016-12-31    111.860412
2017-12-31    109.959470
2018-12-31    120.091335
2019-12-31    142.904151
Freq: A-DEC, Name: Total, dtype: float64

In [24]:
data.groupby(dates.dt.year)['Total'].mean()

Date
2012     70.699537
2013    105.992121
2014    114.875671
2015    112.659130
2016    111.860412
2017    109.959470
2018    120.091335
2019    142.904151
Name: Total, dtype: float64

In [30]:
# Pull the four-digit year out of the raw date string
years = data['Date'].str.findall(r'\d{4}').str[0]
data.groupby(years)['Total'].mean()

Date
2012     70.699537
2013    105.992121
2014    114.875671
2015    112.659130
2016    111.860412
2017    109.959470
2018    120.091335
2019    142.904151
Name: Total, dtype: float64
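
The three cells above all report the mean hourly count. A yearly total is a different quantity, and it is sensitive to partial years: the counter was installed in late 2012, so that year covers only a few months. As a rough sketch on hypothetical data (a constant 2 riders per hour starting in October 2012, not the real counts), the sum and the mean can be compared side by side.

```python
import pandas as pd

# Hypothetical record: a constant 2 riders per hour, starting 2012-10-01.
idx = pd.date_range('2012-10-01', '2013-12-31 23:00', freq='h')
total = pd.Series(2.0, index=idx, name='Total')

# Group by calendar year and compute total, mean, and number of hours.
yearly = total.groupby(total.index.year).agg(['sum', 'mean', 'count'])
print(yearly)
```

In this toy record the mean is identical in both years while the sum for 2012 is about a quarter of the sum for 2013, purely because fewer hours were recorded. That is one reason to prefer the mean when asking whether usage is growing.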

- What is the average hourly bicycle count for each week of the year? What explains this variation?

In [32]:
#data2.resample('W').mean()['Total']

In [33]:
data.groupby(dates.apply(lambda x: x.week))['Total'].mean()

Date
1      58.937925
2      74.956633
3      78.281463
4      82.080782
5      85.829082
6      66.720238
7      70.382653
8      71.728741
9      83.660714
10     86.564475
11     93.755952
12     97.486395
13    109.017007
14    107.072279
15    102.554422
16    112.081633
17    125.511073
18    154.352041
19    160.861395
20    157.006803
21    154.072279
22    150.373299
23    158.652211
24    154.076661
25    156.095238
26    154.784864
27    149.030612
28    165.645408
29    162.301020
30    164.693027
31    162.926020
32    167.434524
33    156.020408
34    147.658163
35    145.374150
36    128.766156
37    139.542517
38    130.992347
39    131.439626
40    123.988715
41    110.176020
42    108.528912
43    101.362245
44     93.768707
45     96.911565
46     88.735544
47     67.048469
48     69.831633
49     72.989796
50     70.574830
51     59.348639
52     34.243197
53     41.535714
Name: Total, dtype: float64
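
Grouping by week number and resampling by week answer different questions: the first pools the same week of the year across all years (at most 53 rows), while `resample('W')` returns one row per calendar week in the record. A minimal sketch of the difference, using hypothetical daily values rather than the bridge data:

```python
import pandas as pd

# Hypothetical daily totals over two full years; the values are just a counter.
idx = pd.date_range('2018-01-01', '2019-12-31', freq='D')
daily = pd.Series(range(len(idx)), index=idx, name='Total', dtype='float64')

# Week of year: one row per ISO week number, pooled across both years.
week_of_year = daily.groupby(idx.isocalendar().week).mean()

# Calendar weeks: one row per actual week in the record.
weekly = daily.resample('W').mean()

print(len(week_of_year), len(weekly))
```

The week-of-year view is the one that exposes the seasonal pattern, since it averages away year-to-year differences.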

- What time of day is busiest on the bridge?

In [34]:
data.groupby(dates.dt.hour)['Total'].mean()

Date
0      11.850039
1       6.273688
2       4.129463
3       3.037588
4       6.694205
5      26.426782
6      92.447142
7     233.166406
8     329.569695
9     192.353702
10    100.786526
11     88.680768
12     93.817078
13    100.061864
14    108.577525
15    138.455756
16    231.271339
17    394.513704
18    269.644871
19    132.611981
20     76.611590
21     49.132341
22     32.420517
23     21.408771
Name: Total, dtype: float64
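
Rather than scanning the table by eye, the peak can be read off with `idxmax` and `nlargest`. The sketch below copies six of the hourly means from the output above into a small Series named `hourly_mean` (a name chosen here for illustration).

```python
import pandas as pd

# Mean counts for the hours around both commute peaks, copied from the output above.
hourly_mean = pd.Series(
    {7: 233.166406, 8: 329.569695, 9: 192.353702,
     16: 231.271339, 17: 394.513704, 18: 269.644871},
    name='Total',
)

# Hour with the highest mean count.
busiest_hour = hourly_mean.idxmax()

# The two highest hours, largest first.
top_two = hourly_mean.nlargest(2)

print(busiest_hour)
print(top_two)
```

On these values the evening peak at 17:00 comes out ahead of the morning peak at 08:00.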

In [39]:
# Sum the counts into 3-hour bins (despite the name, each row covers three hours)
hourly_data = data2.resample('3h')['Total'].sum()

In [41]:
hourly_data.index.hour

Int64Index([ 0,  3,  6,  9, 12, 15, 18, 21,  0,  3,
            ...
            18, 21,  0,  3,  6,  9, 12, 15, 18, 21],
           dtype='int64', name='Date', length=20432)

In [42]:
hourly_data.groupby(hourly_data.index.hour).mean()

Date
0      22.245106
3      36.158575
6     655.183242
9     381.671496
12    302.419734
15    764.240799
18    478.868442
21    102.961629
Name: Total, dtype: float64
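
When reading the 3-hour table, it helps to know that each bin is labelled by its left edge under the default `resample` settings, so the row labelled 15 covers 15:00 through 17:59. A small sketch on a hypothetical single day, with one spike placed at 17:00:

```python
import pandas as pd

# Hypothetical day: 1 rider per hour, with a spike of 10 at 17:00.
idx = pd.date_range('2019-09-02', periods=24, freq='h')
counts = pd.Series(1.0, index=idx, name='Total')
counts.loc[pd.Timestamp('2019-09-02 17:00')] = 10.0

# 3-hour bins, labelled by their start time.
three_hourly = counts.resample('3h').sum()
print(three_hourly)
```

The spike lands in the bin labelled 15:00, not the one labelled 18:00, which matches the table above: the 15 row contains the 17:00 evening peak.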