## Time series



-   Important form of structured data
-   *Fixed* frequency or *Irregular* intervals
-   pandas provides many built-in time series tools and data algorithms



## Date and Time Data



-   The Python standard library includes data types for date and time data and calendar-related functionality
-   `datetime`, `time`, and `calendar` modules



In [None]:
from datetime import datetime
now = datetime.now()
now.year, now.month, now.day

`timedelta` represents the temporal difference between two `datetime` objects



In [None]:
delta = datetime(2019, 1, 7) - datetime(2017, 6, 24, 8, 15)
delta

In [None]:
delta.days, delta.seconds
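
A `timedelta` stores its value as whole days plus a remainder of seconds (and microseconds). That is why `delta.seconds` is not the total length of the interval. The sketch below reuses the same two dates as above. The extra `timedelta(hours=36)` is made up for illustration.

```python
from datetime import datetime, timedelta

delta = datetime(2019, 1, 7) - datetime(2017, 6, 24, 8, 15)

# Normalized into whole days plus leftover seconds (0 <= seconds < 86400)
print(delta.days, delta.seconds)  # 561 56700

# total_seconds() folds days, seconds and microseconds into one float
print(delta.total_seconds())  # 48527100.0

# Keyword arguments are normalized the same way: 36 hours -> 1 day + 43200 s
d = timedelta(hours=36)
print(d.days, d.seconds)  # 1 43200
```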

We can add a `timedelta` to (or subtract one from) a `datetime` to get a new, shifted `datetime`



In [None]:
from datetime import timedelta
start = datetime(2019, 1, 7)
start + timedelta(12)
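
A positional argument to `timedelta` counts days, so `timedelta(12)` above is 12 days. The sketch below also shows subtraction, scaling and explicit units. The particular offsets are arbitrary examples.

```python
from datetime import datetime, timedelta

start = datetime(2019, 1, 7)

# A bare number is interpreted as days
print(start + timedelta(12))                # 2019-01-19 00:00:00
# timedelta objects can be scaled; subtracting moves back in time
print(start - 2 * timedelta(12))            # 2018-12-14 00:00:00
# Keyword arguments make the unit explicit
print(start + timedelta(weeks=1, hours=3))  # 2019-01-14 03:00:00
```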

### Converting between string and datetime



We can format `datetime` objects as strings



In [None]:
stamp = datetime(2019, 1, 3)
str(stamp)

In [None]:
stamp.strftime('%Y-%m-%d')

Datetime format specifications
![img](images/strftime1.png)



![img](images/strftime2.png)
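
Here are a few of the format codes from the tables above applied to a single timestamp. The time of day (14:05) is added only so that the hour and minute codes have something to show. Locale-dependent codes such as `%A` or `%p` are left out because their output varies between systems.

```python
from datetime import datetime

stamp = datetime(2019, 1, 3, 14, 5)

print(stamp.strftime('%Y-%m-%d %H:%M'))  # 2019-01-03 14:05
print(stamp.strftime('%d/%m/%Y'))        # 03/01/2019 (day first)
print(stamp.strftime('%m/%d/%y'))        # 01/03/19 (two-digit year)
print(stamp.strftime('%j'))              # 003 (day of the year)
print(stamp.isoformat())                 # 2019-01-03T14:05:00
```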

We can also go the other way and convert strings to dates

In [None]:
value = '2019-01-03'
datetime.strptime(value, '%Y-%m-%d')
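
`strptime` is strict: the format string must describe the input exactly, otherwise it raises `ValueError`. Using the same format string with `strftime` reverses the conversion. The day-first string `'03/01/2019'` below is an illustrative input and does not come from the notebook data.

```python
from datetime import datetime

value = '2019-01-03'
parsed = datetime.strptime(value, '%Y-%m-%d')

# strftime with the same format string gives back the original text
print(parsed.strftime('%Y-%m-%d') == value)  # True

# A format that does not match the string raises ValueError
try:
    datetime.strptime('03/01/2019', '%Y-%m-%d')
except ValueError as err:
    print('could not parse:', err)

# With a matching format, the same date written day-first parses fine
print(datetime.strptime('03/01/2019', '%d/%m/%Y'))  # 2019-01-03 00:00:00
```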

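-   `datetime.strptime` is a good way to parse dates when you know the exact format
-   `parser.parse` from the third-party `dateutil` package can handle many common date formats without a format string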



In [None]:
from dateutil.parser import parse
parse('2019-01-03')

In [None]:
parse('Jan 31, 2019 10:45 PM')
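
`dateutil.parser.parse` guesses the format, and that matters for ambiguous numeric dates. By default it reads them month-first, and `dayfirst=True` switches this. The date `'6/12/2011'` is just an example of such an ambiguous string.

```python
from datetime import datetime
from dateutil.parser import parse

# Free-form text with a 12-hour clock is understood
print(parse('Jan 31, 2019 10:45 PM'))         # 2019-01-31 22:45:00

# Ambiguous numeric dates are read month-first by default ...
print(parse('6/12/2011'))                     # 2011-06-12 00:00:00
# ... unless dayfirst=True is passed
print(parse('6/12/2011', dayfirst=True))      # 2011-12-06 00:00:00
```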

`pandas` makes it even easier: `pd.to_datetime` parses a whole list of date strings at once



In [None]:
import pandas as pd
datestrs = ['2018-07-06 12:00:00', '2018-08-06 00:00:00']
pd.to_datetime(datestrs)

`to_datetime` can also handle missing data: `None` becomes `NaT` (Not a Time), pandas' null value for timestamps



In [None]:
idx = pd.to_datetime(datestrs + [None])
idx
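
A few more `to_datetime` options are useful with messy data. You can detect `NaT` with `pd.isna`. `errors='coerce'` turns unparseable strings into `NaT` instead of raising, and an explicit `format` removes ambiguity. The strings `'not a date'` and `'06/07/2018'` are made-up inputs.

```python
import pandas as pd

datestrs = ['2018-07-06 12:00:00', '2018-08-06 00:00:00']
idx = pd.to_datetime(datestrs + [None])

# The missing entry is NaT; pd.isna flags it
print(idx[2])        # NaT
print(pd.isna(idx))

# errors='coerce' replaces unparseable strings with NaT instead of raising
coerced = pd.to_datetime(['2018-07-06', 'not a date'], errors='coerce')
print(coerced)

# An explicit format removes ambiguity (here: day/month/year)
explicit = pd.to_datetime(['06/07/2018'], format='%d/%m/%Y')
print(explicit[0])   # 2018-07-06 00:00:00
```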

## Time series basics



A `Series` indexed by timestamps is the most basic kind of time series object in pandas



In [None]:
import numpy as np
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

We can perform arithmetic operations



In [None]:
ts + ts[::2]

`pandas` automatically aligns the operands on their dates; a date that appears in only one of the series produces `NaN`
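
Random values make the alignment hard to see, so here is the same `ts + ts[::2]` pattern on a small series with fixed, made-up values. `Series.add` with `fill_value` is one way to treat a missing date as 0 instead of producing `NaN`.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07']))
b = a[::2]  # keeps only 2011-01-02 and 2011-01-07

# Values are matched by date; 2011-01-05 is missing from b, so it becomes NaN
total = a + b
print(total)

# fill_value treats the missing side as 0 instead of propagating NaN
print(a.add(b, fill_value=0))
```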



### Indexing, Selection, Subsetting



A time series behaves like any other `pandas.Series` when you index and select data based on labels



In [None]:
stamp = ts.index[2]
ts[stamp]

In [None]:
ts['1/10/2011']

In [None]:
ts['20110110']
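
All three label forms used above parse to the same `Timestamp`, so they select the same element. The small series below has made-up values. It uses `.loc` to make the label-based lookup explicit.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(3),
               index=pd.to_datetime(['2011-01-07', '2011-01-08', '2011-01-10']))

forms = ['1/10/2011', '20110110', '2011-01-10']

# Each string is converted to the same Timestamp ...
print({f: pd.Timestamp(f) for f in forms})

# ... so each one selects the element stored at 2011-01-10
print([ts.loc[f] for f in forms])
```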

For longer time series, passing just a year (or a year and month) selects the corresponding slice of data



In [None]:
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts

In [None]:
longer_ts['2001']
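
Partial-string selection also works at month resolution. Because 2001 is not a leap year, the year slice has 365 entries. The sketch uses `np.arange` instead of random numbers so that the result is reproducible.

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.arange(1000),
                      index=pd.date_range('2000-01-01', periods=1000))

year_2001 = longer_ts.loc['2001']     # every day in 2001
may_2001 = longer_ts.loc['2001-05']   # only May 2001
print(len(year_2001), len(may_2001))  # 365 31
print(may_2001.index[0], may_2001.index[-1])
```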

We can also slice a time series with date strings; the endpoints do not have to exist in the index



In [None]:
ts['1/6/2011':'1/11/2011']
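
Here is the same slice on the six dates from above, with `np.arange` values in place of random ones. Unlike positional slicing, label slices include both endpoints. `truncate` is an alternative way to cut a sorted time series at a date.

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6), index=dates)

# Endpoints need not be present in the index
print(ts.loc['2011-01-06':'2011-01-11'].index.day.tolist())  # [7, 8, 10]
# Both endpoints are included when they do exist
print(ts.loc['2011-01-07':'2011-01-10'].tolist())            # [2, 3, 4]
# truncate(after=...) drops everything later than the given date
print(ts.truncate(after='2011-01-09').index.day.tolist())    # [2, 5, 7, 8]
```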

### Time series with duplicate indices



In [None]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

In [None]:
dup_ts.is_unique

Indexing produces a scalar value for a timestamp that occurs once and a slice for a duplicated timestamp



In [None]:
dup_ts['1/3/2000']

In [None]:
dup_ts['1/2/2000']
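
One common way to handle duplicate timestamps is to aggregate them. Grouping on the index itself (`level=0`) collapses each date to a single value. The choice of `mean` and `count` here is only an example; any aggregation works.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# Group by the index values so each timestamp becomes one row
grouped = dup_ts.groupby(level=0)
print(grouped.mean())   # 2000-01-02 averages 1, 2 and 3
print(grouped.count())  # how many rows shared each timestamp
```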

## Date ranges



`DatetimeIndex` objects can be generated with `date_range`



In [None]:
index = pd.date_range('2012-04-01', '2012-06-01')
index

By default, `date_range` generates daily timestamps. Instead of giving both a start and an end date, we can pass one of them together with the number of periods to generate



In [None]:
pd.date_range(start='2012-04-01', periods=20)
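
`periods` also works with an end date, in which case pandas counts backwards. If the start has a time-of-day component, `normalize=True` snaps the timestamps to midnight. The timestamp `'2012-05-02 12:56:31'` is an arbitrary example.

```python
import pandas as pd

# An end date plus a count: the range is generated backwards from the end
idx = pd.date_range(end='2012-06-01', periods=20)
print(idx[0], idx[-1])  # 2012-05-13 00:00:00 2012-06-01 00:00:00

# The time of day is kept by default ...
print(pd.date_range('2012-05-02 12:56:31', periods=3)[0])
# ... unless normalize=True moves every timestamp to midnight
print(pd.date_range('2012-05-02 12:56:31', periods=3, normalize=True)[0])
```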

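Other frequencies can be chosen with the `freq` argument. For example, `'BM'` (business month end) yields the last business day of each month; pandas 2.2 and later prefer the spelling `'BME'` and warn about `'BM'`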



In [None]:
pd.date_range('2000-01-01', '2000-12-01', freq='BM')

![img](images/date_range.png)
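
The table above lists base aliases. Some of them take an anchor. The two below are examples: weekly on a given weekday, and "week of month" (for example, the third Friday of each month). The date ranges are chosen only for illustration.

```python
import pandas as pd

# Weekly, anchored on Mondays
mondays = pd.date_range('2012-01-01', periods=3, freq='W-MON')
print(mondays)  # 2012-01-02, 2012-01-09, 2012-01-16

# Week of month: the third Friday of every month
third_fridays = pd.date_range('2012-01-01', '2012-09-01', freq='WOM-3FRI')
print(third_fridays)
```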



## Shifting data



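We can move data backward and forward through time. By default, `shift` moves the values while the index stays the same, leaving `NaN` in the vacated positions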



In [None]:
ts = pd.Series(np.random.randn(4), index=pd.date_range('1/1/2000', periods=4, freq='M'))
ts

In [None]:
ts.shift(2)
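
Here are a few `shift` variants on a small daily series with fixed, made-up values. Daily frequency is used to sidestep the month-end alias naming. Positive shifts lag the data, and negative shifts lead it. Passing `freq` moves the timestamps instead of the values. A common application is the period-over-period change `ts / ts.shift(1) - 1`.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 4.0, 8.0],
               index=pd.date_range('2000-01-01', periods=4, freq='D'))

lagged = ts.shift(1)           # values move forward; first slot becomes NaN
lead = ts.shift(-1)            # values move backward; last slot becomes NaN
moved = ts.shift(1, freq='D')  # the index moves by one day; no data is lost
growth = ts / ts.shift(1) - 1  # relative change from the previous day
print(growth.tolist())         # [nan, 1.0, 1.0, 1.0]
```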

## Resampling and Frequency conversion



Resampling means converting a time series from one frequency to another. Let's create a `Series` object and explore the different resampling operations:

-   Aggregating higher frequency data to lower frequency is called *downsampling*
-   Converting from lower to higher frequency is called *upsampling*



In [None]:
rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
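
Before the exercises, here is a toy illustration of both directions on ten days of made-up data (`np.arange`) starting on Monday 2000-01-03. Downsampling to weeks with `'W'` (weeks ending on Sunday) sums each week. Upsampling back to days leaves gaps, which `asfreq` keeps as `NaN` and `ffill` fills forward.

```python
import numpy as np
import pandas as pd

daily = pd.Series(np.arange(10),
                  index=pd.date_range('2000-01-03', periods=10, freq='D'))

# Downsampling: aggregate days into weeks ending on Sunday ('W' means 'W-SUN')
weekly = daily.resample('W').sum()
print(weekly)  # 2000-01-09 -> 0+...+6 = 21, 2000-01-16 -> 7+8+9 = 24

# Upsampling: weekly values back to daily; new slots are empty until filled
sparse = weekly.resample('D').asfreq()
filled = weekly.resample('D').ffill()
print(sparse.isna().sum())  # 6
print(filled)
```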

1.  Calculate the monthly mean.
2.  Look at the resulting index: does a day appear? Convert it to actual month notation.
3.  Calculate the mean every 2 weeks.
4.  Reindex the series to the end of 2000. Recalculate the monthly mean and interpolate.



## Moving window functions



An important class of array transformations operates on a sliding window. Let's use some sample stock price data:



In [None]:
close_px_all = pd.read_csv('examples/stock_px_2.csv', parse_dates=True, index_col=0)
close_px = close_px_all.resample('B').ffill()
close_px

Let's calculate and plot the rolling mean over a window of 150 observations (150 business days, since the data was resampled to business-day frequency)



In [None]:
%matplotlib inline
close_px.AAPL.plot()
close_px.AAPL.rolling(150).mean().plot()

The expression `rolling(150)` behaves similarly to `groupby`. Can you explain its functionality?

By default, rolling functions require all values in the window to be non-missing, so the first `window - 1` results are `NaN`. The `min_periods` argument lowers the number of non-missing values required



In [None]:
appl_std250 = close_px.AAPL.rolling(250, min_periods=10).std()
appl_std250[5:12]
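
On a tiny made-up series, the effect of the window size and `min_periods` is easy to see. A single missing value makes every full window containing it `NaN` unless `min_periods` allows fewer observations.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each result averages the current value and the two before it;
# the first two windows are incomplete, so they are NaN
print(s.rolling(3).mean().tolist())                 # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1 accepts partial windows at the start of the series
print(s.rolling(3, min_periods=1).mean().tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]

# With a NaN inside, windows need at least min_periods non-missing values
t = pd.Series([1.0, np.nan, 3.0, 4.0, 5.0])
print(t.rolling(3).mean().tolist())                 # [nan, nan, nan, nan, 4.0]
print(t.rolling(3, min_periods=2).mean().tolist())  # [nan, nan, 2.0, 3.5, 4.0]
```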

An expanding window starts at the beginning of the time series and grows until it covers the whole series



In [None]:
expanding_mean = appl_std250.expanding().mean()
expanding_mean
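
An expanding mean is simply the running average of everything seen so far. That means it equals the cumulative sum divided by the number of observations. The four-value series is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# The window always starts at the first observation and grows by one each step
expanding_mean = s.expanding().mean()
print(expanding_mean.tolist())  # [1.0, 1.5, 2.0, 2.5]

# Same numbers computed by hand: cumulative sum / count so far
manual = s.cumsum() / np.arange(1, len(s) + 1)
print(manual.tolist())          # [1.0, 1.5, 2.0, 2.5]
```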

Calling a moving window function on a `DataFrame` applies the transformation to each column independently



In [None]:
close_px.rolling(60).mean().plot(logy=True)
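
A two-column toy `DataFrame` with made-up values makes the column-by-column behaviour explicit. The result has the same shape as the input, and values from different columns never mix.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [10.0, 20.0, 30.0, 40.0]},
                  index=pd.date_range('2000-01-03', periods=4, freq='B'))

# A 2-observation rolling mean, computed separately for A and for B
result = df.rolling(2).mean()
print(result)  # first row NaN; A: 1.5, 2.5, 3.5; B: 15.0, 25.0, 35.0
```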

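Rolling functions also accept a string indicating a fixed-size time offset (such as `'20D'` for 20 days) instead of a number of observations, which is especially useful for irregular time series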



In [None]:
close_px.rolling('20D').mean()
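
The difference between an offset window and a count window shows up on an irregular series. The four dates and values below are made up, with a gap between January 2 and January 5. A `'2D'` window covers the two days ending at each timestamp, and it accepts partial windows by default. A count of 2 simply takes the last two observations, whatever their dates.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.to_datetime(['2000-01-01', '2000-01-02',
                                    '2000-01-05', '2000-01-06']))

# Time-based window: sums values falling in the 2 days ending at each timestamp
print(s.rolling('2D').sum().tolist())  # [1.0, 3.0, 3.0, 7.0]

# Count-based window: always the last 2 observations, ignoring the gap
print(s.rolling(2).sum().tolist())     # [nan, 3.0, 5.0, 7.0]
```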

## Homework



1.  [Download](https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/) the files for storm locations and corresponding fatalities for a **specific year only**.
    
    -   Create a map showing the locations of the storm events.
    -   Calculate the number of fatalities per state.
    -   Create a map with each U.S. state colored according to the number of fatalities. Look at the [Brexit](http://geoviews.org/gallery/bokeh/brexit_choropleth.html#bokeh-gallery-brexit-choropleth) and [Katrina track](http://geoviews.org/gallery/bokeh/katrina_track.html#bokeh-gallery-katrina-track) examples for some help.

2.  Download the data for the Arctic Oscillation (AO) and North Atlantic Oscillation (NAO) from http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii and http://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii.
    
    -   Construct time series objects from the datasets.
    -   Calculate the annual means, maximum and minimum values.
    -   Create an interactive plot with a widget that selects the year or the statistic that you have just calculated.
    -   Create a bar plot of the AO and NAO values from 1980 to 1989, but only show the times when AO > 0 and NAO < 0.
    -   Smooth the time series for both AO and NAO, and create a plot that shows the effect of longer smoothing windows (3 to 12 months).

