![rmotr](https://user-images.githubusercontent.com/7065401/39119486-4718e386-46ec-11e8-9fc3-5250a49ef570.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39120280-4a20fdc8-46ee-11e8-81a3-fd4640621cb8.jpg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# TimeSeries Operations

In this lesson we'll explore time shifting and resampling (grouping), two of the most common operations with time series.

![separator2](https://user-images.githubusercontent.com/7065401/39119518-59fa51ce-46ec-11e8-8503-5f8136558f2b.png)

## Hands on!

In [1]:
import pandas as pd
import numpy as np

### Time Shifting

In [2]:
ts = pd.Series(
    np.random.randn(10) * 10 + 500,
    index=pd.date_range(start='2018-01-01', periods=10, freq='D'))

In [4]:
ts

2018-01-01    496.379126
2018-01-02    508.486283
2018-01-03    494.991265
2018-01-04    514.996805
2018-01-05    503.292494
2018-01-06    490.999937
2018-01-07    500.798803
2018-01-08    516.550106
2018-01-09    520.670866
2018-01-10    484.571455
Freq: D, dtype: float64

In [3]:
ts.shift(1)

2018-01-01           NaN
2018-01-02    496.379126
2018-01-03    508.486283
2018-01-04    494.991265
2018-01-05    514.996805
2018-01-06    503.292494
2018-01-07    490.999937
2018-01-08    500.798803
2018-01-09    516.550106
2018-01-10    520.670866
Freq: D, dtype: float64

In [6]:
pd.DataFrame({
    'Original': ts,
    'Shift (1)': ts.shift(1),
    'Shift (2)': ts.shift(2)
})

Unnamed: 0,Original,Shift (1),Shift (2)
2018-01-01,496.379126,,
2018-01-02,508.486283,496.379126,
2018-01-03,494.991265,508.486283,496.379126
2018-01-04,514.996805,494.991265,508.486283
2018-01-05,503.292494,514.996805,494.991265
2018-01-06,490.999937,503.292494,514.996805
2018-01-07,500.798803,490.999937,503.292494
2018-01-08,516.550106,500.798803,490.999937
2018-01-09,520.670866,516.550106,500.798803
2018-01-10,484.571455,520.670866,516.550106


These operations are usually employed to compare a time series with its own previous values. For example, calculating the percent change over the previous period:

In [9]:
df = pd.DataFrame({
    'Original': ts,
    'Shifted': ts.shift(1)
})
df

Unnamed: 0,Original,Shifted
2018-01-01,496.379126,
2018-01-02,508.486283,496.379126
2018-01-03,494.991265,508.486283
2018-01-04,514.996805,494.991265
2018-01-05,503.292494,514.996805
2018-01-06,490.999937,503.292494
2018-01-07,500.798803,490.999937
2018-01-08,516.550106,500.798803
2018-01-09,520.670866,516.550106
2018-01-10,484.571455,520.670866


In [12]:
(df['Original'] / df['Shifted']) - 1

2018-01-01         NaN
2018-01-02    0.024391
2018-01-03   -0.026540
2018-01-04    0.040416
2018-01-05   -0.022727
2018-01-06   -0.024424
2018-01-07    0.019957
2018-01-08    0.031452
2018-01-09    0.007977
2018-01-10   -0.069332
Freq: D, dtype: float64

You can see how much sales grew or shrank vs. the previous day (the series has a daily frequency).

This is a particularly silly example, because there's a pandas method specially intended for percentage changes: [`pct_change()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pct_change.html), so we don't even need `shift`:

In [14]:
ts.pct_change()

2018-01-01         NaN
2018-01-02    0.024391
2018-01-03   -0.026540
2018-01-04    0.040416
2018-01-05   -0.022727
2018-01-06   -0.024424
2018-01-07    0.019957
2018-01-08    0.031452
2018-01-09    0.007977
2018-01-10   -0.069332
Freq: D, dtype: float64
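To make the equivalence concrete, here's a minimal sketch on a tiny series with hand-picked values (the name `prices` and its numbers are made up for illustration). It also shows that the same idea with subtraction instead of division matches `diff()`:

```python
import numpy as np
import pandas as pd

# Small daily series with fixed values so the results are easy to check by hand.
prices = pd.Series(
    [100.0, 110.0, 99.0, 99.0],
    index=pd.date_range(start='2018-01-01', periods=4, freq='D'))

# Percent change computed manually with shift(1)...
manual_pct = prices / prices.shift(1) - 1
# ...and with the built-in helper.
builtin_pct = prices.pct_change()

# Absolute change: same idea, subtracting instead of dividing.
manual_diff = prices - prices.shift(1)
builtin_diff = prices.diff()

# Both pairs agree, including the leading NaN.
print(np.allclose(manual_pct, builtin_pct, equal_nan=True))
print(np.allclose(manual_diff, builtin_diff, equal_nan=True))
```

The first element is `NaN` in every case, because the first day has no previous value to compare against.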

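Shifting also works with smaller periods by passing `freq`. In that case the values stay where they are and only the timestamps in the index move by the given offset, so no `NaN` is introduced: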

In [18]:
ts.shift(1, freq='15Min')

2018-01-01 00:15:00    496.379126
2018-01-02 00:15:00    508.486283
2018-01-03 00:15:00    494.991265
2018-01-04 00:15:00    514.996805
2018-01-05 00:15:00    503.292494
2018-01-06 00:15:00    490.999937
2018-01-07 00:15:00    500.798803
2018-01-08 00:15:00    516.550106
2018-01-09 00:15:00    520.670866
2018-01-10 00:15:00    484.571455
Freq: D, dtype: float64
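The difference between the two kinds of shift is easy to miss, so here's a small side-by-side sketch (the series `ts_small` is a made-up example, and the lowercase `'15min'` alias is used because it should be accepted by both older and newer pandas versions):

```python
import pandas as pd

ts_small = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range(start='2018-01-01', periods=3, freq='D'))

# Without freq: the values slide down one position, the index stays the same,
# and the first slot becomes NaN (the last value falls off the end).
shifted_values = ts_small.shift(1)

# With freq: the index labels move forward by 15 minutes,
# the values are untouched and nothing is lost.
shifted_index = ts_small.shift(1, freq='15min')

print(shifted_values)
print(shifted_index)
```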

![separator1](https://user-images.githubusercontent.com/7065401/39119545-6d73d9aa-46ec-11e8-98d3-40204614f000.png)

### Resampling

Resampling a timeseries is converting it to another time frequency. If you're going from high frequency to low frequency, the process is called "downsampling", and it involves an aggregation process. For example, you have daily sales data, and you want to aggregate it by month. You'll be "grouping" your daily sales per month, and you need to decide the aggregation operation to perform. For example, `sum` to get the total sales per month, or `mean` to get the average sale. Let's use an example:

In [22]:
all_days_2018 = pd.date_range(start='2018-01-01', end='2018-12-31', freq='D')
ts = pd.Series(
    np.random.randn(20) * 10 + 500,
    index=np.random.choice(all_days_2018, size=20))

ts.sort_index(inplace=True)
ts

2018-01-28    482.975020
2018-01-29    495.227480
2018-02-05    501.988900
2018-02-09    506.196500
2018-03-06    497.051719
2018-03-16    487.656984
2018-03-24    495.164477
2018-04-01    491.829092
2018-04-04    497.946275
2018-05-17    476.168025
2018-05-19    496.358924
2018-06-05    506.559144
2018-07-14    511.039181
2018-08-16    510.209949
2018-08-25    513.408112
2018-10-08    512.482402
2018-10-10    514.724281
2018-10-31    487.246486
2018-11-07    487.890876
2018-12-10    513.732042
dtype: float64

January sales:

In [23]:
ts['2018-01']

2018-01-28    482.97502
2018-01-29    495.22748
dtype: float64

In [24]:
ts['2018-01'].sum()

978.2024997596479

February sales:

In [25]:
ts['2018-02']

2018-02-05    501.9889
2018-02-09    506.1965
dtype: float64

In [26]:
ts['2018-02'].sum()

1008.1854002097817

**Downsampling**: We'll now use `resample` to "group" the sales monthly (downsampling our TimeSeries), and calculate the total sales per month:

In [27]:
ts.resample('M').sum()

2018-01-31     978.202500
2018-02-28    1008.185400
2018-03-31    1479.873180
2018-04-30     989.775367
2018-05-31     972.526950
2018-06-30     506.559144
2018-07-31     511.039181
2018-08-31    1023.618061
2018-09-30       0.000000
2018-10-31    1514.453168
2018-11-30     487.890876
2018-12-31     513.732042
Freq: M, dtype: float64
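Notice that September shows `0.000000`: there were no sales records in that month, and the sum of an empty group is 0. The sketch below, on a small made-up series named `sales`, shows how the choice of aggregation affects empty months. It uses the `'MS'` (month start) alias, which labels each group with the first day of the month and should behave the same across pandas versions:

```python
import pandas as pd

# Two sales in January, none in February, two in March.
sales = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.to_datetime(['2018-01-05', '2018-01-20', '2018-03-02', '2018-03-30']))

# Total per month: the empty February group sums to 0.0.
totals = sales.resample('MS').sum()

# Average per month: the mean of an empty group is NaN.
averages = sales.resample('MS').mean()

# min_count=1 asks for at least one value per group, so an empty month
# becomes NaN instead of 0.0 and can be told apart from "real" zero sales.
totals_strict = sales.resample('MS').sum(min_count=1)

print(totals)
print(averages)
print(totals_strict)
```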

The parameter `M` means "month end frequency" (newer pandas releases deprecate this alias in favor of `ME`). We could instead choose "month start":

In [40]:
ts.resample('MS').sum()

2018-01-01     978.202500
2018-02-01    1008.185400
2018-03-01    1479.873180
2018-04-01     989.775367
2018-05-01     972.526950
2018-06-01     506.559144
2018-07-01     511.039181
2018-08-01    1023.618061
2018-09-01       0.000000
2018-10-01    1514.453168
2018-11-01     487.890876
2018-12-01     513.732042
Freq: MS, dtype: float64

This of course yields the same results, but the index now contains the first day of each month. More correctly speaking, in this example, we're collecting sales of _"the period January 2018"_. Pandas also has a `Period` type, which we can use with the `kind` parameter:

In [41]:
monthly_sales = ts.resample('M', kind='period').sum()
monthly_sales

2018-01     978.202500
2018-02    1008.185400
2018-03    1479.873180
2018-04     989.775367
2018-05     972.526950
2018-06     506.559144
2018-07     511.039181
2018-08    1023.618061
2018-09       0.000000
2018-10    1514.453168
2018-11     487.890876
2018-12     513.732042
Freq: M, dtype: float64

In [42]:
monthly_sales.index

PeriodIndex(['2018-01', '2018-02', '2018-03', '2018-04', '2018-05', '2018-06',
             '2018-07', '2018-08', '2018-09', '2018-10', '2018-11', '2018-12'],
            dtype='period[M]', freq='M')

As you can see, the Index is a `PeriodIndex`. Each entry in the index is of type `pd.Period`: 

In [45]:
monthly_sales.index[0]

Period('2018-01', 'M')
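As far as we know, recent pandas releases deprecate the `kind` argument of `resample`. A sketch of an alternative that should work on both older and newer versions is to resample on timestamps first and convert the index afterwards with `to_period` (the series `sales` below is a made-up example):

```python
import pandas as pd

# Two sales in January, none in February, one in March.
sales = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(['2018-01-05', '2018-01-20', '2018-03-02']))

# Resample on timestamps, then turn the DatetimeIndex into a PeriodIndex.
# Empty months are kept (February sums to 0.0).
monthly = sales.resample('MS').sum().to_period('M')

# Alternative: convert each timestamp to its month and group by it.
# Only months that actually have data appear in the result.
grouped = sales.groupby(sales.index.to_period('M')).sum()

print(monthly)
print(grouped)
```

The two results differ only in how they treat months without data, so which one to pick depends on whether the gaps matter for the analysis.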

Periods support basic arithmetic operations, which makes them convenient for expressing these time ranges:

In [46]:
pd.Period('2018-01') + 5

Period('2018-06', 'M')

In [49]:
pd.Period('2018-01', freq='H') + 9

Period('2018-01-01 09:00', 'H')
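A `Period` represents a whole span of time rather than a single instant. The following sketch (variable names are arbitrary) shows how to get the boundaries of that span and how to build a sequence of consecutive periods:

```python
import pandas as pd

jan = pd.Period('2018-01', freq='M')

# Adding an integer moves forward by that many periods (months here).
jun = jan + 5

# A period covers a span of time; these are its first and last instants.
first_instant = jan.start_time   # midnight on January 1st
last_instant = jan.end_time      # the very end of January 31st

# period_range builds consecutive periods, similar to date_range for timestamps.
first_quarter = pd.period_range(start='2018-01', periods=3, freq='M')

print(jun)
print(first_instant, last_instant)
print(first_quarter)
```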

**Upsampling**: With upsampling we'll convert a low-frequency time series to a higher frequency time series. We'll add more "time points". Let's use an example:

We'll start with 3 months of sales, only 3 data points:

In [51]:
ts = pd.Series(
    np.random.randn(3) * 10 + 500,
    index=pd.date_range(start='2018-01-01', periods=3, freq='MS'))
ts

2018-01-01    499.495834
2018-02-01    503.130978
2018-03-01    516.939849
Freq: MS, dtype: float64

We'll now `resample` it to "Semi Month Start" frequency, which places a data point on the 1st and the 15th of each month:

In [59]:
ts.resample('SMS').asfreq()

2018-01-01    499.495834
2018-01-15           NaN
2018-02-01    503.130978
2018-02-15           NaN
2018-03-01    516.939849
Freq: SMS-15, dtype: float64

And as you can see, we have a few missing values, because we don't have data for those specific time periods. What can you do with that missing data? One option is to fill it with previous data:

In [60]:
ts.resample('SMS').ffill()

2018-01-01    499.495834
2018-01-15    499.495834
2018-02-01    503.130978
2018-02-15    503.130978
2018-03-01    516.939849
Freq: SMS-15, dtype: float64
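Forward filling is only one option. Here's a sketch comparing it with two other common strategies, backward filling and linear interpolation, on a made-up series with round values so the differences are easy to see:

```python
import pandas as pd

monthly = pd.Series(
    [100.0, 200.0, 300.0],
    index=pd.date_range(start='2018-01-01', periods=3, freq='MS'))

# Forward fill: each gap takes the last known value.
forward = monthly.resample('SMS').ffill()

# Backward fill: each gap takes the next known value.
backward = monthly.resample('SMS').bfill()

# Linear interpolation: each gap is placed on the straight line between its
# neighbors. The default method treats rows as equally spaced, so the
# mid-month points land exactly halfway between the surrounding values.
linear = monthly.resample('SMS').asfreq().interpolate()

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'linear': linear}))
```

Which strategy is appropriate depends on what the data means: forward fill never uses information from the future, while backward fill and interpolation do.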

![separator2](https://user-images.githubusercontent.com/7065401/39119518-59fa51ce-46ec-11e8-8503-5f8136558f2b.png)