## Pandas time series
A pandas time series is a Series or DataFrame whose index is a DatetimeIndex instead of the default RangeIndex.

Time series are used to store events or measurements that fit on a timeline, for example:
- Weather data.
- Temperature readings.
- Heart rate monitoring (EKG).
- Quarterly sales.
- Stock prices.

In [1]:
import pandas as pd
import numpy as np

### Creating a DatetimeIndex
Use `pd.date_range()` with three of these four parameters (if `freq` is left out and only two of the others are given, it defaults to daily):
- start
- end
- periods
- freq
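The parameter combinations above can be sketched as follows. This is a minimal illustration, not part of the original notebook; the variable names and the ten-day window are chosen only for the example.

```python
import pandas as pd

# start + end + freq: every day between two dates (both endpoints included)
by_start_end = pd.date_range(start="2018-01-01", end="2018-01-10", freq="D")

# start + periods + freq: a fixed number of steps forward from the start date
by_start_periods = pd.date_range(start="2018-01-01", periods=10, freq="D")

# end + periods + freq: a fixed number of steps backward from the end date
by_end_periods = pd.date_range(end="2018-01-10", periods=10, freq="D")

# start + end + periods: evenly spaced timestamps, with no freq given
evenly_spaced = pd.date_range(start="2018-01-01", end="2018-01-10", periods=4)

print(by_start_end.equals(by_start_periods))  # the first three calls describe the same ten days
print(evenly_spaced)
```

The first three calls describe the same index in different ways, so which one to use mostly depends on which values are already known.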

In [22]:
datetimeindex = pd.date_range(start='2018-01-01', end='2019-12-31', freq="D")
datetimeindex

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10',
               ...
               '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
               '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
               '2019-12-30', '2019-12-31'],
              dtype='datetime64[ns]', length=730, freq='D')

In [35]:
df = pd.DataFrame(
    dict(
        n = range(len(datetimeindex)),
        rand = np.random.random(len(datetimeindex))
    ),
    index=datetimeindex
)

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 730 entries, 2018-01-01 to 2019-12-31
Freq: D
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   n       730 non-null    int64  
 1   rand    730 non-null    float64
dtypes: float64(1), int64(1)
memory usage: 33.3 KB


In [34]:
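# Only the last expression in a cell is displayed.
df.loc["2018-03-15"]               # a single day
df.loc[:"2018-01-15"]              # everything up to and including 2018-01-15
df.loc["2018-03-15":"2018-03-20"]  # a date range, both endpoints included
df.loc["2018-03"]                  # partial string: all of March 2018
df.loc["2019"]                     # partial string: all of 2019
df.loc["2019-03-29":"2019-05"]     # from 2019-03-29 through May 2019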

Unnamed: 0,n,rand
2019-03-29,452,0.277300
2019-03-30,453,0.612097
2019-03-31,454,0.738044
2019-04-01,455,0.613258
2019-04-02,456,0.064672
...,...,...
2019-05-27,511,0.058865
2019-05-28,512,0.305488
2019-05-29,513,0.407736
2019-05-30,514,0.758114
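To make the selections above easier to check, here is a small sketch on a three-month frame. The frame `small` and its single column are made up for this illustration; the behaviour shown is pandas' partial string indexing on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical frame: one row per day for the first quarter of 2018.
idx = pd.date_range(start="2018-01-01", end="2018-03-31", freq="D")
small = pd.DataFrame({"n": range(len(idx))}, index=idx)

# A full date on a daily index is an exact match and returns one row as a Series.
one_day = small.loc["2018-02-01"]

# A year-month string selects every row in that month.
one_month = small.loc["2018-02"]

# In a slice both ends are included, and a partial end string
# extends to the last timestamp of that month.
span = small.loc["2018-01-30":"2018-02"]

print(one_day["n"], len(one_month), len(span))
```

Unlike positional slicing, label slices on a DatetimeIndex include the end point, which is why the last selection reaches the final day of February.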


### Resampling
Resampling is the practice of creating new samples with a lower or higher frequency than the original data.

#### Downsampling
When the new sample frequency is lower than the original, several original values fall into each new sample, so we aggregate them (for example with sum, mean, or max).

In [44]:
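# hdisplay comes from a local helpers module (not part of pandas); it displays the given frames with titles.
from helpers import hdisplay

# "Y" groups rows by calendar year, labelled with the year end; pandas 2.2+ prefers the alias "YE".
resample_method = "Y"

hdisplay([
    df.head(10),
    # Each column gets its own aggregation: the largest n and the sum of rand per year.
    df.resample(resample_method).agg({"n": "max", "rand": "sum"}).head(10)],
    ["Original", f"Resampled using '{resample_method}'"]
)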

Unnamed: 0,n,rand
2018-01-01 00:00:00,0,0.284395
2018-01-02 00:00:00,1,0.710964
2018-01-03 00:00:00,2,0.634303
2018-01-04 00:00:00,3,0.627769
2018-01-05 00:00:00,4,0.248145
2018-01-06 00:00:00,5,0.506077
2018-01-07 00:00:00,6,0.315071
2018-01-08 00:00:00,7,0.350274
2018-01-09 00:00:00,8,0.14397
2018-01-10 00:00:00,9,0.981547

Unnamed: 0,n,rand
2018-12-31 00:00:00,364,177.466576
2019-12-31 00:00:00,729,187.091181
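The same idea on a smaller scale may be easier to verify by hand. The sketch below uses made-up data (14 days with a constant `value` column) and downsamples from daily to weekly; the names `daily` and `weekly` are only for this illustration.

```python
import pandas as pd

# Hypothetical data: 14 daily rows starting on Monday 2018-01-01.
idx = pd.date_range(start="2018-01-01", periods=14, freq="D")
daily = pd.DataFrame({"n": range(14), "value": [1.0] * 14}, index=idx)

# "W" creates weekly bins that end on Sunday and are labelled with that Sunday.
# Each column is aggregated with its own function.
weekly = daily.resample("W").agg({"n": "max", "value": "sum"})

print(weekly)
```

Each weekly row summarises seven daily rows, so `value` sums to 7.0 per week and `n` holds the last counter value seen in that week.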


#### Upsampling
When the new sample frequency is higher than the original, new empty rows appear between the original values, and we can fill them with:
- ffill
- bfill
- nearest
- interpolate
- fillna
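As a rough comparison of the fill strategies listed above, the sketch below upsamples a tiny made-up series (one value every third day) to daily frequency. The series `sparse` and the column names are assumptions for this illustration only; `asfreq()` is included to show the unfilled gaps.

```python
import pandas as pd

# Hypothetical data: one reading every third day.
idx = pd.date_range(start="2018-01-01", periods=3, freq="3D")
sparse = pd.Series([0.0, 3.0, 6.0], index=idx)

# Daily bins; the days in between have no value yet.
upsampled = sparse.resample("D")

filled = pd.DataFrame({
    "asfreq": upsampled.asfreq(),            # no filling: gaps stay NaN
    "ffill": upsampled.ffill(),              # repeat the previous known value
    "bfill": upsampled.bfill(),              # use the next known value
    "nearest": upsampled.nearest(),          # use the closest known value in time
    "interpolate": upsampled.interpolate(),  # linear interpolation between known values
})

print(filled)
```

Which strategy fits depends on the data: forward fill suits values that stay valid until the next reading, while interpolation suits quantities that change gradually.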

In [63]:
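# "6H" means one sample every 6 hours; pandas 2.2+ prefers the lowercase alias "6h".
resample_method = "6H"

hdisplay([
    df.head(10),
    # The new 06:00, 12:00 and 18:00 rows are NaN after max(); fill them with each column's overall mean.
    df.resample(resample_method).max().fillna(df[["n", "rand"]].mean()).head(10)],
    ["Original", f"Resampled using '{resample_method}'"]
)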

Unnamed: 0,n,rand
2018-01-01 00:00:00,0,0.284395
2018-01-02 00:00:00,1,0.710964
2018-01-03 00:00:00,2,0.634303
2018-01-04 00:00:00,3,0.627769
2018-01-05 00:00:00,4,0.248145
2018-01-06 00:00:00,5,0.506077
2018-01-07 00:00:00,6,0.315071
2018-01-08 00:00:00,7,0.350274
2018-01-09 00:00:00,8,0.14397
2018-01-10 00:00:00,9,0.981547

Unnamed: 0,n,rand
2018-01-01 00:00:00,0.0,0.284395
2018-01-01 06:00:00,364.5,0.499394
2018-01-01 12:00:00,364.5,0.499394
2018-01-01 18:00:00,364.5,0.499394
2018-01-02 00:00:00,1.0,0.710964
2018-01-02 06:00:00,364.5,0.499394
2018-01-02 12:00:00,364.5,0.499394
2018-01-02 18:00:00,364.5,0.499394
2018-01-03 00:00:00,2.0,0.634303
2018-01-03 06:00:00,364.5,0.499394
