# Time series in Pandas

- toc: true
- badges: true
- comments: true
- categories: [python, pandas]

In [1]:
import pandas as pd
from imports import *

%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

## Number of periods observed

Say I want to know the number of months over which each user in my dataset is observed, counting from their first to their last transaction and including any months in between that have no data.

In [9]:
df = pd.read_parquet(SAMPLEDATA, columns=['user_id', 'transaction_date'])
print(df.shape)
df.head()

(438853, 2)


   user_id transaction_date
0    60777       2014-11-27
1    60777       2014-11-27
2    60777       2014-11-28
3    60777       2014-12-08
4    60777       2014-12-12


In [17]:
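def num_periods(g, freq='M'):
    """Return number of periods between first and last observation, inclusive."""
    first = g.transaction_date.min().to_period(freq)
    last = g.transaction_date.max().to_period(freq)
    # Subtracting two Periods gives an offset whose `n` is the distance in periods.
    return (last - first).n + 1

df.groupby('user_id').apply(num_periods, freq='M').head(3)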

user_id
777     103
1777     28
7777     90
dtype: int64

In [18]:
df.groupby('user_id').apply(num_periods, freq='W').head(3)

user_id
777     447
1777    117
7777    387
dtype: int64
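
Note that `num_periods` measures the span from first to last transaction, so months without any transactions still count. Below is a minimal sketch of the difference, using made-up toy data (the `toy` frame and its values are illustrative assumptions, not part of the sample dataset). It compares the inclusive span with the number of distinct months in which a user actually appears.

```python
import pandas as pd

# Hypothetical toy data: user 1 has no transactions in February 2020.
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2],
    'transaction_date': pd.to_datetime(
        ['2020-01-15', '2020-03-02', '2020-03-20', '2020-05-05']),
})

# First and last observed month per user.
dates = toy.groupby('user_id').transaction_date
first = dates.min().dt.to_period('M')
last = dates.max().dt.to_period('M')

# Inclusive span in months: gaps between first and last month are counted.
span = (last - first).apply(lambda offset: offset.n) + 1

# Distinct months with at least one transaction: gaps are not counted.
months = toy.transaction_date.dt.to_period('M')
distinct = months.groupby(toy.user_id).nunique()

print(span.to_dict())      # span per user
print(distinct.to_dict())  # distinct observed months per user
```

For user 1 the two numbers differ because of the February gap; for a user with no gaps they coincide.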

## Groupby vs resample

In [5]:
df = pd.DataFrame({'data': [1, 2, 3, 4, 5]},
                  index=pd.date_range('2020-01-01', '2020-01-10', freq='2D'))
df

            data
2020-01-01     1
2020-01-03     2
2020-01-05     3
2020-01-07     4
2020-01-09     5


In [6]:
df.resample('D').sum()

            data
2020-01-01     1
2020-01-02     0
2020-01-03     2
2020-01-04     0
2020-01-05     3
2020-01-06     0
2020-01-07     4
2020-01-08     0
2020-01-09     5


In [7]:
df.groupby(level=0).sum()

            data
2020-01-01     1
2020-01-03     2
2020-01-05     3
2020-01-07     4
2020-01-09     5
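
The outputs above point to the difference: `resample` builds a regular grid of bins between the first and last timestamp and aggregates each bin, so days without data show up with a sum of 0, whereas `groupby(level=0)` only aggregates index values that actually exist. As a sketch, assuming the same toy frame as above, passing `pd.Grouper(freq='D')` to `groupby` should give the same result as resampling.

```python
import pandas as pd

# Same toy frame as above: one observation every second day.
df = pd.DataFrame({'data': [1, 2, 3, 4, 5]},
                  index=pd.date_range('2020-01-01', '2020-01-10', freq='2D'))

# Regular daily grid: empty days are included and sum to 0.
resampled = df.resample('D').sum()

# Groups only on index values that are present in the data.
grouped = df.groupby(level=0).sum()

# Time-aware grouping: bins the index into days, like resample.
via_grouper = df.groupby(pd.Grouper(freq='D')).sum()

print(len(resampled), len(grouped), len(via_grouper))
```

So `groupby` on the raw index is the right tool when only observed timestamps matter, while `resample` (or `pd.Grouper`) is the one to reach for when the result should cover every period in the range.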


## Main sources

- [Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/)