# PyRasgo Time Series Date Gaps

This notebook explains how to identify date gaps in time series data with `pyrasgo`.

### Packages

This tutorial uses:
* [pandas](https://pandas.pydata.org/docs/)
* [numpy](https://numpy.org/doc/stable/)
* [PyRasgo](https://app.gitbook.com/@rasgo/s/rasgo-docs/pyrasgo-0.1/dataframe-prep)

In [1]:
import pandas as pd
import numpy as np
import pyrasgo

## Connect to Rasgo

NB: This cell does not run yet because this functionality has not been built.

In [None]:
api_key = pyrasgo.register(email='<your email>')
rasgo = pyrasgo.connect(api_key)

## Creating the data

We will create a dataframe that contains three daily time series for 2010, one for each group (A, B and C).

In [3]:
np.random.seed(1066)
dates = pd.date_range(start='2010-01-01', end='2010-12-31', freq='D')
# Build one daily series per group and stack them.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df = pd.concat([pd.DataFrame({'date': dates,
                              'group': group,
                              'value': np.random.randint(0, 100, size=len(dates))})
                for group in ['A', 'B', 'C']],
               ignore_index=True)
df

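|      | date       | group | value |
|------|------------|-------|-------|
| 0    | 2010-01-01 | A     | 57    |
| 1    | 2010-01-02 | A     | 11    |
| 2    | 2010-01-03 | A     | 83    |
| 3    | 2010-01-04 | A     | 83    |
| 4    | 2010-01-05 | A     | 93    |
| ...  | ...        | ...   | ...   |
| 1090 | 2010-12-27 | C     | 50    |
| 1091 | 2010-12-28 | C     | 59    |
| 1092 | 2010-12-29 | C     | 85    |
| 1093 | 2010-12-30 | C     | 32    |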


Drop some rows randomly to create gaps in the data.

In [4]:
length = df.shape[0]
# Draw 100 random row labels; np.unique sorts them and removes duplicates,
# so at most 100 rows are dropped
droplist = np.unique(np.random.randint(0, length, size=100))
df = df.drop(droplist).reset_index(drop=True)
df

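|      | date       | group | value |
|------|------------|-------|-------|
| 0    | 2010-01-01 | A     | 57    |
| 1    | 2010-01-02 | A     | 11    |
| 2    | 2010-01-03 | A     | 83    |
| 3    | 2010-01-04 | A     | 83    |
| 4    | 2010-01-05 | A     | 93    |
| ...  | ...        | ...   | ...   |
| 992  | 2010-12-27 | C     | 50    |
| 993  | 2010-12-28 | C     | 59    |
| 994  | 2010-12-29 | C     | 85    |
| 995  | 2010-12-30 | C     | 32    |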

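To see which dates are actually missing from each series, one option is to compare each group with a complete daily calendar. The sketch below is a plain-pandas cross-check and not part of `pyrasgo`. It makes three assumptions: the helper `missing_dates` is hypothetical, it runs on a small made-up frame, and each series is expected to be daily between its first and last observed date. Because of that last assumption, rows dropped from the very start or end of a series are not reported.

```python
import pandas as pd

def missing_dates(frame, datetime_column, partition_column, freq="D"):
    """Hypothetical helper: dates absent from each partition's regular calendar.

    Assumes each series should run from its first to its last observed date
    at the given frequency.
    """
    result = {}
    for key, sub in frame.groupby(partition_column):
        observed = pd.DatetimeIndex(sub[datetime_column])
        # Complete calendar between the first and last observed dates
        full = pd.date_range(observed.min(), observed.max(), freq=freq)
        # Dates in the calendar that never appear in this partition
        result[key] = full.difference(observed)
    return result

# Small made-up frame: group A lacks Jan 3-4, group B lacks Jan 2
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-05",
                            "2010-01-01", "2010-01-03", "2010-01-04"]),
    "group": ["A", "A", "A", "B", "B", "B"],
    "value": [57, 11, 83, 12, 40, 7],
})

missing = missing_dates(df, "date", "group")
for group, idx in missing.items():
    print(group, list(idx.strftime("%Y-%m-%d")))
```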

## Identify Date Gaps

### In a single series

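The function `evaluate.timeseries_gaps` identifies date gaps in the data. It returns the rows that border a gap and adds the previous (`TSGAPLastDate`) and next (`TSGAPNextDate`) observed dates for each of them.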

In [5]:
gaps = rasgo.evaluate.timeseries_gaps(df[df.group == 'A'], datetime_column='date', partition_columns=['group'])
gaps

|    | date       | group | value | TSGAPLastDate | TSGAPNextDate |
|----|------------|-------|-------|---------------|---------------|
| 0  | 2010-01-01 | A     | 57    | NaT           | 2010-01-02    |
| 38 | 2010-02-08 | A     | 58    | 2010-02-07    | 2010-02-10    |
| 39 | 2010-02-10 | A     | 97    | 2010-02-08    | 2010-02-11    |
| 43 | 2010-02-14 | A     | 54    | 2010-02-13    | 2010-02-17    |
| 44 | 2010-02-17 | A     | 93    | 2010-02-14    | 2010-02-19    |
| 45 | 2010-02-19 | A     | 88    | 2010-02-17    | 2010-02-20    |
| 56 | 2010-03-02 | A     | 93    | 2010-03-01    | 2010-03-04    |
| 57 | 2010-03-04 | A     | 92    | 2010-03-02    | 2010-03-05    |
| 76 | 2010-03-23 | A     | 21    | 2010-03-22    | 2010-03-25    |
| 77 | 2010-03-25 | A     | 44    | 2010-03-23    | 2010-03-26    |

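The `pyrasgo` call above cannot run yet, so the following is a minimal pandas sketch of what its output columns suggest the check computes. It is an assumption inferred from `TSGAPLastDate` and `TSGAPNextDate`, not `pyrasgo`'s implementation. The helper `timeseries_gaps_sketch` is hypothetical, the series is a small made-up frame, and the expected spacing between observations is assumed to be one day. A row is kept when the step to either of its neighbours is not exactly that spacing.

```python
import pandas as pd

def timeseries_gaps_sketch(frame, datetime_column, freq="1D"):
    """Hypothetical stand-in for a gap check on a single series.

    Returns the rows that sit next to a gap, with the previous and next
    observed dates attached. The first and last rows are always returned
    because one of their neighbours is missing (NaT).
    """
    out = frame.sort_values(datetime_column).copy()
    step = pd.Timedelta(freq)
    out["TSGAPLastDate"] = out[datetime_column].shift(1)
    out["TSGAPNextDate"] = out[datetime_column].shift(-1)
    # Differences involving NaT compare as != step, so the edge rows are kept
    before = (out[datetime_column] - out["TSGAPLastDate"]) != step
    after = (out["TSGAPNextDate"] - out[datetime_column]) != step
    return out[before | after]

# Small made-up series with Jan 4 missing
series = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03",
                            "2010-01-05", "2010-01-06"]),
    "value": [57, 11, 83, 93, 88],
})
gaps = timeseries_gaps_sketch(series, "date")
print(gaps)
```

As in the notebook output, the first row appears with a `NaT` previous date even though it borders no real gap.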

### In multiple time series

Passing the series identifier (**group** in this case) to `evaluate.timeseries_gaps` through the `partition_columns` parameter checks each series for date gaps independently, so the last date of one group is never compared with the first date of the next.

In [6]:
gaps = rasgo.evaluate.timeseries_gaps(df, datetime_column='date', partition_columns=['group'])
gaps

|     | date       | group | value | TSGAPLastDate | TSGAPNextDate |
|-----|------------|-------|-------|---------------|---------------|
| 0   | 2010-01-01 | A     | 57    | NaT           | 2010-01-02    |
| 38  | 2010-02-08 | A     | 58    | 2010-02-07    | 2010-02-10    |
| 39  | 2010-02-10 | A     | 97    | 2010-02-08    | 2010-02-11    |
| 43  | 2010-02-14 | A     | 54    | 2010-02-13    | 2010-02-17    |
| 44  | 2010-02-17 | A     | 93    | 2010-02-14    | 2010-02-19    |
| ... | ...        | ...   | ...   | ...           | ...           |
| 973 | 2010-12-05 | C     | 51    | 2010-12-04    | 2010-12-08    |
| 974 | 2010-12-08 | C     | 1     | 2010-12-05    | 2010-12-09    |
| 984 | 2010-12-18 | C     | 71    | 2010-12-17    | 2010-12-20    |
| 985 | 2010-12-20 | C     | 39    | 2010-12-18    | 2010-12-21    |

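For several series, the same single-series idea can be applied by shifting within each partition. The sketch below works under the same assumptions: the helper `partitioned_gaps_sketch` is hypothetical, the data is a small made-up frame, and the spacing is assumed to be one day. `groupby(...).shift` keeps the neighbour lookups inside each group, so the first row of every group gets a `NaT` previous date rather than the last date of the preceding group.

```python
import pandas as pd

def partitioned_gaps_sketch(frame, datetime_column, partition_columns, freq="1D"):
    """Hypothetical stand-in for a partitioned gap check.

    Each partition is treated as its own series: neighbours are looked up
    only within the same partition.
    """
    out = frame.sort_values(partition_columns + [datetime_column]).copy()
    grouped = out.groupby(partition_columns)[datetime_column]
    # Shift within each group so series never borrow neighbours from each other
    out["TSGAPLastDate"] = grouped.shift(1)
    out["TSGAPNextDate"] = grouped.shift(-1)
    step = pd.Timedelta(freq)
    before = (out[datetime_column] - out["TSGAPLastDate"]) != step
    after = (out["TSGAPNextDate"] - out[datetime_column]) != step
    return out[before | after]

# Small made-up frame: A has no internal gap, B is missing Jan 3
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03",
                            "2010-01-01", "2010-01-02", "2010-01-04",
                            "2010-01-05", "2010-01-06"]),
    "group": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "value": [57, 11, 83, 12, 40, 7, 64, 21],
})
gaps = partitioned_gaps_sketch(df, "date", ["group"])
print(gaps)
```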