The full documentation site is here, and the GitHub page is here.
Here is a short description of some of the main functions (more details below and in the documentation):
setup_tseries
: cleans up dates and sets them as the indexsetup_panel
: cleans up dates and panel id’s and sets them as the index (panel id, period date)lag
: robust lagging that accounts for panel structure, unsorted or duplicate dates, or gaps in the time-series
pip install pandasmore
First, we set up an example dataset to showcase the functions in this module.
import pandas as pd
import numpy as np
import pandasmore as pdm
raw = pd.DataFrame(np.random.rand(15,2),
columns=list('AB'),
index=pd.MultiIndex.from_product(
[[1,2, np.nan],[np.nan,'2010-01','2010-02','2010-02','2010-04']],
names = ['firm_id','date'])
).reset_index()
raw
firm_id | date | A | B | |
---|---|---|---|---|
0 | 1.0 | NaN | 0.249370 | 0.926335 |
1 | 1.0 | 2010-01 | 0.282501 | 0.513859 |
2 | 1.0 | 2010-02 | 0.804278 | 0.307171 |
3 | 1.0 | 2010-02 | 0.828895 | 0.746789 |
4 | 1.0 | 2010-04 | 0.569099 | 0.331814 |
5 | 2.0 | NaN | 0.533977 | 0.823457 |
6 | 2.0 | 2010-01 | 0.207558 | 0.401378 |
7 | 2.0 | 2010-02 | 0.086001 | 0.959371 |
8 | 2.0 | 2010-02 | 0.054230 | 0.993980 |
9 | 2.0 | 2010-04 | 0.062525 | 0.200272 |
10 | NaN | NaN | 0.091012 | 0.635409 |
11 | NaN | 2010-01 | 0.866369 | 0.972394 |
12 | NaN | 2010-02 | 0.432087 | 0.837597 |
13 | NaN | 2010-02 | 0.878219 | 0.148009 |
14 | NaN | 2010-04 | 0.820386 | 0.834821 |
df = pdm.setup_tseries(raw.query('firm_id==1'),
time_var='date', time_var_format="%Y-%m",
freq='M')
df
date | dtdate | firm_id | A | B | |
---|---|---|---|---|---|
Mdate | |||||
2010-01 | 2010-01 | 2010-01-01 | 1.0 | 0.282501 | 0.513859 |
2010-02 | 2010-02 | 2010-02-01 | 1.0 | 0.828895 | 0.746789 |
2010-04 | 2010-04 | 2010-04-01 | 1.0 | 0.569099 | 0.331814 |
df = pdm.setup_panel(raw,
panel_ids='firm_id',
time_var='date', time_var_format="%Y-%m",
freq='M')
df
date | dtdate | A | B | ||
---|---|---|---|---|---|
firm_id | Mdate | ||||
1 | 2010-01 | 2010-01 | 2010-01-01 | 0.282501 | 0.513859 |
2010-02 | 2010-02 | 2010-02-01 | 0.828895 | 0.746789 | |
2010-04 | 2010-04 | 2010-04-01 | 0.569099 | 0.331814 | |
2 | 2010-01 | 2010-01 | 2010-01-01 | 0.207558 | 0.401378 |
2010-02 | 2010-02 | 2010-02-01 | 0.054230 | 0.993980 | |
2010-04 | 2010-04 | 2010-04-01 | 0.062525 | 0.200272 |
pdm.lag(df['A'])
firm_id Mdate
1 2010-01 NaN
2010-02 0.282501
2010-04 NaN
2 2010-01 NaN
2010-02 0.207558
2010-04 NaN
Name: A_lag1, dtype: float64