Skip to content

ionmihai/pandasmore

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pandasmore

The full documentation site is here, and the GitHub page is here.

Here is a short description of some of the main functions (more details below and in the documentation):

  • setup_tseries: cleans up dates and sets them as the index
  • setup_panel: cleans up dates and panel id’s and sets them as the index (panel id, period date)
  • lag: robust lagging that accounts for panel structure, unsorted or duplicate dates, or gaps in the time-series

Install

pip install pandasmore

How to use

First, we set up an example dataset to showcase the functions in this module.

import pandas as pd
import numpy as np
import pandasmore as pdm
raw = pd.DataFrame(np.random.rand(15,2), 
                    columns=list('AB'), 
                    index=pd.MultiIndex.from_product(
                        [[1,2, np.nan],[np.nan,'2010-01','2010-02','2010-02','2010-04']],
                        names = ['firm_id','date'])
                      ).reset_index()
raw
firm_id date A B
0 1.0 NaN 0.249370 0.926335
1 1.0 2010-01 0.282501 0.513859
2 1.0 2010-02 0.804278 0.307171
3 1.0 2010-02 0.828895 0.746789
4 1.0 2010-04 0.569099 0.331814
5 2.0 NaN 0.533977 0.823457
6 2.0 2010-01 0.207558 0.401378
7 2.0 2010-02 0.086001 0.959371
8 2.0 2010-02 0.054230 0.993980
9 2.0 2010-04 0.062525 0.200272
10 NaN NaN 0.091012 0.635409
11 NaN 2010-01 0.866369 0.972394
12 NaN 2010-02 0.432087 0.837597
13 NaN 2010-02 0.878219 0.148009
14 NaN 2010-04 0.820386 0.834821
df = pdm.setup_tseries(raw.query('firm_id==1'),
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df
date dtdate firm_id A B
Mdate
2010-01 2010-01 2010-01-01 1.0 0.282501 0.513859
2010-02 2010-02 2010-02-01 1.0 0.828895 0.746789
2010-04 2010-04 2010-04-01 1.0 0.569099 0.331814
df = pdm.setup_panel(raw,
                        panel_ids='firm_id',
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df
date dtdate A B
firm_id Mdate
1 2010-01 2010-01 2010-01-01 0.282501 0.513859
2010-02 2010-02 2010-02-01 0.828895 0.746789
2010-04 2010-04 2010-04-01 0.569099 0.331814
2 2010-01 2010-01 2010-01-01 0.207558 0.401378
2010-02 2010-02 2010-02-01 0.054230 0.993980
2010-04 2010-04 2010-04-01 0.062525 0.200272
pdm.lag(df['A'])
firm_id  Mdate  
1        2010-01         NaN
         2010-02    0.282501
         2010-04         NaN
2        2010-01         NaN
         2010-02    0.207558
         2010-04         NaN
Name: A_lag1, dtype: float64

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published