A high-performance Python library for calculating economic index numbers using Polars. Designed for statisticians and economists working with price and quantity indices.
- High Performance: Built on Polars for efficient data processing of large datasets
- Comprehensive Index Methods: Support for bilateral and multilateral price/quantity indices
- Data Preparation Tools: Built-in utilities for data standardization and temporal aggregation
- Panel Data Handling: Robust methods for dealing with unbalanced panels through removal or imputation
- Extension Methods: Support for index splicing and rolling window calculations
- Type Safety: Full type annotations for better IDE support and code reliability

Install with pip:

pip install pyindexnum

or with uv:

uv add pyindexnum

or from source:

git clone https://github.com/paluigi/PyIndexNum.git
cd PyIndexNum
uv sync

Here's the typical workflow for calculating economic indices:
import polars as pl
import pyindexnum as pin
# Load your price data
df = pl.read_csv("price_data.csv")
# 1. Standardize column names
df_std = pin.standardize_columns(df, date_col="date", price_col="price", id_col="product_id", quantity_col="quantity")
# 2. Aggregate to desired time frequency
df_agg = pin.aggregate_time(df_std, freq="1mo", agg_type="arithmetic")
# 3. Handle unbalanced panels (optional)
df_balanced = pin.remove_unbalanced(df_agg)
# or
df_imputed = pin.carry_forward_imputation(df_agg, ["aggregated_price", "aggregated_quantity"])
# 4. Calculate bilateral indices (two periods)
laspeyres_idx = pin.laspeyres(df_balanced)
fisher_idx = pin.fisher(df_balanced)
# 5. Calculate multilateral indices (multiple periods)
geks_fisher_idx = pin.geks_fisher(df_agg)
# 6. Apply extension methods (optional)
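# geks_fisher_idx1 and geks_fisher_idx2 are two multilateral index results computed beforehand (not shown here)
extended_idx = pin.movement_splice(geks_fisher_idx1, geks_fisher_idx2)

Bilateral indices:

| Index | Formula | Use Case |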
|---|---|---|
| Jevons | Geometric mean of price relatives | Unweighted geometric average |
| Carli | Arithmetic mean of price relatives | Unweighted arithmetic average |
| Dutot | Ratio of arithmetic means of prices | Simple price average comparison |
| Laspeyres | Weighted by base period quantities | Fixed basket approach |
| Paasche | Weighted by current period quantities | Current basket approach |
| Fisher | Geometric mean of Laspeyres and Paasche | Ideal index (satisfies the time reversal and factor reversal tests) |
| Törnqvist | Weighted geometric mean with average expenditure shares | Symmetric treatment |
| Walsh | Fixed basket of geometric means of base- and current-period quantities | Alternative symmetric approach |
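
To make the formulas in the table concrete, here is a minimal sketch in plain Python that applies four of them to the sample data shown under the data requirements. The helper functions are illustrative only and are not PyIndexNum's API; the library's own functions operate on Polars DataFrames.

```python
from math import exp, log, sqrt

# Sample data from this README: two products, two periods.
p0 = {"A": 100.0, "B": 200.0}   # base-period prices
q0 = {"A": 10.0, "B": 5.0}      # base-period quantities
p1 = {"A": 105.0, "B": 210.0}   # current-period prices
q1 = {"A": 12.0, "B": 4.5}      # current-period quantities


def jevons(p0, p1):
    """Unweighted geometric mean of price relatives."""
    logs = [log(p1[k] / p0[k]) for k in p0]
    return exp(sum(logs) / len(logs))


def laspeyres(p0, q0, p1):
    """Cost of the base-period basket at current vs. base prices."""
    return sum(p1[k] * q0[k] for k in p0) / sum(p0[k] * q0[k] for k in p0)


def paasche(p0, p1, q1):
    """Cost of the current-period basket at current vs. base prices."""
    return sum(p1[k] * q1[k] for k in p0) / sum(p0[k] * q1[k] for k in p0)


def fisher(p0, q0, p1, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, q0, p1) * paasche(p0, p1, q1))


results = {
    "jevons": jevons(p0, p1),
    "laspeyres": laspeyres(p0, q0, p1),
    "paasche": paasche(p0, p1, q1),
    "fisher": fisher(p0, q0, p1, q1),
}
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Both prices in the sample rise by exactly 5%, so every formula returns 1.05 here. The formulas only diverge when price relatives differ across products.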

Multilateral indices:

| Index | Method | Description |
|---|---|---|
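| GEKS-Fisher | Geometric mean of bilateral Fisher indices over all link periods | Most widely used multilateral method |
| GEKS-Törnqvist | Geometric mean of bilateral Törnqvist indices over all link periods | Same approach with Törnqvist as the bilateral formula |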
| Geary-Khamis | System of equations | Global approach |
| Time Product Dummy | Regression-based | WLS with expenditure shares or unweighted OLS |
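
As an illustration of the GEKS idea, the sketch below computes a GEKS-Fisher index in plain Python: the index between a base period and period t is the geometric mean, over every possible link period, of the Fisher index from the base to the link times the Fisher index from the link to t. The data and function names are hypothetical, the panel is assumed to be balanced, and this is not how PyIndexNum is implemented internally.

```python
from math import prod, sqrt

# Hypothetical balanced panel: one dict of prices and one of quantities per period.
prices = [
    {"A": 100.0, "B": 200.0},
    {"A": 105.0, "B": 190.0},
    {"A": 112.0, "B": 205.0},
]
quantities = [
    {"A": 10.0, "B": 5.0},
    {"A": 9.0, "B": 6.0},
    {"A": 8.5, "B": 5.5},
]


def fisher(a, b):
    """Bilateral Fisher price index from period a to period b."""
    pa, qa, pb, qb = prices[a], quantities[a], prices[b], quantities[b]
    laspeyres = sum(pb[k] * qa[k] for k in pa) / sum(pa[k] * qa[k] for k in pa)
    paasche = sum(pb[k] * qb[k] for k in pa) / sum(pa[k] * qb[k] for k in pa)
    return sqrt(laspeyres * paasche)


def geks_fisher(base=0):
    """GEKS-Fisher index levels for every period relative to `base`."""
    n = len(prices)
    return [
        prod(fisher(base, link) * fisher(link, t) for link in range(n)) ** (1 / n)
        for t in range(n)
    ]


index = geks_fisher(base=0)
print([round(v, 4) for v in index])
```

Unlike a chained bilateral index, the result is transitive: the index from period 0 to 2 equals the index from 0 to 1 multiplied by the index from 1 to 2.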

Extension methods:

- Movement Splice: Chain indices using movement ratios
- Window Splice: Moving window chaining
- Half Splice: Half-year overlapping windows
- Mean Splice: Average of overlapping windows
- Fixed Base Rolling Window: Rolling window with fixed base
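
The movement splice can be sketched in a few lines of plain Python. The version below follows one common definition: the already published series is extended by the latest period-on-period movement of the index computed on the new window. The function name and list-based signature are hypothetical and differ from `pin.movement_splice`, which works on the library's own index outputs.

```python
def movement_splice_sketch(published, new_window):
    """Extend `published` by one period using the last movement in `new_window`.

    Both arguments are lists of index levels. The final element of
    `new_window` is the new period to be added to the published series.
    """
    if len(new_window) < 2:
        raise ValueError("new_window needs at least two periods")
    # Period-on-period change between the last two periods of the new window.
    movement = new_window[-1] / new_window[-2]
    # Return a new list so the published series itself is never revised.
    return published + [published[-1] * movement]


published = [1.00, 1.02, 1.05]       # index levels from the previous window
new_window = [1.00, 1.03, 1.0609]    # index levels from the window shifted forward
extended = movement_splice_sketch(published, new_window)
print([round(v, 4) for v in extended])
```

Previously published values are left unchanged; only the new period is appended, which is the reason for splicing rather than recomputing the whole series.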

Your data should contain:
- Date column: Date or datetime values
- Price column: Numeric price observations
- Product ID column: Unique identifier for each product/variety
- Quantity column: Numeric quantities (required for weighted indices)

Example data structure:

┌────────────┬────────────┬───────┬──────────┐
│ date       ┆ product_id ┆ price ┆ quantity │
│ ---        ┆ ---        ┆ ---   ┆ ---      │
│ date       ┆ str        ┆ f64   ┆ f64      │
╞════════════╪════════════╪═══════╪══════════╡
│ 2023-01-01 ┆ A          ┆ 100.0 ┆ 10.0     │
│ 2023-01-01 ┆ B          ┆ 200.0 ┆ 5.0      │
│ 2023-02-01 ┆ A          ┆ 105.0 ┆ 12.0     │
│ 2023-02-01 ┆ B          ┆ 210.0 ┆ 4.5      │
└────────────┴────────────┴───────┴──────────┘
# Standardize column names and types
df_std = pin.standardize_columns(df, date_col="date", price_col="price", id_col="id")
# Aggregate time series data
df_agg = pin.aggregate_time(df_std, freq="1mo", agg_type="weighted_arithmetic")
# Handle unbalanced panels
df_balanced = pin.remove_unbalanced(df_agg)
df_imputed = pin.carry_forward_imputation(df_agg, ["price", "quantity"])

# Bilateral indices
jevons = pin.jevons(df)
laspeyres = pin.laspeyres(df)
fisher = pin.fisher(df)
# Multilateral indices
geks = pin.geks_fisher(df)
gk = pin.geary_khamis(df)

# Splicing methods
movement_spliced = pin.movement_splice(multilateral_index1, multilateral_index2)
window_spliced = pin.window_splice(multilateral_index1, multilateral_index2)

Full documentation is available at https://pyindexnum.readthedocs.io/
PyIndexNum is an open-source project and welcomes contributions! See our contributing guide for details.
# Clone and setup
git clone https://github.com/paluigi/PyIndexNum.git
cd PyIndexNum
uv sync --dev
# Run tests
uv run pytest
# Build documentation
cd docs && make html

Areas for contribution:

- New index methods and formulations
- Performance optimizations
- Additional data validation
- Enhanced documentation and examples
- Bug fixes and improvements

If you use PyIndexNum in your research, please cite:

@software{pyindexnum,
  title = {PyIndexNum: A Python Library for Economic Index Numbers},
  author = {Palumbo, Luigi and Yu, Mengting},
  url = {https://github.com/paluigi/PyIndexNum},
  version = {0.1.2},
}

PyIndexNum is licensed under the MIT License. See LICENSE for details.

Acknowledgments:

- Polars: The high-performance DataFrame library that powers PyIndexNum

Built with ❤️ for the economic statistics community