fmuny/ORFpy

orf: ordered random forests

Welcome to the repository of the Python package orf for random forest estimation of ordered choice models. For the R version of the orf package (Lechner and Okasa, 2020), please refer to the CRAN repository.

Introduction

The Python package orf is an implementation of the Ordered Forest estimator as developed in Lechner and Okasa (2019). The Ordered Forest flexibly estimates the conditional probabilities of models with ordered categorical outcomes (so-called ordered choice models). Unlike common machine learning algorithms, the Ordered Forest also provides functions for estimating marginal effects, so its output is similar to that of standard econometric models for ordered choice. The core Ordered Forest algorithm relies on the fast forest implementation from the scikit-learn library (Pedregosa et al., 2011).
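As a rough illustration of how class probabilities for an ordered outcome can be recovered, the sketch below follows the procedure described in Lechner and Okasa (2019) as we understand it: cumulative probabilities P(Y <= m | x) are estimated first (in the paper, by regression forests on binary indicators), and the class probabilities are then obtained by differencing them, setting negative differences to zero, and renormalizing. The function name `class_probabilities` and the input numbers are hypothetical, and this is not the package's internal code.

```python
def class_probabilities(cumulative):
    """Convert estimated cumulative probabilities into class probabilities.

    `cumulative` holds estimates of P(Y <= m | x) for m = 1, ..., M-1,
    so the result has M entries (one per ordered class).
    """
    # P(Y <= 0) = 0 and P(Y <= M) = 1 by definition.
    bounds = [0.0] + list(cumulative) + [1.0]
    # Differences of adjacent cumulative probabilities.
    raw = [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]
    # Separately estimated cumulative probabilities need not be monotone,
    # so negative differences are set to zero ...
    clipped = [max(p, 0.0) for p in raw]
    # ... and the result is rescaled to sum to one.
    total = sum(clipped)
    return [p / total for p in clipped]


# Hypothetical cumulative estimates for a three-class outcome.
probs = class_probabilities([0.2, 0.5])
print(probs)
```

With monotone inputs the clipping step changes nothing; it only matters when the separately estimated cumulative probabilities cross.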

Installation

To install the latest version released on PyPI, run

pip install orf

in the terminal. orf requires the following dependencies:

  • numpy (>=1.21.0)
  • pandas (>=1.3.5)
  • scipy (>=1.7.2)
  • scikit-learn (>=1.0.2)
  • joblib (>=1.0.1)
  • plotnine (>=0.8.0)

In case of an installation failure due to dependency issues or conflicts with Anaconda distribution, consider installing the package in a virtual environment.

The implementation relies on Python 3 and is compatible with versions 3.8, 3.9 and 3.10.
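If an installation fails, it can help to check which requirements the current environment already meets. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `check_environment` are hypothetical and not part of orf, and the version parsing is deliberately simple (it only looks at the leading numeric parts of a version string).

```python
import sys
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": "1.21.0",
    "pandas": "1.3.5",
    "scipy": "1.7.2",
    "scikit-learn": "1.0.2",
    "joblib": "1.0.1",
    "plotnine": "0.8.0",
}
# Python versions the README lists as compatible.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10)}


def version_tuple(version):
    """Turn '1.21.0' into (1, 21, 0), padding with zeros to three parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first piece without leading digits
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Return a dict mapping each requirement to a short status string."""
    report = {}
    current = tuple(sys.version_info[:2])
    report["python"] = "ok" if current in SUPPORTED_PYTHON else "not listed as compatible"
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for requirement, status in check_environment().items():
    print(f"{requirement}: {status}")
```

Running this inside a fresh virtual environment before and after `pip install orf` shows which packages pip had to add or upgrade.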

Examples

The example below demonstrates the basic functionality of the Ordered Forest.

## Ordered Forest
import orf

# load example data
features, outcome = orf.make_ordered_regression()

# initiate Ordered Forest with custom settings
oforest = orf.OrderedForest(n_estimators=1000, min_samples_leaf=5,
                            max_features=2, replace=False, sample_fraction=0.5,
                            honesty=True, honesty_fraction=0.5, inference=False,
                            n_jobs=-1, random_state=123)

# fit Ordered Forest
oforest.fit(X=features, y=outcome)

# show summary of the Ordered Forest estimation
oforest.summary()

# evaluate the prediction performance
oforest.performance()

# plot the estimated probability distributions
oforest.plot()

# predict ordered probabilities in-sample
oforest.predict(X=None, prob=True)

# evaluate marginal effects for the Ordered Forest
oforest.margins(X=None, X_cat=None, X_eval=None, eval_point='mean', window=0.1)

For more detailed examples see the package description.

References

The orf logo was created with the R package hexSticker using the Tourney font designed by Tyler Finck, ETC.