Skip to content

elliottash/tax-classification

Repository files navigation

U.S. Tax Law Classifier

This repository accompanies the paper

  • Luyang Han, Malka Guillot, and Elliott Ash, "Machine Extraction of Tax Laws from Legislative Texts", Workshop on Natural Legal Language Processing (2021).

The repo contains scripts and trained models for identifying tax-related documents in legislative text, and further predicting the tax source of those documents (e.g. income tax, sales tax, property tax).

We have replication code for training Logistic Regression and Random Forest models on the annotated statutes corpus. The script for training the classifier for "is tax related" is in the tax_source folder. The script for training the classifier for tax source is in the tax_source folder.

The trained model can then be applied to other corpora. For example, in the paper, we apply it to historical statutes. The trained random forest models (with best performance from the paper) are available here:

The sample scripts for using the trained models are included as prediction_sample_statutes.py and prediction_sample_sentence_source.py.

About

Classification of tax related and source on state session laws

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages