The project uses historic United States Supreme Court cases to train natural language processing models to predict case rulings.
- The number of unique roles within the advocates' file is too numerous to be helpful, so we merged them into 5 categories. While this merger may remove some variability and nuance in the file, we believe it will make it easier to derive meaningful conclusions.
- The groupings for the roles are as follows: `inferred`, `for respondent`, `for petitioner`, and `for amicus curiae`.
- The code for that grouping can be found in the `clean_roles` function in the `descriptives.py` file.
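As an illustration only, a grouping of this kind might look like the sketch below. The actual mapping lives in the `clean_roles` function in `descriptives.py`; the matching rules and raw role strings here are assumptions, not the project's real logic.

```python
# Hypothetical sketch of merging raw advocate roles into the categories
# listed above; the real clean_roles function in descriptives.py may use
# different rules and different raw role strings.

def group_role(raw_role: str) -> str:
    """Map a raw advocate role string to one of the merged categories."""
    role = raw_role.lower()
    if "petitioner" in role or "appellant" in role:
        return "for petitioner"
    if "respondent" in role or "appellee" in role:
        return "for respondent"
    if "amicus" in role:
        return "for amicus curiae"
    # Roles matching no pattern are left to be inferred later.
    return "inferred"

print(group_role("Argued the cause for the petitioner"))  # for petitioner
```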
- The years included within this data set are 2014 to 2019.
- The datasets included within the previously mentioned year range are ones where the winning side was either 0 or 1 (no missing values, etc.).
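The inclusion criteria above can be illustrated with a small sketch; the record structure and field names (`year`, `win_side`) are assumptions for illustration, not the project's real schema.

```python
# Hypothetical illustration of the inclusion criteria described above:
# keep cases from 2014-2019 whose winning-side label is 0 or 1.
# The field names (year, win_side) are assumptions, not the real schema.

cases = [
    {"year": 2013, "win_side": 1},     # too early: dropped
    {"year": 2015, "win_side": 0},     # kept
    {"year": 2016, "win_side": None},  # missing label: dropped
    {"year": 2019, "win_side": 1},     # kept
]

kept = [
    case for case in cases
    if 2014 <= case["year"] <= 2019 and case["win_side"] in (0, 1)
]
print(len(kept))  # 2
```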
- All reports and presentations are contained within the `reports` folder in the base directory.
- The Jupyter notebooks cannot be run from the `reports` directory as they use paths that require being in the base directory. If you want to run them yourself, move them to the base directory.
- Python version: `^3.11`
- Poetry
- Any modules should be added via the `poetry add [module]` command.
  - Example: `poetry add pytest`
There are two ways to run this application: run all of its components at once, or run each component individually. Instructions for both methods are below.
- After you have installed Poetry, run this command from the base repository directory: `poetry shell`
- Run the `poetry install` command to install the package dependencies within the project.
- Run the `make run` command to run the entirety of the application from end to end.
- After you have installed Poetry, run this command from the base repository directory: `poetry shell`
- Run the `poetry install` command to install the package dependencies within the project.
- Run the `make get-data` command to get the data from Convokit.
- Run the `make prepare-data` command to process the Convokit data, produce an Excel sheet containing descriptive statistics of the cleaned data, and prepare the cleaned data to be processed by the machine learning models.
- Run the `make run-all-models` command to run the Logistic Regression, Random Forest, and XGBoost models on the output of the `prepare-data` command.
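For orientation, the model-fitting step reduces to something like the sketch below, shown on toy data with scikit-learn. The real `run-all-models` target also runs XGBoost and uses the project's own features and hyperparameters, none of which are reproduced here.

```python
# Minimal sketch of fitting two of the classifiers named above on toy
# data; this is not the project's actual pipeline. XGBoost is elided to
# keep the example dependency-light. Requires scikit-learn.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Toy feature matrix and winning-side labels (0 or 1) for illustration.
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]

for model in (
    LogisticRegression(),
    RandomForestClassifier(n_estimators=10, random_state=0),
):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))
```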
- `make format`: Runs Black on the codebase.
- `make lint`: Runs ruff on the codebase.
- `make test`: Runs test cases in the `test` directory.
- `make test-and-fail`: Runs test cases in the `test` directory with the `-x` flag, which causes a build to fail if a test fails.
- `api`: Houses all functions used to access external APIs.
- `processing`: Houses all functions used to clean and prepare data for statistical analysis and machine learning models.
- `summary_analysis`: Houses all functions used to create our cursory statistical analysis.
- `models`: Houses all functions used to create and run our machine learning models.
- `util`: Houses all functions and constants utilized in multiple packages to prevent code duplication throughout the `supreme-court-predictions` application.
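Put together, the packages described above correspond to a directory tree along these lines (the top-level package name is an assumption based on the application name):

```
supreme_court_predictions/
├── api/
├── processing/
├── summary_analysis/
├── models/
└── util/
```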