# Draft Blog Post Again Again Again Again Again
> Applying deep learning to predict Justice Brennan's voting record

- toc: true 
- badges: true
- comments: true
- author: Charles Dobson
- categories: [artificial intelligence, machine learning, deep learning, litigation, analytics]

# Introduction

In Prof. Wolfgang Alschner's fantastic course, [Data Science for Lawyers](https://www.datascienceforlawyers.org/), Lesson 8 uses machine learning to predict Justice Brennan's voting record. One of the key questions is: if a machine learning model "studies" information about thousands of Justice Brennan's cases, how well can it predict the way he voted on other cases?

The lesson reviews several different machine learning algorithms.{% fn 1 %} This exercise inspired me to apply a different machine learning approach, known as deep learning, to the same dataset. In this post, I detail the results of this effort.

# Justice Brennan

Per [Wikipedia](https://en.wikipedia.org/wiki/William_J._Brennan_Jr.), Justice Brennan (April 25, 1906 – July 24, 1997) was an American lawyer and jurist who served as an Associate Justice of the Supreme Court of the United States from 1956 to 1990. He was the seventh-longest-serving justice in Supreme Court history, and was known as a leader of the Court's liberal wing.

The length of Justice Brennan's tenure is key for present purposes. Since he sat on SCOTUS for so long, he has a lengthy voting record. This is important for training a machine learning model effectively (the more data, the better). 

# The Dataset

The dataset is available online at the course website. The data is from [The Supreme Court Database](http://scdb.wustl.edu/index.php). In this database, court decisions are coded for a variety of variables relating to the identification, chronology, background, substance, and outcome of each case. 

The dataset is a simple CSV file. [Click to view it in a "raw" format](https://github.com/litkm/WJBrennan-Voting/blob/main/WJBrennan_voting.csv).

Below, the first five entries of the dataset are printed out.

![](my_icons/BrennanDataset.png)

Decoded, the first entry indicates:
* The case was heard in 1956 (term)
* The petitioner (appellant) was a "bank, savings and loan, credit union, investment company" (petitioner)
* The respondent was an "agent, fiduciary, trustee, or executor" (respondent)
* The court assumed jurisdiction on the basis of a writ of certiorari (jurisdiction)
* The case originated from the Pennsylvania Western U.S. District Court (caseOrigin)
* The U.S. Court of Appeals, Third Circuit, was the source of the decision SCOTUS reviewed (caseSource)
* SCOTUS granted the writ of certiorari in order "to resolve important or significant question" (certReason)
* The subject matter of the controversy related to "cruel and unusual punishment, death penalty (cf. extra legal jury influence, death penalty)" (issue)
* The preceding variable was categorized as relating to federalism (issueArea)
* Lastly, Justice Brennan voted with the majority (vote)

Below, additional information from the dataset is set out.

![](my_icons/BrennanDataset2.png)

For present purposes, the most important information shown here is that the dataset contains 4746 entries. In other words, it records 4746 cases, and for each one whether Justice Brennan voted with the majority or the minority of the SCOTUS panel.
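
For readers who want to reproduce this kind of summary themselves, here is a minimal sketch using only Python's standard library. It is not the code behind the screenshots above. The column names are the ones described in this post, but the three rows are made-up placeholders, and the assumption that the `vote` column holds the strings `majority` and `minority` may not match the real file's coding.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for WJBrennan_voting.csv. The column names come from
# the post; the three rows are made-up placeholders, NOT real records, and the
# "majority"/"minority" labels are an assumption about how votes are coded.
SAMPLE_CSV = """term,petitioner,respondent,jurisdiction,caseOrigin,caseSource,certReason,issue,issueArea,vote
1956,1,2,1,3,4,5,6,7,majority
1957,2,3,1,4,5,6,7,8,minority
1958,3,4,1,5,6,7,8,9,majority
"""


def summarize(csv_file):
    """Return (number of cases, tally of the values in the vote column)."""
    rows = list(csv.DictReader(csv_file))
    votes = Counter(row["vote"] for row in rows)
    return len(rows), votes


n_cases, vote_counts = summarize(io.StringIO(SAMPLE_CSV))
print(n_cases, dict(vote_counts))
```

To run the same summary on a downloaded copy of the dataset, one would pass an open file instead of the in-memory sample, e.g. `summarize(open("WJBrennan_voting.csv", newline=""))`.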

# The Deep Learning Model

Machine learning is a subfield of computer science. The basic objective is to program computers to learn so that they can perform tasks for which they were not explicitly programmed.{% fn 2 %}

There are many approaches to machine learning, of which deep learning is only one. This approach is based on artificial neural networks, which are a kind of algorithm loosely modelled on neurons in the human brain.{% fn 3 %}

The deep learning model I used for this project is based on a model from the [Codecademy](https://www.codecademy.com) course, *Build Deep Learning Models with TensorFlow*. This model is coded in Python, and uses Keras, the high-level deep learning API that ships with Google's TensorFlow framework.

My hope was to assemble a model that takes the Brennan dataset as an input, and outputs accurate predictions regarding his voting.

Critically, the model is not pre-programmed with any particular patterns, rules, or guidelines specific to Justice Brennan and the way he voted on SCOTUS. Rather, the model applies the deep learning algorithm to process the dataset and develop, independently, its own "understanding" of his voting history. Based on this understanding, it makes predictions.

I am glossing over a lot of details, but, in simple terms, this is how the model in this project works:
* It randomly apportions the dataset into two subsets: one for training, and another for testing (70% for training, 30% for testing).
* Then it looks at each case in the training dataset, one-by-one.
* With every case, it considers each of the variables (petitioner, respondent, etc.), and then predicts whether Justice Brennan voted with the majority or the minority in that particular case.
* The model then checks the final column of the dataset: was the prediction correct or not?
* It then recalibrates its weighting of each variable based on whether it made a correct or incorrect prediction. When a model works well, this recalibration results in incrementally better (more accurate) predictions.
* Once the model has reviewed each of the cases in the training dataset, it then tests itself, case-by-case again, against the second (testing) dataset. This is an important check against the model merely memorizing the training dataset, as opposed to calibrating its predictive process so that it can generalize and make accurate predictions about new cases (the test dataset).
* After the model cycles through both the training and the testing datasets, it repeats this process over again. Models will do this cycle many times (one hundred in this instance, but it can be many more). Ideally, the predictive accuracy of the model increases with each cycle until it plateaus at the limit of its predictive potential.
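
The steps above can be sketched in a few dozen lines of plain Python. To be clear, this is not the Keras model used in this project: it swaps in a single artificial neuron (a logistic model) so that the whole loop is visible without any framework. The data is synthetic and hypothetical (two numeric features and a made-up rule for the label), and the function names, learning rate, and random seeds are my own illustrative choices. A real run on the Brennan dataset would also need to encode the categorical variables first.

```python
import math
import random


def split_dataset(rows, train_fraction=0.7, seed=0):
    """Randomly apportion rows into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict(weights, bias, features):
    """Estimated probability (0 to 1) that the vote is with the majority."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, rows):
    """Fraction of rows where the thresholded prediction matches the label."""
    correct = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in rows)
    return correct / len(rows)


def train(train_rows, test_rows, epochs=100, learning_rate=0.1):
    """Run the predict / check / recalibrate cycle for a number of epochs."""
    weights = [0.0] * len(train_rows[0][0])
    bias = 0.0
    history = []  # test accuracy recorded after every cycle
    for _ in range(epochs):
        for features, label in train_rows:
            # Predict, compare with the recorded vote, then nudge each weight
            # in the direction that would have reduced the error.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
        # Check against cases the model never trained on.
        history.append(accuracy(weights, bias, test_rows))
    return weights, bias, history


# Synthetic, hypothetical data: label 1 ("majority") when the two features
# sum to more than 1. This rule is invented purely for illustration.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = [rng.random(), rng.random()]
    rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))

train_rows, test_rows = split_dataset(rows)
weights, bias, history = train(train_rows, test_rows)
print(f"test accuracy after first cycle: {history[0]:.2f}, "
      f"after last cycle: {history[-1]:.2f}")
```

A deep learning model stacks many such neurons in layers and lets the framework handle the recalibration step, but the overall rhythm of split, predict, check, adjust, and repeat is the same.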

# The Results

So how did the model do?


{{ 'These algorithms are naive Bayes, support vector machines, and k-nearest neighbors.' | fndetail: 1 }}
{{ 'Andrew Trask, *Grokking Deep Learning* (Shelter Island, NY: Manning Publications, 2019), p. 11' | fndetail: 2 }}
{{ 'Ibid., p. 10' | fndetail: 3 }}


