# Word Embedding of 10th Circuit Court Opinions
    Keira Richards, University of Colorado, Denver

### Methodology

The United States Court of Appeals for the Tenth Circuit is the federal appellate court for the states of Colorado, Kansas, New Mexico, Oklahoma, Utah, and Wyoming. I've chosen to analyze its opinions as a view into the sentiments of the second-highest federal court with jurisdiction over Colorado, and to see how they differ from embeddings of Supreme Court opinions.

The opinions are from [Court Listener](https://www.courtlistener.com), a project of the [Free Law Project](https://free.law), a data- and software-oriented nonprofit seeking to "make the legal ecosystem more equitable and competitive." The [Court Listener API](https://www.courtlistener.com/api/) allows bulk downloads of court proceedings, including arguments and opinions, from each Circuit Court and the Supreme Court. I found the Court Listener API through [this tutorial](https://github.com/idc9/word_embed_tutorial), which creates a word embedding from Supreme Court opinions.

The data set is 56,194 JSON files of court opinions from the 10th Circuit. Each file is one opinion on one case. The cases are chronological, but the file numbers aren't consecutive, suggesting that some cases may be missing. The set begins with 95.json from March 13, 2010 and ends with 6216474.json from February 8, 2022. The data does not go as far back as the Supreme Court opinions available on Court Listener, but it will lend itself well to analyzing change over the last decade.
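
As a rough sketch of how one opinion per file could be read, the helper below pulls only the opinion body out of each JSON record and ignores everything else. The field names (`plain_text`, `html_with_citations`, `html`) and the all-numeric file names are assumptions about the download format, not something verified against these files, so adjust them to whatever the records actually contain.

```python
import html
import json
import re
from pathlib import Path

# Assumed field names, tried in order; adjust to match the downloaded files.
TEXT_FIELDS = ("plain_text", "html_with_citations", "html")


def opinion_text(record):
    """Return the opinion body from one parsed JSON record, without metadata."""
    for field in TEXT_FIELDS:
        raw = record.get(field) or ""
        if raw.strip():
            no_tags = re.sub(r"<[^>]+>", " ", raw)  # drop HTML tags, if any
            # Decode entities such as &amp; and collapse runs of whitespace.
            return " ".join(html.unescape(no_tags).split())
    return ""


def iter_opinions(folder):
    """Yield (file_id, text) for each numbered JSON file, in numeric order."""
    # Assumes every file is named like 95.json (digits only before .json).
    for path in sorted(Path(folder).glob("*.json"), key=lambda p: int(p.stem)):
        with open(path, encoding="utf-8") as handle:
            yield int(path.stem), opinion_text(json.load(handle))
```

Sorting by the integer file name keeps 95.json ahead of 6216474.json, which a plain alphabetical sort would not do.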

In [3]:
import os
import io
import string
import tqdm
import numpy as np
import sys
import re
import random
from gensim.models import Word2Vec, KeyedVectors
from gensim.similarities import Similarity
from gensim.models.phrases import Phraser, Phrases
import gensim



My model was trained on only 5,000 randomly chosen files due to hardware limitations, but this should still be enough to visualize the sentiments of the court. The model also picked up tokens from the JSON metadata by mistake, which I plan to rectify in later versions.
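
For readers who want to reproduce something similar, here is a minimal sketch of sampling 5,000 files and fitting an embedding. It is not the exact pipeline behind `ca10model.bin`: the tokenizer, the seed, and every hyperparameter are placeholder assumptions, and the `vector_size` argument assumes gensim 4 (older releases call it `size`).

```python
import random
import re


def sample_files(paths, k=5000, seed=0):
    """Pick k files at random; a fixed seed makes the subset reproducible."""
    ordered = sorted(paths)
    return random.Random(seed).sample(ordered, min(k, len(ordered)))


def tokenize(text):
    """Split text into word tokens, keeping case and simple apostrophes."""
    return re.findall(r"[A-Za-z]+(?:'[a-z]+)?", text)


def train_embedding(sentences, out_path="ca10model.bin"):
    """Join frequent word pairs into phrases, then fit and save Word2Vec.

    `sentences` must be a list of token lists, since it is read twice.
    """
    # Imported here so the helpers above run without gensim installed.
    from gensim.models import Word2Vec
    from gensim.models.phrases import Phraser, Phrases

    bigrams = Phraser(Phrases(sentences, min_count=5, threshold=10.0))
    phrased = [bigrams[sentence] for sentence in sentences]
    # Hyperparameters are placeholders, not the values used for ca10model.bin.
    model = Word2Vec(sentences=phrased, vector_size=100, window=5,
                     min_count=5, workers=4)
    model.wv.save_word2vec_format(out_path, binary=True)
    return model
```

Tokenizing only the opinion text, rather than the whole JSON file, is one way the metadata tokens could be kept out of a later version.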

In [4]:
model = gensim.models.KeyedVectors.load_word2vec_format("ca10model.bin", binary=True)

To begin to get an idea of 10th Circuit sentiments, I queried the model with some common political jargon.

In [31]:
model.most_similar("duty")

[('obligation', 0.8176979422569275),
 ('legal_obligation', 0.6650003790855408),
 ('affirmative_duty', 0.6591303944587708),
 ('power', 0.6556466221809387),
 ('legal_duty', 0.6342571973800659),
 ('refusal', 0.6192611455917358),
 ('promise', 0.617901623249054),
 ('intention', 0.6046798825263977),
 ('authority', 0.6019030213356018),
 ('duties', 0.5943748354911804)]

In [6]:
model.most_similar("state")

[('federal', 0.7436718344688416),
 ('tribal', 0.7030956745147705),
 ('State', 0.6512160897254944),
 ('county', 0.6399028301239014),
 ('military', 0.582913339138031),
 ('municipal', 0.5628710389137268),
 ('state_law', 0.5480576157569885),
 ('juvenile', 0.503242552280426),
 ('prison', 0.5029860734939575),
 ('competent_jurisdiction', 0.49854838848114014)]

In [7]:
model.most_similar("autonomy")

[("consumers'", 0.6437810659408569),
 ('commonlaw_right', 0.6404505968093872),
 ('Leathaâ\x80\x99s_shares', 0.6353196501731873),
 ('placement_construction', 0.6248644590377808),
 ('capability', 0.6189494132995605),
 ('scarce_judicial_resources', 0.618840754032135),
 ('interestsâ\x80\x9d', 0.6142489314079285),
 ('harmony', 0.6125048995018005),
 ('exclusive_authority', 0.611504077911377),
 ('franchises', 0.6080130934715271)]

In [12]:
model.most_similar("security")

[('commercial', 0.7312663793563843),
 ('financial', 0.6954485177993774),
 ('housing', 0.6910195350646973),
 ('equipment', 0.6891607046127319),
 ('storage', 0.6866706609725952),
 ('market', 0.6848353743553162),
 ('financing', 0.6825789213180542),
 ('food', 0.6779769659042358),
 ('operating', 0.6743307113647461),
 ('investment', 0.6705516576766968)]

Some of these associations are clear and obvious: 'duty' and 'obligation', 'state' and 'federal'... words one would expect to be related. However, some less obvious relationships stood out. For example, the association of 'tribal' and 'state' is interesting and likely stronger in the 10th Circuit than in other court systems due to the number of Native American reservations in the area. The association of 'housing' and 'security' is also intriguing since, to the best of my knowledge, the American government does not consider housing a right. **What kind of housing security cases were they looking at?**
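
The number next to each neighbor is a cosine similarity between two word vectors, which is worth keeping in mind when reading the lists above. The sketch below computes it by hand; the three-dimensional vectors are invented purely for illustration and are not taken from the model.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy 3-dimensional vectors, invented for illustration only.
toy = {
    "duty": [0.9, 0.1, 0.0],
    "obligation": [0.8, 0.2, 0.1],
    "bomb": [0.0, 0.2, 0.9],
}
print(cosine_similarity(toy["duty"], toy["obligation"]))  # close to 1
print(cosine_similarity(toy["duty"], toy["bomb"]))        # close to 0
```

With the loaded model, `model.similarity("housing", "security")` should return the same kind of score for a single pair of words.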
    
The focus on economics was also striking. 'Commercial' is the closest neighbor of 'security', possibly pointing to many cases on business interests (and possibly a few about security guards). For 'autonomy' I was expecting words pertaining to individual rights; instead, the closest neighbor is 'consumers'. Let's look a bit deeper into consumer autonomy...


In [36]:
model.most_similar(positive=["consumer","autonomy"])

[('pollutants', 0.7137232422828674),
 ('competitive', 0.6902402639389038),
 ('conservation', 0.680314302444458),
 ('\\right', 0.6708946228027344),
 ('political', 0.6703660488128662),
 ('monopoly', 0.661037802696228),
 ('political_parties', 0.6534027457237244),
 ('utility', 0.6530541777610779),
 ('commercial', 0.6517835855484009),
 ('environment', 0.6466688513755798)]

I'm not sure what I was expecting, but it wasn't 'pollutants' or 'conservation'. I think this points to an environmental protectionist ideology on the 10th Circuit, or at least to a good portion of cases pertaining to environmental protection. **How does environmental protection relate to consumer autonomy?**
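
One thing that may help with that question is how a multi-word query is scored. As far as I understand it, `most_similar` combines the normalized input vectors (adding `positive` words, subtracting `negative` ones) and ranks every other word by cosine similarity to the result. The sketch below imitates that idea with invented two-direction toy vectors; it is an approximation for intuition, not gensim's actual code.

```python
import numpy as np


def unit(v):
    """Scale a vector to length 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def rank_neighbors(vectors, positive, negative=(), topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    query = sum(unit(vectors[w]) for w in positive)
    query = query - sum(unit(vectors[w]) for w in negative)
    query = unit(query)
    skip = set(positive) | set(negative)  # query words are never returned
    scores = {w: float(unit(v) @ query)
              for w, v in vectors.items() if w not in skip}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:topn]


# Toy vectors, invented for illustration only.
toy_vectors = {
    "consumer": [1.0, 0.0, 0.0],
    "autonomy": [0.0, 1.0, 0.0],
    "market": [1.0, 0.1, 0.0],     # near "consumer" only
    "liberty": [0.1, 1.0, 0.0],    # near "autonomy" only
    "between": [1.0, 1.0, 0.0],    # near both
    "unrelated": [0.0, 0.0, 1.0],
}
print(rank_neighbors(toy_vectors, ["consumer", "autonomy"]))
```

In this toy setup a word that sits close to only one of the two inputs still scores fairly high, so a neighbor of "consumer" plus "autonomy" does not have to be related to both words at once.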

In [37]:
model.most_similar("abortion")

[('regulating', 0.638451874256134),
 ('teaching', 0.6196387410163879),
 ('conservation', 0.5918653607368469),
 ('functions', 0.5888144969940186),
 ('governmental', 0.5821440815925598),
 ('performing', 0.5770695805549622),
 ('tribal_selfgovernment', 0.5715588331222534),
 ('daytoday', 0.568727970123291),
 ('international', 0.5685784816741943),
 ('aimed_at', 0.5679572820663452)]

In [39]:
model.most_similar("woman")

[('man', 0.8402585983276367),
 ('friend', 0.8204723596572876),
 ('passenger', 0.8114058971405029),
 ('girl', 0.8037137389183044),
 ('neighbor', 0.797197699546814),
 ('nurse', 0.7881025671958923),
 ('rental_car', 0.7803377509117126),
 ('man_who', 0.7735366821289062),
 ('boy', 0.7638214826583862),
 ('her_father', 0.7615417838096619)]

In [62]:
model.most_similar("man")

[('woman', 0.8402586579322815),
 ('driver', 0.7768806219100952),
 ('girl', 0.7757141590118408),
 ('lot', 0.7640547752380371),
 ('passenger', 0.7632373571395874),
 ('someone', 0.7618067860603333),
 ('suspect', 0.7565501928329468),
 ('friend', 0.7449239492416382),
 ('bomb', 0.743765652179718),
 ('boy', 0.739574134349823)]

In [42]:
model.most_similar(positive=["women"], negative=["men"])

[('\\egregious', 0.4411745071411133),
 ('shown\\', 0.4398689270019531),
 ('unsubstantiated', 0.4283992648124695),
 ('extrajudicial', 0.42744091153144836),
 ('speculative\\', 0.40712136030197144),
 ('practices\\', 0.4051051437854767),
 ('disproportionate_impact', 0.4039613604545593),
 ('noninfringement', 0.39435338973999023),
 ('ineffectiveâ\x80\x9d', 0.3925777077674866),
 ('association\\', 0.39242303371429443)]

I felt that looking at some political hot-button issues would really illuminate the views of the court. There are a lot of legislative terms around 'abortion', probably emblematic of the continuing battle over abortion rights. It is interesting that 'tribal_selfgovernment' makes an appearance; **maybe there was a case in which abortion laws on a reservation differed from the state's.** The link between 'woman' and 'nurse' could point to a high number of female nurses or to sexism, depending on the cases and your viewpoint ('rental_car' is just odd, as is the link between 'man' and 'bomb').

When we look at 'women' isolated from 'men', there are a lot of doubting terms like 'unsubstantiated' and 'speculative', mixed with terms of magnitude like 'disproportionate_impact' and 'egregious'. These terms seem to indicate women as victims and/or the doubting of victim testimony, a relationship I did not expect to find in such a modern dataset.
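
Several neighbors in the 'women' minus 'men' list carry stray backslashes or mis-encoded quotation marks, so the same word can show up under more than one spelling. A small post-processing helper like the one below could tidy a result list before reading it. The function name and the choice of which characters count as residue are my own assumptions, and it only cleans the display: the underlying vectors are unchanged.

```python
import re

# Anything other than letters, digits, underscore, or apostrophe is treated as residue.
_RESIDUE = re.compile(r"[^A-Za-z0-9_']+")


def clean_neighbors(results):
    """Strip residue from tokens and keep the best score per cleaned token."""
    best = {}
    for token, score in results:
        cleaned = _RESIDUE.sub("", token).strip("_'")
        if cleaned and score > best.get(cleaned, float("-inf")):
            best[cleaned] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# First three rows of the 'women' minus 'men' output above.
rows = [
    ("\\egregious", 0.4411745071411133),
    ("shown\\", 0.4398689270019531),
    ("unsubstantiated", 0.4283992648124695),
]
print(clean_neighbors(rows))
```

A more thorough fix would be to remove this residue during tokenization, before the model is trained, so that variants of one word share a single vector.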


In [43]:
model.most_similar("race")

[('national_origin', 0.8153603076934814),
 ('gender', 0.8118056058883667),
 ('political_opinion', 0.7024502754211426),
 ('discrimination', 0.6937640905380249),
 ('sex', 0.6844298243522644),
 ('illness', 0.6839743256568909),
 ('race_color', 0.6773388981819153),
 ('race_color_religion_sex', 0.6557826399803162),
 ('harassment', 0.6508350968360901),
 ('persecution', 0.6488223671913147)]

In [60]:
model.most_similar(positive=["black", "race"], negative=["white"])

[('national_origin', 0.8013886213302612),
 ('gender', 0.7701539397239685),
 ('race_color', 0.6503126621246338),
 ('discrimination', 0.6262590289115906),
 ('political_opinion', 0.6239770650863647),
 ('illness', 0.6233958005905151),
 ('sex_national', 0.6198543906211853),
 ('harassment', 0.6167808175086975),
 ('retaliation', 0.6125229597091675),
 ('her_gender', 0.6078314185142517)]