## Imports

In [2]:
#%pip install python-Levenshtein
import pandas as pd
from bs4 import BeautifulSoup
import re
import pickle
import Levenshtein
import json
import contractions
pd.set_option('display.max_colwidth', None)


In [3]:
df = pd.read_pickle('pulled_cases.pkl')
df.columns


Index(['level_0', 'index', 'ID', 'name', 'href', 'first_party', 'second_party',
       'winning_party', 'question', 'conclusion', 'winner_index', 'Facts'],
      dtype='object')

In [16]:
df.sample()

Unnamed: 0,level_0,index,ID,name,href,first_party,second_party,winning_party,question,conclusion,winner_index,Facts
3012,898,2971,62700,Manrique v. United States,https://api.oyez.org/cases/2016/15-7250,Marcelo Manrique,United States,United States,<p>Does an appellate court have jurisdiction over an appeal of a restitution award when the judgment that awarded the amount of restitution was entered after the notice of appeal was filed?</p>\n,"<p dir=""ltr"">Appellate courts may not review a restitution award where the amended judgment of the lower court granting restitution is entered after notice of appeal was filed. Justice Clarence Thomas delivered the opinion for the 6-2 majority. The Court held that, in order to secure appellate review of a judgment, a party must file notice of appeal from that particular judgment after that judgment has been entered. In this case, Manrique filed his only notice of appeal several months before the amended judgment granting restitution was entered. The Court rejected Manrique’s argument that the initial judgment deferring restitution and amended judgment granting restitution combined to form a single judgment. Instead, in cases of deferred restitution, the initial judgment and amended judgment qualify as two distinct appealable judgments. Therefore, the Court affirmed the lower court’s judgment and held that the U.S. Court of Appeals for the Eleventh Circuit did not have jurisdiction to review the amended judgment.</p>\n<p>In her dissenting opinion, Justice Ruth Bader Ginsburg argued that the district court failed to notify Manrique of his right to appeal from the amended judgment. Additionally, because the district court clerk sent the record for the amended judgment to the appellate court without waiting for Manrique to file notice of appeal, the district court conferred jurisdiction on the appellate court. Such conferral, Justice Ginsburg contended, adequately substituted the requirement that Manrique file a second notice of appeal from the amended judgment. Justice Sonia Sotomayor joined in the dissent.</p>\n<p>Justice Neil Gorsuch did not participate in the discussion or decision of this case.</p>\n",1,"[[UNK] [UNK] was convicted in federal district court of possession of child pornography. He was sentenced to a life term of supervised release or mandatory imprisonment, although the final judgment did not include an amount in the restitution and stated that would be included in the amended judgment. [UNK] filed his notice to appeal before the amended judgment was entered. When the amended judgment was entered while the appeal was pending, it contained the details of the restitution award, and both parties subsequently included arguments regarding the challenge of the award in their briefs. The YOU. S. Court of Appeals for the [UNK] Circuit ruled that it did not have jurisdiction to consider the challenge to the restitution award because [UNK] did not file a second order of appeal regarding the amended judgment but included the amount to mandatory restitution award.]"


## Filter: Remove Data Irrelevant to Model Training

In [17]:
df = df.drop(columns=['level_0','ID', 'index', 'href'])

In [18]:
df.sample()

Unnamed: 0,name,first_party,second_party,winning_party,question,conclusion,winner_index,Facts
32,McClanahan v. Arizona State Tax Commission,Rosalind McClanahan,Arizona State Tax Commission,Rosalind McClanahan,<p>Did the State of Arizona have a right to tax Navajo Indians residing on the Navajo Reservation if their income is entirely from reservation sources?</p>\n,"<p>No. In a unanimous decision, the Court reversed the decision of the Arizona Court of Appeals and held that Arizona did not have the right to tax McClanahan. In an opinion authored by Justice Thurgood Marshall, the Court emphasized the long-standing status of Indian reservations as exempt from state authority. While the Navajo Treaty signed between the Navajo nation and the United States did not explicitly exempt the nation from state laws, Arizona entered the Union on the explicit condition that it would lose its authority over Indian tribes and reservations within the state, including taxation powers. Consistent with its decision in <em>Warren Trading Post Co. v. Arizona State Tax Commission</em>, the Court ruled that Arizona ""had no jurisdiction to impose"" state income tax on McLanahan.</p>\n",0,"Rosalind McClanahan was a member of the Navajo Indian nation who lived on the Navajo Reservation in Apache County, Arizona. Her employer withheld $16.20 in 1967 for Arizona state income taxes. McClanahan sought the return of her withheld income. She claimed that since she was a Navajo Indian residing on the reservation and since her income was derived completely on the reservation, she was exempt from state taxation. When her request was denied, she filed suit in Apache County Superior Court. The Superior Court dismissed her claim. The Court of Appeals of Arizona affirmed the dismissal. The Supreme Court of Arizona rejected her petition for review."


## Clean Data

In [19]:
def sanitize_review(text):
    # remove HTML tags
    text = BeautifulSoup(str(text), 'html.parser').get_text()   
    # remove URLS
    text = re.sub(r'http\S+', '', str(text))
    return text
def remove_non_ascii(text):
    return text.encode(encoding='utf-8', errors='ignore').decode()
def fix_contractions(text):
    return contractions.fix(text)

df['Facts'] = df['Facts'].apply(sanitize_review)
df['question'] = df['question'].apply(sanitize_review)
df['conclusion'] = df['conclusion'].apply(sanitize_review)

df['Facts'] = df['Facts'].str.replace('\n', ' ')
df['conclusion'] = df['conclusion'].str.replace('\n', ' ')
df['question'] = df['question'].str.replace('\n', ' ')

df['Facts'] = df['Facts'].apply(remove_non_ascii)
df['question'] = df['question'].apply(remove_non_ascii)
df['conclusion'] = df['conclusion'].apply(remove_non_ascii)

df['Facts'] = df['Facts'].apply(fix_contractions)
df['question'] = df['question'].apply(fix_contractions)
df['conclusion'] = df['conclusion'].apply(fix_contractions)


## Test DF

In [26]:
print(len(df))
df.sample()

3464


Unnamed: 0,name,first_party,second_party,winning_party,question,conclusion,winner_index,Facts
2503,Heien v. North Carolina,Nicholas B. Heien,State of North Carolina,North Carolina,Does a police officer's mistake of law provide the individualized reasonable suspicion that the Fourth Amendment requires to justify a traffic stop?,"Yes. Chief Justice John G. Roberts, Jr., delivered the opinion for the 8-1 majority. The Court held that a search or seizure is reasonable under the Fourth Amendment when an officer has made a reasonable factual or legal mistake. Because Fourth Amendment jurisprudence turns on the question of reasonableness, governing officials have traditionally been allowed leeway to enforce the law for the community's protection. As long as the mistake of fact or law in question was reasonable, the Fourth Amendment does not hold such mistakes to be incompatible with the concept of reasonable suspicion. However, the Court also held that those mistakes must be objectively reasonable; an officer cannot gain the benefits of Fourth Amendment reasonableness through a sloppy or incomplete knowledge of the law. In her concurring opinion, Justice Elena Kagan emphasized that the majority opinion's analysis was limited to when the mistake of law in question is an objectively reasonable one. Justice Kagan also wrote that the test to determine whether an officer made an objectively reasonable mistake is much more stringent than the one to determine whether a government official is entitled to qualified immunity. Justice Ruth Bader Ginsburg joined in the concurring opinion. Justice Sonia Sotomayor wrote a dissenting opinion in which she argued that Fourth Amendment jurisprudence has traditionally focused on the officer's factual conclusions rather than understanding of the law. Expanding leeway allowed to police officers with respect to their factual assessment to the meaning of the laws they are meant to enforce runs the risk of eroding the Fourth Amendment's protections. In the absence of any evidence that holding police officers to this standard would prevent effective enforcement of the law, mistakes of law should not be considered reasonable under the Fourth Amendment.",1,"['On April 29, 2010, Sergeant [UNK] of the [UNK] County Sheriff\'s Department observed [UNK] Javier [UNK] driving north on I - 77 with a broken brake light. When [UNK] pulled over the vehicle, he noticed another man, Nicholas [UNK], lying under a blanket in his backseat. [UNK] spoke with the two men, felt that their stories did not match up, and was concerned that [UNK] has not gotten up from the back seat. [UNK] asked for permission to search a vehicle. [UNK] agreed, and [UNK] found a bag containing 2. 2 pounds of cocaine in the car. A grand jury indicted [UNK] for two counts of trafficking cocaine. [UNK] filed another lawsuit to suppress the evidence discovered during the search of his vehicle, but the trial court denied the motion. The North Carolina Court of Appeals reversed the trial court that held that the traffic stop was not objectively reasonable because North Carolina law only required one working brake light. The North Carolina Supreme Court reversed and held that when an officer\'s mistake of the law is reasonable, it may give rise to the "" reasonable suspicion "" required for a traffic stop of a vehicle under the Fourth Amendment. That North Carolina Supreme Court sent the case back to the state Court of Appeals. The North Carolina Court of Appeals found no error in the trial court\'s judgment. A dissenting judge, however, stated that the North Carolina Supreme Court\'s ruling created "" fundamental unfairness "" because it held citizens to the traditional rule that "" ignorance of the law is no excuse "" while allowing police to be ignorant of the law. Based on this dissent, [UNK] again appealed to the North Carolina Supreme Court which rejected [UNK]\'s motion.']"


In [28]:
def clean_fact(fact):
    # Remove the surrounding brackets and quotes
    fact = fact.strip("['']")
    
    # Remove all instances of [UNK]
    fact = fact.replace("[UNK]", "")
    
    # Remove extraneous backslashes and fix extra spaces
    fact = fact.replace("\\'", "'").replace("  ", " ")
    
    return fact.strip()

# Apply the cleaning function to the Fact column
df['Facts'] = df['Facts'].apply(clean_fact)

In [36]:
df.sample()

Unnamed: 0,name,first_party,second_party,winning_party,question,conclusion,winner_index,Facts
1328,Pepper v. United States,Jason Pepper,United States,Pason Pepper,"1) Can a federal district judge consider a defendant's post-sentencing rehabilitation as a permissible factor supporting a sentencing variance? 2) As a sentencing consideration, should post-sentencing rehabilitation be treated the same as post-offense rehabilitation 3) When a federal district judge is removed from resentencing a defendant after remand and a new judge is assigned, is the new judge obligated to follow sentencing findings issued by the original judge?","Yes and no. The Supreme Court reversed in part, affirmed in part and remanded the case back to the lower court in a majority opinion written by Justice Sonia Sotomayor. The Court held that when the defendant's sentence has been set aside on appeal, a district court at resentencing may consider evidence of the defendant's rehabilitation after the initial sentences; and, that evidence may, in appropriate cases, support a downward variance from the sentencing guidelines. Justice Stephen J. Breyer filed a concurrence in which he agreed with the majority that the ""law does not require a sentencing court to follow a Guideline policy statement that forbids taking account of post-sentencing rehabilitation."" He went on, however, to suggest: ""this conclusion does not leave a sentencing court free to disregard the Guidelines at will."" Meanwhile, Justice Samuel Alito filed a partial concurrence and partial dissent, contending that ""requiring judges to give significant weight to the Commission's policy decisions does not run afoul of the Sixth Amendment right that the mandatory Guidelines system was found to violate, i.e., the right to have a jury make certain factual findings that are relevant to sentencing."" Justice Clarence Thomas dissented in full, writing that he would have affirmed the lower court's decision and upheld Pepper's sentence. Justice Elena Kagan took no part in consideration of the case.",0,"Jason Pepper pleaded guilty to conspiracy to distribute 500 grams or more of a mixture or substance containing methamphetamine in an Iowa federal district court. In the latest of a long-running series of appeals and remands, a newly assigned Iowa federal district court sentenced Mr. Pepper to 77 months imprisonment and 12 months supervised release – a 20% downward departure from the Federal Sentencing Guidelines advisory range. Thereafter, the district court granted the government's motion to reduce Mr. Pepper's sentence further to 65 months imprisonment because of the assistance Mr. Pepper provided after he was initially sentenced. Mr. Pepper appealed arguing in part that the district court should consider evidence of his post-sentence rehabilitation to reduce his sentence further. On appeal, the YOU.S. Court of Appeals for the Eighth Circuit affirmed Mr. Pepper's sentence, holding in part that evidence of a defendant's post-sentence rehabilitation was not relevant at resentencing. The court reasoned that Eighth Circuit precedent was clear that such evidence was not relevant."


In [37]:
df.to_pickle('cleaned_cases.pkl')