# Problem set 3: Text analysis of DOJ press releases

**Total points (without extra credit)**: 52 

- For background:

    - DOJ is the federal law enforcement agency responsible for federal prosecutions; this contrasts with the local prosecutions in the Cook County dataset we analyzed earlier. Here's a short explainer on which crimes get prosecuted federally versus locally: https://www.criminaldefenselawyer.com/resources/criminal-defense/federal-crime/state-vs-federal-crimes.htm#:~:text=Federal%20criminal%20prosecutions%20are%20handled,of%20state%20and%20local%20law. 
    - Here's the Kaggle that contains the data: https://www.kaggle.com/jbencina/department-of-justice-20092018-press-releases 
    - Here's the code the dataset creator used to scrape those press releases here if you're interested: https://github.com/jbencina/dojreleases

## 0.0 Import packages

In [67]:
## helpful packages
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import random
import re
import string

## nltk imports
import nltk
### uncomment and run these lines if you haven't downloaded relevant nltk add-ons yet
#nltk.download('averaged_perceptron_tagger')
#nltk.download('stopwords')
from nltk import pos_tag
from nltk.tokenize import word_tokenize, wordpunct_tokenize
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords



## spacy imports
import spacy
### uncomment and run the below line if you haven't loaded the en_core_web_sm library yet
#! python -m spacy download en_core_web_sm
import en_core_web_sm
nlp = en_core_web_sm.load()

## vectorizer
from sklearn.feature_extraction.text import CountVectorizer

## sentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

## lda
from gensim import corpora
import gensim

## repeated printouts and wide-format text
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.max_colwidth', None)

## 0.1 Load and clean text data

In [2]:
## first, unzip the file pset3_inputdata.zip 
## then, run this code to load the unzipped json file and convert to a dataframe
## (may need to change the pathname depending on where you store stuff)
## and convert some of the attributes from lists to values
doj = pd.read_json("combined.json", lines = True)

## due to json, topics are in a list so remove them and concatenate with ;
doj['topics_clean'] = ["; ".join(topic) 
                      if len(topic) > 0 else "No topic" 
                      for topic in doj.topics]

## similarly with components
doj['components_clean'] = ["; ".join(comp) 
                           if len(comp) > 0 else "No component" 
                           for comp in doj.components]

## drop older columns from data
doj = doj[['id', 'title', 'contents', 'date', 'topics_clean', 
           'components_clean']].copy()

doj.head()

Unnamed: 0,id,title,contents,date,topics_clean,components_clean
0,,Convicted Bomb Plotter Sentenced to 30 Years,"PORTLAND, Oregon. – Mohamed Osman Mohamud, 23, who was convicted in 2013 of attempting to use a weapon of mass destruction (explosives) in connection with a plot to detonate a vehicle bomb at an annual Christmas tree lighting ceremony in Portland, was sentenced today to serve 30 years in prison, followed by a lifetime term of supervised release. Mohamud, a naturalized U.S. citizen from Somalia and former resident of Corvallis, Oregon, was arrested on Nov. 26, 2010, after he attempted to detonate what he believed to be an explosives-laden van that was parked near the tree lighting ceremony in Portland. The arrest was the culmination of a long-term undercover operation, during which Mohamud was monitored closely for months as his bomb plot developed. The device was in fact inert, and the public was never in danger from the device. At sentencing, United States District Court Judge Garr M. King, who presided over Mohamed’s 14-day trial, said “the intended crime was horrific,” and that the defendant, even though he was presented with options by undercover FBI employees, “never once expressed a change of heart.” King further noted that the Christmas tree ceremony was attended by up to 10,000 people, and that the defendant “wanted everyone to leave either dead or injured.” King said his sentence was necessary in view of the seriousness of the crime and to serve as deterrence to others who might consider similar acts. “With today’s sentencing, Mohamed Osman Mohamud is being held accountable for his attempted use of what he believed to be a massive bomb to attack innocent civilians attending a public Christmas tree lighting ceremony in Portland,” said John P. Carlin, Assistant Attorney General for National Security. “The evidence clearly indicated that Mohamud was intent on killing as many people as possible with his attack. Fortunately, law enforcement was able to identify him as a threat, insert themselves in the place of a terrorist that Mohamud was trying to contact, and thwart Mohamud’s efforts to conduct an attack on our soil. This case highlights how the use of undercover operations against would-be terrorists allows us to engage and disrupt those who wish to commit horrific acts of violence against the innocent public. The many agents, analysts, and prosecutors who have worked on this case deserve great credit for their roles in protecting Portland from the threat posed by this defendant and ensuring that he was brought to justice.” “This trial provided a rare glimpse into the techniques Al Qaeda employs to radicalize home-grown extremists,” said Amanda Marshall, U.S. Attorney for the District of Oregon. “With the sentencing today, the court has held this defendant accountable. I thank the dedicated professionals in the law enforcement and intelligence communities who were responsible for this successful outcome. I look forward to our continued work with Muslim communities in Oregon who are committed to ensuring that all young people are safe from extremists who seek to radicalize others to engage in violence.” According to the trial evidence, in February 2009, Mohamud began communicating via e-mail with Samir Khan, a now-deceased al Qaeda terrorist who published Jihad Recollections, an online magazine that advocated violent jihad, and who also published Inspire, the official magazine of al-Qaeda in the Arabian Peninsula. Between February and August 2009, Mohamed exchanged approximately 150 emails with Khan. Mohamud wrote several articles for Jihad Recollections that were published under assumed names. In August 2009, Mohamud was in email contact with Amro Al-Ali, a Saudi national who was in Yemen at the time and is today in custody in Saudi Arabia for terrorism offenses. Al-Ali sent Mohamud detailed e-mails designed to facilitate Mohamud’s travel to Yemen to train for violent jihad. In December 2009, while Al-Ali was in the northwest frontier province of Pakistan, Mohamud and Al-Ali discussed the possibility of Mohamud traveling to Pakistan to join Al-Ali in terrorist activities. Mohamud responded to Al-Ali in an e-mail: “yes, that would be wonderful, just tell me what I need to do.” Al-Ali referred Mohamud to a second associate overseas and provided Mohamud with a name and email address to facilitate the process. In the following months, Mohamud made several unsuccessful attempts to contact Al-Ali’s associate. Ultimately, an FBI undercover operative contacted Mohamud via email under the guise of being an associate of Al-Ali’s. Mohamud and the FBI undercover operative agreed to meet in Portland in July 2010. At the meeting, Mohamud told the FBI undercover operative he had written articles that were published in Jihad Recollections. Mohamud also said that he wanted to become “operational.” Asked what he meant by “operational,” Mohamud said he wanted to put an explosion together, but needed help. According to evidence presented at trial, at a meeting in August 2010, Mohamud told undercover FBI operatives he had been thinking of committing violent jihad since the age of 15. Mohamud then told the undercover FBI operatives that he had identified a potential target for a bomb: the annual Christmas tree lighting ceremony in Portland’s Pioneer Courthouse Square on Nov. 26, 2010. The undercover FBI operatives cautioned Mohamud several times about the seriousness of this plan, noting there would be many people at the event, including children, and emphasized that Mohamud could abandon his attack plans at any time with no shame. Mohamud indicated the deaths would be justified and that he would not mind carrying out a suicide attack on the crowd. According to evidence presented at trial, in the ensuing months Mohamud continued to express his interest in carrying out the attack and worked on logistics. On Nov. 4, 2010, Mohamud and the undercover FBI operatives traveled to a remote location in Lincoln County, Oregon, where they detonated a bomb concealed in a backpack as a trial run for the upcoming attack. During the drive back to Corvallis, Mohamud was asked if was capable looking at all the bodies of those who would be killed during the explosion. In response, Mohamud noted, “I want whoever is attending that event to be, to leave either dead or injured.” Mohamud later recorded a video of himself, with the assistance of the undercover FBI operatives, in which he read a statement that offered his rationale for his bomb attack. On Nov. 18, 2010, undercover FBI operatives picked up Mohamud to travel to Portland to finalize the details of the attack. On Nov. 26, 2010, just hours before the planned attack, Mohamud examined the 1,800 pound bomb in the van and remarked that it was “beautiful.” Later that day, Mohamud was arrested after he attempted to remotely detonate the inert vehicle bomb rked near the Christmas tree lighting ceremony This case was investigated by the FBI, with assistance from the Oregon State Police, the Corvallis Police Department, the Lincoln County Sheriff’s Office and the Portland Police Bureau. The prosecution was handled by Assistant U.S. Attorneys Ethan D. Knight and Pamala Holsinger from the U.S. Attorney’s Office for the District of Oregon. Trial Attorney Jolie F. Zimmerman, from the Counterterrorism Section of the Justice Department’s National Security Division, assisted. # # # 14-1077",2014-10-01T00:00:00-04:00,No topic,National Security Division (NSD)
1,12-919,$1 Million in Restitution Payments Announced to Preserve North Carolina Wetlands,"WASHINGTON – North Carolina’s Waccamaw River watershed will benefit from a $1 million restitution order from a federal court, funding environmental projects to acquire and preserve wetlands in an area damaged by illegal releases of wastewater from a corporate hog farm, announced Ignacia S. Moreno, Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division; U.S. Attorney for the Eastern District of North Carolina Thomas G. Walker; Director Greg McLeod from the North Carolina State Bureau of Investigation; and Camilla M. Herlevich, Executive Director of the North Carolina Coastal Land Trust. Freedman Farms Inc. was sentenced in February 2012 to five years of probation and ordered to pay $1.5 million in fines, restitution and community service payments for violating the Clean Water Act when it discharged hog waste into a stream that leads to the Waccamaw River. William B. Freedman, president of Freedman Farms, was sentenced to six months in prison to be followed by six months of home confinement. Freedman Farms also is required to implement a comprehensive environmental compliance program and institute an annual training program. In an order issued on April 19, 2012, the court ordered that the defendants would be responsible for restitution of $1 million in the form of five annual payments starting in January 2013, which the court will direct to the North Carolina Coastal Land Trust (NCCLT). The NCCLT plans to use the money to acquire and conserve land along streams in the Waccamaw watershed. The court also directed a $75,000 community service payment to the Southern Environmental Enforcement Network, an organization dedicated to environmental law enforcement training and information sharing in the region. “The resolution of the case against Freedman Farms demonstrates the commitment of the Department of Justice to enforcing the Clean Water Act to ensure the protection of human health and the environment,” said Assistant Attorney General Moreno. “The court-ordered restitution in this case will conserve wetlands for the benefit of the people of North Carolina. By enforcing the nation’s environmental laws, we will continue to ensure that concentrated animal feeding operations (CAFOs) operate without threatening our drinking water, the health of our communities and the environment.” “This office is committed to doing our part to hold accountable those who commit crimes against our environment, which can cause serious health problems to residents and damage the environment that makes North Carolina such a beautiful place to live and visit,” said U.S. Attorney Walker. “This case shows what we can accomplish when our SBI agents work closely with their local, state and federal partners to investigate environmental crimes and hold the polluters accountable,” said Director McLeod. “We’ll continue our efforts to fight illegal pollution that damages our water and puts the public’s health at risk.” “The Waccamaw is unique and wild,” said Director Herlevich of the North Carolina Coastal Land Trust. “Its watershed includes some of the most extensive cypress gum swamps in the state, and its headwaters at Lake Waccamaw contain fish that are found nowhere else on Earth. We appreciate the trust of the court and the U. S. Attorney, and we look forward to using these funds for conservation projects in a river system that is one of our top conservation priorities.” According to evidence presented in court, in December 2007 Freedman Farms discharged hog waste into Browder’s Branch, a tributary to the Waccamaw River that flows through the White Marsh, a large wetlands complex. Freedman Farms, located in Columbus County, N.C., is in the business of raising hogs for market, and this particular farm had some 4,800 hogs. The hog waste was supposed to be directed to two lagoons for treatment and disposal. Instead, hog waste was discharged from Freedman Farms directly into Browder’s Branch. The Clean Water Act is a federal law that makes it illegal to knowingly or negligently discharge a pollutant into a water of the United States. The Freedman case was investigated by the U.S. Environmental Protection Agency (EPA) Criminal Investigation Division, the U.S. Army Corps of Engineers and the North Carolina State Bureau of Investigation, with assistance from the EPA Science and Ecosystem Support Division. The case was prosecuted by Assistant U.S. Attorney J. Gaston B. Williams of the Eastern District of North Carolina and Trial Attorney Mary Dee Carraway of the Environmental Crimes Section of the Justice Department’s Environment and Natural Resources Division. The North Carolina Coastal Land Trust is celebrating its 20th anniversary of saving special lands in eastern North Carolina. The organization has protected nearly 50,000 acres of lands with scenic, recreational, historic and ecological values. North Carolina Coastal Land Trust has saved streams and wetlands that provide clean water, forests that are havens for wildlife, working farms that provide local food and nature parks that everyone can enjoy. More information about the Coastal Land Trust is available at www.coastallandtrust.org.",2012-07-25T00:00:00-04:00,No topic,Environment and Natural Resources Division
2,11-1002,$1 Million Settlement Reached for Natural Resource Damages at Superfund Site in Massachusetts,"BOSTON– A $1-million settlement has been reached for natural resource damages (NRD) at the Blackburn & Union Privileges Superfund Site in Walpole, Mass., the Departments of Justice and Interior (DOI), and the Office of the Massachusetts Attorney General announced today. The Blackburn & Union Privileges Superfund Site includes 22 acres of contaminated land and water in Walpole. The contamination resulted from the operations of various industrial facilities dating back to the 19th century that exposed the site to asbestos, arsenic, lead and other hazardous substances. The private parties involved in the settlement include two former owners and operators of the site, W.R. Grace & Co.– Conn. and Tyco Healthcare Group LP, as well as the current owners, BIM Investment Corp. and Shaffer Realty Nominee Trust. From about 1915 to 1936, a predecessor of W.R. Grace manufactured asbestos brake linings and clutch linings on a large portion of the property. From 1946 to about 1983, a predecessor of Tyco Healthcare operated a cotton fabric manufacturing business, which used caustic solutions, on a portion of the property. In a 2010 settlement with U.S. Environmental Protection Agency (EPA), the four private parties agreed to perform a remedial action to clean up the site at an estimated cost of $13 million. The consent decree lodged today resolves both state and federal NRD liability claims; it requires the parties to pay $1,094,169.56 to the state and federal natural resource trustees, the Massachusetts Executive Office of Energy and Environmental Affairs (EEA) and DOI, for injuries to ecological resources including groundwater and wetlands, which provide habitat for waterfowl and wading birds, including black ducks and great blue herons. The trustees will use the settlement funds for natural resource restoration projects in the area. “This settlement demonstrates our commitment to recovering damages from the parties responsible for injury to natural resources, in partnership with state trustees,” said Bruce Gelber, Acting Deputy Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division. “The citizens of Walpole have had to live with the environmental impact of this contamination for many years,” Attorney General Martha Coakley said. “We are pleased that today’s agreement will not only require the responsible parties to reimburse taxpayer dollars, but will also provide funding to begin restoring or replacing the wetland and other natural resources.” The consent decree was lodged in the U.S. District Court for Massachusetts. A portion of the funds, $300,000, will be distributed to the EEA-sponsored groundwater restoration projects; $575,000 will be used for ecological restoration projects jointly sponsored by EEA and the U.S. Fish and Wildlife Service (FWS). In addition, $125,000 will go for projects jointly sponsored by EEA and FWS that achieve both ecological and groundwater restoration; $57,491.34 will be allocated for reimbursement for the FWS’s assessment costs; and $36,678.22 will be distributed as reimbursement for the commonwealth’s assessment costs. “This settlement provides the means for a range of projects designed to compensate the public for decades of groundwater and other ecological damage at this site. I encourage local citizens and organizations to become engaged in the public process that will take place as we solicit, take comment on, and choose these projects in the months ahead,” said Energy and Environmental Affairs Secretary Richard K. Sullivan Jr., who serves as the Commonwealth’s Natural Resources Damages trustee. “This settlement will help restore habitat for fish and wildlife in the Neponset River watershed,” said Tom Chapman of the FWS New England Field Office. “We look forward to working with the commonwealth and local stakeholders to implement restoration.” “More than 100 years-worth of industrial activities at this site caused major environmental contamination to the Neponset River, nearby wetlands and to groundwater below the site,” said Commissioner Kenneth Kimmell of the Massachusetts Department of Environmental Protection (MassDEP), which will staff the Trustee Council for the Commonwealth. “We will ensure that the community and the public will be active participants in the process to use these NRD funds to restore the injured natural resources.” Under the federal Comprehensive Environmental Response, Compensation and Liability Act, EEA and DOI, acting through the FWS, are the designated state and federal natural resource Trustees for the site. The site has been listed on the EPA’s National Priorities List since 1994. The consent decree is subject to a public comment period and court approval. A copy of the consent decree and instructions about how to submit comments is available on www.usdoj.gov/enrd/Consent_Decrees.html . After the consent decree is approved, EEA and FWS will develop proposed restoration plans to use the settlement funds for restoration projects. The proposed restoration plans will also be made available to the public for review and comment. Assistant Attorney General Matthew Brock of Massachusetts Attorney General Coakley's Environmental Protection Division handled this matter. Attorney Jennifer Davis of MassDEP, Attorney Anna Blumkin of EEA and MassDEP’s NRD Coordinator Karen Pelto also worked on this settlement.",2011-08-03T00:00:00-04:00,No topic,Environment and Natural Resources Division
3,10-015,10 Las Vegas Men Indicted \r\nfor Falsifying Vehicle Emissions Tests,"WASHINGTON—A federal grand jury in Las Vegas today returned indictments against 10 Nevada-certified emissions testers for falsifying vehicle emissions test reports, the Justice Department announced. Each defendant faces one felony Clean Air Act count for falsifying reports between November 2007 and May 2009. The number of falsifications varied by defendant, with some defendants having falsified approximately 250 records, while others falsified more than double that figure. One defendant is alleged to have falsified over 700 reports. The individuals indicted include: Escudero resides in Pahrump, Nev. All other individuals are from Clark County, Nev. The 10 defendants are alleged to have engaged in a practice known as ""clean scanning"" vehicles. The scheme involved entering the Vehicle Identification Number (VIN) for a vehicle that would not pass the emissions test into the computerized system, then connecting a different vehicle the testers knew would pass the test. These falsifications were allegedly performed for anywhere from $10 to $100 over and above the usual emissions testing fee. The U.S. Environmental Protection Agency (EPA), under the Clean Air Act, requires the state of Nevada to conduct vehicle emissions testing in certain areas because the areas exceed national standards for carbon monoxide and ozone. Las Vegas is currently required to perform emissions testing. To obtain a registration renewal, vehicle owners bring the vehicles to a licensed inspection station for testing. The emissions inspector logs into a computer to activate the system by using a unique password issued to the emissions inspector. The emissions inspector manually inputs the vehicle’s VIN to identify the tested vehicle, then connects the vehicle for model year 1996 and later to an onboard diagnostics port connected to an analyzer. The analyzer downloads data from the vehicle’s computer, analyzes the data and provides a ""pass"" or ""fail"" result. The pass or fail result and vehicle identification data are reported on the Vehicle Inspection Report. It is a crime to knowingly alter or conceal any record or other document required to be maintained by the Clean Air Act. ""Falsifications of vehicle emissions testing, such as those alleged in the indictments unsealed today, are serious matters and we intend to use all of our enforcement tools to stop this harmful practice. These actions undermine a system that is designed to reduce air pollutants including smog and provide better air quality for the citizens of Nevada,"" said Ignacia S. Moreno, Assistant Attorney General for the Justice Department’s Environment and Natural Resources Division. ""The residents of Nevada deserve to know that the vast majority of licensed vehicle emission inspectors are not corrupt and are not circumventing emission testing procedures,"" said U.S. Attorney Bogden. ""These indictments should serve as a clear warning to offenders that the Department of Justice will prosecute you if you make fraudulent statements and reports concerning compliance with the federal Clean Air Act."" ""Lying about car emissions means dirtier air, which is especially of concern in areas like Las Vegas that are already experiencing air quality problems,"" said Cynthia Giles, Assistant Administrator for Enforcement and Compliance Assurance at EPA. ""We will take aggressive action to ensure communities have clean air."" The maximum penalty for the felony violations contained in the indictments includes up to two years in prison and a fine of up to $250,000. An indictment is merely an accusation, and a defendant is presumed innocent unless and until proven guilty in a court of law. The case was investigated by the EPA, Criminal Investigation Division; and the Nevada Department of Motor Vehicles Compliance Enforcement Division. The case is being prosecuted by the U.S. Attorney’s Office for the District of Nevada and the Justice Department’s Environmental Crimes Section.",2010-01-08T00:00:00-05:00,No topic,Environment and Natural Resources Division
4,18-898,"$100 Million Settlement Will Speed Cleanup Work at Centredale Manor Superfund Site in North Providence, R.I.","The U.S. Department of Justice, the U.S. Environmental Protection Agency (EPA), and the Rhode Island Department of Environmental Management (RIDEM) announced today that two subsidiaries of Stanley Black & Decker Inc.—Emhart Industries Inc. and Black & Decker Inc.—have agreed to clean up dioxin contaminated sediment and soil at the Centredale Manor Restoration Project Superfund Site in North Providence and Johnston, Rhode Island. “We are pleased to reach a resolution through collaborative work with the responsible parties, EPA, and other stakeholders,” said Acting Assistant Attorney General Jeffrey H. Wood for the Justice Department's Environment and Natural Resources Division . “Today’s settlement ends protracted litigation and allows for important work to get underway to restore a healthy environment for citizens living in and around the Centredale Manor Site and the Woonasquatucket River.” “This settlement demonstrates the tremendous progress we are achieving working with responsible parties, states, and our federal partners to expedite sites through the entire Superfund remediation process,” said EPA Acting Administrator Andrew Wheeler. “The Centredale Manor Site has been on the National Priorities List for 18 years; we are taking charge and ensuring the Agency makes good on its promise to clean it up for the betterment of the environment and those communities affected.” “Successfully concluding this settlement paves the way for EPA to make good on our commitment to aggressively pursue cleaning up the Centredale Manor Superfund Site,” said EPA New England Regional Administrator Alexandra Dunn. “We are excited to get to work on the cleanup at this site, and get it closer to the goal of being fully utilized by the North Providence and Johnston communities.” “We are pleased that the collective efforts of the State of Rhode Island, EPA, and DOJ in these negotiations have concluded in this major milestone toward the cleanup of the Centredale Manor Restoration Superfund site and are consistent with our long-standing efforts to make the polluter pay,” said RIDEM Director Janet Coit. “The settlement will speed up a remedy that protects public health and the river environment, and moves us closer to the day that we can reclaim recreational uses of this beautiful river resource.” The settlement, which includes cleanup work in the Woonasquatucket River (River) and bordering residential and commercial properties along the River, requires the companies to perform the remedy selected by EPA for the Site in 2012, which is estimated to cost approximately $100 million, and resolves longstanding litigation. The cleanup remedy includes excavation of contaminated sediment and floodplain soil from the Woonasquatucket River, including from adjacent residential properties. Once the cleanup remedy is completed, full access to the Woonasquatucket River should be restored for local citizens. The cleanup will be a step toward the State’s goal of a fishable and swimmable river. The work will also include upgrading caps over contaminated soil in the peninsula area of the Site that currently house two high-rise apartment buildings. The settlement also ensures that the long-term monitoring and maintenance of the site, as directed in the remedy, will be implemented to ensure that public health is protected. Under the settlement, Emhart and Black & Decker will reimburse EPA for approximately $42 million in past costs incurred at the Site. The companies will also reimburse EPA and the State of Rhode Island for future costs incurred by those agencies in overseeing the work required by the settlement. The settlement will also include payments on behalf of two federal agencies to resolve claims against those agencies. These payments, along with prior settlements related to the Site, will result in a 100 percent recovery for the United States of its past and future response costs related to the Site. Litigation related to the Site has been ongoing for nearly eight years. While the Federal District Court found Black & Decker and Emhart to be liable for their hazardous waste and responsible to conduct the cleanup of the Site, it had also ruled that EPA needed to reconsider certain aspects of that cleanup. EPA appealed the decision requiring it to reconsider aspects of the cleanup. This settlement, once entered by the District Court, will resolve the litigation between the United States, Rhode Island, and Emhart and Black and Decker, allowing the cleanup of the Site to begin. The Site spans a one and a half mile stretch of the Woonasquatucket River and encompasses a nine-acre peninsula, two ponds and a significant forested wetland. From the 1940s to the early 1970s, Emhart’s predecessor operated a chemical manufacturing facility on the peninsula and used a raw material that was contaminated with 2,3,7,8-tetrachlorodibenzo-p-dioxin, a toxic form of dioxin. The Site property was also previously used by a barrel refurbisher. Elevated levels of dioxins and other contaminants have been detected in soil, groundwater, sediment, surface water and fish. The Site was added to the National Priorities List (NPL) in 2000, and in December 2017, EPA included the Centredale Manor Restoration Project Superfund Site on a list of Superfund sites targeted for immediate and intense attention. Several short-term actions were previously performed at the Site to address immediate threats to the residents and minimize potential erosion and downstream transport of contaminated soil and sediment. This settlement is the latest agreement EPA has reached since the Site was listed on the NPL. Prior agreements addressed the performance and recovery of costs for the past environmental investigations and interim cleanup actions from Emhart, the barrel reconditioning company, the current owners of the peninsula portion of the Site, and other potentially responsible parties. The Consent Decree, lodged in the U.S. District Court of Rhode Island, will be posted in the Federal Register and available for public comment for a period of 30 days. The Consent Decree can be viewed on the Justice Department website: www.justice.gov/enrd/Consent_Decrees.html. EPA information on the Centredale Manor Superfund Site: www.epa.gov/superfund/centredale.",2018-07-09T00:00:00-04:00,Environment,Environment and Natural Resources Division


## 1. Tagging and sentiment scoring (17 points)

Focus on the following press release: `id` == "17-1204" about this pharmaceutical kickback prosecution: https://www.forbes.com/sites/michelatindera/2017/11/16/fentanyl-billionaire-john-kapoor-to-plead-not-guilty-in-opioid-kickback-case/?sh=21b8574d6c6c 

The `contents` column is the one we're treating as a document. You may need to to convert it from a pandas series to a single string.

We'll call the raw string of this press release `pharma`

In [3]:
## your code to subset to one press release and take the string

pharma = doj.loc[doj['id'] == "17-1204"]['contents']

### 1.1 part of speech tagging (3 points)

A. Preprocess the `pharma` press release to remove all punctuation / digits (you can use `.isalpha()` to subset)

B. With the preprocessed press release from part A, use the part of speech tagger within nltk to tag all the words in that one press release with their part of speech. 

C. Using the output from B, extract the adjectives and sort those adjectives from most occurrences to fewest occurrences. Print a dataframe with the 5 most frequent adjectives and their counts in the `pharma` release. See here for a list of the names of adjectives within nltk: https://pythonprogramming.net/natural-language-toolkit-nltk-part-speech-tagging/

**Resources**:

- Documentation for `.isalpha()`: https://www.w3schools.com/python/ref_string_isalpha.asp

In [4]:
#A
pharma_cleaned = pharma.apply(lambda x: ''.join([char for char in x if char.isalpha() or char.isspace()]))

In [5]:
#B
pharma_str = pharma_cleaned.to_string()
tokens = word_tokenize(pharma_str)
tokens_pos = nltk.pos_tag(tokens)

In [6]:
#C

#create list of all adjectives
all_adj = [one_tok[0] for one_tok in tokens_pos if one_tok[1] == 'JJ']

#convert from list to dataframe
df_all_adj = pd.DataFrame(all_adj)
df_all_adj = df_all_adj.rename(columns={0:'Adjective'})

#sort by value counts
sorted_adj = df_all_adj['Adjective'].value_counts().reset_index()
sorted_adj.columns = ['Adjective','Count']

#find most 5 frequent 
most_freq_5 = sorted_adj.head(5) 
most_freq_5

Unnamed: 0,Adjective,Count
0,former,8
1,opioid,5
2,nationwide,4
3,addictive,3
4,other,3


## 1.2 named entity recognition (4 points)

A. Using the original `pharma` press release (so the one before stripping punctuation/digits), use spaCy to extract all named entities from the press release.

B. Print the unique named entities with the tag: `LAW`

In [7]:
#A #QUESTION: not sure if this we should be sotring the named entities somewhere? or if just printing them is ok? 
spacy_pharma = nlp(pharma.to_string())
for one_tok in spacy_pharma.ents: 
    print( "Entity: " + one_tok.text + " ; NER tag: " + one_tok.label_)

Entity: 4909 ; NER tag: DATE
Entity: Insys Therapeutics Inc. ; NER tag: ORG
Entity: today ; NER tag: DATE
Entity: Fentanyl ; NER tag: PERSON
Entity: More than 20,000 ; NER tag: CARDINAL
Entity: Americans ; NER tag: NORP
Entity: last year ; NER tag: DATE
Entity: millions ; NER tag: CARDINAL
Entity: Jeff Sessions ; NER tag: PERSON
Entity: This Justice Department ; NER tag: ORG
Entity: Trump ; NER tag: PERSON
Entity: American ; NER tag: NORP
Entity: ”John N. Kapoor ; NER tag: PERSON
Entity: 74 ; NER tag: DATE
Entity: Phoenix ; NER tag: GPE
Entity: Ariz. ; NER tag: GPE
Entity: the Board of Directors ; NER tag: ORG
Entity: Insys ; NER tag: ORG
Entity: this morning ; NER tag: TIME
Entity: Arizona ; NER tag: GPE
Entity: RICO ; NER tag: LAW
Entity: Kapoor ; NER tag: PERSON
Entity: Executive ; NER tag: ORG
Entity: Board ; NER tag: ORG
Entity: Insys ; NER tag: ORG
Entity: Phoenix ; NER tag: GPE
Entity: today ; NER tag: DATE
Entity: U.S. ; NER tag: GPE
Entity: District Court ; NER tag: ORG
Entity

In [8]:
#B
for one_tok in spacy_pharma.ents: 
    if (one_tok.label_ == "LAW"):
        print("Entity: " + one_tok.text + " ; NER tag: " + one_tok.label_)

Entity: RICO ; NER tag: LAW
Entity: the Controlled Substances Act ; NER tag: LAW
Entity: RICO ; NER tag: LAW


C. Use Google to summarize in one sentence what the `RICO` named entity means and why this might apply to a pharmaceutical kickbacks case (and not just a mafia case...) 

The Racketeer Influenced and Corrupt Organizations Act target organized crime, which can involve illicit drug activities such as moving money illegally via kickbacks from drug sales.  

D. You want to extract the possible sentence lengths the CEO is facing; pull out the named entities with (1) the label `DATE` and (2) that contain the word year or years (hint: you may want to use the `re` module for that second part). Print these named entities.

In [9]:
#D
for one_tok in spacy_pharma.ents:
    if one_tok.label_ == "DATE":
        match = re.search(r'((?:.*?)(?:year|years)(?:.*?))', one_tok.text)
        if match:
            print("Entity: " + one_tok.text + " ; NER tag: " + one_tok.label_)


Entity: last year ; NER tag: DATE
Entity: 20 years ; NER tag: DATE
Entity: three years ; NER tag: DATE
Entity: five years ; NER tag: DATE
Entity: three years ; NER tag: DATE


E. Pull and print the original parts of the press releases where those year lengths are mentioned (e.g., the sentences or rough region of the press release). Describe in your own words (1 sentence) what length of sentence (prison) and probation (supervised release) the CEO may be facing if convicted after this indictment (if there are multiple lengths mentioned describe the maximum). 

**Hint**: you may want to use re.search or re.findall 

- For part E, you can use `re.search` and `re.findall`, or anything that works 😳.

In [10]:
press_pattern = r'(?<=\.|\s)(.*?((?:\d+ (?:year|years)).*?))(?=\.)'
press_pattern2 = r'((?:[^.]{0,50})(?:\d+ (?:year|years).*?\.)(?:.*?\.[^.]{0,50}))'
press_pattern3 = r'((?:.{0,50})(?:year|years)(?:.{0,50}))'


press_search = [re.findall(press_pattern3, spacy_pharma.text, re.IGNORECASE)]
press_search

[['Americans died of synthetic opioid overdoses last year, and millions are addicted to opioids. And yet so',
  'each provide for a sentence of no greater than 20 years in prison, three years of supervised release and',
  'aw provide for a sentence of no greater than five years in prison, three years of supervised release and']]

Based on these other mentions of sentence and probation lengths, the CEO may be facing up to 20 years in prison with three years of probation if convicted after the indictment.

## 1.3 sentiment analysis  (10 points)

A. Subset the press releases to those labeled with one of three topics via `topics_clean`: Civil Rights, Hate Crimes, and Project Safe Childhood. We'll call this `doj_subset` going forward and it should have 717 rows.



In [11]:
#A 

doj_subset = doj[(doj['topics_clean'] == 'Civil Rights') | 
                 (doj['topics_clean'] == 'Hate Crimes') | 
                 (doj['topics_clean'] == 'Project Safe Childhood')]

#Confirming it has 717 rows
doj_subset.shape


(717, 6)

B. Write a function that takes one press release string as an input and:

- Removes named entities from each press release string (**Hint**: you may want to use `re.sub` with an or condition)
- Scores the sentiment of the entire press release using the `SentimentIntensityAnalyzer` and `polarity_scores`
- Returns the length-four (negative, positive, neutral, compound) sentiment dictionary (any order is fine)

Apply that function to each of the press releases in `doj_subset`. 

**Hints**: 

- A function + list comprehension to execute will takes about 30 seconds on a respectable local machine and about 2 mins on jhub; if it's taking a very long time, you may want to check your code for inefficiencies. If you can't fix those, for partial credit on this part/full credit on remainder, you can take a small random sample of the 717


In [14]:
def get_sentiment(press): 
    # Check if press is already a string
    if not isinstance(press, str):
        press = press.to_string()
    
    # create spaCy doc object from string 
    press_doc = nlp(press)
    
    # initialize sentiment analyzer
    sent_obj = SentimentIntensityAnalyzer()
    
    # create variable to store modified press
    mod_press = press
    
    # Create regex pattern for all named entities
    pattern = '|'.join([re.escape(entity.text) for entity in press_doc.ents])
    
    # Remove named entities from the press text
    mod_press = re.sub(pattern, '', mod_press)
    
    # Get sentiment score for modified_press 
    sentiment = sent_obj.polarity_scores(mod_press)
    
    return sentiment


In [13]:
#Apply to each press release in doj_subset 

for press_release in doj_subset['contents']: 
    get_sentiment(press_release)

{'neg': 0.2, 'neu': 0.751, 'pos': 0.049, 'compound': -0.9931}

{'neg': 0.134, 'neu': 0.797, 'pos': 0.069, 'compound': -0.9325}

{'neg': 0.092, 'neu': 0.832, 'pos': 0.076, 'compound': -0.7579}

{'neg': 0.127, 'neu': 0.788, 'pos': 0.085, 'compound': -0.9037}

{'neg': 0.179, 'neu': 0.777, 'pos': 0.044, 'compound': -0.9864}

{'neg': 0.148, 'neu': 0.799, 'pos': 0.053, 'compound': -0.987}

{'neg': 0.155, 'neu': 0.766, 'pos': 0.079, 'compound': -0.9559}

{'neg': 0.093, 'neu': 0.841, 'pos': 0.066, 'compound': -0.7783}

{'neg': 0.107, 'neu': 0.832, 'pos': 0.061, 'compound': -0.9136}

{'neg': 0.167, 'neu': 0.776, 'pos': 0.056, 'compound': -0.9801}

{'neg': 0.216, 'neu': 0.748, 'pos': 0.036, 'compound': -0.9973}

{'neg': 0.095, 'neu': 0.841, 'pos': 0.064, 'compound': -0.8519}

{'neg': 0.082, 'neu': 0.853, 'pos': 0.065, 'compound': -0.6486}

{'neg': 0.334, 'neu': 0.633, 'pos': 0.033, 'compound': -0.9951}

{'neg': 0.178, 'neu': 0.753, 'pos': 0.068, 'compound': -0.9889}

{'neg': 0.125, 'neu': 0.803, 'pos': 0.071, 'compound': -0.9643}

{'neg': 0.152, 'neu': 0.756, 'pos': 0.092, 'compound': -0.9896}

{'neg': 0.247, 'neu': 0.699, 'pos': 0.053, 'compound': -0.9985}

{'neg': 0.152, 'neu': 0.759, 'pos': 0.089, 'compound': -0.9799}

{'neg': 0.138, 'neu': 0.792, 'pos': 0.07, 'compound': -0.9779}

{'neg': 0.233, 'neu': 0.698, 'pos': 0.069, 'compound': -0.9971}

{'neg': 0.158, 'neu': 0.825, 'pos': 0.017, 'compound': -0.9884}

{'neg': 0.294, 'neu': 0.676, 'pos': 0.031, 'compound': -0.9986}

{'neg': 0.211, 'neu': 0.733, 'pos': 0.057, 'compound': -0.9948}

{'neg': 0.198, 'neu': 0.728, 'pos': 0.075, 'compound': -0.9884}

{'neg': 0.141, 'neu': 0.802, 'pos': 0.057, 'compound': -0.9665}

{'neg': 0.153, 'neu': 0.791, 'pos': 0.056, 'compound': -0.9858}

{'neg': 0.239, 'neu': 0.732, 'pos': 0.029, 'compound': -0.9944}

{'neg': 0.189, 'neu': 0.763, 'pos': 0.049, 'compound': -0.9716}

{'neg': 0.212, 'neu': 0.733, 'pos': 0.054, 'compound': -0.9983}

{'neg': 0.143, 'neu': 0.81, 'pos': 0.047, 'compound': -0.9666}

{'neg': 0.106, 'neu': 0.77, 'pos': 0.124, 'compound': 0.4754}

{'neg': 0.078, 'neu': 0.845, 'pos': 0.077, 'compound': -0.0772}

{'neg': 0.106, 'neu': 0.757, 'pos': 0.137, 'compound': 0.6486}

{'neg': 0.301, 'neu': 0.675, 'pos': 0.025, 'compound': -0.9983}

{'neg': 0.071, 'neu': 0.792, 'pos': 0.136, 'compound': 0.997}

{'neg': 0.134, 'neu': 0.748, 'pos': 0.118, 'compound': -0.9416}

{'neg': 0.207, 'neu': 0.677, 'pos': 0.116, 'compound': -0.9739}

{'neg': 0.159, 'neu': 0.725, 'pos': 0.117, 'compound': -0.9936}

{'neg': 0.271, 'neu': 0.643, 'pos': 0.086, 'compound': -0.9169}

{'neg': 0.034, 'neu': 0.851, 'pos': 0.114, 'compound': 0.9217}

{'neg': 0.156, 'neu': 0.798, 'pos': 0.046, 'compound': -0.975}

{'neg': 0.122, 'neu': 0.816, 'pos': 0.063, 'compound': -0.9371}

{'neg': 0.119, 'neu': 0.829, 'pos': 0.053, 'compound': -0.9778}

{'neg': 0.085, 'neu': 0.841, 'pos': 0.074, 'compound': 0.0433}

{'neg': 0.068, 'neu': 0.871, 'pos': 0.061, 'compound': -0.4767}

{'neg': 0.196, 'neu': 0.761, 'pos': 0.043, 'compound': -0.9913}

{'neg': 0.138, 'neu': 0.799, 'pos': 0.063, 'compound': -0.9524}

{'neg': 0.131, 'neu': 0.793, 'pos': 0.076, 'compound': -0.9517}

{'neg': 0.11, 'neu': 0.812, 'pos': 0.078, 'compound': -0.8904}

{'neg': 0.194, 'neu': 0.722, 'pos': 0.084, 'compound': -0.9906}

{'neg': 0.11, 'neu': 0.812, 'pos': 0.079, 'compound': -0.9477}

{'neg': 0.205, 'neu': 0.721, 'pos': 0.073, 'compound': -0.993}

{'neg': 0.098, 'neu': 0.787, 'pos': 0.116, 'compound': 0.5267}

{'neg': 0.097, 'neu': 0.816, 'pos': 0.088, 'compound': -0.4939}

{'neg': 0.126, 'neu': 0.762, 'pos': 0.111, 'compound': -0.6682}

{'neg': 0.128, 'neu': 0.8, 'pos': 0.071, 'compound': -0.9001}

{'neg': 0.178, 'neu': 0.793, 'pos': 0.029, 'compound': -0.9924}

{'neg': 0.195, 'neu': 0.778, 'pos': 0.027, 'compound': -0.9982}

{'neg': 0.142, 'neu': 0.777, 'pos': 0.08, 'compound': -0.9659}

{'neg': 0.206, 'neu': 0.766, 'pos': 0.028, 'compound': -0.9973}

{'neg': 0.101, 'neu': 0.802, 'pos': 0.097, 'compound': -0.2263}

{'neg': 0.175, 'neu': 0.754, 'pos': 0.071, 'compound': -0.9846}

{'neg': 0.149, 'neu': 0.791, 'pos': 0.06, 'compound': -0.9702}

{'neg': 0.193, 'neu': 0.75, 'pos': 0.056, 'compound': -0.9776}

{'neg': 0.12, 'neu': 0.815, 'pos': 0.065, 'compound': -0.9874}

{'neg': 0.138, 'neu': 0.799, 'pos': 0.063, 'compound': -0.9851}

{'neg': 0.154, 'neu': 0.81, 'pos': 0.037, 'compound': -0.9729}

{'neg': 0.119, 'neu': 0.765, 'pos': 0.117, 'compound': -0.1045}

{'neg': 0.112, 'neu': 0.799, 'pos': 0.089, 'compound': -0.9655}

{'neg': 0.21, 'neu': 0.755, 'pos': 0.036, 'compound': -0.9859}

{'neg': 0.124, 'neu': 0.813, 'pos': 0.063, 'compound': -0.93}

{'neg': 0.0, 'neu': 0.834, 'pos': 0.166, 'compound': 0.9909}

{'neg': 0.126, 'neu': 0.832, 'pos': 0.042, 'compound': -0.9423}

{'neg': 0.098, 'neu': 0.833, 'pos': 0.069, 'compound': -0.9466}

{'neg': 0.097, 'neu': 0.842, 'pos': 0.061, 'compound': -0.8215}

{'neg': 0.112, 'neu': 0.82, 'pos': 0.068, 'compound': -0.91}

{'neg': 0.067, 'neu': 0.83, 'pos': 0.103, 'compound': 0.836}

{'neg': 0.084, 'neu': 0.888, 'pos': 0.028, 'compound': -0.9709}

{'neg': 0.081, 'neu': 0.824, 'pos': 0.095, 'compound': 0.5574}

{'neg': 0.14, 'neu': 0.789, 'pos': 0.072, 'compound': -0.9565}

{'neg': 0.162, 'neu': 0.767, 'pos': 0.072, 'compound': -0.9371}

{'neg': 0.054, 'neu': 0.824, 'pos': 0.122, 'compound': 0.9696}

{'neg': 0.084, 'neu': 0.831, 'pos': 0.085, 'compound': 0.0439}

{'neg': 0.038, 'neu': 0.863, 'pos': 0.099, 'compound': 0.9678}

{'neg': 0.075, 'neu': 0.842, 'pos': 0.083, 'compound': 0.5994}

{'neg': 0.119, 'neu': 0.744, 'pos': 0.138, 'compound': 0.7351}

{'neg': 0.07, 'neu': 0.85, 'pos': 0.08, 'compound': 0.3897}

{'neg': 0.129, 'neu': 0.778, 'pos': 0.093, 'compound': -0.978}

{'neg': 0.011, 'neu': 0.811, 'pos': 0.178, 'compound': 0.996}

{'neg': 0.024, 'neu': 0.861, 'pos': 0.115, 'compound': 0.9948}

{'neg': 0.024, 'neu': 0.865, 'pos': 0.111, 'compound': 0.987}

{'neg': 0.174, 'neu': 0.747, 'pos': 0.079, 'compound': -0.9828}

{'neg': 0.206, 'neu': 0.728, 'pos': 0.067, 'compound': -0.9928}

{'neg': 0.186, 'neu': 0.744, 'pos': 0.07, 'compound': -0.9806}

{'neg': 0.155, 'neu': 0.775, 'pos': 0.069, 'compound': -0.9846}

{'neg': 0.171, 'neu': 0.757, 'pos': 0.072, 'compound': -0.9829}

{'neg': 0.177, 'neu': 0.743, 'pos': 0.08, 'compound': -0.9755}

{'neg': 0.083, 'neu': 0.823, 'pos': 0.095, 'compound': 0.5983}

{'neg': 0.127, 'neu': 0.803, 'pos': 0.07, 'compound': -0.9928}

{'neg': 0.095, 'neu': 0.808, 'pos': 0.097, 'compound': -0.6652}

{'neg': 0.064, 'neu': 0.802, 'pos': 0.134, 'compound': 0.9933}

{'neg': 0.019, 'neu': 0.928, 'pos': 0.053, 'compound': 0.871}

{'neg': 0.088, 'neu': 0.789, 'pos': 0.123, 'compound': 0.9661}

{'neg': 0.15, 'neu': 0.792, 'pos': 0.058, 'compound': -0.9919}

{'neg': 0.163, 'neu': 0.758, 'pos': 0.079, 'compound': -0.9931}

{'neg': 0.124, 'neu': 0.784, 'pos': 0.092, 'compound': -0.9878}

{'neg': 0.127, 'neu': 0.818, 'pos': 0.055, 'compound': -0.9996}

{'neg': 0.122, 'neu': 0.795, 'pos': 0.083, 'compound': -0.9834}

{'neg': 0.183, 'neu': 0.781, 'pos': 0.035, 'compound': -0.9962}

{'neg': 0.091, 'neu': 0.864, 'pos': 0.045, 'compound': -0.9977}

{'neg': 0.191, 'neu': 0.711, 'pos': 0.098, 'compound': -0.9964}

{'neg': 0.147, 'neu': 0.806, 'pos': 0.047, 'compound': -0.9817}

{'neg': 0.158, 'neu': 0.809, 'pos': 0.033, 'compound': -0.9853}

{'neg': 0.111, 'neu': 0.795, 'pos': 0.094, 'compound': -0.8944}

{'neg': 0.096, 'neu': 0.792, 'pos': 0.112, 'compound': 0.1868}

{'neg': 0.201, 'neu': 0.742, 'pos': 0.057, 'compound': -0.9872}

{'neg': 0.139, 'neu': 0.774, 'pos': 0.088, 'compound': -0.9738}

{'neg': 0.272, 'neu': 0.638, 'pos': 0.09, 'compound': -0.9974}

{'neg': 0.202, 'neu': 0.727, 'pos': 0.071, 'compound': -0.9939}

{'neg': 0.118, 'neu': 0.818, 'pos': 0.064, 'compound': -0.9656}

{'neg': 0.123, 'neu': 0.762, 'pos': 0.115, 'compound': -0.7351}

{'neg': 0.103, 'neu': 0.814, 'pos': 0.083, 'compound': -0.6808}

{'neg': 0.128, 'neu': 0.777, 'pos': 0.094, 'compound': -0.9919}

{'neg': 0.202, 'neu': 0.705, 'pos': 0.092, 'compound': -0.9891}

{'neg': 0.166, 'neu': 0.787, 'pos': 0.047, 'compound': -0.9929}

{'neg': 0.155, 'neu': 0.798, 'pos': 0.047, 'compound': -0.9816}

{'neg': 0.092, 'neu': 0.833, 'pos': 0.075, 'compound': -0.6249}

{'neg': 0.161, 'neu': 0.779, 'pos': 0.06, 'compound': -0.9956}

{'neg': 0.161, 'neu': 0.809, 'pos': 0.031, 'compound': -0.9974}

{'neg': 0.129, 'neu': 0.805, 'pos': 0.066, 'compound': -0.9579}

{'neg': 0.152, 'neu': 0.745, 'pos': 0.102, 'compound': -0.9431}

{'neg': 0.126, 'neu': 0.768, 'pos': 0.106, 'compound': -0.9305}

{'neg': 0.146, 'neu': 0.762, 'pos': 0.092, 'compound': -0.9543}

{'neg': 0.105, 'neu': 0.831, 'pos': 0.064, 'compound': -0.9081}

{'neg': 0.148, 'neu': 0.77, 'pos': 0.082, 'compound': -0.9524}

{'neg': 0.246, 'neu': 0.722, 'pos': 0.032, 'compound': -0.9889}

{'neg': 0.152, 'neu': 0.798, 'pos': 0.05, 'compound': -0.9797}

{'neg': 0.166, 'neu': 0.758, 'pos': 0.076, 'compound': -0.9859}

{'neg': 0.136, 'neu': 0.753, 'pos': 0.112, 'compound': -0.8248}

{'neg': 0.148, 'neu': 0.752, 'pos': 0.1, 'compound': -0.9267}

{'neg': 0.194, 'neu': 0.731, 'pos': 0.075, 'compound': -0.9627}

{'neg': 0.15, 'neu': 0.794, 'pos': 0.056, 'compound': -0.9928}

{'neg': 0.196, 'neu': 0.771, 'pos': 0.033, 'compound': -0.9852}

{'neg': 0.185, 'neu': 0.777, 'pos': 0.037, 'compound': -0.998}

{'neg': 0.118, 'neu': 0.794, 'pos': 0.088, 'compound': -0.8462}

{'neg': 0.089, 'neu': 0.865, 'pos': 0.045, 'compound': -0.9712}

{'neg': 0.093, 'neu': 0.817, 'pos': 0.09, 'compound': -0.4404}

{'neg': 0.128, 'neu': 0.816, 'pos': 0.056, 'compound': -0.9772}

{'neg': 0.111, 'neu': 0.816, 'pos': 0.073, 'compound': -0.875}

{'neg': 0.105, 'neu': 0.816, 'pos': 0.079, 'compound': -0.8591}

{'neg': 0.196, 'neu': 0.738, 'pos': 0.067, 'compound': -0.9955}

{'neg': 0.119, 'neu': 0.834, 'pos': 0.047, 'compound': -0.9681}

{'neg': 0.123, 'neu': 0.791, 'pos': 0.086, 'compound': -0.8316}

{'neg': 0.146, 'neu': 0.812, 'pos': 0.042, 'compound': -0.9885}

{'neg': 0.135, 'neu': 0.781, 'pos': 0.084, 'compound': -0.9328}

{'neg': 0.123, 'neu': 0.823, 'pos': 0.053, 'compound': -0.984}

{'neg': 0.149, 'neu': 0.776, 'pos': 0.075, 'compound': -0.9887}

{'neg': 0.208, 'neu': 0.754, 'pos': 0.038, 'compound': -0.9956}

{'neg': 0.109, 'neu': 0.871, 'pos': 0.02, 'compound': -0.9833}

{'neg': 0.125, 'neu': 0.81, 'pos': 0.065, 'compound': -0.9081}

{'neg': 0.107, 'neu': 0.817, 'pos': 0.075, 'compound': -0.882}

{'neg': 0.162, 'neu': 0.742, 'pos': 0.097, 'compound': -0.9873}

{'neg': 0.118, 'neu': 0.824, 'pos': 0.058, 'compound': -0.9637}

{'neg': 0.091, 'neu': 0.835, 'pos': 0.073, 'compound': -0.5405}

{'neg': 0.132, 'neu': 0.739, 'pos': 0.128, 'compound': 0.3291}

{'neg': 0.113, 'neu': 0.759, 'pos': 0.128, 'compound': 0.6486}

{'neg': 0.138, 'neu': 0.787, 'pos': 0.074, 'compound': -0.9477}

{'neg': 0.102, 'neu': 0.81, 'pos': 0.087, 'compound': -0.8126}

{'neg': 0.174, 'neu': 0.785, 'pos': 0.041, 'compound': -0.9873}

{'neg': 0.115, 'neu': 0.844, 'pos': 0.041, 'compound': -0.9477}

{'neg': 0.19, 'neu': 0.719, 'pos': 0.091, 'compound': -0.9958}

{'neg': 0.149, 'neu': 0.797, 'pos': 0.054, 'compound': -0.9938}

{'neg': 0.101, 'neu': 0.841, 'pos': 0.059, 'compound': -0.9304}

{'neg': 0.192, 'neu': 0.761, 'pos': 0.047, 'compound': -0.975}

{'neg': 0.184, 'neu': 0.768, 'pos': 0.048, 'compound': -0.9938}

{'neg': 0.129, 'neu': 0.766, 'pos': 0.105, 'compound': -0.5994}

{'neg': 0.113, 'neu': 0.809, 'pos': 0.078, 'compound': -0.8658}

{'neg': 0.19, 'neu': 0.723, 'pos': 0.087, 'compound': -0.9919}

{'neg': 0.091, 'neu': 0.843, 'pos': 0.067, 'compound': -0.9207}

{'neg': 0.164, 'neu': 0.757, 'pos': 0.079, 'compound': -0.9646}

{'neg': 0.091, 'neu': 0.765, 'pos': 0.143, 'compound': 0.9903}

{'neg': 0.164, 'neu': 0.792, 'pos': 0.044, 'compound': -0.9898}

{'neg': 0.126, 'neu': 0.827, 'pos': 0.047, 'compound': -0.9882}

{'neg': 0.201, 'neu': 0.756, 'pos': 0.043, 'compound': -0.9877}

{'neg': 0.171, 'neu': 0.788, 'pos': 0.042, 'compound': -0.9958}

{'neg': 0.148, 'neu': 0.793, 'pos': 0.059, 'compound': -0.9965}

{'neg': 0.183, 'neu': 0.762, 'pos': 0.055, 'compound': -0.9839}

{'neg': 0.113, 'neu': 0.817, 'pos': 0.07, 'compound': -0.9777}

{'neg': 0.117, 'neu': 0.811, 'pos': 0.072, 'compound': -0.9849}

{'neg': 0.149, 'neu': 0.777, 'pos': 0.074, 'compound': -0.9956}

{'neg': 0.213, 'neu': 0.721, 'pos': 0.066, 'compound': -0.9988}

{'neg': 0.171, 'neu': 0.764, 'pos': 0.066, 'compound': -0.9972}

{'neg': 0.157, 'neu': 0.806, 'pos': 0.037, 'compound': -0.9913}

{'neg': 0.129, 'neu': 0.77, 'pos': 0.101, 'compound': -0.9225}

{'neg': 0.14, 'neu': 0.809, 'pos': 0.051, 'compound': -0.9844}

{'neg': 0.084, 'neu': 0.807, 'pos': 0.108, 'compound': 0.4588}

{'neg': 0.085, 'neu': 0.819, 'pos': 0.096, 'compound': 0.128}

{'neg': 0.216, 'neu': 0.693, 'pos': 0.091, 'compound': -0.9913}

{'neg': 0.111, 'neu': 0.846, 'pos': 0.043, 'compound': -0.9201}

{'neg': 0.115, 'neu': 0.819, 'pos': 0.067, 'compound': -0.9403}

{'neg': 0.093, 'neu': 0.821, 'pos': 0.085, 'compound': -0.5994}

{'neg': 0.106, 'neu': 0.851, 'pos': 0.044, 'compound': -0.9552}

{'neg': 0.093, 'neu': 0.824, 'pos': 0.083, 'compound': -0.802}

{'neg': 0.144, 'neu': 0.787, 'pos': 0.068, 'compound': -0.9732}

{'neg': 0.159, 'neu': 0.73, 'pos': 0.111, 'compound': -0.9509}

{'neg': 0.197, 'neu': 0.756, 'pos': 0.047, 'compound': -0.9959}

{'neg': 0.158, 'neu': 0.745, 'pos': 0.097, 'compound': -0.9746}

{'neg': 0.184, 'neu': 0.751, 'pos': 0.065, 'compound': -0.9976}

{'neg': 0.157, 'neu': 0.764, 'pos': 0.079, 'compound': -0.9905}

{'neg': 0.193, 'neu': 0.735, 'pos': 0.072, 'compound': -0.9981}

{'neg': 0.216, 'neu': 0.73, 'pos': 0.054, 'compound': -0.9885}

{'neg': 0.214, 'neu': 0.681, 'pos': 0.105, 'compound': -0.9958}

{'neg': 0.231, 'neu': 0.721, 'pos': 0.048, 'compound': -0.9973}

{'neg': 0.259, 'neu': 0.697, 'pos': 0.045, 'compound': -0.998}

{'neg': 0.151, 'neu': 0.772, 'pos': 0.077, 'compound': -0.9853}

{'neg': 0.126, 'neu': 0.786, 'pos': 0.088, 'compound': -0.9434}

{'neg': 0.095, 'neu': 0.831, 'pos': 0.075, 'compound': -0.8481}

{'neg': 0.097, 'neu': 0.814, 'pos': 0.089, 'compound': -0.296}

{'neg': 0.189, 'neu': 0.806, 'pos': 0.005, 'compound': -0.9955}

{'neg': 0.084, 'neu': 0.788, 'pos': 0.128, 'compound': 0.9438}

{'neg': 0.24, 'neu': 0.723, 'pos': 0.037, 'compound': -0.9961}

{'neg': 0.218, 'neu': 0.753, 'pos': 0.029, 'compound': -0.9905}

{'neg': 0.224, 'neu': 0.689, 'pos': 0.087, 'compound': -0.9906}

{'neg': 0.21, 'neu': 0.715, 'pos': 0.075, 'compound': -0.9871}

{'neg': 0.19, 'neu': 0.782, 'pos': 0.028, 'compound': -0.9921}

{'neg': 0.184, 'neu': 0.788, 'pos': 0.028, 'compound': -0.9906}

{'neg': 0.224, 'neu': 0.73, 'pos': 0.046, 'compound': -0.9917}

{'neg': 0.125, 'neu': 0.809, 'pos': 0.065, 'compound': -0.9432}

{'neg': 0.156, 'neu': 0.782, 'pos': 0.061, 'compound': -0.9869}

{'neg': 0.093, 'neu': 0.831, 'pos': 0.076, 'compound': -0.5859}

{'neg': 0.188, 'neu': 0.766, 'pos': 0.047, 'compound': -0.9623}

{'neg': 0.222, 'neu': 0.748, 'pos': 0.03, 'compound': -0.9938}

{'neg': 0.15, 'neu': 0.723, 'pos': 0.127, 'compound': -0.9392}

{'neg': 0.114, 'neu': 0.789, 'pos': 0.097, 'compound': -0.791}

{'neg': 0.139, 'neu': 0.742, 'pos': 0.12, 'compound': -0.6512}

{'neg': 0.125, 'neu': 0.806, 'pos': 0.07, 'compound': -0.9001}

{'neg': 0.216, 'neu': 0.741, 'pos': 0.043, 'compound': -0.9928}

{'neg': 0.115, 'neu': 0.8, 'pos': 0.086, 'compound': -0.9705}

{'neg': 0.153, 'neu': 0.767, 'pos': 0.08, 'compound': -0.9349}

{'neg': 0.193, 'neu': 0.728, 'pos': 0.08, 'compound': -0.9957}

{'neg': 0.165, 'neu': 0.768, 'pos': 0.067, 'compound': -0.9977}

{'neg': 0.183, 'neu': 0.76, 'pos': 0.058, 'compound': -0.9932}

{'neg': 0.167, 'neu': 0.766, 'pos': 0.067, 'compound': -0.9833}

{'neg': 0.183, 'neu': 0.725, 'pos': 0.092, 'compound': -0.996}

{'neg': 0.037, 'neu': 0.835, 'pos': 0.128, 'compound': 0.9934}

{'neg': 0.07, 'neu': 0.769, 'pos': 0.162, 'compound': 0.989}

{'neg': 0.042, 'neu': 0.845, 'pos': 0.113, 'compound': 0.9842}

{'neg': 0.051, 'neu': 0.778, 'pos': 0.171, 'compound': 0.9926}

{'neg': 0.069, 'neu': 0.827, 'pos': 0.105, 'compound': 0.9414}

{'neg': 0.06, 'neu': 0.875, 'pos': 0.065, 'compound': 0.2023}

{'neg': 0.087, 'neu': 0.787, 'pos': 0.125, 'compound': 0.9914}

{'neg': 0.053, 'neu': 0.832, 'pos': 0.115, 'compound': 0.9963}

{'neg': 0.072, 'neu': 0.807, 'pos': 0.12, 'compound': 0.9651}

{'neg': 0.019, 'neu': 0.861, 'pos': 0.119, 'compound': 0.988}

{'neg': 0.062, 'neu': 0.826, 'pos': 0.112, 'compound': 0.9559}

{'neg': 0.054, 'neu': 0.834, 'pos': 0.113, 'compound': 0.9712}

{'neg': 0.031, 'neu': 0.89, 'pos': 0.079, 'compound': 0.9592}

{'neg': 0.057, 'neu': 0.837, 'pos': 0.106, 'compound': 0.9899}

{'neg': 0.05, 'neu': 0.82, 'pos': 0.131, 'compound': 0.9978}

{'neg': 0.065, 'neu': 0.857, 'pos': 0.078, 'compound': 0.1779}

{'neg': 0.035, 'neu': 0.794, 'pos': 0.17, 'compound': 0.997}

{'neg': 0.099, 'neu': 0.811, 'pos': 0.09, 'compound': -0.6976}

{'neg': 0.022, 'neu': 0.889, 'pos': 0.089, 'compound': 0.9932}

{'neg': 0.013, 'neu': 0.788, 'pos': 0.199, 'compound': 0.9944}

{'neg': 0.111, 'neu': 0.753, 'pos': 0.135, 'compound': 0.4243}

{'neg': 0.069, 'neu': 0.88, 'pos': 0.051, 'compound': -0.7096}

{'neg': 0.057, 'neu': 0.811, 'pos': 0.132, 'compound': 0.9905}

{'neg': 0.091, 'neu': 0.826, 'pos': 0.083, 'compound': -0.4854}

{'neg': 0.007, 'neu': 0.8, 'pos': 0.194, 'compound': 0.994}

{'neg': 0.103, 'neu': 0.706, 'pos': 0.191, 'compound': 0.9864}

{'neg': 0.107, 'neu': 0.818, 'pos': 0.075, 'compound': -0.8986}

{'neg': 0.142, 'neu': 0.776, 'pos': 0.082, 'compound': -0.9898}

{'neg': 0.078, 'neu': 0.851, 'pos': 0.071, 'compound': -0.7705}

{'neg': 0.092, 'neu': 0.834, 'pos': 0.074, 'compound': -0.9223}

{'neg': 0.16, 'neu': 0.762, 'pos': 0.078, 'compound': -0.9959}

{'neg': 0.079, 'neu': 0.857, 'pos': 0.064, 'compound': -0.5831}

{'neg': 0.084, 'neu': 0.869, 'pos': 0.048, 'compound': -0.8151}

{'neg': 0.087, 'neu': 0.85, 'pos': 0.063, 'compound': -0.7906}

{'neg': 0.099, 'neu': 0.786, 'pos': 0.115, 'compound': 0.7126}

{'neg': 0.085, 'neu': 0.824, 'pos': 0.091, 'compound': 0.5385}

{'neg': 0.089, 'neu': 0.81, 'pos': 0.101, 'compound': 0.4207}

{'neg': 0.074, 'neu': 0.879, 'pos': 0.048, 'compound': -0.6705}

{'neg': 0.068, 'neu': 0.901, 'pos': 0.032, 'compound': -0.9468}

{'neg': 0.14, 'neu': 0.755, 'pos': 0.105, 'compound': -0.9738}

{'neg': 0.109, 'neu': 0.828, 'pos': 0.064, 'compound': -0.9698}

{'neg': 0.154, 'neu': 0.802, 'pos': 0.044, 'compound': -0.9886}

{'neg': 0.173, 'neu': 0.765, 'pos': 0.062, 'compound': -0.9949}

{'neg': 0.151, 'neu': 0.8, 'pos': 0.049, 'compound': -0.9925}

{'neg': 0.182, 'neu': 0.751, 'pos': 0.068, 'compound': -0.9887}

{'neg': 0.047, 'neu': 0.882, 'pos': 0.071, 'compound': 0.8246}

{'neg': 0.099, 'neu': 0.816, 'pos': 0.085, 'compound': -0.5413}

{'neg': 0.075, 'neu': 0.856, 'pos': 0.069, 'compound': -0.296}

{'neg': 0.079, 'neu': 0.779, 'pos': 0.142, 'compound': 0.9528}

{'neg': 0.08, 'neu': 0.827, 'pos': 0.093, 'compound': 0.7467}

{'neg': 0.06, 'neu': 0.85, 'pos': 0.09, 'compound': 0.8689}

{'neg': 0.103, 'neu': 0.784, 'pos': 0.113, 'compound': 0.5979}

{'neg': 0.099, 'neu': 0.785, 'pos': 0.116, 'compound': 0.7891}

{'neg': 0.044, 'neu': 0.892, 'pos': 0.064, 'compound': 0.8769}

{'neg': 0.078, 'neu': 0.805, 'pos': 0.116, 'compound': 0.9411}

{'neg': 0.098, 'neu': 0.835, 'pos': 0.067, 'compound': -0.8902}

{'neg': 0.037, 'neu': 0.89, 'pos': 0.073, 'compound': 0.81}

{'neg': 0.033, 'neu': 0.913, 'pos': 0.054, 'compound': 0.8521}

{'neg': 0.097, 'neu': 0.849, 'pos': 0.054, 'compound': -0.9869}

{'neg': 0.082, 'neu': 0.846, 'pos': 0.072, 'compound': -0.7684}

{'neg': 0.065, 'neu': 0.883, 'pos': 0.053, 'compound': -0.5267}

{'neg': 0.084, 'neu': 0.853, 'pos': 0.063, 'compound': -0.8834}

{'neg': 0.12, 'neu': 0.792, 'pos': 0.088, 'compound': -0.9279}

{'neg': 0.094, 'neu': 0.753, 'pos': 0.153, 'compound': 0.951}

{'neg': 0.04, 'neu': 0.873, 'pos': 0.088, 'compound': 0.9617}

{'neg': 0.034, 'neu': 0.866, 'pos': 0.1, 'compound': 0.9786}

{'neg': 0.084, 'neu': 0.8, 'pos': 0.115, 'compound': 0.9727}

{'neg': 0.049, 'neu': 0.787, 'pos': 0.164, 'compound': 0.9968}

{'neg': 0.023, 'neu': 0.755, 'pos': 0.222, 'compound': 0.9966}

{'neg': 0.054, 'neu': 0.777, 'pos': 0.169, 'compound': 0.994}

{'neg': 0.067, 'neu': 0.832, 'pos': 0.101, 'compound': 0.8666}

{'neg': 0.023, 'neu': 0.778, 'pos': 0.199, 'compound': 0.995}

{'neg': 0.005, 'neu': 0.898, 'pos': 0.096, 'compound': 0.986}

{'neg': 0.012, 'neu': 0.805, 'pos': 0.184, 'compound': 0.9968}

{'neg': 0.006, 'neu': 0.798, 'pos': 0.196, 'compound': 0.9969}

{'neg': 0.063, 'neu': 0.828, 'pos': 0.109, 'compound': 0.9468}

{'neg': 0.055, 'neu': 0.858, 'pos': 0.087, 'compound': 0.9241}

{'neg': 0.008, 'neu': 0.833, 'pos': 0.158, 'compound': 0.9893}

{'neg': 0.028, 'neu': 0.829, 'pos': 0.143, 'compound': 0.9867}

{'neg': 0.0, 'neu': 0.84, 'pos': 0.16, 'compound': 0.9794}

{'neg': 0.014, 'neu': 0.826, 'pos': 0.16, 'compound': 0.9883}

{'neg': 0.047, 'neu': 0.844, 'pos': 0.109, 'compound': 0.9831}

{'neg': 0.112, 'neu': 0.717, 'pos': 0.171, 'compound': 0.9548}

{'neg': 0.065, 'neu': 0.752, 'pos': 0.182, 'compound': 0.9924}

{'neg': 0.04, 'neu': 0.794, 'pos': 0.166, 'compound': 0.9977}

{'neg': 0.037, 'neu': 0.84, 'pos': 0.124, 'compound': 0.9963}

{'neg': 0.026, 'neu': 0.871, 'pos': 0.104, 'compound': 0.9943}

{'neg': 0.122, 'neu': 0.701, 'pos': 0.177, 'compound': 0.8176}

{'neg': 0.02, 'neu': 0.811, 'pos': 0.168, 'compound': 0.9881}

{'neg': 0.079, 'neu': 0.791, 'pos': 0.13, 'compound': 0.9837}

{'neg': 0.068, 'neu': 0.773, 'pos': 0.159, 'compound': 0.9941}

{'neg': 0.023, 'neu': 0.798, 'pos': 0.179, 'compound': 0.9935}

{'neg': 0.038, 'neu': 0.86, 'pos': 0.102, 'compound': 0.9552}

{'neg': 0.095, 'neu': 0.807, 'pos': 0.098, 'compound': -0.9391}

{'neg': 0.077, 'neu': 0.808, 'pos': 0.115, 'compound': 0.9631}

{'neg': 0.064, 'neu': 0.763, 'pos': 0.173, 'compound': 0.9932}

{'neg': 0.006, 'neu': 0.834, 'pos': 0.16, 'compound': 0.9943}

{'neg': 0.056, 'neu': 0.808, 'pos': 0.136, 'compound': 0.9921}

{'neg': 0.136, 'neu': 0.659, 'pos': 0.205, 'compound': 0.9888}

{'neg': 0.063, 'neu': 0.828, 'pos': 0.109, 'compound': 0.9688}

{'neg': 0.129, 'neu': 0.746, 'pos': 0.125, 'compound': -0.891}

{'neg': 0.0, 'neu': 0.861, 'pos': 0.139, 'compound': 0.9766}

{'neg': 0.028, 'neu': 0.875, 'pos': 0.097, 'compound': 0.9735}

{'neg': 0.041, 'neu': 0.864, 'pos': 0.095, 'compound': 0.9427}

{'neg': 0.033, 'neu': 0.89, 'pos': 0.078, 'compound': 0.9118}

{'neg': 0.006, 'neu': 0.95, 'pos': 0.044, 'compound': 0.9348}

{'neg': 0.089, 'neu': 0.87, 'pos': 0.041, 'compound': -0.9371}

{'neg': 0.121, 'neu': 0.782, 'pos': 0.096, 'compound': -0.9317}

{'neg': 0.095, 'neu': 0.766, 'pos': 0.139, 'compound': 0.802}

{'neg': 0.063, 'neu': 0.862, 'pos': 0.075, 'compound': 0.5574}

{'neg': 0.062, 'neu': 0.833, 'pos': 0.105, 'compound': 0.9502}

{'neg': 0.035, 'neu': 0.848, 'pos': 0.116, 'compound': 0.9767}

{'neg': 0.041, 'neu': 0.9, 'pos': 0.059, 'compound': 0.6896}

{'neg': 0.084, 'neu': 0.86, 'pos': 0.056, 'compound': -0.9062}

{'neg': 0.048, 'neu': 0.811, 'pos': 0.141, 'compound': 0.9812}

{'neg': 0.062, 'neu': 0.854, 'pos': 0.085, 'compound': 0.0161}

{'neg': 0.057, 'neu': 0.807, 'pos': 0.136, 'compound': 0.9843}

{'neg': 0.037, 'neu': 0.884, 'pos': 0.079, 'compound': 0.9584}

{'neg': 0.082, 'neu': 0.828, 'pos': 0.09, 'compound': 0.4404}

{'neg': 0.054, 'neu': 0.82, 'pos': 0.126, 'compound': 0.9902}

{'neg': 0.013, 'neu': 0.83, 'pos': 0.157, 'compound': 0.9894}

{'neg': 0.095, 'neu': 0.838, 'pos': 0.067, 'compound': -0.9092}

{'neg': 0.108, 'neu': 0.797, 'pos': 0.095, 'compound': -0.4939}

{'neg': 0.092, 'neu': 0.856, 'pos': 0.052, 'compound': -0.9277}

{'neg': 0.054, 'neu': 0.874, 'pos': 0.071, 'compound': 0.8225}

{'neg': 0.052, 'neu': 0.888, 'pos': 0.06, 'compound': 0.7579}

{'neg': 0.04, 'neu': 0.81, 'pos': 0.15, 'compound': 0.9866}

{'neg': 0.041, 'neu': 0.902, 'pos': 0.057, 'compound': 0.7003}

{'neg': 0.059, 'neu': 0.893, 'pos': 0.049, 'compound': -0.6187}

{'neg': 0.049, 'neu': 0.884, 'pos': 0.067, 'compound': 0.6597}

{'neg': 0.066, 'neu': 0.863, 'pos': 0.071, 'compound': 0.34}

{'neg': 0.049, 'neu': 0.881, 'pos': 0.069, 'compound': 0.8567}

{'neg': 0.076, 'neu': 0.843, 'pos': 0.081, 'compound': 0.2445}

{'neg': 0.075, 'neu': 0.862, 'pos': 0.063, 'compound': -0.5267}

{'neg': 0.075, 'neu': 0.826, 'pos': 0.099, 'compound': 0.7391}

{'neg': 0.068, 'neu': 0.869, 'pos': 0.064, 'compound': -0.128}

{'neg': 0.138, 'neu': 0.779, 'pos': 0.082, 'compound': -0.9484}

{'neg': 0.015, 'neu': 0.794, 'pos': 0.191, 'compound': 0.997}

{'neg': 0.025, 'neu': 0.853, 'pos': 0.122, 'compound': 0.959}

{'neg': 0.063, 'neu': 0.847, 'pos': 0.091, 'compound': 0.9715}

{'neg': 0.049, 'neu': 0.845, 'pos': 0.106, 'compound': 0.9757}

{'neg': 0.057, 'neu': 0.861, 'pos': 0.082, 'compound': 0.8338}

{'neg': 0.092, 'neu': 0.81, 'pos': 0.098, 'compound': -0.0451}

{'neg': 0.101, 'neu': 0.843, 'pos': 0.056, 'compound': -0.9744}

{'neg': 0.098, 'neu': 0.803, 'pos': 0.099, 'compound': 0.3893}

{'neg': 0.159, 'neu': 0.801, 'pos': 0.039, 'compound': -0.9963}

{'neg': 0.095, 'neu': 0.842, 'pos': 0.062, 'compound': -0.9272}

{'neg': 0.0, 'neu': 0.829, 'pos': 0.171, 'compound': 0.9854}

{'neg': 0.079, 'neu': 0.912, 'pos': 0.009, 'compound': -0.9753}

{'neg': 0.055, 'neu': 0.86, 'pos': 0.085, 'compound': 0.9793}

{'neg': 0.129, 'neu': 0.809, 'pos': 0.062, 'compound': -0.9786}

{'neg': 0.084, 'neu': 0.877, 'pos': 0.039, 'compound': -0.9684}

{'neg': 0.179, 'neu': 0.696, 'pos': 0.126, 'compound': -0.9921}

{'neg': 0.1, 'neu': 0.843, 'pos': 0.057, 'compound': -0.929}

{'neg': 0.08, 'neu': 0.853, 'pos': 0.067, 'compound': -0.5106}

{'neg': 0.107, 'neu': 0.831, 'pos': 0.063, 'compound': -0.9768}

{'neg': 0.218, 'neu': 0.687, 'pos': 0.095, 'compound': -0.995}

{'neg': 0.065, 'neu': 0.833, 'pos': 0.103, 'compound': 0.9382}

{'neg': 0.115, 'neu': 0.757, 'pos': 0.128, 'compound': -0.2529}

{'neg': 0.15, 'neu': 0.746, 'pos': 0.104, 'compound': -0.8828}

{'neg': 0.121, 'neu': 0.806, 'pos': 0.073, 'compound': -0.8834}

{'neg': 0.096, 'neu': 0.817, 'pos': 0.087, 'compound': -0.5719}

{'neg': 0.185, 'neu': 0.735, 'pos': 0.08, 'compound': -0.9983}

{'neg': 0.158, 'neu': 0.789, 'pos': 0.052, 'compound': -0.9835}

{'neg': 0.185, 'neu': 0.765, 'pos': 0.051, 'compound': -0.9942}

{'neg': 0.11, 'neu': 0.794, 'pos': 0.096, 'compound': -0.813}

{'neg': 0.127, 'neu': 0.8, 'pos': 0.073, 'compound': -0.9771}

{'neg': 0.17, 'neu': 0.698, 'pos': 0.133, 'compound': -0.972}

{'neg': 0.067, 'neu': 0.869, 'pos': 0.064, 'compound': -0.6059}

{'neg': 0.074, 'neu': 0.825, 'pos': 0.102, 'compound': 0.9118}

{'neg': 0.181, 'neu': 0.755, 'pos': 0.064, 'compound': -0.9928}

{'neg': 0.19, 'neu': 0.745, 'pos': 0.066, 'compound': -0.9914}

{'neg': 0.095, 'neu': 0.864, 'pos': 0.041, 'compound': -0.9618}

{'neg': 0.16, 'neu': 0.804, 'pos': 0.036, 'compound': -0.9944}

{'neg': 0.234, 'neu': 0.698, 'pos': 0.068, 'compound': -0.998}

{'neg': 0.106, 'neu': 0.887, 'pos': 0.007, 'compound': -0.9565}

{'neg': 0.226, 'neu': 0.733, 'pos': 0.041, 'compound': -0.9865}

{'neg': 0.156, 'neu': 0.745, 'pos': 0.099, 'compound': -0.9674}

{'neg': 0.125, 'neu': 0.81, 'pos': 0.065, 'compound': -0.9618}

{'neg': 0.182, 'neu': 0.747, 'pos': 0.071, 'compound': -0.9887}

{'neg': 0.165, 'neu': 0.772, 'pos': 0.062, 'compound': -0.9852}

{'neg': 0.098, 'neu': 0.815, 'pos': 0.087, 'compound': -0.5994}

{'neg': 0.129, 'neu': 0.818, 'pos': 0.053, 'compound': -0.9878}

{'neg': 0.191, 'neu': 0.701, 'pos': 0.108, 'compound': -0.9758}

{'neg': 0.158, 'neu': 0.771, 'pos': 0.072, 'compound': -0.9779}

{'neg': 0.205, 'neu': 0.738, 'pos': 0.057, 'compound': -0.9803}

{'neg': 0.16, 'neu': 0.78, 'pos': 0.059, 'compound': -0.9933}

{'neg': 0.145, 'neu': 0.791, 'pos': 0.064, 'compound': -0.9652}

{'neg': 0.17, 'neu': 0.768, 'pos': 0.062, 'compound': -0.9877}

{'neg': 0.111, 'neu': 0.791, 'pos': 0.099, 'compound': -0.7096}

{'neg': 0.181, 'neu': 0.755, 'pos': 0.063, 'compound': -0.9946}

{'neg': 0.141, 'neu': 0.804, 'pos': 0.055, 'compound': -0.9875}

{'neg': 0.137, 'neu': 0.821, 'pos': 0.042, 'compound': -0.9817}

{'neg': 0.188, 'neu': 0.73, 'pos': 0.081, 'compound': -0.9867}

{'neg': 0.201, 'neu': 0.749, 'pos': 0.051, 'compound': -0.9709}

{'neg': 0.17, 'neu': 0.769, 'pos': 0.06, 'compound': -0.984}

{'neg': 0.157, 'neu': 0.761, 'pos': 0.082, 'compound': -0.994}

{'neg': 0.212, 'neu': 0.692, 'pos': 0.097, 'compound': -0.9982}

{'neg': 0.206, 'neu': 0.764, 'pos': 0.03, 'compound': -0.991}

{'neg': 0.1, 'neu': 0.832, 'pos': 0.069, 'compound': -0.836}

{'neg': 0.075, 'neu': 0.806, 'pos': 0.12, 'compound': 0.8885}

{'neg': 0.202, 'neu': 0.702, 'pos': 0.096, 'compound': -0.9871}

{'neg': 0.159, 'neu': 0.763, 'pos': 0.078, 'compound': -0.9538}

{'neg': 0.172, 'neu': 0.729, 'pos': 0.1, 'compound': -0.9612}

{'neg': 0.103, 'neu': 0.81, 'pos': 0.087, 'compound': -0.8126}

{'neg': 0.1, 'neu': 0.818, 'pos': 0.082, 'compound': -0.5719}

{'neg': 0.129, 'neu': 0.796, 'pos': 0.076, 'compound': -0.9413}

{'neg': 0.123, 'neu': 0.778, 'pos': 0.099, 'compound': -0.8481}

{'neg': 0.118, 'neu': 0.874, 'pos': 0.008, 'compound': -0.9451}

{'neg': 0.127, 'neu': 0.79, 'pos': 0.083, 'compound': -0.8481}

{'neg': 0.155, 'neu': 0.739, 'pos': 0.105, 'compound': -0.9246}

{'neg': 0.117, 'neu': 0.815, 'pos': 0.068, 'compound': -0.9042}

{'neg': 0.124, 'neu': 0.795, 'pos': 0.081, 'compound': -0.8555}

{'neg': 0.11, 'neu': 0.795, 'pos': 0.096, 'compound': -0.7181}

{'neg': 0.097, 'neu': 0.809, 'pos': 0.095, 'compound': -0.1531}

{'neg': 0.149, 'neu': 0.771, 'pos': 0.08, 'compound': -0.9944}

{'neg': 0.184, 'neu': 0.745, 'pos': 0.072, 'compound': -0.9794}

{'neg': 0.097, 'neu': 0.797, 'pos': 0.107, 'compound': -0.1531}

{'neg': 0.097, 'neu': 0.794, 'pos': 0.109, 'compound': -0.0964}

{'neg': 0.109, 'neu': 0.817, 'pos': 0.074, 'compound': -0.9201}

{'neg': 0.101, 'neu': 0.797, 'pos': 0.102, 'compound': -0.5423}

{'neg': 0.094, 'neu': 0.803, 'pos': 0.103, 'compound': -0.0258}

{'neg': 0.129, 'neu': 0.787, 'pos': 0.084, 'compound': -0.9524}

{'neg': 0.233, 'neu': 0.73, 'pos': 0.037, 'compound': -0.9887}

{'neg': 0.15, 'neu': 0.767, 'pos': 0.083, 'compound': -0.9946}

{'neg': 0.146, 'neu': 0.774, 'pos': 0.08, 'compound': -0.9849}

{'neg': 0.143, 'neu': 0.773, 'pos': 0.085, 'compound': -0.9843}

{'neg': 0.14, 'neu': 0.733, 'pos': 0.127, 'compound': -0.81}

{'neg': 0.048, 'neu': 0.776, 'pos': 0.176, 'compound': 0.9924}

{'neg': 0.21, 'neu': 0.709, 'pos': 0.082, 'compound': -0.9848}

{'neg': 0.097, 'neu': 0.829, 'pos': 0.074, 'compound': -0.802}

{'neg': 0.136, 'neu': 0.764, 'pos': 0.099, 'compound': -0.872}

{'neg': 0.172, 'neu': 0.752, 'pos': 0.076, 'compound': -0.981}

{'neg': 0.169, 'neu': 0.704, 'pos': 0.127, 'compound': -0.9612}

{'neg': 0.158, 'neu': 0.728, 'pos': 0.113, 'compound': -0.9274}

{'neg': 0.094, 'neu': 0.821, 'pos': 0.085, 'compound': -0.5994}

{'neg': 0.187, 'neu': 0.753, 'pos': 0.06, 'compound': -0.9909}

{'neg': 0.171, 'neu': 0.755, 'pos': 0.074, 'compound': -0.9724}

{'neg': 0.219, 'neu': 0.695, 'pos': 0.086, 'compound': -0.995}

{'neg': 0.202, 'neu': 0.739, 'pos': 0.059, 'compound': -0.9946}

{'neg': 0.223, 'neu': 0.697, 'pos': 0.081, 'compound': -0.9929}

{'neg': 0.047, 'neu': 0.883, 'pos': 0.07, 'compound': 0.8765}

{'neg': 0.171, 'neu': 0.781, 'pos': 0.047, 'compound': -0.9758}

{'neg': 0.212, 'neu': 0.735, 'pos': 0.052, 'compound': -0.9985}

{'neg': 0.212, 'neu': 0.728, 'pos': 0.06, 'compound': -0.9988}

{'neg': 0.155, 'neu': 0.768, 'pos': 0.077, 'compound': -0.9848}

{'neg': 0.193, 'neu': 0.723, 'pos': 0.084, 'compound': -0.993}

{'neg': 0.171, 'neu': 0.757, 'pos': 0.072, 'compound': -0.9857}

{'neg': 0.155, 'neu': 0.741, 'pos': 0.104, 'compound': -0.9775}

{'neg': 0.17, 'neu': 0.72, 'pos': 0.111, 'compound': -0.9866}

{'neg': 0.173, 'neu': 0.761, 'pos': 0.067, 'compound': -0.9917}

{'neg': 0.214, 'neu': 0.702, 'pos': 0.084, 'compound': -0.995}

{'neg': 0.155, 'neu': 0.749, 'pos': 0.097, 'compound': -0.9691}

{'neg': 0.141, 'neu': 0.761, 'pos': 0.098, 'compound': -0.9287}

{'neg': 0.132, 'neu': 0.811, 'pos': 0.057, 'compound': -0.9595}

{'neg': 0.17, 'neu': 0.765, 'pos': 0.065, 'compound': -0.9786}

{'neg': 0.095, 'neu': 0.894, 'pos': 0.01, 'compound': -0.946}

{'neg': 0.127, 'neu': 0.809, 'pos': 0.064, 'compound': -0.9468}

{'neg': 0.131, 'neu': 0.806, 'pos': 0.063, 'compound': -0.9607}

{'neg': 0.097, 'neu': 0.841, 'pos': 0.062, 'compound': -0.7906}

{'neg': 0.137, 'neu': 0.801, 'pos': 0.062, 'compound': -0.9538}

{'neg': 0.128, 'neu': 0.792, 'pos': 0.08, 'compound': -0.8658}

{'neg': 0.121, 'neu': 0.786, 'pos': 0.093, 'compound': -0.8074}

{'neg': 0.178, 'neu': 0.756, 'pos': 0.066, 'compound': -0.9824}

{'neg': 0.172, 'neu': 0.783, 'pos': 0.045, 'compound': -0.9936}

{'neg': 0.16, 'neu': 0.751, 'pos': 0.089, 'compound': -0.9889}

{'neg': 0.204, 'neu': 0.708, 'pos': 0.089, 'compound': -0.9974}

{'neg': 0.17, 'neu': 0.719, 'pos': 0.111, 'compound': -0.9932}

{'neg': 0.22, 'neu': 0.73, 'pos': 0.051, 'compound': -0.9905}

{'neg': 0.177, 'neu': 0.786, 'pos': 0.036, 'compound': -0.9941}

{'neg': 0.135, 'neu': 0.768, 'pos': 0.097, 'compound': -0.9389}

{'neg': 0.139, 'neu': 0.779, 'pos': 0.082, 'compound': -0.9313}

{'neg': 0.196, 'neu': 0.74, 'pos': 0.064, 'compound': -0.9766}

{'neg': 0.154, 'neu': 0.775, 'pos': 0.071, 'compound': -0.8225}

{'neg': 0.174, 'neu': 0.747, 'pos': 0.079, 'compound': -0.9769}

{'neg': 0.088, 'neu': 0.858, 'pos': 0.054, 'compound': -0.8809}

{'neg': 0.176, 'neu': 0.75, 'pos': 0.075, 'compound': -0.9803}

{'neg': 0.128, 'neu': 0.778, 'pos': 0.094, 'compound': -0.9595}

{'neg': 0.125, 'neu': 0.783, 'pos': 0.092, 'compound': -0.9008}

{'neg': 0.171, 'neu': 0.766, 'pos': 0.063, 'compound': -0.9884}

{'neg': 0.167, 'neu': 0.733, 'pos': 0.1, 'compound': -0.9674}

{'neg': 0.154, 'neu': 0.764, 'pos': 0.082, 'compound': -0.9967}

{'neg': 0.114, 'neu': 0.812, 'pos': 0.074, 'compound': -0.9042}

{'neg': 0.12, 'neu': 0.85, 'pos': 0.03, 'compound': -0.9823}

{'neg': 0.119, 'neu': 0.815, 'pos': 0.065, 'compound': -0.8846}

{'neg': 0.092, 'neu': 0.82, 'pos': 0.088, 'compound': -0.128}

{'neg': 0.16, 'neu': 0.714, 'pos': 0.126, 'compound': -0.922}

{'neg': 0.197, 'neu': 0.751, 'pos': 0.053, 'compound': -0.9911}

{'neg': 0.141, 'neu': 0.764, 'pos': 0.096, 'compound': -0.986}

{'neg': 0.127, 'neu': 0.796, 'pos': 0.078, 'compound': -0.9042}

{'neg': 0.108, 'neu': 0.819, 'pos': 0.073, 'compound': -0.8519}

{'neg': 0.165, 'neu': 0.748, 'pos': 0.087, 'compound': -0.9908}

{'neg': 0.158, 'neu': 0.776, 'pos': 0.066, 'compound': -0.9824}

{'neg': 0.14, 'neu': 0.775, 'pos': 0.085, 'compound': -0.9231}

{'neg': 0.115, 'neu': 0.772, 'pos': 0.113, 'compound': -0.3612}

{'neg': 0.18, 'neu': 0.747, 'pos': 0.073, 'compound': -0.9468}

{'neg': 0.141, 'neu': 0.825, 'pos': 0.034, 'compound': -0.9893}

{'neg': 0.163, 'neu': 0.809, 'pos': 0.028, 'compound': -0.9932}

{'neg': 0.089, 'neu': 0.798, 'pos': 0.112, 'compound': 0.5423}

{'neg': 0.118, 'neu': 0.79, 'pos': 0.092, 'compound': -0.9118}

{'neg': 0.13, 'neu': 0.825, 'pos': 0.045, 'compound': -0.9853}

{'neg': 0.125, 'neu': 0.819, 'pos': 0.056, 'compound': -0.9803}

{'neg': 0.221, 'neu': 0.744, 'pos': 0.034, 'compound': -0.9912}

{'neg': 0.201, 'neu': 0.772, 'pos': 0.027, 'compound': -0.9924}

{'neg': 0.169, 'neu': 0.763, 'pos': 0.068, 'compound': -0.9806}

{'neg': 0.138, 'neu': 0.81, 'pos': 0.052, 'compound': -0.9891}

{'neg': 0.0, 'neu': 0.95, 'pos': 0.05, 'compound': 0.8481}

{'neg': 0.108, 'neu': 0.813, 'pos': 0.079, 'compound': -0.9274}

{'neg': 0.109, 'neu': 0.808, 'pos': 0.082, 'compound': -0.8957}

{'neg': 0.118, 'neu': 0.828, 'pos': 0.054, 'compound': -0.9632}

{'neg': 0.133, 'neu': 0.819, 'pos': 0.048, 'compound': -0.9787}

{'neg': 0.104, 'neu': 0.834, 'pos': 0.063, 'compound': -0.8834}

{'neg': 0.109, 'neu': 0.813, 'pos': 0.078, 'compound': -0.6908}

{'neg': 0.211, 'neu': 0.7, 'pos': 0.089, 'compound': -0.9962}

{'neg': 0.182, 'neu': 0.759, 'pos': 0.059, 'compound': -0.9975}

{'neg': 0.176, 'neu': 0.766, 'pos': 0.058, 'compound': -0.9847}

{'neg': 0.164, 'neu': 0.767, 'pos': 0.069, 'compound': -0.9953}

{'neg': 0.172, 'neu': 0.759, 'pos': 0.069, 'compound': -0.9808}

{'neg': 0.121, 'neu': 0.783, 'pos': 0.096, 'compound': -0.8429}

{'neg': 0.282, 'neu': 0.685, 'pos': 0.033, 'compound': -0.9964}

{'neg': 0.198, 'neu': 0.739, 'pos': 0.063, 'compound': -0.9987}

{'neg': 0.167, 'neu': 0.759, 'pos': 0.075, 'compound': -0.9979}

{'neg': 0.257, 'neu': 0.668, 'pos': 0.075, 'compound': -0.9939}

{'neg': 0.247, 'neu': 0.705, 'pos': 0.047, 'compound': -0.9915}

{'neg': 0.15, 'neu': 0.786, 'pos': 0.064, 'compound': -0.9783}

{'neg': 0.186, 'neu': 0.721, 'pos': 0.093, 'compound': -0.992}

{'neg': 0.1, 'neu': 0.827, 'pos': 0.074, 'compound': -0.9501}

{'neg': 0.102, 'neu': 0.778, 'pos': 0.12, 'compound': 0.7425}

{'neg': 0.012, 'neu': 0.87, 'pos': 0.118, 'compound': 0.9957}

{'neg': 0.02, 'neu': 0.861, 'pos': 0.119, 'compound': 0.9916}

{'neg': 0.039, 'neu': 0.932, 'pos': 0.028, 'compound': -0.3549}

{'neg': 0.138, 'neu': 0.765, 'pos': 0.097, 'compound': -0.9623}

{'neg': 0.0, 'neu': 0.892, 'pos': 0.108, 'compound': 0.7717}

{'neg': 0.0, 'neu': 0.879, 'pos': 0.121, 'compound': 0.8519}

{'neg': 0.031, 'neu': 0.752, 'pos': 0.217, 'compound': 0.9811}

{'neg': 0.08, 'neu': 0.851, 'pos': 0.069, 'compound': -0.3612}

{'neg': 0.0, 'neu': 0.94, 'pos': 0.06, 'compound': 0.7003}

{'neg': 0.22, 'neu': 0.733, 'pos': 0.047, 'compound': -0.9723}

{'neg': 0.148, 'neu': 0.774, 'pos': 0.078, 'compound': -0.9777}

{'neg': 0.156, 'neu': 0.751, 'pos': 0.093, 'compound': -0.9804}

{'neg': 0.199, 'neu': 0.738, 'pos': 0.063, 'compound': -0.9947}

{'neg': 0.026, 'neu': 0.848, 'pos': 0.126, 'compound': 0.9828}

{'neg': 0.145, 'neu': 0.786, 'pos': 0.069, 'compound': -0.9795}

{'neg': 0.119, 'neu': 0.823, 'pos': 0.058, 'compound': -0.9403}

{'neg': 0.12, 'neu': 0.747, 'pos': 0.133, 'compound': 0.5996}

{'neg': 0.254, 'neu': 0.676, 'pos': 0.07, 'compound': -0.9978}

{'neg': 0.126, 'neu': 0.818, 'pos': 0.056, 'compound': -0.9709}

{'neg': 0.16, 'neu': 0.706, 'pos': 0.133, 'compound': -0.9404}

{'neg': 0.186, 'neu': 0.695, 'pos': 0.119, 'compound': -0.9659}

{'neg': 0.283, 'neu': 0.652, 'pos': 0.065, 'compound': -0.999}

{'neg': 0.231, 'neu': 0.671, 'pos': 0.097, 'compound': -0.9971}

{'neg': 0.215, 'neu': 0.73, 'pos': 0.055, 'compound': -0.9924}

{'neg': 0.183, 'neu': 0.76, 'pos': 0.057, 'compound': -0.9969}

{'neg': 0.137, 'neu': 0.813, 'pos': 0.05, 'compound': -0.9884}

{'neg': 0.134, 'neu': 0.814, 'pos': 0.052, 'compound': -0.9871}

{'neg': 0.189, 'neu': 0.725, 'pos': 0.086, 'compound': -0.9833}

{'neg': 0.195, 'neu': 0.761, 'pos': 0.044, 'compound': -0.9965}

{'neg': 0.199, 'neu': 0.698, 'pos': 0.103, 'compound': -0.9936}

{'neg': 0.134, 'neu': 0.761, 'pos': 0.106, 'compound': -0.9274}

{'neg': 0.148, 'neu': 0.81, 'pos': 0.042, 'compound': -0.9747}

{'neg': 0.19, 'neu': 0.737, 'pos': 0.073, 'compound': -0.9919}

{'neg': 0.143, 'neu': 0.76, 'pos': 0.096, 'compound': -0.9538}

{'neg': 0.197, 'neu': 0.758, 'pos': 0.045, 'compound': -0.9968}

{'neg': 0.125, 'neu': 0.81, 'pos': 0.066, 'compound': -0.9451}

{'neg': 0.194, 'neu': 0.777, 'pos': 0.029, 'compound': -0.9953}

{'neg': 0.125, 'neu': 0.752, 'pos': 0.123, 'compound': -0.5817}

{'neg': 0.121, 'neu': 0.746, 'pos': 0.133, 'compound': 0.9003}

{'neg': 0.24, 'neu': 0.712, 'pos': 0.048, 'compound': -0.9978}

{'neg': 0.177, 'neu': 0.78, 'pos': 0.043, 'compound': -0.9652}

{'neg': 0.159, 'neu': 0.749, 'pos': 0.092, 'compound': -0.9974}

{'neg': 0.198, 'neu': 0.737, 'pos': 0.066, 'compound': -0.9992}

{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'compound': -0.9981}

{'neg': 0.221, 'neu': 0.764, 'pos': 0.015, 'compound': -0.9961}

{'neg': 0.155, 'neu': 0.767, 'pos': 0.078, 'compound': -0.9382}

{'neg': 0.155, 'neu': 0.807, 'pos': 0.038, 'compound': -0.9847}

{'neg': 0.164, 'neu': 0.748, 'pos': 0.087, 'compound': -0.9967}

{'neg': 0.165, 'neu': 0.733, 'pos': 0.102, 'compound': -0.974}

{'neg': 0.15, 'neu': 0.789, 'pos': 0.061, 'compound': -0.9805}

{'neg': 0.198, 'neu': 0.744, 'pos': 0.059, 'compound': -0.9973}

{'neg': 0.12, 'neu': 0.846, 'pos': 0.034, 'compound': -0.992}

{'neg': 0.147, 'neu': 0.805, 'pos': 0.048, 'compound': -0.9792}

{'neg': 0.185, 'neu': 0.757, 'pos': 0.058, 'compound': -0.991}

{'neg': 0.114, 'neu': 0.861, 'pos': 0.026, 'compound': -0.9393}

{'neg': 0.161, 'neu': 0.806, 'pos': 0.033, 'compound': -0.9951}

{'neg': 0.17, 'neu': 0.759, 'pos': 0.071, 'compound': -0.9848}

{'neg': 0.189, 'neu': 0.725, 'pos': 0.086, 'compound': -0.9858}

{'neg': 0.213, 'neu': 0.724, 'pos': 0.063, 'compound': -0.9951}

{'neg': 0.297, 'neu': 0.67, 'pos': 0.033, 'compound': -0.9964}

{'neg': 0.187, 'neu': 0.767, 'pos': 0.046, 'compound': -0.9652}

{'neg': 0.139, 'neu': 0.804, 'pos': 0.057, 'compound': -0.9949}

{'neg': 0.118, 'neu': 0.82, 'pos': 0.062, 'compound': -0.9776}

{'neg': 0.185, 'neu': 0.74, 'pos': 0.075, 'compound': -0.9886}

{'neg': 0.266, 'neu': 0.684, 'pos': 0.051, 'compound': -0.9989}

{'neg': 0.143, 'neu': 0.756, 'pos': 0.101, 'compound': -0.9916}

{'neg': 0.123, 'neu': 0.782, 'pos': 0.095, 'compound': -0.9201}

{'neg': 0.119, 'neu': 0.79, 'pos': 0.091, 'compound': -0.9117}

{'neg': 0.234, 'neu': 0.706, 'pos': 0.06, 'compound': -0.9989}

{'neg': 0.17, 'neu': 0.705, 'pos': 0.126, 'compound': -0.9925}

{'neg': 0.107, 'neu': 0.868, 'pos': 0.025, 'compound': -0.9623}

{'neg': 0.194, 'neu': 0.76, 'pos': 0.046, 'compound': -0.9906}

{'neg': 0.189, 'neu': 0.743, 'pos': 0.067, 'compound': -0.9944}

{'neg': 0.164, 'neu': 0.789, 'pos': 0.047, 'compound': -0.9988}

{'neg': 0.137, 'neu': 0.821, 'pos': 0.042, 'compound': -0.9953}

{'neg': 0.194, 'neu': 0.686, 'pos': 0.119, 'compound': -0.9706}

{'neg': 0.195, 'neu': 0.732, 'pos': 0.072, 'compound': -0.9916}

{'neg': 0.271, 'neu': 0.677, 'pos': 0.052, 'compound': -0.9963}

{'neg': 0.272, 'neu': 0.669, 'pos': 0.059, 'compound': -0.9908}

{'neg': 0.22, 'neu': 0.726, 'pos': 0.054, 'compound': -0.9946}

{'neg': 0.242, 'neu': 0.689, 'pos': 0.069, 'compound': -0.9962}

{'neg': 0.15, 'neu': 0.731, 'pos': 0.119, 'compound': -0.9516}

{'neg': 0.273, 'neu': 0.691, 'pos': 0.036, 'compound': -0.9932}

{'neg': 0.145, 'neu': 0.842, 'pos': 0.014, 'compound': -0.9753}

{'neg': 0.127, 'neu': 0.765, 'pos': 0.108, 'compound': -0.8442}

{'neg': 0.133, 'neu': 0.784, 'pos': 0.083, 'compound': -0.9666}

{'neg': 0.147, 'neu': 0.757, 'pos': 0.096, 'compound': -0.978}

{'neg': 0.177, 'neu': 0.726, 'pos': 0.097, 'compound': -0.9951}

{'neg': 0.199, 'neu': 0.731, 'pos': 0.07, 'compound': -0.995}

{'neg': 0.175, 'neu': 0.751, 'pos': 0.074, 'compound': -0.9962}

{'neg': 0.198, 'neu': 0.745, 'pos': 0.057, 'compound': -0.9989}

{'neg': 0.228, 'neu': 0.713, 'pos': 0.059, 'compound': -0.9989}

{'neg': 0.218, 'neu': 0.713, 'pos': 0.069, 'compound': -0.9989}

{'neg': 0.228, 'neu': 0.754, 'pos': 0.018, 'compound': -0.9974}

{'neg': 0.133, 'neu': 0.758, 'pos': 0.11, 'compound': -0.8771}

{'neg': 0.15, 'neu': 0.768, 'pos': 0.083, 'compound': -0.9665}

{'neg': 0.215, 'neu': 0.721, 'pos': 0.065, 'compound': -0.9978}

{'neg': 0.178, 'neu': 0.748, 'pos': 0.074, 'compound': -0.9924}

{'neg': 0.245, 'neu': 0.687, 'pos': 0.069, 'compound': -0.9989}

{'neg': 0.123, 'neu': 0.822, 'pos': 0.055, 'compound': -0.9774}

{'neg': 0.203, 'neu': 0.768, 'pos': 0.029, 'compound': -0.9981}

{'neg': 0.251, 'neu': 0.713, 'pos': 0.036, 'compound': -0.9982}

{'neg': 0.212, 'neu': 0.75, 'pos': 0.038, 'compound': -0.9965}

{'neg': 0.204, 'neu': 0.731, 'pos': 0.065, 'compound': -0.9892}

{'neg': 0.232, 'neu': 0.725, 'pos': 0.044, 'compound': -0.9949}

{'neg': 0.064, 'neu': 0.863, 'pos': 0.073, 'compound': 0.25}

{'neg': 0.071, 'neu': 0.824, 'pos': 0.105, 'compound': 0.8344}

{'neg': 0.056, 'neu': 0.853, 'pos': 0.091, 'compound': 0.9459}

{'neg': 0.191, 'neu': 0.736, 'pos': 0.073, 'compound': -0.9988}

{'neg': 0.173, 'neu': 0.766, 'pos': 0.06, 'compound': -0.9836}

{'neg': 0.014, 'neu': 0.849, 'pos': 0.136, 'compound': 0.9983}

{'neg': 0.027, 'neu': 0.823, 'pos': 0.15, 'compound': 0.9926}

{'neg': 0.078, 'neu': 0.823, 'pos': 0.1, 'compound': 0.3182}

{'neg': 0.087, 'neu': 0.831, 'pos': 0.081, 'compound': -0.7506}

{'neg': 0.216, 'neu': 0.697, 'pos': 0.087, 'compound': -0.9988}

{'neg': 0.166, 'neu': 0.811, 'pos': 0.023, 'compound': -0.9801}

{'neg': 0.21, 'neu': 0.746, 'pos': 0.044, 'compound': -0.9877}

{'neg': 0.136, 'neu': 0.826, 'pos': 0.038, 'compound': -0.9846}

{'neg': 0.186, 'neu': 0.732, 'pos': 0.081, 'compound': -0.9868}

{'neg': 0.245, 'neu': 0.711, 'pos': 0.045, 'compound': -0.9961}

{'neg': 0.227, 'neu': 0.706, 'pos': 0.066, 'compound': -0.9956}

{'neg': 0.165, 'neu': 0.766, 'pos': 0.069, 'compound': -0.983}

{'neg': 0.075, 'neu': 0.819, 'pos': 0.106, 'compound': 0.6597}

{'neg': 0.097, 'neu': 0.847, 'pos': 0.056, 'compound': -0.8807}

{'neg': 0.099, 'neu': 0.808, 'pos': 0.093, 'compound': -0.5423}

{'neg': 0.136, 'neu': 0.788, 'pos': 0.077, 'compound': -0.9118}

{'neg': 0.084, 'neu': 0.814, 'pos': 0.103, 'compound': 0.5106}

{'neg': 0.073, 'neu': 0.844, 'pos': 0.083, 'compound': 0.7906}

{'neg': 0.092, 'neu': 0.808, 'pos': 0.1, 'compound': 0.2732}

{'neg': 0.12, 'neu': 0.783, 'pos': 0.097, 'compound': -0.9139}

{'neg': 0.114, 'neu': 0.79, 'pos': 0.096, 'compound': -0.5719}

{'neg': 0.103, 'neu': 0.775, 'pos': 0.122, 'compound': 0.6486}

{'neg': 0.083, 'neu': 0.852, 'pos': 0.065, 'compound': -0.8391}

{'neg': 0.057, 'neu': 0.866, 'pos': 0.076, 'compound': 0.5106}

{'neg': 0.064, 'neu': 0.864, 'pos': 0.073, 'compound': 0.3818}

{'neg': 0.235, 'neu': 0.689, 'pos': 0.076, 'compound': -0.9933}

{'neg': 0.242, 'neu': 0.711, 'pos': 0.047, 'compound': -0.9904}

{'neg': 0.151, 'neu': 0.757, 'pos': 0.092, 'compound': -0.9901}

{'neg': 0.188, 'neu': 0.77, 'pos': 0.042, 'compound': -0.9943}

{'neg': 0.155, 'neu': 0.781, 'pos': 0.064, 'compound': -0.9689}

{'neg': 0.084, 'neu': 0.816, 'pos': 0.099, 'compound': 0.5267}

{'neg': 0.159, 'neu': 0.751, 'pos': 0.089, 'compound': -0.9648}

{'neg': 0.138, 'neu': 0.762, 'pos': 0.1, 'compound': -0.9795}

{'neg': 0.152, 'neu': 0.817, 'pos': 0.031, 'compound': -0.9921}

C. Add the four sentiment scores to the `doj_subset` dataframe to create a dataframe: `doj_subset_wscore`. Sort from highest neg to lowest neg score and print the top `id`, `contents`, and `neg` columns of the two most neg press releases. 

Notes:

- Don't worry if your sentiment score differs slightly from our output on GitHub; differences in preprocessing can lead to diff scores

In [23]:
doj_subset_wscore = doj_subset 

In [24]:
#C 

for index,row in doj_subset_wscore.iterrows():
    # Get the press release content from the 'contents' column
    press_release = row['contents']
    
    # Apply the sentiment analysis function to get sentiment scores
    sentiment_scores = get_sentiment(press_release)
    
    
    # Fill in the 'neg', 'neu', 'pos', and 'compound' columns with sentiment scores
    doj_subset_wscore.at[index, 'neg'] = sentiment_scores['pos']
    doj_subset_wscore.at[index, 'neu'] = sentiment_scores['neu']
    doj_subset_wscore.at[index, 'pos'] = sentiment_scores['pos']
    doj_subset_wscore.at[index, 'compound'] = sentiment_scores['compound']


In [25]:
doj_subset_wscore = doj_subset_wscore.sort_values(by='pos', ascending=True)
id_contents_neg = doj_subset_wscore.head(2)[['id', 'contents','neg']]
id_contents_neg

Unnamed: 0,id,contents,neg
5247,18-913,"Glenn Eugene Halfin, 64, from Grapevine, Texas, appeared today before U.S. Magistrate Judge Jeffrey L. Cureton in the U.S. District Court for the Northern District of Texas and pleaded guilty to a federal charge of interfering with an African-American family’s housing rights, announced Acting Assistant Attorney General John Gore of the Civil Rights Division and U.S. Attorney Erin Nealy Cox of the Northern District of Texas. According to court documents, Halfin threatened force, intimidated, and interfered with a family because of their race and occupancy of an apartment that was located directly above his own apartment. According to documents filed in connection with the guilty plea, on Dec. 19, 2017, Halfin purchased a baby doll at a Wal-Mart in Grapevine, Texas. He took a rope, fashioned it into a noose, and hung the baby doll from the noose. Halfin then hung the rope noose and baby doll on the railing directly in front of the only staircase the family could use to access their apartment. Halfin did so, knowing that this display would be particularly intimidating for the family who had a young daughter. In addition, the defendant referenced in his factual basis repeated intimidation of and interference with the same African-American family on other occasions. “The Justice Department will not tolerate acts of intimidation and fear, or illegal threats against any individual or family because of their race,” said Acting Assistant Attorney John Gore. “We will continue to prosecute hate crime offenders.” “No one should be afraid to go home at night,” said U.S. Attorney Erin Nealy Cox. “Our community will not tolerate crimes of intimidation or bigotry, and my office will continue to prosecute all those who persecute others based on their race, color, ethnicity, or religious beliefs.” Halfin faces a statutory maximum penalty of no more one year in federal prison and a $100,000 fine. His sentencing is scheduled for October 24. This case was investigated by the FBI and the Grapevine Police Department. The case was prosecuted by Trial Attorney Rebekah Bailey of the Civil Rights Division’s Criminal Section and Assistant United States Attorney Nicole Dana.",0.005
8075,11-1245,"WASHINGTON – Paul W. Miller, of Denham Springs, La., was convicted late yesterday of two counts of producing and one count of possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division and U.S. Attorney Donald J. Cazayoux Jr. of the Middle District of Louisiana. Miller, 44, was convicted by a federal jury following a two-day trial. U.S. District Judge James J. Brady presided over the trial. Evidence presented at trial showed that from October 2007 to May 2008, Miller sexually abused a 12-year-old girl and an 11-year-old girl and produced numerous photographs of the abuse. According to trial evidence, forensic examination of Miller’s computer revealed that Miller had used his computer to print and possess numerous images of child pornography, including both the images of child pornography he had produced and images of other child victims. Miller faces a maximum statutory sentence of 30 years in prison for each count of production of child pornography and 10 years in prison for the possession of child pornography count. The case is being prosecuted by Trial Attorney Alecia Riewerts Wolak of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorney Richard L. Bourgeois Jr. of the Middle District of Louisiana. The investigation was conducted by the FBI, the Denham Springs Police Department and the Louisiana Attorney General’s Office.",0.007


D. With the dataframe from part C, find the mean compound sentiment score for each of the three topics in `topics_clean` using group_by and agg.

E. Add a 1 sentence interpretation of why we might see the variation in scores (remember that compound is a standardized summary where -1 is most negative; +1 is most positive)


In [26]:
#D 
#Civil Rights, Hate Crimes, and Project Safe Childhood

#mean compound sentiment score for Civil Rights 
mean_civil = doj_subset_wscore.groupby(doj_subset_wscore['topics_clean'] == 'Civil Rights').agg({'compound': 'mean'})
print("the mean compound sentiment score for Civil Rights topics is:", mean_civil)

#mean compound sentiment score for Hate Crimes 
mean_hate = doj_subset_wscore.groupby(doj_subset_wscore['topics_clean'] == 'Hate Crimes')['compound'].mean()
print("the mean compound sentiment score for Hate Crimes topics is:", mean_hate)

#mean compound sentiment score for Project Safe Childhood
mean_project = doj_subset_wscore.groupby(doj_subset_wscore['topics_clean'] == 'Project Safe Childhood')['compound'].mean()
print("the mean compound sentiment score for Project Safe Childhood topics is:", mean_project)

the mean compound sentiment score for Civil Rights topics is:               compound
topics_clean          
False        -0.824946
True         -0.094465
the mean compound sentiment score for Hate Crimes topics is: topics_clean
False   -0.293930
True    -0.935971
Name: compound, dtype: float64
the mean compound sentiment score for Project Safe Childhood topics is: topics_clean
False   -0.470165
True    -0.660416
Name: compound, dtype: float64


**#E** One reason that we might see a variation in scores is that some topics may be talked about in a positive light in press releases (such as Project Safe Childhood), raising the mean compound score, whereas it would be unlikely to for a press release to talk about hate crimes in a positive way. 

# 2. Topic modeling (25 points)

For this question, use the `doj_subset_wscores` data that is restricted to civil rights, hate crimes, and project safe childhood and with the sentiment scores added


## 2.1 Preprocess the data by removing stopwords, punctuation, and non-alpha words (5 points)

A. Write a function that:

- Takes in a single raw string in the `contents` column from that dataframe
- Does the following preprocessing steps:

    - Converts the words to lowercase
    - Removes stopwords, adding the custom stopwords in your code cell below to the default stopwords list
    - Only retains alpha words (so removes digits and punctuation)
    - Only retains words 4 characters or longer
    - Uses the snowball stemmer from nltk to stem

- Returns a joined preprocessed string
    
B. Use `apply` or list comprehension to execute that function and create a new column in the data called `processed_text`
    
C. Print the `id`, `contents`, and `processed_text` columns for the following press releases:

id = 16-718 (this case: https://www.seattletimes.com/nation-world/doj-miami-police-reach-settlement-in-civil-rights-case/)

id = 16-217 (this case: https://www.wlbt.com/story/32275512/three-mississippi-correctional-officers-indicted-for-inmate-assault-and-cover-up/)
    
**Resources**:

- Here's code examples for the snowball stemmer: https://www.geeksforgeeks.org/snowball-stemmer-nlp/

In [68]:
custom_doj_stopwords = ["civil", "rights", "division", "department", "justice",
                        "office", "attorney", "district", "case", "investigation", "assistant",
                       "trial", "assistance", "assist"]
list_stopwords = stopwords.words("english")
list_stopwords_new = list_stopwords + custom_doj_stopwords
## initialize stemmer
snow_stemmer = SnowballStemmer(language='english')

In [91]:
#A 
def preprocess(string): 
    try:
        if not isinstance(string, str):
            string = string.to_string()
        string_lower = string.lower()
        nostop_string = [word 
                    for word in wordpunct_tokenize(string_lower) 
                    if word not in list_stopwords_new 
                        and word.isalpha()
                        and len(word) > 3]
        clean_string = [snow_stemmer.stem(word) for word in nostop_string]
        clean_string_str = " ".join(clean_string)
        return(clean_string_str)
    except:
        return("")
    

In [90]:
# testing the function on one contents instance
preprocess(doj_subset_wscore.iloc[1].contents)

'washington paul miller denham spring convict late yesterday count produc count possess child pornographi announc general lanni breuer crimin donald cazayoux middl louisiana miller convict feder juri follow judg jame bradi presid evid present show octob miller sexual abus year girl year girl produc numer photograph abus accord evid forens examin miller comput reveal miller use comput print possess numer imag child pornographi includ imag child pornographi produc imag child victim miller face maximum statutori sentenc year prison count product child pornographi year prison possess child pornographi count prosecut alecia riewert wolak crimin child exploit obscen section ceo richard bourgeoi middl louisiana conduct denham spring polic louisiana general'

In [95]:
#B
cleaned_strings = [preprocess(string) for string in 
                   doj_subset_wscore.contents]
doj_subset_wscore['processed_text'] = cleaned_strings
doj_subset_wscore

Unnamed: 0,id,title,contents,date,topics_clean,components_clean,neg,neu,pos,compound,processed_text
5247,18-913,Grapevine Texas Man Pleads Guilty to Federal Hate Crime Against an African-American Family,"Glenn Eugene Halfin, 64, from Grapevine, Texas, appeared today before U.S. Magistrate Judge Jeffrey L. Cureton in the U.S. District Court for the Northern District of Texas and pleaded guilty to a federal charge of interfering with an African-American family’s housing rights, announced Acting Assistant Attorney General John Gore of the Civil Rights Division and U.S. Attorney Erin Nealy Cox of the Northern District of Texas. According to court documents, Halfin threatened force, intimidated, and interfered with a family because of their race and occupancy of an apartment that was located directly above his own apartment. According to documents filed in connection with the guilty plea, on Dec. 19, 2017, Halfin purchased a baby doll at a Wal-Mart in Grapevine, Texas. He took a rope, fashioned it into a noose, and hung the baby doll from the noose. Halfin then hung the rope noose and baby doll on the railing directly in front of the only staircase the family could use to access their apartment. Halfin did so, knowing that this display would be particularly intimidating for the family who had a young daughter. In addition, the defendant referenced in his factual basis repeated intimidation of and interference with the same African-American family on other occasions. “The Justice Department will not tolerate acts of intimidation and fear, or illegal threats against any individual or family because of their race,” said Acting Assistant Attorney John Gore. “We will continue to prosecute hate crime offenders.” “No one should be afraid to go home at night,” said U.S. Attorney Erin Nealy Cox. “Our community will not tolerate crimes of intimidation or bigotry, and my office will continue to prosecute all those who persecute others based on their race, color, ethnicity, or religious beliefs.” Halfin faces a statutory maximum penalty of no more one year in federal prison and a $100,000 fine. His sentencing is scheduled for October 24. This case was investigated by the FBI and the Grapevine Police Department. The case was prosecuted by Trial Attorney Rebekah Bailey of the Civil Rights Division’s Criminal Section and Assistant United States Attorney Nicole Dana.",2018-07-12T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.005,0.806,0.005,-0.9955,glenn eugen halfin grapevin texa appear today magistr judg jeffrey cureton court northern texa plead guilti feder charg interf african american famili hous announc act general john gore erin neali northern texa accord court document halfin threaten forc intimid interf famili race occup apart locat direct apart accord document file connect guilti plea halfin purchas babi doll mart grapevin texa took rope fashion noos hung babi doll noos halfin hung rope noos babi doll rail direct front staircas famili could access apart halfin know display would particular intimid famili young daughter addit defend referenc factual basi repeat intimid interfer african american famili occas toler act intimid fear illeg threat individu famili race said act john gore continu prosecut hate crime offend afraid home night said erin neali communiti toler crime intimid bigotri continu prosecut persecut other base race color ethnic religi belief halfin face statutori maximum penalti year feder prison fine sentenc schedul octob investig grapevin polic prosecut rebekah bailey crimin section unit state nicol dana
8075,11-1245,Louisiana Man Convicted of Producing and Possessing Child Pornography,"WASHINGTON – Paul W. Miller, of Denham Springs, La., was convicted late yesterday of two counts of producing and one count of possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division and U.S. Attorney Donald J. Cazayoux Jr. of the Middle District of Louisiana. Miller, 44, was convicted by a federal jury following a two-day trial. U.S. District Judge James J. Brady presided over the trial. Evidence presented at trial showed that from October 2007 to May 2008, Miller sexually abused a 12-year-old girl and an 11-year-old girl and produced numerous photographs of the abuse. According to trial evidence, forensic examination of Miller’s computer revealed that Miller had used his computer to print and possess numerous images of child pornography, including both the images of child pornography he had produced and images of other child victims. Miller faces a maximum statutory sentence of 30 years in prison for each count of production of child pornography and 10 years in prison for the possession of child pornography count. The case is being prosecuted by Trial Attorney Alecia Riewerts Wolak of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorney Richard L. Bourgeois Jr. of the Middle District of Louisiana. The investigation was conducted by the FBI, the Denham Springs Police Department and the Louisiana Attorney General’s Office.",2011-09-22T00:00:00-04:00,Project Safe Childhood,Criminal Division,0.007,0.887,0.007,-0.9565,washington paul miller denham spring convict late yesterday count produc count possess child pornographi announc general lanni breuer crimin donald cazayoux middl louisiana miller convict feder juri follow judg jame bradi presid evid present show octob miller sexual abus year girl year girl produc numer photograph abus accord evid forens examin miller comput reveal miller use comput print possess numer imag child pornographi includ imag child pornographi produc imag child victim miller face maximum statutori sentenc year prison count product child pornographi year prison possess child pornographi count prosecut alecia riewert wolak crimin child exploit obscen section ceo richard bourgeoi middl louisiana conduct denham spring polic louisiana general
8353,11-1331,Massachusetts Man Pleads Guilty to Receiving and Possessing Child Pornography,"WASHINGTON – A Springfield, Mass., man pleaded guilty today to receiving and possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division. Robert Rosenbeck, 48, pleaded guilty before U.S. District Judge Denise J. Casper in Boston to one count of receipt of child pornography and two counts of possession of child pornography. He was indicted on those charges on Dec. 10, 2009. According to court documents, Rosenbeck possessed two different computers containing child pornography in 2007. Additionally, from approximately July 22, 2007, to July 25, 2007, Rosenbeck received computer files containing child pornography from an Internet website. At sentencing, scheduled for Dec. 5, 2011, Rosenbeck faces a maximum statutory sentence of 20 years in prison for the receipt of child pornography count and 10 years in prison for each count of possession of child pornography. Rosenbeck also faces a term of supervised release of at least five years and up to life. The case is being prosecuted by Trial Attorneys Alecia Riewerts Wolak and Michael W. Grant of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS). The investigation was conducted by the FBI with assistance provided by the Springfield Police Department.",2011-10-06T00:00:00-04:00,Project Safe Childhood,Criminal Division,0.008,0.874,0.008,-0.9451,washington springfield mass plead guilti today receiv possess child pornographi announc general lanni breuer crimin robert rosenbeck plead guilti judg denis casper boston count receipt child pornographi count possess child pornographi indict charg accord court document rosenbeck possess differ comput contain child pornographi addit approxim juli juli rosenbeck receiv comput file contain child pornographi internet websit sentenc schedul rosenbeck face maximum statutori sentenc year prison receipt child pornographi count year prison count possess child pornographi rosenbeck also face term supervis releas least five year life prosecut attorney alecia riewert wolak michael grant crimin child exploit obscen section ceo conduct provid springfield polic
7618,17-242,"Justice Department Sues Edmonds, Washington Landlords for Discriminating Against Families With Children","The U.S. Department of Justice today filed a lawsuit in U.S. District Court for the Western District of Washington alleging that the owners and manager of three Edmonds, Washington apartment buildings refused to rent their apartments to families with children, in violation of the Fair Housing Act. “The Fair Housing Act prohibits landlords from denying apartments to families just because they have children,” said Acting Assistant Attorney General Tom Wheeler of the Justice Department’s Civil Rights Division. “Many families already face challenges finding affordable housing, and they should not also have to deal with unlawful discrimination.” “Equal access to housing is essential for all Americans, including families with young children,” said U.S. Attorney Annette L. Hayes of the Western District of Washington. “Particularly in our tight housing market, landlords must follow the law and make units available without discrimination based on race, color, religion, sex, national origin, disability or familial status.” The complaint concerns three apartment buildings – located at 201 5th Ave. N., 621 5th Ave. S., and 401 Pine Street in Edmonds – that are managed by defendant Debbie A. Appleby, of Stanwood, Washington. The properties are owned by three Limited Liability Corporations (LLCs) controlled by Appleby – Apple One, LLC, Apple Two, LLC, and Apple Three, LLC—which are also named as defendants in the suit. The complaint alleges that in March 2014, defendant Appleby told a woman seeking an apartment for herself, her husband, and their one year old child that the apartment buildings were “adult only” and therefore not available to her family. The complaint also alleges that at various other times from April 2014 to November 2015, defendants advertised their available apartments as being restricted to adults only. The family filed a complaint with the Department of Housing and Urban Development (“HUD”) which conducted an investigation, issued a charge of discrimination against the defendants, and referred the case to the Department of Justice. The complaint seeks a court order requiring defendants to cease their discriminatory housing practices, damages for the family that filed the HUD complaint and any other families against whom the defendants discriminated against because they had children, and civil penalties. Any individuals who have information relevant to this case are encouraged to contact the Civil Rights Division at 1-800-896-7743, Option 96. The federal Fair Housing Act prohibits discrimination in housing on the basis of race, color, religion, sex, familial status, national origin and disability. More information about the Civil Rights Division and the civil rights laws it enforces is available at www.usdoj.gov/crt and https://www.justice.gov/usao-wdwa/civil-rights. Individuals who believe that they have been victims of housing discrimination may call the Justice Department at 1-800-896-7743, email the Justice Department at fairhousing@usdoj.gov, or contact HUD at 1-800-669-9777 or through its website at www.hud.gov. The case is being jointly handled by the Department’s Civil Rights Division and the U.S. Attorney’s Office for the Western District of Washington. The complaint is an allegation of unlawful conduct. The allegations must still be proven in federal court.",2017-03-03T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Housing and Civil Enforcement Section; USAO - Washington, Western",0.009,0.912,0.009,-0.9753,today file lawsuit court western washington alleg owner manag three edmond washington apart build refus rent apart famili children violat fair hous fair hous prohibit landlord deni apart famili children said act general wheeler mani famili alreadi face challeng find afford hous also deal unlaw discrimin equal access hous essenti american includ famili young children said annett hay western washington particular tight hous market landlord must follow make unit avail without discrimin base race color religion nation origin disabl famili status complaint concern three apart build locat pine street edmond manag defend debbi applebi stanwood washington properti own three limit liabil corpor llcs control applebi appl appl appl three also name defend suit complaint alleg march defend applebi told woman seek apart husband year child apart build adult therefor avail famili complaint also alleg various time april novemb defend advertis avail apart restrict adult famili file complaint hous urban develop conduct issu charg discrimin defend refer complaint seek court order requir defend ceas discriminatori hous practic damag famili file complaint famili defend discrimin children penalti individu inform relev encourag contact option feder fair hous prohibit discrimin hous basi race color religion famili status nation origin disabl inform law enforc avail usdoj https usao wdwa individu believ victim hous discrimin call email fairhous usdoj contact websit joint handl western washington complaint alleg unlaw conduct alleg must still proven feder court
8847,17-550,Mother Sentenced to 26 Months in Prison for Taking Child from Illinois to Canada in International Parental Kidnapping Case,"A Canadian woman was sentenced to serve 26 months in prison following her December conviction for international parental kidnapping, announced Acting Assistant Attorney General Kenneth A. Blanco of the Justice Department’s Criminal Division and Acting U.S. Attorney Patrick D. Hansen of the Central District of Illinois. Sarah M. Nixon, 48, of Montreal, Canada, was sentenced before U.S. District Judge Colin S. Bruce of the Central District of Illinois. On Dec. 21, 2016, a federal jury found Nixon guilty of one count of international parental kidnapping for taking her minor child from the United States to Canada in July 2015, with the intent to obstruct the lawful exercise of the father’s rights. Evidence at trial established that after a custody trial where it was apparent that Nixon would lose custody of her six-year-old daughter, Nixon fled the United States with the child in the middle of the night. When she did not appear for the custody ruling and neither she nor her daughter could be located, law enforcement issued a child abduction alert. Nixon and the child were eventually located in a farmhouse in rural Ontario, Canada. Authorities then returned the child to the father. Nixon was arrested in New York on Sept. 20, 2015, as she attempted to return to the United States. Trial Attorneys Elly M. Peirson and Lauren S. Kupersmith of the Criminal Division’s Child Exploitation and Obscenity Section prosecuted the case. The FBI; Urbana, Illinois, Police Department; University of Illinois Police Department; Illinois Department of Children and Family Services; Ontario Provincial Police; and U.S. Customs and Border Protection investigated the case, with assistance from the Champaign County, Illinois, State’s Attorney’s Office and the Criminal Division’s Office of International Affairs.",2017-05-19T00:00:00-04:00,Project Safe Childhood,"Criminal Division; Criminal - Child Exploitation and Obscenity Section; USAO - Illinois, Central",0.010,0.894,0.010,-0.9460,canadian woman sentenc serv month prison follow decemb convict intern parent kidnap announc act general kenneth blanco crimin act patrick hansen central illinoi sarah nixon montreal canada sentenc judg colin bruce central illinoi feder juri found nixon guilti count intern parent kidnap take minor child unit state canada juli intent obstruct law exercis father evid establish custodi appar nixon would lose custodi year daughter nixon fled unit state child middl night appear custodi rule neither daughter could locat enforc issu child abduct alert nixon child eventu locat farmhous rural ontario canada author return child father nixon arrest york sept attempt return unit state attorney elli peirson lauren kupersmith crimin child exploit obscen section prosecut urbana illinoi polic univers illinoi polic illinoi children famili servic ontario provinci polic custom border protect investig champaign counti illinoi state crimin intern affair
...,...,...,...,...,...,...,...,...,...,...,...
6089,18-119,Justice Department Announces Religious Liberty Update to U.S. Attorneys’ Manual and Directs the Designation of Religious Liberty Point of Contact for All U.S. Attorney's Offices,"The Department of Justice today announced the update of the United States Attorneys’ Manual (USAM) with a new section titled, “Associate Attorney General’s Approval and Notice Requirements for Issues Implicating Religious Liberty.” On Oct. 6, 2017, the Attorney General issued a Memorandum for All Executive Departments and Agencies entitled Federal Law Protections for Religious Liberty. The memo directed components and United States Attorney’s Offices to use the guidance in litigation, advice to the Executive Branch, operations, grants, and all other aspects of the Department’s work. In order to ensure compliance with the Attorney General’s memo, the USAM will be updated with language that directs relevant Department of Justice components to: The updated USAM will also instruct relevant Justice Department components to consult the 20 religious liberty principles laid out in the Attorney General’s October 6 memo when considering whether the notice or approval requirements are initiated. In order to fully effectuate the approval and notice requirements in the updated USAM, the Department will instruct all U.S. Attorneys to designate a point of contact to lead these efforts for their office. “Religious liberty is an inalienable right protected by the Constitution, and defending it is one of the most important things we do at the Department of Justice,” said Associate Attorney General Rachel Brand. At President Trump's direction, Attorney General Sessions issued a robust and clear guidance document in October that clearly explains how the federal government is to apply the religious liberty protections currently on the books. The requirement that each of the U.S. Attorney offices designate a religious liberty point of contact will ensure that the Attorney General’s Memorandum is effectively implemented. The designees will be responsible for working directly with the leadership offices on civil cases related to religious liberty, ensuring that these cases receive the rigorous attention they deserve.",2018-01-31T00:00:00-05:00,Civil Rights,Civil Rights Division; Office of the Associate Attorney General,0.199,0.788,0.199,0.9944,today announc updat unit state attorney manual usam section titl associ general approv notic requir issu implic religi liberti general issu memorandum execut depart agenc entitl feder protect religi liberti memo direct compon unit state offic guidanc litig advic execut branch oper grant aspect work order ensur complianc general memo usam updat languag direct relev compon updat usam also instruct relev compon consult religi liberti principl laid general octob memo consid whether notic approv requir initi order fulli effectu approv notic requir updat usam instruct attorney design point contact lead effort religi liberti inalien right protect constitut defend import thing said associ general rachel brand presid trump direct general session issu robust clear guidanc document octob clear explain feder govern appli religi liberti protect current book requir offic design religi liberti point contact ensur general memorandum effect implement designe respons work direct leadership offic case relat religi liberti ensur case receiv rigor attent deserv
6733,16-1321,"Justice Department Reaches Agreement with City of Yonkers, New York, to Enhance Police Department Policies and Procedures","The Justice Department announced today that it has reached an agreement with the city of Yonkers, New York, and the Yonkers Police Department (YPD) to resolve the department’s investigation of YPD and ensure constitutional policing. The agreement is the result of the department’s investigation of YPD under the Violent Crime Control and Law Enforcement Act of 1994 and the Omnibus Crime Control and Safe Streets Act of 1968. In June 2009, the United States sent the city a technical assistance letter that identified necessary reforms to YPD practices and policies in the areas of use of force, civilian complaints, investigations, supervisory oversight and training. After receiving the department’s technical assistance letter, the city and YPD made substantial changes to its policies and procedures. This agreement implements and further improves those policies and procedures and addresses the department’s remaining concerns. “This agreement will ensure that the Yonkers Police Department continues to advance constitutional, effective and community-oriented policing,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “Through clear policy guidance, data analysis and accountability systems, we believe these reforms will make the entire community safer and strengthen public trust in the police.” “This agreement ensures that the Yonkers Police Department polices in a way that keeps its citizens safe, while protecting their constitutional rights,” said U.S. Attorney Preet Bharara of the Southern District of New York. “The measures put in place with this agreement, including clear and reasonable use-of-force policies and guidance on how to properly evaluate and respond to use-of-force incidents, will make Yonkers safer for citizens and police alike. We thank the Yonkers Police Department and the city of Yonkers for cooperating with our investigation, and for joining our effort to ensure that the Yonkers Police Department protects its citizens not only from physical harm, but also from violations of their constitutional rights.” The agreement is carefully tailored to address the department’s remaining concerns while also taking into account and seeking to build upon the positive reforms YPD has already made following the department’s investigation. Under the agreement, the YPD will, among other things: The agreement also provides that consultants retained by the department will conduct compliance reviews to ensure that YPD has implemented the measures required by the agreement and issue public reports of those compliance reviews. This case is being handled by the Civil Rights Division’s Special Litigation Section and the U.S. Attorney’s Office of the Southern District of New York. Yonkers Police Department Agreement",2016-11-14T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Special Litigation Section; USAO - New York, Southern",0.199,0.778,0.199,0.9950,announc today reach agreement citi yonker york yonker polic resolv ensur constitut polic agreement result violent crime control enforc omnibus crime control safe street june unit state sent citi technic letter identifi necessari reform practic polici area forc civilian complaint investig supervisori oversight train receiv technic letter citi made substanti chang polici procedur agreement implement improv polici procedur address remain concern agreement ensur yonker polic continu advanc constitut effect communiti orient polic said princip deputi general vanita gupta head clear polici guidanc data analysi account system believ reform make entir communiti safer strengthen public trust polic agreement ensur yonker polic polic keep citizen safe protect constitut said preet bharara southern york measur place agreement includ clear reason forc polici guidanc proper evalu respond forc incid make yonker safer citizen polic alik thank yonker polic citi yonker cooper join effort ensur yonker polic protect citizen physic harm also violat constitut agreement care tailor address remain concern also take account seek build upon posit reform alreadi made follow agreement among thing agreement also provid consult retain conduct complianc review ensur implement measur requir agreement issu public report complianc review handl special litig section southern york yonker polic agreement
6905,16-740,"Justice Department Reaches Settlement to Reform Criminal Justice System in Hinds County, Mississippi","The Justice Department today reached a landmark settlement agreement to reform the criminal justice system in Hinds County, Mississippi. The agreement resolves the department’s findings that the Hinds County Adult Detention Center and the Jackson City Detention Center – which together form the Hinds County Jail – failed to protect prisoners from violence and excessive force and held them past their court-ordered release dates, in violation of the Civil Rights of Institutionalized Persons Act (CRIPA). The settlement agreement is the first of its kind to incorporate broader criminal justice system reform through diversion at the front end and reentry to the community after incarceration. It creates a criminal justice coordinating committee that will help ensure the county’s systems operate effectively and efficiently, develop interventions to divert individuals in appropriate cases from arrest, detention and incarceration, and engage in community outreach. To promote successful reentry, the agreement includes mechanisms for notifying community health providers when a person with serious mental illness is released to help the person transition safely back to the community. The agreement also addresses unlawful enforcement of court-ordered fines and fees by ensuring that the county cannot incarcerate an individual for non-payment if the court does not first assess whether the individual is indigent. “Across the board, this settlement will make the Hinds County criminal justice system smarter and fairer,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “If implemented, these reforms will make pretrial detainees, prisoners, corrections staff and the entire community safer, while also ensuring that vulnerable individuals get access to the treatment, care and community services they need and deserve. We commend the county for its commitment to making these reforms a reality.” “For too long, the conditions in the Jail have posed a serious challenge to law enforcement and the safety of our community,” said U.S. Attorney Gregory K. Davis of the Southern District of Mississippi. “I appreciate the commitment made by Hinds County officials to turn the page and begin making necessary reforms.” The settlement agreement – subject to approval by the U.S. District Court of the Southern District of Mississippi – requires the county to implement a series of reforms across various stages of the criminal justice system, including the following: Together these reforms aim to improve communication and coordination among criminal justice entities and community service providers to help individuals with mental illness transition back to the community and to reduce recidivism. If approved by the federal district court, an independent monitor will be appointed to assess the county’s compliance. In May 2015, the Justice Department completed a comprehensive investigation – which included on-site inspections, document reviews and stakeholder interviews by department experts and staff – and issued a findings letter that determined that Hinds County Adult Detention Center and the Jackson City Detention Center violated CRIPA by failing to protect prisoners from violence by other prisoners and from improper use of force by staff. The department also found that inadequate staffing and training, a backlog in record filing and a lack of centralized information resulted in prisoners being held beyond court-ordered release dates. CRIPA authorizes the department to seek a remedy for a pattern or practice of conduct that violates the constitutional rights of persons confined in a jail, prison or other correctional facility. For more information on the Civil Rights Division’s work in this area, please visit www.justice.gov/crt. Hinds County Settlement Agreement Hinds County Fact Sheet",2016-06-23T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Special Litigation Section; USAO - Mississippi, Southern",0.205,0.659,0.205,0.9888,today reach landmark settlement agreement reform crimin system hind counti mississippi agreement resolv find hind counti adult detent center jackson citi detent center togeth form hind counti jail fail protect prison violenc excess forc held past court order releas date violat institution person cripa settlement agreement first kind incorpor broader crimin system reform divers front reentri communiti incarcer creat crimin coordin committe help ensur counti system oper effect effici develop intervent divert individu appropri case arrest detent incarcer engag communiti outreach promot success reentri agreement includ mechan notifi communiti health provid person serious mental ill releas help person transit safe back communiti agreement also address unlaw enforc court order fine fee ensur counti cannot incarcer individu payment court first assess whether individu indig across board settlement make hind counti crimin system smarter fairer said princip deputi general vanita gupta head implement reform make pretrial detaine prison correct staff entir communiti safer also ensur vulner individu access treatment care communiti servic need deserv commend counti commit make reform realiti long condit jail pose serious challeng enforc safeti communiti said gregori davi southern mississippi appreci commit made hind counti offici turn page begin make necessari reform settlement agreement subject approv court southern mississippi requir counti implement seri reform across various stage crimin system includ follow togeth reform improv communic coordin among crimin entiti communiti servic provid help individu mental ill transit back communiti reduc recidiv approv feder court independ monitor appoint assess counti complianc complet comprehens includ site inspect document review stakehold interview expert staff issu find letter determin hind counti adult detent center jackson citi detent center violat cripa fail protect prison violenc prison improp forc staff also found inadequ staf train backlog record file lack central inform result prison held beyond court order releas date cripa author seek remedi pattern practic conduct violat constitut person confin jail prison correct facil inform work area pleas visit hind counti settlement agreement hind counti fact sheet
11066,16-163,"Statement from Head of the Civil Rights Division Vanita Gupta Regarding Ferguson, Missouri, City Council Vote on Proposed Consent Decree","Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division, released the following statement regarding the Ferguson, Missouri, City Council vote on the proposed consent decree with the Department of Justice: “The Ferguson City Council has attempted to unilaterally amend the negotiated agreement. Their vote to do so creates an unnecessary delay in the essential work to bring constitutional policing to the city, and marks an unfortunate outcome for concerned community members and Ferguson police officers. Both parties engaged in thoughtful negotiations over many months to create an agreement with cost-effective remedies that would ensure Ferguson brings policing and court practices in line with the Constitution. The agreement already negotiated by the department and the city will provide Ferguson residents a police department and municipal court that fully respects civil rights and operates free from racial discrimination. “The Department of Justice will take the necessary legal actions to ensure that Ferguson’s policing and court practices comply with the Constitution and relevant federal laws.”",2016-02-10T00:00:00-05:00,Civil Rights,Civil Rights Division; Civil Rights - Special Litigation Section,0.217,0.752,0.217,0.9811,princip deputi general vanita gupta head releas follow statement regard ferguson missouri citi council vote propos consent decre ferguson citi council attempt unilater amend negoti agreement vote creat unnecessari delay essenti work bring constitut polic citi mark unfortun outcom concern communiti member ferguson polic offic parti engag thought negoti mani month creat agreement cost effect remedi would ensur ferguson bring polic court practic line constitut agreement alreadi negoti citi provid ferguson resid polic municip court fulli respect oper free racial discrimin take necessari legal action ensur ferguson polic court practic compli constitut relev feder law


In [101]:
## your code showing the examples

doj_subset_wscore[(doj_subset_wscore['id'] == '16-718') | (doj_subset_wscore['id'] == '16-217')][['id', 'contents', 'processed_text']]


Unnamed: 0,id,contents,processed_text
11593,16-718,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",nine count indict unseal today mississippi correct offic charg beat inmat third charg help cover indict charg lawardrick marsher robert sturdiv offic mississippi state penitentiari parchman mississippi beat includ kick punch throw victim ground marsher sturdiv charg violat right convict prison free cruel unusu punish sturdiv also charg fail interven marsher punch beat indict alleg action involv danger weapon result bodili injuri victim third offic deont pate charg along marsher sturdiv conspir cover beat indict alleg three offic submit fals report three lie convict marsher sturdiv face maximum sentenc year prison excess forc charg three offic face five year prison conspiraci fals statement charg year prison fals report charg indict mere accus defend presum innoc unless proven guilti investig jackson cooper mississippi correct prosecut robert coleman northern mississippi dana mulhaus crimin section marsher indict
6727,16-217,"The Justice Department has reached a comprehensive settlement agreement with the city of Miami and the Miami Police Department (MPD) resolving the Justice Department’s investigation of officer-involved shootings by MPD officers, announced Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division and U.S. Attorney Wifredo A. Ferrer of the Southern District of Florida. The settlement, which was approved by Miami’s city commission today and will go into effect when the agreement is signed by all parties, resolves claims stemming from the Justice Department’s investigation into officer-involved shootings by MPD officers, which was conducted under the Violent Crime Control and Law Enforcement Act of 1994. The investigation’s findings, issued in July 2013, identified a pattern or practice of excessive use of force through officer-involved shootings in violation of the Fourth Amendment of the Constitution. The city’s compliance with the settlement will be monitored by an independent reviewer, former Tampa, Florida, Police Chief Jane Castor. Under the settlement agreement, the city will implement comprehensive reforms to ensure constitutional policing and support public trust. The settlement agreement is designed to minimize officer-involved shootings and to more effectively and quickly investigate officer-involved shootings that do occur, through measures that include: “This settlement represents a renewed commitment by the city of Miami and Chief Rodolfo Llanes to provide constitutional policing for Miami residents and to protect public safety through sustainable reform,” said Principal Deputy Assistant Attorney General Gupta. “The agreement will help to strengthen the relationship between the MPD and the communities they serve by improving accountability for officers who fire their weapons unlawfully, and provides for community participation in the enforcement of this agreement.” “Today's agreement is the result of a joint effort between the Department of Justice and the City of Miami to ensure that the Miami Police Department continues its efforts to make our community safe while protecting the sacred Constitutional rights of all of our citizens,” said U.S. Attorney Ferrer. “Through oversight and communication, the agreement seeks to make permanent the positive changes that former Chief Orosa and Chief Llanes have made, and we applaud the City Commission’s vote.” The settlement agreement builds upon important reforms implemented by the city since the Justice Department issued its findings, including: The investigation was conducted by attorneys and staff from the Civil Rights Division’s Special Litigation Section and the Civil Division of the U. S. Attorney’s Office of the Southern District of Florida.",reach comprehens settlement agreement citi miami miami polic resolv offic involv shoot offic announc princip deputi general vanita gupta head wifredo ferrer southern florida settlement approv miami citi commiss today effect agreement sign parti resolv claim stem offic involv shoot offic conduct violent crime control enforc find issu juli identifi pattern practic excess forc offic involv shoot violat fourth amend constitut citi complianc settlement monitor independ review former tampa florida polic chief jane castor settlement agreement citi implement comprehens reform ensur constitut polic support public trust settlement agreement design minim offic involv shoot effect quick investig offic involv shoot occur measur includ settlement repres renew commit citi miami chief rodolfo llane provid constitut polic miami resid protect public safeti sustain reform said princip deputi general gupta agreement help strengthen relationship communiti serv improv account offic fire weapon unlaw provid communiti particip enforc agreement today agreement result joint effort citi miami ensur miami polic continu effort make communiti safe protect sacr constitut citizen said ferrer oversight communic agreement seek make perman posit chang former chief orosa chief llane made applaud citi commiss vote settlement agreement build upon import reform implement citi sinc issu find includ conduct attorney staff special litig section southern florida


## 2.2 Create a document-term matrix from the preprocessed press releases and to explore top words (5 points)

A. Use the `create_dtm` function I provide (alternately, feel free to write your own!) and create a document-term matrix using the preprocessed press releases; make sure metadata contains the following columns: `id`, `compound` sentiment column you added, and the `topics_clean` column

B. Print the top 10 words for press releases with compound sentiment in the top 5% (so the most positive sentiment)

C. Print the top 10 words for press releases with compound sentiment in the bottom 5% (so the most negative sentiment)

**Hint**: for these, remember the pandas quantile function from pset one.  

D. Print the top 10 words for press releases in each of the three `topics_clean`

For steps B - D, to receive full credit, write a function `get_topwords` that helps you avoid duplicated code when you find top words for the different subsets of the data. There are different ways to structure it but one way is to feed it subsetted data (so data subsetted to one topic etc.) and for it to get the top words for that subset.


In [167]:

def create_dtm(list_of_strings, metadata):
    vectorizer = CountVectorizer(lowercase = True)
    dtm_sparse = vectorizer.fit_transform(list_of_strings)
    dtm_dense_named = pd.DataFrame(dtm_sparse.todense(), 
        columns=vectorizer.get_feature_names_out()) #had to edit from get_feature_names because version difference
    dtm_dense_named_withid = pd.concat([metadata.reset_index(), dtm_dense_named], axis = 1)
    return(dtm_dense_named_withid)

In [178]:
# your code here

doj_subset_wscore.rename(columns={'compound': 'Compound'}, inplace=True)

doj_subset_wscore.rename_axis('Index', inplace=True)

dtm_doj = create_dtm(list_of_strings= doj_subset_wscore.processed_text,
                metadata = 
                doj_subset_wscore[['id','Compound', 'topics_clean']])



In [179]:
list1 = dtm_doj.keys().to_list()

In [180]:
seen = set()
dupes = [x for x in list1 if x in seen or seen.add(x)]    
dupes

[]

In [204]:
def get_topwords(data, subset, decimal=None, specific_value=None):
    if decimal is not None:
        press = data[data[subset] > data[subset].quantile(decimal)]
    else:
        press = data
        
    if specific_value is not None:
        press = press[press[subset] == specific_value]
        
    top_terms = press[press.columns[4:]].sum(axis=0)
    return top_terms.sort_values(ascending=False).head(10)

In [205]:
#B

top_positive_words = get_topwords(data=dtm_doj, subset='Compound', decimal=0.95)
top_positive_words

agreement     175
enforc        121
state         118
ensur         110
disabl        107
communiti      97
court          91
servic         90
student        87
settlement     87
dtype: int64

In [207]:
#C

top_negative_words = get_topwords(data=dtm_doj, subset='Compound', decimal=0.05)
top_negative_words

child       1079
feder       1070
victim       996
prosecut     948
sentenc      904
said         896
state        881
charg        857
year         834
general      817
dtype: int64

In [203]:
dtm_doj.topics_clean.unique()

array(['Hate Crimes', 'Project Safe Childhood', 'Civil Rights'],
      dtype=object)

In [214]:
#D

top_hatecrime_words = get_topwords(data=dtm_doj, subset='topics_clean', specific_value='Hate Crimes')
top_hatecrime_words

top_childhood_words = get_topwords(data=dtm_doj, subset='topics_clean', specific_value='Project Safe Childhood')
top_childhood_words

top_civilrights_words = get_topwords(data=dtm_doj, subset='topics_clean', specific_value='Civil Rights')
top_civilrights_words

victim      591
crime       557
hate        524
defend      484
prosecut    478
charg       463
sentenc     455
american    451
feder       432
guilti      430
dtype: int64

child          1022
exploit         701
sexual          572
safe            479
childhood       474
project         472
pornographi     452
children        423
crimin          405
prosecut        374
dtype: int64

offic        637
hous         633
discrimin    616
enforc       544
disabl       532
said         497
feder        479
violat       477
state        452
court        414
dtype: int64

## 2.3 Estimate a topic model using those preprocessed words (5 points)

A. Going back to the preprocessed words from part 2.3.1, estimate a topic model with 3 topics, since you want to see if the unsupervised topic models recover different themes for each of the three manually-labeled areas (civil rights; hate crimes; project safe childhood). You have free rein over the other topic model parameters beyond the number of topics.

B. After estimating the topic model, print the top 15 words in each topic.

**Hints and Resources**:

- Same topic modeling resources linked to above
- Make sure to use the `random_state` argument within the model so that the numbering of topics does not move around between runs of your code

In [224]:
# A
## Step 1: re-tokenize and store in list using the preprocessed texts

text_raw_tokens = [wordpunct_tokenize(one_text) for one_text in 
                  doj_subset_wscore['processed_text']]


## Step 2: use gensim create dictionary - gets all unique words across documents
text_raw_dict = corpora.Dictionary(text_raw_tokens)
raw_len = len(text_raw_dict) # get length for comparison below

### explore first few keys and values
### see that key is just an arbitrary counter; value is the word itself
{k: text_raw_dict[k] for k in list(text_raw_dict)[:5]}


## Step 3: filter out very rare and very common words
## here, we are using the threshold that a word needs to appear in at least
## 5% of docs but not more than 95%
## this is an integer count of docs so we round
lower_bound = round(doj_subset_wscore.shape[0]*0.05)
upper_bound = round(doj_subset_wscore.shape[0]*0.95)

### apply filtering to dictionary
text_raw_dict.filter_extremes(no_below = lower_bound,
                             no_above = upper_bound)
print(f'Filtering out very rare and very common words reduced the \
length of dictionary from {str(raw_len)} to {str(len(text_raw_dict))}.')
{k: text_raw_dict[k] for k in list(text_raw_dict)[:5]} # show first five entries after filtering


## Step 4: apply dictionary to TOKENIZED texts
## this creates a mapping between each word 
## in a specific listing and the key in the dictionary.
## for words that remain in the filtered dictionary,
## output is a list where len(list) == n documents
## and each element in the list is a list of tuples
## containing the mappings
corpus_fromdict = [text_raw_dict.doc2bow(one_text) 
                   for one_text in text_raw_tokens]

{0: 'access', 1: 'accord', 2: 'act', 3: 'addit', 4: 'afraid'}

Filtering out very rare and very common words reduced the length of dictionary from 6866 to 623.


{0: 'access', 1: 'accord', 2: 'act', 3: 'addit', 4: 'african'}

In [245]:
## Step 5: estimate the model
## full documentation here - https://radimrehurek.com/gensim/models/ldamodel.html
## here, we're feed the lda function (1) the corpus we created from the dictionary
## (2) a parameter we decide on for the number of topics,
## (3) the dictionary itself,
## (4) parameter for number of passes through training data
## (5) parameter that returns, for each word remaining in dict, the 
## topic probabilities
## see documentation for many other arguments you can vary
num_topics = 3
ldamod = gensim.models.ldamodel.LdaModel(corpus_fromdict, 
                                         num_topics = num_topics, 
                                         id2word=text_raw_dict, 
                                         passes=6, 
                                         alpha = 'auto',
                                         per_word_topics = True, 
                                         random_state = 91988)

In [237]:
### print topics and words
topics = ldamod.print_topics(num_words = 15)
counter = 1
for topic in topics:
    print("Topic " + str(counter))
    print(topic)
    counter += 1

Topic 1
(0, '0.018*"discrimin" + 0.017*"hous" + 0.015*"disabl" + 0.011*"enforc" + 0.011*"agreement" + 0.011*"state" + 0.010*"court" + 0.010*"said" + 0.009*"alleg" + 0.009*"requir" + 0.009*"settlement" + 0.009*"feder" + 0.008*"fair" + 0.008*"inform" + 0.008*"violat"')
Topic 2
(1, '0.015*"victim" + 0.014*"charg" + 0.013*"prosecut" + 0.013*"sentenc" + 0.013*"defend" + 0.013*"feder" + 0.013*"crime" + 0.012*"said" + 0.012*"guilti" + 0.011*"hate" + 0.010*"year" + 0.010*"indict" + 0.010*"american" + 0.010*"investig" + 0.010*"prison"')
Topic 3
(2, '0.037*"child" + 0.025*"exploit" + 0.022*"sexual" + 0.017*"safe" + 0.017*"project" + 0.017*"childhood" + 0.016*"pornographi" + 0.015*"children" + 0.015*"crimin" + 0.014*"prosecut" + 0.013*"sentenc" + 0.013*"victim" + 0.011*"ceo" + 0.011*"year" + 0.011*"minor"')


In [222]:
import pyLDAvis.gensim_models as gensimvis
import pyLDAvis
pyLDAvis.enable_notebook()

In [223]:
lda_display = gensimvis.prepare(ldamod, corpus_fromdict, text_raw_dict)
pyLDAvis.display(lda_display)

## 2.4 Add topics back to main data and explore correlation between manual labels and our estimated topics (10 points)

A. Extract the document-level topic probabilities. Within `get_document_topics`, use the argument `minimum_probability` = 0 to make sure all 3 topic probabilities are returned. Write an assert statement to make sure the length of the list is equal to the number of rows in the `doj_subset_wscores` dataframe

B. Add the topic probabilities to the `doj_subset_wscores` dataframe as columns and create a column, `top_topic`, that reflects each document to its highest-probability topic (eg topic 1, 2, or 3)

C. For each of the manual labels in `topics_clean` (Hate Crime, Civil Rights, Project Safe Childhood), print the breakdown of the % of documents with each top topic (so, for instance, Hate Crime has 246 documents-- if 123 of those documents are coded to topic_1, that would be 50%; and so on). **Hint**: pd.crosstab and normalize may be helpful: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.crosstab.html

D. Using a couple press releases as examples, write a 1-2 sentence interpretation of why some of the manual topics map on more cleanly to an estimated topic than other manual topic(s)


In [241]:
topic_probs_bydoc = [ldamod.get_document_topics(item, minimum_probability=0) for item in corpus_fromdict]

In [243]:

# Write an assert statement to check the length of topic_probs_bydoc
assert len(topic_probs_bydoc) == len(doj_subset_wscore), "Length of topic probabilities and doj_subset_wscore DataFrame rows are not equal"

# Print a message if the assertion passes
print("Assertion passed: Length of topic probabilities equals the number of rows in doj_subset_wscore DataFrame.")


Assertion passed: Length of topic probabilities equals the number of rows in doj_subset_wscore DataFrame.


In [252]:
for topic_id in range(num_topics):
    col_name = "topic_" + str(topic_id + 1) + "_prob"
    doj_subset_wscore[col_name] = [doc_probs[topic_id][1] if len(doc_probs) > topic_id else 0 for doc_probs in topic_probs_bydoc]

In [253]:
# List to store the top topic for each document
top_topics = []

# Iterate through the topic probabilities for each document
for doc_probs in topic_probs_bydoc:
    # Find the topic with the highest probability
    top_topic = max(doc_probs, key=lambda x: x[1])[0]
    top_topics.append(top_topic)

# Top_topic column based on the highest-probability topic for each document
doj_subset_wscore['top_topic'] = top_topics

doj_subset_wscore

Unnamed: 0_level_0,id,title,contents,date,topics_clean,components_clean,neg,neu,pos,Compound,processed_text,topic_1_prob,topic_2_prob,topic_3_prob,top_topic
Index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
5247,18-913,Grapevine Texas Man Pleads Guilty to Federal Hate Crime Against an African-American Family,"Glenn Eugene Halfin, 64, from Grapevine, Texas, appeared today before U.S. Magistrate Judge Jeffrey L. Cureton in the U.S. District Court for the Northern District of Texas and pleaded guilty to a federal charge of interfering with an African-American family’s housing rights, announced Acting Assistant Attorney General John Gore of the Civil Rights Division and U.S. Attorney Erin Nealy Cox of the Northern District of Texas. According to court documents, Halfin threatened force, intimidated, and interfered with a family because of their race and occupancy of an apartment that was located directly above his own apartment. According to documents filed in connection with the guilty plea, on Dec. 19, 2017, Halfin purchased a baby doll at a Wal-Mart in Grapevine, Texas. He took a rope, fashioned it into a noose, and hung the baby doll from the noose. Halfin then hung the rope noose and baby doll on the railing directly in front of the only staircase the family could use to access their apartment. Halfin did so, knowing that this display would be particularly intimidating for the family who had a young daughter. In addition, the defendant referenced in his factual basis repeated intimidation of and interference with the same African-American family on other occasions. “The Justice Department will not tolerate acts of intimidation and fear, or illegal threats against any individual or family because of their race,” said Acting Assistant Attorney John Gore. “We will continue to prosecute hate crime offenders.” “No one should be afraid to go home at night,” said U.S. Attorney Erin Nealy Cox. “Our community will not tolerate crimes of intimidation or bigotry, and my office will continue to prosecute all those who persecute others based on their race, color, ethnicity, or religious beliefs.” Halfin faces a statutory maximum penalty of no more one year in federal prison and a $100,000 fine. His sentencing is scheduled for October 24. This case was investigated by the FBI and the Grapevine Police Department. The case was prosecuted by Trial Attorney Rebekah Bailey of the Civil Rights Division’s Criminal Section and Assistant United States Attorney Nicole Dana.",2018-07-12T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.005,0.806,0.005,-0.9955,glenn eugen halfin grapevin texa appear today magistr judg jeffrey cureton court northern texa plead guilti feder charg interf african american famili hous announc act general john gore erin neali northern texa accord court document halfin threaten forc intimid interf famili race occup apart locat direct apart accord document file connect guilti plea halfin purchas babi doll mart grapevin texa took rope fashion noos hung babi doll noos halfin hung rope noos babi doll rail direct front staircas famili could access apart halfin know display would particular intimid famili young daughter addit defend referenc factual basi repeat intimid interfer african american famili occas toler act intimid fear illeg threat individu famili race said act john gore continu prosecut hate crime offend afraid home night said erin neali communiti toler crime intimid bigotri continu prosecut persecut other base race color ethnic religi belief halfin face statutori maximum penalti year feder prison fine sentenc schedul octob investig grapevin polic prosecut rebekah bailey crimin section unit state nicol dana,0.000479,0.999077,0.000444,1
8075,11-1245,Louisiana Man Convicted of Producing and Possessing Child Pornography,"WASHINGTON – Paul W. Miller, of Denham Springs, La., was convicted late yesterday of two counts of producing and one count of possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division and U.S. Attorney Donald J. Cazayoux Jr. of the Middle District of Louisiana. Miller, 44, was convicted by a federal jury following a two-day trial. U.S. District Judge James J. Brady presided over the trial. Evidence presented at trial showed that from October 2007 to May 2008, Miller sexually abused a 12-year-old girl and an 11-year-old girl and produced numerous photographs of the abuse. According to trial evidence, forensic examination of Miller’s computer revealed that Miller had used his computer to print and possess numerous images of child pornography, including both the images of child pornography he had produced and images of other child victims. Miller faces a maximum statutory sentence of 30 years in prison for each count of production of child pornography and 10 years in prison for the possession of child pornography count. The case is being prosecuted by Trial Attorney Alecia Riewerts Wolak of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorney Richard L. Bourgeois Jr. of the Middle District of Louisiana. The investigation was conducted by the FBI, the Denham Springs Police Department and the Louisiana Attorney General’s Office.",2011-09-22T00:00:00-04:00,Project Safe Childhood,Criminal Division,0.007,0.887,0.007,-0.9565,washington paul miller denham spring convict late yesterday count produc count possess child pornographi announc general lanni breuer crimin donald cazayoux middl louisiana miller convict feder juri follow judg jame bradi presid evid present show octob miller sexual abus year girl year girl produc numer photograph abus accord evid forens examin miller comput reveal miller use comput print possess numer imag child pornographi includ imag child pornographi produc imag child victim miller face maximum statutori sentenc year prison count product child pornographi year prison possess child pornographi count prosecut alecia riewert wolak crimin child exploit obscen section ceo richard bourgeoi middl louisiana conduct denham spring polic louisiana general,0.000628,0.001000,0.998371,2
8353,11-1331,Massachusetts Man Pleads Guilty to Receiving and Possessing Child Pornography,"WASHINGTON – A Springfield, Mass., man pleaded guilty today to receiving and possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division. Robert Rosenbeck, 48, pleaded guilty before U.S. District Judge Denise J. Casper in Boston to one count of receipt of child pornography and two counts of possession of child pornography. He was indicted on those charges on Dec. 10, 2009. According to court documents, Rosenbeck possessed two different computers containing child pornography in 2007. Additionally, from approximately July 22, 2007, to July 25, 2007, Rosenbeck received computer files containing child pornography from an Internet website. At sentencing, scheduled for Dec. 5, 2011, Rosenbeck faces a maximum statutory sentence of 20 years in prison for the receipt of child pornography count and 10 years in prison for each count of possession of child pornography. Rosenbeck also faces a term of supervised release of at least five years and up to life. The case is being prosecuted by Trial Attorneys Alecia Riewerts Wolak and Michael W. Grant of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS). The investigation was conducted by the FBI with assistance provided by the Springfield Police Department.",2011-10-06T00:00:00-04:00,Project Safe Childhood,Criminal Division,0.008,0.874,0.008,-0.9451,washington springfield mass plead guilti today receiv possess child pornographi announc general lanni breuer crimin robert rosenbeck plead guilti judg denis casper boston count receipt child pornographi count possess child pornographi indict charg accord court document rosenbeck possess differ comput contain child pornographi addit approxim juli juli rosenbeck receiv comput file contain child pornographi internet websit sentenc schedul rosenbeck face maximum statutori sentenc year prison receipt child pornographi count year prison count possess child pornographi rosenbeck also face term supervis releas least five year life prosecut attorney alecia riewert wolak michael grant crimin child exploit obscen section ceo conduct provid springfield polic,0.000636,0.001012,0.998352,2
7618,17-242,"Justice Department Sues Edmonds, Washington Landlords for Discriminating Against Families With Children","The U.S. Department of Justice today filed a lawsuit in U.S. District Court for the Western District of Washington alleging that the owners and manager of three Edmonds, Washington apartment buildings refused to rent their apartments to families with children, in violation of the Fair Housing Act. “The Fair Housing Act prohibits landlords from denying apartments to families just because they have children,” said Acting Assistant Attorney General Tom Wheeler of the Justice Department’s Civil Rights Division. “Many families already face challenges finding affordable housing, and they should not also have to deal with unlawful discrimination.” “Equal access to housing is essential for all Americans, including families with young children,” said U.S. Attorney Annette L. Hayes of the Western District of Washington. “Particularly in our tight housing market, landlords must follow the law and make units available without discrimination based on race, color, religion, sex, national origin, disability or familial status.” The complaint concerns three apartment buildings – located at 201 5th Ave. N., 621 5th Ave. S., and 401 Pine Street in Edmonds – that are managed by defendant Debbie A. Appleby, of Stanwood, Washington. The properties are owned by three Limited Liability Corporations (LLCs) controlled by Appleby – Apple One, LLC, Apple Two, LLC, and Apple Three, LLC—which are also named as defendants in the suit. The complaint alleges that in March 2014, defendant Appleby told a woman seeking an apartment for herself, her husband, and their one year old child that the apartment buildings were “adult only” and therefore not available to her family. The complaint also alleges that at various other times from April 2014 to November 2015, defendants advertised their available apartments as being restricted to adults only. The family filed a complaint with the Department of Housing and Urban Development (“HUD”) which conducted an investigation, issued a charge of discrimination against the defendants, and referred the case to the Department of Justice. The complaint seeks a court order requiring defendants to cease their discriminatory housing practices, damages for the family that filed the HUD complaint and any other families against whom the defendants discriminated against because they had children, and civil penalties. Any individuals who have information relevant to this case are encouraged to contact the Civil Rights Division at 1-800-896-7743, Option 96. The federal Fair Housing Act prohibits discrimination in housing on the basis of race, color, religion, sex, familial status, national origin and disability. More information about the Civil Rights Division and the civil rights laws it enforces is available at www.usdoj.gov/crt and https://www.justice.gov/usao-wdwa/civil-rights. Individuals who believe that they have been victims of housing discrimination may call the Justice Department at 1-800-896-7743, email the Justice Department at fairhousing@usdoj.gov, or contact HUD at 1-800-669-9777 or through its website at www.hud.gov. The case is being jointly handled by the Department’s Civil Rights Division and the U.S. Attorney’s Office for the Western District of Washington. The complaint is an allegation of unlawful conduct. The allegations must still be proven in federal court.",2017-03-03T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Housing and Civil Enforcement Section; USAO - Washington, Western",0.009,0.912,0.009,-0.9753,today file lawsuit court western washington alleg owner manag three edmond washington apart build refus rent apart famili children violat fair hous fair hous prohibit landlord deni apart famili children said act general wheeler mani famili alreadi face challeng find afford hous also deal unlaw discrimin equal access hous essenti american includ famili young children said annett hay western washington particular tight hous market landlord must follow make unit avail without discrimin base race color religion nation origin disabl famili status complaint concern three apart build locat pine street edmond manag defend debbi applebi stanwood washington properti own three limit liabil corpor llcs control applebi appl appl appl three also name defend suit complaint alleg march defend applebi told woman seek apart husband year child apart build adult therefor avail famili complaint also alleg various time april novemb defend advertis avail apart restrict adult famili file complaint hous urban develop conduct issu charg discrimin defend refer complaint seek court order requir defend ceas discriminatori hous practic damag famili file complaint famili defend discrimin children penalti individu inform relev encourag contact option feder fair hous prohibit discrimin hous basi race color religion famili status nation origin disabl inform law enforc avail usdoj https usao wdwa individu believ victim hous discrimin call email fairhous usdoj contact websit joint handl western washington complaint alleg unlaw conduct alleg must still proven feder court,0.999255,0.000471,0.000274,0
8847,17-550,Mother Sentenced to 26 Months in Prison for Taking Child from Illinois to Canada in International Parental Kidnapping Case,"A Canadian woman was sentenced to serve 26 months in prison following her December conviction for international parental kidnapping, announced Acting Assistant Attorney General Kenneth A. Blanco of the Justice Department’s Criminal Division and Acting U.S. Attorney Patrick D. Hansen of the Central District of Illinois. Sarah M. Nixon, 48, of Montreal, Canada, was sentenced before U.S. District Judge Colin S. Bruce of the Central District of Illinois. On Dec. 21, 2016, a federal jury found Nixon guilty of one count of international parental kidnapping for taking her minor child from the United States to Canada in July 2015, with the intent to obstruct the lawful exercise of the father’s rights. Evidence at trial established that after a custody trial where it was apparent that Nixon would lose custody of her six-year-old daughter, Nixon fled the United States with the child in the middle of the night. When she did not appear for the custody ruling and neither she nor her daughter could be located, law enforcement issued a child abduction alert. Nixon and the child were eventually located in a farmhouse in rural Ontario, Canada. Authorities then returned the child to the father. Nixon was arrested in New York on Sept. 20, 2015, as she attempted to return to the United States. Trial Attorneys Elly M. Peirson and Lauren S. Kupersmith of the Criminal Division’s Child Exploitation and Obscenity Section prosecuted the case. The FBI; Urbana, Illinois, Police Department; University of Illinois Police Department; Illinois Department of Children and Family Services; Ontario Provincial Police; and U.S. Customs and Border Protection investigated the case, with assistance from the Champaign County, Illinois, State’s Attorney’s Office and the Criminal Division’s Office of International Affairs.",2017-05-19T00:00:00-04:00,Project Safe Childhood,"Criminal Division; Criminal - Child Exploitation and Obscenity Section; USAO - Illinois, Central",0.010,0.894,0.010,-0.9460,canadian woman sentenc serv month prison follow decemb convict intern parent kidnap announc act general kenneth blanco crimin act patrick hansen central illinoi sarah nixon montreal canada sentenc judg colin bruce central illinoi feder juri found nixon guilti count intern parent kidnap take minor child unit state canada juli intent obstruct law exercis father evid establish custodi appar nixon would lose custodi year daughter nixon fled unit state child middl night appear custodi rule neither daughter could locat enforc issu child abduct alert nixon child eventu locat farmhous rural ontario canada author return child father nixon arrest york sept attempt return unit state attorney elli peirson lauren kupersmith crimin child exploit obscen section prosecut urbana illinoi polic univers illinoi polic illinoi children famili servic ontario provinci polic custom border protect investig champaign counti illinoi state crimin intern affair,0.000651,0.214084,0.785265,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6089,18-119,Justice Department Announces Religious Liberty Update to U.S. Attorneys’ Manual and Directs the Designation of Religious Liberty Point of Contact for All U.S. Attorney's Offices,"The Department of Justice today announced the update of the United States Attorneys’ Manual (USAM) with a new section titled, “Associate Attorney General’s Approval and Notice Requirements for Issues Implicating Religious Liberty.” On Oct. 6, 2017, the Attorney General issued a Memorandum for All Executive Departments and Agencies entitled Federal Law Protections for Religious Liberty. The memo directed components and United States Attorney’s Offices to use the guidance in litigation, advice to the Executive Branch, operations, grants, and all other aspects of the Department’s work. In order to ensure compliance with the Attorney General’s memo, the USAM will be updated with language that directs relevant Department of Justice components to: The updated USAM will also instruct relevant Justice Department components to consult the 20 religious liberty principles laid out in the Attorney General’s October 6 memo when considering whether the notice or approval requirements are initiated. In order to fully effectuate the approval and notice requirements in the updated USAM, the Department will instruct all U.S. Attorneys to designate a point of contact to lead these efforts for their office. “Religious liberty is an inalienable right protected by the Constitution, and defending it is one of the most important things we do at the Department of Justice,” said Associate Attorney General Rachel Brand. At President Trump's direction, Attorney General Sessions issued a robust and clear guidance document in October that clearly explains how the federal government is to apply the religious liberty protections currently on the books. The requirement that each of the U.S. Attorney offices designate a religious liberty point of contact will ensure that the Attorney General’s Memorandum is effectively implemented. The designees will be responsible for working directly with the leadership offices on civil cases related to religious liberty, ensuring that these cases receive the rigorous attention they deserve.",2018-01-31T00:00:00-05:00,Civil Rights,Civil Rights Division; Office of the Associate Attorney General,0.199,0.788,0.199,0.9944,today announc updat unit state attorney manual usam section titl associ general approv notic requir issu implic religi liberti general issu memorandum execut depart agenc entitl feder protect religi liberti memo direct compon unit state offic guidanc litig advic execut branch oper grant aspect work order ensur complianc general memo usam updat languag direct relev compon updat usam also instruct relev compon consult religi liberti principl laid general octob memo consid whether notic approv requir initi order fulli effectu approv notic requir updat usam instruct attorney design point contact lead effort religi liberti inalien right protect constitut defend import thing said associ general rachel brand presid trump direct general session issu robust clear guidanc document octob clear explain feder govern appli religi liberti protect current book requir offic design religi liberti point contact ensur general memorandum effect implement designe respons work direct leadership offic case relat religi liberti ensur case receiv rigor attent deserv,0.998535,0.000925,0.000540,0
6733,16-1321,"Justice Department Reaches Agreement with City of Yonkers, New York, to Enhance Police Department Policies and Procedures","The Justice Department announced today that it has reached an agreement with the city of Yonkers, New York, and the Yonkers Police Department (YPD) to resolve the department’s investigation of YPD and ensure constitutional policing. The agreement is the result of the department’s investigation of YPD under the Violent Crime Control and Law Enforcement Act of 1994 and the Omnibus Crime Control and Safe Streets Act of 1968. In June 2009, the United States sent the city a technical assistance letter that identified necessary reforms to YPD practices and policies in the areas of use of force, civilian complaints, investigations, supervisory oversight and training. After receiving the department’s technical assistance letter, the city and YPD made substantial changes to its policies and procedures. This agreement implements and further improves those policies and procedures and addresses the department’s remaining concerns. “This agreement will ensure that the Yonkers Police Department continues to advance constitutional, effective and community-oriented policing,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “Through clear policy guidance, data analysis and accountability systems, we believe these reforms will make the entire community safer and strengthen public trust in the police.” “This agreement ensures that the Yonkers Police Department polices in a way that keeps its citizens safe, while protecting their constitutional rights,” said U.S. Attorney Preet Bharara of the Southern District of New York. “The measures put in place with this agreement, including clear and reasonable use-of-force policies and guidance on how to properly evaluate and respond to use-of-force incidents, will make Yonkers safer for citizens and police alike. We thank the Yonkers Police Department and the city of Yonkers for cooperating with our investigation, and for joining our effort to ensure that the Yonkers Police Department protects its citizens not only from physical harm, but also from violations of their constitutional rights.” The agreement is carefully tailored to address the department’s remaining concerns while also taking into account and seeking to build upon the positive reforms YPD has already made following the department’s investigation. Under the agreement, the YPD will, among other things: The agreement also provides that consultants retained by the department will conduct compliance reviews to ensure that YPD has implemented the measures required by the agreement and issue public reports of those compliance reviews. This case is being handled by the Civil Rights Division’s Special Litigation Section and the U.S. Attorney’s Office of the Southern District of New York. Yonkers Police Department Agreement",2016-11-14T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Special Litigation Section; USAO - New York, Southern",0.199,0.778,0.199,0.9950,announc today reach agreement citi yonker york yonker polic resolv ensur constitut polic agreement result violent crime control enforc omnibus crime control safe street june unit state sent citi technic letter identifi necessari reform practic polici area forc civilian complaint investig supervisori oversight train receiv technic letter citi made substanti chang polici procedur agreement implement improv polici procedur address remain concern agreement ensur yonker polic continu advanc constitut effect communiti orient polic said princip deputi general vanita gupta head clear polici guidanc data analysi account system believ reform make entir communiti safer strengthen public trust polic agreement ensur yonker polic polic keep citizen safe protect constitut said preet bharara southern york measur place agreement includ clear reason forc polici guidanc proper evalu respond forc incid make yonker safer citizen polic alik thank yonker polic citi yonker cooper join effort ensur yonker polic protect citizen physic harm also violat constitut agreement care tailor address remain concern also take account seek build upon posit reform alreadi made follow agreement among thing agreement also provid consult retain conduct complianc review ensur implement measur requir agreement issu public report complianc review handl special litig section southern york yonker polic agreement,0.999066,0.000590,0.000344,0
6905,16-740,"Justice Department Reaches Settlement to Reform Criminal Justice System in Hinds County, Mississippi","The Justice Department today reached a landmark settlement agreement to reform the criminal justice system in Hinds County, Mississippi. The agreement resolves the department’s findings that the Hinds County Adult Detention Center and the Jackson City Detention Center – which together form the Hinds County Jail – failed to protect prisoners from violence and excessive force and held them past their court-ordered release dates, in violation of the Civil Rights of Institutionalized Persons Act (CRIPA). The settlement agreement is the first of its kind to incorporate broader criminal justice system reform through diversion at the front end and reentry to the community after incarceration. It creates a criminal justice coordinating committee that will help ensure the county’s systems operate effectively and efficiently, develop interventions to divert individuals in appropriate cases from arrest, detention and incarceration, and engage in community outreach. To promote successful reentry, the agreement includes mechanisms for notifying community health providers when a person with serious mental illness is released to help the person transition safely back to the community. The agreement also addresses unlawful enforcement of court-ordered fines and fees by ensuring that the county cannot incarcerate an individual for non-payment if the court does not first assess whether the individual is indigent. “Across the board, this settlement will make the Hinds County criminal justice system smarter and fairer,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “If implemented, these reforms will make pretrial detainees, prisoners, corrections staff and the entire community safer, while also ensuring that vulnerable individuals get access to the treatment, care and community services they need and deserve. We commend the county for its commitment to making these reforms a reality.” “For too long, the conditions in the Jail have posed a serious challenge to law enforcement and the safety of our community,” said U.S. Attorney Gregory K. Davis of the Southern District of Mississippi. “I appreciate the commitment made by Hinds County officials to turn the page and begin making necessary reforms.” The settlement agreement – subject to approval by the U.S. District Court of the Southern District of Mississippi – requires the county to implement a series of reforms across various stages of the criminal justice system, including the following: Together these reforms aim to improve communication and coordination among criminal justice entities and community service providers to help individuals with mental illness transition back to the community and to reduce recidivism. If approved by the federal district court, an independent monitor will be appointed to assess the county’s compliance. In May 2015, the Justice Department completed a comprehensive investigation – which included on-site inspections, document reviews and stakeholder interviews by department experts and staff – and issued a findings letter that determined that Hinds County Adult Detention Center and the Jackson City Detention Center violated CRIPA by failing to protect prisoners from violence by other prisoners and from improper use of force by staff. The department also found that inadequate staffing and training, a backlog in record filing and a lack of centralized information resulted in prisoners being held beyond court-ordered release dates. CRIPA authorizes the department to seek a remedy for a pattern or practice of conduct that violates the constitutional rights of persons confined in a jail, prison or other correctional facility. For more information on the Civil Rights Division’s work in this area, please visit www.justice.gov/crt. Hinds County Settlement Agreement Hinds County Fact Sheet",2016-06-23T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Special Litigation Section; USAO - Mississippi, Southern",0.205,0.659,0.205,0.9888,today reach landmark settlement agreement reform crimin system hind counti mississippi agreement resolv find hind counti adult detent center jackson citi detent center togeth form hind counti jail fail protect prison violenc excess forc held past court order releas date violat institution person cripa settlement agreement first kind incorpor broader crimin system reform divers front reentri communiti incarcer creat crimin coordin committe help ensur counti system oper effect effici develop intervent divert individu appropri case arrest detent incarcer engag communiti outreach promot success reentri agreement includ mechan notifi communiti health provid person serious mental ill releas help person transit safe back communiti agreement also address unlaw enforc court order fine fee ensur counti cannot incarcer individu payment court first assess whether individu indig across board settlement make hind counti crimin system smarter fairer said princip deputi general vanita gupta head implement reform make pretrial detaine prison correct staff entir communiti safer also ensur vulner individu access treatment care communiti servic need deserv commend counti commit make reform realiti long condit jail pose serious challeng enforc safeti communiti said gregori davi southern mississippi appreci commit made hind counti offici turn page begin make necessari reform settlement agreement subject approv court southern mississippi requir counti implement seri reform across various stage crimin system includ follow togeth reform improv communic coordin among crimin entiti communiti servic provid help individu mental ill transit back communiti reduc recidiv approv feder court independ monitor appoint assess counti complianc complet comprehens includ site inspect document review stakehold interview expert staff issu find letter determin hind counti adult detent center jackson citi detent center violat cripa fail protect prison violenc prison improp forc staff also found inadequ staf train backlog record file lack central inform result prison held beyond court order releas date cripa author seek remedi pattern practic conduct violat constitut person confin jail prison correct facil inform work area pleas visit hind counti settlement agreement hind counti fact sheet,0.887984,0.111796,0.000220,0
11066,16-163,"Statement from Head of the Civil Rights Division Vanita Gupta Regarding Ferguson, Missouri, City Council Vote on Proposed Consent Decree","Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division, released the following statement regarding the Ferguson, Missouri, City Council vote on the proposed consent decree with the Department of Justice: “The Ferguson City Council has attempted to unilaterally amend the negotiated agreement. Their vote to do so creates an unnecessary delay in the essential work to bring constitutional policing to the city, and marks an unfortunate outcome for concerned community members and Ferguson police officers. Both parties engaged in thoughtful negotiations over many months to create an agreement with cost-effective remedies that would ensure Ferguson brings policing and court practices in line with the Constitution. The agreement already negotiated by the department and the city will provide Ferguson residents a police department and municipal court that fully respects civil rights and operates free from racial discrimination. “The Department of Justice will take the necessary legal actions to ensure that Ferguson’s policing and court practices comply with the Constitution and relevant federal laws.”",2016-02-10T00:00:00-05:00,Civil Rights,Civil Rights Division; Civil Rights - Special Litigation Section,0.217,0.752,0.217,0.9811,princip deputi general vanita gupta head releas follow statement regard ferguson missouri citi council vote propos consent decre ferguson citi council attempt unilater amend negoti agreement vote creat unnecessari delay essenti work bring constitut polic citi mark unfortun outcom concern communiti member ferguson polic offic parti engag thought negoti mani month creat agreement cost effect remedi would ensur ferguson bring polic court practic line constitut agreement alreadi negoti citi provid ferguson resid polic municip court fulli respect oper free racial discrimin take necessari legal action ensur ferguson polic court practic compli constitut relev feder law,0.997998,0.001264,0.000737,0


In [None]:
## your code here to add those topic probabilities to the dataframe

In [None]:
## your code here to summarize the topic proportions for each of the topics_clean 

In [254]:
#C
pd.crosstab(doj_subset_wscore['topics_clean'], doj_subset_wscore['top_topic'], normalize='index')

top_topic,0,1,2
topics_clean,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Civil Rights,0.659016,0.340984,0.0
Hate Crimes,0.0,1.0,0.0
Project Safe Childhood,0.006024,0.0,0.993976


**write interpretation**

# 3. Extend the analysis from unigrams to bigrams (10 points)

In the previous question, you found top words via a unigram representation of the text. Now, we want to see how those top words change with bigrams (pairs of words)

A. Using the `doj_subset_wscore` data and the `processed_text` column (so the words after stemming/other preprocessing), create a column in the data called `processed_text_bigrams` that combines each consecutive pairs of word into a bigram separated by an underscore. Eg:

"depart reach settlem" would become "depart_reach reach_settlem"

Do this by writing a function `create_bigram_onedoc` that takes in a single `processed_text` string and returns a string with its bigrams structured similarly to above example
 
**Hint**: there are many ways to solve but `zip` may be helpful: https://stackoverflow.com/questions/21303224/iterate-over-all-pairs-of-consecutive-items-in-a-list

B. Print the `id`, `processed_text`, and `processed_text_bigram` columns for press release with id = 16-217

In [260]:
doj_subset_wscore.columns

Index(['id', 'title', 'contents', 'date', 'topics_clean', 'components_clean',
       'neg', 'neu', 'pos', 'Compound', 'processed_text', 'topic_1_prob',
       'topic_2_prob', 'topic_3_prob', 'top_topic'],
      dtype='object')

In [263]:
bigrams = []

def create_bigram_onedoc(processed_text):
    tokens = processed_text.split()
    
    for first, second in zip(tokens, tokens[1:]):
        string = first + "_" + second
        bigrams.append(string)
    
    return bigrams

text_to_process = doj_subset_wscore.iloc[1]['processed_text']
create_bigram_onedoc(text_to_process)


['washington_paul',
 'paul_miller',
 'miller_denham',
 'denham_spring',
 'spring_convict',
 'convict_late',
 'late_yesterday',
 'yesterday_count',
 'count_produc',
 'produc_count',
 'count_possess',
 'possess_child',
 'child_pornographi',
 'pornographi_announc',
 'announc_general',
 'general_lanni',
 'lanni_breuer',
 'breuer_crimin',
 'crimin_donald',
 'donald_cazayoux',
 'cazayoux_middl',
 'middl_louisiana',
 'louisiana_miller',
 'miller_convict',
 'convict_feder',
 'feder_juri',
 'juri_follow',
 'follow_judg',
 'judg_jame',
 'jame_bradi',
 'bradi_presid',
 'presid_evid',
 'evid_present',
 'present_show',
 'show_octob',
 'octob_miller',
 'miller_sexual',
 'sexual_abus',
 'abus_year',
 'year_girl',
 'girl_year',
 'year_girl',
 'girl_produc',
 'produc_numer',
 'numer_photograph',
 'photograph_abus',
 'abus_accord',
 'accord_evid',
 'evid_forens',
 'forens_examin',
 'examin_miller',
 'miller_comput',
 'comput_reveal',
 'reveal_miller',
 'miller_use',
 'use_comput',
 'comput_print',
 'pri

In [266]:
doj_subset_wscore['processed_text_bigrams'] = doj_subset_wscore['processed_text'].apply(create_bigram_onedoc)
doj_subset_wscore

#for some reason, it's applying the function and matching to the wrong row

Unnamed: 0_level_0,id,title,contents,date,topics_clean,components_clean,neg,neu,pos,Compound,processed_text,topic_1_prob,topic_2_prob,topic_3_prob,top_topic,processed_text_bigrams
Index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
5247,18-913,Grapevine Texas Man Pleads Guilty to Federal Hate Crime Against an African-American Family,"Glenn Eugene Halfin, 64, from Grapevine, Texas, appeared today before U.S. Magistrate Judge Jeffrey L. Cureton in the U.S. District Court for the Northern District of Texas and pleaded guilty to a federal charge of interfering with an African-American family’s housing rights, announced Acting Assistant Attorney General John Gore of the Civil Rights Division and U.S. Attorney Erin Nealy Cox of the Northern District of Texas. According to court documents, Halfin threatened force, intimidated, and interfered with a family because of their race and occupancy of an apartment that was located directly above his own apartment. According to documents filed in connection with the guilty plea, on Dec. 19, 2017, Halfin purchased a baby doll at a Wal-Mart in Grapevine, Texas. He took a rope, fashioned it into a noose, and hung the baby doll from the noose. Halfin then hung the rope noose and baby doll on the railing directly in front of the only staircase the family could use to access their apartment. Halfin did so, knowing that this display would be particularly intimidating for the family who had a young daughter. In addition, the defendant referenced in his factual basis repeated intimidation of and interference with the same African-American family on other occasions. “The Justice Department will not tolerate acts of intimidation and fear, or illegal threats against any individual or family because of their race,” said Acting Assistant Attorney John Gore. “We will continue to prosecute hate crime offenders.” “No one should be afraid to go home at night,” said U.S. Attorney Erin Nealy Cox. “Our community will not tolerate crimes of intimidation or bigotry, and my office will continue to prosecute all those who persecute others based on their race, color, ethnicity, or religious beliefs.” Halfin faces a statutory maximum penalty of no more one year in federal prison and a $100,000 fine. His sentencing is scheduled for October 24. This case was investigated by the FBI and the Grapevine Police Department. The case was prosecuted by Trial Attorney Rebekah Bailey of the Civil Rights Division’s Criminal Section and Assistant United States Attorney Nicole Dana.",2018-07-12T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.005,0.806,0.005,-0.9955,glenn eugen halfin grapevin texa appear today magistr judg jeffrey cureton court northern texa plead guilti feder charg interf african american famili hous announc act general john gore erin neali northern texa accord court document halfin threaten forc intimid interf famili race occup apart locat direct apart accord document file connect guilti plea halfin purchas babi doll mart grapevin texa took rope fashion noos hung babi doll noos halfin hung rope noos babi doll rail direct front staircas famili could access apart halfin know display would particular intimid famili young daughter addit defend referenc factual basi repeat intimid interfer african american famili occas toler act intimid fear illeg threat individu famili race said act john gore continu prosecut hate crime offend afraid home night said erin neali communiti toler crime intimid bigotri continu prosecut persecut other base race color ethnic religi belief halfin face statutori maximum penalti year feder prison fine sentenc schedul octob investig grapevin polic prosecut rebekah bailey crimin section unit state nicol dana,0.000479,0.999077,0.000444,1,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
8075,11-1245,Louisiana Man Convicted of Producing and Possessing Child Pornography,"WASHINGTON – Paul W. Miller, of Denham Springs, La., was convicted late yesterday of two counts of producing and one count of possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division and U.S. Attorney Donald J. Cazayoux Jr. of the Middle District of Louisiana. Miller, 44, was convicted by a federal jury following a two-day trial. U.S. District Judge James J. Brady presided over the trial. Evidence presented at trial showed that from October 2007 to May 2008, Miller sexually abused a 12-year-old girl and an 11-year-old girl and produced numerous photographs of the abuse. According to trial evidence, forensic examination of Miller’s computer revealed that Miller had used his computer to print and possess numerous images of child pornography, including both the images of child pornography he had produced and images of other child victims. Miller faces a maximum statutory sentence of 30 years in prison for each count of production of child pornography and 10 years in prison for the possession of child pornography count. The case is being prosecuted by Trial Attorney Alecia Riewerts Wolak of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorney Richard L. Bourgeois Jr. of the Middle District of Louisiana. The investigation was conducted by the FBI, the Denham Springs Police Department and the Louisiana Attorney General’s Office.",2011-09-22T00:00:00-04:00,Project Safe Childhood,Criminal Division,0.007,0.887,0.007,-0.9565,washington paul miller denham spring convict late yesterday count produc count possess child pornographi announc general lanni breuer crimin donald cazayoux middl louisiana miller convict feder juri follow judg jame bradi presid evid present show octob miller sexual abus year girl year girl produc numer photograph abus accord evid forens examin miller comput reveal miller use comput print possess numer imag child pornographi includ imag child pornographi produc imag child victim miller face maximum statutori sentenc year prison count product child pornographi year prison possess child pornographi count prosecut alecia riewert wolak crimin child exploit obscen section ceo richard bourgeoi middl louisiana conduct denham spring polic louisiana general,0.000628,0.001000,0.998371,2,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
8353,11-1331,Massachusetts Man Pleads Guilty to Receiving and Possessing Child Pornography,"WASHINGTON – A Springfield, Mass., man pleaded guilty today to receiving and possessing child pornography, announced Assistant Attorney General Lanny A. Breuer of the Criminal Division. Robert Rosenbeck, 48, pleaded guilty before U.S. District Judge Denise J. Casper in Boston to one count of receipt of child pornography and two counts of possession of child pornography. He was indicted on those charges on Dec. 10, 2009. According to court documents, Rosenbeck possessed two different computers containing child pornography in 2007. Additionally, from approximately July 22, 2007, to July 25, 2007, Rosenbeck received computer files containing child pornography from an Internet website. At sentencing, scheduled for Dec. 5, 2011, Rosenbeck faces a maximum statutory sentence of 20 years in prison for the receipt of child pornography count and 10 years in prison for each count of possession of child pornography. Rosenbeck also faces a term of supervised release of at least five years and up to life. The case is being prosecuted by Trial Attorneys Alecia Riewerts Wolak and Michael W. Grant of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS). The investigation was conducted by the FBI with assistance provided by the Springfield Police Department.",2011-10-06T00:00:00-04:00,Project Safe Childhood,Criminal Division,0.008,0.874,0.008,-0.9451,washington springfield mass plead guilti today receiv possess child pornographi announc general lanni breuer crimin robert rosenbeck plead guilti judg denis casper boston count receipt child pornographi count possess child pornographi indict charg accord court document rosenbeck possess differ comput contain child pornographi addit approxim juli juli rosenbeck receiv comput file contain child pornographi internet websit sentenc schedul rosenbeck face maximum statutori sentenc year prison receipt child pornographi count year prison count possess child pornographi rosenbeck also face term supervis releas least five year life prosecut attorney alecia riewert wolak michael grant crimin child exploit obscen section ceo conduct provid springfield polic,0.000636,0.001012,0.998352,2,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
7618,17-242,"Justice Department Sues Edmonds, Washington Landlords for Discriminating Against Families With Children","The U.S. Department of Justice today filed a lawsuit in U.S. District Court for the Western District of Washington alleging that the owners and manager of three Edmonds, Washington apartment buildings refused to rent their apartments to families with children, in violation of the Fair Housing Act. “The Fair Housing Act prohibits landlords from denying apartments to families just because they have children,” said Acting Assistant Attorney General Tom Wheeler of the Justice Department’s Civil Rights Division. “Many families already face challenges finding affordable housing, and they should not also have to deal with unlawful discrimination.” “Equal access to housing is essential for all Americans, including families with young children,” said U.S. Attorney Annette L. Hayes of the Western District of Washington. “Particularly in our tight housing market, landlords must follow the law and make units available without discrimination based on race, color, religion, sex, national origin, disability or familial status.” The complaint concerns three apartment buildings – located at 201 5th Ave. N., 621 5th Ave. S., and 401 Pine Street in Edmonds – that are managed by defendant Debbie A. Appleby, of Stanwood, Washington. The properties are owned by three Limited Liability Corporations (LLCs) controlled by Appleby – Apple One, LLC, Apple Two, LLC, and Apple Three, LLC—which are also named as defendants in the suit. The complaint alleges that in March 2014, defendant Appleby told a woman seeking an apartment for herself, her husband, and their one year old child that the apartment buildings were “adult only” and therefore not available to her family. The complaint also alleges that at various other times from April 2014 to November 2015, defendants advertised their available apartments as being restricted to adults only. The family filed a complaint with the Department of Housing and Urban Development (“HUD”) which conducted an investigation, issued a charge of discrimination against the defendants, and referred the case to the Department of Justice. The complaint seeks a court order requiring defendants to cease their discriminatory housing practices, damages for the family that filed the HUD complaint and any other families against whom the defendants discriminated against because they had children, and civil penalties. Any individuals who have information relevant to this case are encouraged to contact the Civil Rights Division at 1-800-896-7743, Option 96. The federal Fair Housing Act prohibits discrimination in housing on the basis of race, color, religion, sex, familial status, national origin and disability. More information about the Civil Rights Division and the civil rights laws it enforces is available at www.usdoj.gov/crt and https://www.justice.gov/usao-wdwa/civil-rights. Individuals who believe that they have been victims of housing discrimination may call the Justice Department at 1-800-896-7743, email the Justice Department at fairhousing@usdoj.gov, or contact HUD at 1-800-669-9777 or through its website at www.hud.gov. The case is being jointly handled by the Department’s Civil Rights Division and the U.S. Attorney’s Office for the Western District of Washington. The complaint is an allegation of unlawful conduct. The allegations must still be proven in federal court.",2017-03-03T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Housing and Civil Enforcement Section; USAO - Washington, Western",0.009,0.912,0.009,-0.9753,today file lawsuit court western washington alleg owner manag three edmond washington apart build refus rent apart famili children violat fair hous fair hous prohibit landlord deni apart famili children said act general wheeler mani famili alreadi face challeng find afford hous also deal unlaw discrimin equal access hous essenti american includ famili young children said annett hay western washington particular tight hous market landlord must follow make unit avail without discrimin base race color religion nation origin disabl famili status complaint concern three apart build locat pine street edmond manag defend debbi applebi stanwood washington properti own three limit liabil corpor llcs control applebi appl appl appl three also name defend suit complaint alleg march defend applebi told woman seek apart husband year child apart build adult therefor avail famili complaint also alleg various time april novemb defend advertis avail apart restrict adult famili file complaint hous urban develop conduct issu charg discrimin defend refer complaint seek court order requir defend ceas discriminatori hous practic damag famili file complaint famili defend discrimin children penalti individu inform relev encourag contact option feder fair hous prohibit discrimin hous basi race color religion famili status nation origin disabl inform law enforc avail usdoj https usao wdwa individu believ victim hous discrimin call email fairhous usdoj contact websit joint handl western washington complaint alleg unlaw conduct alleg must still proven feder court,0.999255,0.000471,0.000274,0,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
8847,17-550,Mother Sentenced to 26 Months in Prison for Taking Child from Illinois to Canada in International Parental Kidnapping Case,"A Canadian woman was sentenced to serve 26 months in prison following her December conviction for international parental kidnapping, announced Acting Assistant Attorney General Kenneth A. Blanco of the Justice Department’s Criminal Division and Acting U.S. Attorney Patrick D. Hansen of the Central District of Illinois. Sarah M. Nixon, 48, of Montreal, Canada, was sentenced before U.S. District Judge Colin S. Bruce of the Central District of Illinois. On Dec. 21, 2016, a federal jury found Nixon guilty of one count of international parental kidnapping for taking her minor child from the United States to Canada in July 2015, with the intent to obstruct the lawful exercise of the father’s rights. Evidence at trial established that after a custody trial where it was apparent that Nixon would lose custody of her six-year-old daughter, Nixon fled the United States with the child in the middle of the night. When she did not appear for the custody ruling and neither she nor her daughter could be located, law enforcement issued a child abduction alert. Nixon and the child were eventually located in a farmhouse in rural Ontario, Canada. Authorities then returned the child to the father. Nixon was arrested in New York on Sept. 20, 2015, as she attempted to return to the United States. Trial Attorneys Elly M. Peirson and Lauren S. Kupersmith of the Criminal Division’s Child Exploitation and Obscenity Section prosecuted the case. The FBI; Urbana, Illinois, Police Department; University of Illinois Police Department; Illinois Department of Children and Family Services; Ontario Provincial Police; and U.S. Customs and Border Protection investigated the case, with assistance from the Champaign County, Illinois, State’s Attorney’s Office and the Criminal Division’s Office of International Affairs.",2017-05-19T00:00:00-04:00,Project Safe Childhood,"Criminal Division; Criminal - Child Exploitation and Obscenity Section; USAO - Illinois, Central",0.010,0.894,0.010,-0.9460,canadian woman sentenc serv month prison follow decemb convict intern parent kidnap announc act general kenneth blanco crimin act patrick hansen central illinoi sarah nixon montreal canada sentenc judg colin bruce central illinoi feder juri found nixon guilti count intern parent kidnap take minor child unit state canada juli intent obstruct law exercis father evid establish custodi appar nixon would lose custodi year daughter nixon fled unit state child middl night appear custodi rule neither daughter could locat enforc issu child abduct alert nixon child eventu locat farmhous rural ontario canada author return child father nixon arrest york sept attempt return unit state attorney elli peirson lauren kupersmith crimin child exploit obscen section prosecut urbana illinoi polic univers illinoi polic illinoi children famili servic ontario provinci polic custom border protect investig champaign counti illinoi state crimin intern affair,0.000651,0.214084,0.785265,2,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6089,18-119,Justice Department Announces Religious Liberty Update to U.S. Attorneys’ Manual and Directs the Designation of Religious Liberty Point of Contact for All U.S. Attorney's Offices,"The Department of Justice today announced the update of the United States Attorneys’ Manual (USAM) with a new section titled, “Associate Attorney General’s Approval and Notice Requirements for Issues Implicating Religious Liberty.” On Oct. 6, 2017, the Attorney General issued a Memorandum for All Executive Departments and Agencies entitled Federal Law Protections for Religious Liberty. The memo directed components and United States Attorney’s Offices to use the guidance in litigation, advice to the Executive Branch, operations, grants, and all other aspects of the Department’s work. In order to ensure compliance with the Attorney General’s memo, the USAM will be updated with language that directs relevant Department of Justice components to: The updated USAM will also instruct relevant Justice Department components to consult the 20 religious liberty principles laid out in the Attorney General’s October 6 memo when considering whether the notice or approval requirements are initiated. In order to fully effectuate the approval and notice requirements in the updated USAM, the Department will instruct all U.S. Attorneys to designate a point of contact to lead these efforts for their office. “Religious liberty is an inalienable right protected by the Constitution, and defending it is one of the most important things we do at the Department of Justice,” said Associate Attorney General Rachel Brand. At President Trump's direction, Attorney General Sessions issued a robust and clear guidance document in October that clearly explains how the federal government is to apply the religious liberty protections currently on the books. The requirement that each of the U.S. Attorney offices designate a religious liberty point of contact will ensure that the Attorney General’s Memorandum is effectively implemented. The designees will be responsible for working directly with the leadership offices on civil cases related to religious liberty, ensuring that these cases receive the rigorous attention they deserve.",2018-01-31T00:00:00-05:00,Civil Rights,Civil Rights Division; Office of the Associate Attorney General,0.199,0.788,0.199,0.9944,today announc updat unit state attorney manual usam section titl associ general approv notic requir issu implic religi liberti general issu memorandum execut depart agenc entitl feder protect religi liberti memo direct compon unit state offic guidanc litig advic execut branch oper grant aspect work order ensur complianc general memo usam updat languag direct relev compon updat usam also instruct relev compon consult religi liberti principl laid general octob memo consid whether notic approv requir initi order fulli effectu approv notic requir updat usam instruct attorney design point contact lead effort religi liberti inalien right protect constitut defend import thing said associ general rachel brand presid trump direct general session issu robust clear guidanc document octob clear explain feder govern appli religi liberti protect current book requir offic design religi liberti point contact ensur general memorandum effect implement designe respons work direct leadership offic case relat religi liberti ensur case receiv rigor attent deserv,0.998535,0.000925,0.000540,0,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
6733,16-1321,"Justice Department Reaches Agreement with City of Yonkers, New York, to Enhance Police Department Policies and Procedures","The Justice Department announced today that it has reached an agreement with the city of Yonkers, New York, and the Yonkers Police Department (YPD) to resolve the department’s investigation of YPD and ensure constitutional policing. The agreement is the result of the department’s investigation of YPD under the Violent Crime Control and Law Enforcement Act of 1994 and the Omnibus Crime Control and Safe Streets Act of 1968. In June 2009, the United States sent the city a technical assistance letter that identified necessary reforms to YPD practices and policies in the areas of use of force, civilian complaints, investigations, supervisory oversight and training. After receiving the department’s technical assistance letter, the city and YPD made substantial changes to its policies and procedures. This agreement implements and further improves those policies and procedures and addresses the department’s remaining concerns. “This agreement will ensure that the Yonkers Police Department continues to advance constitutional, effective and community-oriented policing,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “Through clear policy guidance, data analysis and accountability systems, we believe these reforms will make the entire community safer and strengthen public trust in the police.” “This agreement ensures that the Yonkers Police Department polices in a way that keeps its citizens safe, while protecting their constitutional rights,” said U.S. Attorney Preet Bharara of the Southern District of New York. “The measures put in place with this agreement, including clear and reasonable use-of-force policies and guidance on how to properly evaluate and respond to use-of-force incidents, will make Yonkers safer for citizens and police alike. We thank the Yonkers Police Department and the city of Yonkers for cooperating with our investigation, and for joining our effort to ensure that the Yonkers Police Department protects its citizens not only from physical harm, but also from violations of their constitutional rights.” The agreement is carefully tailored to address the department’s remaining concerns while also taking into account and seeking to build upon the positive reforms YPD has already made following the department’s investigation. Under the agreement, the YPD will, among other things: The agreement also provides that consultants retained by the department will conduct compliance reviews to ensure that YPD has implemented the measures required by the agreement and issue public reports of those compliance reviews. This case is being handled by the Civil Rights Division’s Special Litigation Section and the U.S. Attorney’s Office of the Southern District of New York. Yonkers Police Department Agreement",2016-11-14T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Special Litigation Section; USAO - New York, Southern",0.199,0.778,0.199,0.9950,announc today reach agreement citi yonker york yonker polic resolv ensur constitut polic agreement result violent crime control enforc omnibus crime control safe street june unit state sent citi technic letter identifi necessari reform practic polici area forc civilian complaint investig supervisori oversight train receiv technic letter citi made substanti chang polici procedur agreement implement improv polici procedur address remain concern agreement ensur yonker polic continu advanc constitut effect communiti orient polic said princip deputi general vanita gupta head clear polici guidanc data analysi account system believ reform make entir communiti safer strengthen public trust polic agreement ensur yonker polic polic keep citizen safe protect constitut said preet bharara southern york measur place agreement includ clear reason forc polici guidanc proper evalu respond forc incid make yonker safer citizen polic alik thank yonker polic citi yonker cooper join effort ensur yonker polic protect citizen physic harm also violat constitut agreement care tailor address remain concern also take account seek build upon posit reform alreadi made follow agreement among thing agreement also provid consult retain conduct complianc review ensur implement measur requir agreement issu public report complianc review handl special litig section southern york yonker polic agreement,0.999066,0.000590,0.000344,0,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
6905,16-740,"Justice Department Reaches Settlement to Reform Criminal Justice System in Hinds County, Mississippi","The Justice Department today reached a landmark settlement agreement to reform the criminal justice system in Hinds County, Mississippi. The agreement resolves the department’s findings that the Hinds County Adult Detention Center and the Jackson City Detention Center – which together form the Hinds County Jail – failed to protect prisoners from violence and excessive force and held them past their court-ordered release dates, in violation of the Civil Rights of Institutionalized Persons Act (CRIPA). The settlement agreement is the first of its kind to incorporate broader criminal justice system reform through diversion at the front end and reentry to the community after incarceration. It creates a criminal justice coordinating committee that will help ensure the county’s systems operate effectively and efficiently, develop interventions to divert individuals in appropriate cases from arrest, detention and incarceration, and engage in community outreach. To promote successful reentry, the agreement includes mechanisms for notifying community health providers when a person with serious mental illness is released to help the person transition safely back to the community. The agreement also addresses unlawful enforcement of court-ordered fines and fees by ensuring that the county cannot incarcerate an individual for non-payment if the court does not first assess whether the individual is indigent. “Across the board, this settlement will make the Hinds County criminal justice system smarter and fairer,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “If implemented, these reforms will make pretrial detainees, prisoners, corrections staff and the entire community safer, while also ensuring that vulnerable individuals get access to the treatment, care and community services they need and deserve. We commend the county for its commitment to making these reforms a reality.” “For too long, the conditions in the Jail have posed a serious challenge to law enforcement and the safety of our community,” said U.S. Attorney Gregory K. Davis of the Southern District of Mississippi. “I appreciate the commitment made by Hinds County officials to turn the page and begin making necessary reforms.” The settlement agreement – subject to approval by the U.S. District Court of the Southern District of Mississippi – requires the county to implement a series of reforms across various stages of the criminal justice system, including the following: Together these reforms aim to improve communication and coordination among criminal justice entities and community service providers to help individuals with mental illness transition back to the community and to reduce recidivism. If approved by the federal district court, an independent monitor will be appointed to assess the county’s compliance. In May 2015, the Justice Department completed a comprehensive investigation – which included on-site inspections, document reviews and stakeholder interviews by department experts and staff – and issued a findings letter that determined that Hinds County Adult Detention Center and the Jackson City Detention Center violated CRIPA by failing to protect prisoners from violence by other prisoners and from improper use of force by staff. The department also found that inadequate staffing and training, a backlog in record filing and a lack of centralized information resulted in prisoners being held beyond court-ordered release dates. CRIPA authorizes the department to seek a remedy for a pattern or practice of conduct that violates the constitutional rights of persons confined in a jail, prison or other correctional facility. For more information on the Civil Rights Division’s work in this area, please visit www.justice.gov/crt. Hinds County Settlement Agreement Hinds County Fact Sheet",2016-06-23T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Special Litigation Section; USAO - Mississippi, Southern",0.205,0.659,0.205,0.9888,today reach landmark settlement agreement reform crimin system hind counti mississippi agreement resolv find hind counti adult detent center jackson citi detent center togeth form hind counti jail fail protect prison violenc excess forc held past court order releas date violat institution person cripa settlement agreement first kind incorpor broader crimin system reform divers front reentri communiti incarcer creat crimin coordin committe help ensur counti system oper effect effici develop intervent divert individu appropri case arrest detent incarcer engag communiti outreach promot success reentri agreement includ mechan notifi communiti health provid person serious mental ill releas help person transit safe back communiti agreement also address unlaw enforc court order fine fee ensur counti cannot incarcer individu payment court first assess whether individu indig across board settlement make hind counti crimin system smarter fairer said princip deputi general vanita gupta head implement reform make pretrial detaine prison correct staff entir communiti safer also ensur vulner individu access treatment care communiti servic need deserv commend counti commit make reform realiti long condit jail pose serious challeng enforc safeti communiti said gregori davi southern mississippi appreci commit made hind counti offici turn page begin make necessari reform settlement agreement subject approv court southern mississippi requir counti implement seri reform across various stage crimin system includ follow togeth reform improv communic coordin among crimin entiti communiti servic provid help individu mental ill transit back communiti reduc recidiv approv feder court independ monitor appoint assess counti complianc complet comprehens includ site inspect document review stakehold interview expert staff issu find letter determin hind counti adult detent center jackson citi detent center violat cripa fail protect prison violenc prison improp forc staff also found inadequ staf train backlog record file lack central inform result prison held beyond court order releas date cripa author seek remedi pattern practic conduct violat constitut person confin jail prison correct facil inform work area pleas visit hind counti settlement agreement hind counti fact sheet,0.887984,0.111796,0.000220,0,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"
11066,16-163,"Statement from Head of the Civil Rights Division Vanita Gupta Regarding Ferguson, Missouri, City Council Vote on Proposed Consent Decree","Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division, released the following statement regarding the Ferguson, Missouri, City Council vote on the proposed consent decree with the Department of Justice: “The Ferguson City Council has attempted to unilaterally amend the negotiated agreement. Their vote to do so creates an unnecessary delay in the essential work to bring constitutional policing to the city, and marks an unfortunate outcome for concerned community members and Ferguson police officers. Both parties engaged in thoughtful negotiations over many months to create an agreement with cost-effective remedies that would ensure Ferguson brings policing and court practices in line with the Constitution. The agreement already negotiated by the department and the city will provide Ferguson residents a police department and municipal court that fully respects civil rights and operates free from racial discrimination. “The Department of Justice will take the necessary legal actions to ensure that Ferguson’s policing and court practices comply with the Constitution and relevant federal laws.”",2016-02-10T00:00:00-05:00,Civil Rights,Civil Rights Division; Civil Rights - Special Litigation Section,0.217,0.752,0.217,0.9811,princip deputi general vanita gupta head releas follow statement regard ferguson missouri citi council vote propos consent decre ferguson citi council attempt unilater amend negoti agreement vote creat unnecessari delay essenti work bring constitut polic citi mark unfortun outcom concern communiti member ferguson polic offic parti engag thought negoti mani month creat agreement cost effect remedi would ensur ferguson bring polic court practic line constitut agreement alreadi negoti citi provid ferguson resid polic municip court fulli respect oper free racial discrimin take necessari legal action ensur ferguson polic court practic compli constitut relev feder law,0.997998,0.001264,0.000737,0,"[washington_paul, paul_miller, miller_denham, denham_spring, spring_convict, convict_late, late_yesterday, yesterday_count, count_produc, produc_count, count_possess, possess_child, child_pornographi, pornographi_announc, announc_general, general_lanni, lanni_breuer, breuer_crimin, crimin_donald, donald_cazayoux, cazayoux_middl, middl_louisiana, louisiana_miller, miller_convict, convict_feder, feder_juri, juri_follow, follow_judg, judg_jame, jame_bradi, bradi_presid, presid_evid, evid_present, present_show, show_octob, octob_miller, miller_sexual, sexual_abus, abus_year, year_girl, girl_year, year_girl, girl_produc, produc_numer, numer_photograph, photograph_abus, abus_accord, accord_evid, evid_forens, forens_examin, examin_miller, miller_comput, comput_reveal, reveal_miller, miller_use, use_comput, comput_print, print_possess, possess_numer, numer_imag, imag_child, child_pornographi, pornographi_includ, includ_imag, imag_child, child_pornographi, pornographi_produc, produc_imag, imag_child, child_victim, victim_miller, miller_face, face_maximum, maximum_statutori, statutori_sentenc, sentenc_year, year_prison, prison_count, count_product, product_child, child_pornographi, pornographi_year, year_prison, prison_possess, possess_child, child_pornographi, pornographi_count, count_prosecut, prosecut_alecia, alecia_riewert, riewert_wolak, wolak_crimin, crimin_child, child_exploit, exploit_obscen, obscen_section, section_ceo, ceo_richard, richard_bourgeoi, bourgeoi_middl, ...]"


C. Use the create_dtm function and the `processed_text_bigrams` column to create a document-term matrix (`dtm_bigram`) with these bigrams. Keep the following three columns in the data: `id`, `topics_clean`, and `compound` 

D. Print the (1) dimensions of the `dtm` matrix from question 2.2  and (2) the dimensions of the `dtm_bigram` matrix. Comment on why the bigram matrix has more dimensions than the unigram matrix 

E. Find and print the 10 most prevelant bigrams for each of the three topics_clean using the `get_topwords` function from 2.2

In [None]:
# your code here

# 4. Optional extra credit (2 points)

You notice that the pharmaceutical kickbacks press release we analyzed in question 1 was for an indictment, and that in the original data, there's not a clear label for whether a press release outlines an indictment (charging someone with a crime), a conviction (convicting them after that charge either via a settlement or trial), or a sentencing (how many years of prison or supervised release a defendant is sentenced to after their conviction).

You want to see if you can identify pairs of press releases where one press release is from one stage (e.g., indictment) and another is from a different stage (e.g., a sentencing).

You decide that one way to approach is to find the pairwise string similarity between each of the processed press releases in `doj_subset`. There are many ways to do this, so Google for some approaches, focusing on ones that work well for entire documents rather than small strings.

Find the top two pairs (so four press releases total)-- do they seem like different stages of the same crime or just press releases covering similar crimes?

In [None]:
# your code here 