# Problem set 3: Text analysis of DOJ press releases

**Total points (without extra credit)**: 52 

- For background:

    - DOJ is the federal law enforcement agency responsible for federal prosecutions; this contrasts with the local prosecutions in the Cook County dataset we analyzed earlier. Here's a short explainer on which crimes get prosecuted federally versus locally: https://www.criminaldefenselawyer.com/resources/criminal-defense/federal-crime/state-vs-federal-crimes.htm#:~:text=Federal%20criminal%20prosecutions%20are%20handled,of%20state%20and%20local%20law. 
    - Here's the Kaggle that contains the data: https://www.kaggle.com/jbencina/department-of-justice-20092018-press-releases 
    - Here's the code the dataset creator used to scrape those press releases here if you're interested: https://github.com/jbencina/dojreleases

## 0.0 Import packages

In [1]:
## helpful packages
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import random
import re
import string

## nltk imports
import nltk
### uncomment and run these lines if you haven't downloaded relevant nltk add-ons yet
### nltk.download('averaged_perceptron_tagger')
### nltk.download('stopwords')
from nltk import pos_tag
from nltk.tokenize import word_tokenize, wordpunct_tokenize
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords

## spacy imports
import spacy
### uncomment and run the below line if you haven't loaded the en_core_web_sm library yet
### ! python -m spacy download en_core_web_sm
import en_core_web_sm
nlp = en_core_web_sm.load()

## vectorizer
from sklearn.feature_extraction.text import CountVectorizer

## sentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

## lda
from gensim import corpora
import gensim

import pyLDAvis
import pyLDAvis.gensim_models as gensimvis

## repeated printouts and wide-format text
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.max_colwidth', None)

## 0.1 Load and clean text data

In [2]:
## first, unzip the file pset3_inputdata.zip 
## then, run this code to load the unzipped json file and convert to a dataframe
## (may need to change the pathname depending on where you store stuff)
## and convert some of the attributes from lists to values
doj = pd.read_json("combined.json", lines = True)

## due to json, topics are in a list so remove them and concatenate with ;
doj['topics_clean'] = ["; ".join(topic) 
                      if len(topic) > 0 else "No topic" 
                      for topic in doj.topics]

## similarly with components
doj['components_clean'] = ["; ".join(comp) 
                           if len(comp) > 0 else "No component" 
                           for comp in doj.components]

## drop older columns from data
doj = doj[['id', 'title', 'contents', 'date', 'topics_clean', 
           'components_clean']].copy()

doj.head()

Unnamed: 0,id,title,contents,date,topics_clean,components_clean
0,,Convicted Bomb Plotter Sentenced to 30 Years,"PORTLAND, Oregon. – Mohamed Osman Mohamud, 23, who was convicted in 2013 of attempting to use a weapon of mass destruction (explosives) in connection with a plot to detonate a vehicle bomb at an annual Christmas tree lighting ceremony in Portland, was sentenced today to serve 30 years in prison, followed by a lifetime term of supervised release. Mohamud, a naturalized U.S. citizen from Somalia and former resident of Corvallis, Oregon, was arrested on Nov. 26, 2010, after he attempted to detonate what he believed to be an explosives-laden van that was parked near the tree lighting ceremony in Portland. The arrest was the culmination of a long-term undercover operation, during which Mohamud was monitored closely for months as his bomb plot developed. The device was in fact inert, and the public was never in danger from the device. At sentencing, United States District Court Judge Garr M. King, who presided over Mohamed’s 14-day trial, said “the intended crime was horrific,” and that the defendant, even though he was presented with options by undercover FBI employees, “never once expressed a change of heart.” King further noted that the Christmas tree ceremony was attended by up to 10,000 people, and that the defendant “wanted everyone to leave either dead or injured.” King said his sentence was necessary in view of the seriousness of the crime and to serve as deterrence to others who might consider similar acts. “With today’s sentencing, Mohamed Osman Mohamud is being held accountable for his attempted use of what he believed to be a massive bomb to attack innocent civilians attending a public Christmas tree lighting ceremony in Portland,” said John P. Carlin, Assistant Attorney General for National Security. “The evidence clearly indicated that Mohamud was intent on killing as many people as possible with his attack. Fortunately, law enforcement was able to identify him as a threat, insert themselves in the place of a terrorist that Mohamud was trying to contact, and thwart Mohamud’s efforts to conduct an attack on our soil. This case highlights how the use of undercover operations against would-be terrorists allows us to engage and disrupt those who wish to commit horrific acts of violence against the innocent public. The many agents, analysts, and prosecutors who have worked on this case deserve great credit for their roles in protecting Portland from the threat posed by this defendant and ensuring that he was brought to justice.” “This trial provided a rare glimpse into the techniques Al Qaeda employs to radicalize home-grown extremists,” said Amanda Marshall, U.S. Attorney for the District of Oregon. “With the sentencing today, the court has held this defendant accountable. I thank the dedicated professionals in the law enforcement and intelligence communities who were responsible for this successful outcome. I look forward to our continued work with Muslim communities in Oregon who are committed to ensuring that all young people are safe from extremists who seek to radicalize others to engage in violence.” According to the trial evidence, in February 2009, Mohamud began communicating via e-mail with Samir Khan, a now-deceased al Qaeda terrorist who published Jihad Recollections, an online magazine that advocated violent jihad, and who also published Inspire, the official magazine of al-Qaeda in the Arabian Peninsula. Between February and August 2009, Mohamed exchanged approximately 150 emails with Khan. Mohamud wrote several articles for Jihad Recollections that were published under assumed names. In August 2009, Mohamud was in email contact with Amro Al-Ali, a Saudi national who was in Yemen at the time and is today in custody in Saudi Arabia for terrorism offenses. Al-Ali sent Mohamud detailed e-mails designed to facilitate Mohamud’s travel to Yemen to train for violent jihad. In December 2009, while Al-Ali was in the northwest frontier province of Pakistan, Mohamud and Al-Ali discussed the possibility of Mohamud traveling to Pakistan to join Al-Ali in terrorist activities. Mohamud responded to Al-Ali in an e-mail: “yes, that would be wonderful, just tell me what I need to do.” Al-Ali referred Mohamud to a second associate overseas and provided Mohamud with a name and email address to facilitate the process. In the following months, Mohamud made several unsuccessful attempts to contact Al-Ali’s associate. Ultimately, an FBI undercover operative contacted Mohamud via email under the guise of being an associate of Al-Ali’s. Mohamud and the FBI undercover operative agreed to meet in Portland in July 2010. At the meeting, Mohamud told the FBI undercover operative he had written articles that were published in Jihad Recollections. Mohamud also said that he wanted to become “operational.” Asked what he meant by “operational,” Mohamud said he wanted to put an explosion together, but needed help. According to evidence presented at trial, at a meeting in August 2010, Mohamud told undercover FBI operatives he had been thinking of committing violent jihad since the age of 15. Mohamud then told the undercover FBI operatives that he had identified a potential target for a bomb: the annual Christmas tree lighting ceremony in Portland’s Pioneer Courthouse Square on Nov. 26, 2010. The undercover FBI operatives cautioned Mohamud several times about the seriousness of this plan, noting there would be many people at the event, including children, and emphasized that Mohamud could abandon his attack plans at any time with no shame. Mohamud indicated the deaths would be justified and that he would not mind carrying out a suicide attack on the crowd. According to evidence presented at trial, in the ensuing months Mohamud continued to express his interest in carrying out the attack and worked on logistics. On Nov. 4, 2010, Mohamud and the undercover FBI operatives traveled to a remote location in Lincoln County, Oregon, where they detonated a bomb concealed in a backpack as a trial run for the upcoming attack. During the drive back to Corvallis, Mohamud was asked if was capable looking at all the bodies of those who would be killed during the explosion. In response, Mohamud noted, “I want whoever is attending that event to be, to leave either dead or injured.” Mohamud later recorded a video of himself, with the assistance of the undercover FBI operatives, in which he read a statement that offered his rationale for his bomb attack. On Nov. 18, 2010, undercover FBI operatives picked up Mohamud to travel to Portland to finalize the details of the attack. On Nov. 26, 2010, just hours before the planned attack, Mohamud examined the 1,800 pound bomb in the van and remarked that it was “beautiful.” Later that day, Mohamud was arrested after he attempted to remotely detonate the inert vehicle bomb rked near the Christmas tree lighting ceremony This case was investigated by the FBI, with assistance from the Oregon State Police, the Corvallis Police Department, the Lincoln County Sheriff’s Office and the Portland Police Bureau. The prosecution was handled by Assistant U.S. Attorneys Ethan D. Knight and Pamala Holsinger from the U.S. Attorney’s Office for the District of Oregon. Trial Attorney Jolie F. Zimmerman, from the Counterterrorism Section of the Justice Department’s National Security Division, assisted. # # # 14-1077",2014-10-01T00:00:00-04:00,No topic,National Security Division (NSD)
1,12-919,$1 Million in Restitution Payments Announced to Preserve North Carolina Wetlands,"WASHINGTON – North Carolina’s Waccamaw River watershed will benefit from a $1 million restitution order from a federal court, funding environmental projects to acquire and preserve wetlands in an area damaged by illegal releases of wastewater from a corporate hog farm, announced Ignacia S. Moreno, Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division; U.S. Attorney for the Eastern District of North Carolina Thomas G. Walker; Director Greg McLeod from the North Carolina State Bureau of Investigation; and Camilla M. Herlevich, Executive Director of the North Carolina Coastal Land Trust. Freedman Farms Inc. was sentenced in February 2012 to five years of probation and ordered to pay $1.5 million in fines, restitution and community service payments for violating the Clean Water Act when it discharged hog waste into a stream that leads to the Waccamaw River. William B. Freedman, president of Freedman Farms, was sentenced to six months in prison to be followed by six months of home confinement. Freedman Farms also is required to implement a comprehensive environmental compliance program and institute an annual training program. In an order issued on April 19, 2012, the court ordered that the defendants would be responsible for restitution of $1 million in the form of five annual payments starting in January 2013, which the court will direct to the North Carolina Coastal Land Trust (NCCLT). The NCCLT plans to use the money to acquire and conserve land along streams in the Waccamaw watershed. The court also directed a $75,000 community service payment to the Southern Environmental Enforcement Network, an organization dedicated to environmental law enforcement training and information sharing in the region. “The resolution of the case against Freedman Farms demonstrates the commitment of the Department of Justice to enforcing the Clean Water Act to ensure the protection of human health and the environment,” said Assistant Attorney General Moreno. “The court-ordered restitution in this case will conserve wetlands for the benefit of the people of North Carolina. By enforcing the nation’s environmental laws, we will continue to ensure that concentrated animal feeding operations (CAFOs) operate without threatening our drinking water, the health of our communities and the environment.” “This office is committed to doing our part to hold accountable those who commit crimes against our environment, which can cause serious health problems to residents and damage the environment that makes North Carolina such a beautiful place to live and visit,” said U.S. Attorney Walker. “This case shows what we can accomplish when our SBI agents work closely with their local, state and federal partners to investigate environmental crimes and hold the polluters accountable,” said Director McLeod. “We’ll continue our efforts to fight illegal pollution that damages our water and puts the public’s health at risk.” “The Waccamaw is unique and wild,” said Director Herlevich of the North Carolina Coastal Land Trust. “Its watershed includes some of the most extensive cypress gum swamps in the state, and its headwaters at Lake Waccamaw contain fish that are found nowhere else on Earth. We appreciate the trust of the court and the U. S. Attorney, and we look forward to using these funds for conservation projects in a river system that is one of our top conservation priorities.” According to evidence presented in court, in December 2007 Freedman Farms discharged hog waste into Browder’s Branch, a tributary to the Waccamaw River that flows through the White Marsh, a large wetlands complex. Freedman Farms, located in Columbus County, N.C., is in the business of raising hogs for market, and this particular farm had some 4,800 hogs. The hog waste was supposed to be directed to two lagoons for treatment and disposal. Instead, hog waste was discharged from Freedman Farms directly into Browder’s Branch. The Clean Water Act is a federal law that makes it illegal to knowingly or negligently discharge a pollutant into a water of the United States. The Freedman case was investigated by the U.S. Environmental Protection Agency (EPA) Criminal Investigation Division, the U.S. Army Corps of Engineers and the North Carolina State Bureau of Investigation, with assistance from the EPA Science and Ecosystem Support Division. The case was prosecuted by Assistant U.S. Attorney J. Gaston B. Williams of the Eastern District of North Carolina and Trial Attorney Mary Dee Carraway of the Environmental Crimes Section of the Justice Department’s Environment and Natural Resources Division. The North Carolina Coastal Land Trust is celebrating its 20th anniversary of saving special lands in eastern North Carolina. The organization has protected nearly 50,000 acres of lands with scenic, recreational, historic and ecological values. North Carolina Coastal Land Trust has saved streams and wetlands that provide clean water, forests that are havens for wildlife, working farms that provide local food and nature parks that everyone can enjoy. More information about the Coastal Land Trust is available at www.coastallandtrust.org.",2012-07-25T00:00:00-04:00,No topic,Environment and Natural Resources Division
2,11-1002,$1 Million Settlement Reached for Natural Resource Damages at Superfund Site in Massachusetts,"BOSTON– A $1-million settlement has been reached for natural resource damages (NRD) at the Blackburn & Union Privileges Superfund Site in Walpole, Mass., the Departments of Justice and Interior (DOI), and the Office of the Massachusetts Attorney General announced today. The Blackburn & Union Privileges Superfund Site includes 22 acres of contaminated land and water in Walpole. The contamination resulted from the operations of various industrial facilities dating back to the 19th century that exposed the site to asbestos, arsenic, lead and other hazardous substances. The private parties involved in the settlement include two former owners and operators of the site, W.R. Grace & Co.– Conn. and Tyco Healthcare Group LP, as well as the current owners, BIM Investment Corp. and Shaffer Realty Nominee Trust. From about 1915 to 1936, a predecessor of W.R. Grace manufactured asbestos brake linings and clutch linings on a large portion of the property. From 1946 to about 1983, a predecessor of Tyco Healthcare operated a cotton fabric manufacturing business, which used caustic solutions, on a portion of the property. In a 2010 settlement with U.S. Environmental Protection Agency (EPA), the four private parties agreed to perform a remedial action to clean up the site at an estimated cost of $13 million. The consent decree lodged today resolves both state and federal NRD liability claims; it requires the parties to pay $1,094,169.56 to the state and federal natural resource trustees, the Massachusetts Executive Office of Energy and Environmental Affairs (EEA) and DOI, for injuries to ecological resources including groundwater and wetlands, which provide habitat for waterfowl and wading birds, including black ducks and great blue herons. The trustees will use the settlement funds for natural resource restoration projects in the area. “This settlement demonstrates our commitment to recovering damages from the parties responsible for injury to natural resources, in partnership with state trustees,” said Bruce Gelber, Acting Deputy Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division. “The citizens of Walpole have had to live with the environmental impact of this contamination for many years,” Attorney General Martha Coakley said. “We are pleased that today’s agreement will not only require the responsible parties to reimburse taxpayer dollars, but will also provide funding to begin restoring or replacing the wetland and other natural resources.” The consent decree was lodged in the U.S. District Court for Massachusetts. A portion of the funds, $300,000, will be distributed to the EEA-sponsored groundwater restoration projects; $575,000 will be used for ecological restoration projects jointly sponsored by EEA and the U.S. Fish and Wildlife Service (FWS). In addition, $125,000 will go for projects jointly sponsored by EEA and FWS that achieve both ecological and groundwater restoration; $57,491.34 will be allocated for reimbursement for the FWS’s assessment costs; and $36,678.22 will be distributed as reimbursement for the commonwealth’s assessment costs. “This settlement provides the means for a range of projects designed to compensate the public for decades of groundwater and other ecological damage at this site. I encourage local citizens and organizations to become engaged in the public process that will take place as we solicit, take comment on, and choose these projects in the months ahead,” said Energy and Environmental Affairs Secretary Richard K. Sullivan Jr., who serves as the Commonwealth’s Natural Resources Damages trustee. “This settlement will help restore habitat for fish and wildlife in the Neponset River watershed,” said Tom Chapman of the FWS New England Field Office. “We look forward to working with the commonwealth and local stakeholders to implement restoration.” “More than 100 years-worth of industrial activities at this site caused major environmental contamination to the Neponset River, nearby wetlands and to groundwater below the site,” said Commissioner Kenneth Kimmell of the Massachusetts Department of Environmental Protection (MassDEP), which will staff the Trustee Council for the Commonwealth. “We will ensure that the community and the public will be active participants in the process to use these NRD funds to restore the injured natural resources.” Under the federal Comprehensive Environmental Response, Compensation and Liability Act, EEA and DOI, acting through the FWS, are the designated state and federal natural resource Trustees for the site. The site has been listed on the EPA’s National Priorities List since 1994. The consent decree is subject to a public comment period and court approval. A copy of the consent decree and instructions about how to submit comments is available on www.usdoj.gov/enrd/Consent_Decrees.html . After the consent decree is approved, EEA and FWS will develop proposed restoration plans to use the settlement funds for restoration projects. The proposed restoration plans will also be made available to the public for review and comment. Assistant Attorney General Matthew Brock of Massachusetts Attorney General Coakley's Environmental Protection Division handled this matter. Attorney Jennifer Davis of MassDEP, Attorney Anna Blumkin of EEA and MassDEP’s NRD Coordinator Karen Pelto also worked on this settlement.",2011-08-03T00:00:00-04:00,No topic,Environment and Natural Resources Division
3,10-015,10 Las Vegas Men Indicted \r\nfor Falsifying Vehicle Emissions Tests,"WASHINGTON—A federal grand jury in Las Vegas today returned indictments against 10 Nevada-certified emissions testers for falsifying vehicle emissions test reports, the Justice Department announced. Each defendant faces one felony Clean Air Act count for falsifying reports between November 2007 and May 2009. The number of falsifications varied by defendant, with some defendants having falsified approximately 250 records, while others falsified more than double that figure. One defendant is alleged to have falsified over 700 reports. The individuals indicted include: Escudero resides in Pahrump, Nev. All other individuals are from Clark County, Nev. The 10 defendants are alleged to have engaged in a practice known as ""clean scanning"" vehicles. The scheme involved entering the Vehicle Identification Number (VIN) for a vehicle that would not pass the emissions test into the computerized system, then connecting a different vehicle the testers knew would pass the test. These falsifications were allegedly performed for anywhere from $10 to $100 over and above the usual emissions testing fee. The U.S. Environmental Protection Agency (EPA), under the Clean Air Act, requires the state of Nevada to conduct vehicle emissions testing in certain areas because the areas exceed national standards for carbon monoxide and ozone. Las Vegas is currently required to perform emissions testing. To obtain a registration renewal, vehicle owners bring the vehicles to a licensed inspection station for testing. The emissions inspector logs into a computer to activate the system by using a unique password issued to the emissions inspector. The emissions inspector manually inputs the vehicle’s VIN to identify the tested vehicle, then connects the vehicle for model year 1996 and later to an onboard diagnostics port connected to an analyzer. The analyzer downloads data from the vehicle’s computer, analyzes the data and provides a ""pass"" or ""fail"" result. The pass or fail result and vehicle identification data are reported on the Vehicle Inspection Report. It is a crime to knowingly alter or conceal any record or other document required to be maintained by the Clean Air Act. ""Falsifications of vehicle emissions testing, such as those alleged in the indictments unsealed today, are serious matters and we intend to use all of our enforcement tools to stop this harmful practice. These actions undermine a system that is designed to reduce air pollutants including smog and provide better air quality for the citizens of Nevada,"" said Ignacia S. Moreno, Assistant Attorney General for the Justice Department’s Environment and Natural Resources Division. ""The residents of Nevada deserve to know that the vast majority of licensed vehicle emission inspectors are not corrupt and are not circumventing emission testing procedures,"" said U.S. Attorney Bogden. ""These indictments should serve as a clear warning to offenders that the Department of Justice will prosecute you if you make fraudulent statements and reports concerning compliance with the federal Clean Air Act."" ""Lying about car emissions means dirtier air, which is especially of concern in areas like Las Vegas that are already experiencing air quality problems,"" said Cynthia Giles, Assistant Administrator for Enforcement and Compliance Assurance at EPA. ""We will take aggressive action to ensure communities have clean air."" The maximum penalty for the felony violations contained in the indictments includes up to two years in prison and a fine of up to $250,000. An indictment is merely an accusation, and a defendant is presumed innocent unless and until proven guilty in a court of law. The case was investigated by the EPA, Criminal Investigation Division; and the Nevada Department of Motor Vehicles Compliance Enforcement Division. The case is being prosecuted by the U.S. Attorney’s Office for the District of Nevada and the Justice Department’s Environmental Crimes Section.",2010-01-08T00:00:00-05:00,No topic,Environment and Natural Resources Division
4,18-898,"$100 Million Settlement Will Speed Cleanup Work at Centredale Manor Superfund Site in North Providence, R.I.","The U.S. Department of Justice, the U.S. Environmental Protection Agency (EPA), and the Rhode Island Department of Environmental Management (RIDEM) announced today that two subsidiaries of Stanley Black & Decker Inc.—Emhart Industries Inc. and Black & Decker Inc.—have agreed to clean up dioxin contaminated sediment and soil at the Centredale Manor Restoration Project Superfund Site in North Providence and Johnston, Rhode Island. “We are pleased to reach a resolution through collaborative work with the responsible parties, EPA, and other stakeholders,” said Acting Assistant Attorney General Jeffrey H. Wood for the Justice Department's Environment and Natural Resources Division . “Today’s settlement ends protracted litigation and allows for important work to get underway to restore a healthy environment for citizens living in and around the Centredale Manor Site and the Woonasquatucket River.” “This settlement demonstrates the tremendous progress we are achieving working with responsible parties, states, and our federal partners to expedite sites through the entire Superfund remediation process,” said EPA Acting Administrator Andrew Wheeler. “The Centredale Manor Site has been on the National Priorities List for 18 years; we are taking charge and ensuring the Agency makes good on its promise to clean it up for the betterment of the environment and those communities affected.” “Successfully concluding this settlement paves the way for EPA to make good on our commitment to aggressively pursue cleaning up the Centredale Manor Superfund Site,” said EPA New England Regional Administrator Alexandra Dunn. “We are excited to get to work on the cleanup at this site, and get it closer to the goal of being fully utilized by the North Providence and Johnston communities.” “We are pleased that the collective efforts of the State of Rhode Island, EPA, and DOJ in these negotiations have concluded in this major milestone toward the cleanup of the Centredale Manor Restoration Superfund site and are consistent with our long-standing efforts to make the polluter pay,” said RIDEM Director Janet Coit. “The settlement will speed up a remedy that protects public health and the river environment, and moves us closer to the day that we can reclaim recreational uses of this beautiful river resource.” The settlement, which includes cleanup work in the Woonasquatucket River (River) and bordering residential and commercial properties along the River, requires the companies to perform the remedy selected by EPA for the Site in 2012, which is estimated to cost approximately $100 million, and resolves longstanding litigation. The cleanup remedy includes excavation of contaminated sediment and floodplain soil from the Woonasquatucket River, including from adjacent residential properties. Once the cleanup remedy is completed, full access to the Woonasquatucket River should be restored for local citizens. The cleanup will be a step toward the State’s goal of a fishable and swimmable river. The work will also include upgrading caps over contaminated soil in the peninsula area of the Site that currently house two high-rise apartment buildings. The settlement also ensures that the long-term monitoring and maintenance of the site, as directed in the remedy, will be implemented to ensure that public health is protected. Under the settlement, Emhart and Black & Decker will reimburse EPA for approximately $42 million in past costs incurred at the Site. The companies will also reimburse EPA and the State of Rhode Island for future costs incurred by those agencies in overseeing the work required by the settlement. The settlement will also include payments on behalf of two federal agencies to resolve claims against those agencies. These payments, along with prior settlements related to the Site, will result in a 100 percent recovery for the United States of its past and future response costs related to the Site. Litigation related to the Site has been ongoing for nearly eight years. While the Federal District Court found Black & Decker and Emhart to be liable for their hazardous waste and responsible to conduct the cleanup of the Site, it had also ruled that EPA needed to reconsider certain aspects of that cleanup. EPA appealed the decision requiring it to reconsider aspects of the cleanup. This settlement, once entered by the District Court, will resolve the litigation between the United States, Rhode Island, and Emhart and Black and Decker, allowing the cleanup of the Site to begin. The Site spans a one and a half mile stretch of the Woonasquatucket River and encompasses a nine-acre peninsula, two ponds and a significant forested wetland. From the 1940s to the early 1970s, Emhart’s predecessor operated a chemical manufacturing facility on the peninsula and used a raw material that was contaminated with 2,3,7,8-tetrachlorodibenzo-p-dioxin, a toxic form of dioxin. The Site property was also previously used by a barrel refurbisher. Elevated levels of dioxins and other contaminants have been detected in soil, groundwater, sediment, surface water and fish. The Site was added to the National Priorities List (NPL) in 2000, and in December 2017, EPA included the Centredale Manor Restoration Project Superfund Site on a list of Superfund sites targeted for immediate and intense attention. Several short-term actions were previously performed at the Site to address immediate threats to the residents and minimize potential erosion and downstream transport of contaminated soil and sediment. This settlement is the latest agreement EPA has reached since the Site was listed on the NPL. Prior agreements addressed the performance and recovery of costs for the past environmental investigations and interim cleanup actions from Emhart, the barrel reconditioning company, the current owners of the peninsula portion of the Site, and other potentially responsible parties. The Consent Decree, lodged in the U.S. District Court of Rhode Island, will be posted in the Federal Register and available for public comment for a period of 30 days. The Consent Decree can be viewed on the Justice Department website: www.justice.gov/enrd/Consent_Decrees.html. EPA information on the Centredale Manor Superfund Site: www.epa.gov/superfund/centredale.",2018-07-09T00:00:00-04:00,Environment,Environment and Natural Resources Division


## 1. Tagging and sentiment scoring (17 points)

Focus on the following press release: `id` == "17-1204" about this pharmaceutical kickback prosecution: https://www.forbes.com/sites/michelatindera/2017/11/16/fentanyl-billionaire-john-kapoor-to-plead-not-guilty-in-opioid-kickback-case/?sh=21b8574d6c6c 

The `contents` column is the one we're treating as a document. You may need to to convert it from a pandas series to a single string.

We'll call the raw string of this press release `pharma`

In [3]:
## your code to subset to one press release and take the string

pharma = doj.loc[doj['id'] == '17-1204', 'contents'].to_string(index=False)
pharma

'The founder and majority owner of Insys Therapeutics Inc., was arrested today and charged with leading a nationwide conspiracy to profit by using bribes and fraud to cause the illegal distribution of a Fentanyl spray intended for cancer patients experiencing breakthrough pain.\xa0"More than 20,000 Americans died of synthetic opioid overdoses last year, and millions are addicted to opioids. And yet some medical professionals would rather take advantage of the addicts than try to help them," said Attorney General Jeff Sessions. "This Justice Department will not tolerate this.\xa0 We will hold accountable anyone – from street dealers to corporate executives -- who illegally contributes to this nationwide epidemic.\xa0 And under the leadership of President Trump, we are fully committed to defeating this threat to the American people.”John N. Kapoor, 74, of Phoenix, Ariz., a current member of the Board of Directors of Insys, was arrested this morning in Arizona and charged with RICO conspi

### 1.1 part of speech tagging (3 points)

A. Preprocess the `pharma` press release to remove all punctuation / digits (you can use `.isalpha()` to subset)

B. With the preprocessed press release from part A, use the part of speech tagger within nltk to tag all the words in that one press release with their part of speech. 

C. Using the output from B, extract the adjectives and sort those adjectives from most occurrences to fewest occurrences. Print a dataframe with the 5 most frequent adjectives and their counts in the `pharma` release. See here for a list of the names of adjectives within nltk: https://pythonprogramming.net/natural-language-toolkit-nltk-part-speech-tagging/

**Resources**:

- Documentation for `.isalpha()`: https://www.w3schools.com/python/ref_string_isalpha.asp

In [4]:
#tokenize the text
tokens = word_tokenize(pharma)

#A
#remove all punc/digits
alpha_tokens = [token for token in tokens if token.isalpha()]


In [5]:
#B
#uses speech tagger 
pos_tags = pos_tag(alpha_tokens)


#use code from class to extract adj
all_adj_noun = [one_tok[0] for one_tok in pos_tags
                if one_tok[1] == "JJ" or
                one_tok[1] == "JJR" or one_tok[1] == "JJS"]

#C
#create a pandas Series from the list of adjectives
adjective_series = pd.Series(all_adj_noun)

#count occurrences of each adjective
adjective_counts = adjective_series.value_counts()

#convert the Series into a df and rename the index and column
df_adjectives = adjective_counts.reset_index()
df_adjectives.columns = ['Adjective', 'Count']

#get the top 5 most frequent adjectives
top_adjectives = df_adjectives.head(5)

top_adjectives

Unnamed: 0,Adjective,Count
0,former,8
1,opioid,5
2,nationwide,4
3,addictive,3
4,other,3


## 1.2 named entity recognition (4 points)

A. Using the original `pharma` press release (so the one before stripping punctuation/digits), use spaCy to extract all named entities from the press release.

B. Print the unique named entities with the tag: `LAW`

In [6]:
# A
#extract all named entities
spacy_pharma = nlp(pharma)

for one_tok in spacy_pharma.ents:
    print("Entity: " + one_tok.text + "; NER tag: " + one_tok.label_)

Entity: Insys Therapeutics Inc.; NER tag: ORG
Entity: today; NER tag: DATE
Entity: Fentanyl; NER tag: PERSON
Entity: More than 20,000; NER tag: CARDINAL
Entity: Americans; NER tag: NORP
Entity: last year; NER tag: DATE
Entity: millions; NER tag: CARDINAL
Entity: Jeff Sessions; NER tag: PERSON
Entity: This Justice Department; NER tag: ORG
Entity: Trump; NER tag: PERSON
Entity: American; NER tag: NORP
Entity: ”John N. Kapoor; NER tag: PERSON
Entity: 74; NER tag: DATE
Entity: Phoenix; NER tag: GPE
Entity: Ariz.; NER tag: GPE
Entity: the Board of Directors; NER tag: ORG
Entity: Insys; NER tag: ORG
Entity: this morning; NER tag: TIME
Entity: Arizona; NER tag: GPE
Entity: RICO; NER tag: LAW
Entity: Kapoor; NER tag: PERSON
Entity: Executive; NER tag: ORG
Entity: Board; NER tag: ORG
Entity: Insys; NER tag: ORG
Entity: Phoenix; NER tag: GPE
Entity: today; NER tag: DATE
Entity: U.S.; NER tag: GPE
Entity: District Court; NER tag: ORG
Entity: Boston; NER tag: GPE
Entity: a later date; NER tag: DAT

In [7]:
# B
#create an empty set
law_entities = set()

#check if each entity is law
for entity in spacy_pharma.ents:
    if entity.label_ == 'LAW':
        law_entities.add(entity.text + ": " + entity.label_)

# print entities with LAW tag
for item in law_entities:
    print(item)

RICO: LAW
the Controlled Substances Act: LAW


C. Use Google to summarize in one sentence what the `RICO` named entity means and why this might apply to a pharmaceutical kickbacks case (and not just a mafia case...) 

**Summarize in one sentence what the `RICO` named entity means and why this might apply to a pharmaceutical kickbacks case**


RICO stands for "Racketeer Influenced and Corrupt Organizations Act," and is a federal law which sets guidelines for who can be charged with racketeering based on their past activity. RICO could apply to a pharmaceutical kickback case because there could be a coordinated effort by pharma people to defraud consumers, the SEC, etc. with the profits they are making or the drugs they have produced (i.e. they are engaging in unlawful behavior with racketeering or fraud).

D. You want to extract the possible sentence lengths the CEO is facing; pull out the named entities with (1) the label `DATE` and (2) that contain the word year or years (hint: you may want to use the `re` module for that second part). Print these named entities.

In [8]:
# Regex to find 'year' or 'years' in the entity text
year_regex = re.compile(r'\b(year|years)\b', re.IGNORECASE)

# List to hold relevant DATE entities
relevant_dates = []

# Filtering entities
for ent in spacy_pharma.ents:
    if ent.label_ == "DATE" and year_regex.search(ent.text):
        relevant_dates.append(ent.text + ": " + ent.label_)

# Printing the filtered entities
for date in relevant_dates:
    print(date)

last year: DATE
20 years: DATE
three years: DATE
five years: DATE
three years: DATE


E. Pull and print the original parts of the press releases where those year lengths are mentioned (e.g., the sentences or rough region of the press release). Describe in your own words (1 sentence) what length of sentence (prison) and probation (supervised release) the CEO may be facing if convicted after this indictment (if there are multiple lengths mentioned describe the maximum). 

**Hint**: you may want to use re.search or re.findall 

- For part E, you can use `re.search` and `re.findall`, or anything that works 😳.

In [9]:
## your code here

sentences_with_years = []

for ent in spacy_pharma.ents:
    if ent.label_ == "DATE" and year_regex.search(ent.text):
        # Add the whole sentence containing the entity to the list
        sentences_with_years.append(ent.sent.text.strip())

# Print the relevant sentences
for sentence in sentences_with_years:
    print("Relevant Sentence:", sentence)

Relevant Sentence: "More than 20,000 Americans died of synthetic opioid overdoses last year, and millions are addicted to opioids.
Relevant Sentence: The charges of conspiracy to commit RICO and conspiracy to commit mail and wire fraud each provide for a sentence of no greater than 20 years in prison, three years of supervised release and a fine of $250,000, or twice the amount of pecuniary gain or loss.
Relevant Sentence: The charges of conspiracy to commit RICO and conspiracy to commit mail and wire fraud each provide for a sentence of no greater than 20 years in prison, three years of supervised release and a fine of $250,000, or twice the amount of pecuniary gain or loss.
Relevant Sentence: The charges of conspiracy to violate the Anti-Kickback Law provide for a sentence of no greater than five years in prison, three years of supervised release and a $25,000 fine.
Relevant Sentence: The charges of conspiracy to violate the Anti-Kickback Law provide for a sentence of no greater than

In [None]:
# hmm first sentence doesn't seem relevant? 

# CHECK! above

**Describe in your own words (1 sentence) what length of sentence (prison) and probation (supervised release) the CEO may be facing if convicted after this indictment (if there are multiple lengths mentioned describe the maximum).**


The CEO is looking at a maximum of 20 years in prison and 3 years of supervised released. There are also associated fines with the charge,

## 1.3 sentiment analysis  (10 points)

A. Subset the press releases to those labeled with one of three topics via `topics_clean`: Civil Rights, Hate Crimes, and Project Safe Childhood. We'll call this `doj_subset` going forward and it should have 717 rows.



In [10]:
##creating the subset
topics_of_interest = ["Civil Rights", "Hate Crimes", "Project Safe Childhood"]
doj_subset = doj[doj['topics_clean'].isin(topics_of_interest)]

#test the number of rows
print("Number of rows in the subset:", len(doj_subset))

Number of rows in the subset: 717


B. Write a function that takes one press release string as an input and:

- Removes named entities from each press release string (**Hint**: you may want to use `re.sub` with an or condition)
- Scores the sentiment of the entire press release using the `SentimentIntensityAnalyzer` and `polarity_scores`
- Returns the length-four (negative, positive, neutral, compound) sentiment dictionary (any order is fine)

Apply that function to each of the press releases in `doj_subset`. 

**Hints**: 

- A function + list comprehension to execute will takes about 30 seconds on a respectable local machine and about 2 mins on jhub; if it's taking a very long time, you may want to check your code for inefficiencies. If you can't fix those, for partial credit on this part/full credit on remainder, you can take a small random sample of the 717


In [11]:
test = doj_subset.loc[77]
test_str = test.contents
test_str

# Generate named entities
spacy_test = nlp(test_str)
for token in spacy_test.ents:
    print("Entity: " + token.text + "; NER tag: " + token.label_)
    
# Named entities that need to be removed
remove_entities = [token.text for token in spacy_test.ents if token.label_]
remove_entities

# Join tokens together
join_entities = "|".join(remove_entities)
join_entities

# Remove named entities from string
new_str = re.sub(entity.text, "", test_str)
#new_str = re.sub(join_entities, " ", test_str)
new_str

# Initialize scorer
sent_obj = SentimentIntensityAnalyzer()
sentiment_ex = sent_obj.polarity_scores(new_str)
sentiment_ex

'A former supervisory correctional officer at Louisiana State Penitentiary in Angola, Louisiana, pleaded guilty yesterday in connection with the beating of a handcuffed and shackled inmate, in addition to conspiring to cover up their misconduct by falsifying official records and lying to internal investigators about what happened.\xa0  \xa0 James Savoy, 39, of Marksville, Louisiana, admitted during his plea hearing that he witnessed other officers using excessive force against the inmate and failed to intervene; that he conspired with other officers to cover up the beating by engaging in a variety of obstructive acts; and that he personally falsified official prison records to cover up the attack. \xa0 Scotty Kennedy, 48, of Beebe, Arkansas, and John Sanders, 30, of Marksville, Louisiana previously pleaded guilty in November 2016, and September 2017, for their roles in the beating and cover up. \xa0 “Every citizen has the right to due process and protection from unreasonable force, and

Entity: Louisiana State Penitentiary; NER tag: ORG
Entity: Angola; NER tag: GPE
Entity: Louisiana; NER tag: GPE
Entity: yesterday; NER tag: DATE
Entity: James Savoy; NER tag: PERSON
Entity: 39; NER tag: DATE
Entity: Marksville; NER tag: GPE
Entity: Louisiana; NER tag: GPE
Entity: Scotty Kennedy; NER tag: PERSON
Entity: 48; NER tag: DATE
Entity: Beebe; NER tag: GPE
Entity: Arkansas; NER tag: GPE
Entity: John Sanders; NER tag: PERSON
Entity: 30; NER tag: DATE
Entity: Marksville; NER tag: GPE
Entity: Louisiana; NER tag: GPE
Entity: November 2016; NER tag: DATE
Entity: September 2017; NER tag: DATE
Entity: Constitutional; NER tag: LAW
Entity: John Gore; NER tag: PERSON
Entity: the Civil Rights Division; NER tag: ORG
Entity: The Justice Department; NER tag: ORG
Entity: Yesterday; NER tag: DATE
Entity: United States; NER tag: GPE
Entity: the Middle District; NER tag: LOC
Entity: Louisiana; NER tag: GPE
Entity: Corey Amundson; NER tag: PERSON
Entity: the Justice Department’s Civil Rights Divi

['Louisiana State Penitentiary',
 'Angola',
 'Louisiana',
 'yesterday',
 'James Savoy',
 '39',
 'Marksville',
 'Louisiana',
 'Scotty Kennedy',
 '48',
 'Beebe',
 'Arkansas',
 'John Sanders',
 '30',
 'Marksville',
 'Louisiana',
 'November 2016',
 'September 2017',
 'Constitutional',
 'John Gore',
 'the Civil Rights Division',
 'The Justice Department',
 'Yesterday',
 'United States',
 'the Middle District',
 'Louisiana',
 'Corey Amundson',
 'the Justice Department’s Civil Rights Division',
 'FBI',
 'FBI',
 'Baton Rouge Resident Agency',
 'U.S.',
 'Frederick A. Menner',
 'the Middle District',
 'Trial',
 'Christopher J. Perras',
 'the Civil Rights Division’s Criminal Section']

'Louisiana State Penitentiary|Angola|Louisiana|yesterday|James Savoy|39|Marksville|Louisiana|Scotty Kennedy|48|Beebe|Arkansas|John Sanders|30|Marksville|Louisiana|November 2016|September 2017|Constitutional|John Gore|the Civil Rights Division|The Justice Department|Yesterday|United States|the Middle District|Louisiana|Corey Amundson|the Justice Department’s Civil Rights Division|FBI|FBI|Baton Rouge Resident Agency|U.S.|Frederick A. Menner|the Middle District|Trial|Christopher J. Perras|the Civil Rights Division’s Criminal Section'

'A former supervisory correctional officer at Louisiana State Penitentiary in Angola, Louisiana, pleaded guilty yesterday in connection with the beating of a handcuffed and shackled inmate, in addition to conspiring to cover up their misconduct by falsifying official records and lying to internal investigators about what happened.\xa0  \xa0 James Savoy, 39, of Marksville, Louisiana, admitted during his plea hearing that he witnessed other officers using excessive force against the inmate and failed to intervene; that he conspired with other officers to cover up the beating by engaging in a variety of obstructive acts; and that he personally falsified official prison records to cover up the attack. \xa0 Scotty Kennedy, 48, of Beebe, Arkansas, and John Sanders, 30, of Marksville, Louisiana previously pleaded guilty in November 2016, and September 2017, for their roles in the beating and cover up. \xa0 “Every citizen has the right to due process and protection from unreasonable force, and

{'neg': 0.169, 'neu': 0.763, 'pos': 0.068, 'compound': -0.9893}

In [12]:
# Initialize scorer
sent_obj = SentimentIntensityAnalyzer()

def process_press(text):
    # Generate named entities
    spacy_text = nlp(text)

    # Remove named entities
    for entity in spacy_text.ents:
        text = text.replace(entity.text, "")

    # Conduct sentiment analysis
    sentiment_ex = sent_obj.polarity_scores(text)

    return sentiment_ex

In [13]:
sentiment_scores = [process_press(str) for str in doj_subset['contents']]
sentiment_scores

[{'neg': 0.2, 'neu': 0.751, 'pos': 0.049, 'compound': -0.9931},
 {'neg': 0.129, 'neu': 0.804, 'pos': 0.066, 'compound': -0.9325},
 {'neg': 0.09, 'neu': 0.835, 'pos': 0.075, 'compound': -0.7579},
 {'neg': 0.121, 'neu': 0.798, 'pos': 0.081, 'compound': -0.9037},
 {'neg': 0.175, 'neu': 0.782, 'pos': 0.043, 'compound': -0.9864},
 {'neg': 0.148, 'neu': 0.8, 'pos': 0.053, 'compound': -0.987},
 {'neg': 0.146, 'neu': 0.78, 'pos': 0.074, 'compound': -0.9559},
 {'neg': 0.09, 'neu': 0.847, 'pos': 0.063, 'compound': -0.7783},
 {'neg': 0.103, 'neu': 0.838, 'pos': 0.059, 'compound': -0.9136},
 {'neg': 0.162, 'neu': 0.784, 'pos': 0.055, 'compound': -0.9801},
 {'neg': 0.216, 'neu': 0.748, 'pos': 0.036, 'compound': -0.9973},
 {'neg': 0.091, 'neu': 0.847, 'pos': 0.061, 'compound': -0.8519},
 {'neg': 0.08, 'neu': 0.857, 'pos': 0.063, 'compound': -0.6486},
 {'neg': 0.319, 'neu': 0.643, 'pos': 0.038, 'compound': -0.995},
 {'neg': 0.175, 'neu': 0.757, 'pos': 0.067, 'compound': -0.9889},
 {'neg': 0.121, 'neu

C. Add the four sentiment scores to the `doj_subset` dataframe to create a dataframe: `doj_subset_wscore`. Sort from highest neg to lowest neg score and print the top `id`, `contents`, and `neg` columns of the two most neg press releases. 

Notes:

- Don't worry if your sentiment score differs slightly from our output on GitHub; differences in preprocessing can lead to diff scores

In [14]:
# Create new data frame of sentiment scores
sentiment = pd.DataFrame(sentiment_scores, columns = ['neg', 'neu', 'pos', 'compound'])

# Reset indices of sentiment and doj_subset
doj_subset = doj_subset.reset_index()
sentiment = sentiment.reset_index()

# Concatenate sentiment and doj_subset data frames
doj_subset_wscores = pd.concat([doj_subset, sentiment], axis = 1)

# Sort values from highest to lowest neg score
doj_subset_wscores = doj_subset_wscores.sort_values(by = 'neg', ascending = False)

# Print top two most negative press releases
top_neg = doj_subset_wscores.head(2)
top_neg[['id', 'contents', 'neg']]

Unnamed: 0,id,contents,neg
13,14-248,"The Department of Justice announced that this morning John W. Ng, 58, of Albuquerque, N.M., made his initial appearance in federal court on a criminal complaint charging him with a hate crime offense. This charge is related to anti-Semitic threats Ng made against a Jewish woman who owns and operates the Nosh Jewish Delicatessen and Bakery in Albuquerque. Ng was arrested by the FBI on March 7, 2014, based on a criminal complaint alleging that he interfered with the victim’s federally protected rights by threatening her and interfering with her business because of her religion. According to the criminal complaint, between Jan. 22, 2014, and Feb. 8, 2014, Ng allegedly posted threatening anti-Semitic notes on and in the vicinity of the victim’s business. A criminal complaint merely establishes probable cause, and Ng is presumed innocent unless proven guilty. If convicted on the offense charged in the criminal complaint, Ng faces a maximum statutory penalty of one year in prison. This matter was investigated by the Albuquerque Division of the FBI and is being prosecuted by Assistant U.S. Attorney Mark T. Baker of the U.S. Attorney’s Office for the District of New Mexico and Trial Attorney AeJean Cha of the U.S. Department of Justice’s Civil Rights Division.",0.319
34,13-312,"John Hall, 27, an Aryan Brotherhood member and inmate at the Federal Correctional Institution (FCI) in Seagoville, Texas, was sentenced today by U.S. District Judge Reed O’Connor after pleading guilty to violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act stemming from his assault of a fellow inmate, whom he believed to be gay, the Department of Justice announced. Hall assaulted his fellow inmate with a dangerous weapon, causing bodily injury to the victim on Dec. 20, 2011. Hall was sentenced to serve 71 months in prison to be served consecutively with the sentence he is currently serving. The assault occurred on Dec. 20, 2011, inside the FCI Seagoville when Hall targeted and attacked the victim, a fellow inmate, because he believed the victim was gay or involved in a sexual relationship with another male inmate. Hall repeatedly punched, kicked and stomped on the victim’s face with his shod feet, a dangerous weapon, while yelling a homophobic slur. The victim lost consciousness during the assault and suffered multiple lacerations to his face. The victim also sustained a fractured eye socket, lost a tooth, fractured other teeth and was treated at a hospital for the injuries he sustained during Hall’s unprovoked attack. Hall pleaded guilty to violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act on Nov. 8, 2012. “Brutality and violence based on sexual orientation has no place in a civilized society,” said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. “The Justice Department is committed to using all the tools in our law enforcement arsenal, including the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act, to prosecute acts motivated by hate.” “This prosecution sends a clear message that this office, in partnership with attorneys in the department’s Civil Rights Division, will prioritize and aggressively prosecute hate crimes and others civil rights violations in North Texas,” said U.S. Attorney Sarah R. Saldaña of the Northern District of Texas. This case was investigated by the FBI Dallas Division. The case was prosecuted by Assistant U.S. Attorney Errin Martin and Trial Attorney Adriana Vieco of the Civil Rights Division.",0.3


# WE ARE GETTING SOMETHING DIFFERENT, CHECK ABOVE ^^^

D. With the dataframe from part C, find the mean compound sentiment score for each of the three topics in `topics_clean` using group_by and agg.

E. Add a 1 sentence interpretation of why we might see the variation in scores (remember that compound is a standardized summary where -1 is most negative; +1 is most positive)


In [15]:
# Find the mean compound score for each of the three topics in topics_clean
mean_compound = doj_subset_wscores.groupby('topics_clean')['neg'].agg('mean')
mean_compound

topics_clean
Civil Rights              0.099525
Hate Crimes               0.177472
Project Safe Childhood    0.114476
Name: neg, dtype: float64

# ALSO GETTING OFF ANS PLS CHECK ABOVE ^

**Add a 1 sentence interpretation of why we might see the variation in scores (remember that compound is a standardized summary where -1 is most negative; +1 is most positive)**

# CAN SOMEONE PLS FILL IN and ans PART E plz right here <3

# 2. Topic modeling (25 points)

For this question, use the `doj_subset_wscores` data that is restricted to civil rights, hate crimes, and project safe childhood and with the sentiment scores added


## 2.1 Preprocess the data by removing stopwords, punctuation, and non-alpha words (5 points)

A. Write a function that:

- Takes in a single raw string in the `contents` column from that dataframe
- Does the following preprocessing steps:

    - Converts the words to lowercase
    - Removes stopwords, adding the custom stopwords in your code cell below to the default stopwords list
    - Only retains alpha words (so removes digits and punctuation)
    - Only retains words 4 characters or longer
    - Uses the snowball stemmer from nltk to stem

- Returns a joined preprocessed string
    
B. Use `apply` or list comprehension to execute that function and create a new column in the data called `processed_text`
    
C. Print the `id`, `contents`, and `processed_text` columns for the following press releases:

id = 16-718 (this case: https://www.seattletimes.com/nation-world/doj-miami-police-reach-settlement-in-civil-rights-case/)

id = 16-217 (this case: https://www.wlbt.com/story/32275512/three-mississippi-correctional-officers-indicted-for-inmate-assault-and-cover-up/)
    
**Resources**:

- Here's code examples for the snowball stemmer: https://www.geeksforgeeks.org/snowball-stemmer-nlp/

In [16]:
custom_doj_stopwords = ["civil", "rights", "division", "department", "justice",
                        "office", "attorney", "district", "case", "investigation", "assistant",
                       "trial", "assistance", "assist"]

In [17]:
# A

# Initialize Snowball Stemmer
porter = SnowballStemmer(language='english')

# Creates custom list of stop words
list_stopwords = stopwords.words("english")
list_stopwords_new = list_stopwords + custom_doj_stopwords

# Does the following preprocessing steps:

# Converts the words to lowercase
# HOW TO DO THIS: Removes stopwords, adding the custom stopwords in your code cell below to the default stopwords list
# Only retains alpha words (so removes digits and punctuation)
# Only retains words 4 characters or longer
# Uses the snowball stemmer from nltk to stem
# Returns a joined preprocessed string

# Define function
def preprocess_data(text):
    
    # Converts words to lowercase
    corpus_lower = text.lower()
    
    nostop = [word for word in wordpunct_tokenize(corpus_lower) if word not in list_stopwords_new]
    clean = [porter.stem(word) for word in nostop if word.isalpha() and len(word) >= 4]
    clean_str = " ".join(clean)

    return(clean_str)

# HOW TO ADD CUSTOM STOPWORDS TO YOUR CODE CELL??!?!?!?!?!?!?!?!?!?

In [18]:
# B
# List comprehension to execute function
preprocessed = [preprocess_data(str) for str in doj_subset_wscores.contents]

# Create a new column in the data called processed_text
doj_subset_wscores['processed_text'] = preprocessed

In [19]:
## your code showing the examples
# C
# Print the id, contents, and processed_text columns for the following press releases:
two_cases = doj_subset_wscores[doj_subset_wscores['id'].isin(['16-718', '16-217'])]
two_cases[['id', 'contents', 'processed_text']]

Unnamed: 0,id,contents,processed_text
632,16-718,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",nine count indict unseal today mississippi correct offic charg beat inmat third charg help cover indict charg lawardrick marsher robert sturdiv offic mississippi state penitentiari parchman mississippi beat includ kick punch throw victim ground marsher sturdiv charg violat right convict prison free cruel unusu punish sturdiv also charg fail interven marsher punch beat indict alleg action involv danger weapon result bodili injuri victim third offic deont pate charg along marsher sturdiv conspir cover beat indict alleg three offic submit fals report three lie convict marsher sturdiv face maximum sentenc year prison excess forc charg three offic face five year prison conspiraci fals statement charg year prison fals report charg indict mere accus defend presum innoc unless proven guilti investig jackson cooper mississippi correct prosecut robert coleman northern mississippi dana mulhaus crimin section marsher indict
313,16-217,"The Justice Department has reached a comprehensive settlement agreement with the city of Miami and the Miami Police Department (MPD) resolving the Justice Department’s investigation of officer-involved shootings by MPD officers, announced Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division and U.S. Attorney Wifredo A. Ferrer of the Southern District of Florida. The settlement, which was approved by Miami’s city commission today and will go into effect when the agreement is signed by all parties, resolves claims stemming from the Justice Department’s investigation into officer-involved shootings by MPD officers, which was conducted under the Violent Crime Control and Law Enforcement Act of 1994. The investigation’s findings, issued in July 2013, identified a pattern or practice of excessive use of force through officer-involved shootings in violation of the Fourth Amendment of the Constitution. The city’s compliance with the settlement will be monitored by an independent reviewer, former Tampa, Florida, Police Chief Jane Castor. Under the settlement agreement, the city will implement comprehensive reforms to ensure constitutional policing and support public trust. The settlement agreement is designed to minimize officer-involved shootings and to more effectively and quickly investigate officer-involved shootings that do occur, through measures that include: “This settlement represents a renewed commitment by the city of Miami and Chief Rodolfo Llanes to provide constitutional policing for Miami residents and to protect public safety through sustainable reform,” said Principal Deputy Assistant Attorney General Gupta. “The agreement will help to strengthen the relationship between the MPD and the communities they serve by improving accountability for officers who fire their weapons unlawfully, and provides for community participation in the enforcement of this agreement.” “Today's agreement is the result of a joint effort between the Department of Justice and the City of Miami to ensure that the Miami Police Department continues its efforts to make our community safe while protecting the sacred Constitutional rights of all of our citizens,” said U.S. Attorney Ferrer. “Through oversight and communication, the agreement seeks to make permanent the positive changes that former Chief Orosa and Chief Llanes have made, and we applaud the City Commission’s vote.” The settlement agreement builds upon important reforms implemented by the city since the Justice Department issued its findings, including: The investigation was conducted by attorneys and staff from the Civil Rights Division’s Special Litigation Section and the Civil Division of the U. S. Attorney’s Office of the Southern District of Florida.",reach comprehens settlement agreement citi miami miami polic resolv offic involv shoot offic announc princip deputi general vanita gupta head wifredo ferrer southern florida settlement approv miami citi commiss today effect agreement sign parti resolv claim stem offic involv shoot offic conduct violent crime control enforc find issu juli identifi pattern practic excess forc offic involv shoot violat fourth amend constitut citi complianc settlement monitor independ review former tampa florida polic chief jane castor settlement agreement citi implement comprehens reform ensur constitut polic support public trust settlement agreement design minim offic involv shoot effect quick investig offic involv shoot occur measur includ settlement repres renew commit citi miami chief rodolfo llane provid constitut polic miami resid protect public safeti sustain reform said princip deputi general gupta agreement help strengthen relationship communiti serv improv account offic fire weapon unlaw provid communiti particip enforc agreement today agreement result joint effort citi miami ensur miami polic continu effort make communiti safe protect sacr constitut citizen said ferrer oversight communic agreement seek make perman posit chang former chief orosa chief llane made applaud citi commiss vote settlement agreement build upon import reform implement citi sinc issu find includ conduct attorney staff special litig section southern florida


# We are getting some errors, most likely from the function above YIKES ^^ above 
# (probs from part A)

## 2.2 Create a document-term matrix from the preprocessed press releases and to explore top words (5 points)

A. Use the `create_dtm` function I provide (alternately, feel free to write your own!) and create a document-term matrix using the preprocessed press releases; make sure metadata contains the following columns: `id`, `compound` sentiment column you added, and the `topics_clean` column

B. Print the top 10 words for press releases with compound sentiment in the top 5% (so the most positive sentiment)

C. Print the top 10 words for press releases with compound sentiment in the bottom 5% (so the most negative sentiment)

**Hint**: for these, remember the pandas quantile function from pset one.  

D. Print the top 10 words for press releases in each of the three `topics_clean`

For steps B - D, to receive full credit, write a function `get_topwords` that helps you avoid duplicated code when you find top words for the different subsets of the data. There are different ways to structure it but one way is to feed it subsetted data (so data subsetted to one topic etc.) and for it to get the top words for that subset.


In [20]:
def create_dtm(list_of_strings, metadata):
    
    vectorizer = CountVectorizer(lowercase = True)
    
    dtm_sparse = vectorizer.fit_transform(list_of_strings)
    
    dtm_dense_named = pd.DataFrame(dtm_sparse.todense(), columns=vectorizer.get_feature_names_out())
    
    dtm_dense_named_withid = pd.concat([metadata.reset_index(), dtm_dense_named], axis = 1)
    
    return(dtm_dense_named_withid)

In [21]:
# your code here
# A
# Metadata list
metadata_list = ['id', 'compound', 'topics_clean']

# Create a document-term matrix using the preprocessed press releases
doj_dtm = create_dtm(list_of_strings = doj_subset_wscores['processed_text'],
                    metadata = doj_subset_wscores[metadata_list])

doj_dtm

Unnamed: 0,index,id,compound,topics_clean,aaron,abandon,abbat,abbi,abbott,abdomen,...,zane,zealand,zealous,zeeman,zero,zionism,zobel,zone,zunggeemog,zwengel
0,13,14-248,-0.9950,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,34,13-312,-0.9983,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,632,16-718,-0.9964,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,22,11-626,-0.9986,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,594,10-1194,-0.9990,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
712,392,16-539,0.9854,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
713,324,17-132,0.9794,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
714,346,17-003,0.9766,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
715,72,17-271,0.9909,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
# Initialize vectorizer
vectorizer = CountVectorizer(lowercase=True)

# B
# Calculate 95%ile of compound sentiment scores (most positive)
top_5 = doj_subset_wscores['compound'].quantile(.95)

# Select press releases with compound sentiment scores in top 5%
most_pos_sent = doj_subset_wscores[doj_subset_wscores['compound'] >= top_5]

# Transform selected press releases into DTM
dtm_pos = vectorizer.fit_transform(most_pos_sent['processed_text'])

# Sum frequencies of each word in DTM
pos_freq = dtm_pos.sum(axis = 0)

# List of positive words
word_list_pos = vectorizer.get_feature_names_out()

# Top 10 words with positive sentiment
print("Top 10 words with positive sentiment:")
for idx in pos_freq.argsort()[0, -10:]:
    print(word_list_pos[idx])

# C
# Calculate 5%ile of compound sentiment scores (most negative)
bottom_5 = doj_subset_wscores['compound'].quantile(.05)

# Press releases in the bottom 5%
neg_sent = doj_subset_wscores[doj_subset_wscores['compound'] >= bottom_5]

# Make into DTM
dtm_neg = vectorizer.fit_transform(neg_sent['processed_text'])

# Sum frequencies of each word
neg_freq = dtm_neg.sum(axis = 0)

# Top 10 words with negative sentiment
word_list_neg = vectorizer.get_feature_names_out()
print("Top 10 words with negative sentiment:")
for idx in neg_freq.argsort()[0, -10:]:
    print(word_list_neg[idx])

Top 10 words with positive sentiment:
[['servic' 'general' 'student' 'settlement' 'communiti' 'disabl' 'ensur'
  'state' 'enforc' 'agreement']]
Top 10 words with negative sentiment:
[['general' 'year' 'charg' 'state' 'said' 'sentenc' 'prosecut' 'victim'
  'feder' 'child']]


In [23]:
# D
# Print the top 10 words for press releases in each of the three topics_clean
topics_3 = doj_subset_wscores.topics_clean.unique()

for topic in topics_3:

    # Filter for topics
    topic_new = doj_subset_wscores[doj_subset_wscores['topics_clean'] == topic]

    # Create dtm 
    dtm_topic = vectorizer.fit_transform(topic_new['processed_text'])

    # Sum frequency of each word
    freq_topic = dtm_topic.sum(axis = 0)

    # Convert to DataFrame
    topic_df = pd.DataFrame(freq_topic, columns=vectorizer.get_feature_names_out())
    
    # Sorting from most frequent to least frequent
    top_words = topic_df.sum().sort_values(ascending=False)
    
    # Print the top 10 words for the current topic
    print(f"\nTop 10 words for press releases in topic: {topic}")
    print(top_words.head(10))


Top 10 words for press releases in topic: Hate Crimes
victim      591
crime       557
hate        524
defend      484
prosecut    478
charg       463
sentenc     455
american    451
feder       432
guilti      430
dtype: int64

Top 10 words for press releases in topic: Civil Rights
offic        637
hous         633
discrimin    616
enforc       544
disabl       532
said         497
feder        479
violat       477
state        452
court        414
dtype: int64

Top 10 words for press releases in topic: Project Safe Childhood
child          1022
exploit         701
sexual          572
safe            479
childhood       474
project         472
pornographi     452
children        423
crimin          405
prosecut        374
dtype: int64


## 2.3 Estimate a topic model using those preprocessed words (5 points)

A. Going back to the preprocessed words from part 2.3.1, estimate a topic model with 3 topics, since you want to see if the unsupervised topic models recover different themes for each of the three manually-labeled areas (civil rights; hate crimes; project safe childhood). You have free rein over the other topic model parameters beyond the number of topics.

B. After estimating the topic model, print the top 15 words in each topic.

**Hints and Resources**:

- Same topic modeling resources linked to above
- Make sure to use the `random_state` argument within the model so that the numbering of topics does not move around between runs of your code

In [24]:
# A - Create topic model from preprocessed strings
# Retokenize
text_tokens = [word_tokenize(text) for text in doj_subset_wscores.processed_text]

# Create dictionary
text_proc_dict = corpora.Dictionary(text_tokens)

# Filter dictionary w/ 2% bounds
text_proc_dict.filter_extremes(no_below = round(doj_subset_wscores.shape[0]*0.02),
                             no_above = round(doj_subset_wscores.shape[0]*0.98))

# Create corpus from dictionary
corpus_fromdict_proc = [text_proc_dict.doc2bow(text) 
                       for text in text_tokens]

# Estimate model
n_topics = 3
lda_model = gensim.models.LdaModel(corpus_fromdict_proc, num_topics=n_topics, id2word=text_proc_dict, passes=10)
# #lda_model = gensim.models.ldamodel.LdaModel(corpus_fromdict_proc, 
#                                               num_topics = n_topics, 
#                                               id2word=text_proc_dict, 
#                                               passes=6, alpha = 'auto',
#                                               per_word_topics = True, 
#                                               random_state = 91988)

# topics = lda_model.print_topics(num_words = 15)
# for topic in topics:
#     print(topic)

# B - print the top 15 words
for topic_id, topic_words in lda_model.print_topics(num_words = 15):
    print(f"Topic {topic_id + 1}: {topic_words}")

Topic 1: 0.013*"victim" + 0.012*"charg" + 0.011*"sentenc" + 0.011*"prosecut" + 0.011*"defend" + 0.011*"crime" + 0.011*"feder" + 0.010*"guilti" + 0.010*"said" + 0.009*"hate" + 0.009*"indict" + 0.009*"year" + 0.008*"prison" + 0.008*"investig" + 0.008*"american"
Topic 2: 0.014*"hous" + 0.014*"discrimin" + 0.012*"disabl" + 0.009*"enforc" + 0.009*"agreement" + 0.008*"state" + 0.008*"said" + 0.008*"court" + 0.007*"alleg" + 0.007*"requir" + 0.007*"settlement" + 0.007*"fair" + 0.007*"feder" + 0.007*"provid" + 0.007*"violat"
Topic 3: 0.034*"child" + 0.023*"exploit" + 0.020*"sexual" + 0.016*"safe" + 0.015*"project" + 0.015*"childhood" + 0.015*"pornographi" + 0.014*"children" + 0.013*"crimin" + 0.012*"prosecut" + 0.011*"sentenc" + 0.011*"victim" + 0.010*"ceo" + 0.010*"minor" + 0.010*"year"


## 2.4 Add topics back to main data and explore correlation between manual labels and our estimated topics (10 points)

A. Extract the document-level topic probabilities. Within `get_document_topics`, use the argument `minimum_probability` = 0 to make sure all 3 topic probabilities are returned. Write an assert statement to make sure the length of the list is equal to the number of rows in the `doj_subset_wscores` dataframe

B. Add the topic probabilities to the `doj_subset_wscores` dataframe as columns and create a column, `top_topic`, that reflects each document to its highest-probability topic (eg topic 1, 2, or 3)

C. For each of the manual labels in `topics_clean` (Hate Crime, Civil Rights, Project Safe Childhood), print the breakdown of the % of documents with each top topic (so, for instance, Hate Crime has 246 documents-- if 123 of those documents are coded to topic_1, that would be 50%; and so on). **Hint**: pd.crosstab and normalize may be helpful: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.crosstab.html

D. Using a couple press releases as examples, write a 1-2 sentence interpretation of why some of the manual topics map on more cleanly to an estimated topic than other manual topic(s)


In [31]:
## your code here to get doc-level topic probabilities 
# A
topic_prob_bydoc = [lda_model.get_document_topics(item, minimum_probability=0) for item in corpus_fromdict_proc]

assert len(topic_prob_bydoc) == len(doj_subset_wscores), "The length of topic probabilities list does not match the number of rows in doj_subset_wscores dataframe"

In [32]:
## your code here to add those topic probabilities to the dataframe
# B

one_list_tup = topic_prob_bydoc[0]
one_list_tup

# create a long form dataframe by flattening the list
topic_prob_bydoc_long = pd.DataFrame([t for lst in topic_prob_bydoc for t in lst],
                                     columns = ['topic', 'probability'])

# add id var for the number of topics
topic_prob_bydoc_long['text'] = list(np.concatenate([[text] * 
                                    n_topics for text in doj_subset_wscores.processed_text]).flat)

# pivot to wide format
topic_prob_bydoc_wide = pd.pivot_table(topic_prob_bydoc_long, index = ['text'],
                        columns = ['topic']).reset_index().reset_index(drop = True)
topic_prob_bydoc_wide.columns = ['text'] + ["topic_" + str(i) for i in np.arange(0, n_topics)]

# merge w/ original data set
doj_subset_wscores = pd.merge(topic_prob_bydoc_wide, doj_subset_wscores, left_on = 'text', right_on = 'processed_text')

# # Identify topic columns
# topic_columns = [col for col in doj_subset_wscores.columns if col.startswith('topic_')]

# # Find the column with the highest probability for each row
# doj_subset_wscores['top_topic'] = doj_subset_wscores[topic_columns].idxmax(axis=1)

# # If you want to get the topic number as an integer instead of column name:
# # Extract the topic number from the column name
# doj_subset_wscores['top_topic'] = doj_subset_wscores['top_topic'].str.replace('topic_', '').astype(int)

# # create indicator for top topics
# Identify topic columns


# Create TOP TOPIC column -- HOW TO DO?!?!??!!??!?!?!

[(0, 0.87021095), (1, 0.12335572), (2, 0.006433396)]

MergeError: Passing 'suffixes' which cause duplicate columns {'topic_2_y', 'text_y', 'topic_1_y', 'topic_0_y'} is not allowed.

In [None]:
## create indicator for listing's top topic
topic_wmeta['toptopic'] = topic_wmeta[[col for col in topic_wmeta.columns if 
                                    "topic_" in col]].idxmax(axis=1)
topic_wmeta.sample(n = 5, random_state = 555)

## group by topic and find mean price
topic_wmeta.groupby('toptopic').agg({'price_rawdata': np.mean})

## group by borough and topic -- higher price for some also reflects
## diff borough composition
topic_wmeta.groupby(['toptopic', 
                    'neighbourhood_group']).agg({'price_rawdata': np.mean})

In [26]:
## your code here to summarize the topic proportions for each of the topics_clean 

**D. Using a couple press releases as examples, write a 1-2 sentence interpretation of why some of the manual topics map on more cleanly to an estimated topic than other manual topic(s)**

# FILL IN

# 3. Extend the analysis from unigrams to bigrams (10 points)

In the previous question, you found top words via a unigram representation of the text. Now, we want to see how those top words change with bigrams (pairs of words)

A. Using the `doj_subset_wscore` data and the `processed_text` column (so the words after stemming/other preprocessing), create a column in the data called `processed_text_bigrams` that combines each consecutive pairs of word into a bigram separated by an underscore. Eg:

"depart reach settlem" would become "depart_reach reach_settlem"

Do this by writing a function `create_bigram_onedoc` that takes in a single `processed_text` string and returns a string with its bigrams structured similarly to above example
 
**Hint**: there are many ways to solve but `zip` may be helpful: https://stackoverflow.com/questions/21303224/iterate-over-all-pairs-of-consecutive-items-in-a-list

B. Print the `id`, `processed_text`, and `processed_text_bigram` columns for press release with id = 16-217

In [37]:
# A

# Write bigrams function
def create_bigram_one(text):
    

# Create new column and combine words with "_"
doj_subset_wscore['processed_text_bigrams'] =

Unnamed: 0,text,topic_0,topic_1,topic_2,text_x,topic_0_x,topic_1_x,topic_2_x,text_y,topic_0_y,...,topics_clean,components_clean,index,neg,neu,pos,compound,processed_text,top_topic,processed_text_bigrams
0,act general jocelyn samuel gregori davi southern mississippi announc today feder grand juri indict john loui blalack brandon mississippi sarah adelia grave crystal spring mississippi robert henri rice brandon shelbi brook richard pearl mississippi alleg role conspiraci commit feder hate crime african american peopl jackson mississippi blalack grave richard addit charg racial motiv hate crime result death victim truck blalack rice charg addit racial motiv hate crime involv alleg assault carri firearm relat assault grave richard addit charg solicit other commit hate crime african american grave charg make fals statement defend deryl paul dedmon john aaron rice dylan wade butler william kirk montgomeri jonathan kyle gaskamp joseph dominick brandon previous enter guilti plea connect role offens indict alleg begin spring defend other conspir anoth harass assault african american peopl around jackson area accord indict numer occas conspir use danger weapon includ beer bottl sling shot motor vehicl caus attempt caus bodili injuri african american peopl conspir alleg specif target african american peopl believ homeless influenc alcohol believ individu would less like report assault conspir would often boast racial motiv assault indict detail sever assault includ fatal assault victim intent defend face statutori maximum sentenc life prison result cooper effort southern mississippi hind counti investig jackson jackson polic prosecut sheldon beer deputi chief paig fitzgerald glenda hayn southern mississippi charg forth indict mere accus defend presum innoc proven guilti,0.870248,0.123368,0.006385,act general jocelyn samuel gregori davi southern mississippi announc today feder grand juri indict john loui blalack brandon mississippi sarah adelia grave crystal spring mississippi robert henri rice brandon shelbi brook richard pearl mississippi alleg role conspiraci commit feder hate crime african american peopl jackson mississippi blalack grave richard addit charg racial motiv hate crime result death victim truck blalack rice charg addit racial motiv hate crime involv alleg assault carri firearm relat assault grave richard addit charg solicit other commit hate crime african american grave charg make fals statement defend deryl paul dedmon john aaron rice dylan wade butler william kirk montgomeri jonathan kyle gaskamp joseph dominick brandon previous enter guilti plea connect role offens indict alleg begin spring defend other conspir anoth harass assault african american peopl around jackson area accord indict numer occas conspir use danger weapon includ beer bottl sling shot motor vehicl caus attempt caus bodili injuri african american peopl conspir alleg specif target african american peopl believ homeless influenc alcohol believ individu would less like report assault conspir would often boast racial motiv assault indict detail sever assault includ fatal assault victim intent defend face statutori maximum sentenc life prison result cooper effort southern mississippi hind counti investig jackson jackson polic prosecut sheldon beer deputi chief paig fitzgerald glenda hayn southern mississippi charg forth indict mere accus defend presum innoc proven guilti,0.870248,0.123368,0.006385,act general jocelyn samuel gregori davi southern mississippi announc today feder grand juri indict john loui blalack brandon mississippi sarah adelia grave crystal spring mississippi robert henri rice brandon shelbi brook richard pearl mississippi alleg role conspiraci commit feder hate crime african american peopl jackson mississippi blalack grave richard addit charg racial motiv hate crime result death victim truck blalack rice charg addit racial motiv hate crime involv alleg assault carri firearm relat assault grave richard addit charg solicit other commit hate crime african american grave charg make fals statement defend deryl paul dedmon john aaron rice dylan wade butler william kirk montgomeri jonathan kyle gaskamp joseph dominick brandon previous enter guilti plea connect role offens indict alleg begin spring defend other conspir anoth harass assault african american peopl around jackson area accord indict numer occas conspir use danger weapon includ beer bottl sling shot motor vehicl caus attempt caus bodili injuri african american peopl conspir alleg specif target african american peopl believ homeless influenc alcohol believ individu would less like report assault conspir would often boast racial motiv assault indict detail sever assault includ fatal assault victim intent defend face statutori maximum sentenc life prison result cooper effort southern mississippi hind counti investig jackson jackson polic prosecut sheldon beer deputi chief paig fitzgerald glenda hayn southern mississippi charg forth indict mere accus defend presum innoc proven guilti,0.996376,...,Hate Crimes,Civil Rights Division,213,0.221,0.733,0.046,-0.9973,act general jocelyn samuel gregori davi southern mississippi announc today feder grand juri indict john loui blalack brandon mississippi sarah adelia grave crystal spring mississippi robert henri rice brandon shelbi brook richard pearl mississippi alleg role conspiraci commit feder hate crime african american peopl jackson mississippi blalack grave richard addit charg racial motiv hate crime result death victim truck blalack rice charg addit racial motiv hate crime involv alleg assault carri firearm relat assault grave richard addit charg solicit other commit hate crime african american grave charg make fals statement defend deryl paul dedmon john aaron rice dylan wade butler william kirk montgomeri jonathan kyle gaskamp joseph dominick brandon previous enter guilti plea connect role offens indict alleg begin spring defend other conspir anoth harass assault african american peopl around jackson area accord indict numer occas conspir use danger weapon includ beer bottl sling shot motor vehicl caus attempt caus bodili injuri african american peopl conspir alleg specif target african american peopl believ homeless influenc alcohol believ individu would less like report assault conspir would often boast racial motiv assault indict detail sever assault includ fatal assault victim intent defend face statutori maximum sentenc life prison result cooper effort southern mississippi hind counti investig jackson jackson polic prosecut sheldon beer deputi chief paig fitzgerald glenda hayn southern mississippi charg forth indict mere accus defend presum innoc proven guilti,topic_0_y,a_c_t_ _g_e_n_e_r_a_l_ _j_o_c_e_l_y_n_ _s_a_m_u_e_l_ _g_r_e_g_o_r_i_ _d_a_v_i_ _s_o_u_t_h_e_r_n_ _m_i_s_s_i_s_s_i_p_p_i_ _a_n_n_o_u_n_c_ _t_o_d_a_y_ _f_e_d_e_r_ _g_r_a_n_d_ _j_u_r_i_ _i_n_d_i_c_t_ _j_o_h_n_ _l_o_u_i_ _b_l_a_l_a_c_k_ _b_r_a_n_d_o_n_ _m_i_s_s_i_s_s_i_p_p_i_ _s_a_r_a_h_ _a_d_e_l_i_a_ _g_r_a_v_e_ _c_r_y_s_t_a_l_ _s_p_r_i_n_g_ _m_i_s_s_i_s_s_i_p_p_i_ _r_o_b_e_r_t_ _h_e_n_r_i_ _r_i_c_e_ _b_r_a_n_d_o_n_ _s_h_e_l_b_i_ _b_r_o_o_k_ _r_i_c_h_a_r_d_ _p_e_a_r_l_ _m_i_s_s_i_s_s_i_p_p_i_ _a_l_l_e_g_ _r_o_l_e_ _c_o_n_s_p_i_r_a_c_i_ _c_o_m_m_i_t_ _f_e_d_e_r_ _h_a_t_e_ _c_r_i_m_e_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _p_e_o_p_l_ _j_a_c_k_s_o_n_ _m_i_s_s_i_s_s_i_p_p_i_ _b_l_a_l_a_c_k_ _g_r_a_v_e_ _r_i_c_h_a_r_d_ _a_d_d_i_t_ _c_h_a_r_g_ _r_a_c_i_a_l_ _m_o_t_i_v_ _h_a_t_e_ _c_r_i_m_e_ _r_e_s_u_l_t_ _d_e_a_t_h_ _v_i_c_t_i_m_ _t_r_u_c_k_ _b_l_a_l_a_c_k_ _r_i_c_e_ _c_h_a_r_g_ _a_d_d_i_t_ _r_a_c_i_a_l_ _m_o_t_i_v_ _h_a_t_e_ _c_r_i_m_e_ _i_n_v_o_l_v_ _a_l_l_e_g_ _a_s_s_a_u_l_t_ _c_a_r_r_i_ _f_i_r_e_a_r_m_ _r_e_l_a_t_ _a_s_s_a_u_l_t_ _g_r_a_v_e_ _r_i_c_h_a_r_d_ _a_d_d_i_t_ _c_h_a_r_g_ _s_o_l_i_c_i_t_ _o_t_h_e_r_ _c_o_m_m_i_t_ _h_a_t_e_ _c_r_i_m_e_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _g_r_a_v_e_ _c_h_a_r_g_ _m_a_k_e_ _f_a_l_s_ _s_t_a_t_e_m_e_n_t_ _d_e_f_e_n_d_ _d_e_r_y_l_ _p_a_u_l_ _d_e_d_m_o_n_ _j_o_h_n_ _a_a_r_o_n_ _r_i_c_e_ _d_y_l_a_n_ _w_a_d_e_ _b_u_t_l_e_r_ _w_i_l_l_i_a_m_ _k_i_r_k_ _m_o_n_t_g_o_m_e_r_i_ _j_o_n_a_t_h_a_n_ _k_y_l_e_ _g_a_s_k_a_m_p_ _j_o_s_e_p_h_ _d_o_m_i_n_i_c_k_ _b_r_a_n_d_o_n_ _p_r_e_v_i_o_u_s_ _e_n_t_e_r_ _g_u_i_l_t_i_ _p_l_e_a_ _c_o_n_n_e_c_t_ _r_o_l_e_ _o_f_f_e_n_s_ _i_n_d_i_c_t_ _a_l_l_e_g_ _b_e_g_i_n_ _s_p_r_i_n_g_ _d_e_f_e_n_d_ _o_t_h_e_r_ _c_o_n_s_p_i_r_ _a_n_o_t_h_ _h_a_r_a_s_s_ _a_s_s_a_u_l_t_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _p_e_o_p_l_ _a_r_o_u_n_d_ _j_a_c_k_s_o_n_ _a_r_e_a_ _a_c_c_o_r_d_ _i_n_d_i_c_t_ _n_u_m_e_r_ _o_c_c_a_s_ _c_o_n_s_p_i_r_ _u_s_e_ _d_a_n_g_e_r_ _w_e_a_p_o_n_ _i_n_c_l_u_d_ _b_e_e_r_ _b_o_t_t_l_ _s_l_i_n_g_ _s_h_o_t_ _m_o_t_o_r_ _v_e_h_i_c_l_ _c_a_u_s_ _a_t_t_e_m_p_t_ _c_a_u_s_ _b_o_d_i_l_i_ _i_n_j_u_r_i_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _p_e_o_p_l_ _c_o_n_s_p_i_r_ _a_l_l_e_g_ _s_p_e_c_i_f_ _t_a_r_g_e_t_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _p_e_o_p_l_ _b_e_l_i_e_v_ _h_o_m_e_l_e_s_s_ _i_n_f_l_u_e_n_c_ _a_l_c_o_h_o_l_ _b_e_l_i_e_v_ _i_n_d_i_v_i_d_u_ _w_o_u_l_d_ _l_e_s_s_ _l_i_k_e_ _r_e_p_o_r_t_ _a_s_s_a_u_l_t_ _c_o_n_s_p_i_r_ _w_o_u_l_d_ _o_f_t_e_n_ _b_o_a_s_t_ _r_a_c_i_a_l_ _m_o_t_i_v_ _a_s_s_a_u_l_t_ _i_n_d_i_c_t_ _d_e_t_a_i_l_ _s_e_v_e_r_ _a_s_s_a_u_l_t_ _i_n_c_l_u_d_ _f_a_t_a_l_ _a_s_s_a_u_l_t_ _v_i_c_t_i_m_ _i_n_t_e_n_t_ _d_e_f_e_n_d_ _f_a_c_e_ _s_t_a_t_u_t_o_r_i_ _m_a_x_i_m_u_m_ _s_e_n_t_e_n_c_ _l_i_f_e_ _p_r_i_s_o_n_ _r_e_s_u_l_t_ _c_o_o_p_e_r_ _e_f_f_o_r_t_ _s_o_u_t_h_e_r_n_ _m_i_s_s_i_s_s_i_p_p_i_ _h_i_n_d_ _c_o_u_n_t_i_ _i_n_v_e_s_t_i_g_ _j_a_c_k_s_o_n_ _j_a_c_k_s_o_n_ _p_o_l_i_c_ _p_r_o_s_e_c_u_t_ _s_h_e_l_d_o_n_ _b_e_e_r_ _d_e_p_u_t_i_ _c_h_i_e_f_ _p_a_i_g_ _f_i_t_z_g_e_r_a_l_d_ _g_l_e_n_d_a_ _h_a_y_n_ _s_o_u_t_h_e_r_n_ _m_i_s_s_i_s_s_i_p_p_i_ _c_h_a_r_g_ _f_o_r_t_h_ _i_n_d_i_c_t_ _m_e_r_e_ _a_c_c_u_s_ _d_e_f_e_n_d_ _p_r_e_s_u_m_ _i_n_n_o_c_ _p_r_o_v_e_n_ _g_u_i_l_t_i
1,act general jocelyn samuel joyc white vanc northern alabama special agent charg richard shwein announc talladega counti sentenc feder court today attempt hire member klux klan murder african american neighbor allen wayn densen morgan munford plead guilti judg karon bowdr count use caus someon els interst facil travel intent commit murder hire today sentenc hear judg bowdr sentenc morgan serv month prison follow three year supervis releas morgan previous admit august attempt hire member murder neighbor accord morgan plea agreement morgan spoke phone undercov agent identifi member arrang meet three day later oxford motel discuss payment murder phone convers morgan use racial slur describ want kill brag fire sever shot toward intimid morgan also describ detail want hung tree like deer gut bodi part slow pain death august morgan agent pose member morgan offer watch necklac payment murder gave explicit direct tortur murder defend attempt neighbor tortur murder said act general samuel today sentenc demonstr continu aggress prosecut racial hatr seek inflict act violenc other morgan detail calcul desir neighbor life brutal heinous mean said vanc today sentenc reinforc vigilant accept societi prosecut crime investig prosecut attorney meadow brad felton northern alabama david rees,0.994770,0.002559,0.002671,act general jocelyn samuel joyc white vanc northern alabama special agent charg richard shwein announc talladega counti sentenc feder court today attempt hire member klux klan murder african american neighbor allen wayn densen morgan munford plead guilti judg karon bowdr count use caus someon els interst facil travel intent commit murder hire today sentenc hear judg bowdr sentenc morgan serv month prison follow three year supervis releas morgan previous admit august attempt hire member murder neighbor accord morgan plea agreement morgan spoke phone undercov agent identifi member arrang meet three day later oxford motel discuss payment murder phone convers morgan use racial slur describ want kill brag fire sever shot toward intimid morgan also describ detail want hung tree like deer gut bodi part slow pain death august morgan agent pose member morgan offer watch necklac payment murder gave explicit direct tortur murder defend attempt neighbor tortur murder said act general samuel today sentenc demonstr continu aggress prosecut racial hatr seek inflict act violenc other morgan detail calcul desir neighbor life brutal heinous mean said vanc today sentenc reinforc vigilant accept societi prosecut crime investig prosecut attorney meadow brad felton northern alabama david rees,0.994770,0.002559,0.002671,act general jocelyn samuel joyc white vanc northern alabama special agent charg richard shwein announc talladega counti sentenc feder court today attempt hire member klux klan murder african american neighbor allen wayn densen morgan munford plead guilti judg karon bowdr count use caus someon els interst facil travel intent commit murder hire today sentenc hear judg bowdr sentenc morgan serv month prison follow three year supervis releas morgan previous admit august attempt hire member murder neighbor accord morgan plea agreement morgan spoke phone undercov agent identifi member arrang meet three day later oxford motel discuss payment murder phone convers morgan use racial slur describ want kill brag fire sever shot toward intimid morgan also describ detail want hung tree like deer gut bodi part slow pain death august morgan agent pose member morgan offer watch necklac payment murder gave explicit direct tortur murder defend attempt neighbor tortur murder said act general samuel today sentenc demonstr continu aggress prosecut racial hatr seek inflict act violenc other morgan detail calcul desir neighbor life brutal heinous mean said vanc today sentenc reinforc vigilant accept societi prosecut crime investig prosecut attorney meadow brad felton northern alabama david rees,0.964960,...,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,10,0.216,0.748,0.036,-0.9973,act general jocelyn samuel joyc white vanc northern alabama special agent charg richard shwein announc talladega counti sentenc feder court today attempt hire member klux klan murder african american neighbor allen wayn densen morgan munford plead guilti judg karon bowdr count use caus someon els interst facil travel intent commit murder hire today sentenc hear judg bowdr sentenc morgan serv month prison follow three year supervis releas morgan previous admit august attempt hire member murder neighbor accord morgan plea agreement morgan spoke phone undercov agent identifi member arrang meet three day later oxford motel discuss payment murder phone convers morgan use racial slur describ want kill brag fire sever shot toward intimid morgan also describ detail want hung tree like deer gut bodi part slow pain death august morgan agent pose member morgan offer watch necklac payment murder gave explicit direct tortur murder defend attempt neighbor tortur murder said act general samuel today sentenc demonstr continu aggress prosecut racial hatr seek inflict act violenc other morgan detail calcul desir neighbor life brutal heinous mean said vanc today sentenc reinforc vigilant accept societi prosecut crime investig prosecut attorney meadow brad felton northern alabama david rees,topic_0,a_c_t_ _g_e_n_e_r_a_l_ _j_o_c_e_l_y_n_ _s_a_m_u_e_l_ _j_o_y_c_ _w_h_i_t_e_ _v_a_n_c_ _n_o_r_t_h_e_r_n_ _a_l_a_b_a_m_a_ _s_p_e_c_i_a_l_ _a_g_e_n_t_ _c_h_a_r_g_ _r_i_c_h_a_r_d_ _s_h_w_e_i_n_ _a_n_n_o_u_n_c_ _t_a_l_l_a_d_e_g_a_ _c_o_u_n_t_i_ _s_e_n_t_e_n_c_ _f_e_d_e_r_ _c_o_u_r_t_ _t_o_d_a_y_ _a_t_t_e_m_p_t_ _h_i_r_e_ _m_e_m_b_e_r_ _k_l_u_x_ _k_l_a_n_ _m_u_r_d_e_r_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _n_e_i_g_h_b_o_r_ _a_l_l_e_n_ _w_a_y_n_ _d_e_n_s_e_n_ _m_o_r_g_a_n_ _m_u_n_f_o_r_d_ _p_l_e_a_d_ _g_u_i_l_t_i_ _j_u_d_g_ _k_a_r_o_n_ _b_o_w_d_r_ _c_o_u_n_t_ _u_s_e_ _c_a_u_s_ _s_o_m_e_o_n_ _e_l_s_ _i_n_t_e_r_s_t_ _f_a_c_i_l_ _t_r_a_v_e_l_ _i_n_t_e_n_t_ _c_o_m_m_i_t_ _m_u_r_d_e_r_ _h_i_r_e_ _t_o_d_a_y_ _s_e_n_t_e_n_c_ _h_e_a_r_ _j_u_d_g_ _b_o_w_d_r_ _s_e_n_t_e_n_c_ _m_o_r_g_a_n_ _s_e_r_v_ _m_o_n_t_h_ _p_r_i_s_o_n_ _f_o_l_l_o_w_ _t_h_r_e_e_ _y_e_a_r_ _s_u_p_e_r_v_i_s_ _r_e_l_e_a_s_ _m_o_r_g_a_n_ _p_r_e_v_i_o_u_s_ _a_d_m_i_t_ _a_u_g_u_s_t_ _a_t_t_e_m_p_t_ _h_i_r_e_ _m_e_m_b_e_r_ _m_u_r_d_e_r_ _n_e_i_g_h_b_o_r_ _a_c_c_o_r_d_ _m_o_r_g_a_n_ _p_l_e_a_ _a_g_r_e_e_m_e_n_t_ _m_o_r_g_a_n_ _s_p_o_k_e_ _p_h_o_n_e_ _u_n_d_e_r_c_o_v_ _a_g_e_n_t_ _i_d_e_n_t_i_f_i_ _m_e_m_b_e_r_ _a_r_r_a_n_g_ _m_e_e_t_ _t_h_r_e_e_ _d_a_y_ _l_a_t_e_r_ _o_x_f_o_r_d_ _m_o_t_e_l_ _d_i_s_c_u_s_s_ _p_a_y_m_e_n_t_ _m_u_r_d_e_r_ _p_h_o_n_e_ _c_o_n_v_e_r_s_ _m_o_r_g_a_n_ _u_s_e_ _r_a_c_i_a_l_ _s_l_u_r_ _d_e_s_c_r_i_b_ _w_a_n_t_ _k_i_l_l_ _b_r_a_g_ _f_i_r_e_ _s_e_v_e_r_ _s_h_o_t_ _t_o_w_a_r_d_ _i_n_t_i_m_i_d_ _m_o_r_g_a_n_ _a_l_s_o_ _d_e_s_c_r_i_b_ _d_e_t_a_i_l_ _w_a_n_t_ _h_u_n_g_ _t_r_e_e_ _l_i_k_e_ _d_e_e_r_ _g_u_t_ _b_o_d_i_ _p_a_r_t_ _s_l_o_w_ _p_a_i_n_ _d_e_a_t_h_ _a_u_g_u_s_t_ _m_o_r_g_a_n_ _a_g_e_n_t_ _p_o_s_e_ _m_e_m_b_e_r_ _m_o_r_g_a_n_ _o_f_f_e_r_ _w_a_t_c_h_ _n_e_c_k_l_a_c_ _p_a_y_m_e_n_t_ _m_u_r_d_e_r_ _g_a_v_e_ _e_x_p_l_i_c_i_t_ _d_i_r_e_c_t_ _t_o_r_t_u_r_ _m_u_r_d_e_r_ _d_e_f_e_n_d_ _a_t_t_e_m_p_t_ _n_e_i_g_h_b_o_r_ _t_o_r_t_u_r_ _m_u_r_d_e_r_ _s_a_i_d_ _a_c_t_ _g_e_n_e_r_a_l_ _s_a_m_u_e_l_ _t_o_d_a_y_ _s_e_n_t_e_n_c_ _d_e_m_o_n_s_t_r_ _c_o_n_t_i_n_u_ _a_g_g_r_e_s_s_ _p_r_o_s_e_c_u_t_ _r_a_c_i_a_l_ _h_a_t_r_ _s_e_e_k_ _i_n_f_l_i_c_t_ _a_c_t_ _v_i_o_l_e_n_c_ _o_t_h_e_r_ _m_o_r_g_a_n_ _d_e_t_a_i_l_ _c_a_l_c_u_l_ _d_e_s_i_r_ _n_e_i_g_h_b_o_r_ _l_i_f_e_ _b_r_u_t_a_l_ _h_e_i_n_o_u_s_ _m_e_a_n_ _s_a_i_d_ _v_a_n_c_ _t_o_d_a_y_ _s_e_n_t_e_n_c_ _r_e_i_n_f_o_r_c_ _v_i_g_i_l_a_n_t_ _a_c_c_e_p_t_ _s_o_c_i_e_t_i_ _p_r_o_s_e_c_u_t_ _c_r_i_m_e_ _i_n_v_e_s_t_i_g_ _p_r_o_s_e_c_u_t_ _a_t_t_o_r_n_e_y_ _m_e_a_d_o_w_ _b_r_a_d_ _f_e_l_t_o_n_ _n_o_r_t_h_e_r_n_ _a_l_a_b_a_m_a_ _d_a_v_i_d_ _r_e_e_s
2,act general jocelyn samuel north dakota timothi purdon announc dominiqu jason flanigan arraign today threat charg flanigan indict seal grand juri threaten synagogu fargo count indict charg flanigan issu threaten interst communic interf feder protect activ indict unseal prior arraign indict alleg flanigan call templ beth fargo left voic mail messag threaten employe synagogu indict charg threat intimid interf templ beth employe religion indict mere accus defend presum innoc unless proven guilti investig prosecut attorney lynn jordheim megan heali north dakota dana mulhaus crimin section,0.993702,0.003127,0.003171,act general jocelyn samuel north dakota timothi purdon announc dominiqu jason flanigan arraign today threat charg flanigan indict seal grand juri threaten synagogu fargo count indict charg flanigan issu threaten interst communic interf feder protect activ indict unseal prior arraign indict alleg flanigan call templ beth fargo left voic mail messag threaten employe synagogu indict charg threat intimid interf templ beth employe religion indict mere accus defend presum innoc unless proven guilti investig prosecut attorney lynn jordheim megan heali north dakota dana mulhaus crimin section,0.993702,0.003127,0.003171,act general jocelyn samuel north dakota timothi purdon announc dominiqu jason flanigan arraign today threat charg flanigan indict seal grand juri threaten synagogu fargo count indict charg flanigan issu threaten interst communic interf feder protect activ indict unseal prior arraign indict alleg flanigan call templ beth fargo left voic mail messag threaten employe synagogu indict charg threat intimid interf templ beth employe religion indict mere accus defend presum innoc unless proven guilti investig prosecut attorney lynn jordheim megan heali north dakota dana mulhaus crimin section,0.987422,...,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,439,0.211,0.741,0.048,-0.9779,act general jocelyn samuel north dakota timothi purdon announc dominiqu jason flanigan arraign today threat charg flanigan indict seal grand juri threaten synagogu fargo count indict charg flanigan issu threaten interst communic interf feder protect activ indict unseal prior arraign indict alleg flanigan call templ beth fargo left voic mail messag threaten employe synagogu indict charg threat intimid interf templ beth employe religion indict mere accus defend presum innoc unless proven guilti investig prosecut attorney lynn jordheim megan heali north dakota dana mulhaus crimin section,topic_0,a_c_t_ _g_e_n_e_r_a_l_ _j_o_c_e_l_y_n_ _s_a_m_u_e_l_ _n_o_r_t_h_ _d_a_k_o_t_a_ _t_i_m_o_t_h_i_ _p_u_r_d_o_n_ _a_n_n_o_u_n_c_ _d_o_m_i_n_i_q_u_ _j_a_s_o_n_ _f_l_a_n_i_g_a_n_ _a_r_r_a_i_g_n_ _t_o_d_a_y_ _t_h_r_e_a_t_ _c_h_a_r_g_ _f_l_a_n_i_g_a_n_ _i_n_d_i_c_t_ _s_e_a_l_ _g_r_a_n_d_ _j_u_r_i_ _t_h_r_e_a_t_e_n_ _s_y_n_a_g_o_g_u_ _f_a_r_g_o_ _c_o_u_n_t_ _i_n_d_i_c_t_ _c_h_a_r_g_ _f_l_a_n_i_g_a_n_ _i_s_s_u_ _t_h_r_e_a_t_e_n_ _i_n_t_e_r_s_t_ _c_o_m_m_u_n_i_c_ _i_n_t_e_r_f_ _f_e_d_e_r_ _p_r_o_t_e_c_t_ _a_c_t_i_v_ _i_n_d_i_c_t_ _u_n_s_e_a_l_ _p_r_i_o_r_ _a_r_r_a_i_g_n_ _i_n_d_i_c_t_ _a_l_l_e_g_ _f_l_a_n_i_g_a_n_ _c_a_l_l_ _t_e_m_p_l_ _b_e_t_h_ _f_a_r_g_o_ _l_e_f_t_ _v_o_i_c_ _m_a_i_l_ _m_e_s_s_a_g_ _t_h_r_e_a_t_e_n_ _e_m_p_l_o_y_e_ _s_y_n_a_g_o_g_u_ _i_n_d_i_c_t_ _c_h_a_r_g_ _t_h_r_e_a_t_ _i_n_t_i_m_i_d_ _i_n_t_e_r_f_ _t_e_m_p_l_ _b_e_t_h_ _e_m_p_l_o_y_e_ _r_e_l_i_g_i_o_n_ _i_n_d_i_c_t_ _m_e_r_e_ _a_c_c_u_s_ _d_e_f_e_n_d_ _p_r_e_s_u_m_ _i_n_n_o_c_ _u_n_l_e_s_s_ _p_r_o_v_e_n_ _g_u_i_l_t_i_ _i_n_v_e_s_t_i_g_ _p_r_o_s_e_c_u_t_ _a_t_t_o_r_n_e_y_ _l_y_n_n_ _j_o_r_d_h_e_i_m_ _m_e_g_a_n_ _h_e_a_l_i_ _n_o_r_t_h_ _d_a_k_o_t_a_ _d_a_n_a_ _m_u_l_h_a_u_s_ _c_r_i_m_i_n_ _s_e_c_t_i_o_n
3,act general jocelyn samuel tammi dickinson western missouri announc woman independ sentenc feder court today violat african american famili set fire resid logan smith victoria cheek herrera plead guilti judg brian wime count conspir threaten intimid famili independ exercis constitut right resid home race color count violat commit racial motiv arson sentenc hear today judg wime sentenc smith serv month prison cheek herrera serv month prison smith cheek herrera previous admit june conspir injur oppress threaten intimid african american coupl children free exercis constitut occupi rent home independ commit crime victim race color accord defend plea agreement incid began defend discuss desir victim famili home fire drew swastika wrote word white power driveway defend ask juvenil acquaint gasolin creat molotov cocktail fill glass bottl gasolin insert bottl serv wick defend wick threw bottl side hous resid fire everi person america right occupi home free racial motiv violenc threat said general samuel today sentenc reflect commit work togeth unit state attorney ensur right aggress enforc today tough sentenc send strong messag racial motiv violenc threat toler communiti said dickinson american feel unwelcom unsaf neighborhood race color bring violat other hold account action prosecut first david ketchmark patel investig,0.995296,0.002391,0.002314,act general jocelyn samuel tammi dickinson western missouri announc woman independ sentenc feder court today violat african american famili set fire resid logan smith victoria cheek herrera plead guilti judg brian wime count conspir threaten intimid famili independ exercis constitut right resid home race color count violat commit racial motiv arson sentenc hear today judg wime sentenc smith serv month prison cheek herrera serv month prison smith cheek herrera previous admit june conspir injur oppress threaten intimid african american coupl children free exercis constitut occupi rent home independ commit crime victim race color accord defend plea agreement incid began defend discuss desir victim famili home fire drew swastika wrote word white power driveway defend ask juvenil acquaint gasolin creat molotov cocktail fill glass bottl gasolin insert bottl serv wick defend wick threw bottl side hous resid fire everi person america right occupi home free racial motiv violenc threat said general samuel today sentenc reflect commit work togeth unit state attorney ensur right aggress enforc today tough sentenc send strong messag racial motiv violenc threat toler communiti said dickinson american feel unwelcom unsaf neighborhood race color bring violat other hold account action prosecut first david ketchmark patel investig,0.995296,0.002391,0.002314,act general jocelyn samuel tammi dickinson western missouri announc woman independ sentenc feder court today violat african american famili set fire resid logan smith victoria cheek herrera plead guilti judg brian wime count conspir threaten intimid famili independ exercis constitut right resid home race color count violat commit racial motiv arson sentenc hear today judg wime sentenc smith serv month prison cheek herrera serv month prison smith cheek herrera previous admit june conspir injur oppress threaten intimid african american coupl children free exercis constitut occupi rent home independ commit crime victim race color accord defend plea agreement incid began defend discuss desir victim famili home fire drew swastika wrote word white power driveway defend ask juvenil acquaint gasolin creat molotov cocktail fill glass bottl gasolin insert bottl serv wick defend wick threw bottl side hous resid fire everi person america right occupi home free racial motiv violenc threat said general samuel today sentenc reflect commit work togeth unit state attorney ensur right aggress enforc today tough sentenc send strong messag racial motiv violenc threat toler communiti said dickinson american feel unwelcom unsaf neighborhood race color bring violat other hold account action prosecut first david ketchmark patel investig,0.995651,...,Hate Crimes,Civil Rights Division,490,0.155,0.768,0.077,-0.9848,act general jocelyn samuel tammi dickinson western missouri announc woman independ sentenc feder court today violat african american famili set fire resid logan smith victoria cheek herrera plead guilti judg brian wime count conspir threaten intimid famili independ exercis constitut right resid home race color count violat commit racial motiv arson sentenc hear today judg wime sentenc smith serv month prison cheek herrera serv month prison smith cheek herrera previous admit june conspir injur oppress threaten intimid african american coupl children free exercis constitut occupi rent home independ commit crime victim race color accord defend plea agreement incid began defend discuss desir victim famili home fire drew swastika wrote word white power driveway defend ask juvenil acquaint gasolin creat molotov cocktail fill glass bottl gasolin insert bottl serv wick defend wick threw bottl side hous resid fire everi person america right occupi home free racial motiv violenc threat said general samuel today sentenc reflect commit work togeth unit state attorney ensur right aggress enforc today tough sentenc send strong messag racial motiv violenc threat toler communiti said dickinson american feel unwelcom unsaf neighborhood race color bring violat other hold account action prosecut first david ketchmark patel investig,topic_0_y,a_c_t_ _g_e_n_e_r_a_l_ _j_o_c_e_l_y_n_ _s_a_m_u_e_l_ _t_a_m_m_i_ _d_i_c_k_i_n_s_o_n_ _w_e_s_t_e_r_n_ _m_i_s_s_o_u_r_i_ _a_n_n_o_u_n_c_ _w_o_m_a_n_ _i_n_d_e_p_e_n_d_ _s_e_n_t_e_n_c_ _f_e_d_e_r_ _c_o_u_r_t_ _t_o_d_a_y_ _v_i_o_l_a_t_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _f_a_m_i_l_i_ _s_e_t_ _f_i_r_e_ _r_e_s_i_d_ _l_o_g_a_n_ _s_m_i_t_h_ _v_i_c_t_o_r_i_a_ _c_h_e_e_k_ _h_e_r_r_e_r_a_ _p_l_e_a_d_ _g_u_i_l_t_i_ _j_u_d_g_ _b_r_i_a_n_ _w_i_m_e_ _c_o_u_n_t_ _c_o_n_s_p_i_r_ _t_h_r_e_a_t_e_n_ _i_n_t_i_m_i_d_ _f_a_m_i_l_i_ _i_n_d_e_p_e_n_d_ _e_x_e_r_c_i_s_ _c_o_n_s_t_i_t_u_t_ _r_i_g_h_t_ _r_e_s_i_d_ _h_o_m_e_ _r_a_c_e_ _c_o_l_o_r_ _c_o_u_n_t_ _v_i_o_l_a_t_ _c_o_m_m_i_t_ _r_a_c_i_a_l_ _m_o_t_i_v_ _a_r_s_o_n_ _s_e_n_t_e_n_c_ _h_e_a_r_ _t_o_d_a_y_ _j_u_d_g_ _w_i_m_e_ _s_e_n_t_e_n_c_ _s_m_i_t_h_ _s_e_r_v_ _m_o_n_t_h_ _p_r_i_s_o_n_ _c_h_e_e_k_ _h_e_r_r_e_r_a_ _s_e_r_v_ _m_o_n_t_h_ _p_r_i_s_o_n_ _s_m_i_t_h_ _c_h_e_e_k_ _h_e_r_r_e_r_a_ _p_r_e_v_i_o_u_s_ _a_d_m_i_t_ _j_u_n_e_ _c_o_n_s_p_i_r_ _i_n_j_u_r_ _o_p_p_r_e_s_s_ _t_h_r_e_a_t_e_n_ _i_n_t_i_m_i_d_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _c_o_u_p_l_ _c_h_i_l_d_r_e_n_ _f_r_e_e_ _e_x_e_r_c_i_s_ _c_o_n_s_t_i_t_u_t_ _o_c_c_u_p_i_ _r_e_n_t_ _h_o_m_e_ _i_n_d_e_p_e_n_d_ _c_o_m_m_i_t_ _c_r_i_m_e_ _v_i_c_t_i_m_ _r_a_c_e_ _c_o_l_o_r_ _a_c_c_o_r_d_ _d_e_f_e_n_d_ _p_l_e_a_ _a_g_r_e_e_m_e_n_t_ _i_n_c_i_d_ _b_e_g_a_n_ _d_e_f_e_n_d_ _d_i_s_c_u_s_s_ _d_e_s_i_r_ _v_i_c_t_i_m_ _f_a_m_i_l_i_ _h_o_m_e_ _f_i_r_e_ _d_r_e_w_ _s_w_a_s_t_i_k_a_ _w_r_o_t_e_ _w_o_r_d_ _w_h_i_t_e_ _p_o_w_e_r_ _d_r_i_v_e_w_a_y_ _d_e_f_e_n_d_ _a_s_k_ _j_u_v_e_n_i_l_ _a_c_q_u_a_i_n_t_ _g_a_s_o_l_i_n_ _c_r_e_a_t_ _m_o_l_o_t_o_v_ _c_o_c_k_t_a_i_l_ _f_i_l_l_ _g_l_a_s_s_ _b_o_t_t_l_ _g_a_s_o_l_i_n_ _i_n_s_e_r_t_ _b_o_t_t_l_ _s_e_r_v_ _w_i_c_k_ _d_e_f_e_n_d_ _w_i_c_k_ _t_h_r_e_w_ _b_o_t_t_l_ _s_i_d_e_ _h_o_u_s_ _r_e_s_i_d_ _f_i_r_e_ _e_v_e_r_i_ _p_e_r_s_o_n_ _a_m_e_r_i_c_a_ _r_i_g_h_t_ _o_c_c_u_p_i_ _h_o_m_e_ _f_r_e_e_ _r_a_c_i_a_l_ _m_o_t_i_v_ _v_i_o_l_e_n_c_ _t_h_r_e_a_t_ _s_a_i_d_ _g_e_n_e_r_a_l_ _s_a_m_u_e_l_ _t_o_d_a_y_ _s_e_n_t_e_n_c_ _r_e_f_l_e_c_t_ _c_o_m_m_i_t_ _w_o_r_k_ _t_o_g_e_t_h_ _u_n_i_t_ _s_t_a_t_e_ _a_t_t_o_r_n_e_y_ _e_n_s_u_r_ _r_i_g_h_t_ _a_g_g_r_e_s_s_ _e_n_f_o_r_c_ _t_o_d_a_y_ _t_o_u_g_h_ _s_e_n_t_e_n_c_ _s_e_n_d_ _s_t_r_o_n_g_ _m_e_s_s_a_g_ _r_a_c_i_a_l_ _m_o_t_i_v_ _v_i_o_l_e_n_c_ _t_h_r_e_a_t_ _t_o_l_e_r_ _c_o_m_m_u_n_i_t_i_ _s_a_i_d_ _d_i_c_k_i_n_s_o_n_ _a_m_e_r_i_c_a_n_ _f_e_e_l_ _u_n_w_e_l_c_o_m_ _u_n_s_a_f_ _n_e_i_g_h_b_o_r_h_o_o_d_ _r_a_c_e_ _c_o_l_o_r_ _b_r_i_n_g_ _v_i_o_l_a_t_ _o_t_h_e_r_ _h_o_l_d_ _a_c_c_o_u_n_t_ _a_c_t_i_o_n_ _p_r_o_s_e_c_u_t_ _f_i_r_s_t_ _d_a_v_i_d_ _k_e_t_c_h_m_a_r_k_ _p_a_t_e_l_ _i_n_v_e_s_t_i_g
4,act general vanita gupta felicia adam northern mississippi special agent charg donald alway jackson announc today charg feder crime engag threaten conduct direct african american student employe univers mississippi oxford mississippi graem phillip harri indict feder grand juri count conspiraci violat count use threat forc intimid african american student race color accord charg document harri student univers conspir other cover dark hang rope outdat version georgia state flag promin depict confeder battl flag around neck jame meredith statu campus univers mississippi intent threaten intimid african american student employe univers icon statu honor meredith role univers first african american student contenti integr incid occur earli morn hour shame ignor insult american violat strong held valu said general eric holder ever made feel threaten intimid look like take appropri action hold wrongdoer account send clear messag flagrant infring histor unnot unpunish indict mere accus defend presum innoc unless proven guilti ongo investig jackson mississippi oxford resid agenc univers mississippi polic prosecut northern mississippi,0.996268,0.001815,0.001917,act general vanita gupta felicia adam northern mississippi special agent charg donald alway jackson announc today charg feder crime engag threaten conduct direct african american student employe univers mississippi oxford mississippi graem phillip harri indict feder grand juri count conspiraci violat count use threat forc intimid african american student race color accord charg document harri student univers conspir other cover dark hang rope outdat version georgia state flag promin depict confeder battl flag around neck jame meredith statu campus univers mississippi intent threaten intimid african american student employe univers icon statu honor meredith role univers first african american student contenti integr incid occur earli morn hour shame ignor insult american violat strong held valu said general eric holder ever made feel threaten intimid look like take appropri action hold wrongdoer account send clear messag flagrant infring histor unnot unpunish indict mere accus defend presum innoc unless proven guilti ongo investig jackson mississippi oxford resid agenc univers mississippi polic prosecut northern mississippi,0.996268,0.001815,0.001917,act general vanita gupta felicia adam northern mississippi special agent charg donald alway jackson announc today charg feder crime engag threaten conduct direct african american student employe univers mississippi oxford mississippi graem phillip harri indict feder grand juri count conspiraci violat count use threat forc intimid african american student race color accord charg document harri student univers conspir other cover dark hang rope outdat version georgia state flag promin depict confeder battl flag around neck jame meredith statu campus univers mississippi intent threaten intimid african american student employe univers icon statu honor meredith role univers first african american student contenti integr incid occur earli morn hour shame ignor insult american violat strong held valu said general eric holder ever made feel threaten intimid look like take appropri action hold wrongdoer account send clear messag flagrant infring histor unnot unpunish indict mere accus defend presum innoc unless proven guilti ongo investig jackson mississippi oxford resid agenc univers mississippi polic prosecut northern mississippi,0.894007,...,Hate Crimes,Civil Rights Division,440,0.169,0.771,0.060,-0.9840,act general vanita gupta felicia adam northern mississippi special agent charg donald alway jackson announc today charg feder crime engag threaten conduct direct african american student employe univers mississippi oxford mississippi graem phillip harri indict feder grand juri count conspiraci violat count use threat forc intimid african american student race color accord charg document harri student univers conspir other cover dark hang rope outdat version georgia state flag promin depict confeder battl flag around neck jame meredith statu campus univers mississippi intent threaten intimid african american student employe univers icon statu honor meredith role univers first african american student contenti integr incid occur earli morn hour shame ignor insult american violat strong held valu said general eric holder ever made feel threaten intimid look like take appropri action hold wrongdoer account send clear messag flagrant infring histor unnot unpunish indict mere accus defend presum innoc unless proven guilti ongo investig jackson mississippi oxford resid agenc univers mississippi polic prosecut northern mississippi,topic_0,a_c_t_ _g_e_n_e_r_a_l_ _v_a_n_i_t_a_ _g_u_p_t_a_ _f_e_l_i_c_i_a_ _a_d_a_m_ _n_o_r_t_h_e_r_n_ _m_i_s_s_i_s_s_i_p_p_i_ _s_p_e_c_i_a_l_ _a_g_e_n_t_ _c_h_a_r_g_ _d_o_n_a_l_d_ _a_l_w_a_y_ _j_a_c_k_s_o_n_ _a_n_n_o_u_n_c_ _t_o_d_a_y_ _c_h_a_r_g_ _f_e_d_e_r_ _c_r_i_m_e_ _e_n_g_a_g_ _t_h_r_e_a_t_e_n_ _c_o_n_d_u_c_t_ _d_i_r_e_c_t_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _s_t_u_d_e_n_t_ _e_m_p_l_o_y_e_ _u_n_i_v_e_r_s_ _m_i_s_s_i_s_s_i_p_p_i_ _o_x_f_o_r_d_ _m_i_s_s_i_s_s_i_p_p_i_ _g_r_a_e_m_ _p_h_i_l_l_i_p_ _h_a_r_r_i_ _i_n_d_i_c_t_ _f_e_d_e_r_ _g_r_a_n_d_ _j_u_r_i_ _c_o_u_n_t_ _c_o_n_s_p_i_r_a_c_i_ _v_i_o_l_a_t_ _c_o_u_n_t_ _u_s_e_ _t_h_r_e_a_t_ _f_o_r_c_ _i_n_t_i_m_i_d_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _s_t_u_d_e_n_t_ _r_a_c_e_ _c_o_l_o_r_ _a_c_c_o_r_d_ _c_h_a_r_g_ _d_o_c_u_m_e_n_t_ _h_a_r_r_i_ _s_t_u_d_e_n_t_ _u_n_i_v_e_r_s_ _c_o_n_s_p_i_r_ _o_t_h_e_r_ _c_o_v_e_r_ _d_a_r_k_ _h_a_n_g_ _r_o_p_e_ _o_u_t_d_a_t_ _v_e_r_s_i_o_n_ _g_e_o_r_g_i_a_ _s_t_a_t_e_ _f_l_a_g_ _p_r_o_m_i_n_ _d_e_p_i_c_t_ _c_o_n_f_e_d_e_r_ _b_a_t_t_l_ _f_l_a_g_ _a_r_o_u_n_d_ _n_e_c_k_ _j_a_m_e_ _m_e_r_e_d_i_t_h_ _s_t_a_t_u_ _c_a_m_p_u_s_ _u_n_i_v_e_r_s_ _m_i_s_s_i_s_s_i_p_p_i_ _i_n_t_e_n_t_ _t_h_r_e_a_t_e_n_ _i_n_t_i_m_i_d_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _s_t_u_d_e_n_t_ _e_m_p_l_o_y_e_ _u_n_i_v_e_r_s_ _i_c_o_n_ _s_t_a_t_u_ _h_o_n_o_r_ _m_e_r_e_d_i_t_h_ _r_o_l_e_ _u_n_i_v_e_r_s_ _f_i_r_s_t_ _a_f_r_i_c_a_n_ _a_m_e_r_i_c_a_n_ _s_t_u_d_e_n_t_ _c_o_n_t_e_n_t_i_ _i_n_t_e_g_r_ _i_n_c_i_d_ _o_c_c_u_r_ _e_a_r_l_i_ _m_o_r_n_ _h_o_u_r_ _s_h_a_m_e_ _i_g_n_o_r_ _i_n_s_u_l_t_ _a_m_e_r_i_c_a_n_ _v_i_o_l_a_t_ _s_t_r_o_n_g_ _h_e_l_d_ _v_a_l_u_ _s_a_i_d_ _g_e_n_e_r_a_l_ _e_r_i_c_ _h_o_l_d_e_r_ _e_v_e_r_ _m_a_d_e_ _f_e_e_l_ _t_h_r_e_a_t_e_n_ _i_n_t_i_m_i_d_ _l_o_o_k_ _l_i_k_e_ _t_a_k_e_ _a_p_p_r_o_p_r_i_ _a_c_t_i_o_n_ _h_o_l_d_ _w_r_o_n_g_d_o_e_r_ _a_c_c_o_u_n_t_ _s_e_n_d_ _c_l_e_a_r_ _m_e_s_s_a_g_ _f_l_a_g_r_a_n_t_ _i_n_f_r_i_n_g_ _h_i_s_t_o_r_ _u_n_n_o_t_ _u_n_p_u_n_i_s_h_ _i_n_d_i_c_t_ _m_e_r_e_ _a_c_c_u_s_ _d_e_f_e_n_d_ _p_r_e_s_u_m_ _i_n_n_o_c_ _u_n_l_e_s_s_ _p_r_o_v_e_n_ _g_u_i_l_t_i_ _o_n_g_o_ _i_n_v_e_s_t_i_g_ _j_a_c_k_s_o_n_ _m_i_s_s_i_s_s_i_p_p_i_ _o_x_f_o_r_d_ _r_e_s_i_d_ _a_g_e_n_c_ _u_n_i_v_e_r_s_ _m_i_s_s_i_s_s_i_p_p_i_ _p_o_l_i_c_ _p_r_o_s_e_c_u_t_ _n_o_r_t_h_e_r_n_ _m_i_s_s_i_s_s_i_p_p_i
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
712,year kansa nativ resid panama plead guilti today sexual explicit depict minor import unit state announc act general john cronan crimin ryan patrick southern texa jebediah dishman fredonia kansa plead guilti inform charg sexual explicit depict minor import unit state judg ewe werlein southern texa sentenc juli dishman arrest houston crimin complaint grand juri court southern texa indict count engag illicit sexual conduct minor foreign countri product child pornographi traffick children obtain custodi control minor purpos produc sexual explicit visual depict minor accord admiss made conjunct plea agreement septemb dishman began approxim month trip sever countri southeast asia trip indonesia anoth tourist observ dishman engag suspici interact minor masturb watch minor use tablet take photograph three year german child tourist confront dishman seiz tablet turn local author author later review tablet pursuant search warrant discov sexual explicit imag minor includ german child well internet search indic interest traffick minor southeast asia investig cooper immigr custom enforc homeland secur investig attorney jame burk william gradi crimin child exploit obscen section ceo sherri zack southern texa prosecut elli peirson central illinoi previous detail ceo also serv vital member prosecut team earlier stage litig brought part project safe childhood nationwid initi combat grow epidem child sexual exploit abus launch attorney offic ceo project safe childhood marshal feder state local resourc better locat apprehend prosecut individu exploit children internet well identifi rescu victim inform project safe childhood pleas visit,0.003227,0.993537,0.003236,year kansa nativ resid panama plead guilti today sexual explicit depict minor import unit state announc act general john cronan crimin ryan patrick southern texa jebediah dishman fredonia kansa plead guilti inform charg sexual explicit depict minor import unit state judg ewe werlein southern texa sentenc juli dishman arrest houston crimin complaint grand juri court southern texa indict count engag illicit sexual conduct minor foreign countri product child pornographi traffick children obtain custodi control minor purpos produc sexual explicit visual depict minor accord admiss made conjunct plea agreement septemb dishman began approxim month trip sever countri southeast asia trip indonesia anoth tourist observ dishman engag suspici interact minor masturb watch minor use tablet take photograph three year german child tourist confront dishman seiz tablet turn local author author later review tablet pursuant search warrant discov sexual explicit imag minor includ german child well internet search indic interest traffick minor southeast asia investig cooper immigr custom enforc homeland secur investig attorney jame burk william gradi crimin child exploit obscen section ceo sherri zack southern texa prosecut elli peirson central illinoi previous detail ceo also serv vital member prosecut team earlier stage litig brought part project safe childhood nationwid initi combat grow epidem child sexual exploit abus launch attorney offic ceo project safe childhood marshal feder state local resourc better locat apprehend prosecut individu exploit children internet well identifi rescu victim inform project safe childhood pleas visit,0.003227,0.993537,0.003236,year kansa nativ resid panama plead guilti today sexual explicit depict minor import unit state announc act general john cronan crimin ryan patrick southern texa jebediah dishman fredonia kansa plead guilti inform charg sexual explicit depict minor import unit state judg ewe werlein southern texa sentenc juli dishman arrest houston crimin complaint grand juri court southern texa indict count engag illicit sexual conduct minor foreign countri product child pornographi traffick children obtain custodi control minor purpos produc sexual explicit visual depict minor accord admiss made conjunct plea agreement septemb dishman began approxim month trip sever countri southeast asia trip indonesia anoth tourist observ dishman engag suspici interact minor masturb watch minor use tablet take photograph three year german child tourist confront dishman seiz tablet turn local author author later review tablet pursuant search warrant discov sexual explicit imag minor includ german child well internet search indic interest traffick minor southeast asia investig cooper immigr custom enforc homeland secur investig attorney jame burk william gradi crimin child exploit obscen section ceo sherri zack southern texa prosecut elli peirson central illinoi previous detail ceo also serv vital member prosecut team earlier stage litig brought part project safe childhood nationwid initi combat grow epidem child sexual exploit abus launch attorney offic ceo project safe childhood marshal feder state local resourc better locat apprehend prosecut individu exploit children internet well identifi rescu victim inform project safe childhood pleas visit,0.001890,...,Project Safe Childhood,"Criminal Division; USAO - Texas, Southern",406,0.094,0.821,0.085,-0.5719,year kansa nativ resid panama plead guilti today sexual explicit depict minor import unit state announc act general john cronan crimin ryan patrick southern texa jebediah dishman fredonia kansa plead guilti inform charg sexual explicit depict minor import unit state judg ewe werlein southern texa sentenc juli dishman arrest houston crimin complaint grand juri court southern texa indict count engag illicit sexual conduct minor foreign countri product child pornographi traffick children obtain custodi control minor purpos produc sexual explicit visual depict minor accord admiss made conjunct plea agreement septemb dishman began approxim month trip sever countri southeast asia trip indonesia anoth tourist observ dishman engag suspici interact minor masturb watch minor use tablet take photograph three year german child tourist confront dishman seiz tablet turn local author author later review tablet pursuant search warrant discov sexual explicit imag minor includ german child well internet search indic interest traffick minor southeast asia investig cooper immigr custom enforc homeland secur investig attorney jame burk william gradi crimin child exploit obscen section ceo sherri zack southern texa prosecut elli peirson central illinoi previous detail ceo also serv vital member prosecut team earlier stage litig brought part project safe childhood nationwid initi combat grow epidem child sexual exploit abus launch attorney offic ceo project safe childhood marshal feder state local resourc better locat apprehend prosecut individu exploit children internet well identifi rescu victim inform project safe childhood pleas visit,topic_2_y,y_e_a_r_ _k_a_n_s_a_ _n_a_t_i_v_ _r_e_s_i_d_ _p_a_n_a_m_a_ _p_l_e_a_d_ _g_u_i_l_t_i_ _t_o_d_a_y_ _s_e_x_u_a_l_ _e_x_p_l_i_c_i_t_ _d_e_p_i_c_t_ _m_i_n_o_r_ _i_m_p_o_r_t_ _u_n_i_t_ _s_t_a_t_e_ _a_n_n_o_u_n_c_ _a_c_t_ _g_e_n_e_r_a_l_ _j_o_h_n_ _c_r_o_n_a_n_ _c_r_i_m_i_n_ _r_y_a_n_ _p_a_t_r_i_c_k_ _s_o_u_t_h_e_r_n_ _t_e_x_a_ _j_e_b_e_d_i_a_h_ _d_i_s_h_m_a_n_ _f_r_e_d_o_n_i_a_ _k_a_n_s_a_ _p_l_e_a_d_ _g_u_i_l_t_i_ _i_n_f_o_r_m_ _c_h_a_r_g_ _s_e_x_u_a_l_ _e_x_p_l_i_c_i_t_ _d_e_p_i_c_t_ _m_i_n_o_r_ _i_m_p_o_r_t_ _u_n_i_t_ _s_t_a_t_e_ _j_u_d_g_ _e_w_e_ _w_e_r_l_e_i_n_ _s_o_u_t_h_e_r_n_ _t_e_x_a_ _s_e_n_t_e_n_c_ _j_u_l_i_ _d_i_s_h_m_a_n_ _a_r_r_e_s_t_ _h_o_u_s_t_o_n_ _c_r_i_m_i_n_ _c_o_m_p_l_a_i_n_t_ _g_r_a_n_d_ _j_u_r_i_ _c_o_u_r_t_ _s_o_u_t_h_e_r_n_ _t_e_x_a_ _i_n_d_i_c_t_ _c_o_u_n_t_ _e_n_g_a_g_ _i_l_l_i_c_i_t_ _s_e_x_u_a_l_ _c_o_n_d_u_c_t_ _m_i_n_o_r_ _f_o_r_e_i_g_n_ _c_o_u_n_t_r_i_ _p_r_o_d_u_c_t_ _c_h_i_l_d_ _p_o_r_n_o_g_r_a_p_h_i_ _t_r_a_f_f_i_c_k_ _c_h_i_l_d_r_e_n_ _o_b_t_a_i_n_ _c_u_s_t_o_d_i_ _c_o_n_t_r_o_l_ _m_i_n_o_r_ _p_u_r_p_o_s_ _p_r_o_d_u_c_ _s_e_x_u_a_l_ _e_x_p_l_i_c_i_t_ _v_i_s_u_a_l_ _d_e_p_i_c_t_ _m_i_n_o_r_ _a_c_c_o_r_d_ _a_d_m_i_s_s_ _m_a_d_e_ _c_o_n_j_u_n_c_t_ _p_l_e_a_ _a_g_r_e_e_m_e_n_t_ _s_e_p_t_e_m_b_ _d_i_s_h_m_a_n_ _b_e_g_a_n_ _a_p_p_r_o_x_i_m_ _m_o_n_t_h_ _t_r_i_p_ _s_e_v_e_r_ _c_o_u_n_t_r_i_ _s_o_u_t_h_e_a_s_t_ _a_s_i_a_ _t_r_i_p_ _i_n_d_o_n_e_s_i_a_ _a_n_o_t_h_ _t_o_u_r_i_s_t_ _o_b_s_e_r_v_ _d_i_s_h_m_a_n_ _e_n_g_a_g_ _s_u_s_p_i_c_i_ _i_n_t_e_r_a_c_t_ _m_i_n_o_r_ _m_a_s_t_u_r_b_ _w_a_t_c_h_ _m_i_n_o_r_ _u_s_e_ _t_a_b_l_e_t_ _t_a_k_e_ _p_h_o_t_o_g_r_a_p_h_ _t_h_r_e_e_ _y_e_a_r_ _g_e_r_m_a_n_ _c_h_i_l_d_ _t_o_u_r_i_s_t_ _c_o_n_f_r_o_n_t_ _d_i_s_h_m_a_n_ _s_e_i_z_ _t_a_b_l_e_t_ _t_u_r_n_ _l_o_c_a_l_ _a_u_t_h_o_r_ _a_u_t_h_o_r_ _l_a_t_e_r_ _r_e_v_i_e_w_ _t_a_b_l_e_t_ _p_u_r_s_u_a_n_t_ _s_e_a_r_c_h_ _w_a_r_r_a_n_t_ _d_i_s_c_o_v_ _s_e_x_u_a_l_ _e_x_p_l_i_c_i_t_ _i_m_a_g_ _m_i_n_o_r_ _i_n_c_l_u_d_ _g_e_r_m_a_n_ _c_h_i_l_d_ _w_e_l_l_ _i_n_t_e_r_n_e_t_ _s_e_a_r_c_h_ _i_n_d_i_c_ _i_n_t_e_r_e_s_t_ _t_r_a_f_f_i_c_k_ _m_i_n_o_r_ _s_o_u_t_h_e_a_s_t_ _a_s_i_a_ _i_n_v_e_s_t_i_g_ _c_o_o_p_e_r_ _i_m_m_i_g_r_ _c_u_s_t_o_m_ _e_n_f_o_r_c_ _h_o_m_e_l_a_n_d_ _s_e_c_u_r_ _i_n_v_e_s_t_i_g_ _a_t_t_o_r_n_e_y_ _j_a_m_e_ _b_u_r_k_ _w_i_l_l_i_a_m_ _g_r_a_d_i_ _c_r_i_m_i_n_ _c_h_i_l_d_ _e_x_p_l_o_i_t_ _o_b_s_c_e_n_ _s_e_c_t_i_o_n_ _c_e_o_ _s_h_e_r_r_i_ _z_a_c_k_ _s_o_u_t_h_e_r_n_ _t_e_x_a_ _p_r_o_s_e_c_u_t_ _e_l_l_i_ _p_e_i_r_s_o_n_ _c_e_n_t_r_a_l_ _i_l_l_i_n_o_i_ _p_r_e_v_i_o_u_s_ _d_e_t_a_i_l_ _c_e_o_ _a_l_s_o_ _s_e_r_v_ _v_i_t_a_l_ _m_e_m_b_e_r_ _p_r_o_s_e_c_u_t_ _t_e_a_m_ _e_a_r_l_i_e_r_ _s_t_a_g_e_ _l_i_t_i_g_ _b_r_o_u_g_h_t_ _p_a_r_t_ _p_r_o_j_e_c_t_ _s_a_f_e_ _c_h_i_l_d_h_o_o_d_ _n_a_t_i_o_n_w_i_d_ _i_n_i_t_i_ _c_o_m_b_a_t_ _g_r_o_w_ _e_p_i_d_e_m_ _c_h_i_l_d_ _s_e_x_u_a_l_ _e_x_p_l_o_i_t_ _a_b_u_s_ _l_a_u_n_c_h_ _a_t_t_o_r_n_e_y_ _o_f_f_i_c_ _c_e_o_ _p_r_o_j_e_c_t_ _s_a_f_e_ _c_h_i_l_d_h_o_o_d_ _m_a_r_s_h_a_l_ _f_e_d_e_r_ _s_t_a_t_e_ _l_o_c_a_l_ _r_e_s_o_u_r_c_ _b_e_t_t_e_r_ _l_o_c_a_t_ _a_p_p_r_e_h_e_n_d_ _p_r_o_s_e_c_u_t_ _i_n_d_i_v_i_d_u_ _e_x_p_l_o_i_t_ _c_h_i_l_d_r_e_n_ _i_n_t_e_r_n_e_t_ _w_e_l_l_ _i_d_e_n_t_i_f_i_ _r_e_s_c_u_ _v_i_c_t_i_m_ _i_n_f_o_r_m_ _p_r_o_j_e_c_t_ _s_a_f_e_ _c_h_i_l_d_h_o_o_d_ _p_l_e_a_s_ _v_i_s_i_t
713,yesterday feder grand juri portland indict georg allen mason wife saraya sophia lisa gardner charg relat assault year walk boyfriend street hillsboro occur animus victim sexual orient mason charg violat matthew shepard jame byrd hate crime prevent enact octob indict alleg mason struck victim metal tool victim actual perceiv sexual orient therebi caus bodili injuri victim gardner charg count obstruct know intent mislead hillsboro polic offic statement provid connect mason indict alleg gardner lie mason whereabout time offic search misl offic repeat chang stori weapon mason use strike victim mason face statutori maximum penalti year prison gardner face statutori maximum penalti year prison investig portland cooper prosecut hannah horsley oregon fara gold indict mere accus defend presum innoc unless proven guilti,0.026373,0.969549,0.004078,yesterday feder grand juri portland indict georg allen mason wife saraya sophia lisa gardner charg relat assault year walk boyfriend street hillsboro occur animus victim sexual orient mason charg violat matthew shepard jame byrd hate crime prevent enact octob indict alleg mason struck victim metal tool victim actual perceiv sexual orient therebi caus bodili injuri victim gardner charg count obstruct know intent mislead hillsboro polic offic statement provid connect mason indict alleg gardner lie mason whereabout time offic search misl offic repeat chang stori weapon mason use strike victim mason face statutori maximum penalti year prison gardner face statutori maximum penalti year prison investig portland cooper prosecut hannah horsley oregon fara gold indict mere accus defend presum innoc unless proven guilti,0.026373,0.969549,0.004078,yesterday feder grand juri portland indict georg allen mason wife saraya sophia lisa gardner charg relat assault year walk boyfriend street hillsboro occur animus victim sexual orient mason charg violat matthew shepard jame byrd hate crime prevent enact octob indict alleg mason struck victim metal tool victim actual perceiv sexual orient therebi caus bodili injuri victim gardner charg count obstruct know intent mislead hillsboro polic offic statement provid connect mason indict alleg gardner lie mason whereabout time offic search misl offic repeat chang stori weapon mason use strike victim mason face statutori maximum penalti year prison gardner face statutori maximum penalti year prison investig portland cooper prosecut hannah horsley oregon fara gold indict mere accus defend presum innoc unless proven guilti,0.991060,...,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,547,0.218,0.748,0.034,-0.9912,yesterday feder grand juri portland indict georg allen mason wife saraya sophia lisa gardner charg relat assault year walk boyfriend street hillsboro occur animus victim sexual orient mason charg violat matthew shepard jame byrd hate crime prevent enact octob indict alleg mason struck victim metal tool victim actual perceiv sexual orient therebi caus bodili injuri victim gardner charg count obstruct know intent mislead hillsboro polic offic statement provid connect mason indict alleg gardner lie mason whereabout time offic search misl offic repeat chang stori weapon mason use strike victim mason face statutori maximum penalti year prison gardner face statutori maximum penalti year prison investig portland cooper prosecut hannah horsley oregon fara gold indict mere accus defend presum innoc unless proven guilti,topic_0_y,y_e_s_t_e_r_d_a_y_ _f_e_d_e_r_ _g_r_a_n_d_ _j_u_r_i_ _p_o_r_t_l_a_n_d_ _i_n_d_i_c_t_ _g_e_o_r_g_ _a_l_l_e_n_ _m_a_s_o_n_ _w_i_f_e_ _s_a_r_a_y_a_ _s_o_p_h_i_a_ _l_i_s_a_ _g_a_r_d_n_e_r_ _c_h_a_r_g_ _r_e_l_a_t_ _a_s_s_a_u_l_t_ _y_e_a_r_ _w_a_l_k_ _b_o_y_f_r_i_e_n_d_ _s_t_r_e_e_t_ _h_i_l_l_s_b_o_r_o_ _o_c_c_u_r_ _a_n_i_m_u_s_ _v_i_c_t_i_m_ _s_e_x_u_a_l_ _o_r_i_e_n_t_ _m_a_s_o_n_ _c_h_a_r_g_ _v_i_o_l_a_t_ _m_a_t_t_h_e_w_ _s_h_e_p_a_r_d_ _j_a_m_e_ _b_y_r_d_ _h_a_t_e_ _c_r_i_m_e_ _p_r_e_v_e_n_t_ _e_n_a_c_t_ _o_c_t_o_b_ _i_n_d_i_c_t_ _a_l_l_e_g_ _m_a_s_o_n_ _s_t_r_u_c_k_ _v_i_c_t_i_m_ _m_e_t_a_l_ _t_o_o_l_ _v_i_c_t_i_m_ _a_c_t_u_a_l_ _p_e_r_c_e_i_v_ _s_e_x_u_a_l_ _o_r_i_e_n_t_ _t_h_e_r_e_b_i_ _c_a_u_s_ _b_o_d_i_l_i_ _i_n_j_u_r_i_ _v_i_c_t_i_m_ _g_a_r_d_n_e_r_ _c_h_a_r_g_ _c_o_u_n_t_ _o_b_s_t_r_u_c_t_ _k_n_o_w_ _i_n_t_e_n_t_ _m_i_s_l_e_a_d_ _h_i_l_l_s_b_o_r_o_ _p_o_l_i_c_ _o_f_f_i_c_ _s_t_a_t_e_m_e_n_t_ _p_r_o_v_i_d_ _c_o_n_n_e_c_t_ _m_a_s_o_n_ _i_n_d_i_c_t_ _a_l_l_e_g_ _g_a_r_d_n_e_r_ _l_i_e_ _m_a_s_o_n_ _w_h_e_r_e_a_b_o_u_t_ _t_i_m_e_ _o_f_f_i_c_ _s_e_a_r_c_h_ _m_i_s_l_ _o_f_f_i_c_ _r_e_p_e_a_t_ _c_h_a_n_g_ _s_t_o_r_i_ _w_e_a_p_o_n_ _m_a_s_o_n_ _u_s_e_ _s_t_r_i_k_e_ _v_i_c_t_i_m_ _m_a_s_o_n_ _f_a_c_e_ _s_t_a_t_u_t_o_r_i_ _m_a_x_i_m_u_m_ _p_e_n_a_l_t_i_ _y_e_a_r_ _p_r_i_s_o_n_ _g_a_r_d_n_e_r_ _f_a_c_e_ _s_t_a_t_u_t_o_r_i_ _m_a_x_i_m_u_m_ _p_e_n_a_l_t_i_ _y_e_a_r_ _p_r_i_s_o_n_ _i_n_v_e_s_t_i_g_ _p_o_r_t_l_a_n_d_ _c_o_o_p_e_r_ _p_r_o_s_e_c_u_t_ _h_a_n_n_a_h_ _h_o_r_s_l_e_y_ _o_r_e_g_o_n_ _f_a_r_a_ _g_o_l_d_ _i_n_d_i_c_t_ _m_e_r_e_ _a_c_c_u_s_ _d_e_f_e_n_d_ _p_r_e_s_u_m_ _i_n_n_o_c_ _u_n_l_e_s_s_ _p_r_o_v_e_n_ _g_u_i_l_t_i
714,yesterday feder juri honolulu found state hawaii hawaii transport airport hdot discrimin former employe sherri valmoja subject sexual harass verdict return file last year alleg defend violat titl prohibit discrimin basi race color nation origin religion evid present show employ explos detect canin handler honolulu intern airport valmoja subject sexual harass form lewd unwelcom comment physic intimid worker unwelcom conduct intimid began earli novemb valmoja worker employ privat compani contract defend valmoja worker becam employ state hawaii harass intimid continu juri found despit time complaint valmoja worker conduct defend fail take prompt effect action remedi harass continu march creat abus hostil work environ juri award valmoja compens pain suffer endur harass decis addit injunct relief still pend ask perman injunct prohibit state hawaii discrimin employe review revis defend sexual harass polici complaint procedur train employe discrimin vigor enforc titl ensur peopl work free sexual harass retali said princip deputi vanita gupta head juri verdict send loud messag clear remind continu effect combat base discrimin whenev occur public sector workplac valmoja origin file sexual harass charg hdot honolulu field equal employ opportun commiss eeoc investig determin reason caus believ discrimin occur refer matter lawsuit brought result project design ensur vigor enforc titl state local government employ enhanc cooper eeoc sexual harass remain signific problem nation workforc said eeoc chair jenni yang eeoc take serious oblig obtain redress employe victim egregi practic verdict serv remind employ must remain vigil prevent remedi harass workplac inform titl feder employ law avail employ litig section websit continu enforc titl prioriti addit inform avail websit eeoc enforc feder law prohibit employ discrimin inform eeoc avail websit,0.003043,0.993878,0.003079,yesterday feder juri honolulu found state hawaii hawaii transport airport hdot discrimin former employe sherri valmoja subject sexual harass verdict return file last year alleg defend violat titl prohibit discrimin basi race color nation origin religion evid present show employ explos detect canin handler honolulu intern airport valmoja subject sexual harass form lewd unwelcom comment physic intimid worker unwelcom conduct intimid began earli novemb valmoja worker employ privat compani contract defend valmoja worker becam employ state hawaii harass intimid continu juri found despit time complaint valmoja worker conduct defend fail take prompt effect action remedi harass continu march creat abus hostil work environ juri award valmoja compens pain suffer endur harass decis addit injunct relief still pend ask perman injunct prohibit state hawaii discrimin employe review revis defend sexual harass polici complaint procedur train employe discrimin vigor enforc titl ensur peopl work free sexual harass retali said princip deputi vanita gupta head juri verdict send loud messag clear remind continu effect combat base discrimin whenev occur public sector workplac valmoja origin file sexual harass charg hdot honolulu field equal employ opportun commiss eeoc investig determin reason caus believ discrimin occur refer matter lawsuit brought result project design ensur vigor enforc titl state local government employ enhanc cooper eeoc sexual harass remain signific problem nation workforc said eeoc chair jenni yang eeoc take serious oblig obtain redress employe victim egregi practic verdict serv remind employ must remain vigil prevent remedi harass workplac inform titl feder employ law avail employ litig section websit continu enforc titl prioriti addit inform avail websit eeoc enforc feder law prohibit employ discrimin inform eeoc avail websit,0.003043,0.993878,0.003079,yesterday feder juri honolulu found state hawaii hawaii transport airport hdot discrimin former employe sherri valmoja subject sexual harass verdict return file last year alleg defend violat titl prohibit discrimin basi race color nation origin religion evid present show employ explos detect canin handler honolulu intern airport valmoja subject sexual harass form lewd unwelcom comment physic intimid worker unwelcom conduct intimid began earli novemb valmoja worker employ privat compani contract defend valmoja worker becam employ state hawaii harass intimid continu juri found despit time complaint valmoja worker conduct defend fail take prompt effect action remedi harass continu march creat abus hostil work environ juri award valmoja compens pain suffer endur harass decis addit injunct relief still pend ask perman injunct prohibit state hawaii discrimin employe review revis defend sexual harass polici complaint procedur train employe discrimin vigor enforc titl ensur peopl work free sexual harass retali said princip deputi vanita gupta head juri verdict send loud messag clear remind continu effect combat base discrimin whenev occur public sector workplac valmoja origin file sexual harass charg hdot honolulu field equal employ opportun commiss eeoc investig determin reason caus believ discrimin occur refer matter lawsuit brought result project design ensur vigor enforc titl state local government employ enhanc cooper eeoc sexual harass remain signific problem nation workforc said eeoc chair jenni yang eeoc take serious oblig obtain redress employe victim egregi practic verdict serv remind employ must remain vigil prevent remedi harass workplac inform titl feder employ law avail employ litig section websit continu enforc titl prioriti addit inform avail websit eeoc enforc feder law prohibit employ discrimin inform eeoc avail websit,0.093621,...,Civil Rights,Civil Rights Division; Civil Rights - Employment Litigation Section,105,0.158,0.765,0.077,-0.9931,yesterday feder juri honolulu found state hawaii hawaii transport airport hdot discrimin former employe sherri valmoja subject sexual harass verdict return file last year alleg defend violat titl prohibit discrimin basi race color nation origin religion evid present show employ explos detect canin handler honolulu intern airport valmoja subject sexual harass form lewd unwelcom comment physic intimid worker unwelcom conduct intimid began earli novemb valmoja worker employ privat compani contract defend valmoja worker becam employ state hawaii harass intimid continu juri found despit time complaint valmoja worker conduct defend fail take prompt effect action remedi harass continu march creat abus hostil work environ juri award valmoja compens pain suffer endur harass decis addit injunct relief still pend ask perman injunct prohibit state hawaii discrimin employe review revis defend sexual harass polici complaint procedur train employe discrimin vigor enforc titl ensur peopl work free sexual harass retali said princip deputi vanita gupta head juri verdict send loud messag clear remind continu effect combat base discrimin whenev occur public sector workplac valmoja origin file sexual harass charg hdot honolulu field equal employ opportun commiss eeoc investig determin reason caus believ discrimin occur refer matter lawsuit brought result project design ensur vigor enforc titl state local government employ enhanc cooper eeoc sexual harass remain signific problem nation workforc said eeoc chair jenni yang eeoc take serious oblig obtain redress employe victim egregi practic verdict serv remind employ must remain vigil prevent remedi harass workplac inform titl feder employ law avail employ litig section websit continu enforc titl prioriti addit inform avail websit eeoc enforc feder law prohibit employ discrimin inform eeoc avail websit,topic_1,y_e_s_t_e_r_d_a_y_ _f_e_d_e_r_ _j_u_r_i_ _h_o_n_o_l_u_l_u_ _f_o_u_n_d_ _s_t_a_t_e_ _h_a_w_a_i_i_ _h_a_w_a_i_i_ _t_r_a_n_s_p_o_r_t_ _a_i_r_p_o_r_t_ _h_d_o_t_ _d_i_s_c_r_i_m_i_n_ _f_o_r_m_e_r_ _e_m_p_l_o_y_e_ _s_h_e_r_r_i_ _v_a_l_m_o_j_a_ _s_u_b_j_e_c_t_ _s_e_x_u_a_l_ _h_a_r_a_s_s_ _v_e_r_d_i_c_t_ _r_e_t_u_r_n_ _f_i_l_e_ _l_a_s_t_ _y_e_a_r_ _a_l_l_e_g_ _d_e_f_e_n_d_ _v_i_o_l_a_t_ _t_i_t_l_ _p_r_o_h_i_b_i_t_ _d_i_s_c_r_i_m_i_n_ _b_a_s_i_ _r_a_c_e_ _c_o_l_o_r_ _n_a_t_i_o_n_ _o_r_i_g_i_n_ _r_e_l_i_g_i_o_n_ _e_v_i_d_ _p_r_e_s_e_n_t_ _s_h_o_w_ _e_m_p_l_o_y_ _e_x_p_l_o_s_ _d_e_t_e_c_t_ _c_a_n_i_n_ _h_a_n_d_l_e_r_ _h_o_n_o_l_u_l_u_ _i_n_t_e_r_n_ _a_i_r_p_o_r_t_ _v_a_l_m_o_j_a_ _s_u_b_j_e_c_t_ _s_e_x_u_a_l_ _h_a_r_a_s_s_ _f_o_r_m_ _l_e_w_d_ _u_n_w_e_l_c_o_m_ _c_o_m_m_e_n_t_ _p_h_y_s_i_c_ _i_n_t_i_m_i_d_ _w_o_r_k_e_r_ _u_n_w_e_l_c_o_m_ _c_o_n_d_u_c_t_ _i_n_t_i_m_i_d_ _b_e_g_a_n_ _e_a_r_l_i_ _n_o_v_e_m_b_ _v_a_l_m_o_j_a_ _w_o_r_k_e_r_ _e_m_p_l_o_y_ _p_r_i_v_a_t_ _c_o_m_p_a_n_i_ _c_o_n_t_r_a_c_t_ _d_e_f_e_n_d_ _v_a_l_m_o_j_a_ _w_o_r_k_e_r_ _b_e_c_a_m_ _e_m_p_l_o_y_ _s_t_a_t_e_ _h_a_w_a_i_i_ _h_a_r_a_s_s_ _i_n_t_i_m_i_d_ _c_o_n_t_i_n_u_ _j_u_r_i_ _f_o_u_n_d_ _d_e_s_p_i_t_ _t_i_m_e_ _c_o_m_p_l_a_i_n_t_ _v_a_l_m_o_j_a_ _w_o_r_k_e_r_ _c_o_n_d_u_c_t_ _d_e_f_e_n_d_ _f_a_i_l_ _t_a_k_e_ _p_r_o_m_p_t_ _e_f_f_e_c_t_ _a_c_t_i_o_n_ _r_e_m_e_d_i_ _h_a_r_a_s_s_ _c_o_n_t_i_n_u_ _m_a_r_c_h_ _c_r_e_a_t_ _a_b_u_s_ _h_o_s_t_i_l_ _w_o_r_k_ _e_n_v_i_r_o_n_ _j_u_r_i_ _a_w_a_r_d_ _v_a_l_m_o_j_a_ _c_o_m_p_e_n_s_ _p_a_i_n_ _s_u_f_f_e_r_ _e_n_d_u_r_ _h_a_r_a_s_s_ _d_e_c_i_s_ _a_d_d_i_t_ _i_n_j_u_n_c_t_ _r_e_l_i_e_f_ _s_t_i_l_l_ _p_e_n_d_ _a_s_k_ _p_e_r_m_a_n_ _i_n_j_u_n_c_t_ _p_r_o_h_i_b_i_t_ _s_t_a_t_e_ _h_a_w_a_i_i_ _d_i_s_c_r_i_m_i_n_ _e_m_p_l_o_y_e_ _r_e_v_i_e_w_ _r_e_v_i_s_ _d_e_f_e_n_d_ _s_e_x_u_a_l_ _h_a_r_a_s_s_ _p_o_l_i_c_i_ _c_o_m_p_l_a_i_n_t_ _p_r_o_c_e_d_u_r_ _t_r_a_i_n_ _e_m_p_l_o_y_e_ _d_i_s_c_r_i_m_i_n_ _v_i_g_o_r_ _e_n_f_o_r_c_ _t_i_t_l_ _e_n_s_u_r_ _p_e_o_p_l_ _w_o_r_k_ _f_r_e_e_ _s_e_x_u_a_l_ _h_a_r_a_s_s_ _r_e_t_a_l_i_ _s_a_i_d_ _p_r_i_n_c_i_p_ _d_e_p_u_t_i_ _v_a_n_i_t_a_ _g_u_p_t_a_ _h_e_a_d_ _j_u_r_i_ _v_e_r_d_i_c_t_ _s_e_n_d_ _l_o_u_d_ _m_e_s_s_a_g_ _c_l_e_a_r_ _r_e_m_i_n_d_ _c_o_n_t_i_n_u_ _e_f_f_e_c_t_ _c_o_m_b_a_t_ _b_a_s_e_ _d_i_s_c_r_i_m_i_n_ _w_h_e_n_e_v_ _o_c_c_u_r_ _p_u_b_l_i_c_ _s_e_c_t_o_r_ _w_o_r_k_p_l_a_c_ _v_a_l_m_o_j_a_ _o_r_i_g_i_n_ _f_i_l_e_ _s_e_x_u_a_l_ _h_a_r_a_s_s_ _c_h_a_r_g_ _h_d_o_t_ _h_o_n_o_l_u_l_u_ _f_i_e_l_d_ _e_q_u_a_l_ _e_m_p_l_o_y_ _o_p_p_o_r_t_u_n_ _c_o_m_m_i_s_s_ _e_e_o_c_ _i_n_v_e_s_t_i_g_ _d_e_t_e_r_m_i_n_ _r_e_a_s_o_n_ _c_a_u_s_ _b_e_l_i_e_v_ _d_i_s_c_r_i_m_i_n_ _o_c_c_u_r_ _r_e_f_e_r_ _m_a_t_t_e_r_ _l_a_w_s_u_i_t_ _b_r_o_u_g_h_t_ _r_e_s_u_l_t_ _p_r_o_j_e_c_t_ _d_e_s_i_g_n_ _e_n_s_u_r_ _v_i_g_o_r_ _e_n_f_o_r_c_ _t_i_t_l_ _s_t_a_t_e_ _l_o_c_a_l_ _g_o_v_e_r_n_m_e_n_t_ _e_m_p_l_o_y_ _e_n_h_a_n_c_ _c_o_o_p_e_r_ _e_e_o_c_ _s_e_x_u_a_l_ _h_a_r_a_s_s_ _r_e_m_a_i_n_ _s_i_g_n_i_f_i_c_ _p_r_o_b_l_e_m_ _n_a_t_i_o_n_ _w_o_r_k_f_o_r_c_ _s_a_i_d_ _e_e_o_c_ _c_h_a_i_r_ _j_e_n_n_i_ _y_a_n_g_ _e_e_o_c_ _t_a_k_e_ _s_e_r_i_o_u_s_ _o_b_l_i_g_ _o_b_t_a_i_n_ _r_e_d_r_e_s_s_ _e_m_p_l_o_y_e_ _v_i_c_t_i_m_ _e_g_r_e_g_i_ _p_r_a_c_t_i_c_ _v_e_r_d_i_c_t_ _s_e_r_v_ _r_e_m_i_n_d_ _e_m_p_l_o_y_ _m_u_s_t_ _r_e_m_a_i_n_ _v_i_g_i_l_ _p_r_e_v_e_n_t_ _r_e_m_e_d_i_ _h_a_r_a_s_s_ _w_o_r_k_p_l_a_c_ _i_n_f_o_r_m_ _t_i_t_l_ _f_e_d_e_r_ _e_m_p_l_o_y_ _l_a_w_ _a_v_a_i_l_ _e_m_p_l_o_y_ _l_i_t_i_g_ _s_e_c_t_i_o_n_ _w_e_b_s_i_t_ _c_o_n_t_i_n_u_ _e_n_f_o_r_c_ _t_i_t_l_ _p_r_i_o_r_i_t_i_ _a_d_d_i_t_ _i_n_f_o_r_m_ _a_v_a_i_l_ _w_e_b_s_i_t_ _e_e_o_c_ _e_n_f_o_r_c_ _f_e_d_e_r_ _l_a_w_ _p_r_o_h_i_b_i_t_ _e_m_p_l_o_y_ _d_i_s_c_r_i_m_i_n_ _i_n_f_o_r_m_ _e_e_o_c_ _a_v_a_i_l_ _w_e_b_s_i_t
715,yesterday file lawsuit citi lubbock texa alleg citi polic engag pattern practic employ discrimin hispan women violat titl lawsuit file court northern texa alleg lubbock polic written physic fit examin effect exclud hispan femal applic consider hire entri level polic offic without show test screen candid skill requir share lubbock goal hire qualifi applic perform critic public safeti function said princip deputi general vanita gupta head feder prohibit employ use discriminatori employ practic meaning evalu abil perform given ensur citi elimin unlaw test hope work cooper citi creat select procedur unlaw discrimin lawsuit seek court order requir stop use challeng examin develop select procedur entri level polic offic posit compli titl provid make whole relief includ appropri offer hire back retroact senior qualifi hispan women harm result challeng examin enforc feder employ discrimin law prioriti addit inform titl feder employ law avail websit http lubbock complaint,0.002724,0.994492,0.002784,yesterday file lawsuit citi lubbock texa alleg citi polic engag pattern practic employ discrimin hispan women violat titl lawsuit file court northern texa alleg lubbock polic written physic fit examin effect exclud hispan femal applic consider hire entri level polic offic without show test screen candid skill requir share lubbock goal hire qualifi applic perform critic public safeti function said princip deputi general vanita gupta head feder prohibit employ use discriminatori employ practic meaning evalu abil perform given ensur citi elimin unlaw test hope work cooper citi creat select procedur unlaw discrimin lawsuit seek court order requir stop use challeng examin develop select procedur entri level polic offic posit compli titl provid make whole relief includ appropri offer hire back retroact senior qualifi hispan women harm result challeng examin enforc feder employ discrimin law prioriti addit inform titl feder employ law avail websit http lubbock complaint,0.002724,0.994492,0.002784,yesterday file lawsuit citi lubbock texa alleg citi polic engag pattern practic employ discrimin hispan women violat titl lawsuit file court northern texa alleg lubbock polic written physic fit examin effect exclud hispan femal applic consider hire entri level polic offic without show test screen candid skill requir share lubbock goal hire qualifi applic perform critic public safeti function said princip deputi general vanita gupta head feder prohibit employ use discriminatori employ practic meaning evalu abil perform given ensur citi elimin unlaw test hope work cooper citi creat select procedur unlaw discrimin lawsuit seek court order requir stop use challeng examin develop select procedur entri level polic offic posit compli titl provid make whole relief includ appropri offer hire back retroact senior qualifi hispan women harm result challeng examin enforc feder employ discrimin law prioriti addit inform titl feder employ law avail websit http lubbock complaint,0.002926,...,Civil Rights,Civil Rights Division; Civil Rights - Employment Litigation Section,280,0.084,0.826,0.090,0.5385,yesterday file lawsuit citi lubbock texa alleg citi polic engag pattern practic employ discrimin hispan women violat titl lawsuit file court northern texa alleg lubbock polic written physic fit examin effect exclud hispan femal applic consider hire entri level polic offic without show test screen candid skill requir share lubbock goal hire qualifi applic perform critic public safeti function said princip deputi general vanita gupta head feder prohibit employ use discriminatori employ practic meaning evalu abil perform given ensur citi elimin unlaw test hope work cooper citi creat select procedur unlaw discrimin lawsuit seek court order requir stop use challeng examin develop select procedur entri level polic offic posit compli titl provid make whole relief includ appropri offer hire back retroact senior qualifi hispan women harm result challeng examin enforc feder employ discrimin law prioriti addit inform titl feder employ law avail websit http lubbock complaint,topic_1,y_e_s_t_e_r_d_a_y_ _f_i_l_e_ _l_a_w_s_u_i_t_ _c_i_t_i_ _l_u_b_b_o_c_k_ _t_e_x_a_ _a_l_l_e_g_ _c_i_t_i_ _p_o_l_i_c_ _e_n_g_a_g_ _p_a_t_t_e_r_n_ _p_r_a_c_t_i_c_ _e_m_p_l_o_y_ _d_i_s_c_r_i_m_i_n_ _h_i_s_p_a_n_ _w_o_m_e_n_ _v_i_o_l_a_t_ _t_i_t_l_ _l_a_w_s_u_i_t_ _f_i_l_e_ _c_o_u_r_t_ _n_o_r_t_h_e_r_n_ _t_e_x_a_ _a_l_l_e_g_ _l_u_b_b_o_c_k_ _p_o_l_i_c_ _w_r_i_t_t_e_n_ _p_h_y_s_i_c_ _f_i_t_ _e_x_a_m_i_n_ _e_f_f_e_c_t_ _e_x_c_l_u_d_ _h_i_s_p_a_n_ _f_e_m_a_l_ _a_p_p_l_i_c_ _c_o_n_s_i_d_e_r_ _h_i_r_e_ _e_n_t_r_i_ _l_e_v_e_l_ _p_o_l_i_c_ _o_f_f_i_c_ _w_i_t_h_o_u_t_ _s_h_o_w_ _t_e_s_t_ _s_c_r_e_e_n_ _c_a_n_d_i_d_ _s_k_i_l_l_ _r_e_q_u_i_r_ _s_h_a_r_e_ _l_u_b_b_o_c_k_ _g_o_a_l_ _h_i_r_e_ _q_u_a_l_i_f_i_ _a_p_p_l_i_c_ _p_e_r_f_o_r_m_ _c_r_i_t_i_c_ _p_u_b_l_i_c_ _s_a_f_e_t_i_ _f_u_n_c_t_i_o_n_ _s_a_i_d_ _p_r_i_n_c_i_p_ _d_e_p_u_t_i_ _g_e_n_e_r_a_l_ _v_a_n_i_t_a_ _g_u_p_t_a_ _h_e_a_d_ _f_e_d_e_r_ _p_r_o_h_i_b_i_t_ _e_m_p_l_o_y_ _u_s_e_ _d_i_s_c_r_i_m_i_n_a_t_o_r_i_ _e_m_p_l_o_y_ _p_r_a_c_t_i_c_ _m_e_a_n_i_n_g_ _e_v_a_l_u_ _a_b_i_l_ _p_e_r_f_o_r_m_ _g_i_v_e_n_ _e_n_s_u_r_ _c_i_t_i_ _e_l_i_m_i_n_ _u_n_l_a_w_ _t_e_s_t_ _h_o_p_e_ _w_o_r_k_ _c_o_o_p_e_r_ _c_i_t_i_ _c_r_e_a_t_ _s_e_l_e_c_t_ _p_r_o_c_e_d_u_r_ _u_n_l_a_w_ _d_i_s_c_r_i_m_i_n_ _l_a_w_s_u_i_t_ _s_e_e_k_ _c_o_u_r_t_ _o_r_d_e_r_ _r_e_q_u_i_r_ _s_t_o_p_ _u_s_e_ _c_h_a_l_l_e_n_g_ _e_x_a_m_i_n_ _d_e_v_e_l_o_p_ _s_e_l_e_c_t_ _p_r_o_c_e_d_u_r_ _e_n_t_r_i_ _l_e_v_e_l_ _p_o_l_i_c_ _o_f_f_i_c_ _p_o_s_i_t_ _c_o_m_p_l_i_ _t_i_t_l_ _p_r_o_v_i_d_ _m_a_k_e_ _w_h_o_l_e_ _r_e_l_i_e_f_ _i_n_c_l_u_d_ _a_p_p_r_o_p_r_i_ _o_f_f_e_r_ _h_i_r_e_ _b_a_c_k_ _r_e_t_r_o_a_c_t_ _s_e_n_i_o_r_ _q_u_a_l_i_f_i_ _h_i_s_p_a_n_ _w_o_m_e_n_ _h_a_r_m_ _r_e_s_u_l_t_ _c_h_a_l_l_e_n_g_ _e_x_a_m_i_n_ _e_n_f_o_r_c_ _f_e_d_e_r_ _e_m_p_l_o_y_ _d_i_s_c_r_i_m_i_n_ _l_a_w_ _p_r_i_o_r_i_t_i_ _a_d_d_i_t_ _i_n_f_o_r_m_ _t_i_t_l_ _f_e_d_e_r_ _e_m_p_l_o_y_ _l_a_w_ _a_v_a_i_l_ _w_e_b_s_i_t_ _h_t_t_p_ _l_u_b_b_o_c_k_ _c_o_m_p_l_a_i_n_t


C. Use the create_dtm function and the `processed_text_bigrams` column to create a document-term matrix (`dtm_bigram`) with these bigrams. Keep the following three columns in the data: `id`, `topics_clean`, and `compound` 

D. Print the (1) dimensions of the `dtm` matrix from question 2.2  and (2) the dimensions of the `dtm_bigram` matrix. Comment on why the bigram matrix has more dimensions than the unigram matrix 

E. Find and print the 10 most prevelant bigrams for each of the three topics_clean using the `get_topwords` function from 2.2

In [28]:
# your code here

# 4. Optional extra credit (2 points)

You notice that the pharmaceutical kickbacks press release we analyzed in question 1 was for an indictment, and that in the original data, there's not a clear label for whether a press release outlines an indictment (charging someone with a crime), a conviction (convicting them after that charge either via a settlement or trial), or a sentencing (how many years of prison or supervised release a defendant is sentenced to after their conviction).

You want to see if you can identify pairs of press releases where one press release is from one stage (e.g., indictment) and another is from a different stage (e.g., a sentencing).

You decide that one way to approach is to find the pairwise string similarity between each of the processed press releases in `doj_subset`. There are many ways to do this, so Google for some approaches, focusing on ones that work well for entire documents rather than small strings.

Find the top two pairs (so four press releases total)-- do they seem like different stages of the same crime or just press releases covering similar crimes?

In [29]:
# your code here 