## Applying the Classifier

Let's apply the classifier to the data we tried to manually code.

In [1]:
import pandas as pd

pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 300)

%matplotlib inline

## Reading the data

In [2]:
X_train = pd.read_csv('assaults_downgraded_train.csv', index_col=0)
X_test_with_answers = pd.read_csv('assaults_downgraded_test_with_answers.csv', index_col=0)
X_test = pd.read_csv('assaults_downgraded_test.csv', index_col=0).drop(columns='downgraded').rename(columns={'serious': 'serious_you'})
X_test

| | CCDESC | DO_NARRATIVE | serious_you |
|---|---|---|---|
| 483580 | | DO- S AND V BECAME INVOLV IN AN ARGUMENT S BECAME UPSET AND STRUCK V IN THE FACE WITH A CLOSED FIST FIVE TIMES | |
| 745059 | | DO-VICT AND SUSP INVOLVED IN A VERBAL ARGUMENT SUSP SPIT ONCE IN THE VICTS FACE SUSP FLED ON BICYCLE | |
| 644873 | | DO-SUSP AND VIC WERE INVLD IN A VERBAL ARGUMENT SUSP STRUCK VIC IN HAND WITH UNK OBJECT CAUSING HALF INCH LACERATION TO HIS LEFT THUMB | |
| 394517 | | DO-WHILE VICT WALKING TO SCHOOL SHE WAS APPROACH BY SUSP WHO ALSO IS A STUDENT TOLD VICT COME HERE SUSP PUNCH VICT IN FACE WITH A FIST AND SLAP VICT | |
| 604009 | | DO-SUSP GRABBED VICT BY THE SHIRT AND PUSHED VICT LEAVING VISIBLE INJURY | |
| 223707 | | DO-S ATT TO PUSH V OFF OF HER BIKE | |
| 295037 | | DO-SUSP PUSHED VICT DURING CHILD CUSTODY CHANGE | |
| 216580 | | DO-SUSP STABBED VIC WITH UNK WEAPON MULTIPLE TIMES SUSP FLED IN UNK DIR | |
| 807867 | | DO-VICT AND SUSP GOT INTO VERBAL ARGUMENT SUSP BECAME HEATED AND STRUCK VICT ON THE FACE STOMACH AND ARM | |
| 685433 | | DO-V WAS STRUCK WITH A CLOSED FIST BY HER HUSBAND | |


## Vectorize & Classify

Vectorize and classify in one big cell!

In [3]:
from nltk.stem import SnowballStemmer
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.svm import LinearSVC
from IPython.display import display

# Downloads WordNet data; the SnowballStemmer used below does not need it
nltk.download('omw-1.4')

# Stemmer that reduces words to their root, e.g. "punched" and "punching" -> "punch"
stemmer = SnowballStemmer('english')

# TF-IDF vectorizer that stems every token after the default tokenization
class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super().build_analyzer()
        return lambda doc: (stemmer.stem(word) for word in analyzer(doc))

# Learn the vocabulary from the training set: keep stems that appear in
# at least 15 narratives but in no more than half of them
vectorizer = StemmedTfidfVectorizer(min_df=15, max_df=0.5)
X = vectorizer.fit_transform(X_train.DO_NARRATIVE)

# Train a linear SVM on the serious (1) / not serious (0) labels
y = X_train.serious
clf = LinearSVC()
clf.fit(X, y)

# 10-fold cross-validation; cross_validate fits a fresh copy of clf on each fold
# Other scoring options:
# https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
scores = cross_validate(clf, X, y, cv=10,
                        scoring=('accuracy', 'precision', 'recall', 'f1'))

# Per-fold scores, then the average across all ten folds
scores_df = pd.DataFrame(scores)
display(scores_df.round(2))
scores_df[['fit_time', 'score_time', 'test_accuracy',
           'test_precision', 'test_recall', 'test_f1']].mean().round(2)

[nltk_data] Downloading package omw-1.4 to /Users/mehtad/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


| | fit_time | score_time | test_accuracy | test_precision | test_recall | test_f1 |
|---|---|---|---|---|---|---|
| 0 | 0.99 | 0.01 | 0.87 | 0.78 | 0.67 | 0.72 |
| 1 | 1.01 | 0.01 | 0.87 | 0.77 | 0.67 | 0.72 |
| 2 | 0.99 | 0.01 | 0.87 | 0.77 | 0.67 | 0.72 |
| 3 | 0.98 | 0.01 | 0.87 | 0.77 | 0.66 | 0.71 |
| 4 | 1.0 | 0.01 | 0.87 | 0.78 | 0.67 | 0.72 |
| 5 | 1.01 | 0.01 | 0.87 | 0.77 | 0.67 | 0.72 |
| 6 | 1.16 | 0.01 | 0.87 | 0.77 | 0.67 | 0.72 |
| 7 | 1.03 | 0.01 | 0.87 | 0.78 | 0.65 | 0.71 |
| 8 | 1.0 | 0.01 | 0.87 | 0.76 | 0.66 | 0.71 |
| 9 | 1.11 | 0.01 | 0.87 | 0.77 | 0.66 | 0.71 |


fit_time          1.03
score_time        0.01
test_accuracy     0.87
test_precision    0.77
test_recall       0.66
test_f1           0.71
dtype: float64
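The cross-validation above fits the vectorizer on the entire training set before splitting it into folds, so each held-out fold has already shaped the vocabulary and IDF weights. A minimal sketch of one way to avoid that, assuming scikit-learn's `make_pipeline` (this is not what the original cell does): put the vectorizer and the classifier in one pipeline so `cross_validate` refits both on every fold. The corpus below is made up so the cell runs on its own, and it uses the plain `TfidfVectorizer` with default `min_df`/`max_df` because a 24-document toy corpus is far too small for `min_df=15`. On the real data you would pass `X_train.DO_NARRATIVE` and `X_train.serious`, and could swap in `StemmedTfidfVectorizer(min_df=15, max_df=0.5)`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy narratives (NOT from the dataset), just so the sketch runs standalone
texts = [
    "susp stabbed vict with knife",
    "susp struck vict with bottle",
    "susp shot at vict with handgun",
    "susp hit vict with bat causing injury",
    "susp pushed vict",
    "susp slapped vict",
    "susp spit on vict",
    "susp grabbed vict by shirt",
] * 3
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 3

# The vectorizer lives inside the pipeline, so each fold refits it on training rows only
pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_validate(pipeline, texts, labels, cv=3,
                        scoring=('accuracy', 'precision', 'recall', 'f1'))

# Average each metric across the three folds
for name in ['test_accuracy', 'test_precision', 'test_recall', 'test_f1']:
    print(name, scores[name].mean().round(2))
```

Refitting the vectorizer per fold costs extra time, but the resulting scores are a somewhat fairer estimate of how the model handles narratives it has never seen.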

## Predictions on the FULL dataset (not just the testing set)


In [4]:
df = pd.read_csv('assault.csv')
vectors = vectorizer.transform(df.DO_NARRATIVE)
df['prediction'] = clf.predict(vectors)
df['prediction_score'] = clf.decision_function(vectors)
df

| | Unnamed: 0 | CCDESC | DO_NARRATIVE | serious | downgraded | prediction | prediction_score |
|---|---|---|---|---|---|---|---|
| 0 | 2 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-S APPRCHED V AND STATED ARE YOU GOING TO FCK ME V REPLIED NO SUSP PULL ED OUT A KNIFE AND STATED IM HERE TO HURT YOU BTCH S USED PROFANITIES | 0 | 1 | 0 | -0.037173 |
| 1 | 4 | BATTERY - SIMPLE ASSAULT | DO-SUSP USED RIGHT FIST TO PUNCH VICT IN THE HEAD ONCE N PULL VICT HAIR FOR APPRX 15 SECONDS | 0 | 0 | 0 | -0.881494 |
| 2 | 9 | BATTERY - SIMPLE ASSAULT | DO-S APPROACHED V IN VEH S SLAPPED AND LUNGGED AT V | 0 | 0 | 0 | -0.119902 |
| 3 | 11 | BATTERY - SIMPLE ASSAULT | DO-V STATED THAT SUSP CONFRT HER WHEN SHE TRIED TO APPR HER HUSBAND SUSP AND V HUSBAND ARE FRNDS SUSP YELLED STAY AWAY FROM HIM AND PUSHED V | 0 | 0 | 0 | -1.078737 |
| 4 | 16 | BATTERY - SIMPLE ASSAULT | DO-SUSPS WERE VERBALLY ABUSING VICT DURING WHICH TIME S1 STRUCK VICT THREETIMES ON THE BACK OF HIS LEFT SHOULDER | 0 | 0 | 0 | -0.953604 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 165960 | 830201 | BATTERY - SIMPLE ASSAULT | DO-SUSP WAS UPSET VICT WAS LEAVING HER SUSP STRUCK VICT IN THE JAW CAUSINGA VISIBLE INJURY | 0 | 0 | 0 | -0.866918 |
| 165961 | 830206 | BATTERY - SIMPLE ASSAULT | DO- S1 ARRIVES AT VICTS HOUSE AND POINTS HANDGUN AT V1 AND V3 STRIKES V2 WITH HANDS | 0 | 0 | 1 | 0.350405 |
| 165962 | 830207 | INTIMATE PARTNER - AGGRAVATED ASSAULT | DO-SUSP APPROACHED VICT FROM BEHIND AND SCRATCHED VICT ON TOP OF HEAD CAUSING INJURY | 0 | 1 | 0 | -0.627456 |
| 165963 | 830208 | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | DO-SUSP AND VICT INVOLVED IN A VERBAL DISPUTE SUSP BECAME ENRAGED AND STABBED VICT WITH A GLASS PIPE | 1 | 0 | 1 | 1.233102 |
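For a binary `LinearSVC`, the `prediction` and `prediction_score` columns are tied together: `predict` returns 1 exactly when `decision_function` is above 0, and scores further from 0 are, roughly speaking, more confident. A small check on made-up two-feature points (not the assault data; `X_toy`, `clf_toy` and the other names are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up 2-feature points in two well-separated classes (0 and 1)
X_toy = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                  [1.0, 0.9], [0.9, 1.2], [1.1, 1.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf_toy = LinearSVC().fit(X_toy, y_toy)
scores_toy = clf_toy.decision_function(X_toy)
preds_toy = clf_toy.predict(X_toy)

# Each row: signed score next to the predicted class
print(np.column_stack([scores_toy.round(3), preds_toy]))

# Positive score -> class 1, otherwise class 0
print((preds_toy == (scores_toy > 0).astype(int)).all())
```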


# Let's evaluate our classifier

When you build a classifier, you need an **evaluation metric**: a way to judge how well your algorithm performed. A common choice is **accuracy**, the share of predictions that were correct.
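Accuracy is simply the fraction of rows where the prediction equals the label, which is what the `value_counts(normalize=True)` calls in this section compute. A tiny illustration with made-up labels, done by hand and with scikit-learn's `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels: 1 = serious, 0 = not serious
y_true = np.array([1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0])

# Fraction of positions where prediction and label agree (3 of 5 here)
manual = (y_pred == y_true).mean()
print(manual, accuracy_score(y_true, y_pred))
```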

## How often did our prediction match whether a crime was listed as serious?

In [5]:
(df.prediction == df.serious).value_counts(normalize=True)

True     0.879143
False    0.120857
dtype: float64

88% doesn't seem that bad!

Remember, though, that **15% of the serious crimes have been downgraded**. For a downgraded crime, the `serious` column says 0 even though the crime really was serious, so matching that label **is the wrong goal for downgraded crimes**. Instead, we need to check our predictions against the true label: serious *or* downgraded.

## How often did we match the true serious/not serious value?

Since we're interested in uncovering the secretly serious reports, we'll count a report as truly serious if it's marked serious *or* it was downgraded.

In [6]:
(df.prediction == (df.serious | df.downgraded)).value_counts(normalize=True)

True     0.891465
False    0.108535
dtype: float64

We actually did better when including the secrets! 89%!

While this seems good, **it isn't what we're actually after.** We're specifically doing research on **finding downgraded reports,** so what we're interested in is **how often we found reports marked as non-serious that were downgraded from serious**.
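The 11% of reports we got wrong fall into two kinds of error: truly serious crimes we called non-serious, and non-serious crimes we called serious. scikit-learn's `confusion_matrix` counts both. The sketch below uses a small made-up frame named `toy` (an illustrative name, not real results) with the same `serious`, `downgraded` and `prediction` columns as `df`; on the real data you would pass the `df` columns instead.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up rows with the same columns as df (NOT real results)
toy = pd.DataFrame({
    'serious':    [1, 1, 0, 0, 0, 0, 0, 0],
    'downgraded': [0, 0, 1, 1, 0, 0, 0, 0],
    'prediction': [1, 0, 1, 0, 0, 1, 0, 0],
})

# True label: marked serious OR secretly downgraded from serious
truly_serious = toy.serious | toy.downgraded

# Rows are the actual label (0, 1), columns are the predicted label (0, 1)
cm = confusion_matrix(truly_serious, toy.prediction, labels=[0, 1])
print(pd.DataFrame(cm, index=['actual 0', 'actual 1'], columns=['pred 0', 'pred 1']))
```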

## How often did we catch downgrades?

To figure this out, we'll first make sure we're only looking at downgraded reports, and then see how many of them we predicted as being serious assault.

In [7]:
# Only select downgraded reports
downgraded_df = df[df.downgraded == 1]

# How often did we predict they were serious?
(downgraded_df.prediction == 1).value_counts(normalize=True)

True     0.64436
False    0.35564
Name: prediction, dtype: float64

In [8]:
# And again, without the percentage, in case you're curious
(downgraded_df.prediction == 1).value_counts()

True     4564
False    2519
Name: prediction, dtype: int64

We found 4,564 of the 7,083 downgraded offenses. **That's about 64% of them.**
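This "caught downgrades" number is the classifier's **recall** when downgraded reports are treated as the positive class: of all downgraded reports, what share did we predict as serious? Because of that, `recall_score(df.downgraded, df.prediction)` should give the same figure as the cells above. A sketch on a made-up frame named `toy` (illustrative, not real results):

```python
import pandas as pd
from sklearn.metrics import recall_score

# Made-up rows (NOT real results): four downgraded reports, three predicted serious
toy = pd.DataFrame({
    'downgraded': [1, 1, 1, 1, 0, 0],
    'prediction': [1, 1, 0, 1, 1, 0],
})

# Same calculation as the cells above: share of downgraded rows predicted serious
by_hand = (toy[toy.downgraded == 1].prediction == 1).mean()

# recall = true positives / (true positives + false negatives)
print(by_hand, recall_score(toy.downgraded, toy.prediction))
```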

Let's finish up for now and discuss what we think about our techniques and scoring methods. If you're interested in picking apart the ones we got wrong and investigating the algorithm a little further, I recommend the **Inspecting misclassifications** notebook. 

## Review

We reproduced an ersatz version of a Los Angeles Times piece where they uncovered **serious assaults that had been downgraded by the LAPD** to simple assault. We don't have access to the original classifications, so we used a dataset of assaults between 2008 and 2012 and downgraded a random 15% of the serious assaults.

Using **text analysis**, we first converted each assault description into numbers with TF-IDF: words were stemmed to their roots, less common words were given more weight, and words appearing in more than half of the descriptions (or in fewer than 15 of them) were left out altogether. Using these results, we then trained a **classifier**, teaching it which words were associated with simple assault compared to aggravated assault.

Finally, we used the classifier to **predict whether each assault was aggravated or simple assault**. If a crime was predicted as serious but marked as non-serious, we flagged it for review as a possible downgrade. Our algorithm correctly flagged around **64%** of the randomly downgraded crimes.
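Turning those flags into a review list is a filter plus a sort: keep reports marked non-serious but predicted serious, and put the highest `prediction_score` values first, since those are roughly the ones the model is most confident about. A sketch on a made-up frame named `toy` shaped like `df` (illustrative, not real results):

```python
import pandas as pd

# Made-up rows shaped like df (NOT real results)
toy = pd.DataFrame({
    'DO_NARRATIVE': ['susp stabbed vict', 'susp pushed vict', 'susp shot at vict',
                     'susp slapped vict', 'susp hit vict with pipe'],
    'serious': [1, 0, 0, 0, 0],
    'prediction': [1, 0, 1, 0, 1],
    'prediction_score': [1.20, -0.90, 0.85, -0.40, 0.30],
})

# Marked non-serious but predicted serious: candidates for a possible downgrade
candidates = toy[(toy.serious == 0) & (toy.prediction == 1)]

# Highest scores first, so reviewers start with the strongest leads
review_list = candidates.sort_values('prediction_score', ascending=False)
print(review_list[['DO_NARRATIVE', 'prediction_score']])
```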

## Discussion topics

* Our algorithm had 88% accuracy overall, but caught only 64% of the downgraded crimes. What's the difference between these two numbers? How important is one score compared to the other?
* We found only around 64% of the downgraded crimes. Is this a useful score? How does it compare to random guessing, or to going one by one through the crimes marked as non-serious?
* What techniques could we have used to find downgraded crimes if we didn't use machine learning?
* Is there a difference between looking at the prediction (the 0 or 1) and looking at the output of `decision_function`?
* What happens if our algorithm errs on the side of calling non-serious crimes serious crimes? What if it errs on the side of calling serious crimes non-serious crimes?
* If we want to find more downgraded cases (but do more work), we'll want to err on the side of examining more potentially-serious cases. Is there a better method than picking random cases?
* One of our first steps was to eliminate all crimes that weren't assaults. How do you think this helped or hindered our analysis?
* Why did we use LinearSVC instead of another classifier such as LogisticRegression, RandomForest or Naive Bayes (MultinomialNB)? Why might we try or not try those?
* You don't work for the LAPD, so you can only be so sure what should and shouldn't be a serious crime. What can you do to help feel confident that a case should be one or the other, or that our algorithm is working as promised?
* In this case, we randomly picked serious crimes to downgrade. Would it be easier or more difficult if the LAPD were systematically downgrading certain types of serious crimes? Can you think of a way to get around that sort of trickery?
* Many people say you need to release your data and analysis in order to have people trust what you've done. With something like this dataset, however, you're dealing with real things that happened to real people, many of whom would probably prefer to keep these things private. Is that a reasonable expectation? If it is, what can be done to bridge the gap between releasing all of the original data and keeping our process secret?
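Several of these questions (the 0/1 prediction versus `decision_function`, and finding more downgrades at the cost of more work) can be explored by moving the cutoff on the score instead of using the default of 0. A minimal sketch on made-up scores, assuming you have already filtered to reports marked non-serious; `toy`, `threshold` and `results` are illustrative names, not real results. Each lower threshold flags more reports to read and catches more of the downgraded ones.

```python
import pandas as pd

# Made-up non-serious reports with classifier scores (NOT real results)
toy = pd.DataFrame({
    'downgraded':       [1, 1, 0, 1, 0, 0, 1, 0],
    'prediction_score': [1.1, 0.4, 0.2, -0.1, -0.3, -0.6, -0.8, -1.2],
})

results = []
for threshold in [0.5, 0.0, -0.5, -1.0]:
    flagged = toy.prediction_score > threshold
    # How many reports someone would have to read at this cutoff
    workload = int(flagged.sum())
    # Share of downgraded reports that end up flagged
    caught = flagged[toy.downgraded == 1].mean()
    results.append((threshold, workload, caught))
    print(f"threshold {threshold:+.1f}: review {workload} reports, catch {caught:.0%} of downgrades")
```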