# Narrative Analysis
In the Williamstown police department data, there are frequently "Narrative" fields associated with each incident. We have several overarching questions regarding these narratives, which include:
* Is there evidence of the "boys will be boys" culture instilled in the department by the previous police chief in the descriptions of incidents and interactions with people of varying demographic backgrounds and incident types? (e.g. do the narratives about noise complaints for parties differ from narratives about the police being called on a young woman?)
* Is there evidence that third shift is being more problematic (more tickets, etc) than other shifts? (This question is prompted by the recent revelation that the cop with the Hitler photo in his locker had been leading third shift cops in illegal searches of racial justice activists in the town)
* How much time in each shift is, on average, occupied by responding to calls? (This question is aimed at either supporting or disproving the hypothesis that there are too many cops in the department, and that potentially something like third shift could be handled by one on-call individual)
* How often and for what types of incidents does Williamstown PD respond to incidents at Williams College? (The PD listens to the Williams College dispatch and responds if the incidents are sever enough/may affect the town itself)

In this notebook I'll be relying on [spaCy](https://spacy.io/) to manipulate text. spaCy is a natural language processing (NLP) library that provides functionality for a variety of tasks.

## Sentiment analysis
In this section, we'll explore the use of [sentiment analysis](https://monkeylearn.com/sentiment-analysis/) to see if there is any difference in percieved sentiment across police narratives for different calls. Ideas for this section include:
* Associating incident type with narartive sentiment 
* Associating citizen and cop dempgraphic data with narrative sentiment 
* Associating specific officers with the average narrative sentiment of thier calls 

### Polarity analysis
Polarity analysis examines whether a text is positive, negative, or neutral in sentiment
#### spaCy TextBlob
[spaCy TextBlob](https://spacy.io/universe/project/spacy-textblob) is a spaCy pipeline component that connects spaCy's functionality to the [TextBlob](https://github.com/sloria/TextBlob) module. In addition to polarity analysis, it also does subjectivity analysis, which is a score representing how subjective a piece of text is.

In [3]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

<spacytextblob.spacytextblob.SpacyTextBlob at 0x7fce29930910>

In this notebook I'll be manually pasting examples of narratives; but this will have to be scaled up once the OCR pipeline is done.

In [6]:
# Call 20-4
text = ("Activated Burglar Alarm. Key Holder Notified. Advised Dispatch that they have been having trouble with one of the zones. "
        "31 advised no need for key holder all is ok."
        "Checked building. Front and rear secure. All appears in order. Key holder was on phone with dispatch and satisfied.")
doc = nlp(text)

doc._.polarity, doc._.subjectivity

(0.11666666666666665, 0.8000000000000002)

From the [TextBlob docs](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis): "The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective."

In [7]:
# Call 20-6
text = ("Active Structure fire. 1 party still inside at this time. "
        "Car 1 requesting to see if an officer can check the house aroud 0600 to make sure there are no signs of the fire reigniting."
        "House checked")
doc = nlp(text)

doc._.polarity, doc._.subjectivity

(0.18333333333333335, 0.7444444444444445)

#### Using Dirk's parser
Dirk wrote a notebook with a parser for the PDF-to-text documen tthat Elijah provided, which is a rudimentary source of all the data. Copying and pasting from the `ParseOCRed_text.ipynb` notebook:

In [8]:
import re

In [9]:
filename = '../generated_data/Logs2020OCR_avepdf.com_horizontal.txt'

In [10]:
lines = []
with open(filename) as f:
    lines = f.readlines()
    
for ind in range(len(lines)):
    lines[ind] = lines[ind].lstrip().rstrip()

In [11]:
calls = []

call = []
for line in lines:
    if line == '':
        if len(call) > 0:
            calls.append(call)
            call=[]
    else:
        call.append(line)

In [12]:
len(calls)

18070

#### Getting narrative sentiment for different officers
Here, I'm going to try to correlate narrative sentiment to the Call Taker (officer) responding to the incident. I'm going to, for each call, get the call number, the call taker, and the narrative in a df, and then do a groupby on the sentiment and subjectivity for each officer,

In [14]:
import pandas as pd

In [15]:
# Define a regex to find valid call numbers
call_num_regex = '\d+-\d+'

# Go through calls and keep only those with a valid call number 
valid_calls = []
for call in calls:
    match = re.match(call_num_regex, call[0])
    if match:
         valid_calls.append(call)

In [16]:
len(valid_calls)

7062

In [17]:
valid_calls[:10]

 ['20-2                          0128         Phone  - B.O.L.O.                                                                   SERVICES RENDERED',
  'Call    Taker:            MICHAEL STRIZZI',
  'Location/Address:                 NORTH HOOSAC RD',
  'Unit:            31',
  'Disp-01:29:45                                          Arvd-01:31:40           Clrd-01:37:57',
  'Unit:            34',
  'Disp-01:29:45          Enrt-01:30:45          Arvd-01:31:54           Clrd-01:38:02',
  'Unit:             37K',
  'Disp-01:29:45                                                                           Clrd-01:30:53',
  'Narrative:',
  'Erratic       opp  from    North    Adams.  Maroon    vehicle      VT  plates.',
  'Traveling       west    on  Mass  Ave.    Possibly      no   rear    bumper.'],
 ['20-3                          0201         Phone   - B.O.L.O.                                                                  UNABLE TO LOCATE',
  'Call     Taker:           MICHAEL  STRIZZI

In [18]:
from collections import defaultdict

In [44]:
# Make a dict for the pandas df
# Keys are 'call_num', 'call_taker', 'narrative'

call_num_regex = '\d+-\d+'
call_taker_regex = re.compile('Call\s+Taker:')

call_dict = defaultdict(list)
for call in valid_calls:
    if 'Narrative:' in call:
        
        # Get call number
        call_num_match = re.match(call_num_regex, call[0])
        call_num = call[0][:call_num_match.end()]
        
        # Get the call taker
        call_taker_list = list(filter(call_taker_regex.match, call)) # Looks for the element of the call that contains the call taker
        if len(call_taker_list) > 0:
            call_taker_match = re.match(call_taker_regex, call_taker_list[0]) 
            call_taker = call[1][call_taker_match.end():].strip() # Get rid of leading and trailing 
            call_taker = ' '.join(call_taker.split()) # Make sure all spaces are single whitespaces so that the same officers' names are always identical
        else: call_taker = None
        
        # Get narrative
        narr_idx = call.index('Narrative:')
        if narr_idx != len(call)-1:
            narrative = ' '.join(call[narr_idx+1:])
            narrative = ' '.join(narrative.split())
        else: narrative = None
            
        # Assign all to the dict
        if (call_taker is not None) and (narrative is not None): 
            call_dict["call_num"].append(call_num)
            call_dict["call_taker"].append(call_taker)
            call_dict["narrative"].append(narrative)

In [45]:
# Make pandas df 
call_df = pd.DataFrame(call_dict)

In [46]:
call_df

Unnamed: 0,call_num,call_taker,narrative
0,20-2,MICHAEL STRIZZI,Erratic opp from North Adams. Maroon vehicle V...
1,20-3,MICHAEL STRIZZI,NAPD reports light colored SUV hit a light pos...
2,20-6,MICHAEL STRIZZI,Active Structure fire. l party still inside at...
3,20-14,SERGEANT SCOTT E MCGOWAN,"One MV, speed: 21"
4,20-17,"PATROL DAVID JENNINGS, D",Garage interior door.
...,...,...,...
2337,20-9079,DISPATCHER WILLIAM C JENNINGS JR,RP reporting that the street sweeper from the ...
2338,20-9081,PATROL SHUAN N WILLIAM,done
2339,20-9084,PATROL KALVIN DZIEDZIAK,Checked Area.
2340,20-9089,SERGEANT PAUL D THOMPSON,Vehicle broke a fuel injector while traveling ...


In [49]:
# Define functions to apply to df columns
def get_polarity(text):
    """
    parameters:
        text, str: text to get sentiment of 
        
    return:
        polarity, float: value representing polarity of the text
    """
    doc = nlp(text) # Relies on jupyter allowing access to this from outside function
    
    return doc._.polarity
    
    
def get_subjectivity(text):
    """
    parameters:
        text, str: text to get sentiment of 
        
    return:
        subjectivity, float: value representing subjectivity of the text
    """
    doc = nlp(text)
    
    return doc._.subjectivity

In [50]:
# Get narrative sentiment and subjectivity 
call_df['polarity'] = call_df['narrative'].apply(get_polarity)
call_df['subjectivity'] = call_df['narrative'].apply(get_subjectivity)

In [51]:
call_df.head()

Unnamed: 0,call_num,call_taker,narrative,polarity,subjectivity
0,20-2,MICHAEL STRIZZI,Erratic opp from North Adams. Maroon vehicle V...,0.0,1.0
1,20-3,MICHAEL STRIZZI,NAPD reports light colored SUV hit a light pos...,0.4,0.7
2,20-6,MICHAEL STRIZZI,Active Structure fire. l party still inside at...,-0.133333,0.6
3,20-14,SERGEANT SCOTT E MCGOWAN,"One MV, speed: 21",0.0,0.0
4,20-17,"PATROL DAVID JENNINGS, D",Garage interior door.,0.0,0.0


Now that we've got the data, let's get some stats and visualize!

In [60]:
# Look at the overall polarity and subjectivity
call_df.polarity.mean()

-0.01080010423294552

In [61]:
call_df.polarity.median()

0.0

In [58]:
call_df.subjectivity.mean()

0.1927621841222651

In [62]:
call_df.subjectivity.median()

0.05

In [63]:
call_df.groupby('call_taker')['polarity'].mean()

call_taker
                                   -0.005136
1                                  -0.200000
3                                  -0.155556
ALL EQUIPMENT POLICE DEPARTMENT    -0.016376
BARB BRUCATO                       -0.019935
CHIEF KYLE J JOHNSON               -0.010173
DISPATCHER CHRISTINE LEMOINE       -0.030182
DISPATCHER LAURIE TIJPER           -0.022222
DISPATCHER LAURIE TOPER             0.009117
DISPATCHER LAURIE TUPER            -0.020521
DISPATCHER LAURIE WPER              0.000000
DISPATCHER WILLIAM C JENNINGS JR    0.006676
LIEUTENANI MICHAEL J ZIEMBA Jr      0.000000
LIEUTENANT MICHAEL J ZIEMBA Jr     -0.021285
MICHAEL STRIZZI                     0.014355
PATROL ANTHONY M DUPRAT            -0.048385
PATROL BRAD SACCO                  -0.027417
PATROL CRAIG A EIC!ll!AMMER         0.000000
PATROL CRAIG A EICHHAMMER          -0.026562
PATROL DAVID JENNINGS, D           -0.015972
PATROL JOHN J MCCONNELL JR         -0.018452
PATROL KALVIN DZIEDZIAK            -0.026829

In [55]:
call_df.groupby('call_taker')['polarity'].median()

call_taker
                                    0.000000
1                                  -0.200000
3                                  -0.155556
ALL EQUIPMENT POLICE DEPARTMENT     0.000000
BARB BRUCATO                        0.000000
CHIEF KYLE J JOHNSON                0.000000
DISPATCHER CHRISTINE LEMOINE        0.000000
DISPATCHER LAURIE TIJPER            0.000000
DISPATCHER LAURIE TOPER             0.000000
DISPATCHER LAURIE TUPER             0.000000
DISPATCHER LAURIE WPER              0.000000
DISPATCHER WILLIAM C JENNINGS JR    0.000000
LIEUTENANI MICHAEL J ZIEMBA Jr      0.000000
LIEUTENANT MICHAEL J ZIEMBA Jr      0.000000
MICHAEL STRIZZI                     0.000000
PATROL ANTHONY M DUPRAT             0.000000
PATROL BRAD SACCO                   0.000000
PATROL CRAIG A EIC!ll!AMMER         0.000000
PATROL CRAIG A EICHHAMMER           0.000000
PATROL DAVID JENNINGS, D            0.000000
PATROL JOHN J MCCONNELL JR          0.000000
PATROL KALVIN DZIEDZIAK             0.000000

In [64]:
call_df.groupby('call_taker')['subjectivity'].mean()

call_taker
                                    0.224305
1                                   0.050000
3                                   0.288889
ALL EQUIPMENT POLICE DEPARTMENT     0.257065
BARB BRUCATO                        0.252261
CHIEF KYLE J JOHNSON                0.096726
DISPATCHER CHRISTINE LEMOINE        0.208512
DISPATCHER LAURIE TIJPER            0.094444
DISPATCHER LAURIE TOPER             0.214231
DISPATCHER LAURIE TUPER             0.235161
DISPATCHER LAURIE WPER              0.000000
DISPATCHER WILLIAM C JENNINGS JR    0.267363
LIEUTENANI MICHAEL J ZIEMBA Jr      0.000000
LIEUTENANT MICHAEL J ZIEMBA Jr      0.206981
MICHAEL STRIZZI                     0.273514
PATROL ANTHONY M DUPRAT             0.195478
PATROL BRAD SACCO                   0.133840
PATROL CRAIG A EIC!ll!AMMER         0.000000
PATROL CRAIG A EICHHAMMER           0.134375
PATROL DAVID JENNINGS, D            0.199828
PATROL JOHN J MCCONNELL JR          0.124062
PATROL KALVIN DZIEDZIAK             0.243451

In [56]:
call_df.groupby('call_taker')['subjectivity'].median()

call_taker
                                    0.173333
1                                   0.050000
3                                   0.288889
ALL EQUIPMENT POLICE DEPARTMENT     0.269231
BARB BRUCATO                        0.241667
CHIEF KYLE J JOHNSON                0.000000
DISPATCHER CHRISTINE LEMOINE        0.062500
DISPATCHER LAURIE TIJPER            0.000000
DISPATCHER LAURIE TOPER             0.100000
DISPATCHER LAURIE TUPER             0.153333
DISPATCHER LAURIE WPER              0.000000
DISPATCHER WILLIAM C JENNINGS JR    0.233333
LIEUTENANI MICHAEL J ZIEMBA Jr      0.000000
LIEUTENANT MICHAEL J ZIEMBA Jr      0.125000
MICHAEL STRIZZI                     0.219048
PATROL ANTHONY M DUPRAT             0.000000
PATROL BRAD SACCO                   0.000000
PATROL CRAIG A EIC!ll!AMMER         0.000000
PATROL CRAIG A EICHHAMMER           0.000000
PATROL DAVID JENNINGS, D            0.066667
PATROL JOHN J MCCONNELL JR          0.000000
PATROL KALVIN DZIEDZIAK             0.150000