# Information extraction from Madison city crime incident reports using Deep Learning


# Introduction

This notebook investigates creating a new named entity called `“Crime”` and generating police reports, which are not readily available, to train on.

A police report writing handbook is used to prompt a large language model (LLM) to generate police reports in which the “Crime” entity appears in its proper domain context.

The generated reports are then used to retrain a NER model to identify the “Crime” entity within legal documents. This should be valuable for sorting, categorizing, and exploring the legal documents of future investigations used in court cases.


### Reading the Report Writing Manual
Link: https://www.csus.edu/campus-safety/police-department/_internal/_documents/rwm.pdf

# Necessary Imports

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%cd /content/drive/MyDrive/Colab\ Notebooks/spacy_local

/content/drive/MyDrive/Colab Notebooks/spacy_local


In [3]:
%ls -ltr

total 2958
-rw------- 1 root root   1678 Jun 18 20:35 base_config.cfg
-rw------- 1 root root 508947 Jul 10 04:23 spacy_prompt_training_v1_2.json
-rw------- 1 root root 269427 Jul 11 00:51 training_data_crime.spacy
-rw------- 1 root root 505381 Sep  5 01:27 spacy_prompt_training_v1_3.json
drwx------ 2 root root   4096 Sep  5 15:35 model-crime/
-rw------- 1 root root 271093 Sep  5 15:37 training_data_crime_v1_3.spacy
-rw------- 1 root root   2817 Sep  5 15:38 config.cfg
drwx------ 2 root root   4096 Sep  5 15:40 model-last/
drwx------ 2 root root   4096 Sep  5 15:40 model-best/
-rw------- 1 root root 503862 Sep  7 06:42 spacy_prompt_training_v1_4.json
-rw------- 1 root root 265296 Sep  7 06:46 training_data_crime_v1_4.spacy
-rw------- 1 root root 684284 Sep  7 06:48 prompt_engineering.ipynb


In [4]:
import pandas as pd
import zipfile,unicodedata
# from itertools import repeat
from pathlib import Path
import re
import os
import datetime

# Data Rule Exploration

We found the most success when we started with the verb that describes the crime, made sure its format was correct, and then built context around that word with a sentence on each side.

Creating one report that fit all the requirements was overwhelming. The goal is to provide the kind of context found in a police report, even if it is only a small subset, because that gets us to the NER step faster.

The words are not perfect, but they get us started by providing context.

In [1]:
verbs_by_crime = {
    "Violent_Crimes": [
        'Murdered', 'Assaulted', 'Raped', 'Robbed', 'Kidnapped',
        'Beat', 'Stabbed', 'Shot', 'Strangled', 'Abused',
        'Threatened', 'Attacked', 'Mugged', 'Harmed', 'Injured',
        'Harassed', 'Stalked', 'Menaced', 'Abducted', 'Kidnapped',
        'Slashed', 'Choked'
    ],
    "Property_Crimes": [
        'Mugged', 'Robbed', 'Burglarized', 'Stolen', 'Vandalized',
        'Trespassed', 'Arsoned','Damaged', 'Graffitied', 'Embezzled',
        'Shoplifted', 'Sabotaged','Looted', 'Counterfeited', 'Pocketed',
        'Stripped', 'Intruded'
    ],
    "Financial_Crimes": [
        'Defrauded', 'Laundered', 'Scammed', 'Swindled', 'Falsified',
        'Concealed', 'Misappropriated', 'Extorted', 'Manipulated', 'Bribed',
        'Racketeered', 'Evaded', 'Cheated', 'Forged', 'Misused'
    ],
    "Cybercrimes": [
        'Hacked', 'Phished', 'Stole data', 'Cyberbullied', 'Spoofed',
        'Breached', 'Doxxed', 'Distributed malware', 'Hijacked accounts', 'Ransomware attack',
        'Cyberstalking', 'Distributed denial-of-service (DDoS) attack', 'Identity theft',
        'Cyber fraud', 'Password cracking'
    ],
    "Drug_Crimes": [
        'Possessed', 'Trafficked', 'Manufactured', 'Smuggled', 'Sold',
        'Cultivated', 'Distributed', 'Synthesized', 'Abused', 'Peddled',
        'Dealt', 'Transported', 'Prescribed illegally', 'Falsified prescriptions', 'Produced illicit substances'
    ],
    "White-Collar_Crimes": [
        'Committed fraud', 'Engaged in insider trading', 'Embezzled funds',
        'Bribed officials', 'Manipulated financial records', 'Conducted money laundering',
        'Engaged in corporate espionage', 'Defrauded investors', 'Orchestrated Ponzi schemes',
        'Engaged in tax evasion', 'Falsified documents', 'Misused funds',
        'Misrepresented financial statements', 'Engaged in kickbacks', 'Violated antitrust laws'
    ],
    "Sexual_Crimes": [
        'Raped', 'Assaulted sexually', 'Harassed', 'Molested', 'Exploited',
        'Fondled', 'Groped', 'Exhibited indecent exposure', 'Engaged in non-consensual acts', 'Coerced',
        'Victimized', 'Voyeurism', 'Produced child pornography', 'Solicited prostitution', 'Sextortion'
    ],
    "Public_Order_Crimes": [
        'Intoxicated publicly', 'Disturbed the peace', 'Engaged in disorderly conduct',
        'Loitered', 'Urinated in public', 'Engaged in public drunkenness', 'Panhandled', 'Engaged in public nudity',
        'Engaged in public lewdness', 'Engaged in street racing', 'Created public disturbances', 'Organized illegal gatherings',
        'Begged', 'Engaged in public gambling', 'Solicited in public'
    ],
    "Traffic_and_Motor_Vehicle_Crimes": [
        'Speeded', 'Drove under the influence (DUI)', 'Reckless driving', 'Hit and run', 'Drove without a license',
        'Texted while driving', 'Street racing', 'Violated traffic laws', 'Failed to yield', 'Ran red lights',
        'Distracted driving', 'Improper lane change', 'Driving with expired registration', 'Driving with suspended license',
        'Carjacked'
    ]
}
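
Because the lists are curated by hand, a quick check for repeated verbs can help before any prompts are generated. The following is a minimal sketch; `find_duplicate_verbs` and the sample dictionary are illustrative names and data, not part of the original notebook.

```python
from collections import Counter, defaultdict


def find_duplicate_verbs(verbs_by_crime):
    """Return (within, across): verbs repeated inside a category, and verbs shared by categories."""
    within = {}
    categories_by_verb = defaultdict(set)
    for category, verbs in verbs_by_crime.items():
        # Compare case-insensitively so 'Robbed' and 'robbed' count as the same verb.
        counts = Counter(v.lower() for v in verbs)
        repeated = sorted(v for v, n in counts.items() if n > 1)
        if repeated:
            within[category] = repeated
        for verb in counts:
            categories_by_verb[verb].add(category)
    across = {v: sorted(c) for v, c in categories_by_verb.items() if len(c) > 1}
    return within, across


# Small hypothetical sample; in the notebook, pass verbs_by_crime instead.
sample = {
    "Violent_Crimes": ["Robbed", "Kidnapped", "Abducted", "Kidnapped"],
    "Property_Crimes": ["Mugged", "Robbed"],
}
within, across = find_duplicate_verbs(sample)
print(within)
print(across)
```

Verbs shared across categories deserve a second look, since the same surface form would then be annotated with two different labels.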


In [None]:
def simple_prompt(crime):
    return f"I give you '{crime}' can you put into a 3 sentences that would be found in a police report in a chorological order"

In [None]:
def prepare_crime_prompts(verbs_by_crime):
    total_prompts = {}
    for category_key, values in verbs_by_crime.items():
        print(category_key)
        prompts = [simple_prompt(x) for x in values]
        total_prompts.update({category_key:prompts})
    return total_prompts


prepare_crime_prompts(verbs_by_crime)

Violent_Crimes
Property_Crimes
Financial_Crimes
Cybercrimes
Drug_Crimes
White-Collar_Crimes
Sexual_Crimes
Public_Order_Crimes
Traffic_and_Motor_Vehicle_Crimes


{'Violent_Crimes': ["I give you 'Murdered' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Assaulted' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Raped' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Robbed' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Kidnapped' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Beat' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Stabbed' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Shot' can you put into a 3 sentences that would be found in a police report in a chorological order",
  "I give you 'Strangled' 

In [None]:
# TODO: interact with OpenAI; manually at the beginning

# Prompt kept verbatim as it was used:
# I give you "Assaulted" can you put into a 3 sentences that would be found in a police report in a chorological order?

# I give you X can you put into a 3 sentences that would be found in a police report in a chorological order?
# We will start with 3 sentences that pick up the context. Larger reports can be experimented with in the future.
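
Once a response comes back, the crime word has to be located in the text so that it can be annotated with character offsets. The sketch below shows one way this might be done; the record layout (`{'text': ..., 'entities': {label: [[start, end], ...]}}`) is inferred from how the JSON file is read later in this notebook, and `find_spans` and `make_example` are hypothetical helpers.

```python
import re


def find_spans(text, phrase):
    """Return [start, end] character offsets of every whole-word match of phrase in text."""
    # Lookarounds instead of \b so phrases that end in punctuation, e.g. '(DUI)', still match.
    pattern = re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", flags=re.IGNORECASE)
    return [[m.start(), m.end()] for m in pattern.finditer(text)]


def make_example(text, category, phrase):
    """Build one record in the assumed {'text': ..., 'entities': {label: [[start, end], ...]}} layout."""
    return {"text": text, "entities": {category: find_spans(text, phrase)}}


report = "The victim reported being assaulted outside a bar. The suspect assaulted her again later."
example = make_example(report, "Violent_Crimes", "assaulted")
for start, end in example["entities"]["Violent_Crimes"]:
    print(start, end, report[start:end])
```

An exact match only finds the form that was prompted. Generated reports often inflect the verb (for example "beaten" or "beating" for "Beat"), so those spans would need a list of variants or manual annotation.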


In [7]:
!pip install spacy[transformers]
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [8]:
import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp.pipe_names)

text = "What video sharing service did Steve Chen, Chad Hurley, and Jawed Karim create in 2005?"
doc = nlp(text)
from spacy import displacy
displacy.render(doc, style="ent", jupyter=True)

  from .autonotebook import tqdm as notebook_tqdm


['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']


In [9]:
!pip install spacy[transformers]


In [None]:
# Tutorial on retraining SpaCy model
#  https://newscatcherapi.com/blog/train-custom-named-entity-recognition-ner-model-with-spacy-v3
import json

with open('Corona2.json', 'r', encoding="utf-8") as f:
    data = json.load(f)

In [None]:
training_data = {'classes' : ['MEDICINE', "MEDICALCONDITION", "PATHOGEN"], 'annotations' : []}
for example in data['examples']:
    temp_dict = {}
    temp_dict['text'] = example['content']
    temp_dict['entities'] = []
    for annotation in example['annotations']:
        start = annotation['start']
        end = annotation['end']
        label = annotation['tag_name'].upper()
        temp_dict['entities'].append((start, end, label))
    training_data['annotations'].append(temp_dict)

In [None]:
training_data['annotations'][0]

{'text': "While bismuth compounds (Pepto-Bismol) decreased the number of bowel movements in those with travelers' diarrhea, they do not decrease the length of illness.[91] Anti-motility agents like loperamide are also effective at reducing the number of stools but not the duration of disease.[8] These agents should be used only if bloody diarrhea is not present.[92]\n\nDiosmectite, a natural aluminomagnesium silicate clay, is effective in alleviating symptoms of acute diarrhea in children,[93] and also has some effects in chronic functional diarrhea, radiation-induced diarrhea, and chemotherapy-induced diarrhea.[45] Another absorbent agent used for the treatment of mild diarrhea is kaopectate.\n\nRacecadotril an antisecretory medication may be used to treat diarrhea in children and adults.[86] It has better tolerability than loperamide, as it causes less constipation and flatulence.[94]",
 'entities': [(360, 371, 'MEDICINE'),
  (383, 408, 'MEDICINE'),
  (104, 112, 'MEDICALCONDITION'),


In [2]:
import json
with open('spacy_prompt_training_v1_4.json', 'r', encoding="utf-8") as f:
    crime_prompt_data = json.load(f)

In [3]:
training_data_crime = {'classes': list(verbs_by_crime.keys()), 'annotations': []}
for example in crime_prompt_data['examples']:
    temp_dict = {}
    temp_dict['text'] = example['text']
    temp_dict['entities'] = []
    # example['entities'] maps a label to a list of [start, end] offsets
    for entity, annotations in example['entities'].items():
        for annotation in annotations:
            entity_annotation = annotation + [entity]
            temp_dict['entities'].append(tuple(entity_annotation))
            # NOTE: this append runs once per annotation, so a report with several
            # spans is added several times and a report with none is skipped.
            # Dedent it to the `for example` level to add each report exactly once.
            training_data_crime['annotations'].append(temp_dict)
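
In the cell above, the `append` call sits inside the innermost loop, so a report with several annotated spans is added several times and a report with none is not added at all. If one entry per report is what is wanted, a variant along these lines may be closer to the intent. The function name and the sample records are illustrative only.

```python
def to_spacy_annotations(examples):
    """Convert records to [{'text': ..., 'entities': [(start, end, label), ...]}], one per record."""
    annotations = []
    for example in examples:
        entities = [
            (start, end, label)
            for label, spans in example["entities"].items()
            for start, end in spans
        ]
        # Appended at the record level, so every report appears exactly once.
        annotations.append({"text": example["text"], "entities": entities})
    return annotations


# Hypothetical records in the same layout as crime_prompt_data['examples'].
records = [
    {"text": "She was robbed, then assaulted.", "entities": {"Violent_Crimes": [[8, 14], [21, 30]]}},
    {"text": "Nothing was reported.", "entities": {}},
]
result = to_spacy_annotations(records)
print(len(result), result[0]["entities"])
```

Whether reports without any annotated span should be kept is a separate decision; they can serve as negative examples, or be filtered out with a simple `if entities:` check.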


In [4]:
training_data_crime['annotations'][16]

{'text': "On June 15, 2030, at approximately 1:00 PM, the victim, identified as Sarah White, reported being beaten by a male suspect matching the description of her ex-boyfriend. The victim stated that the suspect had kicked in the door of her residence before accosting her and savagely beating her.\n        Responding officers arrived at the scene within minutes of the report and observed the victim appearing to be in a state of shock and with visible signs of injury, including swelling and bruising of the face and arms. Officers immediately called for medical assistance and provided initial first aid until paramedics arrived to transport the victim to the hospital for further evaluation and treatment.\n        Crime scene investigators were dispatched to the location and conducted a thorough examination of the area, collecting potential evidence such as surveillance footage, witness statements, and the victim's personal belongings. The investigation into the beating is ongoing, with 

In [10]:
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

nlp = spacy.blank("en") # load a new spacy model
doc_bin = DocBin() # create a DocBin object

In [None]:

from spacy.util import filter_spans

for training_example  in tqdm(training_data['annotations']):
    text = training_example['text']
    labels = training_example['entities']
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in labels:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    filtered_ents = filter_spans(ents)
    doc.ents = filtered_ents
    doc_bin.add(doc)

doc_bin.to_disk("training_data.spacy") # save the docbin object

100%|██████████| 31/31 [00:00<00:00, 276.59it/s]


Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
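
Two different things happen to spans in the loop above. With `alignment_mode="contract"`, `doc.char_span` returns `None` when no whole token lies inside the given character range, which produces the "Skipping entity" lines. After that, `filter_spans` drops overlapping spans, preferring longer ones. The overlap rule can be illustrated without spaCy on plain `(start, end, label)` tuples; this is a simplified sketch of that behaviour, not spaCy's implementation.

```python
def filter_overlaps(spans):
    """Keep the longest spans first; drop any span that overlaps one already kept."""
    # Sort by length (longest first), then by start offset (earliest first).
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end, label in ordered:
        # Two half-open ranges are disjoint if one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


spans = [(0, 5, "A"), (3, 12, "B"), (12, 15, "A"), (4, 6, "C")]
print(filter_overlaps(spans))
```

Counting how many spans are lost at each of the two stages is a cheap way to judge the quality of the offsets in the source JSON.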


In [None]:

from spacy.util import filter_spans

# Start from an empty DocBin so documents added by earlier cells are not mixed in
doc_bin = DocBin()

for training_example in tqdm(training_data_crime['annotations']):
    text = training_example['text']
    labels = training_example['entities']
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in labels:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        # char_span returns None when the offsets do not cover any whole token
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    # Drop overlapping spans; doc.ents does not accept overlaps
    filtered_ents = filter_spans(ents)
    doc.ents = filtered_ents
    doc_bin.add(doc)

doc_bin.to_disk("training_data_crime_v1_4.spacy") # save the docbin object

100%|██████████| 355/355 [00:00<00:00, 1437.01it/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity





In [None]:
# Create a config
# https://spacy.io/usage/training#quickstart

In [13]:
import locale
# Workaround: report UTF-8 as the preferred encoding so the shell command below can run
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding
!python -m spacy init fill-config base_config.cfg config.cfg

✔ Auto-filled config with all values
✔ Saved config
config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


In [17]:
import time
# Training takes a really long time; this still needs to be solved.
# Note: --paths.train and --paths.dev point at the same file, so the scores
# reported during training are not measured on held-out data.

start = time.time()
# !python -m spacy train config.cfg --output ./ --paths.train ./training_data.spacy --paths.dev ./training_data.spacy
# !python -m spacy train config.cfg --output ./ --paths.train ./training_data.spacy --paths.dev ./training_data.spacy --gpu-id 0
!python -m spacy train config.cfg --output ./ --paths.train ./training_data_crime_v1_4.spacy --paths.dev ./training_data_crime_v1_4.spacy --gpu-id 0
end = time.time()

ℹ Saving to output directory: .
ℹ Using GPU: 0
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be 

In [15]:
print('How long it takes to train a spaCy NER model')
print(end - start)


How long it takes to train a spaCy NER model
3538.2728683948517
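
The training command in this notebook points `--paths.train` and `--paths.dev` at the same file, so the scores printed during training are computed on data the model has already seen. One option is to split the annotations before serialising them and to write two separate `.spacy` files. A minimal sketch, in which `split_annotations`, the 20% fraction, and the seed are all illustrative choices:

```python
import random


def split_annotations(annotations, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the annotations and split it into (train, dev) lists."""
    if not 0 < dev_fraction < 1:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(annotations)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Hypothetical stand-in for training_data_crime['annotations'].
annotations = [{"text": f"report {i}", "entities": []} for i in range(10)]
train, dev = split_annotations(annotations)
print(len(train), len(dev))
```

If the same report can appear more than once in the list, it would be safer to deduplicate by text first; otherwise copies of one report may land on both sides of the split.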


In [20]:
nlp_ner = spacy.load("models/models_crime_v4_1/model-best")

doc = nlp_ner("Antiretroviral therapy (ART) is recommended for all HIV-infected \
individuals to reduce the risk of disease progression.\nART also is recommended \
for HIV-infected individuals for the prevention of transmission of HIV.\nPatients \
starting ART should be willing and able to commit to treatment and understand the \
benefits and risks of therapy and the importance of adherence. Patients may choose \
to postpone therapy, and providers, on a case-by-case basis, may elect to defer \
therapy on the basis of clinical and/or psychosocial factors.")

colors = {"PATHOGEN": "#F67DE3", "MEDICINE": "#7DF6D9", "MEDICALCONDITION":"#FFFFFF"}
options = {"colors": colors}

spacy.displacy.render(doc, style="ent", options= options, jupyter=True)



In [21]:
doc = nlp_ner("On May 17, 2023, at approximately 8:00 PM, police responded to a report of fraud involving the victim, identified as Jane Doe. According to the victim, she was defrauded of a large sum of money after responding to an advertisement for an investment opportunity. The victim stated that she communicated with the suspect over the phone, providing them with her bank details to process the transaction.\n        Responding officers quickly arrived at the scene and provided assistance while investigators tracked the suspect's digital footprint in order to gain more information. It was discovered that the suspect had used a fraudulent bank account to process the victim's transaction and had since closed it before their identity could be identified.\n        Crime scene investigators were dispatched to the victim's home for further investigation and collected potential evidence such as the victim's bank statements and transaction receipts. The investigation into the fraud is ongoing, with efforts focused on locating the suspect and determining the extent of their crime.")

colors = {"Violent_Crimes": "#F67DE3", "Murdered": "#7DF6D9"}
options = {"colors": colors}

spacy.displacy.render(doc, style="ent", options= options, jupyter=True)

In [22]:
crime_prompt_data

{'examples': [{'text': "On July 12, 2022, at around 4:00 AM, the victim, identified as Jane Smith, reported being murdered by an unknown person in an isolated area near the outskirts of town. The victim's body was discovered by a jogger who immediately alerted authorities and provided a partial description of the suspect. \n\nResponding officers arrived shortly after the report and established a perimeter to search for the culprit, as well as any potential evidence related to the murder. Officers observed that the victim had suffered a severe laceration to the neck and were unable to establish a cause of death until the coroner reported an extensive investigation.\n\nCrime scene investigators were dispatched to the area and conducted an extensive examination of the area and collected potential evidence such as photographs, surveillance footage, medical records, and witness statements. The investigation into the murder is ongoing, with efforts focused on finding and apprehending the sus

In [23]:
def show_crime_prompt(response, crime):
    doc = nlp_ner(response)
    colors = {crime: "#F67DE3"}
    options = {"colors": colors}
    spacy.displacy.render(doc, style="ent", options= options, jupyter=True)

### Violent Crimes

In [24]:
# violent_crime_response1 = crime_prompt_data['examples'][0]['text']
# show_crime_prompt(violent_crime_response1, 'Violent_Crimes')

for i in range(5):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Violent_Crimes')

In [25]:
doc = nlp_ner("On May 17, 2023, at approximately 8:00 PM, Jane Doe was reported murdered by her husband in her apartment.")

colors = {"Violent_Crimes": "#F67DE3", "Murdered": "#7DF6D9"}
options = {"colors": colors}

spacy.displacy.render(doc, style="ent", options= options, jupyter=True)

### Property Crimes

In [26]:
for i in range(64, 69):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Property_Crimes')

### Financial Crimes

In [27]:
for i in range(118, 123):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Financial_Crimes')

### Cybercrimes

In [28]:
for i in range(165, 170):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Cybercrimes')

### Drug Crimes

In [29]:
for i in range(224, 229):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Drug_Crimes')

### White-Collar Crimes

In [30]:
for i in range(282, 287):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'White-Collar_Crimes')

### Sexual Crimes

In [31]:
for i in range(300, 305):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Sexual_Crimes')

### Public Order Crimes

In [32]:
for i in range(372, 377):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Public_Order_Crimes')

### Traffic and Motor Vehicle Crimes

In [33]:
for i in range(390, 395):
    crime_response = crime_prompt_data['examples'][i]['text']
    show_crime_prompt(crime_response, 'Traffic_and_Motor_Vehicle_Crimes')
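
The index ranges in the cells above (`range(64, 69)`, `range(118, 123)`, and so on) depend on the order of the examples in the JSON file and would need updating whenever the data is regenerated. Selecting examples by label avoids that. The sketch below assumes each record stores its entities as a dictionary keyed by label, as the loading code earlier in this notebook suggests; `examples_for_label` is a hypothetical helper.

```python
def examples_for_label(examples, label, limit=5):
    """Return up to `limit` texts whose entities contain at least one span for `label`."""
    texts = []
    for example in examples:
        # .get() returns None for a missing label and [] for a label without spans.
        if example["entities"].get(label):
            texts.append(example["text"])
            if len(texts) == limit:
                break
    return texts


# Hypothetical records in the layout assumed for crime_prompt_data['examples'].
examples = [
    {"text": "The victim was robbed.", "entities": {"Violent_Crimes": [[15, 21]]}},
    {"text": "The suspect hacked the server.", "entities": {"Cybercrimes": [[12, 18]]}},
    {"text": "No offence was recorded.", "entities": {}},
]
print(examples_for_label(examples, "Cybercrimes"))
```

In the notebook this could replace each hard-coded loop with something like `for text in examples_for_label(crime_prompt_data['examples'], 'Cybercrimes'): show_crime_prompt(text, 'Cybercrimes')`.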

In [None]:
# print([(w.text, w.pos_) for w in doc])

In [None]:
## These are notes around police report rules to expand upon the three sentences
intro = '''

You are showing me how to fill out a police report. I would like you to generate random information for each section.
Example:
A police report has sections such as a Cover Sheet

'''

In [None]:
cover_sheet = '''

Cover Sheet Information:
Offense: Enter the applicable numerical code section and source for the crime being reported. If multiple crimes are being charged, this field shall contain the most serious offense.
Report Number: Enter a report number, preceded by the two digit year.
In Custody Checkbox: Check this box if the case involves an in‐custody arrest.
Cite & Release Checkbox: Check this box if the case involves a cite and release.
Warrant Request Checkbox: Check this box if the case is a warrant request.
Date/Time of Offense: Generate a day and time at least 10 days before today’s date.
Date/Time of Arrest: Generate a day and time after the offense.
Victim #1: Enter the last name, first name, and middle name of the primary victim.
Victim #2: Enter the last name, first name, and middle name of the secondary victim. If there is no secondary victim, leave blank.
Suspect: Enter the last name, first name, and middle name of the suspect.
Age: Enter the age of the suspect.
Charge: Enter all charges and source for the crime or crimes being reported. Enter one charge per line.
Submitting Officer: Enter the first initial, last name, and badge number of the submitting officer.
Detail/Patrol: Enter Patrol number.
Phone: Enter the ten-digit department telephone number.
Reviewing Officer: Enter the first initial, last name, and badge number of the reviewing officer.
Date/Time Submitted: Enter the date and time the report was reviewed by the reviewing officer.

'''
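
The "Date/Time of Offense" and "Date/Time of Arrest" rules involve date arithmetic, which could also be done in code and substituted into the prompt rather than left to the model. The following is a sketch under that assumption; `random_offense_and_arrest` and its default limits are hypothetical and not part of the original notebook.

```python
import datetime
import random


def random_offense_and_arrest(today, rng, max_days_back=365, max_days_to_arrest=9):
    """Return (offense, arrest) datetimes: offense at least 10 days before `today`, arrest after it."""
    midnight = datetime.datetime.combine(today, datetime.time())
    # Offense: 10 to max_days_back days before today, minus a random number of minutes.
    offense = midnight - datetime.timedelta(
        days=rng.randint(10, max_days_back), minutes=rng.randint(0, 24 * 60 - 1)
    )
    # Arrest: at least one minute after the offense; with the default of 9 days
    # it still falls before `today`.
    arrest = offense + datetime.timedelta(minutes=rng.randint(1, max_days_to_arrest * 24 * 60))
    return offense, arrest


rng = random.Random(0)  # seeded so the generated reports are reproducible
offense, arrest = random_offense_and_arrest(datetime.date(2023, 9, 7), rng)
print(offense.isoformat(), arrest.isoformat())
```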

In [None]:
all_reports = '''
An effective police report is always:
1. Factual.  A police report is an objective accounting of the relevant and observed facts of the
case, and any conclusions made by the reporting officer must be supported by articulated
and documented facts. Unsubstantiated opinions or conclusions are never to be included
in an effective report.

2. Accurate: Accuracy is achieved by carefully, precisely, and honestly reporting of all relevant information.

3. Clear: A police report speaks for the reporting officer when he or she is not present. There
should be no doubt or confusion regarding what happened during an incident or crime,
based upon the content of a police report.  Clarity in report writing is achieved by clear and
logical organization of information, the judicious use of simple, common, and first person
language, and effective writing mechanics.

4. Concise.  Reports should be brief but also contain all relevant information necessary for a
complete understanding of the crime or incident, without additional explanation.   Brevity
should never take precedence over accuracy, completeness, or clarity in report writing.

5. Complete. A complete report will contain all the relevant facts, information, and details that
the reader will need to have in order to have a comprehensive understanding of the crime
or incident described in the report.  The report is complete when it is a complete word
picture of the incident, there are no questions left unanswered by the reader, officer actions
are explained and justified by the contents of the report, and both supporting and
conflicting information is included.

6. Timely. No decisions can be made or actions taken regarding an arrest or request for follow
up investigation if a report is not submitted in a timely fashion.

'''


In [None]:
# Examples of Poorly Organized vs Well Organized
example_poor_vs_well_organized = """
Poorly Organized Example 1:
‘’’
When we arrived, the husband let us into the house. We were responding to a 9‐1‐1 call. My partner and I had been dispatched to an incident of domestic violence. A woman called for help to keep her husband from beating her.
‘’’
Well Organized Example 1:
‘’’
My partner and I were dispatched to a domestic violence incident after a woman dialed 9‐1‐1. The woman called for help because she was afraid her husband would beat her. When we arrived, the husband let us into the house.
‘’’
Poorly Organized Example 2:
‘’’
Marie Parker said her husband refused to answer the door at first when he heard the man on the other side begin to shout. I took her statement approximately 45 minutes after the assault took place. She was sitting in the family room when her husband went to see who was at the door.
‘’’
Well Organized Example 2:
‘’’
I took Marie Parker’s statement approximately 45 minutes after the assault took place. Parker said she was sitting in the family room when her husband went to see who was at the door. Initially her husband refused to answer the door when he heard the man on the other side begin to shout.
‘’’
"""

In [None]:
# example_poor_vs_well_organized

In [None]:
example_abstract_vs_contrete_lang = """Reports should be written using simple, common, and concrete language whenever possible.
The use of simple language can help keep reports concise and brief, and addresses relevant information quickly and clearly.
Here are examples of abstract words and phrases, along with more concrete alternatives:

Abstract Words Example 1: A number of …
Concrete Words Example 1: Seven …
Abstract Words Example 2: At a high rate of speed…
Concrete Words Example 2: 75 MPH…
Abstract Words Example 3: Appeared intoxicated…
Concrete Words Example 3: Breath smelled of an alcoholic beverage…
Abstract Words Example 4: Hostile behavior…
Concrete Words Example 4: Repeatedly struck at officers…
Abstract Words Example 5: Physical confrontation…
Concrete Words Example 5: Fight…
Abstract Words Example 6: Verbal altercation…
Concrete Words Example 6: Argument…
Abstract Words Example 7: Extensive record…
Concrete Words Example 7: Six DUI offenses over two years…
Abstract Words Example 8: Employed…
Concrete Words Example 8: Used…
Abstract Words Example 9: Dispute…
Concrete Words Example 9: Argument…
Abstract Words Example 10: Inquired…
Concrete Words Example 10: Asked…
Abstract Words Example 11: In the vicinity of…
Concrete Words Example 11: Near…
Abstract Words Example 12: Articulated…
Concrete Words Example 12: Said, told…
"""
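The abstract/concrete pairs above follow a regular "Abstract Words Example N" / "Concrete Words Example N" layout, so they can be turned into a lookup table and used to flag abstract wording in a draft. The following is a minimal sketch under that assumption. The helper names `parse_abstract_concrete_pairs` and `flag_abstract_phrases` are hypothetical and not part of the original notebook, and the `sample` string is a shortened stand-in for the full prompt text.

```python
import re

# Hypothetical helpers (not from the original notebook). They assume the
# "Abstract Words Example N: ..." / "Concrete Words Example N: ..." layout.
PAIR_LINE = re.compile(r"^(Abstract|Concrete) Words Example (\d+):\s*(.+?)\s*$")


def parse_abstract_concrete_pairs(text):
    """Map each abstract phrase to the concrete phrase with the same number."""
    abstract, concrete = {}, {}
    for line in text.splitlines():
        match = PAIR_LINE.match(line.strip())
        if not match:
            continue  # skip the introductory sentences and blank lines
        kind, number, phrase = match.groups()
        phrase = phrase.rstrip("….").strip()  # drop the trailing ellipsis
        target = abstract if kind == "Abstract" else concrete
        target[int(number)] = phrase
    # A repeated abstract phrase keeps only its last concrete alternative.
    return {abstract[n]: concrete[n] for n in sorted(abstract) if n in concrete}


def flag_abstract_phrases(draft, pairs):
    """Return (abstract, concrete) tuples for abstract phrases found in a draft."""
    lowered = draft.lower()
    return [(a, c) for a, c in pairs.items() if a.lower() in lowered]


# Shortened stand-in for the prompt string defined in the cell above.
sample = """Abstract Words Example 1: A number of …
Concrete Words Example 1: Seven …
Abstract Words Example 2: At a high rate of speed…
Concrete Words Example 2: 75 MPH…
"""

pairs = parse_abstract_concrete_pairs(sample)
hits = flag_abstract_phrases("The car left at a high rate of speed.", pairs)
print(hits)
```

In the notebook, the same call could be made with `example_abstract_vs_contrete_lang` in place of `sample`. The matching is a plain case-insensitive substring search, so it is only a rough first pass.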

In [None]:
casualty_medical_aid_report_rules = '''
While casualty reports are typically nothing more than an incident report, their importance should not be underestimated.
The potential for civil liability from incidents in which an involved party is injured can be quite high,
depending upon the circumstances. As a result, carefully documenting the incident is of the utmost necessity.
The following are elements that need to be addressed in a medical aid or casualty report.

1. Describe the scene.  Be as thorough as possible, and include any broken concrete, improper lighting,
incorrect signage, or other conditions observed.

2. Establish the timeframe of the incident. This information is critical to impeach and rehabilitate the statements of
involved parties.

3. Take a complete statement from all parties involved. Include statements detailing the victim’s injuries, and be sure to
speak with the victim. Be as complete and thorough as possible, and if something doesn’t make sense, get clarification
immediately, because it may be the only time the party is contacted.

4. Get complete contact information for all parties. Be sure to get alternate telephone numbers and email addresses,
whenever possible.

5. Canvass the area for possible witnesses. Don’t hesitate to knock on doors, if necessary.

6. Describe any injuries or other preexisting medical conditions described by involved parties.
A thorough description contemporaneous to the incident will prevent possible statement changes later.

7. Take photographs of the scene, and of all involved parties. Once again, a picture is worth a thousand words.

8. Determine if there is video of the incident. If there is video, obtain a copy, and book it as evidence.

9. Get medical release statements, if necessary. Having access to medical records from the outset can sometimes
prevent excessive claims at a later time.

10. Document the fire and medical units on scene. If the involved party refuses medical aid, document the reason.

11. Obtain the hospital information, if the involved party is transported.  Be sure to include this information in the report.
'''

theft_burglary_property_crime_report_rules = '''
Theft, burglary and other property crime reports should answer questions regarding modus operandi,
points of entry, items taken, timeframe, and evidentiary information in order to enable investigators
to link specific incidents together.

The following are elements that should be addressed by an effective property crime report.

1. Describe the scene.  Always describe the scene as it was when the victim discovered the crime,
and also how the scene appeared when you arrived.

2. Establish what crime occurred. Articulate all elements of the crime in the report.

3. Establish the timeframe of the crime. This information is critical to impeach and rehabilitate the statements of
suspects and victims.

4. Take a complete statement from all parties involved.  Be as complete and thorough as possible, and if something
doesn’t make sense, get clarification immediately, because it may be the only time the party is contacted.

5. Get complete contact information for all parties.  Be sure to get alternate telephone numbers and email addresses,
whenever possible.  Don’t list a stolen, lost or missing telephone as the only contact information in the case.

6. Thoroughly describe the property taken, damaged, or missing.  Be as thorough as possible, and
follow up with the victim or responsible party, if necessary, to obtain the information.
Be sure to include the color, make, model, value, and serial number of items, where available.
Also describe any owner applied markings, if applicable. If the item is a cellular telephone, obtain the MEID/IMEI numbers,
if possible.

7. Canvass the area for witnesses.  A witness can provide suspect information, or help confirm the timeframe.

8. Look for cameras, and obtain any video surveillance. Determine if there is any video surveillance in the area,
and document it in the report.  Obtain copies, if possible, of the video surveillance for the timeframe of the crime,
and book as evidence. If the surveillance is only of entrances and exits, obtain it anyway.

9. Describe the point of entry, point of exit, and mode of theft, if possible.  Criminals are creatures of habit,
and will typically use the same methods to commit certain types of crimes.

10. Ask the victim if any other people had access or permission to take their property.
This can give a starting point, and also may help narrow the timeframe of the crime.

11. Photograph the scene, and ask the victim if they have any pictures of their property. A picture is worth a
thousand words, every time.

12. Look for, obtain, and book all evidence, or perceived evidence.  Look for the ninja rocks around a vehicle
burglary with a window smash, or look for the cut cable lock in the bushes. Don’t forget to try to lift latent
fingerprints, regardless of the value of the stolen property.
All it takes is one print to make a case.

13. Talk to the victim about future crime prevention techniques, if necessary. Mention LoJack for computers,
engraving, and registration of bicycles, not leaving property unattended…an ounce of prevention is worth a pound of cure.

'''

sexual_assalt_domestic_battery_crime_report_rules = '''
Sexual assaults, domestic violence, battery, and other crimes against persons are some of the most serious
crimes to which officers respond. The following are elements that should be addressed by an effective report
of a sexual assault, domestic violence, battery, or other crime against persons.

1. Describe the scene.  Always describe the scene as it was when the victim discovered the crime, and also
how the scene appeared when you arrived. Include distances, locations of parties, lighting conditions…anything
that may be considered relevant to the incident.

2. Establish the timeframe of the crime.  This information is critical to impeach and rehabilitate the statements
of suspects and victims.

3. Take a complete statement from all parties involved.  Be as complete and thorough as possible, and if something
doesn’t make sense, get clarification immediately, because it may be the only time the party is contacted.

4. Get complete contact information for all parties. Be sure to get alternate telephone numbers and email addresses,
whenever possible.  Don’t list a stolen, lost or missing telephone as the only contact information in the case.

5. Establish the relationships between all parties involved.  Doing so is important, because it may establish
specific crimes, motivations, and circumstances involved in the incident.

6. Establish what crimes occurred.  Doing so establishes probable cause for arrest.  Always ensure all elements
of the crime are clearly articulated.

7. Document any injuries. Take photographs, and obtain follow up photographs, if necessary. Be sure to obtain a
medical release waiver, wherever possible.  If medical transport is necessary, document the hospital.

8. Collect any clothing and bedding involved, and book the items as evidence. Photograph the items before booking.

9. Document all alcohol and drug involvement by all parties. Include the amounts, types of drugs, and frequency
of ingestion during the incident, and determine past alcohol and drug usage history.  Also determine if any of
the parties have used alcohol or drugs together before. Be sure to document the approximate intoxication level
of all involved parties, where possible.

10. Canvass the area for witnesses.  Check other rooms, or other businesses nearby.

11. Determine if there is video surveillance.  If so, obtain copies and book into evidence immediately.
If the video surveillance is only of the entrance or exit of a building, obtain a copy anyway, even if the crime
isn’t visible on the video.

12. Consider a pretext telephone call in all sexual assault cases. Attempt to do so prior to contacting the
suspect in the case.   Be sure to contact Rob Gold, Supervising DDA for the DA’s office SACA unit for permission prior
to conducting the call.  Gold can be reached at 916‐956-0866 (cellular), 916-874-6543(work) or 916-451-2452(home).

13. Offer confidentiality to the victim, and offer an advocate, if applicable.  Never forget that victims of sexual
assault and other crimes are eligible for confidentiality, and have the right to an advocate.

14. If the crime involves sexual assault, encourage the victim to undergo an evidentiary exam. Be sure to adequately
explain the purpose of the exam, and allow the victim to make the decision.

15. Record interviews, whenever possible.  Recording interviews is up to the individual officer, but recording interviews
ties a victim, suspect or witness to a specific statement, and limits later redactions or retractions of statements.

16. Consider the possible defenses that can be used by the suspect. When a possible defense is noted, try to rule out
the defense through physical evidence, or follow up questioning.

17. Do not jump to conclusions regarding the truthfulness of the victim, suspect or witness. Doing so will bias the
initial investigation.  Always assume that the crime happened, unless there is strong evidence that indicates otherwise.

'''

## A LOT OF INFORMATION

and rules, as there should be. However, this is our first version, so let's start small and then expand. The goal is to get a report that can be reproduced in a systematic way.

Common themes run through all of the rule sets:

1.) Describe the scene: Describe the scene as it was when the victim discovered the crime, and also
how the scene appeared when you arrived. Include distances, locations of parties, lighting conditions…anything
that may be considered relevant to the incident.
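One way to look for such shared themes is to compare the first sentence of each numbered rule across the rule strings. The sketch below assumes each rule starts on its own line as `N. Title. Details...`. The helpers `rule_titles` and `shared_themes` are hypothetical names introduced here for illustration, and the two short strings are stand-ins for the full rule sets defined earlier.

```python
import re
from collections import Counter

# Matches the start of a numbered rule at the beginning of a line, e.g. "1. ".
RULE_START = re.compile(r"(?m)^\s*(\d+)\.\s+")


def rule_titles(rules_text):
    """Return the first sentence of each numbered rule, lower-cased."""
    titles = []
    # With one capture group, split() yields [preamble, num, body, num, body, ...].
    parts = RULE_START.split(rules_text)
    for body in parts[2::2]:
        first_sentence = re.split(r"\.\s*", body.strip(), maxsplit=1)[0]
        # Collapse line breaks and repeated spaces inside the title.
        titles.append(" ".join(first_sentence.split()).lower())
    return titles


def shared_themes(*rule_sets):
    """Return the titles that appear, word for word, in every rule set."""
    counts = Counter()
    for text in rule_sets:
        counts.update(set(rule_titles(text)))
    return sorted(title for title, n in counts.items() if n == len(rule_sets))


# Shortened stand-ins for the rule strings defined in the cells above.
casualty = """
1. Describe the scene. Be as thorough as possible.
2. Establish the timeframe of the incident. This is critical.
3. Canvass the area for possible witnesses.
"""
property_crime = """
1. Describe the scene. Always describe the scene as found.
2. Establish what crime occurred.
3. Canvass the area for witnesses. A witness can help.
"""

print(shared_themes(casualty, property_crime))
```

Because the comparison is an exact match on the title, near-duplicates such as "Canvass the area for possible witnesses" and "Canvass the area for witnesses" are not grouped. A fuzzier comparison would be needed to catch those.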



In [None]:
# verbs_by_crime

In [None]:
# casualty_medical_aid_report_rules

location_input = 'Boston'
date_and_time_input = 'Saturday, May 20, 2023'

In [None]:
intro = ''' You are a police report writing education tool. For each of the bullet points, generate a
concise description of the crime scene.
'''
describe_the_scene = '''
It is important to always describe the scene:
- as it was when the victim discovered the crime
- as it appeared when you arrived.
'''

# Provide the address and any specific details about the location, such as the type of building or area.
# Date and Time: Note the date and time when the crime scene was discovered or reported.

Narrative_Text = f'''
For example purposes, generate a believable crime location in {location_input} on {date_and_time_input}
according to these points:
- List the full address. Example: 123 Main St, {location_input}, on the southwest corner
- Please describe the location
- Provide the address and any specific details about the location
- Give the type of building or area for the location
- Provide when the crime occurred
- Provide when the police were called
- Provide when the police arrived at the scene

Please summarize as narrative text in chronological order

'''
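The prompt pieces above (`intro`, `describe_the_scene`, and the narrative instructions) still need to be joined into one prompt, and the hand-typed date is easy to get wrong. The following sketch shows one possible way to do both. The function `build_scene_prompt` and the constant `DATE_FORMAT` are hypothetical names introduced here, the narrative wording is shortened, and the weekday check assumes an English locale.

```python
from datetime import datetime

# Assumed format of date_and_time_input, e.g. "Saturday, May 20, 2023".
DATE_FORMAT = "%A, %B %d, %Y"


def build_scene_prompt(location, date_text, intro, scene_rules):
    """Join the prompt parts after checking that the weekday matches the date."""
    parsed = datetime.strptime(date_text, DATE_FORMAT)  # ValueError on bad format
    stated_weekday = date_text.split(",", 1)[0].strip()
    # strptime accepts a mismatched weekday name, so compare it explicitly.
    if parsed.strftime("%A") != stated_weekday:
        raise ValueError(f"{date_text!r} does not fall on a {stated_weekday}")
    narrative = (
        f"Generate a believable crime location in {location} on {date_text}.\n"
        "Summarize it as narrative text in chronological order."
    )
    # Separate the parts with blank lines so each reads as its own block.
    return "\n\n".join(part.strip() for part in (intro, scene_rules, narrative))


prompt = build_scene_prompt(
    "Boston",
    "Saturday, May 20, 2023",
    "You are a police report writing education tool.",
    "Describe the scene as it was when the victim discovered the crime.",
)
print(prompt)
```

In the notebook, `location_input`, `date_and_time_input`, `intro`, and `describe_the_scene` could be passed in directly. Keeping the assembly in one function makes it easier to reproduce the same report structure for other locations and dates.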



# Initial Observations: Describe the overall appearance of the crime scene, including any signs of forced entry, disturbance, or unusual conditions.
# Physical Evidence: Identify and document any physical evidence present at the scene, such as weapons, fingerprints, footprints, bloodstains, or any other relevant items.
# Body Position and Condition: If there is a victim, describe their position, injuries, or any visible signs of trauma.
# Surroundings: Note the immediate surroundings of the crime scene, including nearby objects, furniture, or any other relevant details.
# Points of Entry or Exit: Document any possible entry or exit points that might be related to the crime.
# Environmental Factors: Take note of lighting conditions, temperature, weather conditions, or any other environmental factors that may be relevant to the investigation.
# Witnesses: Record the names and contact information of any witnesses present at the scene.
# Law Enforcement Actions: Describe any actions taken by law enforcement officers, such as evidence collection, interviews, or securing the area.



In [None]:
'''

1. Describe the scene.  Always describe the scene as it was when the victim discovered the crime, and also
how the scene appeared when you arrived. Include distances, locations of parties, lighting conditions…anything
that may be considered relevant to the incident.


2. Establish the timeframe of the crime.  This information is critical to impeach and rehabilitate the statements
of suspects and victims.

3. Take a complete statement from all parties involved.  Be as complete and thorough as possible, and if something
doesn’t make sense, get clarification immediately, because it may be the only time the party is contacted.

4. Get complete contact information for all parties.Be sure to get alternate telephone numbers and email addresses,
whenever possible.  Don’t list a stolen, lost or missing telephone as the only contact information in the case.

5. Establish the relationships between all parties involved.  Doing so is important, because it may establish
specific crimes, motivations, and circumstances involved in the incident.

6. Establish what crimes occurred.  Doing so establishes probable cause for arrest.  Always ensure all elements
of the crime are clearly articulated.

7. Document any injuries. Take photographs, and obtain follow up photographs, if necessary. Be sure to obtain a
medical release waiver, wherever possible.  If medical transport is necessary, document the hospital.

8. Collect any clothing and bedding involved, and book the items as evidence. Photograph the items before booking.

9. Document all alcohol and drug involvement by all parties. Include the amounts, types of drugs, and frequency
of ingestion during the incident, and determine past alcohol and drug usage history.  Also determine if any of
the parties have used alcohol or drugs together before. Be sure to document the approximate intoxication level
of all involved parties, where possible.

10. Canvass the area for witnesses.  Check other rooms, or other businesses nearby.

11. Determine if there is video surveillance.  If so, obtain copies and book into evidence immediately.
If the video surveillance is only of the entrance or exit of a building, obtain a copy anyway, even if the crime
isn’t visible on the video.

12. Consider a pretext telephone call in all sexual assault cases. Attempt to do so prior to contacting the
suspect in the case.   Be sure to contact Rob Gold, Supervising DDA for the DA’s office SACA unit for permission prior
to conducting the call.  Gold can be reached at 916‐956-0866 (cellular), 916-874-6543(work) or 916-451-2452(home).

13. Offer confidentiality to the victim, and offer an advocate, if applicable.  Never forget that victims of sexual
assault and other crimes are eligible for confidentiality, and have the right to an advocate.

14. If the crime involves sexual assault, encourage the victim to undergo an evidentiary exam. Be sure to adequately
explain the purpose of the exam, and allow the victim to make the decision.

15. Record interviews, whenever possible.  Recording interviews is up to the individual officer, but recording interviews
ties a victim, suspect or witness to a specific statement, and limits later redactions or retractions of statements.

16. Consider the possible defenses that can be used by the suspect. When a possible defense is noted, try to rule out
the defense through physical evidence, or follow up questioning.

17. Do not jump to conclusions regarding the truthfulness of the victim, suspect or witness. Doing so will bias the
initial investigation.  Always assume that the crime happened, unless there is strong evidence that indicates otherwise.

'''

'\n\n1. Describe the scene.\xa0\xa0Always describe the scene as it was when the victim discovered the crime, and also \nhow the scene appeared when you arrived.\xa0Include distances, locations of parties, lighting conditions…anything \nthat may be considered relevant to the incident. \n\n\n2. Establish the timeframe of the crime.\xa0\xa0This information is critical to impeach and rehabilitate the statements\nof suspects and victims. \n\n3. Take a complete statement from all parties involved.\xa0\xa0Be as complete and thorough as possible, and if something\ndoesn’t make sense, get clarification immediately, because it may be the only time the party is contacted.\n\n4. Get complete contact information for all parties.Be sure to get alternate telephone numbers and email addresses, \nwhenever possible.\xa0\xa0Don’t list a stolen, lost or missing telephone as the only contact information in the case. \n\n5. Establish the relationships between all parties involved.\xa0\xa0Doing so is importa