## Prerequisites
- Have the `flair` Python package installed

## Imports

In [3]:
from flair.data import Sentence
from flair.nn import Classifier
import pandas as pd

## Load data

In [8]:
# Read the full dataframe where the data was partially cleaned
df = pd.read_csv("cleaned_police_reports.csv")
df.head()

Unnamed: 0,name,department,url,text
0,"Andrew Allen, badge #37",Minneapolis Police Department,https://d3n8a8pro7vhmx.cloudfront.net/cuapb/pa...,Home Legislative File 2021-01132 RCA Legal s...
1,"Guled Abdullahi, badge #706",Hennepin County Sheriff's Department,https://assets.nationbuilder.com/cuapb/pages/1...,"Hennepin County 300 South Sixth Street, Minne..."
2,"Dean V. Albers, badge #None",Goodhue County Sheriff's Department,https://d3n8a8pro7vhmx.cloudfront.net/cuapb/pa...,"2/22/2021 Jenson v. Craft, Civil No. 01-1488(D..."
3,"Scott Aikins, badge #22",Minneapolis Police Department,https://d3n8a8pro7vhmx.cloudfront.net/cuapb/pa...,"2/22/2021 United States v. Diriye, Case No. 14..."
4,"Matthew Aish, badge #None",Columbia Heights Police Department,https://d3n8a8pro7vhmx.cloudfront.net/cuapb/pa...,Too_Long1 Arbitration LELS (Mathew Aish)/...


In [9]:
# Read the small dataframe of manually cleaned articles
small_df = pd.read_csv("manually_cleaned_police_reports_small.csv")
small_df.head()

Unnamed: 0,name,department,url,text
0,"Jeffrey Pennaz, badge #5551",Department:Minneapolis Police Department,https://assets.nationbuilder.com/cuapb/pages/1...,Witnesses say the stop happened around 8:30 p....
1,"Kurt Radke, badge #5882",Department:Minneapolis Police Department,https://assets.nationbuilder.com/cuapb/pages/1...,Home Legislative File 2022-00241 RCA Legal S...
2,"Craig A. Taylor, badge #7139",Department:Minneapolis Police Department,https://assets.nationbuilder.com/cuapb/pages/1...,Home Legislative File 2022-00233 RCA Legal S...
3,"Cory Taylor, badge #7141",Department:Minneapolis Police Department,https://assets.nationbuilder.com/cuapb/pages/1...,Home Legislative File 2022-00240 RCA Legal S...
4,"Joseph Will, badge #7749",Department:Minneapolis Police Department,https://assets.nationbuilder.com/cuapb/pages/1...,Home Legislative File 2022-00230 RCA Legal S...


## Test Flair model on simple sentences

In [3]:
text = "Jane Smith lives on 123 Redmond Ave. NE, Redmond, WA, USA"
sentence = Sentence(text) # make a Sentence object from the text

# load the NER tagger
tagger = Classifier.load('ner-large')

# run NER over sentence
tagger.predict(sentence)

# print the sentence with all annotations
print(sentence)

2025-10-23 22:48:44,444 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
Sentence[15]: "Jane Smith lives on 123 Redmond Ave. NE, Redmond, WA, USA" → ["Jane Smith"/PER, "Redmond Ave"/LOC, "NE"/LOC, "Redmond"/LOC, "WA"/LOC, "USA"/LOC]


Correct answer: "123 Redmond Ave. NE, Redmond, WA, USA" is a location.

In [12]:
sentence = Sentence(text)
tagger_small = Classifier.load("ner") # try small model

# run NER over sentence
tagger_small.predict(sentence)

# print the sentence with all annotations
print(sentence)

2025-10-23 22:57:36,797 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
Sentence[15]: "Jane Smith lives on 123 Redmond Ave. NE, Redmond, WA, USA" → ["Jane Smith"/PER, "Redmond Ave"/LOC, "NE"/LOC, "Redmond"/LOC, "WA"/LOC, "USA"/LOC]


In [4]:
for token in sentence:
    print(token)

Token[0]: "Jane"
Token[1]: "Smith"
Token[2]: "lives"
Token[3]: "on"
Token[4]: "123"
Token[5]: "Redmond"
Token[6]: "Ave"
Token[7]: "."
Token[8]: "NE"
Token[9]: ","
Token[10]: "Redmond"
Token[11]: ","
Token[12]: "WA"
Token[13]: ","
Token[14]: "USA"


This model performs better than the spaCy models because it detects street names as locations, whereas the spaCy models detected only the city, state, and country. However, the Flair models don't tag the address number as part of the location, and they tag each address component as a separate location instead of tagging the whole address as one location.
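One possible workaround for the fragmented output is to merge neighbouring location spans after prediction. The sketch below is only an illustration and is not part of Flair: `merge_location_spans` is a hypothetical helper, the token list is copied from the output above, and the span positions are written out by hand as 0-based, end-exclusive token indices. It assumes that two location spans separated only by punctuation belong to the same address, which will be wrong whenever two unrelated places sit on either side of a comma or period.

```python
import string


def is_punct(token):
    """True if the token consists only of punctuation characters."""
    return bool(token) and all(ch in string.punctuation for ch in token)


def merge_location_spans(tokens, spans):
    """Merge location spans that are separated only by punctuation tokens.

    tokens: list of token strings for one sentence.
    spans:  list of (start, end) token indices, end exclusive.
    A purely numeric token directly before a span (a house number) is absorbed.
    """
    merged = []
    for start, end in sorted(spans):
        prev_end = merged[-1][1] if merged else 0
        # Absorb a house number such as "123" that precedes the span.
        if start > prev_end and tokens[start - 1].isdigit():
            start -= 1
        # Join with the previous span if only punctuation lies between them.
        if merged and all(is_punct(tok) for tok in tokens[merged[-1][1]:start]):
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


tokens = ["Jane", "Smith", "lives", "on", "123", "Redmond", "Ave", ".",
          "NE", ",", "Redmond", ",", "WA", ",", "USA"]
# Hand-copied positions of "Redmond Ave", "NE", "Redmond", "WA", "USA".
spans = [(5, 7), (8, 9), (10, 11), (12, 13), (14, 15)]

merged = merge_location_spans(tokens, spans)
print([" ".join(tokens[s:e]) for s, e in merged])
```

With the inputs above, the five fragments collapse into a single span covering tokens 4 to 15, so the house number and all address components end up in one location.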

## Test Flair model on small dataframe

In [10]:
for t in small_df["text"]:
    print(f"\n{t}")


Witnesses say the stop happened around 8:30 p.m. outside 1515 Nicollet Ave, Minneapolis, MN. The complainant alleges the officer pulled them from the vehicle without cause and placed them in cuffs on the sidewalk."

Home Legislative File 2022-00241 RCA Legal Selement: Workers' Compensaon claim of Kurt Radke (RCA-2022-00194) ORIGINATING DEPARTMENT Finance & Property Services To Commiee(s) # Commiee Name Meeng Date 1 Policy & Government Oversight Commiee Mar 7, 2022 LEAD STAFF: Emily Ann Colby PRESENTED BY: Emily Ann Colby Acon Item(s) # File Type Subcategory Item Descripon 1 Acon Selement Approving the workers' compensaon claim of Kurt Radke by payment of $160,000 over three years to Kurt Radke and aorney, Meuser Law Firm, and authorizing the City Aorney's Oﬃce to execute any documents necessary to eﬀectuate the selement. Ward / Neighborhood / Address # Ward Neighborhood Address 1. Not Applicable Background Analysis City of Minneapolis employee sustained work-related in

We will try the first and last texts in `small_df` again.

In [13]:
# Use the small model to predict the first text
text = small_df.loc[0, "text"] # first text
sentence = Sentence(text)

tagger_small.predict(sentence)
print(sentence)

Sentence[39]: "Witnesses say the stop happened around 8:30 p.m. outside 1515 Nicollet Ave, Minneapolis, MN. The complainant alleges the officer pulled them from the vehicle without cause and placed them in cuffs on the sidewalk."" → ["Nicollet Ave"/LOC, "Minneapolis"/LOC, "MN"/LOC]


In [15]:
# Use the transformer to predict the first text
sentence = Sentence(text)
tagger.predict(sentence)
print(sentence)

Sentence[39]: "Witnesses say the stop happened around 8:30 p.m. outside 1515 Nicollet Ave, Minneapolis, MN. The complainant alleges the officer pulled them from the vehicle without cause and placed them in cuffs on the sidewalk."" → ["Nicollet Ave"/LOC, "Minneapolis"/LOC, "MN"/LOC]


The benefits and problems are the same as before. In addition, no entities other than locations are detected, and the transformer and the small model appear equally accurate on this text.

In [16]:
text = small_df.loc[len(small_df) - 1, "text"] # last text
sentence = Sentence(text)
tagger.predict(sentence)
print(sentence)

Sentence[43]: "According to the complaint, the incident began at the intersection of W 7th St & Kellogg Blvd, St. Paul, MN. Bystanders reported raised voices and a brief scuffle before the officer escorted the resident to a patrol car." → ["W 7th St"/LOC, "Kellogg Blvd"/LOC, "St. Paul"/LOC]


The large model cannot identify the intersection as a single location, and it does not tag "MN" as a location.

In [17]:
tagger_small.predict(sentence)
print(sentence)

Sentence[43]: "According to the complaint, the incident began at the intersection of W 7th St & Kellogg Blvd, St. Paul, MN. Bystanders reported raised voices and a brief scuffle before the officer escorted the resident to a patrol car." → ["Kellogg Blvd"/LOC, "St. Paul"/LOC, "MN"/LOC]


The small model does identify "MN" as a location, but it misses "W 7th St".

In [18]:
small_df.loc[1, "text"]

"Home \uf105Legislative File 2022-00241 \uf105RCA Legal Se\x01lement: Workers' Compensa\x02on claim of Kurt Radke (RCA-2022-00194) ORIGINATING DEPARTMENT Finance & Property Services To Commi\x01ee(s) # Commi\x01ee Name Mee\x02ng Date 1 Policy & Government Oversight Commi\x01ee Mar 7, 2022 LEAD STAFF: Emily Ann Colby PRESENTED BY: Emily Ann Colby Ac\x02on Item(s) # File Type Subcategory Item Descrip\x02on 1 Ac\x02on Se\x01lement Approving the workers' compensa\x02on claim of Kurt Radke by payment of $160,000 over three years to Kurt Radke and a\x01orney, Meuser Law Firm, and authorizing the City A\x01orney's Oﬃce to execute any documents necessary to eﬀectuate the se\x01lement. Ward / Neighborhood / Address # Ward Neighborhood Address 1. Not Applicable Background Analysis City of Minneapolis employee sustained work-related injuries. The par\x02es reached a tenta\x02ve se\x01lement by payment of $160,000 over three years from fun 06930-1450100-789401-145400. Risk Management believes this

In [19]:
# Try one of the workers' compensation claims. 
text = small_df.loc[2, "text"]
sentence = Sentence(text)
tagger.predict(sentence)
print(sentence)

Sentence[265]: "Home Legislative File 2022-00233 RCA Legal Selement: Workers' Compensaon claim of Craig Taylor (RCA-2022-00195) ORIGINATING DEPARTMENT Finance & Property Services To Commiee(s) # Commiee Name Meeng Date 1 Policy & Government Oversight Commiee Mar 7, 2022 LEAD STAFF: Emily Ann Colby PRESENTED BY: Emily Ann Colby Acon Item(s) # File Type Subcategory Item Descripon 1 Acon Selement Approving the workers' compensaon claim of Craig Taylor, by payment of $175,000 over three years to Craig Taylor and aorney, Meuser Law Firm, and authorizing the City Aorney's Oﬃce to execute any documents necessary to eﬀectuate the selement. Ward / Neighborhood / Address # Ward Neighborhood Address 1. Not Applicable Background Analysis City of Minneapolis employee sustained work-related injuries. The pares reached a tentave selement by payment fo $175,000 over three years from fund 06930-1450100-789401-145400. Risk Management believes this selement is in the best interests of

In [20]:
tagger_small.predict(sentence)
print(sentence)

Sentence[265]: "Home Legislative File 2022-00233 RCA Legal Selement: Workers' Compensaon claim of Craig Taylor (RCA-2022-00195) ORIGINATING DEPARTMENT Finance & Property Services To Commiee(s) # Commiee Name Meeng Date 1 Policy & Government Oversight Commiee Mar 7, 2022 LEAD STAFF: Emily Ann Colby PRESENTED BY: Emily Ann Colby Acon Item(s) # File Type Subcategory Item Descripon 1 Acon Selement Approving the workers' compensaon claim of Craig Taylor, by payment of $175,000 over three years to Craig Taylor and aorney, Meuser Law Firm, and authorizing the City Aorney's Oﬃce to execute any documents necessary to eﬀectuate the selement. Ward / Neighborhood / Address # Ward Neighborhood Address 1. Not Applicable Background Analysis City of Minneapolis employee sustained work-related injuries. The pares reached a tentave selement by payment fo $175,000 over three years from fund 06930-1450100-789401-145400. Risk Management believes this selement is in the best interests of

These models handle the weird characters better than spaCy. For the last text I tried, the transformer performs better.

## Try a HuggingFace model for detecting locations

This is a Flair model on the HuggingFace hub that was fine-tuned specifically for detecting locations. Link: https://huggingface.co/Saisam/Inquirer_ner_loc.

In [1]:
from flair.models import SequenceTagger
loc_tagger = SequenceTagger.load("Saisam/Inquirer_ner_loc")

2025-10-28 16:03:00,952 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>


### Try simple sentences

In [4]:
text = "Jane Smith lives on 123 Redmond Ave. NE, Redmond, WA, USA"
sentence = Sentence(text) # make a Sentence object from the text

loc_tagger.predict(sentence)
print(sentence.to_tagged_string())

Sentence[15]: "Jane Smith lives on 123 Redmond Ave. NE, Redmond, WA, USA" → ["123 Redmond Ave. NE, Redmond, WA, USA"/nk>]


In [24]:
# iterate over predictions
for label in sentence.get_labels():
    print(label)

Span[4:15]: "123 Redmond Ave. NE, Redmond, WA, USA" → nk> (0.9706)


In [5]:
sentence.get_spans("ner")

[Span[4:15]: "123 Redmond Ave. NE, Redmond, WA, USA" → nk> (0.9706)]

In [None]:
# iterate over all labels in the sentence
for label in sentence.get_labels():
    print(f'label.value is: "{label.value}"')
    print(f'label.score is: "{label.score}"')
    print(f'the text of label.data_point is: "{label.data_point.text}"\n')

label.value is: "nk>"
label.score is: "0.9706432927738536"
the text of label.data_point is: "123 Redmond Ave. NE, Redmond, WA, USA"



### Try small dataset

In [37]:
# try the texts in the dataframe
text = small_df.loc[0, "text"] # first text
sentence = Sentence(text)
loc_tagger.predict(sentence)
print(sentence)

Sentence[39]: "Witnesses say the stop happened around 8:30 p.m. outside 1515 Nicollet Ave, Minneapolis, MN. The complainant alleges the officer pulled them from the vehicle without cause and placed them in cuffs on the sidewalk."" → ["1515 Nicollet Ave, Minneapolis, MN."/nk>, "sidewalk."/nk>]


The model predicts the address perfectly, but it also tags "sidewalk" as a location, which we don't want. So, try the regular Flair models and see how they handle "sidewalk":

In [38]:
tagger.predict(sentence)
print(sentence)

Sentence[39]: "Witnesses say the stop happened around 8:30 p.m. outside 1515 Nicollet Ave, Minneapolis, MN. The complainant alleges the officer pulled them from the vehicle without cause and placed them in cuffs on the sidewalk."" → ["Nicollet Ave"/LOC, "Minneapolis"/LOC, "MN"/LOC]


In [39]:
tagger_small.predict(sentence)
print(sentence)

Sentence[39]: "Witnesses say the stop happened around 8:30 p.m. outside 1515 Nicollet Ave, Minneapolis, MN. The complainant alleges the officer pulled them from the vehicle without cause and placed them in cuffs on the sidewalk."" → ["Nicollet Ave"/LOC, "Minneapolis"/LOC, "MN"/LOC]


Neither of these predicts "sidewalk" as an entity. So maybe we can cross-check the results from the HuggingFace model against the regular Flair models.
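A rough sketch of what such a cross-check could look like, using the span texts printed above as plain strings. `cross_check` is a hypothetical helper, and the rule it applies (keep a fine-tuned-model span only if it contains at least one entity that a regular Flair model tagged as LOC) is an assumption, not something either model provides. Substring matching is crude: a short entity like "MN" could match inside an unrelated word, so comparing token positions would be more robust.

```python
def cross_check(loc_model_spans, general_model_locs):
    """Keep spans from the location model that a general NER model supports.

    loc_model_spans:    span texts predicted by the fine-tuned location model.
    general_model_locs: LOC entity texts predicted by a regular Flair model.
    """
    confirmed = []
    for candidate in loc_model_spans:
        # A candidate is confirmed if any general-model LOC appears inside it.
        if any(entity in candidate for entity in general_model_locs):
            confirmed.append(candidate)
    return confirmed


# Texts copied from the printed outputs for the first document.
loc_model_spans = ["1515 Nicollet Ave, Minneapolis, MN.", "sidewalk."]
general_model_locs = ["Nicollet Ave", "Minneapolis", "MN"]

confirmed = cross_check(loc_model_spans, general_model_locs)
print(confirmed)
```

For this document the address is kept and "sidewalk." is dropped. The same rule would also drop any real location that the regular models miss entirely, so it trades recall for precision.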

In [34]:
text = small_df.loc[len(small_df) - 1, "text"]
sentence = Sentence(text)
loc_tagger.predict(sentence)
print(sentence)

Sentence[43]: "According to the complaint, the incident began at the intersection of W 7th St & Kellogg Blvd, St. Paul, MN. Bystanders reported raised voices and a brief scuffle before the officer escorted the resident to a patrol car." → ["W 7th St & Kellogg Blvd, St. Paul, MN."/nk>]


It can extract intersections too!

In [44]:
text = small_df.loc[2, "text"]
sentence = Sentence(text)
loc_tagger.predict(sentence)
print(sentence)

Sentence[265]: "Home Legislative File 2022-00233 RCA Legal Selement: Workers' Compensaon claim of Craig Taylor (RCA-2022-00195) ORIGINATING DEPARTMENT Finance & Property Services To Commiee(s) # Commiee Name Meeng Date 1 Policy & Government Oversight Commiee Mar 7, 2022 LEAD STAFF: Emily Ann Colby PRESENTED BY: Emily Ann Colby Acon Item(s) # File Type Subcategory Item Descripon 1 Acon Selement Approving the workers' compensaon claim of Craig Taylor, by payment of $175,000 over three years to Craig Taylor and aorney, Meuser Law Firm, and authorizing the City Aorney's Oﬃce to execute any documents necessary to eﬀectuate the selement. Ward / Neighborhood / Address # Ward Neighborhood Address 1. Not Applicable Background Analysis City of Minneapolis employee sustained work-related injuries. The pares reached a tentave selement by payment fo $175,000 over three years from fund 06930-1450100-789401-145400. Risk Management believes this selement is in the best interests of

In [36]:
# iterate over all labels in the sentence
for label in sentence.get_labels():
    # print label value and score
    print(f'label.value is: "{label.value}"')
    print(f'label.score is: "{label.score}"')
    # access the data point to which label attaches and print its text
    print(f'the text of label.data_point is: "{label.data_point.text}"\n')

label.value is: "nk>"
label.score is: "0.9161780178546906"
the text of label.data_point is: "City A"

label.value is: "nk>"
label.score is: "0.896032969156901"
the text of label.data_point is: "orney's Oﬃce"

label.value is: "nk>"
label.score is: "0.7273060530424118"
the text of label.data_point is: "Ward / Neighborhood /"

label.value is: "nk>"
label.score is: "0.8573320309321085"
the text of label.data_point is: "# Ward Neighborhood Address 1."

label.value is: "nk>"
label.score is: "0.9252222180366516"
the text of label.data_point is: "City of Minneapolis"



The weird characters are still a problem, because "City Attorney's Office" is split apart. Also, it predicts "Ward / Neighborhood /" and "# Ward Neighborhood Address 1" as locations, which is wrong.
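One thing that might be worth trying before tagging is repairing the text itself. Judging from the raw string shown earlier, `\x01` appears where "tt" should be ("Se\x01lement"), `\x02` appears where "ti" should be ("Compensa\x02on"), the ligature characters "ﬃ" and "ﬀ" stand in for "ffi" and "ff", and `\uf105` is a private-use icon character. The mapping below is inferred from those few examples only, so it is an assumption that may not hold for every document; `repair_text` is a hypothetical helper.

```python
import unicodedata

# Assumed mapping, inferred from the raw string printed earlier.
CONTROL_CHAR_FIXES = {"\x01": "tt", "\x02": "ti"}


def repair_text(text):
    """Undo the ligature damage seen in the extracted PDF text."""
    for bad, good in CONTROL_CHAR_FIXES.items():
        text = text.replace(bad, good)
    # NFKC expands compatibility ligatures such as U+FB03 to "ffi".
    text = unicodedata.normalize("NFKC", text)
    # Drop private-use characters (category "Co"), e.g. the \uf105 icon.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Co")
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())


sample = ("Home \uf105Legislative File 2022-00241 \uf105RCA Legal Se\x01lement: "
          "Workers' Compensa\x02on claim of Kurt Radke")
print(repair_text(sample))
print(repair_text("City A\x01orney's Oﬃce to eﬀectuate the se\x01lement."))
```

If the mapping holds, "City A\x01orney's Oﬃce" becomes "City Attorney's Office" again, which gives the tagger one intact phrase instead of two fragments. It could be applied to the whole column with something like `df["text"].map(repair_text)` before building `Sentence` objects.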

### Try full dataset

In [47]:
for i, t in enumerate(df["text"][:10]):
    print(f"{i}: {t}\n")

0: Home Legislative File 2021-01132 RCA Legal selement: Workers' Compensaon claim of Andrew Allen (RCA-2021-01206) ORIGINATING DEPARTMENT Finance & Property Services To Commiee(s) # Commiee Name Meeng Date 1 Policy & Government Oversight Commiee Oct 20, 2021 LEAD STAFF: Emily Ann Colby PRESENTED BY: Emily Ann Colby Acon Item(s) # File Type Subcategory Item Descripon 1 Acon Selement Approving the selement of the Workers' Compensaon claim of Andrew Allen, by payment of $170,000 to Andrew Allen and aorney, Meuser Law Firm, and authorizing the City Aorney's Oﬃce to execute any documents necessary to eﬀectuate the selement. Previous Acons None RCA-2021-01206 - Legal settlement: Workers' Compensation claim of ... https://lims.minneapolismn.gov/RCA/8755 1 of 2 12/4/2021, 1:33 AM Ward / Neighborhood / Address # Ward Neighborhood Address 1. Not Applicable Background Analysis City of Minneapolis employee sustained work-related injuries. The pares reached a tentave selement o

In [48]:
# Try a court case
sentence = Sentence(df.loc[2, "text"])
loc_tagger.predict(sentence)
print(sentence)



In [49]:
# print each label value, confidence score, and corresponding text
for label in sentence.get_labels():
    print(f'label.value is: "{label.value}"')
    print(f'label.score is: "{label.score}"')
    print(f'the text of label.data_point is: "{label.data_point.text}"\n')

label.value is: "nk>"
label.score is: "0.9539910384586879"
the text of label.data_point is: "United States District Court, D. Minnesota"

label.value is: "nk>"
label.score is: "0.9209363609552383"
the text of label.data_point is: "Minneapolis, MN,"

label.value is: "nk>"
label.score is: "0.6935136169195175"
the text of label.data_point is: "Nickitas Law O:"

label.value is: "nk>"
label.score is: "0.934529983997345"
the text of label.data_point is: "St. Paul, MN,"

label.value is: "nk>"
label.score is: "0.9534521847963333"
the text of label.data_point is: "Minneapolis, MN,"

label.value is: "nk>"
label.score is: "0.9268128275871277"
the text of label.data_point is: "Bloomington, MN,"

label.value is: "nk>"
label.score is: "0.9698551297187805"
the text of label.data_point is: "Goodhue County."

label.value is: "nk>"
label.score is: "0.9572806358337402"
the text of label.data_point is: "Minneapolis, MN,"

label.value is: "nk>"
label.score is: "0.995135098695755"
the text of label.data_poi

There are many errors, such as "Minnesota law", ",", and "Missouri Family" being predicted as locations. Is the weird text causing those issues, or something else?

Now we compare the results of treating one whole document as a single sentence with the results of splitting the document into sentences using a sentence splitter.

In [86]:
doc = df.loc[7, "text"]
one_sentence = Sentence(doc)
loc_tagger.predict(one_sentence)
print(one_sentence)

Sentence[864]: "4/12/2021 Minnesota BCA releases more information about Brooklyn Center police shooting - StarTribune.com https://www.startribune.com/bca-releases-more-information-about-brooklyn-center-police-shooting/559525032/ 1/2 ___ NORTH METRO Minnesota BCA releases more information about Brooklyn Center police shooting The 21-year-old man who was killed could not be subdued with Tasers, the agency said.  By STAFF REPORTS  SEPTEMBER 5, 2019 — 9:11PM The 21-year-old man fatally shot by police Saturday in Brooklyn Center had threatened officers with a knife and could not be subdued with Tasers, the Minnesota Bureau of Criminal Apprehension said Thursday. Kobe Edgar Dimock-Heisler, who lived with his grandparents in the 5900 block of N. Halifax Avenue, died after being shot several times inside the home just after 4 p.m. Saturday. The BCA identified the three officers who discharged their weapons as: • Brandon Akers, who has been with the Brooklyn Center department for eight years. H

In [87]:
# print labels from predicting the whole doc as one sentence
for label in one_sentence.get_labels():
    print(f"Text: {label.data_point.text} | Score: {round(label.score, 3)}\n")

Text: Minnesota BCA | Score: 0.944

Text: Brooklyn Center | Score: 0.993

Text: NORTH METRO Minnesota BCA | Score: 0.941

Text: Brooklyn Center | Score: 0.991

Text: Brooklyn Center | Score: 0.997

Text: Minnesota Bureau | Score: 0.879

Text: 5900 block of N. Halifax Avenue, | Score: 0.99

Text: Brooklyn Center department | Score: 0.866

Text: scene | Score: 0.53

Text: home | Score: 0.81

Text: Hennepin County Attorney's Office | Score: 0.907

Text: Hennepin County | Score: 0.988

Text: North Memorial Health Hospital | Score: 0.991

Text: Robbinsdale, | Score: 0.991

Text: Minneapolis | Score: 0.996

Text: . | Score: 0.901

Text: Minnesota BCA | Score: 0.876

Text: Brooklyn Center | Score: 0.992

Text: North Memorial | Score: 0.994

Text: Mayday | Score: 0.564

Text: Open Streets. | Score: 0.738

Text: North Shore | Score: 0.99

Text: Canada. | Score: 0.837

Text: Estes Funeral Chapel in Minneapolis. | Score: 0.918

Text: writers | Score: 0.676



In [88]:
from flair.splitter import SegtokSentenceSplitter

splitter = SegtokSentenceSplitter() # splits text into a list of sentences
sentences = splitter.split(doc) # split text into Sentence objects
sentences_text = [s.text for s in sentences] # get just the text from each sentence object

split_doc = "\n".join(sentences_text)        # turn the list of sentences into a string, w/ each sentence separated by a newline.
print(split_doc)                             # print this for better readability

4/12/2021 Minnesota BCA releases more information about Brooklyn Center police shooting - StarTribune.com https://www.startribune.com/bca-releases-more-information-about-brooklyn-center-police-shooting/559525032/ 1/2 ___ NORTH METRO Minnesota BCA releases more information about Brooklyn Center police shooting The 21-year-old man who was killed could not be subdued with Tasers, the agency said.
By STAFF REPORTS  SEPTEMBER 5, 2019 — 9:11PM The 21-year-old man fatally shot by police Saturday in Brooklyn Center had threatened officers with a knife and could not be subdued with Tasers, the Minnesota Bureau of Criminal Apprehension said Thursday.
Kobe Edgar Dimock-Heisler, who lived with his grandparents in the 5900 block of N.
Halifax Avenue, died after being shot several times inside the home just after 4 p.m.
Saturday.
The BCA identified the three officers who discharged their weapons as: • Brandon Akers, who has been with the Brooklyn Center department for eight years.
He discharged his 

Mistakes when treating the whole document as one sentence:
- "Minnesota Bureau" is predicted instead of "Minnesota Bureau of Criminal Apprehension"
- "." and "writers" are predicted as locations
- Punctuation is attached to location names
    - "Robbinsdale," is predicted instead of "Robbinsdale"
    - "Canada." is predicted instead of "Canada"
- "Mayday" is a parade and "Open Streets" is a community event. They are not locations.
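The punctuation problem in the list above could be handled with a small post-processing step on the predicted span text. This is a sketch under a simplifying assumption: all leading and trailing punctuation is noise. That also removes the period from abbreviations at the end of a span ("5900 block of N." becomes "5900 block of N"), so it may need refining. `clean_span_text` is a hypothetical helper, not a Flair function.

```python
import string


def clean_span_text(text):
    """Strip surrounding punctuation and whitespace from a predicted span.

    Returns None when nothing is left, which covers spans like "." or ",".
    """
    cleaned = text.strip(string.punctuation + string.whitespace)
    return cleaned or None


# Span texts copied from the predictions printed above.
predicted = ["Robbinsdale,", "Canada.", ".", "Minneapolis, MN,", "Hennepin County"]
cleaned = [c for c in (clean_span_text(p) for p in predicted) if c is not None]
print(cleaned)
```

Only punctuation at the edges is touched, so the comma inside "Minneapolis, MN," survives while the trailing one is removed, and the bare "." prediction is discarded.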

In [None]:
loc_tagger.predict(sentences) # run prediction on the list of sentences created with the SegtokSentenceSplitter

for sentence in sentences:
    for label in sentence.get_labels():
        print(f"Text: {label.data_point.text} | Score: {round(label.score, 3)}\n")

Text: Minnesota BCA | Score: 0.957

Text: Brooklyn Center | Score: 0.993

Text: NORTH METRO Minnesota BCA | Score: 0.926

Text: Brooklyn Center | Score: 0.991

Text: Brooklyn Center | Score: 0.996

Text: Minnesota Bureau | Score: 0.855

Text: 5900 block of N. | Score: 0.988

Text: Halifax Avenue, | Score: 0.992

Text: Brooklyn Center | Score: 0.993

Text: residence | Score: 0.597

Text: scene | Score: 0.548

Text: home | Score: 0.742

Text: Hennepin County Attorney's Office | Score: 0.985

Text: Hennepin County | Score: 0.993

Text: North Memorial Health Hospital | Score: 0.986

Text: Robbinsdale, | Score: 0.988

Text: Minneapolis | Score: 0.997

Text: bridge. | Score: 0.944

Text: Minnesota | Score: 0.978

Text: Brooklyn Center | Score: 0.993

Text: North Memorial | Score: 0.992

Text: hospital | Score: 0.53

Text: psychiatric ward | Score: 0.643

Text: Mayday | Score: 0.662

Text: Open Streets. | Score: 0.786

Text: North Shore | Score: 0.995

Text: Canada. | Score: 0.998

Text: Este

Mistakes when predicting a list of sentences:
- The sentence splitter makes mistakes! It splits at periods inside abbreviations, as in "N. Halifax Avenue" or "4 p.m. Saturday".
- The tagger then predicts "5900 block of N." and "Halifax Avenue," as separate locations.
- Most of the same mistakes as before, when we treated the doc as one sentence.
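One way to soften the splitter problem might be to re-join fragments whose predecessor ends in a known abbreviation. The sketch below works on plain strings (for example the `sentences_text` list built earlier) and uses a small, hand-picked abbreviation set that is purely an assumption. `merge_abbreviation_splits` is a hypothetical helper, and it will wrongly merge two sentences whenever the first one genuinely ends with "p.m." or a similar abbreviation.

```python
# Assumed abbreviation list; extend it for the abbreviations in the corpus.
ABBREVIATIONS = {"N.", "S.", "E.", "W.", "St.", "Ave.", "a.m.", "p.m."}


def merge_abbreviation_splits(sentences, abbreviations=ABBREVIATIONS):
    """Re-join sentence fragments that were split right after an abbreviation."""
    merged = []
    for sent in sentences:
        prev_words = merged[-1].split() if merged else []
        if prev_words and prev_words[-1] in abbreviations:
            # The previous fragment ended in an abbreviation: glue this one on.
            merged[-1] = f"{merged[-1]} {sent}"
        else:
            merged.append(sent)
    return merged


# Fragments copied from the splitter output above.
fragments = [
    "Kobe Edgar Dimock-Heisler, who lived with his grandparents in the 5900 block of N.",
    "Halifax Avenue, died after being shot several times inside the home just after 4 p.m.",
    "Saturday.",
    "The BCA identified the three officers who discharged their weapons as: "
    "• Brandon Akers, who has been with the Brooklyn Center department for eight years.",
]

repaired = merge_abbreviation_splits(fragments)
for sentence_text in repaired:
    print(sentence_text)
```

The first three fragments are joined back into one sentence, so "5900 block of N. Halifax Avenue," is again available to the tagger as a contiguous string. Each repaired string could then be wrapped in `Sentence(...)` before calling `loc_tagger.predict`.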

Advantages when predicting a list of sentences:
- Doesn't predict "." or "writers" as a location.
- Tags more locations, like "bridge", "hospital", or "psychiatric ward"

### Issues with the HuggingFace model

- Words like "sidewalk", "hospital", or "police station" are being tagged as locations, but we only want geographic locations.
    - Cross-check the fine-tuned NER model with the regular Flair ones?
- All the errors when I run it on the full dataset
    - Most errors are not about the weird characters, but some are.
- Should I tokenize articles into sentences and run one prediction per sentence? That gave more accuracy in some ways and less accuracy in other ways. Also, the sentence splitter is incorrect.

Next step: tokenizing the sentences
- Is it actually important that I'm splitting into sentences, or do I just need to split the text into shorter chunks? Is it getting things wrong 
    - e.g.: split every 50 words
- Or use TopoBERT
- Don't worry about the "subscribe here" sentences
- Use a NER model to extract dates.
- We can use the LLM to determine if the 
- Prof M can help me install TopoBERT with the Pixi package manager
    - Link: https://pixi.sh/latest/
    - https://pixi.sh/latest/installation/ 
- Pixi is the easiest way to do package management. You can have your environment connected to the project you're working on. You don't have to activate a venv. Prof M has that set up on the GPU
    - Use `pixi add` instead of pip/conda install
- Prof M will be on campus on Tue, Wed, and Thurs, for most of the day.
- Don't need to use TopoBERT, but I can if I want.
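To test the "split every 50 words" idea from the notes above, a fixed-size chunker could look roughly like this. It is a minimal sketch: `chunk_words` and its `overlap` parameter are hypothetical and not from the notes. The overlap is one possible way to reduce the chance that an address is cut in half at a chunk boundary.

```python
def chunk_words(text, size=50, overlap=0):
    """Split text into chunks of at most `size` whitespace-separated words.

    Consecutive chunks share `overlap` words, so a phrase cut at one
    boundary still appears whole in the neighbouring chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop early so the last chunk is never fully contained in the previous one.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


# Dummy 120-word document standing in for one article.
doc_text = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(doc_text, size=50)
print([len(chunk.split()) for chunk in chunks])
```

Each chunk could then be wrapped in a `Sentence` and passed to the tagger as a list, the same way the splitter output was. Comparing those predictions with the whole-document and sentence-split runs would show whether sentence boundaries matter or only the input length does.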