## Classifying and Labeling PII with Named Entity Recognition

**First, some basics**

In [9]:
import spacy
# Load spaCy's small pre-trained English pipeline
nlp = spacy.load("en_core_web_sm")

In [10]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [7]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

### Classifying text

In [20]:
#text = "Tesla Inc will not be acquiring Twitter for $45 billion, Noble Ackerson will when he gets back from Silicon Valley."
text = "In west Philadelphia born and raised. On the playground was where Noble Ackerson spent most of his days. "
text += "I got in one little fight and my mom got scared. "
text += "She said 'You're movin' with your auntie and uncle in Bel Air'!"
text += "Whistled for a cab and paid $50 with my credit card which is VISA, '4539 6273 2481 7107' and expires 4/2026"
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Philadelphia | GPE | Countries, cities, states
Noble Ackerson | PERSON | People, including fictional
his days | DATE | Absolute or relative dates or periods
Bel | GPE | Countries, cities, states
50 | MONEY | Monetary values, including unit
VISA | ORG | Companies, agencies, institutions, etc.
4/2026 | CARDINAL | Numerals that do not fall under another type
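
The run above returns no entity for the card number itself, and the label list printed earlier has no card-specific label. One common complement to a statistical NER model is a rule-based pass. The sketch below is an assumption rather than part of the original notebook: it combines a regular expression with the Luhn checksum to flag candidate card numbers. The names `CARD_PATTERN`, `luhn_valid`, and `find_card_numbers` are hypothetical.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
# (Hypothetical pattern for illustration; real card formats vary by issuer.)
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, digits) for each Luhn-valid candidate in text."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append((match.start(), match.end(), digits))
    return hits


sample = "paid $50 with my credit card which is VISA, '4539 6273 2481 7107' and expires 4/2026"
print(find_card_numbers(sample))
```

The character offsets returned here could, for example, be turned into a custom-labeled span and merged into `doc.ents`, but that wiring is left out of this sketch.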


In [21]:
from spacy import displacy

displacy.render(doc, style="ent")

In [23]:
# Inspect a single token by index; a slice such as doc[2:5] would return a Span instead
doc[12]

Noble

In [24]:
from spacy.tokens import Span

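# Manually label token 0 ("In") and token 5 ("raised") as ORG to show how entities can be set by hand
s1 = Span(doc, 0, 1, label="ORG")
s2 = Span(doc, 5, 6, label="ORG")

# default="unmodified" keeps the entities the model already assigned
doc.set_ents([s1, s2], default="unmodified")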

In [25]:
for ent in doc.ents:
    print(ent.text, "|", ent.label_)

In | ORG
Philadelphia | GPE
raised | ORG
Noble Ackerson | PERSON
his days | DATE
Bel | GPE
50 | MONEY
VISA | ORG
4/2026 | CARDINAL


### NER on Google Cloud

#### Data Prep

But first, a quick experiment

In [None]:
#%pip install transformers[tensorflow]
#%pip install tensorflow

In [2]:
from transformers import pipeline

# Specify the model and revision
model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
revision = "f2482bf"

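# aggregation_strategy="simple" merges word pieces into whole entities (it replaces the deprecated grouped_entities=True)
ner = pipeline("ner", model=model_name, revision=revision, aggregation_strategy="simple")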

sequence = "In west Philadelphia born and raised. On the playground was where I spent most of my days. "
sequence += "I got in one little fight and my mom got scared. "
sequence += "She said 'You're movin' with your auntie and uncle in Bel Air'!"
sequence += "Whistled for a cab and paid $50 with my credit card my VISA, 4539627324817107, 4/2026, 435"

output = ner(sequence)

print(output)

[{'entity_group': 'LOC', 'score': 0.9977132, 'word': 'Philadelphia', 'start': 8, 'end': 20}, {'entity_group': 'LOC', 'score': 0.9963516, 'word': 'Bel Air', 'start': 194, 'end': 201}]
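
Each dictionary in the output carries `start` and `end` character offsets into the input string, which is enough to label or mask the matched text. The following is a minimal sketch of that idea, not part of the original experiment: the `redact` helper is hypothetical, and the entity list is copied by hand from the output above (scores omitted) so the block runs without downloading the model.

```python
def redact(text: str, entities: list[dict]) -> str:
    """Replace each entity span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


sequence = "In west Philadelphia born and raised. On the playground was where I spent most of my days. "
sequence += "I got in one little fight and my mom got scared. "
sequence += "She said 'You're movin' with your auntie and uncle in Bel Air'!"
sequence += "Whistled for a cab and paid $50 with my credit card my VISA, 4539627324817107, 4/2026, 435"

# Entities copied from the pipeline output shown above.
output = [
    {"entity_group": "LOC", "word": "Philadelphia", "start": 8, "end": 20},
    {"entity_group": "LOC", "word": "Bel Air", "start": 194, "end": 201},
]

redacted = redact(sequence, output)
print(redacted)
```

The card number is still present in the redacted string, because the model returned no entity for it.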


In [None]:
####Pseudocode:####

# install relevant new libraries

# define file path "reviews.csv"

# Load the input dataset

# Create classification model

# Classify reviews

# View results
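
One way the pseudocode steps above might fit together is sketched below, using only the standard library. Everything specific in it is an assumption: the `review` column name, the `label_reviews` helper, and the `stub_ner` stand-in model are hypothetical, and an in-memory string takes the place of `reviews.csv`. The sketch does not assume any particular cloud service.

```python
import csv
import io


def label_reviews(rows, ner, text_column="review"):
    """Run an NER callable over each row and attach the entities it returns."""
    results = []
    for row in rows:
        text = row[text_column]
        results.append({"text": text, "entities": ner(text)})
    return results


def stub_ner(text):
    """Stand-in model for illustration only: flags the literal word 'Philadelphia'."""
    word = "Philadelphia"
    start = text.find(word)
    if start == -1:
        return []
    return [{"entity_group": "LOC", "word": word,
             "start": start, "end": start + len(word)}]


# In-memory stand-in for reviews.csv; the "review" column name is an assumption.
fake_csv = io.StringIO("review\nGreat service in Philadelphia\nArrived late\n")

# Load the input dataset
rows = list(csv.DictReader(fake_csv))

# Classify reviews
results = label_reviews(rows, stub_ner)

# View results
for item in results:
    print(item["text"], "->", [e["entity_group"] for e in item["entities"]])
```

To try this against real data, `fake_csv` could be replaced with `open("reviews.csv", newline="")` and `stub_ner` with any callable that returns a list of entity dictionaries, such as the `ner` pipeline from the experiment above.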


### Out of scope: serving and integrating ###