This example uses spaCy's small English pipeline, en_core_web_sm, to perform named entity recognition (NER).

In [3]:
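import spacy
from spacy import displacy

# Use a GPU if one is available; otherwise spaCy falls back to the CPU
spacy.prefer_gpu()
# Load the small English pipeline (must be downloaded beforehand)
nlp = spacy.load("en_core_web_sm")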

spaCy is now set up with en_core_web_sm, a small pipeline optimized for CPU that trades some accuracy for speed and size, which helps keep costs down. Because it does not require a GPU, it should in principle be deployable on ordinary host machines.



In [5]:

text = """Hello,

I want to make setup a yearly scholarship that provides a student with $40,000 if they are in Price Faculty of Engineering
as well as a $10,000 scholarship if they are in Computer Science. I would like to know if this is possible.

Thanks,

Bruce"""

# Process the text
doc = nlp(text)

displacy.render(doc, style="ent", jupyter=True)

# Extract entities
for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")

Entity: 40,000, Label: MONEY
Entity: Price Faculty of Engineering, Label: ORG
Entity: 10,000, Label: MONEY
Entity: Bruce, Label: PERSON
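
The printed list is flat, so it can help to group the entities by label when checking what the model did and did not find. The following is a minimal sketch, not part of the original notebook: group_by_label is a hypothetical helper (it is not a spaCy function), and the pairs are hard-coded from the output above so the snippet runs without the model. In the notebook they would come from [(ent.text, ent.label_) for ent in doc.ents].

```python
from collections import defaultdict

# (text, label) pairs copied from the printed output above.
# Assumption: in the notebook these would be built from doc.ents instead.
entities = [
    ("40,000", "MONEY"),
    ("Price Faculty of Engineering", "ORG"),
    ("10,000", "MONEY"),
    ("Bruce", "PERSON"),
]


def group_by_label(pairs):
    """Hypothetical helper: collect entity texts under their label."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
for label, texts in by_label.items():
    print(f"{label}: {texts}")
```

Grouped this way, it is easy to see that the ORG list contains only one department even though the email names two.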


Out of the box, the results are not ideal. The model correctly labels the currency amounts and recognizes Price Faculty of Engineering as an organization, but it misses Computer Science entirely and cannot capture task-specific context, such as which amount is allocated to which scholarship or department. Fine-tuning the model will therefore be necessary to improve accuracy.
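
As a starting point for fine-tuning, annotated examples are needed. The sketch below shows one way such an example might be prepared, using the character-offset form (text, {"entities": [(start, end, label)]}) that spaCy's Example.from_dict accepts. Everything here is an assumption made for illustration: annotate is a hypothetical helper, the label names SCHOLARSHIP_AMOUNT and DEPARTMENT are invented, and the sample sentence is a shortened paraphrase of the email above rather than real training data.

```python
def annotate(text, spans):
    """Hypothetical helper: turn (substring, label) pairs into character offsets.

    Substrings are located left to right, so repeated values are matched
    in the order they are listed.
    """
    entities = []
    cursor = 0
    for substring, label in spans:
        start = text.find(substring, cursor)
        if start == -1:
            raise ValueError(f"{substring!r} not found in text")
        end = start + len(substring)
        entities.append((start, end, label))
        cursor = end
    return text, {"entities": entities}


# Shortened paraphrase of the email, used only for illustration.
sample = (
    "I want to set up a scholarship that provides $40,000 to a student in "
    "Price Faculty of Engineering and $10,000 to a student in Computer Science."
)

# Label names are assumptions; a real project would define its own scheme.
record = annotate(
    sample,
    [
        ("$40,000", "SCHOLARSHIP_AMOUNT"),
        ("Price Faculty of Engineering", "DEPARTMENT"),
        ("$10,000", "SCHOLARSHIP_AMOUNT"),
        ("Computer Science", "DEPARTMENT"),
    ],
)

# Check that each offset pair slices back to the intended substring.
for start, end, label in record[1]["entities"]:
    print(label, repr(sample[start:end]))
```

A single record is far too little to train on; the point is only to show the shape of the data that a fine-tuning step would consume.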