# SpaCy toolbox

## Named entity recognition

In [1]:
dialog_text = """Doctor: How are you Miss G? 
Patient: I am good doctor, thank you for asking. 
Doctor: So, tell me what is going on?
Patient: I have this ear pain and headache for some time. It's better than before but I still want to get it checked. 
Doctor: Okay, when exactly did it start?
Patient: Um, almost three weeks ago. I am having difficulty hearing. I also feel this pressure on the left side of my sinus causing tooth pain. I went to my dentist yesterday, but my teeth are fine. 
Doctor: Okay, do you have headache now?
Patient: No, just ear pain and this jaw pain on the left side. 
Doctor: Any fever, cough, sore throat, or any cold like symptoms? 
Patient: No, but I have a sinus problem and I suffer from chronic left sided headache.
Doctor: How old are you?
Patient: Oh, I am forty nine.
Doctor: Hm, so are you taking any medications for your pain?
Patient: No, currently I am just using Cutivate for my eczema. It has helped me a lot, I do need a refill for it. 
Doctor: Okay I will send a prescription for it to your pharmacy.
"""

In [2]:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(dialog_text)

  from .autonotebook import tqdm as notebook_tqdm
  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(


In [3]:
for ent in doc.ents:
    print(ent.text, ent.label_)

Miss G PERSON
almost three weeks ago DATE
yesterday DATE
Cutivate PRODUCT


In [4]:
text2 = """Patient: I just had few questions. Can you tell me about my diagnosis?
Doctor: Sure. It's called Serotonin syndrome, ma'am. After careful evaluation of your labs, we found out that your white count and C P K was high, and those abnormalities lined up with serotonin syndrome. What are you experiencing right now?
Patient: I have been very restless and easily agitated, I have diarrhea. But no fever or shakiness.
Doctor: These can match serotonin syndrome as well. You deny any fever, tremor or hypperflexia so we will give you some IV fluids and I will check on you in an hour or so.
Patient: Okay. 
Doctor: Looks like your C P K counts improved with I V fluids and after discontinuing Prozac.
Patient: How are the counts now? Are they normal? Because I feel normal.
Doctor: Yes, your C P K and white blood cell counts have come back down. Almost normal now.
Patient: My husband left me two weeks ago. My panic attacks are increasing day by day.
Doctor: Okay, I see that you have a history of panic attacks and you do have depression and anxiety, is that correct? Last Friday, I talked to psychiatrist about your issues, and he recommended Cymbalta as an alternative to Prozac. 
Patient: Yes, I stopped taking Prozac, and I am going to see him on Monday or Tuesday. I have a counselor too.
Patient: I do think it will be difficult to go home alone but my daughter is coming to visit me in two weeks.
Doctor: Oh wow.
Patient: Yeah.
Doctor: That's nice. Do you have someone who can drop you home and help you?
Patient: Yes, I have a friend who does that, I am staying with her for next three days.
Doctor: Okay that sounds good. Just continue with your medications for high blood pressure and diabetes as well. So, we treated your imbalance issues and gave you IV fluids, you do not have any more diarrhea, right?
Patient: Yes, that's right."""

## default model identified 0 entities

In [5]:
import medspacy
nlp = medspacy.load( )
doc2 = nlp(text2)
for ent in doc2.ents:
    print(ent.text, ent.label_)

## prebuilt medical model

In [6]:
import en_core_sci_scibert
import en_core_med7_trf


In [7]:
# import scispacy
import spacy

nlp2 = spacy.load("en_core_med7_trf")
doc2 = nlp2(text2)
for ent in doc2.ents:
    print(ent.text, ent.label_)

IV ROUTE
fluids DRUG
fluids DRUG
Prozac DRUG
Cymbalta DRUG
Prozac DRUG
Prozac DRUG
IV ROUTE
fluids DRUG


In [8]:
doc2.ents

(IV, fluids, fluids, Prozac, Cymbalta, Prozac, Prozac, IV, fluids)

## Intent recognitioin / classification

In [9]:
import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Function to recognize intent
def recognize_intent(text):
    doc = nlp(text)
    # Here you can define your intents based on the entities or patterns
    intents = {'greeting': ['hello', 'hi', 'hey'], 'goodbye': ['bye', 'goodbye']}
    for token in doc:
        for intent, keywords in intents.items():
            if token.text.lower() in keywords:
                return intent
    return 'unknown'

# Example usage
user_input = 'Hello, how are you?'
intent = recognize_intent(user_input)
print(f'Intent recognized: {intent}')  # Output: Intent recognized: greeting

Intent recognized: greeting


## Training a Custom Model

In [10]:
from spacy.util import minibatch, compounding
import random
from spacy.training.example import Example

In [11]:
# training_data = [
#     ('Hello, I need help', {'entities': [(0, 5, 'greeting')]}),
#     ('Goodbye, see you later', {'entities': [(0, 7, 'goodbye')]}),
# ]
training_data = [
    ('Hello, I need help', {'cats': {'greeting': 1.0, 'goodbye': 0.0}}),
    ('How is it going', {'cats': {'greeting': 1.0, 'goodbye': 0.0}}),
    ('Goodbye, see you later', {'cats': {'greeting': 0.0, 'goodbye': 1.0}}),
    ('Byebye, see you', {'cats': {'greeting': 0.0, 'goodbye': 1.0}}),
    ('see you', {'cats': {'greeting': 0.0, 'goodbye': 1.0}}),
    ('Hi there', {'cats': {'greeting': 1.0, 'goodbye': 0.0}}),
    ('See you soon', {'cats': {'greeting': 0.0, 'goodbye': 1.0}}),
]

In [12]:
nlp = spacy.blank('en')
# text_cat = nlp.create_pipe('textcat')
text_cat = nlp.add_pipe('textcat', last=True)
text_cat.add_label('greeting')
text_cat.add_label('goodbye')

1

In [13]:
examples = []
for text, annots in training_data:
    examples.append(Example.from_dict(nlp.make_doc(text), annots))
nlp.initialize(lambda: examples)

n_iter = 20
for epoch in range(n_iter):
    random.shuffle(examples)
    losses = {}
    # Create the minibatch generator
    for batch in minibatch(examples, size=8):
        nlp.update(batch, drop=0.3, losses=losses)
    print(losses)


{'textcat': 0.25}
{'textcat': 0.24345222115516663}
{'textcat': 0.23973117768764496}
{'textcat': 0.23233701288700104}
{'textcat': 0.2254914939403534}
{'textcat': 0.2097148895263672}
{'textcat': 0.20613911747932434}
{'textcat': 0.16546247899532318}
{'textcat': 0.16736507415771484}
{'textcat': 0.15204565227031708}
{'textcat': 0.1315755993127823}
{'textcat': 0.07987417280673981}
{'textcat': 0.10579823702573776}
{'textcat': 0.057442452758550644}
{'textcat': 0.049352508038282394}
{'textcat': 0.04665660858154297}
{'textcat': 0.0472964346408844}
{'textcat': 0.017516298219561577}
{'textcat': 0.015405458398163319}
{'textcat': 0.02202540822327137}


In [14]:
# Assuming trained_nlp is your trained model
def predict_text_category(nlp, text):
    doc = nlp(text)
    print("Prediction scores:")
    for label, score in doc.cats.items():
        print(f"{label}: {score}")
    
    # Get the category with the highest score
    predicted_category = max(doc.cats, key=doc.cats.get)
    print(f"Predicted category: {predicted_category}")
    return predicted_category

# Example usage
test_texts = [
    "Hello there",
    "Goodbye",
    "See you later"
]

for text in test_texts:
    print(f"\nAnalyzing: '{text}'")
    predict_text_category(nlp, text)


Analyzing: 'Hello there'
Prediction scores:
greeting: 0.8876179456710815
goodbye: 0.11238212138414383
Predicted category: greeting

Analyzing: 'Goodbye'
Prediction scores:
greeting: 0.16446293890476227
goodbye: 0.8355370163917542
Predicted category: goodbye

Analyzing: 'See you later'
Prediction scores:
greeting: 0.018145233392715454
goodbye: 0.9818547964096069
Predicted category: goodbye


## Rule based solution

In [15]:
import spacy
from spacy.matcher import Matcher

In [16]:
nlp = spacy.load("en_core_web_sm")

In [17]:
matcher = Matcher(nlp.vocab)

In [18]:
pattern = [
    {"LOWER": {"IN": ["print", "generate", "create"]}},  # Action keywords
    {"IS_ALPHA": True, "OP": "*"},  # Allow intermediate words
    {"LOWER": "map"},
    {"IS_ALPHA": True, "OP": "*"},  # Allow intermediate words
    {"LOWER": {"IN": ["hospital", "clinic", "station"]}}  # Target keywords
]
matcher.add("PRINT_MAP", [pattern])

In [19]:
text = """
can you print a map for the Toronto hospital I was wondering if we could generate one for the Guelph clinic what about printing directions to the nearest gas station
"""

In [20]:
doc = nlp(text)

In [21]:
matches = matcher(doc)

In [22]:
for matchid, start, end in matches:
    print(matchid, start, end)
    span = doc[start:end]
    print(f"Matched Intent: {span.text}")
    for token in span:
        if token.text.lower() in ["hospital", "clinic", "station"]:
            # Check for location descriptors
            descriptor = " ".join(child.text for child in token.lefts if child.dep in ["compound", "amod"])
            print(f"Descriptor: {descriptor} {token.text}")

9095104806068616893 3 10
Matched Intent: print a map for the Toronto hospital
Descriptor:  hospital
9095104806068616893 3 22
Matched Intent: print a map for the Toronto hospital I was wondering if we could generate one for the Guelph clinic
Descriptor:  hospital
Descriptor:  clinic
9095104806068616893 3 31
Matched Intent: print a map for the Toronto hospital I was wondering if we could generate one for the Guelph clinic what about printing directions to the nearest gas station
Descriptor:  hospital
Descriptor:  clinic
Descriptor:  station


## rule-based prescription pattern

In [23]:
import spacy
from spacy.matcher import Matcher
from typing import Dict, List, Tuple

def create_prescription_matcher(nlp: spacy.language.Language) -> Matcher:
    """Create a matcher with prescription-related patterns"""
    matcher = Matcher(nlp.vocab)
    
    # Prescription intent patterns
    prescription_patterns = [
        # Direct prescription patterns
        [
            {"LOWER": {"IN": ["prescribe", "prescribing", "prescribed"]}},
            {"IS_ALPHA": True, "OP": "*"},  # Optional words in between
            {"LOWER": {"IN": ["mg", "milligrams", "tablets", "pills", "capsules"]}, "OP": "?"}
        ],
        # Recommendation patterns
        [
            {"LOWER": {"IN": ["recommend", "suggesting", "suggest", "advise"]}},
            {"IS_ALPHA": True, "OP": "*"},
            {"LOWER": {"IN": ["take", "try", "start"]}},
        ],
        # Direct medication mentions
        [
            {"LOWER": {"IN": ["take", "taking"]}},
            {"IS_ALPHA": True, "OP": "*"},
            {"LOWER": {"IN": ["mg", "milligrams", "tablets", "pills", "capsules"]}}
        ],
    ]
    
    matcher.add("PRESCRIPTION_INTENT", prescription_patterns)
    return matcher

def extract_dosage(doc: spacy.tokens.Doc) -> List[Dict]:
    """Extract dosage information from text"""
    dosage_info = []
    
    for ent in doc.ents:
        # Look for quantities and measurements
        if ent.label_ in ["QUANTITY", "CARDINAL"]:
            next_token = doc[ent.end].text.lower() if ent.end < len(doc) else ""
            if next_token in ["mg", "milligrams", "tablets", "pills", "capsules"]:
                dosage_info.append({
                    "amount": ent.text,
                    "unit": next_token
                })
    
    return dosage_info

def analyze_prescription_intent(text: str) -> Dict:
    """
    Analyze text for prescription intents and extract relevant information
    
    Args:
        text: The conversation text to analyze
        
    Returns:
        Dictionary containing prescription intent information
    """
    # Load spaCy model
    nlp = spacy.load("en_core_web_sm")
    
    # Process text
    doc = nlp(text)
    
    # Create and use matcher
    matcher = create_prescription_matcher(nlp)
    matches = matcher(doc)
    
    # Extract results
    result = {
        "has_prescription_intent": len(matches) > 0,
        "matches": [],
        "dosage_info": extract_dosage(doc)
    }
    
    # Get context for each match
    for match_id, start, end in matches:
        span = doc[start:end]
        result["matches"].append({
            "text": span.text,
            "sentence": span.sent.text.strip()
        })
    
    return result

# Example usage
if __name__ == "__main__":
    # Test conversations
    conversations = [
        "I'm going to prescribe you 500mg of amoxicillin three times a day.",
        "I recommend taking two tablets of ibuprofen 200mg for your pain.",
        "Let's continue with your current medications.",
        "I suggest you start taking vitamin D supplements.",
    ]
    
    for conversation in conversations:
        result = analyze_prescription_intent(conversation)
        print("\nAnalyzing:", conversation)
        print("Has prescription intent:", result["has_prescription_intent"])
        if result["has_prescription_intent"]:
            print("Matches found:", [m["text"] for m in result["matches"]])
            if result["dosage_info"]:
                print("Dosage information:", result["dosage_info"])


Analyzing: I'm going to prescribe you 500mg of amoxicillin three times a day.
Has prescription intent: True
Matches found: ['prescribe', 'prescribe you']
Dosage information: [{'amount': '500', 'unit': 'mg'}]

Analyzing: I recommend taking two tablets of ibuprofen 200mg for your pain.
Has prescription intent: True
Matches found: ['taking two tablets']
Dosage information: [{'amount': 'two', 'unit': 'tablets'}, {'amount': '200', 'unit': 'mg'}]

Analyzing: Let's continue with your current medications.
Has prescription intent: False

Analyzing: I suggest you start taking vitamin D supplements.
Has prescription intent: True
Matches found: ['suggest you start']


## statistical model based prescription classification

In [24]:
import spacy
from spacy.util import minibatch, compounding
import random
from spacy.training.example import Example

# Training data for prescription intent
training_data = [
    ("I'll prescribe you 500mg of amoxicillin", {'cats': {'prescription': 1.0, 'non_prescription': 0.0}}),
    ("Take two tablets three times a day", {'cats': {'prescription': 1.0, 'non_prescription': 0.0}}),
    ("I recommend taking this medication", {'cats': {'prescription': 1.0, 'non_prescription': 0.0}}),
    ("Let's start you on antibiotics", {'cats': {'prescription': 1.0, 'non_prescription': 0.0}}),
    ("You should take 200mg ibuprofen", {'cats': {'prescription': 1.0, 'non_prescription': 0.0}}),
    ("How are you feeling today?", {'cats': {'prescription': 0.0, 'non_prescription': 1.0}}),
    ("Tell me about your symptoms", {'cats': {'prescription': 0.0, 'non_prescription': 1.0}}),
    ("Your blood pressure looks normal", {'cats': {'prescription': 0.0, 'non_prescription': 1.0}}),
    ("Let's schedule a follow-up", {'cats': {'prescription': 0.0, 'non_prescription': 1.0}}),
    ("I'll refer you to a specialist", {'cats': {'prescription': 0.0, 'non_prescription': 1.0}}),
]

# Initialize spaCy
nlp = spacy.blank('en')
text_cat = nlp.add_pipe('textcat', last=True)
text_cat.add_label('prescription')
text_cat.add_label('non_prescription')

# Prepare training examples
examples = []
for text, annots in training_data:
    examples.append(Example.from_dict(nlp.make_doc(text), annots))
nlp.initialize(lambda: examples)

# Training loop
n_iter = 20
for epoch in range(n_iter):
    random.shuffle(examples)
    losses = {}
    for batch in minibatch(examples, size=8):
        nlp.update(batch, drop=0.3, losses=losses)
    print(f"Epoch {epoch}, Losses:", losses)

def predict_prescription_intent(nlp, text):
    doc = nlp(text)
    print("\nAnalyzing:", text)
    print("Prediction scores:")
    for label, score in doc.cats.items():
        print(f"{label}: {score:.3f}")
    
    predicted_category = max(doc.cats, key=doc.cats.get)
    print(f"Predicted category: {predicted_category}")
    return predicted_category


Epoch 0, Losses: {'textcat': 0.49726206064224243}
Epoch 1, Losses: {'textcat': 0.49990296363830566}
Epoch 2, Losses: {'textcat': 0.4953669607639313}
Epoch 3, Losses: {'textcat': 0.46809716522693634}
Epoch 4, Losses: {'textcat': 0.4745233505964279}
Epoch 5, Losses: {'textcat': 0.47330568730831146}
Epoch 6, Losses: {'textcat': 0.4044618010520935}
Epoch 7, Losses: {'textcat': 0.43290160596370697}
Epoch 8, Losses: {'textcat': 0.3822507709264755}
Epoch 9, Losses: {'textcat': 0.33366259932518005}
Epoch 10, Losses: {'textcat': 0.3010321408510208}
Epoch 11, Losses: {'textcat': 0.291446715593338}
Epoch 12, Losses: {'textcat': 0.20964667946100235}
Epoch 13, Losses: {'textcat': 0.1929708644747734}
Epoch 14, Losses: {'textcat': 0.12446855753660202}
Epoch 15, Losses: {'textcat': 0.06889882311224937}
Epoch 16, Losses: {'textcat': 0.058809464797377586}
Epoch 17, Losses: {'textcat': 0.04528079181909561}
Epoch 18, Losses: {'textcat': 0.024109036661684513}
Epoch 19, Losses: {'textcat': 0.007771807024255

In [25]:

# Test the model
test_texts = [
    "I'm prescribing 20mg of lisinopril daily",
    "Please describe your pain level",
    "Take two tablets before bedtime",
    "We'll need to run some tests first",
    "Start with 500mg twice daily"
]

print("\nTesting the model:")
for text in test_texts:
    predict_prescription_intent(nlp, text)


Testing the model:

Analyzing: I'm prescribing 20mg of lisinopril daily
Prediction scores:
prescription: 0.899
non_prescription: 0.101
Predicted category: prescription

Analyzing: Please describe your pain level
Prediction scores:
prescription: 0.199
non_prescription: 0.801
Predicted category: non_prescription

Analyzing: Take two tablets before bedtime
Prediction scores:
prescription: 0.944
non_prescription: 0.056
Predicted category: prescription

Analyzing: We'll need to run some tests first
Prediction scores:
prescription: 0.616
non_prescription: 0.384
Predicted category: prescription

Analyzing: Start with 500mg twice daily
Prediction scores:
prescription: 0.961
non_prescription: 0.039
Predicted category: prescription
