## **Step 1: Install Required Libraries**

In [9]:
!pip install spacy tqdm



## **Step 2: Download the spaCy Model**

In [10]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.4.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.1/en_core_web_sm-3.4.1-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 0.0/12.8 MB ? eta -:--:--
      --------------------------------------- 0.3/12.8 MB ? eta -:--:--
     -- ------------------------------------- 0.8/12.8 MB 2.6 MB/s eta 0:00:05
     ---- ----------------------------------- 1.3/12.8 MB 2.5 MB/s eta 0:00:05
     ----- ---------------------------------- 1.8/12.8 MB 2.5 MB/s eta 0:00:05
     ------- -------------------------------- 2.4/12.8 MB 2.4 MB/s eta 0:00:05
     --------- ------------------------------ 2.9/12.8 MB 2.5 MB/s eta 0:00:05
     ---------- ----------------------------- 3.4/12.8 MB 2.5 MB/s eta 0:00:04
     ------------ --------------------------- 3.9/12.8 MB 2.4 MB/s eta 0:00:04
     ------------- -------------------------- 4.5/12.8 MB 2.4 MB/s eta 0:00:04
     --------------- ------------------------ 5.

## **1️. Define Entities for NER**

Since you want to recognize which product a customer is asking about, you need to identify relevant entities.
For example:

Product Names (PRODUCT) → "Dental EMR", "Practice Management Software"

Service Types (SERVICE) → "Revenue Cycle Management", "Medical Billing"

Company Name (COMPANY) → "SRS Web Solutions"

## **2️. Choose an NER Approach**

You have two main options:

🔹 Option 1: Pre-trained NER Model (Fast but Less Accurate)
1. Use a Hugging Face model like Blaze999/Medical-NER or dbmdz/bert-large-cased-finetuned-conll03-english.

2. Extract entities from customer queries.

3. Pros: Quick setup

4. Cons: May not recognize your custom products well.

🔹 Option 2: Custom NER Model (Best for Accuracy)

1. Train a spaCy NER or Fine-tune BERT with labeled sales data.

2. Examples:

    . SRS Web Solutions offers Dental EMR. → { "text": "Dental EMR", "label": "PRODUCT" }

    .  I need help with medical billing. → { "text": "medical billing", "label": "SERVICE" }

3. Pros: High accuracy for your products

4. cons: Needs training data.

## **3️. Implement a Basic NER Pipeline**

Here’s a simple NER setup using spaCy:

In [8]:
import spacy
from spacy.tokens import DocBin
from spacy.training import Example
import random
import os
from tqdm import tqdm

# Try to import tqdm for progress bars, fall back to a simple function if not available
try:
    from tqdm import tqdm
except ImportError:
    # Simple function to use if tqdm is not available
    def tqdm(iterable, **kwargs):
        print(f"Starting {kwargs.get('desc', 'process')}...")
        return iterable

# Expanded training data with variations
TRAIN_DATA = [
    # Original examples
    ("I want to buy Dental EMR.", {"entities": [(16, 27, "PRODUCT")]}),
    ("Does your company sell Dental EMR software?", {"entities": [(24, 34, "PRODUCT")]}),
    ("Can I get details on Practice Management Software?", {"entities": [(21, 49, "PRODUCT")]}),
    ("Tell me about your Revenue Cycle Management service.", {"entities": [(19, 48, "SERVICE")]}),
    ("Does your service include Medical Billing?", {"entities": [(26, 40, "SERVICE")]}),
    ("What are the benefits of Telemedicine Solutions?", {"entities": [(24, 47, "PRODUCT")]}),
    ("I'm interested in a cloud-based Dental EMR system.", {"entities": [(28, 38, "PRODUCT")]}),
    ("How does your Medical Billing software work?", {"entities": [(12, 27, "SERVICE")]}),
    ("Can you provide a demo of Revenue Cycle Management?", {"entities": [(25, 54, "SERVICE")]}),
    ("Do you offer cloud-based Telemedicine Solutions?", {"entities": [(23, 46, "PRODUCT")]}),
    
    # Additional examples to improve detection
    ("Tell me about Dental EMR.", {"entities": [(14, 25, "PRODUCT")]}),
    ("I need information on Dental EMR products.", {"entities": [(21, 32, "PRODUCT")]}),
    ("Telemedicine Solutions are becoming popular.", {"entities": [(0, 23, "PRODUCT")]}),
    ("Revenue Cycle Management improves cashflow.", {"entities": [(0, 29, "SERVICE")]}),
    ("Medical Billing services can reduce costs.", {"entities": [(0, 15, "SERVICE")]}),
    ("We specialize in Practice Management Software.", {"entities": [(17, 45, "PRODUCT")]}),
    ("Is Dental EMR included in your package?", {"entities": [(3, 14, "PRODUCT")]}),
    ("How much does the Telemedicine Solutions cost?", {"entities": [(18, 41, "PRODUCT")]}),
    ("Are your Revenue Cycle Management tools cloud-based?", {"entities": [(9, 38, "SERVICE")]}),
    ("Medical Billing is essential for healthcare providers.", {"entities": [(0, 15, "SERVICE")]})
]

# Function to create and train NER model
def train_healthcare_ner(use_pretrained=False):
    # Create NLP object - try to use pretrained model if requested
    if use_pretrained:
        try:
            print("Attempting to load pretrained model...")
            nlp = spacy.load("en_core_web_sm")
            print("Pretrained model loaded successfully!")
        except Exception as e:
            print(f"Could not load pretrained model: {e}")
            print("Falling back to blank model...")
            nlp = spacy.blank("en")
    else:
        print("Using blank model as requested...")
        nlp = spacy.blank("en")
    
    # Add NER pipeline if not present
    if "ner" not in nlp.pipe_names:
        ner = nlp.add_pipe("ner", last=True)
        print("Added NER pipeline component.")
    else:
        ner = nlp.get_pipe("ner")
        print("Using existing NER pipeline component.")
    
    # Add entity labels
    print("Adding entity labels...")
    labels = set()
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get("entities"):
            labels.add(ent[2])
    
    for label in labels:
        ner.add_label(label)
    print(f"Added {len(labels)} labels: {', '.join(labels)}")
    
    # Split data into training and evaluation sets (80/20 split)
    print("Splitting data into training and evaluation sets...")
    random.shuffle(TRAIN_DATA)
    split = int(len(TRAIN_DATA) * 0.8)
    train_data = TRAIN_DATA[:split]
    eval_data = TRAIN_DATA[split:]
    print(f"Training data: {len(train_data)} examples")
    print(f"Evaluation data: {len(eval_data)} examples")
    
    # Configure training parameters
    n_iter = 100
    batch_size = 4
    
    # Disable other pipelines during training for better performance
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    print(f"Disabling other pipelines during training: {', '.join(other_pipes) if other_pipes else 'none'}")
    
    # Train with better parameters
    with nlp.disable_pipes(*other_pipes):
        # Only train NER
        optimizer = nlp.begin_training()
        print(f"Starting training for {n_iter} iterations...")
        
        # Training loop
        for i in range(n_iter):
            # Shuffle training data
            random.shuffle(train_data)
            batches = list(spacy.util.minibatch(train_data, size=batch_size))
            losses = {}
            
            # Process batches with progress bar
            for batch in tqdm(batches, desc=f"Iteration {i+1}/{n_iter}"):
                examples = []
                for text, annotations in batch:
                    doc = nlp.make_doc(text)
                    example = Example.from_dict(doc, annotations)
                    examples.append(example)
                
                # Update with appropriate dropout (higher to prevent overfitting)
                nlp.update(
                    examples,
                    drop=0.3,
                    losses=losses,
                    sgd=optimizer
                )
            
            # Print loss for this iteration
            print(f"Iteration {i+1}, Loss: {losses}")
            
            # Evaluate on the evaluation set every 10 iterations
            if i % 10 == 9 or i == n_iter - 1:
                # Calculate evaluation metrics
                correct = 0
                total_gold = 0
                total_pred = 0
                
                for text, annot in eval_data:
                    doc_gold_text = nlp.make_doc(text)
                    gold = Example.from_dict(doc_gold_text, annot)
                    pred_value = nlp(text)
                    
                    gold_ents = set([(ent.text, ent.label_) for ent in gold.reference.ents])
                    pred_ents = set([(ent.text, ent.label_) for ent in pred_value.ents])
                    
                    correct += len(gold_ents & pred_ents)
                    total_gold += len(gold_ents)
                    total_pred += len(pred_ents)
                
                precision = correct / total_pred if total_pred > 0 else 0
                recall = correct / total_gold if total_gold > 0 else 0
                f_score = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
                
                print(f"Evaluation - Precision: {precision:.2f}, Recall: {recall:.2f}, F-score: {f_score:.2f}")
    
    # Save the trained model
    output_dir = "healthcare_ner_model"
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        
    nlp.to_disk(output_dir)
    print(f"Model saved to {output_dir}")
    
    return nlp, output_dir

# Test function with detailed output
def test_ner_model(nlp, test_sentences):
    print("\nTesting NER model...\n" + "-" * 50)
    
    for sentence in test_sentences:
        doc = nlp(sentence)
        
        # Print sentence with highlighted entities
        tokens = []
        for token in doc:
            if token.ent_type_:
                tokens.append(f"[{token.text}:{token.ent_type_}]")
            else:
                tokens.append(token.text)
        
        highlighted = " ".join(tokens)
        
        print(f"\nInput: {sentence}")
        print(f"Parsed: {highlighted}")
        print(f"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}")
        print("-" * 50)

# Main execution
if __name__ == "__main__":
    # First try with blank model - this will always work
    print("=" * 70)
    print("TRAINING WITH BLANK MODEL")
    print("=" * 70)
    nlp, model_dir = train_healthcare_ner(use_pretrained=False)
    
    # Test sentences
    test_sentences = [
        "Tell me about Dental EMR.",
        "Do you provide Telemedicine Solutions?",
        "I need help with Revenue Cycle Management.",
        "What are the features of Practice Management Software?",
        "Does your company offer Medical Billing?",
        # New sentences
        "Our clinic uses a different Dental EMR platform.",
        "Can we integrate your Medical Billing with our system?",
        "How secure is your Telemedicine Solutions platform?",
    ]
    
    # Test the model
    test_ner_model(nlp, test_sentences)
    
    print("\nTo install the spaCy model for better results next time, run:")
    print("python -m spacy download en_core_web_sm")

TRAINING WITH BLANK MODEL
Using blank model as requested...
Added NER pipeline component.
Adding entity labels...
Added 2 labels: SERVICE, PRODUCT
Splitting data into training and evaluation sets...
Training data: 16 examples
Evaluation data: 4 examples
Disabling other pipelines during training: none
Starting training for 100 iterations...


Iteration 1/100: 100%|██████████| 4/4 [00:00<00:00, 41.05it/s]


Iteration 1, Loss: {'ner': 85.57232841104269}


Iteration 2/100: 100%|██████████| 4/4 [00:00<00:00, 42.25it/s]


Iteration 2, Loss: {'ner': 67.53044295310974}


Iteration 3/100: 100%|██████████| 4/4 [00:00<00:00, 41.58it/s]


Iteration 3, Loss: {'ner': 32.50542763341218}


Iteration 4/100: 100%|██████████| 4/4 [00:00<00:00, 41.57it/s]


Iteration 4, Loss: {'ner': 10.466555714255492}


Iteration 5/100: 100%|██████████| 4/4 [00:00<00:00, 41.89it/s]


Iteration 5, Loss: {'ner': 8.599403459323412}


Iteration 6/100: 100%|██████████| 4/4 [00:00<00:00, 41.92it/s]


Iteration 6, Loss: {'ner': 9.684077651602198}


Iteration 7/100: 100%|██████████| 4/4 [00:00<00:00, 39.68it/s]


Iteration 7, Loss: {'ner': 7.031636081310298}


Iteration 8/100: 100%|██████████| 4/4 [00:00<00:00, 38.71it/s]


Iteration 8, Loss: {'ner': 13.713383311585943}


Iteration 9/100: 100%|██████████| 4/4 [00:00<00:00, 39.71it/s]


Iteration 9, Loss: {'ner': 16.407342093641702}


Iteration 10/100: 100%|██████████| 4/4 [00:00<00:00, 40.49it/s]


Iteration 10, Loss: {'ner': 12.559184385048738}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 11/100: 100%|██████████| 4/4 [00:00<00:00, 41.27it/s]


Iteration 11, Loss: {'ner': 8.947842960478367}


Iteration 12/100: 100%|██████████| 4/4 [00:00<00:00, 42.08it/s]


Iteration 12, Loss: {'ner': 6.811150621257817}


Iteration 13/100: 100%|██████████| 4/4 [00:00<00:00, 40.98it/s]


Iteration 13, Loss: {'ner': 9.467947105163915}


Iteration 14/100: 100%|██████████| 4/4 [00:00<00:00, 37.18it/s]


Iteration 14, Loss: {'ner': 9.04102503966071}


Iteration 15/100: 100%|██████████| 4/4 [00:00<00:00, 37.40it/s]


Iteration 15, Loss: {'ner': 6.195454505500286}


Iteration 16/100: 100%|██████████| 4/4 [00:00<00:00, 40.94it/s]


Iteration 16, Loss: {'ner': 5.5602737239838795}


Iteration 17/100: 100%|██████████| 4/4 [00:00<00:00, 38.77it/s]


Iteration 17, Loss: {'ner': 3.714468747110905}


Iteration 18/100: 100%|██████████| 4/4 [00:00<00:00, 41.43it/s]


Iteration 18, Loss: {'ner': 2.1939178421143835}


Iteration 19/100: 100%|██████████| 4/4 [00:00<00:00, 41.39it/s]


Iteration 19, Loss: {'ner': 0.40179568268020915}


Iteration 20/100: 100%|██████████| 4/4 [00:00<00:00, 42.05it/s]


Iteration 20, Loss: {'ner': 0.00552745701607904}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 21/100: 100%|██████████| 4/4 [00:00<00:00, 42.34it/s]


Iteration 21, Loss: {'ner': 0.008134441671904469}


Iteration 22/100: 100%|██████████| 4/4 [00:00<00:00, 38.78it/s]


Iteration 22, Loss: {'ner': 0.0009374278339523024}


Iteration 23/100: 100%|██████████| 4/4 [00:00<00:00, 39.09it/s]


Iteration 23, Loss: {'ner': 2.0858276268512742e-05}


Iteration 24/100: 100%|██████████| 4/4 [00:00<00:00, 39.41it/s]


Iteration 24, Loss: {'ner': 3.7623099124094616e-05}


Iteration 25/100: 100%|██████████| 4/4 [00:00<00:00, 39.68it/s]


Iteration 25, Loss: {'ner': 2.6678131970559296e-06}


Iteration 26/100: 100%|██████████| 4/4 [00:00<00:00, 41.55it/s]


Iteration 26, Loss: {'ner': 1.445849121311184e-06}


Iteration 27/100: 100%|██████████| 4/4 [00:00<00:00, 41.91it/s]


Iteration 27, Loss: {'ner': 2.096578226649749e-05}


Iteration 28/100: 100%|██████████| 4/4 [00:00<00:00, 42.10it/s]


Iteration 28, Loss: {'ner': 1.4727847682528778e-06}


Iteration 29/100: 100%|██████████| 4/4 [00:00<00:00, 41.88it/s]


Iteration 29, Loss: {'ner': 2.322647458721907e-06}


Iteration 30/100: 100%|██████████| 4/4 [00:00<00:00, 39.08it/s]


Iteration 30, Loss: {'ner': 4.443225303734581e-07}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 31/100: 100%|██████████| 4/4 [00:00<00:00, 41.60it/s]


Iteration 31, Loss: {'ner': 4.107536487009152e-06}


Iteration 32/100: 100%|██████████| 4/4 [00:00<00:00, 41.62it/s]


Iteration 32, Loss: {'ner': 2.085211613812572e-05}


Iteration 33/100: 100%|██████████| 4/4 [00:00<00:00, 40.72it/s]


Iteration 33, Loss: {'ner': 2.221716115199507e-07}


Iteration 34/100: 100%|██████████| 4/4 [00:00<00:00, 36.62it/s]


Iteration 34, Loss: {'ner': 2.508235898030787e-07}


Iteration 35/100: 100%|██████████| 4/4 [00:00<00:00, 40.39it/s]


Iteration 35, Loss: {'ner': 1.5591654112404618}


Iteration 36/100: 100%|██████████| 4/4 [00:00<00:00, 43.24it/s]


Iteration 36, Loss: {'ner': 1.8612345704029057e-05}


Iteration 37/100: 100%|██████████| 4/4 [00:00<00:00, 42.02it/s]


Iteration 37, Loss: {'ner': 3.923749965041607e-07}


Iteration 38/100: 100%|██████████| 4/4 [00:00<00:00, 41.52it/s]


Iteration 38, Loss: {'ner': 2.0663388448142484e-06}


Iteration 39/100: 100%|██████████| 4/4 [00:00<00:00, 41.92it/s]


Iteration 39, Loss: {'ner': 4.9679320148714325e-05}


Iteration 40/100: 100%|██████████| 4/4 [00:00<00:00, 42.49it/s]


Iteration 40, Loss: {'ner': 1.4664981966540133e-06}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 41/100: 100%|██████████| 4/4 [00:00<00:00, 42.29it/s]


Iteration 41, Loss: {'ner': 1.9566885532318757e-05}


Iteration 42/100: 100%|██████████| 4/4 [00:00<00:00, 27.51it/s]


Iteration 42, Loss: {'ner': 1.1282280126460528e-06}


Iteration 43/100: 100%|██████████| 4/4 [00:00<00:00, 39.52it/s]


Iteration 43, Loss: {'ner': 3.082651176938482e-07}


Iteration 44/100: 100%|██████████| 4/4 [00:00<00:00, 40.66it/s]


Iteration 44, Loss: {'ner': 2.425790385100356e-06}


Iteration 45/100: 100%|██████████| 4/4 [00:00<00:00, 36.16it/s]


Iteration 45, Loss: {'ner': 0.003297448297145458}


Iteration 46/100: 100%|██████████| 4/4 [00:00<00:00, 39.58it/s]


Iteration 46, Loss: {'ner': 7.331385501153218e-08}


Iteration 47/100: 100%|██████████| 4/4 [00:00<00:00, 38.17it/s]


Iteration 47, Loss: {'ner': 1.7014726208445288}


Iteration 48/100: 100%|██████████| 4/4 [00:00<00:00, 33.44it/s]


Iteration 48, Loss: {'ner': 9.217106933444343e-06}


Iteration 49/100: 100%|██████████| 4/4 [00:00<00:00, 38.37it/s]


Iteration 49, Loss: {'ner': 5.271356401055653e-05}


Iteration 50/100: 100%|██████████| 4/4 [00:00<00:00, 41.16it/s]


Iteration 50, Loss: {'ner': 1.5662262269215863e-05}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 51/100: 100%|██████████| 4/4 [00:00<00:00, 39.50it/s]


Iteration 51, Loss: {'ner': 0.00486954188799323}


Iteration 52/100: 100%|██████████| 4/4 [00:00<00:00, 40.50it/s]


Iteration 52, Loss: {'ner': 7.537399324816329e-07}


Iteration 53/100: 100%|██████████| 4/4 [00:00<00:00, 38.89it/s]


Iteration 53, Loss: {'ner': 1.3768870628369002e-05}


Iteration 54/100: 100%|██████████| 4/4 [00:00<00:00, 42.08it/s]


Iteration 54, Loss: {'ner': 9.851442763489462e-07}


Iteration 55/100: 100%|██████████| 4/4 [00:00<00:00, 42.13it/s]


Iteration 55, Loss: {'ner': 3.0052778724448702e-06}


Iteration 56/100: 100%|██████████| 4/4 [00:00<00:00, 41.93it/s]


Iteration 56, Loss: {'ner': 1.1268198228663516e-07}


Iteration 57/100: 100%|██████████| 4/4 [00:00<00:00, 40.58it/s]


Iteration 57, Loss: {'ner': 7.456936804290774e-07}


Iteration 58/100: 100%|██████████| 4/4 [00:00<00:00, 38.37it/s]


Iteration 58, Loss: {'ner': 0.0002918072614800037}


Iteration 59/100: 100%|██████████| 4/4 [00:00<00:00, 40.81it/s]


Iteration 59, Loss: {'ner': 8.980154154052536e-07}


Iteration 60/100: 100%|██████████| 4/4 [00:00<00:00, 41.53it/s]


Iteration 60, Loss: {'ner': 2.4238621226289707e-07}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 61/100: 100%|██████████| 4/4 [00:00<00:00, 41.86it/s]


Iteration 61, Loss: {'ner': 8.150724854618102e-09}


Iteration 62/100: 100%|██████████| 4/4 [00:00<00:00, 42.55it/s]


Iteration 62, Loss: {'ner': 1.574144971397713e-05}


Iteration 63/100: 100%|██████████| 4/4 [00:00<00:00, 40.60it/s]


Iteration 63, Loss: {'ner': 1.4994097540680827e-08}


Iteration 64/100: 100%|██████████| 4/4 [00:00<00:00, 40.53it/s]


Iteration 64, Loss: {'ner': 6.398887850673056e-08}


Iteration 65/100: 100%|██████████| 4/4 [00:00<00:00, 40.97it/s]


Iteration 65, Loss: {'ner': 1.9447547795250833e-08}


Iteration 66/100: 100%|██████████| 4/4 [00:00<00:00, 33.14it/s]


Iteration 66, Loss: {'ner': 9.117059992440974e-08}


Iteration 67/100: 100%|██████████| 4/4 [00:00<00:00, 40.84it/s]


Iteration 67, Loss: {'ner': 2.995755247733691e-06}


Iteration 68/100: 100%|██████████| 4/4 [00:00<00:00, 36.01it/s]


Iteration 68, Loss: {'ner': 7.123455972071525e-08}


Iteration 69/100: 100%|██████████| 4/4 [00:00<00:00, 39.48it/s]


Iteration 69, Loss: {'ner': 3.318128760148255e-08}


Iteration 70/100: 100%|██████████| 4/4 [00:00<00:00, 37.07it/s]


Iteration 70, Loss: {'ner': 9.14383657421906e-09}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 71/100: 100%|██████████| 4/4 [00:00<00:00, 39.68it/s]


Iteration 71, Loss: {'ner': 2.685504641841341e-08}


Iteration 72/100: 100%|██████████| 4/4 [00:00<00:00, 37.74it/s]


Iteration 72, Loss: {'ner': 5.488747269828778e-08}


Iteration 73/100: 100%|██████████| 4/4 [00:00<00:00, 33.00it/s]


Iteration 73, Loss: {'ner': 9.720494245660729e-08}


Iteration 74/100: 100%|██████████| 4/4 [00:00<00:00, 34.30it/s]


Iteration 74, Loss: {'ner': 1.3738871933129694e-08}


Iteration 75/100: 100%|██████████| 4/4 [00:00<00:00, 38.16it/s]


Iteration 75, Loss: {'ner': 9.970937865275277e-07}


Iteration 76/100: 100%|██████████| 4/4 [00:00<00:00, 32.28it/s]


Iteration 76, Loss: {'ner': 3.039561636294616e-07}


Iteration 77/100: 100%|██████████| 4/4 [00:00<00:00, 40.76it/s]


Iteration 77, Loss: {'ner': 7.156685395598836e-08}


Iteration 78/100: 100%|██████████| 4/4 [00:00<00:00, 29.84it/s]


Iteration 78, Loss: {'ner': 3.710965311109102e-08}


Iteration 79/100: 100%|██████████| 4/4 [00:00<00:00, 30.03it/s]


Iteration 79, Loss: {'ner': 5.249314686289598e-08}


Iteration 80/100: 100%|██████████| 4/4 [00:00<00:00, 26.22it/s]


Iteration 80, Loss: {'ner': 2.43235874793641e-08}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 81/100: 100%|██████████| 4/4 [00:00<00:00, 38.39it/s]


Iteration 81, Loss: {'ner': 1.3822728101425112e-08}


Iteration 82/100: 100%|██████████| 4/4 [00:00<00:00, 39.54it/s]


Iteration 82, Loss: {'ner': 1.7231255864714855e-08}


Iteration 83/100: 100%|██████████| 4/4 [00:00<00:00, 39.62it/s]


Iteration 83, Loss: {'ner': 1.5730599097331557}


Iteration 84/100: 100%|██████████| 4/4 [00:00<00:00, 40.26it/s]


Iteration 84, Loss: {'ner': 4.968822377347693e-06}


Iteration 85/100: 100%|██████████| 4/4 [00:00<00:00, 34.40it/s]


Iteration 85, Loss: {'ner': 3.4038285922804974e-06}


Iteration 86/100: 100%|██████████| 4/4 [00:00<00:00, 33.27it/s]


Iteration 86, Loss: {'ner': 0.0002621224625647358}


Iteration 87/100: 100%|██████████| 4/4 [00:00<00:00, 37.41it/s]


Iteration 87, Loss: {'ner': 4.07507991582964e-07}


Iteration 88/100: 100%|██████████| 4/4 [00:00<00:00, 38.41it/s]


Iteration 88, Loss: {'ner': 2.868286717523458e-06}


Iteration 89/100: 100%|██████████| 4/4 [00:00<00:00, 40.62it/s]


Iteration 89, Loss: {'ner': 1.9965308849386363e-06}


Iteration 90/100: 100%|██████████| 4/4 [00:00<00:00, 38.30it/s]


Iteration 90, Loss: {'ner': 2.5083319518078015e-07}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00


Iteration 91/100: 100%|██████████| 4/4 [00:00<00:00, 39.33it/s]


Iteration 91, Loss: {'ner': 2.96316852819028e-05}


Iteration 92/100: 100%|██████████| 4/4 [00:00<00:00, 36.84it/s]


Iteration 92, Loss: {'ner': 1.0986265277595719e-07}


Iteration 93/100: 100%|██████████| 4/4 [00:00<00:00, 34.83it/s]


Iteration 93, Loss: {'ner': 1.1303924439672357e-07}


Iteration 94/100: 100%|██████████| 4/4 [00:00<00:00, 40.20it/s]


Iteration 94, Loss: {'ner': 3.232201214797269e-08}


Iteration 95/100: 100%|██████████| 4/4 [00:00<00:00, 37.69it/s]


Iteration 95, Loss: {'ner': 3.706531237679611e-05}


Iteration 96/100: 100%|██████████| 4/4 [00:00<00:00, 40.35it/s]


Iteration 96, Loss: {'ner': 1.868606358685243e-05}


Iteration 97/100: 100%|██████████| 4/4 [00:00<00:00, 37.35it/s]


Iteration 97, Loss: {'ner': 6.860757960786105e-10}


Iteration 98/100: 100%|██████████| 4/4 [00:00<00:00, 41.16it/s]


Iteration 98, Loss: {'ner': 0.00018222966133523062}


Iteration 99/100: 100%|██████████| 4/4 [00:00<00:00, 41.46it/s]


Iteration 99, Loss: {'ner': 8.852585313493656e-08}


Iteration 100/100: 100%|██████████| 4/4 [00:00<00:00, 41.81it/s]

Iteration 100, Loss: {'ner': 4.666899653841235e-07}
Evaluation - Precision: 0.00, Recall: 0.00, F-score: 0.00
Model saved to healthcare_ner_model

Testing NER model...
--------------------------------------------------

Input: Tell me about Dental EMR.
Parsed: Tell me about [Dental:PRODUCT] [EMR:PRODUCT] [.:PRODUCT]
Entities: [('Dental EMR.', 'PRODUCT')]
--------------------------------------------------

Input: Do you provide Telemedicine Solutions?
Parsed: Do you provide Telemedicine Solutions ?
Entities: []
--------------------------------------------------

Input: I need help with Revenue Cycle Management.
Parsed: I need help with Revenue [Cycle:PRODUCT] [Management:PRODUCT] [.:PRODUCT]
Entities: [('Cycle Management.', 'PRODUCT')]
--------------------------------------------------

Input: What are the features of Practice Management Software?
Parsed: What are the features of [Practice:PRODUCT] [Management:PRODUCT] [Software:PRODUCT] ?
Entities: [('Practice Management Software', 'PR




### **The training and all the tests are for sample, not completed that is why all the entities not show up in the output file**

## **4️. Integrate NER into the Sales Chatbot**
Once you extract the product/service from a customer query, you can:

Recommend relevant features

Route to the correct sales team

Answer FAQs

Example flow:

Customer: "Tell me about Dental EMR."

NER detects "Dental EMR" → PRODUCT

Chatbot fetches product details and responds.

## **5. SUGGESTIONS**

1. Intent Recognition & Classification
   Instead of just identifying keywords, classify the user's intent (e.g., Inquiry, Purchase Request, Pricing, Comparison, Support). You can use:

   Transformer models (BERT, DistilBERT) for intent classification.
   Rule-based + ML hybrid approach (e.g., regex for FAQs + ML for complex queries).

2. Contextual Memory & Conversation Flow
   Make the chatbot remember previous interactions within a session to provide personalized responses.

   Short-term memory (e.g., remembering the user is asking about "Dental EMR" throughout the chat).
   Long-term memory (e.g., if they return later, recall past interactions using a database).

3. Recommendation System
   If a customer inquires about a product, suggest related products dynamically.

   Use collaborative filtering (like Amazon’s recommendations).
   Suggest based on customer history or preferences.

4. Dynamic FAQ Search Using Vector Search (Semantic Search)
   Instead of relying on predefined keyword matches, let the chatbot understand semantic meaning using:

   FAISS / Pinecone for vector-based retrieval.
   BERT-based embeddings to match similar queries intelligently.
   Example: A customer asks, “What are the features of your EMR?” → Instead of static responses, fetch the most relevant FAQ dynamically.

5. Sentiment Analysis & Tone Adjustment
   Detect if the customer is frustrated, happy, or confused and adjust the chatbot’s response tone accordingly.

   Upset users? Offer faster escalation to human support.
   Happy users? Recommend related products or upsell features.

6. Multimodal Chatbot (Voice + Text + Images)
   Instead of just text-based responses:

   Use speech-to-text (Whisper, Vosk, Google Speech API) for voice input.
   Show images/videos of products when customers ask.
   If they upload a document or screenshot, analyze it with OCR (Tesseract, EasyOCR).