<a href="https://colab.research.google.com/github/pradeepDu/Physician-s_Notebook_Emitrr/blob/main/Physicians_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Physician Notetaker AI System

## Overview
This notebook implements an NLP pipeline for medical transcription analysis:
- **Part 1**: Medical NLP Summarization (NER, Summarization, Keywords).
- **Part 2**: Sentiment & Intent Analysis.
- **Part 3 (Bonus)**: SOAP Note Generation.

Run the cells in order: install the dependencies, import the libraries and load the models, define the helper functions, load your transcript, and then execute the pipeline.
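Part 3 files each physician line under a SOAP section by keyword matching. Below is a minimal sketch of that routing step. The keyword lists are copied from `generate_soap_note`, but the rest is simplified and assumed: the first matching section wins, Plan is checked before Assessment so that a follow-up instruction that also mentions "recovery" lands in Plan, and the notebook's `current_section` tracking and closing-pleasantry filter are left out. `route_line` and `route_lines` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Keyword lists copied from generate_soap_note; checked top to bottom, first match wins.
# Putting Plan before Assessment is an assumption of this sketch.
SECTION_KEYWORDS: List[Tuple[str, List[str]]] = [
    ("Objective", ["examination", "looks good"]),
    ("Plan", ["follow-up", "come back", "worsening", "reach out"]),
    ("Assessment", ["recovery", "progress", "damage", "expect"]),
]


def route_line(line: str, default: str = "Subjective") -> str:
    """Return the SOAP section for one physician line (case-insensitive substring match)."""
    lower = line.lower()
    for section, keywords in SECTION_KEYWORDS:
        if any(kw in lower for kw in keywords):
            return section
    return default


def route_lines(lines: List[str]) -> Dict[str, List[str]]:
    """Group lines by section, keeping their original order within each section."""
    sections: Dict[str, List[str]] = {"Subjective": [], "Objective": [], "Assessment": [], "Plan": []}
    for line in lines:
        sections[route_line(line)].append(line)
    return sections


notes = route_lines([
    "Everything looks good.",
    "I'd expect you to make a full recovery within six months.",
    "If things get worse, come back for a follow-up. You're on track for a full recovery.",
    "Were you wearing your seatbelt?",
])
for section, section_lines in notes.items():
    print(section, section_lines)
```

Ordering matters because matching is by substring: a line is filed under the first section whose keywords it contains, so reordering `SECTION_KEYWORDS` changes the result for lines that contain keywords from several sections.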

In [None]:
# Install dependencies
!pip install spacy transformers torch
!python -m spacy download en_core_web_sm

# For medical-specific NER (strongly recommended for accurate symptoms/diagnosis extraction)
!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz

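## Quiet Install (Alternative)
The cell below installs the same packages with pip's `--quiet` flag to suppress most of the output. Run either this cell or the one above, not both.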


In [None]:
# Install dependencies
!pip install spacy transformers torch --quiet
!python -m spacy download en_core_web_sm

# For medical-specific NER (scispacy and its model)
!pip install scispacy --quiet
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz --quiet

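## Import Libraries and Load Models
This cell imports the required libraries and loads the pre-trained models.
It uses the scispaCy model `en_ner_bc5cdr_md` when it is installed, because that model tags medical entities (`DISEASE` and `CHEMICAL`), and falls back to spaCy's general-purpose `en_core_web_sm` otherwise.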
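The try/except in the next cell prefers the medical model and falls back to the general one. Here is a minimal sketch of the same pattern generalized to any number of candidates. `load_first_available` and `fake_loader` are hypothetical helpers; catching `OSError` (what `spacy.load` raises when a model package is missing) rather than using a bare `except:` keeps unrelated failures, such as a typo or a `KeyboardInterrupt`, visible.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_first_available(names: Sequence[str], loader: Callable[[str], T]) -> Tuple[str, T]:
    """Try each model name in order and return (name, model) for the first that loads."""
    errors: List[str] = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:  # spacy.load raises OSError for a missing model package
            errors.append(f"{name}: {exc}")
    raise OSError("No model could be loaded:\n" + "\n".join(errors))


# Hypothetical stand-in for spacy.load that only "has" the general English model.
def fake_loader(name: str) -> str:
    if name != "en_core_web_sm":
        raise OSError(f"Can't find model '{name}'")
    return f"<pipeline {name}>"


chosen, model = load_first_available(["en_ner_bc5cdr_md", "en_core_web_sm"], fake_loader)
print(chosen)  # en_core_web_sm
```

In the notebook the equivalent call would be `model_name, nlp = load_first_available(["en_ner_bc5cdr_md", "en_core_web_sm"], spacy.load)`, which also records which model was actually used.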

In [None]:
import spacy
from transformers import pipeline
import json
import re

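# Load spaCy model: prefer the scispaCy medical model, else fall back to the general English model
try:
    nlp = spacy.load("en_ner_bc5cdr_md")  # Medical model (DISEASE / CHEMICAL entities)
except OSError:  # raised by spacy.load when the model package is not installed
    nlp = spacy.load("en_core_web_sm")  # Fallback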

# Transformers pipelines
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
sentiment_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")  # Placeholder; fine-tune for medical if needed
zero_shot_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

## Helper Functions
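These functions extract the patient's dialogue and implement the three parts of the pipeline: medical summarization, sentiment and intent analysis, and SOAP note generation.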
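`extract_patient_dialogues` keeps only the lines that start with `Patient:`. A slightly more general sketch parses every line into a `(speaker, text)` pair, so later steps could filter by speaker without re-splitting the transcript. `parse_turns`, `patient_text`, and the `NOTE` label for bracketed stage directions are hypothetical; like the sample transcript, the sketch assumes one turn per line.

```python
import re
from typing import List, Tuple

# "Speaker: text" at the start of a line; the speaker label is a single word.
TURN_RE = re.compile(r"^\s*(?P<speaker>[A-Za-z]+):\s*(?P<text>.*\S)\s*$")


def parse_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a transcript into (speaker, text) turns, one turn per line."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        match = TURN_RE.match(line)
        if match:
            turns.append((match["speaker"], match["text"]))
        elif line.startswith("[") and line.endswith("]"):
            # Stage directions such as "[Physical Examination Conducted]"
            turns.append(("NOTE", line[1:-1]))
    return turns


def patient_text(turns: List[Tuple[str, str]]) -> str:
    """Join the patient's turns, similar in spirit to extract_patient_dialogues."""
    return " ".join(text for speaker, text in turns if speaker == "Patient")


sample = """
Physician: How are you feeling today?

Patient: I'm doing better.
[Physical Examination Conducted]
Patient: That's a relief!
"""
turns = parse_turns(sample)
print(turns[-2])            # ('NOTE', 'Physical Examination Conducted')
print(patient_text(turns))  # I'm doing better. That's a relief!
```

Because the regex is anchored at the start of the line, times such as "12:30" inside a sentence are not mistaken for speaker labels.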

In [None]:
# Helper: extract the patient's lines from the transcript
def extract_patient_dialogues(transcript):
    lines = transcript.split('\n')
    patient_lines = [line.split(':', 1)[1].strip() for line in lines if line.startswith('Patient:')]
    return ' '.join(patient_lines)

# Improved keyword extraction
def extract_keywords(text, n=10):
    doc = nlp(text)
    medical_keywords = ['pain', 'injury', 'discomfort', 'backache', 'physiotherapy', 'recovery', 'damage', 'symptoms']
    candidates = [chunk.text.lower().strip('[]') for chunk in doc.noun_chunks if len(chunk.text.split()) > 1 and any(kw in chunk.text.lower() for kw in medical_keywords)]
    from collections import Counter
    freq = Counter(candidates)
    keywords = sorted(freq, key=lambda k: freq[k] * len(k), reverse=True)[:n]
    if 'whiplash injury' not in keywords and 'whiplash' in text.lower():
        keywords.append('whiplash injury')
    return keywords

# Part 1: Medical NLP Summarization (spaCy NER plus rule-based extraction)
def medical_summarization(transcript):
    doc = nlp(transcript)

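    # Patient_Name: use the first PERSON entity (of the two models, only en_core_web_sm
    # produces these; the BC5CDR model tags just DISEASE and CHEMICAL), else an
    # honorific plus surname such as "Ms. Jones", else "Unknown"
    patient_name = next((ent.text.strip() for ent in doc.ents if ent.label_ == 'PERSON'), None)
    if patient_name is None:
        name_match = re.search(r'\b(?:Mr|Mrs|Ms|Miss)\.?\s+[A-Z][a-z]+', transcript)
        patient_name = name_match.group(0) if name_match else "Unknown"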

    # Entities with better filtering
    symptoms = set()
    treatments = set()
    diagnoses = set()
    for ent in doc.ents:
        ent_text = ent.text.strip().lower()
        if ent.label_ == 'DISEASE' and 'injury' not in ent_text:
            if ent_text not in ['’d', 'anxiety', 'long-term damage', 'tenderness', 'pain']:  # Exclude noise/general
                symptoms.add(ent.text.strip())
        elif ent.label_ == 'DISEASE' and 'injury' in ent_text:
            diagnoses.add(ent.text.strip())
        if ent.label_ == 'CHEMICAL':
            treatments.add(ent.text.strip())

    # Explicit rule-based for symptoms (case-insensitive unique)
    text_lower = transcript.lower()
    if 'neck pain' in text_lower:
        symptoms.add('Neck pain')
    if 'back pain' in text_lower:
        symptoms.add('Back pain')
    if 'head' in text_lower and ('hit' in text_lower or 'impact' in text_lower):
        symptoms.add('Head impact')
    if 'backaches' in text_lower:
        symptoms.add('Backaches')  # General, not "Occasional" to avoid dup
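    # Exclude "Discomfort" as too vague to count as a specific symptom; compare
    # case-insensitively because NER spans keep their original casing
    symptoms = {s for s in symptoms if s.lower() != 'discomfort'}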

    if 'physiotherapy' in text_lower:
        treatments.add('10 physiotherapy sessions')
    if 'painkillers' in text_lower:
        treatments.add('Painkillers')
    if 'whiplash' in text_lower:
        diagnoses.add('Whiplash injury')

    # Prognosis
    prognosis_match = re.search(r'(full recovery.*?[\.\?])', transcript, re.IGNORECASE | re.DOTALL)
    prognosis = prognosis_match.group(1).strip() if prognosis_match else "Unknown"
    prognosis = "Full recovery expected within six months" if "six months" in prognosis else prognosis

    # Current Status
    status_match = re.search(r'(occasional backaches.*?[\.\?])', transcript, re.IGNORECASE)
    if status_match:
        current_status = "Occasional backache"
    elif 'better' in text_lower:
        current_status = "Improving"
    else:
        current_status = "Unknown"

    # Deduplicate case-insensitively, drop "occasional backaches" variants (covered by "Backaches"),
    # and title-case; sorting keeps the output order stable between runs (set order is not)
    symptoms = sorted({s.lower() for s in symptoms})
    symptoms = [s.title() for s in symptoms if not ('backaches' in s and 'occasional' in s)]

    # Diagnosis: pick deterministically (set order varies between runs), then
    # append " and lower back strain" if back pain is mentioned
    diagnosis_str = sorted(diagnoses)[0] if diagnoses else "Whiplash injury"
    if 'back pain' in text_lower:
        diagnosis_str += " and lower back strain"

    structured_summary = {
        "Patient_Name": patient_name,
        "Symptoms": symptoms or ["Neck pain", "Back pain", "Head impact"],
        "Diagnosis": diagnosis_str,
        "Treatment": sorted(treatments) or ["10 physiotherapy sessions", "Painkillers"],
        "Current_Status": current_status,
        "Prognosis": prognosis
    }

    # Keywords
    keywords = extract_keywords(transcript)

    return structured_summary, keywords

# Part 2: Sentiment & Intent Analysis (classifier output plus keyword rules)
def sentiment_intent_analysis(transcript):
    patient_text = extract_patient_dialogues(transcript)
    patient_lower = patient_text.lower()

    # Sentiment (truncation=True keeps long inputs within the model's 512-token limit)
    sentiment_result = sentiment_classifier(patient_text, truncation=True)[0]
    label = sentiment_result['label']
    score = sentiment_result['score']
    if label == 'NEGATIVE' and score > 0.8:
        sentiment = "Anxious"
    elif label == 'POSITIVE' or ('relief' in patient_lower or 'better' in patient_lower or 'great' in patient_lower):
        sentiment = "Reassured"
    else:
        sentiment = "Neutral"

    # Intent
    candidate_intents = ["Seeking reassurance", "Reporting symptoms", "Expressing concern", "Expressing relief"]
    intent_result = zero_shot_classifier(patient_text, candidate_labels=candidate_intents)
    intent = intent_result['labels'][0]

    return {
        "Sentiment": sentiment,
        "Intent": intent
    }

# Part 3: SOAP Note Generation (keyword-based section routing plus BART summaries)
def generate_soap_note(transcript):
    # Section splitting
    subjective = []
    objective = []
    assessment = []
    plan = []

    lines = transcript.split('\n')
    current_section = 'subjective'
    for line in lines:
        if line.startswith('Patient:'):
            subj_text = line.split(':', 1)[1].strip()
            subjective.append(subj_text)
        elif line.startswith('Physician:'):
            phys_text = line.split(':', 1)[1].strip()
            phys_lower = phys_text.lower()
            # Plan keywords are checked before assessment keywords so that a follow-up
            # instruction that also mentions "recovery" is filed under Plan, not Assessment
            if 'examination' in phys_lower or 'looks good' in phys_lower:
                objective.append(phys_text)
                current_section = 'objective'
            elif 'follow-up' in phys_lower or 'come back' in phys_lower or 'worsening' in phys_lower or 'reach out' in phys_lower:
                if 'welcome' not in phys_lower and 'thank' not in phys_lower:
                    plan.append(phys_text)
                current_section = 'plan'
            elif 'recovery' in phys_lower or 'progress' in phys_lower or 'damage' in phys_lower or 'expect' in phys_lower:
                assessment.append(phys_text)
                current_section = 'assessment'
            else:
                if current_section == 'subjective':
                    subjective.append(phys_text)
        elif '[Physical Examination' in line:
            objective.append("Physical exam conducted: full range of movement, no tenderness, no signs of lasting damage.")

    # Dynamic summarization: return short inputs unchanged and never request a summary
    # longer than the input (word count is used as a rough proxy for token count)
    def safe_summarize(text, max_length, min_length=10):
        input_len = len(text.split())
        if input_len <= min_length:  # too short to summarize; also covers empty text
            return text
        actual_max = min(max_length, input_len)
        actual_min = min(min_length, actual_max // 2)
        return summarizer(text, max_length=actual_max, min_length=actual_min,
                          do_sample=False, truncation=True)[0]['summary_text']

    subjective_text = ' '.join(subjective)
    subjective_summary = safe_summarize(subjective_text, max_length=300, min_length=50)

    objective_text = ' '.join(objective)
    objective_summary = safe_summarize(objective_text, max_length=150, min_length=10)

    # Physical_Exam fix: Paraphrase for clinical tone
    objective_summary = objective_summary.replace("Your", "Patient's").replace("there’s", "there is").replace("seem to be", "appear")

    assessment_text = ' '.join(assessment)
    assessment_summary = safe_summarize(assessment_text, max_length=150, min_length=10)

    plan_text = ' '.join(plan)
    plan_summary = safe_summarize(plan_text, max_length=150, min_length=10) if plan else "Patient to return if pain worsens or persists beyond six months."

    # Inferences with fixes
    chief_complaint_match = re.search(r'(neck and back pain|neck pain|back pain|discomfort|backaches)', subjective_summary, re.IGNORECASE)
    chief_complaint = chief_complaint_match.group().title() if chief_complaint_match else "Neck and back pain"

    diagnosis = "Whiplash injury" if 'whiplash' in transcript.lower() else "Unknown"
    severity = "Mild, improving" if any(word in assessment_summary.lower() for word in ['positive', 'good', 'improving', 'full recovery']) else "Unknown"

    return {
        "Subjective": {
            "Chief_Complaint": chief_complaint,
            "History_of_Present_Illness": subjective_summary
        },
        "Objective": {
            "Physical_Exam": objective_summary or "Full range of motion in cervical and lumbar spine, no tenderness.",
            "Observations": "Patient appears in normal health, normal gait."
        },
        "Assessment": {
            "Diagnosis": diagnosis,
            "Severity": severity
        },
        "Plan": {
            "Treatment": "Continue physiotherapy as needed, use analgesics for pain relief.",
            "Follow-Up": plan_summary
        }
    }

## Load Your Transcript
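Paste the transcript below as a string, or read it from a file (e.g., `transcript.txt`). Each turn should sit on its own line and start with `Patient:` or `Physician:`, because the helper functions select lines by these prefixes.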
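A minimal sketch of loading the transcript from a file, assuming a UTF-8 text file (the sample uses curly apostrophes such as `’`). `load_transcript` is a hypothetical helper: it reads with the `utf-8-sig` codec so that a byte-order mark added by some Windows editors does not hide the first `Physician:` prefix, relies on Python's text mode to convert `\r\n` line endings to `\n`, and raises an error when no line starts with a speaker label, since the pipeline would otherwise find no patient or physician lines.

```python
import tempfile
from pathlib import Path
from typing import Union

SPEAKER_PREFIXES = ("Patient:", "Physician:")  # the prefixes the helper functions look for


def load_transcript(path: Union[str, Path]) -> str:
    # "utf-8-sig" strips a leading byte-order mark if present and otherwise reads plain UTF-8;
    # text mode also translates \r\n and \r line endings to \n.
    text = Path(path).read_text(encoding="utf-8-sig")
    if not any(line.startswith(SPEAKER_PREFIXES) for line in text.splitlines()):
        raise ValueError(f"{path}: no line starts with {' or '.join(SPEAKER_PREFIXES)}")
    return text


# Usage: a Windows-style file with a byte-order mark and CRLF line endings.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "transcript.txt"
    sample_path.write_bytes("\ufeffPhysician: Hello.\r\nPatient: I\u2019m doing better.\r\n".encode("utf-8"))
    transcript_text = load_transcript(sample_path)

print(transcript_text.splitlines())  # ['Physician: Hello.', 'Patient: I’m doing better.']
```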

In [None]:
# Load transcript (example: from a string; to read from a file instead, see the commented code at the end of this cell)
transcript = """
Physician: Good morning, Ms. Jones. How are you feeling today?

Patient: Good morning, doctor. I’m doing better, but I still have some discomfort now and then.

Physician: I understand you were in a car accident last September. Can you walk me through what happened?

Patient: Yes, it was on September 1st, around 12:30 in the afternoon. I was driving from Cheadle Hulme to Manchester when I had to stop in traffic. Out of nowhere, another car hit me from behind, which pushed my car into the one in front.

Physician: That sounds like a strong impact. Were you wearing your seatbelt?

Patient: Yes, I always do.

Physician: What did you feel immediately after the accident?

Patient: At first, I was just shocked. But then I realized I had hit my head on the steering wheel, and I could feel pain in my neck and back almost right away.

Physician: Did you seek medical attention at that time?

Patient: Yes, I went to Moss Bank Accident and Emergency. They checked me over and said it was a whiplash injury, but they didn’t do any X-rays. They just gave me some advice and sent me home.

Physician: How did things progress after that?

Patient: The first four weeks were rough. My neck and back pain were really bad—I had trouble sleeping and had to take painkillers regularly. It started improving after that, but I had to go through ten sessions of physiotherapy to help with the stiffness and discomfort.

Physician: That makes sense. Are you still experiencing pain now?

Patient: It’s not constant, but I do get occasional backaches. It’s nothing like before, though.

Physician: That’s good to hear. Have you noticed any other effects, like anxiety while driving or difficulty concentrating?

Patient: No, nothing like that. I don’t feel nervous driving, and I haven’t had any emotional issues from the accident.

Physician: And how has this impacted your daily life? Work, hobbies, anything like that?

Patient: I had to take a week off work, but after that, I was back to my usual routine. It hasn’t really stopped me from doing anything.

Physician: That’s encouraging. Let’s go ahead and do a physical examination to check your mobility and any lingering pain.

[Physical Examination Conducted]

Physician: Everything looks good. Your neck and back have a full range of movement, and there’s no tenderness or signs of lasting damage. Your muscles and spine seem to be in good condition.

Patient: That’s a relief!

Physician: Yes, your recovery so far has been quite positive. Given your progress, I’d expect you to make a full recovery within six months of the accident. There are no signs of long-term damage or degeneration.

Patient: That’s great to hear. So, I don’t need to worry about this affecting me in the future?

Physician: That’s right. I don’t foresee any long-term impact on your work or daily life. If anything changes or you experience worsening symptoms, you can always come back for a follow-up. But at this point, you’re on track for a full recovery.

Patient: Thank you, doctor. I appreciate it.

Physician: You’re very welcome, Ms. Jones. Take care, and don’t hesitate to reach out if you need anything.
"""

# If loading from file (read as UTF-8, since the transcript uses curly quotes):
# with open('transcript.txt', 'r', encoding='utf-8') as f:
#     transcript = f.read()

## Run the Pipeline
Execute all parts and print JSON outputs.
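The cell below prints each result separately, using `json.dumps` with its default `ensure_ascii=True`, so curly apostrophes in the output appear as `\u2019` escapes. If you also want one archivable file, a minimal sketch is to bundle the three outputs and write them as UTF-8 JSON with `ensure_ascii=False`. `build_report`, `save_report`, the key names, and the file name `report.json` are hypothetical; after running the cell you could call `save_report(build_report(summary, keywords, sentiment_intent, soap), "report.json")`.

```python
import json
import os
import tempfile


def build_report(summary, keywords, sentiment_intent, soap):
    """Combine the outputs of Parts 1-3 into one JSON-serializable dict."""
    return {
        "Medical_Summary": summary,
        "Keywords": keywords,
        "Sentiment_Intent": sentiment_intent,
        "SOAP_Note": soap,
    }


def save_report(report, path):
    # Explicit UTF-8 plus ensure_ascii=False writes characters such as the curly
    # apostrophe as-is instead of as backslash-u escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, ensure_ascii=False)


# Placeholder values standing in for real pipeline output.
report = build_report(
    {"Patient_Name": "Ms. Jones", "Current_Status": "Occasional backache"},
    ["whiplash injury"],
    {"Sentiment": "Reassured", "Intent": "Expressing relief"},
    {"Subjective": {"History_of_Present_Illness": "I\u2019m doing better."}},
)

with tempfile.TemporaryDirectory() as tmp:
    report_path = os.path.join(tmp, "report.json")
    save_report(report, report_path)
    with open(report_path, encoding="utf-8") as f:
        raw = f.read()

print("\u2019" in raw)  # True
```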

In [None]:
# Part 1: Medical Summarization
summary, keywords = medical_summarization(transcript)
print("Medical Summary JSON:")
print(json.dumps(summary, indent=2))
print("\nKeywords:", keywords)

# Part 2: Sentiment & Intent
sentiment_intent = sentiment_intent_analysis(transcript)
print("\nSentiment & Intent JSON:")
print(json.dumps(sentiment_intent, indent=2))

# Part 3: SOAP Note
soap = generate_soap_note(transcript)
print("\nSOAP Note JSON:")
print(json.dumps(soap, indent=2))