# First Doctor: Workflow Testing Notebook

This notebook tests the unified `DiseaseDetector` pipeline from `disease_engine.py`.

**4-Phase Pipeline:**
1. **Phase 1**: spaCy symptom extraction from natural language
2. **Phase 2**: PyTorch model prediction (top-3 diseases)
3. **Phase 3**: Dataset symptom lookup + test bundle mapping
4. **Phase 4**: PDF prescription generation

## 0. Prerequisites

Make sure dependencies are installed:
```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

Also make sure `model/database/dataset.csv` is the real CSV, not a Git LFS pointer file:
```bash
git lfs pull --include="model/database/dataset.csv"
```
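
To confirm from Python that the pull worked, one option is to inspect the first bytes of the file. The sketch below relies on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`; the helper name `is_lfs_pointer` is made up for this notebook and is not part of `disease_engine.py`.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as defined by the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer stub."""
    with Path(path).open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

After a successful `git lfs pull`, `is_lfs_pointer("model/database/dataset.csv")` should return `False`.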

In [None]:
# Install dependencies if needed (uncomment to run)
# !pip install -r requirements.txt
# !python -m spacy download en_core_web_sm

---
## 1. Initialise the DiseaseDetector

This loads the dataset, fits the `LabelEncoder`, loads the PyTorch model,
and prepares the spaCy pipeline.

In [None]:
from disease_engine import DiseaseDetector

detector = DiseaseDetector()

print(f"✅ Model loaded on device: {detector.device}")
print(f"   Symptom columns:  {len(detector.symptom_names)}")
print(f"   Disease classes:  {len(detector.label_encoder.classes_)}")
print(f"   Dataset rows:     {len(detector.df)}")
print(f"   Test bundles:     {list(detector.test_bundles.keys())}")

---
## 2. Phase 1: Symptom Extraction

Test the spaCy EntityRuler + substring scan on a natural language input.

In [None]:
test_input = "I have a high fever and a persistent cough"

symptoms = detector.extract_symptoms(test_input)

print(f"Input:  '{test_input}'")
print(f"Extracted symptoms ({len(symptoms)}):")
for s in symptoms:
    print(f"  • {s}")
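
For intuition, the substring-scan half of this phase can be pictured roughly as below. This is a minimal sketch and not the code in `disease_engine.py`: the vocabulary `KNOWN_SYMPTOMS` and the function `scan_symptoms` are hypothetical, and the word-boundary matching is an assumption about how false positives might be avoided.

```python
import re

# Hypothetical vocabulary; the real pipeline would draw on detector.symptom_names.
KNOWN_SYMPTOMS = ["fever", "cough", "chest pain", "shortness of breath"]


def scan_symptoms(text, vocabulary=KNOWN_SYMPTOMS):
    """Return vocabulary entries that occur in `text` as whole words or phrases."""
    lowered = text.lower()
    found = []
    for symptom in vocabulary:
        # \b keeps "cough" from matching inside a longer word such as "coughing".
        if re.search(rf"\b{re.escape(symptom.lower())}\b", lowered):
            found.append(symptom)
    return found


print(scan_symptoms("I have a high fever and a persistent cough"))
# ['fever', 'cough']
```

Results come back in vocabulary order, not in the order the symptoms appear in the sentence.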

---
## 3. Phase 2: Disease Prediction

Pass the extracted symptoms through the PyTorch `SymptomClassifier` to get
the top-3 probable diseases.

In [None]:
predictions = detector.predict_diseases(symptoms, top_k=3)

print("Top-3 Predicted Diseases:")
print("-" * 40)
for i, pred in enumerate(predictions, 1):
    print(f"  #{i}  {pred['disease']:30s}  {pred['probability']*100:6.2f}%")
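
A common way to turn classifier outputs into a ranked list like the one above is softmax followed by a top-k selection. The sketch below assumes the model emits one raw score (logit) per class, which this notebook does not confirm. `top_k_from_logits` is a hypothetical helper written in plain Python so it runs without PyTorch; only the `disease` and `probability` keys mirror what `predict_diseases` returns.

```python
import math


def top_k_from_logits(logits, class_names, top_k=3):
    """Softmax the logits and return the top_k classes as dicts."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank classes by probability, highest first.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return [{"disease": name, "probability": p} for name, p in ranked[:top_k]]


# Illustrative logits and labels only, not real model output.
demo = top_k_from_logits([2.0, 1.0, 0.1, -1.0], ["A", "B", "C", "D"])
print([d["disease"] for d in demo])
```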

---
## 4. Phase 3a: Disease Symptom Lookup

For each predicted disease, look up which symptoms are associated with it
in the training dataset.

In [None]:
for pred in predictions:
    disease = pred["disease"]
    known = detector.get_disease_symptoms(disease)
    print(f"\n🔬 {disease}  ({len(known)} associated symptoms):")
    if known:
        # Show first 10 for readability
        for s in known[:10]:
            print(f"   • {s}")
        if len(known) > 10:
            print(f"   ... and {len(known) - 10} more")
    else:
        print("   (no matching rows in dataset)")
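
The lookup can be thought of as "collect every symptom column that is set in at least one row for this disease". The sketch below assumes a one-hot layout (one label column plus 0/1 symptom columns), which is a guess about `dataset.csv` and not something this notebook states. `ROWS` and `symptoms_for` are hypothetical stand-ins.

```python
# Toy rows standing in for dataset.csv; column names are illustrative only.
ROWS = [
    {"disease": "Flu", "fever": 1, "cough": 1, "rash": 0},
    {"disease": "Flu", "fever": 1, "cough": 0, "rash": 0},
    {"disease": "Allergy", "fever": 0, "cough": 0, "rash": 1},
]


def symptoms_for(disease, rows=ROWS, label_column="disease"):
    """Return the sorted symptom columns set to 1 in any row for `disease`."""
    seen = set()
    for row in rows:
        if row[label_column] != disease:
            continue
        seen.update(
            col for col, value in row.items()
            if col != label_column and value == 1
        )
    return sorted(seen)


print(symptoms_for("Flu"))      # ['cough', 'fever']
print(symptoms_for("Unknown"))  # []
```

An unknown disease yields an empty list, which corresponds to the "no matching rows" branch in the cell above.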

---
## 5. Phase 3b: Recommended Tests (Fuzzy Matching)

Fuzzy-match each predicted disease against `test_bundles.json` to find
recommended lab tests and escalation rules.

In [None]:
for pred in predictions:
    disease = pred["disease"]
    rec = detector.get_recommended_tests(disease)
    print(f"\n🧪 {disease}:")
    print(f"   Matched:    {rec['matched_disease']}  (confidence: {rec['confidence']}%)")
    print(f"   Tests:      {rec['tests']}")
    if rec.get("escalation"):
        esc = rec["escalation"]
        print(f"   ⚠ Escalation: {esc['priority']} - {esc['notes']}")
    else:
        print("   Escalation: None")
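
To illustrate what fuzzy matching a disease name against the bundle keys involves, here is a sketch that uses the standard library's `difflib.SequenceMatcher` as a stand-in. The matcher that `disease_engine.py` actually uses is not shown in this notebook, so the scoring, the `threshold` value, the `BUNDLES` table, and the `match_bundle` helper are all assumptions. Only the `matched_disease`, `confidence`, and `tests` keys are taken from the cell above.

```python
from difflib import SequenceMatcher

# Hypothetical bundle table; the real one is loaded from test_bundles.json.
BUNDLES = {
    "Dengue": {"tests": ["CBC", "NS1 antigen"]},
    "Typhoid": {"tests": ["Widal test", "Blood culture"]},
}


def match_bundle(disease, bundles=BUNDLES, threshold=60.0):
    """Return the closest bundle by name similarity, or an empty match."""
    best_name, best_score = None, 0.0
    for name in bundles:
        # ratio() is in [0, 1]; scale to a percentage like the notebook prints.
        score = 100.0 * SequenceMatcher(None, disease.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return {"matched_disease": None, "confidence": round(best_score, 1), "tests": []}
    return {
        "matched_disease": best_name,
        "confidence": round(best_score, 1),
        "tests": bundles[best_name]["tests"],
    }


print(match_bundle("dengue fever")["matched_disease"])  # Dengue
```

The threshold matters in a medical setting: a low cutoff risks attaching the wrong test bundle to a prediction, so it is safer to return no match than a weak one.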

---
## 6. Full Pipeline: `run()`

Execute all phases in one call and display the enriched result.

In [None]:
import json

result = detector.run("I have a high fever and a persistent cough")

# Pretty-print the full result
print(json.dumps(result, indent=2, default=str))
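
Beyond eyeballing the JSON, a small structural check can catch regressions in the result format. The sketch below checks only the fields that later cells in this notebook read (`extracted_symptoms`, `predictions`, and per prediction `disease`, `probability`, `recommended_tests`). The helper `check_result_shape` is hypothetical, and the assumption that `probability` lies in [0, 1] is inferred from the `*100` formatting used elsewhere.

```python
def check_result_shape(result):
    """Raise ValueError if `result` lacks the fields this notebook reads."""
    if not isinstance(result.get("extracted_symptoms"), list):
        raise ValueError("extracted_symptoms must be a list")
    for pred in result.get("predictions", []):
        missing = {"disease", "probability", "recommended_tests"} - set(pred)
        if missing:
            raise ValueError(f"prediction is missing keys: {sorted(missing)}")
        # Assumed range; probabilities are printed as percentages elsewhere.
        if not 0.0 <= pred["probability"] <= 1.0:
            raise ValueError(f"probability out of range: {pred['probability']}")
    return True
```

Calling `check_result_shape(result)` right after `detector.run(...)` would fail fast if a key is renamed.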

---
## 7. Phase 4: PDF Prescription Generation

Generate a PDF prescription and save it to the `output/` folder.

In [None]:
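from pathlib import Path

# Wrap in Path so .stat() works whether generate_pdf returns a str or a Path
pdf_path = Path(detector.generate_pdf(result, patient_name="Test Patient"))

print(f"✅ PDF generated: {pdf_path}")
print(f"   File size: {pdf_path.stat().st_size:,} bytes")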

### Verify the PDF exists in `output/`

In [None]:
import os

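# The relative path resolves against the current working directory, so run
# this notebook from the folder that contains disease_engine.py.
output_dir = os.path.join(os.path.dirname(os.path.abspath("disease_engine.py")), "output")
pdf_files = [f for f in os.listdir(output_dir) if f.endswith(".pdf")]

print(f"📁 Files in output/ directory ({len(pdf_files)} PDFs):")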
for f in sorted(pdf_files):
    full = os.path.join(output_dir, f)
    size = os.path.getsize(full)
    print(f"   {f}  ({size:,} bytes)")

assert len(pdf_files) >= 1, "❌ No PDF files found in output/!"
print("\n✅ PDF generation verified successfully!")
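
The check above only looks at file extensions. A slightly stronger test is to confirm each file starts with the PDF header marker `%PDF-`, which every well-formed PDF carries at the start of the file. The helper `looks_like_pdf` below is a hypothetical addition, not part of `disease_engine.py`.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF header marker '%PDF-'."""
    with Path(path).open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

For example, `all(looks_like_pdf(os.path.join(output_dir, f)) for f in pdf_files)` would flag an empty or truncated file that still has a `.pdf` extension.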

---
## 8. Additional Test Cases

Try different symptom descriptions to validate the pipeline.

In [None]:
test_cases = [
    "I feel chest pain and shortness of breath",
    "I have been experiencing headache and dizziness for a week",
    "My skin is itching and I have a rash",
]

for text in test_cases:
    print("=" * 60)
    print(f"Input: '{text}'")
    r = detector.run(text)
    print(f"  Symptoms: {r['extracted_symptoms']}")
    for p in r["predictions"]:
        tests = p["recommended_tests"].get("tests", [])
        esc = p["recommended_tests"].get("escalation")
        esc_str = f" ⚠ {esc['priority']}" if esc else ""
        print(f"  → {p['disease']} ({p['probability']*100:.1f}%){esc_str}")
        if tests:
            print(f"    Tests: {', '.join(tests)}")
    print()

---

**DISCLAIMER:** This is a student project for educational purposes only.
Results are NOT medical diagnoses. Always consult a certified medical professional.