# 🩺 Predictive Healthcare  
### *AI-based Disease Prediction System*  

---

**Course:** Bachelor of Computer Applications (BCA)  
**College:** School of Computer Science and IT  
**Submitted By:**  
- Mohit Dhanotiya  
- Kuldeep Verma  
- Khushi Kulshrestha  

**Guide:** Ms. Tarjani Sevak  
**Semester:** Final Year (Phase 1 Project)  

---

### 📘 Overview  
This project aims to predict possible diseases based on user-provided symptoms using a machine learning model.  
It uses scikit-learn's **MultiLabelBinarizer** to encode symptoms as binary feature vectors and **Multinomial Naive Bayes** for classification.  
The system is integrated into a **Streamlit-based web interface** for easy use by end users.
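
As a rough illustration of how the encoder and classifier fit together, the sketch below trains both on a handful of made-up records. The toy records and the `toy_` variable names are assumptions for illustration only; the project's actual training code is not part of this notebook and may differ.

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy records (not the project's dataset).
toy_symptom_sets = [
    ["runny nose", "sore throat"],
    ["sneezing", "runny nose"],
    ["fever", "body ache"],
    ["fever", "chills"],
]
toy_diseases = ["Common Cold", "Common Cold", "Influenza (Flu)", "Influenza (Flu)"]

# Encode each symptom list as a binary vector: one column per distinct symptom.
toy_mlb = MultiLabelBinarizer()
toy_X = toy_mlb.fit_transform(toy_symptom_sets)

# Fit the classifier on the binary vectors.
toy_clf = MultinomialNB()
toy_clf.fit(toy_X, toy_diseases)

# Predict for a new symptom list, reusing the same fitted encoder.
toy_pred = str(toy_clf.predict(toy_mlb.transform([["fever", "chills"]]))[0])
print(toy_X.shape, toy_pred)
```

The important detail is that prediction reuses the encoder fitted at training time, so the column order of the input vector matches what the classifier saw.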


In [2]:
# ---------------------------------------------------
# Step 1: Import required libraries
# ---------------------------------------------------
import pandas as pd
import joblib

In [3]:
# ---------------------------------------------------
# Step 2: Load Dataset
# ---------------------------------------------------
df = pd.read_csv("../data/diseases.csv")

print("Dataset Loaded Successfully!")
print(f"Total Records: {len(df)}\n")
print("Preview of Dataset:")
display(df.head())


Dataset Loaded Successfully!
Total Records: 100

Preview of Dataset:


| Unnamed: 0 | Disease | Symptoms |
|---|---|---|
| 0 | Common Cold | runny nose, sore throat |
| 1 | Common Cold | sneezing, mild fever |
| 2 | Common Cold | runny nose, fatigue |
| 3 | Common Cold | sore throat, cough |
| 4 | Common Cold | fatigue, sneezing |
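
Each `Symptoms` cell in the preview is a single comma-separated string, so it has to be split into individual symptom tokens before an encoder can use it. The sketch below shows one possible way to do that in plain Python. The helper name `parse_symptoms` and the lowercase normalization are assumptions; the project's actual preprocessing is not shown in this notebook.

```python
from typing import List


def parse_symptoms(cell: str) -> List[str]:
    """Split a comma-separated symptom string into trimmed, lowercase tokens."""
    return [token.strip().lower() for token in cell.split(",") if token.strip()]


# Two strings from the preview above, plus one deliberately messy example.
sample_cells = ["runny nose, sore throat", "sneezing, mild fever", " Fatigue ,sneezing,"]
parsed_cells = [parse_symptoms(cell) for cell in sample_cells]
print(parsed_cells)
```

With a DataFrame, the same helper could be applied column-wise, for example `df["Symptoms"].apply(parse_symptoms)`.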


In [4]:
# ---------------------------------------------------
# Step 3: Load Trained Model and Encoder
# ---------------------------------------------------
model = joblib.load("../models/disease_model.joblib")
mlb = joblib.load("../models/symptom_encoder.joblib")

print("\nModel and Encoder Loaded Successfully!")



Model and Encoder Loaded Successfully!


In [5]:
# ---------------------------------------------------
# Step 4: Display Encoder Information
# ---------------------------------------------------
symptoms_list = mlb.classes_  # NumPy array of symptom names; call .tolist() only if a plain Python list is needed
print(f"\nTotal Symptoms Used for Training: {len(symptoms_list)}")
print("Few Example Symptoms:", symptoms_list[:10])



Total Symptoms Used for Training: 38
Few Example Symptoms: ['abdominal pain' 'blurred vision' 'body ache' 'body pain'
 'chest discomfort' 'chest pain' 'chest tightness' 'chills' 'constipation'
 'cough']


In [6]:
# ---------------------------------------------------
# Step 5: Example Prediction
# ---------------------------------------------------
# Suppose the user reports two symptoms: fever and cough.
# transform() expects a batch (a list of symptom lists), hence the nested list.
user_symptoms = [["fever", "cough"]]

# Convert to binary vector using the same encoder
input_data = mlb.transform(user_symptoms)

# Predict disease
prediction = model.predict(input_data)[0]

print(f"\nExample Prediction for Symptoms {user_symptoms[0]}: {prediction}")



Example Prediction for Symptoms ['fever', 'cough']: Influenza (Flu)
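
The example above returns only the single most likely disease. A possible extension, sketched below, ranks the top candidates with `predict_proba` and separates out symptoms the encoder has never seen (`MultiLabelBinarizer.transform` ignores unknown labels and emits a warning). The helper `top_k_diseases`, the `demo_` names, and the toy training records are hypothetical and stand in for the saved model and encoder.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer


def top_k_diseases(model, encoder, symptoms, k=3):
    """Return (known, unknown, ranked) for one list of symptom strings."""
    vocabulary = set(encoder.classes_)
    known = [s for s in symptoms if s in vocabulary]
    unknown = [s for s in symptoms if s not in vocabulary]
    if not known:
        return known, unknown, []
    # Class probabilities for the single encoded sample.
    probs = model.predict_proba(encoder.transform([known]))[0]
    # Indices of the k largest probabilities, highest first.
    order = np.argsort(probs)[::-1][:k]
    ranked = [(str(model.classes_[i]), float(probs[i])) for i in order]
    return known, unknown, ranked


# Toy stand-in for the trained model and encoder (illustration only).
demo_records = [
    ["runny nose", "sore throat"],
    ["sneezing", "cough"],
    ["fever", "cough"],
    ["fever", "body ache"],
]
demo_labels = ["Common Cold", "Common Cold", "Influenza (Flu)", "Influenza (Flu)"]
demo_encoder = MultiLabelBinarizer()
demo_model = MultinomialNB().fit(demo_encoder.fit_transform(demo_records), demo_labels)

known, unknown, ranked = top_k_diseases(
    demo_model, demo_encoder, ["fever", "cough", "dizziness"], k=2
)
print("Ignored symptoms:", unknown)
for disease, prob in ranked:
    print(f"{disease}: {prob:.2f}")
```

Reporting the ignored symptoms back to the user makes it visible when part of the input had no effect on the prediction.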


### Step 6: Model Comparison and Selection
During experimentation, the same Multinomial Naive Bayes classifier was evaluated with two validation methods:

1. **Train-Test Split Model (Multinomial Naive Bayes)**  
   - Accuracy: **≈ 95%**  
   - Simpler and faster for the small dataset  
   - More stable predictions on test data  

2. **Cross-Validation Model (K-Fold with Multinomial NB)**  
   - Average Accuracy: **≈ 88–90%**  
   - Slightly lower and more variable  

**Final Choice:** The Train-Test Split version was selected because it gave higher and more consistent accuracy during Phase 1.
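
The two validation methods described above can be sketched with scikit-learn as follows. The data here is synthetic (three made-up diseases with invented symptom names), so the printed scores will not reproduce the accuracies reported in this section; the split size, fold count, and random seeds are also assumptions.

```python
from itertools import combinations

from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Synthetic stand-in data: each made-up disease has its own pool of four
# symptoms, and every pair from a pool becomes one record (6 per disease).
pools = {
    "Disease A": ["a1", "a2", "a3", "a4"],
    "Disease B": ["b1", "b2", "b3", "b4"],
    "Disease C": ["c1", "c2", "c3", "c4"],
}
records, labels = [], []
for disease, pool in pools.items():
    for pair in combinations(pool, 2):
        records.append(list(pair))
        labels.append(disease)

X_demo = MultiLabelBinarizer().fit_transform(records)

# Method 1: a single stratified train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, labels, test_size=0.33, stratify=labels, random_state=42
)
holdout_acc = MultinomialNB().fit(X_train, y_train).score(X_test, y_test)

# Method 2: stratified 3-fold cross-validation on the same data.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
cv_scores = cross_val_score(MultinomialNB(), X_demo, labels, cv=cv)

print(f"Train-test split accuracy: {holdout_acc:.2f}")
print(f"Cross-validation accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
```

The train-test split produces one score for one particular partition of the data, while cross-validation reports a mean and spread across folds.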


### 📋 Step 7: Final Summary
- Dataset: 100 records covering 10 diseases and 38 distinct symptoms  
- Feature Encoding: MultiLabelBinarizer (symptom → binary vector)  
- Model: Multinomial Naive Bayes (Train-Test Split)  
- Model Accuracy: ~95%  
- Model saved as: `models/disease_model.joblib` (the file loaded in Step 3)  
- Encoder: `models/symptom_encoder.joblib`  
- Integrated into: Streamlit UI (`frontend/streamlit_app.py`)  
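
For reference, persisting a model and its encoder with `joblib` could look like the sketch below. It uses a temporary directory and a two-record toy model, so the `demo_` names and the training data are assumptions; only the file names mirror those loaded in Step 3.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Toy model and encoder (illustration only).
demo_records = [["runny nose", "sore throat"], ["fever", "body ache"]]
demo_labels = ["Common Cold", "Influenza (Flu)"]
demo_encoder = MultiLabelBinarizer()
demo_model = MultinomialNB().fit(demo_encoder.fit_transform(demo_records), demo_labels)

# Save both objects, then load them back from disk.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "disease_model.joblib"
    encoder_path = Path(tmp) / "symptom_encoder.joblib"
    joblib.dump(demo_model, model_path)
    joblib.dump(demo_encoder, encoder_path)
    loaded_model = joblib.load(model_path)
    loaded_encoder = joblib.load(encoder_path)

# The reloaded pair should give the same prediction as the original pair.
sample = [["fever", "body ache"]]
before = str(demo_model.predict(demo_encoder.transform(sample))[0])
after = str(loaded_model.predict(loaded_encoder.transform(sample))[0])
print(before == after)
```

Saving the encoder alongside the model matters because the classifier only understands vectors laid out in the encoder's column order.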

This notebook verifies that the trained model and encoder load correctly and return a prediction for a sample symptom set.
