# 🏥 Mini Capstone Project – Medical Data Processing  
## 📌 Overview  
Welcome to the **Mini Capstone Project!** This project challenges you to apply **Python, Object-Oriented Programming (OOP), NumPy, and Data Structures** to a real-world medical problem: matching reported symptoms against a disease-symptom dataset.  

### 🎯 Objectives  
- Implement **OOP concepts** for structured code.  
- Utilize **NumPy** for efficient medical data processing.  
- Manage patient data using **data structures** like lists and dictionaries.  
- Work collaboratively in **Kaggle’s shared workspace**.  

In [50]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/disease-symptom-description-dataset/symptom_Description.csv
/kaggle/input/disease-symptom-description-dataset/Symptom-severity.csv
/kaggle/input/disease-symptom-description-dataset/symptom_precaution.csv
/kaggle/input/disease-symptom-description-dataset/dataset.csv


In [51]:
import os

import numpy as np  # used by SymptomChecker for array storage and intersection
import pandas as pd


In [52]:
INPUT_DIR = '/kaggle/input/disease-symptom-description-dataset'
FILE_NAME = 'dataset.csv'
DATA_PATH = os.path.join(INPUT_DIR, FILE_NAME)

# Verify available files
print("Available files in dataset:")
for dirname, _, filenames in os.walk(INPUT_DIR):
    for filename in filenames:
        print(f" - {filename}")

Available files in dataset:
 - symptom_Description.csv
 - Symptom-severity.csv
 - symptom_precaution.csv
 - dataset.csv


In [53]:
class SymptomChecker:
    def __init__(self):
        self.medical_data = self._load_data()
        self.conditions = self._prepare_data()
    
    def _load_data(self):
        """Load the clinical dataset, reporting a clear message if the file is missing"""
        try:
            print(f"\nLoading primary dataset: {FILE_NAME}")
            return pd.read_csv(DATA_PATH)
        except FileNotFoundError:
            print("\nERROR: Missing dataset file. Verify:")
            print(f"1. File exists in {INPUT_DIR}")
            print(f"2. Correct file name (you have: {FILE_NAME})")
            # Re-raise rather than call exit(), which can shut down the notebook kernel
            raise

    def _prepare_data(self):
        """Map each disease name to a NumPy array of its symptoms"""
        condition_db = {}
        for _, row in self.medical_data.iterrows():
            # Every column after the first (assumed to be 'Disease') is a symptom; empty cells are dropped
            symptoms = [s.strip().lower() for s in row.iloc[1:].dropna().tolist()]
            # Note: if a disease appears in several rows, the last row overwrites the earlier ones
            condition_db[row['Disease'].strip()] = {'symptoms': np.array(symptoms, dtype=object)}
        return condition_db

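    def analyze(self, symptoms):
        """Return up to five (condition, match_count) pairs, ranked by shared symptoms"""
        matches = []
        # Normalize the input the same way as the dataset and skip empty entries
        clean_symptoms = np.array([s.strip().lower() for s in symptoms if s.strip()], dtype=object)

        for condition, data in self.conditions.items():
            # intersect1d returns the sorted, unique symptoms present in both arrays
            common = np.intersect1d(clean_symptoms, data['symptoms'])
            if common.size > 0:
                matches.append((condition, common.size))

        # Highest match count first; ties keep their dataset order because sorted() is stable
        return sorted(matches, key=lambda x: x[1], reverse=True)[:5]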
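
The matching logic can be tried without the Kaggle files. The sketch below is not the notebook's code. It uses a tiny hand-made table (`toy_rows`, hypothetical data) and two hypothetical helpers, `build_conditions` and `rank`. `build_conditions` merges symptoms with a set union when a disease appears in more than one row, on the assumption that the real CSV may repeat diseases. `rank` uses the same `np.intersect1d` call as `analyze`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset.csv is not reproduced here.
toy_rows = pd.DataFrame(
    {
        "Disease": ["Flu", "Flu", "Migraine"],
        "Symptom_1": [" cough", " chills", " headache"],
        "Symptom_2": [" headache", None, " nausea"],
    }
)


def build_conditions(df):
    """Merge the symptoms of repeated disease rows into one array per disease."""
    merged = {}
    for _, row in df.iterrows():
        name = row["Disease"].strip()
        symptoms = {s.strip().lower() for s in row.iloc[1:].dropna()}
        merged.setdefault(name, set()).update(symptoms)
    # Sort so each array has a stable, reproducible order
    return {name: np.array(sorted(syms), dtype=object) for name, syms in merged.items()}


def rank(conditions, reported, top_n=5):
    """Rank diseases by how many reported symptoms they share."""
    cleaned = np.array([s.strip().lower() for s in reported if s.strip()], dtype=object)
    scores = []
    for name, symptoms in conditions.items():
        common = np.intersect1d(cleaned, symptoms)
        if common.size > 0:
            scores.append((name, int(common.size)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


conditions = build_conditions(toy_rows)
ranking = rank(conditions, ["Headache", " chills ", "fever"])
print(ranking)  # [('Flu', 2), ('Migraine', 1)]
```

Here "Flu" scores 2 only because its two rows were merged; with last-row-wins storage it would hold just `chills` and score 1.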

In [54]:
def main():
    print("\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    print("  Medical Symptom Analysis System")
    print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    
    name = input("\nPatient Name: ").strip()
    symptoms = input("Symptoms (comma-separated): ").split(',')
    
    engine = SymptomChecker()
    results = engine.analyze(symptoms)
    
    print(f"\nAssessment for {name}:")
    if not results:
        print("⚠️ No clear matches - consult a physician")
        return
    
    for idx, (condition, score) in enumerate(results, 1):
        print(f"{idx}. {condition} ({score} symptom matches)")

if __name__ == "__main__":
    main()


━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Medical Symptom Analysis System
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━



Patient Name:  emm
Symptoms (comma-separated):  vomiting, fever, headache



Loading primary dataset: dataset.csv

Assessment for emm:
1. Paralysis (brain hemorrhage) (2 symptom matches)
2. Malaria (2 symptom matches)
3. Dengue (2 symptom matches)
4. Typhoid (2 symptom matches)
5. Hypoglycemia (2 symptom matches)
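
In the run above, three symptoms were entered but each listed condition matched only two. One possible explanation, an assumption since the dataset's contents are not shown here, is that the dataset spells some symptoms differently from the user, for example with underscores or extra spaces. A minimal sketch of two hypothetical helpers, `normalize_symptom` and `parse_symptoms`, that could be applied to both the dataset values and the user input before comparison:

```python
import re


def normalize_symptom(text):
    """Lowercase, trim, and collapse spaces, hyphens, and underscores into one underscore."""
    return re.sub(r"[\s_\-]+", "_", text.strip().lower())


def parse_symptoms(raw):
    """Split a comma-separated string into normalized, non-empty, de-duplicated symptoms."""
    seen = []
    for part in raw.split(","):
        symptom = normalize_symptom(part)
        if symptom and symptom not in seen:
            seen.append(symptom)
    return seen


print(parse_symptoms("Vomiting, high fever, ,Head-ache , vomiting"))
# ['vomiting', 'high_fever', 'head_ache']
```

Normalization only aligns spelling. A general term such as `fever` would still not match a more specific dataset entry, if one exists, because the comparison is on exact names.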
