## Decision Tree

Question 1: What is a Decision Tree, and how does it work in the context of classification?
  - A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. In the context of classification, it is a tree-like model used to predict the class or category of a target variable by learning simple decision rules inferred from data features.
  - Think of it as a flowchart where the model asks a series of questions to narrow down the possibilities until it reaches a conclusion.
  - How it works in Classification:
    - The goal of a decision tree in classification is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
    1. Recursive Partitioning
    - The process starts at the Root Node with the complete dataset. The algorithm looks for the "best" feature to split the data.
      - It asks a question (e.g., "Is the humidity > 70%")
      - Based on the answer, it splits the data into subsets.
      - It then repeats this process for each subset (child node), choosing the best feature to split that specific subset.
    2. Selecting the "Best" split
    - How does the tree decide which feature to split on? It uses mathematical metrics that measure the "purity" of a split. The goal is to produce nodes that are as homogeneous as possible (containing mostly one class).
    - Two common metrics are:
      - Gini Impurity:
        - Measures how often a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution. A Gini score of 0 denotes a pure node (all samples belong to the same class).
      - Entropy (Information Gain):
        - Measures the amount of randomness or uncertainty in the data. The algorithm calculates the Information Gain for each possible split and chooses the one that provides the highest gain (reduces entropy the most).
          $$Entropy(S) = - \sum_{i} p_i \log_2(p_i)$$
          where $p_i$ is the probability of an element of $S$ belonging to class $i$.
    3. Stopping Criteria
      - The tree continues to grow and split until one of the following conditions is met:
        - Pure Node: All data points in a node belong to the same class.
        - Max Depth: The tree reaches a pre-defined maximum depth (to prevent overfitting).
        - Min Samples: The number of samples in a node is below a certain threshold.    
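The flowchart view can be made concrete by printing the rules a fitted tree has learned. This is a minimal sketch, assuming scikit-learn is installed and using its bundled Iris data; `max_depth=2` is an arbitrary choice to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed flowchart short (max_depth is a stopping criterion)
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else style rules
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line in the output is one question from the recursive partitioning step, and each `class:` line is a leaf.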

Question 2: Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?
  - In the context of Decision Trees, impurity measures how mixed the class labels are at a node (the opposite of homogeneity). A node is "pure" if all its samples belong to the same class (e.g., all "Yes"). It is "impure" if it contains a mix of different classes (e.g., 50% "Yes", 50% "No").
  - The goal of a decision tree is to split the data in a way that minimizes impurity in the child nodes. The two most common metrics for measuring this are Gini Impurity and Entropy.
  1. Gini Impurity
  - Gini impurity measures the probability of misclassifying a randomly chosen element from the dataset if it were randomly labeled according to the class distribution in the dataset.
  - Formula: $$Gini = 1 - \sum_{i=1}^{C} (p_i)^2$$ where $p_i$ is the probability of an element belonging to class $i$.
  - Range: It ranges from 0 to 0.5 for binary classification (in general, from 0 to $1 - 1/C$ for $C$ classes).
  - 0:
    - Perfectly pure node (all samples are the same class).
  - 0.5:
    - Maximum impurity (samples are evenly distributed across classes).
  - Intuition:
    - Gini asks: "How often would I be wrong if I guessed randomly based on the distribution?" If a bag has 99 red balls and 1 blue ball, you are very likely to pick a red ball and very likely to guess "red," so your error (impurity) is low.
  2. Entropy
  - Entropy is a concept borrowed from Information Theory. It measures the amount of "disorder," uncertainty, or surprise in the data.
  - Formula: $$Entropy = - \sum_{i=1}^{C} p_i \log_2(p_i)$$ where $p_i$ is the probability of class $i$.
  - Range: It ranges from 0 to 1 (for binary classification).
  - 0 : Perfectly pure (zero disorder).
  - 1 : Maximum impurity (maximum disorder/uncertainty, e.g., a 50/50 split).
  - Intuition: High entropy means the dataset is chaotic and unpredictable. Low entropy means the dataset is orderly and predictable.
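Both formulas are short enough to compute by hand. The sketch below is plain Python (the helper names `gini` and `entropy` are illustrative, not from any library) and evaluates them for a 50/50 node, the 99-red/1-blue bag from the intuition above, and a pure node.

```python
import math

def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy (base 2) of a node, given the number of samples in each class."""
    total = sum(counts)
    # Empty classes are skipped: 0 * log2(0) is treated as 0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

for counts in ([50, 50], [99, 1], [100, 0]):
    print(counts, f"gini={gini(counts):.4f}", f"entropy={entropy(counts):.4f}")
```

The 50/50 node hits the binary maximum of both measures (0.5 and 1.0), while the 99/1 bag scores close to 0 on both, matching the "low error" intuition.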
  3. How They Impact Splits
  - The decision tree algorithm (like CART for Gini or ID3/C4.5 for Entropy) doesn't just calculate impurity for a single node; it calculates the change in impurity (Information Gain) to decide where to split.
  - Here is the step-by-step process of how they impact the tree construction:
  - Calculate Parent Impurity:
    -  The algorithm calculates the Gini or Entropy of the current node (Parent) before splitting.
  - Test Possible Splits:
    - It iterates through every possible feature and every possible threshold (e.g., "Age > 25", "Age > 26", "Income > 50k").
  - Calculate Weighted Child Impurity:
    - For each test split, it calculates the impurity of the resulting child nodes and takes a weighted average based on the number of samples in each child:
      $$Weighted\_Impurity = \frac{N_{left}}{N_{total}} \times Impurity_{left} + \frac{N_{right}}{N_{total}} \times Impurity_{right}$$
  - Select the Best Split:
    - For Entropy: It calculates Information Gain (Parent Entropy - Weighted Child Entropy). It chooses the split with the highest Information Gain.
    - For Gini: It chooses the split that produces the lowest Weighted Gini Impurity.
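The four steps above can be sketched for a single numeric feature. This is an illustrative plain-Python version, not scikit-learn's implementation: the function names, the midpoint rule for candidate thresholds, and the tiny `ages`/`bought` dataset are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    n = len(labels)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weighted average of the child impurities
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

ages = [22, 25, 30, 35, 40, 45]   # hypothetical feature values
bought = [0, 0, 1, 0, 1, 1]       # hypothetical class labels

parent_impurity = gini(bought)
threshold, weighted_impurity = best_split(ages, bought)
print(f"parent gini={parent_impurity:.3f}")
print(f"best split: age <= {threshold} (weighted gini={weighted_impurity:.3f})")
```

A full tree builder would run this search over every feature, keep the overall winner, and then recurse into the two child subsets.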


Question 3: What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.
  - Pruning is a technique used to reduce overfitting in Decision Trees.
  - Overfitting occurs when a tree becomes so complex that it "memorizes" the noise in the training data rather than learning the actual patterns, leading to poor performance on new data.
  - Here is the difference between Pre-Pruning and Post-Pruning:
  1. Pre-Pruning (Early Stopping)
  Pre-pruning involves halting the growth of the decision tree before it perfectly classifies the training set. You stop the tree-building process early based on specific stopping criteria/hyperparameters.
  - How it works:
    - At each splitting step, the algorithm checks a condition. If the condition is met (e.g., the tree is too deep), it stops splitting that node and turns it into a leaf node, even if the node isn't pure yet.
  - Common Criteria:
    - Max Depth: Stop if the tree reaches a depth of X.
    - Min Samples Split: Stop if a node has fewer than X samples.
    - Min Impurity Decrease: Stop if splitting doesn't reduce impurity by at least X.
  2. Post-Pruning (Backward Pruning)
  Post-pruning involves allowing the tree to grow to its full extent (until all leaves are pure or contain very few samples), and then trimming back the branches that do not provide significant information.
  - How it works:
    - The algorithm builds a massive, overfitted tree first. Then, it works from the bottom up (from leaves to root). It replaces a subtree with a leaf node if removing that subtree does not significantly increase the error rate (often verified using a validation dataset or cross-validation).
  - Common Method:
    - Cost Complexity Pruning (used in scikit-learn via the ccp_alpha parameter). It assigns a penalty to the number of terminal nodes, finding the right balance between the tree's size and its accuracy.    
  - Practical Advantages:
  - Advantage of Pre-Pruning: Computational Efficiency
    - Why:
      - It is faster and uses less memory because the full, massive tree is never built.
    - Practical Use Case:
      - If you are working with a massive dataset (millions of rows) or in an application where training speed is critical, pre-pruning (e.g., setting max_depth=10) gives you a "good enough" model quickly without exhausting system resources.
  - Advantage of Post-Pruning: Higher Accuracy (Better Generalization)
    - Why:
      - It avoids the "Horizon Effect." Sometimes a split early in the tree looks "bad" (low information gain) but opens the door to a very "good" split deeper down. Pre-pruning would stop early and miss this; post-pruning builds the whole tree, sees the value of the deeper split, and keeps it.
    - Practical Use Case:
      - In competitions (like Kaggle) or medical diagnosis, where accuracy is paramount and you can afford a longer training time, post-pruning often yields a more robust and accurate model.
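Both styles can be tried side by side in scikit-learn. The sketch below assumes the Iris data and uses `max_depth`/`min_samples_split` for pre-pruning and `ccp_alpha` for post-pruning; taking the middle value from `cost_complexity_pruning_path` is an arbitrary illustrative choice, and in practice the alpha would normally be selected with cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Reference: unconstrained tree
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruning: growth stops as soon as one of the limits is hit
pre_pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_split=10, random_state=42
).fit(X_train, y_train)

# Post-pruning: list the candidate alphas for the full tree, then refit with one
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
alpha = max(float(path.ccp_alphas[len(path.ccp_alphas) // 2]), 0.0)  # arbitrary pick
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)

for name, model in [("full", full_tree), ("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(f"{name:12s} leaves={model.get_n_leaves():2d} "
          f"test accuracy={model.score(X_test, y_test):.3f}")
```

On a dataset as small as Iris the accuracy differences are likely to be minor; the leaf counts are the more telling part of the comparison.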

Question 4: What is Information Gain in Decision Trees, and why is it important for choosing the best split?
  - Information Gain is the metric used by Decision Tree algorithms (specifically ID3 and C4.5) to decide which feature to split on at each step.
  - In simple terms, it measures how much "uncertainty" (entropy) was removed from the dataset after splitting it on a specific attribute.
  - The Concept
  - Imagine you are playing a game of "20 Questions."
  - Question A:
    - "Is it alive?" (Reduces the possibilities massively $\rightarrow$ High Information Gain)
  - Question B:
    - "Does its name start with the letter T?" (Doesn't help much $\rightarrow$ Low Information Gain)
  - In a Decision Tree, the algorithm tests every feature and calculates how much information it "gains" by splitting on that feature. It then chooses the feature with the highest Information Gain to be the next node.
  - The Formula
  - Mathematically, Information Gain is simply the difference between the entropy of the parent node and the weighted average entropy of the child nodes.
    $$Information\ Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$$
  - $Entropy(S)$: The impurity of the original dataset (Parent).
  - $Entropy(S_v)$: The impurity of the new subset (Child) created by the split.
  - $\frac{|S_v|}{|S|}$: The weight (proportion) of data points that ended up in that child node.
  - Why Is It Important for Choosing the Best Split?
    - Information Gain is the core "selection criterion" that drives the tree construction. Here is why it is critical:
  1. Identifies the Most Important Features
    - It acts as a filter to find the most predictive features. The feature at the root of the tree (topmost node) is the one with the highest Information Gain on the full training set, meaning it is the single most informative feature for the first split.

  2. Minimizes Tree Depth (Efficiency)
    - By choosing splits that reduce impurity the fastest, the algorithm tends to build a shallower, more efficient tree. If we chose splits with low information gain, we would need many more layers of questions to reach a pure leaf node.

  3. Prevents "Bad" Splits
    - Consider a dataset of 100 people, 50 "Fit" and 50 "Unfit" (High Entropy/Uncertainty).

    - Split by "Gym Membership": You get one group of 45 "Fit" / 5 "Unfit" and another of 5 "Fit" / 45 "Unfit". This drastically reduces uncertainty. (High Gain)

    - Split by "Favorite Color": You get groups that are still roughly 50/50 mixed. The uncertainty remains high. (Low/Zero Gain)

    - Information Gain ensures the algorithm picks "Gym Membership" over "Favorite Color."
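The "Gym Membership" versus "Favorite Color" comparison can be worked through numerically. This is a plain-Python sketch with illustrative helper names; the source only says the color groups are "roughly 50/50", so an exact 25/25 split in each group is assumed here.

```python
import math

def entropy(counts):
    """Entropy (base 2) from per-class sample counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """parent: class counts before the split; children: class counts per child node."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [50, 50]  # 50 "Fit", 50 "Unfit"
gain_gym = information_gain(parent, [[45, 5], [5, 45]])
gain_color = information_gain(parent, [[25, 25], [25, 25]])  # assumed exact 50/50 groups

print(f"Gym Membership gain: {gain_gym:.3f}")
print(f"Favorite Color gain: {gain_color:.3f}")
```

The parent entropy is 1.0 bit; each "Gym Membership" child has an entropy of about 0.469, so the gain is about 0.531, while the color split leaves the entropy unchanged and gains nothing.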

  - A Known Drawback: Bias toward Multi-Valued Attributes
    - Information Gain has a flaw: it is biased towards attributes with a large number of distinct values.

  - Example:
    - If you have a "User ID" feature, splitting on it would result in perfectly pure nodes (1 user per leaf). This gives maximum Information Gain but is useless for prediction (overfitting).

  - Solution:
    - To fix this, algorithms like C4.5 use Gain Ratio, which divides the Information Gain by the entropy of the split itself (Split Information) and so penalizes attributes with too many branches.
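To see the penalty at work, the sketch below compares a "User ID" split (100 single-sample branches) with the "Gym Membership" split from the earlier example. It uses the common definition Gain Ratio = Information Gain / Split Information; the helper names and the counts are illustrative assumptions.

```python
import math

def entropy(counts):
    """Entropy (base 2) from a list of counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_and_ratio(parent, children):
    """Return (information_gain, gain_ratio) for a split into the given children."""
    n = sum(parent)
    sizes = [sum(child) for child in children]
    gain = entropy(parent) - sum(s / n * entropy(c) for s, c in zip(sizes, children))
    split_info = entropy(sizes)  # entropy of the branch sizes themselves
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

parent = [50, 50]
user_id_children = [[1, 0]] * 50 + [[0, 1]] * 50  # one pure leaf per user
gym_children = [[45, 5], [5, 45]]

gain_id, ratio_id = gain_and_ratio(parent, user_id_children)
gain_gym, ratio_gym = gain_and_ratio(parent, gym_children)
print(f"User ID:        gain={gain_id:.3f} ratio={ratio_id:.3f}")
print(f"Gym Membership: gain={gain_gym:.3f} ratio={ratio_gym:.3f}")
```

Information Gain alone ranks "User ID" first (1.0 versus about 0.531), but dividing by the Split Information of $\log_2(100) \approx 6.64$ drops its ratio to about 0.15, so Gain Ratio prefers "Gym Membership".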


Question 5: What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?
  - Decision Trees are widely used because they mimic human decision-making and are easy to interpret. While more complex algorithms (like neural networks) often outperform them in raw accuracy, Decision Trees are preferred in industries where explaining the "why" behind a prediction is legally or operationally required.
  - Common Real-World Applications
    1. Finance: Credit Scoring & Loan Approval
      - Application: Banks use decision trees to determine whether a loan applicant is "Low Risk" or "High Risk".
      - How it works: The tree splits applicants based on Income, Credit History, Employment Status, and Debt-to-Income Ratio.
      - Why here: Regulations often require banks to explain why a loan was rejected. A decision tree provides a clear audit trail (e.g., "Rejected because Income < $30K AND Credit Score < 600").
    2. Healthcare: Triage & Diagnosis
      - Application: Emergency rooms use decision trees (often as flowcharts) to prioritize patients or suggest initial treatments.
      - How it works:
        - Symptoms: Chest pain? --> Yes.
        - Age: > 45? --> Yes.
        - History: Smoker? --> Yes.
        - Action: Prioritize (High Risk of Heart Attack).
      - Why here:
        - Doctors need a transparent tool that aligns with medical guidelines, not a "black box" AI they can't verify.
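The audit-trail idea from the loan example can be sketched by reading back the path a single sample takes through a fitted tree. Everything about the data here is hypothetical (the eight applicants, the `income_k` and `credit_score` names, and the `explain` helper); only `decision_path`, `apply`, and the `tree_` attributes come from scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

feature_names = ["income_k", "credit_score"]
# Hypothetical applicants: [income in $K, credit score]; 1 = approved, 0 = rejected
X = np.array([[25, 580], [28, 590], [45, 700], [60, 720],
              [32, 640], [29, 610], [55, 650], [27, 690]], dtype=float)
y = np.array([0, 0, 1, 1, 1, 0, 1, 0])

clf = DecisionTreeClassifier(random_state=42).fit(X, y)

def explain(model, sample):
    """List the split conditions one sample satisfies on its way to a leaf."""
    sample = np.asarray(sample, dtype=float).reshape(1, -1)
    node_indicator = model.decision_path(sample)  # nodes visited by this sample
    leaf_id = model.apply(sample)[0]
    reasons = []
    for node_id in node_indicator.indices:
        if node_id == leaf_id:
            continue  # the leaf holds the prediction, not a condition
        feature = model.tree_.feature[node_id]
        threshold = model.tree_.threshold[node_id]
        op = "<=" if sample[0, feature] <= threshold else ">"
        reasons.append(f"{feature_names[feature]} {op} {threshold:.1f}")
    return reasons

applicant = [26, 600]
decision = clf.predict([applicant])[0]
reasons = explain(clf, applicant)
print("Approved" if decision == 1 else "Rejected", "because", " AND ".join(reasons))
```

The printed conditions are exactly the questions the tree asked for this applicant, which is the kind of explanation a "black box" model cannot give directly.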

Question 6: Write a Python program to:
  -  Load the Iris Dataset
  -  Train a Decision Tree Classifier using the Gini criterion
  - Print the model’s accuracy and feature importances

In [1]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# 1. Load the Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
# random_state ensures reproducible results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Train a Decision Tree Classifier using the Gini criterion
# 'criterion' is set to 'gini' by default, but we specify it explicitly here
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# 3. Print the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f} ({accuracy*100:.1f}%)")
print("-" * 30)

# 4. Print Feature Importances
print("Feature Importances:")
# Create a DataFrame for better visualization
feature_importance_df = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': clf.feature_importances_
}).sort_values(by='Importance', ascending=False)

print(feature_importance_df)

Model Accuracy: 1.00 (100.0%)
------------------------------
Feature Importances:
             Feature  Importance
2  petal length (cm)    0.906143
3   petal width (cm)    0.077186
1   sepal width (cm)    0.016670
0  sepal length (cm)    0.000000


Question 7: Write a Python program to:
  - Load the Iris Dataset
  - Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to a fully-grown tree.

In [2]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Load the Iris Dataset
data = load_iris()
X = data.data
y = data.target

# Split the dataset (80% train, 20% test)
# random_state=42 ensures the split is the same every time we run it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Model 1: Fully Grown Tree ---
# By default, max_depth=None, allowing the tree to grow until all leaves are pure
full_tree = DecisionTreeClassifier(random_state=42)
full_tree.fit(X_train, y_train)
y_pred_full = full_tree.predict(X_test)
acc_full = accuracy_score(y_test, y_pred_full)

# --- Model 2: Pruned Tree (max_depth=3) ---
# We limit the depth to 3 levels to prevent overfitting
pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
pruned_tree.fit(X_train, y_train)
y_pred_pruned = pruned_tree.predict(X_test)
acc_pruned = accuracy_score(y_test, y_pred_pruned)

# --- Output the Comparison ---
print(f"Accuracy of Fully Grown Tree: {acc_full:.4f}")
print(f"Accuracy of Pruned Tree (depth=3): {acc_pruned:.4f}")

# Check depth of the full tree for context
print(f"Actual depth of the fully grown tree: {full_tree.get_depth()}")

Accuracy of Fully Grown Tree: 1.0000
Accuracy of Pruned Tree (depth=3): 1.0000
Actual depth of the fully grown tree: 6
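Both trees score 1.0 here, but the test set holds only 30 samples, so a single split cannot separate the two models. As a rough cross-check, the sketch below repeats the comparison with 10-fold cross-validation on all 150 samples; the fold count is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "fully grown": DecisionTreeClassifier(random_state=42),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=42),
}

cv_means = {}
for name, model in models.items():
    # Each model is trained and scored on 10 different train/validation splits
    scores = cross_val_score(model, X, y, cv=10)
    cv_means[name] = scores.mean()
    print(f"{name:12s} mean accuracy={scores.mean():.4f} (std={scores.std():.4f})")
```

If the two means are close, the shallower tree is usually the better pick because it is simpler and easier to explain.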


Question 8: Write a Python program to:
  - Load the Boston Housing Dataset
  - Train a Decision Tree Regressor
  - Print the Mean Squared Error (MSE) and feature importances


In [5]:
# Boston Housing Regression (MSE & Feature Importance)
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# 1. Load the Boston Housing Dataset
# (fetched from OpenML because load_boston was removed in scikit-learn 1.2)
# data_id=531 is the ID for the Boston Housing dataset on OpenML
boston = fetch_openml(data_id=531, parser='auto')
X = boston.data
y = boston.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Train a Decision Tree Regressor
# We use a regressor here, not a classifier, because the target is a continuous value (price)
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(X_train, y_train)

# Predict on the test set
y_pred = regressor.predict(X_test)

# 3. Print Mean Squared Error (MSE) and Feature Importances
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")
print("-" * 30)

print("Feature Importances:")
importances = pd.DataFrame({
    'Feature': X.columns,
    'Importance': regressor.feature_importances_
}).sort_values(by='Importance', ascending=False)

print(importances.head()) # Printing top 5 features

Mean Squared Error (MSE): 10.42
------------------------------
Feature Importances:
   Feature  Importance
5       RM    0.600326
12   LSTAT    0.193328
7      DIS    0.070688
0     CRIM    0.051296
4      NOX    0.027148


Question 9: Write a Python program to:
-  Load the Iris Dataset
-  Tune the Decision Tree’s max_depth and min_samples_split using GridSearchCV
-  Print the best parameters and the resulting model accuracy

In [6]:
# 1. Load the Iris Dataset
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
data = load_iris()
X = data.data
y = data.target

# Split the dataset (80% for training/tuning, 20% for final evaluation)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
# 2.Tune the Decision Tree’s max_depth and min_samples_split using GridSearchCV
# Define the parameter grid to search
param_grid = {
    'max_depth': [3, 5, 7, None],           # Try different depths
    'min_samples_split': [2, 5, 10]         # Try different split thresholds
}

# Initialize the Decision Tree Classifier
dt = DecisionTreeClassifier(random_state=42)

# Initialize GridSearchCV
# cv=5 means 5-fold Cross-Validation
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

In [10]:
# 3. Print the best parameters and the resulting model accuracy
# (grid_search was already fitted in the previous cell, so it is not refitted here)

# Get the best parameters and the best score achieved during validation
best_params = grid_search.best_params_
best_cv_score = grid_search.best_score_

# Evaluate the best model on the independent test set
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)

print(f"Best Parameters Found: {best_params}")
print(f"Best Cross-Validation Accuracy: {best_cv_score:.4f}")
print(f"Test Set Accuracy: {test_accuracy:.4f}")

Best Parameters Found: {'max_depth': 7, 'min_samples_split': 2}
Best Cross-Validation Accuracy: 0.9417
Test Set Accuracy: 1.0000


Question 10: Imagine you’re working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values.
Explain the step-by-step process you would follow to:
-  Handle the missing values
-  Encode the categorical features
-  Train a Decision Tree model
- Tune its hyperparameters
- Evaluate its performance
And describe what business value this model could provide in the real-world setting.

Phase 1: Data Preprocessing
  - Before the model can learn anything, the "messy" real-world data must be cleaned.
  1. Handling Missing Values
      - Missing data in healthcare is common (e.g., a patient didn't take a specific test).
      - For Numerical Features (e.g., Blood Pressure, Age):
      - Simple Imputation:
        - If the data is roughly normally distributed, I would fill missing values with the Mean. If it is skewed (has outliers), I would use the Median.
      - Advanced Method:
        - Use KNN Imputer, which finds "similar" patients based on other features and uses their values to fill the gap.
  2. Encoding Categorical Features
      - Machine Learning models (specifically scikit-learn implementations) require numerical input.
      - One-Hot Encoding: For nominal variables with no inherent order (e.g., Gender, Blood Type). This creates binary columns (e.g., Is_Type_A, Is_Type_O).
      - Label/Ordinal Encoding: For ordinal variables with a clear rank (e.g., Pain Level: Low/Medium/High). I would map these to 0, 1, 2 to preserve the order.
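Both preprocessing steps can be wired together with a `ColumnTransformer`. The sketch below runs on a tiny hypothetical patient table; the column names and values are invented for illustration. It uses median imputation for numeric columns and most-frequent imputation for categorical ones (a choice the steps above do not specify), and `KNNImputer` could be swapped in for the numeric imputer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical patient records with missing values in every column
patients = pd.DataFrame({
    "age": [34, 51, np.nan, 68, 45],
    "blood_pressure": [120, np.nan, 135, 160, 128],
    "blood_type": pd.Series(["A", "O", "B", np.nan, "O"], dtype=object),
    "pain_level": pd.Series(["Low", "High", "Medium", "High", np.nan], dtype=object),
})

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median (robust to outliers)
    ("num", SimpleImputer(strategy="median"), ["age", "blood_pressure"]),
    # Nominal: fill with the most frequent value, then one-hot encode
    ("nom", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["blood_type"]),
    # Ordinal: fill with the most frequent value, then map Low/Medium/High to 0/1/2
    ("ord", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("ordinal", OrdinalEncoder(categories=[["Low", "Medium", "High"]])),
    ]), ["pain_level"]),
])

X_clean = preprocess.fit_transform(patients)
if hasattr(X_clean, "toarray"):  # the result may be sparse depending on density
    X_clean = X_clean.toarray()
print(X_clean)
```

Fitting the transformer on the training split only, and reusing it on the validation and test splits, keeps statistics such as the median from leaking across splits.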
Phase 2: Model Development
  3. Train a Decision Tree Model
      - Data Split: I would split the data into Training (70%), Validation (15%), and Testing (15%) sets. Stratified sampling is crucial here to ensure the percentage of "sick" patients is consistent across all splits.
      - Baseline Model: I would train an initial "vanilla" Decision Tree without constraints to establish a baseline performance and identify feature importance.
  4. Tune hyperparameters
      - A default Decision Tree will almost certainly overfit (memorize) the training data. I would use GridSearchCV or RandomizedSearchCV to find the optimal balance.
      - max_depth:
        - Limit how deep the tree grows to prevent it from learning noise.
      - min_samples_leaf:
        - Ensure every final decision node has a statistically significant number of patients(e.g., at least 20).
      - class_weight:
          - Since diseases are often rare (imbalanced data), I would set this to 'balanced' so the model pays more attention to the minority class (the sick patients).
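A minimal tuning sketch follows, with synthetic data from `make_classification` standing in for the patient table (an assumption, with roughly 10% positive cases to mimic a rare disease). Here 5-fold cross-validation inside `GridSearchCV` plays the role of the separate validation set, and `scoring="recall"` is one reasonable choice given the cost of missed cases; the grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed patient data (about 10% "sick")
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)

# Stratify so the share of sick patients is the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 10, 20],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="recall",  # optimize for catching sick patients
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated recall: {search.best_score_:.3f}")
```

The held-out `X_test`/`y_test` split is left untouched during tuning so it can serve for the final evaluation.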
Phase 3: Evaluation & Business Impact
  5. Evaluate Performance
      - In healthcare, Accuracy can be misleading. (If 99% of patients are healthy, a model that predicts "Healthy" for everyone is 99% accurate but useless.)
      - I would focus on:
      - Recall (Sensitivity):
        - Out of all the people who actually have the disease, how many did we catch? This is the most critical metric. Missing a sick patient (False Negative) can be life-threatening.
      - Precision:
        - Out of all the people we predicted as sick, how many actually are? (Low precision causes "alarm fatigue" for doctors.)
      - ROC-AUC Score:
        - To measure how well the model distinguishes between sick and healthy patients across different thresholds.
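The accuracy trap is easy to reproduce with a few hand-made labels. In the sketch below, all predictions and scores are hypothetical: one "model" always says healthy, and another raises 6 false alarms while catching 4 of 5 sick patients. In a real project `y_score` would come from `predict_proba`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# Hypothetical test set: 95 healthy (0) and 5 sick (1) patients
y_true = np.array([0] * 95 + [1] * 5)

# A useless model that always predicts "healthy"
y_naive = np.zeros_like(y_true)
naive_accuracy = accuracy_score(y_true, y_naive)
naive_recall = recall_score(y_true, y_naive)

# A hypothetical model: 6 false alarms, 4 of the 5 sick patients caught
y_pred = np.zeros_like(y_true)
y_pred[:6] = 1      # healthy patients flagged as sick
y_pred[95:99] = 1   # sick patients correctly flagged
y_score = np.where(y_pred == 1, 0.8, 0.2)  # stand-in for predicted probabilities

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

print(f"Always-healthy model: accuracy={naive_accuracy:.2f}, recall={naive_recall:.2f}")
print(f"Hypothetical model:   accuracy={accuracy:.2f}, recall={recall:.2f}, "
      f"precision={precision:.2f}, ROC-AUC={auc:.3f}")
```

The always-healthy model wins on accuracy (0.95 versus 0.93) yet catches nobody, which is why recall and precision are reported alongside it.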
  6. Business Value
      - Deploying this model provides three key benefits:
      - Early Intervention:
        - By flagging high-risk patients who might be asymptomatic, the hospital can start treatment earlier, which can improve patient outcomes.
      - Resource Optimization (Triage):
        - In a busy hospital, the model can act as a "first pass" filter, prioritizing high-risk patients for immediate doctor review while lower-risk patients wait.
      - Cost Reduction:
        - Preventative care is generally cheaper than emergency care. Catching the disease early can prevent expensive surgeries or ICU stays later.