# Decision Tree
A Decision Tree is a versatile supervised machine learning algorithm used for both classification and regression tasks. It has a tree-like structure in which each internal node tests a feature, each branch represents an outcome of that test, and each leaf node holds a final prediction (a class label or a numeric value).

- Type of Algorithm: Supervised Learning (Classification & Regression).
- Structure:
    - Root Node: The first decision point.
    - Internal Nodes: Intermediate decision points.
    - Leaf Nodes: Final outcomes (class labels or values).
- Splitting Criteria:
    - Classification: Gini Impurity, Entropy (Information Gain).
    - Regression: Variance Reduction, Mean Squared Error (MSE).
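
The splitting criteria listed above can be written out in a few lines. The following is a minimal sketch using only the Python standard library; the function names (`gini_impurity`, `entropy`, `variance`, `impurity_decrease`) are illustrative and not taken from any particular library. A candidate split is scored by the parent's impurity minus the size-weighted impurity of its two children. With `entropy` this quantity is the information gain, and with `variance` it is the variance reduction.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def variance(values):
    """Population variance, i.e. the MSE of predicting the mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# A perfectly separating split on a balanced two-class node.
parent = ["a", "a", "b", "b"]
print(gini_impurity(parent))
print(impurity_decrease(parent, ["a", "a"], ["b", "b"], entropy))
```

A pure node (all labels identical) has impurity 0 under every criterion, which is why a split that produces pure children gives the largest possible decrease.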

**How it Works**:

- Starts at the root node and splits the dataset on the feature (and threshold or category) that best separates the target variable according to the chosen splitting criterion.
- Repeats the split recursively on each resulting subset until a stopping criterion is met, such as:
    - Maximum depth of the tree.
    - Minimum number of samples in a node.
    - No significant gain from further splitting.
- The result is a tree structure where each leaf node represents a class (for classification) or a value (for regression).
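
The procedure above can be sketched as a small recursive function. This is an illustrative toy for numeric features and Gini impurity, not how any production library implements it. The names `best_split`, `build_tree`, `predict`, and the parameters `max_depth`, `min_samples`, and `min_gain` are assumptions chosen to mirror the three stopping criteria in the list.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every feature/threshold pair; return (gain, feature, threshold) or None."""
    parent, n, best = gini(labels), len(rows), None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2, min_gain=1e-9):
    """Recursively grow a tree of nested dicts; leaves store the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: maximum depth reached or too few samples in the node.
    if depth >= max_depth or len(rows) < min_samples:
        return {"leaf": majority}
    split = best_split(rows, labels)
    # Stopping criterion: no split yields a significant gain.
    if split is None or split[0] < min_gain:
        return {"leaf": majority}
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    grow = lambda idx: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  depth + 1, max_depth, min_samples, min_gain)
    return {"feature": f, "threshold": t, "left": grow(left), "right": grow(right)}


def predict(tree, row):
    """Walk from the root to a leaf, following the threshold tests."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Made-up toy data: two numeric features, two classes.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [8.0, 1.0], [9.0, 3.0], [10.0, 2.0]]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
print(predict(tree, [2.5, 0.0]), predict(tree, [9.5, 9.0]))
```

The exhaustive threshold search keeps the sketch short but is slow on large datasets. Practical implementations typically sort each feature once and scan candidate thresholds incrementally.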

**Applications**:

- Healthcare: Disease diagnosis based on symptoms.
- Finance: Credit scoring, fraud detection.
- Marketing: Customer segmentation, churn prediction.
- Education: Predicting student performance.
- Retail: Product recommendations.

| **Advantages**                                                                 | **Disadvantages**                                                                  |
|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| Simple to understand, interpret, and visualize.                                | Prone to overfitting, especially with deep trees.                                 |
| No assumptions about the data distribution (non-parametric).                   | Sensitive to small changes in data, leading to different tree structures.         |
| Can handle both categorical and numerical data.                                | Biased towards features with more levels (for categorical variables).             |
| Captures non-linear relationships naturally.                                   | Not robust to noisy data and irrelevant features.                                 |
| Works well for small to medium-sized datasets.                                 | Struggles with high-dimensional data and sparse datasets.                         |
| Some implementations can handle missing values (e.g., via surrogate splits).   | Less accurate compared to ensemble methods like Random Forest or Gradient Boosting.|
| Computationally efficient and fast to train.                                   | Requires careful pruning or hyperparameter tuning to prevent overfitting.         |
| Effective for multi-class classification problems.                             | Struggles with imbalanced datasets (needs resampling or weighted splitting).      |
