## Decision Trees
---

#### Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. 
#### The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. 

```python
# Imports and an environment sanity check for this notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
import torch

print("Environment ready for ML!")
print(f"PyTorch version: {torch.__version__}")
```

```text
Environment ready for ML!
PyTorch version: 2.8.0+cpu
```


#### Type: Supervised Learning

#### Problem: Both Classification and Regression

#### Evaluation Metrics: Accuracy (classification), MSE (regression); feature importances aid interpretation

#### Real-World Examples: Medical diagnosis, credit risk assessment, customer segmentation
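As a rough sketch of these metrics, the snippet below fits one tree per task and scores it on a held-out split. The datasets (`load_iris`, `load_diabetes`), `max_depth=3`, and the 70/30 split are illustrative assumptions and do not come from the original notebook.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: accuracy on a held-out split (iris is an assumed example dataset)
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))

# Regression: mean squared error on a held-out split (diabetes is an assumed example dataset)
Xr, yr = load_diabetes(return_X_y=True)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.3, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(Xr_tr, yr_tr)
mse = mean_squared_error(yr_te, reg.predict(Xr_te))

print(f"Test accuracy: {acc:.3f}")
print(f"Test MSE: {mse:.1f}")
# Impurity-based importances are non-negative and sum to 1; useful for interpretation
print("Iris feature importances:", clf.feature_importances_.round(3))
```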

---


Overview of `sklearn.tree` (scikit-learn decision-tree module):

**Purpose and Scope**
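Supports supervised learning with decision trees for both classification and regression. A tree predicts the target by applying a sequence of simple if-then-else rules, each a threshold test on a single feature. The method is non-parametric: the model's structure is learned from the data rather than fixed in advance.([Scikit-learn][1])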
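To see these if-then-else rules concretely, here is a small sketch: fit a shallow tree on the iris data (an assumed example dataset) and print its rules with `export_text`. The depth limit of 2 is chosen only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# max_depth=2 is an illustrative choice that keeps the rule set readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text renders each root-to-leaf path as nested threshold tests ending in a class
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one branch of an if-then-else test; lines ending in `class: ...` are leaves.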

---

**Advantages**

* Interpretable and visualizable.
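* Requires minimal preprocessing (no feature scaling or normalization).
* Handles numeric inputs directly; categorical support is limited, so in scikit-learn categorical features generally need to be encoded numerically first.
* Fast prediction: the cost per sample is logarithmic in the number of training samples (for a reasonably balanced tree).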
* Supports multi-output tasks.([Scikit-learn][1])
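The no-scaling claim can be checked with a quick sketch, assuming the iris data as an example. Splits depend only on the ordering of each feature's values, so standardizing the inputs with `StandardScaler` should leave the fitted partitions, and hence the training-set predictions, unchanged.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Affine rescaling with a positive scale preserves the ordering of each feature
X_scaled = StandardScaler().fit_transform(X)

raw = DecisionTreeClassifier(random_state=0).fit(X, y)
scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Same orderings -> same candidate partitions -> same tree shape and predictions
same = np.array_equal(raw.predict(X), scaled.predict(X_scaled))
print("Identical predictions:", same)
print("Leaves (raw, scaled):", raw.get_n_leaves(), scaled.get_n_leaves())
```

Only the split thresholds differ between the two trees; they are expressed in the units of whichever input was used.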

---

**Drawbacks**

* Prone to overfitting if unrestricted.
* Requires pruning (e.g., cost-complexity pruning via `ccp_alpha`) or limits on depth, leaf count, or per-node sample counts to generalize.([Scikit-learn][1])
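A sketch of the overfitting problem and two remedies, assuming the breast-cancer dataset as an example: an unrestricted tree, a depth-limited one, and one pruned with scikit-learn's cost-complexity parameter `ccp_alpha`. The values `max_depth=3` and `ccp_alpha=0.01` are arbitrary illustrative choices; in practice they would be tuned, e.g. by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Unrestricted: grows until leaves are pure, so it tends to memorize the training set
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Pre-pruning: stop growing at a fixed depth
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
# Post-pruning: grow fully, then remove subtrees whose complexity cost exceeds ccp_alpha
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

for name, model in [("full", full), ("max_depth=3", shallow), ("ccp_alpha=0.01", pruned)]:
    print(f"{name:>15}: leaves={model.get_n_leaves():3d} "
          f"train={model.score(X_tr, y_tr):.3f} test={model.score(X_te, y_te):.3f}")
```

A large gap between train and test accuracy for the unrestricted tree is the usual symptom of overfitting.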

---

**Key Classes & Functions**

* `DecisionTreeClassifier` – for classification tasks.
* `DecisionTreeRegressor` – for regression tasks.([Scikit-learn][2])

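Important parameters (shared by both unless noted):

* `criterion`: measures split quality. Classifier: `"gini"`, `"entropy"`, or `"log_loss"`; regressor: `"squared_error"`, `"friedman_mse"`, `"absolute_error"`, or `"poisson"`.([Scikit-learn][3])
* `splitter`: `"best"` (best split) or `"random"` (best random split).
* `max_depth`, `min_samples_split`, `min_samples_leaf`: control tree shape.
* `max_features`, `max_leaf_nodes`, `min_impurity_decrease`: limit the features considered at each split, cap the number of leaves, and require a minimum impurity reduction before a node is split.([Scikit-learn][3])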

---

**Visualization & Export**

* `plot_tree(...)`: draws the tree with matplotlib; options include `max_depth`, `feature_names`, `class_names`, and `filled`.([Scikit-learn][4])
* `export_graphviz`, `export_text`: export the tree as Graphviz DOT source or as a plain-text rule listing.([Scikit-learn][2])
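A minimal sketch of both graphical export routes, assuming the iris data and a headless matplotlib backend (`Agg`); the output filename `iris_tree.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(10, 6))
# Draw only the top 2 levels; filled=True colors nodes by majority class
plot_tree(clf, max_depth=2, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
fig.savefig("iris_tree.png")  # hypothetical output path
plt.close(fig)

# With out_file=None, export_graphviz returns the DOT source as a string
dot = export_graphviz(clf, out_file=None, feature_names=iris.feature_names,
                      class_names=list(iris.target_names), filled=True)
print(dot.splitlines()[0])
```

The DOT string can be rendered with the Graphviz `dot` tool if it is installed.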

---

**Introspection Tools**

* `tree_.value`: class distribution per node, stored as relative values; multiply by `tree_.n_node_samples` to get absolute counts (for an unweighted fit).([Stack Overflow][5])
* `decision_path(X)` returns the nodes each sample passes through; `apply(X)` returns the index of the leaf each sample ends in.([Scikit-learn][6])
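These introspection tools can be exercised as below; this is a sketch on the iris data (an assumed example). Because what `tree_.value` stores (counts or fractions) may differ between scikit-learn versions, the root distribution is normalized before printing.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = clf.tree_

# tree_.value has shape (n_nodes, n_outputs, n_classes); node 0 is the root
root = t.value[0, 0]
print("root class fractions:", root / root.sum())
print("root sample count:", t.n_node_samples[0])

leaves = clf.apply(X[:3])         # leaf index for each sample
paths = clf.decision_path(X[:3])  # sparse (n_samples, n_nodes) indicator matrix
for i, leaf in enumerate(leaves):
    # Column indices in row i are the visited node ids, from root to leaf
    nodes = paths.indices[paths.indptr[i]:paths.indptr[i + 1]]
    print(f"sample {i}: path {nodes.tolist()} -> leaf {leaf}")
```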

---

**Regression Example Insight**
A low `max_depth` yields smoother, more general predictions; a high `max_depth` lets the tree fit noise in the training targets (overfitting).([Scikit-learn][7])
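A sketch of this effect on synthetic noisy sine data, loosely modeled on the scikit-learn regression example but not copied from it. A depth-2 tree can produce at most 4 distinct predictions, while a depth-5 tree fits the training points, noise included, more closely.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: sine curve with every 5th target perturbed (an illustrative assumption)
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

shallow = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
deep = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

X_grid = np.linspace(0, 5, 500)[:, np.newaxis]
for name, model in [("max_depth=2", shallow), ("max_depth=5", deep)]:
    train_mse = np.mean((model.predict(X) - y) ** 2)
    # Distinct output levels on the grid (at most the number of leaves)
    n_levels = np.unique(model.predict(X_grid)).size
    print(f"{name}: train MSE={train_mse:.3f}, distinct predictions={n_levels}")
```

The lower training MSE of the deeper tree reflects a closer fit to the noisy points, not better generalization.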

---

**Related Ensemble Methods**
Ensemble methods in `sklearn.ensemble` (e.g., Random Forests, Extra-Trees) combine many decision trees to reduce overfitting and improve stability.([Scikit-learn][8])
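A rough comparison sketch, assuming the breast-cancer dataset and 5-fold cross-validation: a single tree against `RandomForestClassifier` and `ExtraTreesClassifier` from `sklearn.ensemble`. Actual scores depend on the data; ensembles often, though not always, generalize better than a single unpruned tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extra trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds (fold count is an illustrative choice)
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>13}: mean CV accuracy = {scores[name]:.3f}")
```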

---

**Summary**
Decision trees offer clear rules, flexibility, and easy visualization. They require tuning to avoid overfitting. scikit-learn supports these with configurable parameters, visualization tools, node-level introspection, and ensemble options.


[1]: https://scikit-learn.org/stable/modules/tree.html "1.10. Decision Trees - Scikit-learn"
[2]: https://scikit-learn.org/stable/api/sklearn.tree.html "sklearn.tree — scikit-learn 1.7.1 documentation"
[3]: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html "DecisionTreeClassifier — scikit-learn 1.7.1 documentation"
[4]: https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html "plot_tree — scikit-learn 1.7.1 documentation"
[5]: https://stackoverflow.com/questions/47719001/what-does-scikit-learn-decisiontreeclassifier-tree-value-do "What does scikit-learn DecisionTreeClassifier.tree_.value do?"
[6]: https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html "Understanding the decision tree structure - Scikit-learn"
[7]: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html "Decision Tree Regression — scikit-learn 1.7.1 documentation"
[8]: https://scikit-learn.org/stable/modules/ensemble.html "1.11. Ensembles: Gradient boosting, random forests, bagging, voting ..."
