# Decision Trees

## Problem Type
**Decision Trees** are a **supervised** learning method used for:
- **Classification** problems
- **Regression** problems

### How Decision Trees Work
- **Tree-like model of decisions:** The algorithm splits the dataset into subsets based on the value of input features, creating branches for each possible outcome.
- **Recursive binary splitting:** Each node is split into two child nodes, and the procedure repeats on each child until every leaf is pure (perfectly classified) or a stopping criterion is reached.
- **Nodes and leaves:**
  - **Nodes:** Represent a decision based on a feature.
  - **Leaves:** Represent the outcome (class label in classification, value in regression).
- **Splitting criteria:** Candidate splits are scored with Gini impurity or entropy (information gain) for classification, and with Mean Squared Error (MSE) for regression; the split with the best score is chosen.
- **Greedy algorithm:** Typically, Decision Trees use a greedy approach to make locally optimal decisions at each node.
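
To make the splitting criteria concrete, here is a minimal pure-Python sketch of Gini impurity, entropy, and a greedy threshold search on a single numeric feature. The helper names (`gini`, `entropy`, `best_threshold`) and the toy data are made up for illustration; this is not how scikit-learn implements the search internally.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def best_threshold(values, labels, impurity=gini):
    """Greedy search for the threshold on one feature that minimises
    the sample-weighted impurity of the two child nodes."""
    n = len(labels)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy data (hypothetical): one numeric feature, two classes.
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
target = ["a", "a", "a", "b", "b", "b"]

threshold, child_impurity = best_threshold(feature, target)
print(gini(target))               # 0.5
print(entropy(target))            # 1.0
print(threshold, child_impurity)  # 6.5 0.0
```

A full tree builder would repeat this search over every feature at every node and recurse into the two children, which is what makes the procedure greedy: each split is chosen without looking ahead.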

### Key Hyperparameters
- **`max_depth`:**
  - **Description:** The maximum depth of the tree.
  - **Impact:** Limits how deep the tree can grow; deeper trees can model more complex relationships but are prone to overfitting.
  - **Default:** No limit (`None`), meaning nodes are expanded until all leaves are pure or contain fewer than `min_samples_split` samples.
- **`min_samples_split`:**
  - **Description:** The minimum number of samples required to split an internal node.
  - **Impact:** Higher values prevent the model from learning overly specific patterns (reduces overfitting).
  - **Default:** `2`.
- **`min_samples_leaf`:**
  - **Description:** The minimum number of samples required to be at a leaf node.
  - **Impact:** Helps in smoothing the model, especially in regression trees; higher values make the model more resistant to noise.
  - **Default:** `1`.
- **`max_features`:**
  - **Description:** The number of features to consider when looking for the best split.
  - **Impact:** Considering only a random subset of features at each split adds randomness and can reduce variance and overfitting; `"sqrt"` (that is, `sqrt(n_features)`) is a common starting point for classification.
  - **Default:** `None` (consider all features).
- **`criterion`:**
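  - **Description:** The function to measure the quality of a split (`gini` or `entropy` for classification; `squared_error` for regression, which was called `mse` in scikit-learn versions before 1.0).
  - **Impact:** Affects how the tree makes splits; `gini` and `entropy` often yield similar trees, but their theoretical underpinnings differ.
  - **Default:** `gini` for `DecisionTreeClassifier`, `squared_error` for `DecisionTreeRegressor`.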
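
One common way to choose these hyperparameters is a cross-validated grid search. The sketch below assumes scikit-learn's `GridSearchCV` and the built-in wine dataset; the grid values are hypothetical starting points, not recommendations, and sensible ranges depend on the data.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Hypothetical grid chosen for illustration only.
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 3, 5],
    "criterion": ["gini", "entropy"],
}

# Every combination in the grid is scored with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(f"Mean cross-validated accuracy: {search.best_score_:.3f}")
```

Because the best parameters are selected on the same folds that produce `best_score_`, that score is likely to be somewhat optimistic; a separate held-out test set gives a cleaner estimate.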

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Simple to understand and visualize                    | Prone to overfitting, especially with deep trees       |
| No need for feature scaling                           | Can create biased trees if some classes dominate       |
| Can handle numerical and, in principle, categorical data (scikit-learn requires categorical features to be encoded as numbers) | Sensitive to noisy data                                |
| Requires little data preprocessing                    | Greedy algorithm may not find the globally optimal tree |
| Can capture non-linear relationships                  | Can be unstable; small changes in data can lead to a completely different tree |
| Fast and efficient to train                           | Less effective for very complex relationships compared to ensemble methods     |
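
The overfitting risk listed above can be checked empirically by comparing training and test accuracy at different depths. This is a small sketch on the wine dataset; the depth values, the 70/30 split, and the seeds are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for depth in (1, 2, 3, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this depth.
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, test={results[depth][1]:.3f}")
```

A growing gap between the training and test columns as depth increases is the usual sign of overfitting; on a small, clean dataset like this one the gap may be modest.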

### Evaluation Metrics
- **Accuracy (Classification):**
  - **Description:** Ratio of correct predictions to total predictions.
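  - **Good Value:** Higher is better, but what counts as good depends on the problem; compare against the accuracy of always predicting the majority class rather than against a fixed threshold such as 0.8.
  - **Bad Value:** At or below the majority-class baseline (0.5 for a balanced binary problem), the model is no better than always predicting the most common class. Accuracy can also be misleading on imbalanced datasets.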
- **Precision (Classification):**
  - **Description:** Proportion of positive identifications that were actually correct (True Positives / (True Positives + False Positives)).
  - **Good Value:** Higher values indicate fewer false positives; important when the cost of a false positive is high.
  - **Bad Value:** Low values suggest the model makes many false positives.
- **Recall (Classification):**
  - **Description:** Proportion of actual positives that were correctly identified (True Positives / (True Positives + False Negatives)).
  - **Good Value:** Higher values indicate fewer false negatives; crucial when missing a positive case is costly.
  - **Bad Value:** Low values suggest many false negatives.
- **F1 Score (Classification):**
  - **Description:** Harmonic mean of Precision and Recall; useful when you need a balance between the two.
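  - **Good Value:** Higher is better (the maximum is 1); a cutoff such as 0.7 is only a rule of thumb, and what is acceptable depends on the task.
  - **Bad Value:** Low values mean that precision, recall, or both are low, because the harmonic mean is pulled toward the smaller of the two.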
- **Mean Squared Error (MSE) (Regression):**
  - **Description:** Average of the squared differences between predicted and actual values.
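  - **Good Value:** Lower is better; 0 means perfect predictions. MSE is in squared units of the target, so judge it relative to the target's scale or to a baseline model.
  - **Bad Value:** Values that are large relative to the variance of the target indicate a poor fit with significant prediction errors.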
- **R-squared (R²) (Regression):**
  - **Description:** Proportion of the variance in the dependent variable that is predictable from the independent variables.
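  - **Good Value:** Closer to 1 indicates a good fit; a value near 1 on the training data alone can signal overfitting, so check it on held-out data.
  - **Bad Value:** Close to 0 means the model explains little of the variance (no better than predicting the mean); negative values mean it does worse than predicting the mean.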
- **Cross-Validation Score:**
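  - **Description:** The score (for example, accuracy) obtained by training and evaluating the model on several different train/validation splits (folds); it estimates how well the model generalizes to unseen data.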
  - **Good Value:** Higher scores across folds indicate robust generalization.
  - **Bad Value:** Low or inconsistent scores across folds suggest poor generalization.
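
The formulas above can be worked through by hand on a small example. The labels and values below are made up for illustration; in practice `sklearn.metrics` provides these metrics directly.

```python
# Made-up binary labels for illustration (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.80 precision=0.75 recall=0.75 f1=0.75

# Made-up regression targets and predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
r2 = 1 - ss_res / ss_tot
print(f"mse={mse:.2f} r2={r2:.2f}")
# mse=0.50 r2=0.90
```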



In [None]:
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

In [None]:
wine = datasets.load_wine()

In [None]:
print(wine.DESCR)

In [None]:
X = wine.data  # Features
y = wine.target  # Target labels (wine type)

In [None]:
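# Hold out 20% of the samples for testing; stratify keeps class proportions equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)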

In [None]:
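# The first four arguments are the scikit-learn defaults, spelled out for clarity
model = DecisionTreeClassifier(
    max_depth=None,
    min_samples_split=2,
    max_features=None,
    criterion='gini',
    random_state=42,  # makes tie-breaking between equally good splits reproducible
)
model.fit(X_train, y_train)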

In [None]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=wine.target_names)

print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:')
print(report)
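
A single train/test split can give a noisy estimate on a dataset this small. As a complement, the cross-validation score described earlier can be computed as sketched below, assuming scikit-learn's `cross_val_score` with stratified 5-fold splitting. The names `X_all`, `y_all`, and `cv` are introduced here for illustration so that the notebook's own variables are left untouched.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_wine(return_X_y=True)

# Shuffled, stratified folds so that each fold contains all three wine classes.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_all, y_all, cv=cv, scoring="accuracy"
)

print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std: {scores.std():.3f})")
```

A large spread between folds suggests that the accuracy from any single split should be read with caution.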

In [None]:
importances = model.feature_importances_

# Print each feature's impurity-based importance (the values sum to 1)
for feature, importance in zip(wine.feature_names, importances):
    print(f"{feature}: {importance:.2f}")

In [None]:
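# Label the nodes with feature and class names and colour them by majority class
plot_tree(model, feature_names=wine.feature_names, class_names=list(wine.target_names), filled=True);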
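
When a plot is hard to read, or no plotting backend is available, the learned rules can also be printed as text. This sketch assumes scikit-learn's `export_text` and refits a shallow tree so that it runs on its own; `wine_data`, `tree_clf`, and the depth of 2 are illustrative choices, not part of the notebook above.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

wine_data = load_wine()

# A shallow tree keeps the printed rule set short.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(wine_data.data, wine_data.target)

# Each line of the output is one test on a feature or one leaf prediction.
rules = export_text(tree_clf, feature_names=list(wine_data.feature_names))
print(rules)
```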