# Building a Decision Tree

## 1. Process of Building a Decision Tree

### Step 1: Decide the Feature for the Root Node
- The root node is the topmost node of the decision tree.
- Use a splitting criterion (such as the purity measure introduced later) to determine which feature to split on.
- Example:
  - Split on **ear shape**:
    - **Left branch**: Pointy ears.
    - **Right branch**: Floppy ears.

### Step 2: Split the Data Recursively
- Focus on each branch and decide the next feature to split on.
- Example:
  - **Left branch** splits on **face shape**:
    - **Left**: Round faces → All cats → Leaf node prediction = Cat.
    - **Right**: Not round faces → All dogs → Leaf node prediction = Not cat.
  - **Right branch** splits on **whiskers**:
    - **Left**: Whiskers present → All cats → Leaf node prediction = Cat.
    - **Right**: Whiskers absent → All dogs → Leaf node prediction = Not cat.
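
The example tree above can be written down as a small data structure. The following is a minimal sketch, not a library API: the nested-dict layout, the key names (`feature`, `branches`), and the feature value strings are all assumptions chosen for illustration.

```python
# Hypothetical encoding of the example tree: internal nodes are dicts,
# leaves are plain strings holding the predicted class.
TREE = {
    "feature": "ear_shape",  # root node splits on ear shape
    "branches": {
        "pointy": {
            "feature": "face_shape",
            "branches": {"round": "cat", "not_round": "not cat"},
        },
        "floppy": {
            "feature": "whiskers",
            "branches": {"present": "cat", "absent": "not cat"},
        },
    },
}


def predict(node, example):
    """Walk from the root to a leaf, following the example's feature values."""
    while isinstance(node, dict):
        value = example[node["feature"]]
        node = node["branches"][value]
    return node  # a leaf: the predicted class


# Pointy ears and a round face end in the "cat" leaf.
print(predict(TREE, {"ear_shape": "pointy", "face_shape": "round"}))
```

Only the features on the path from root to leaf are read, so an example with pointy ears never needs a `whiskers` value in this tree.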

### Step 3: Create Leaf Nodes
- Stop splitting when subsets are "pure" (contain only one class, e.g., all cats or all dogs).
- Each leaf node predicts a class based on the class composition of its subset (for a pure subset, the single class it contains).
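
As a rough sketch of Step 3, the two helpers below check whether a subset is pure and pick a leaf prediction. Predicting the majority class for a subset that is not perfectly pure is an assumption here (a common convention, not something these notes specify), and the function names are hypothetical.

```python
from collections import Counter


def is_pure(labels):
    """True when every label in the subset is the same class."""
    return len(set(labels)) <= 1


def leaf_prediction(labels):
    """Predict the most common class in the subset (assumed convention).

    Expects a non-empty list of labels.
    """
    return Counter(labels).most_common(1)[0][0]


print(is_pure(["cat", "cat", "cat"]))          # True
print(is_pure(["cat", "dog", "cat"]))          # False
print(leaf_prediction(["cat", "dog", "cat"]))  # cat
```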

---

## 2. Key Decisions in Building a Decision Tree

### 1. Choosing the Feature to Split On
- Decide which feature (e.g., ear shape, face shape, whiskers) to split on.
- **Aim**: Maximise **purity** in the resulting subsets.
- **Purity**:
  - A "pure" subset contains examples of only one class.
  - Example: Splitting on a hypothetical "cat DNA" feature would achieve 100% purity.
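
To make "maximise purity" concrete, here is a small sketch that scores each candidate feature and picks the best one. It uses the fraction of the majority class as a simple stand-in for purity; that scoring rule, the function names, and the four-example toy dataset are all assumptions for illustration, not the measure these notes go on to use.

```python
from collections import Counter, defaultdict


def majority_fraction(labels):
    """Share of the subset taken by its most common class (1.0 = pure)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def weighted_purity(examples, labels, feature):
    """Average purity of the subsets produced by splitting on `feature`,
    weighted by how many examples land in each subset."""
    groups = defaultdict(list)
    for example, label in zip(examples, labels):
        groups[example[feature]].append(label)
    n = len(labels)
    return sum(len(g) / n * majority_fraction(g) for g in groups.values())


def best_feature(examples, labels, features):
    """Pick the feature whose split gives the highest weighted purity."""
    return max(features, key=lambda f: weighted_purity(examples, labels, f))


# Made-up toy data: "cat_dna" separates the classes perfectly.
examples = [
    {"ear_shape": "pointy", "cat_dna": "yes"},
    {"ear_shape": "pointy", "cat_dna": "no"},
    {"ear_shape": "floppy", "cat_dna": "yes"},
    {"ear_shape": "floppy", "cat_dna": "no"},
]
labels = ["cat", "dog", "cat", "dog"]

print(weighted_purity(examples, labels, "ear_shape"))  # 0.5
print(weighted_purity(examples, labels, "cat_dna"))    # 1.0
print(best_feature(examples, labels, ["ear_shape", "cat_dna"]))  # cat_dna
```

Weighting by subset size keeps a tiny pure subset from making an otherwise poor split look good.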

### 2. Stopping Criteria for Splitting
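- **Complete purity**: Stop splitting a subset once it is completely pure (100% one class).
- **Maximum tree depth**: Stop splitting once a node reaches the maximum depth. Limiting depth keeps the tree small and reduces the risk of overfitting.
  - **Depth**: Number of hops from the root node to the current node (the root node is at depth 0).
  - Example: With a maximum depth of 2, nodes at depth 2 become leaf nodes and are not split further.
- **Minimum purity gain**: Stop splitting if the improvement in purity (that is, the reduction in impurity) from the best split is below a threshold.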
- **Minimum examples at a node**: Stop splitting if the number of examples in a node falls below a threshold.
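
The four stopping criteria can be combined into one check, sketched below. The function name, the parameter names, and the default thresholds (`max_depth=2`, `min_gain=0.01`, `min_examples=3`) are hypothetical values picked for illustration; `gain` is assumed to be the purity improvement of the best available split, computed elsewhere.

```python
def should_stop(labels, depth, gain, max_depth=2, min_gain=0.01, min_examples=3):
    """Return True if the node should become a leaf instead of being split."""
    if len(set(labels)) <= 1:       # subset is completely pure
        return True
    if depth >= max_depth:          # node already sits at the maximum depth
        return True
    if gain < min_gain:             # best split barely improves purity
        return True
    if len(labels) < min_examples:  # too few examples left at this node
        return True
    return False


mixed = ["cat", "dog", "cat", "dog"]
print(should_stop(mixed, depth=1, gain=0.5))  # False: keep splitting
print(should_stop(mixed, depth=2, gain=0.5))  # True: maximum depth reached
```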

---

## 3. Challenges and Evolution of Decision Trees
- Decision tree algorithms involve multiple components, making them appear complex:
  - Feature selection criteria.
  - Stopping conditions.
  - Methods for measuring purity/impurity.
- These complexities arose as researchers refined the algorithm over time.

---

## 4. Measuring Purity (or Impurity)
- **Purity** describes how homogeneous the examples in a subset are.
- **Entropy**: A mathematical measure of impurity.
  - **Low entropy**: High purity (e.g., all examples are cats, or all are dogs).
  - **High entropy**: Low purity (e.g., an even mix of cats and dogs).
- Upcoming topics cover entropy in detail and how it is used to choose splits.
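
As a preview, the sketch below computes entropy for a two-class subset using the standard binary entropy formula, H(p) = -p log2(p) - (1 - p) log2(1 - p), where p is the fraction of cats. Using this particular formula is an assumption about what the upcoming topics define; the function name is hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a subset where a fraction p are cats."""
    if p in (0.0, 1.0):
        return 0.0  # a pure subset has zero entropy (0 * log2(0) treated as 0)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


print(entropy(1.0))  # 0.0 -> all cats: completely pure
print(entropy(0.5))  # 1.0 -> even mix: maximally impure
print(entropy(0.0))  # 0.0 -> all dogs: completely pure
```

Entropy peaks at an even 50/50 mix and falls to zero as the subset approaches a single class, which matches the low-entropy and high-entropy cases listed above.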

---

## Explanation of Concepts

1. **Feature Selection**: A "good" feature reduces impurity, making subsets more homogeneous.
2. **Stopping Splits**: Stopping criteria limit how far the tree grows, which helps prevent overfitting and helps the tree generalise.
3. **Entropy and Purity**: Mathematical tools guide the algorithm to create effective splits.
4. **Complexity of Decision Trees**: Despite their perceived messiness, decision trees are robust and effective for classification tasks.