Let's manually construct a decision tree using a small, simplified dataset to make the process straightforward. We'll focus on just two features to keep the example clear.

**Dataset**
<table border="1">
    <thead>
        <tr>
            <th>Weather</th>
            <th>Temperature</th>
            <th>Play?</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Sunny</td>
            <td>Hot</td>
            <td>No</td>
        </tr>
        <tr>
            <td>Sunny</td>
            <td>Mild</td>
            <td>No</td>
        </tr>
        <tr>
            <td>Overcast</td>
            <td>Cool</td>
            <td>Yes</td>
        </tr>
        <tr>
            <td>Rainy</td>
            <td>Mild</td>
            <td>Yes</td>
        </tr>
        <tr>
            <td>Rainy</td>
            <td>Cool</td>
            <td>Yes</td>
        </tr>
        <tr>
            <td>Rainy</td>
            <td>Cool</td>
            <td>No</td>
        </tr>
    </tbody>
</table>



**Step 1: Compute the Entropy of the Root Node**

The root node includes all samples. Let's calculate its entropy:

Total samples: 6<br/>
Positive samples (Play? = Yes): 3<br/>
Negative samples (Play? = No): 3<br/>
The formula for entropy is:
$$H(S) = -p^+ \log_2(p^+) - p^- \log_2(p^-)$$
where \(p^+\) and \(p^-\) are the fractions of positive and negative samples in \(S\). For this dataset:

$$
p^+ = \frac{\text{Positive samples}}{\text{Total samples}} = \frac{3}{6} = 0.5
$$

$$
p^- = \frac{\text{Negative samples}}{\text{Total samples}} = \frac{3}{6} = 0.5
$$

Substituting these values into the entropy formula gives:

$$
H(S) = - (0.5) \log_2(0.5) - (0.5) \log_2(0.5)
$$

$$
H(S) = - (0.5)(-1) - (0.5)(-1) = 1
$$

Thus, the entropy \( H(S) = 1 \).
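
The same calculation can be checked with a few lines of Python. This is a minimal sketch, not part of the original walkthrough: the helper name `entropy` and its count-based input are assumptions chosen for illustration, and a zero count is skipped so that \(0 \log_2 0\) is treated as 0.

```python
import math

def entropy(counts):
    """Shannon entropy in bits for a node described by its class counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # convention: 0 * log2(0) contributes nothing
        p = count / total
        result -= p * math.log2(p)
    return result

# Root node: 3 "Yes" samples and 3 "No" samples.
root_entropy = entropy([3, 3])
print(root_entropy)  # 1.0
```

A perfectly balanced two-class node gives the maximum entropy of 1 bit, while a pure node such as `entropy([2, 0])` gives 0.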


**Step 2: Calculate Information Gain for Each Feature**

**Feature: Weather**

The values for Weather are: Sunny, Overcast, Rainy.

Subsets for Weather:
* Sunny:
    * Total samples: 2
    * Play? = No: 2
    * Entropy:

    $$
    H(S_{\text{Sunny}}) = - (1) \log_2(1)
    $$

    Since \(\log_2(1) = 0\), we have:

    $$
    H(S_{\text{Sunny}}) = 0
    $$

* Overcast:
    * Total samples: 1
    * Play? = Yes: 1
    * Entropy:

    $$
    H(S_{\text{Overcast}}) = - (1) \log_2(1)
    $$

    Since \(\log_2(1) = 0\), we have:

    $$
    H(S_{\text{Overcast}}) = 0
    $$

* Rainy:
    * Total samples: 3
    * Play? = Yes: 2, Play? = No: 1
    * Entropy:

    $$
    H(S_{\text{Rainy}}) = - \frac{2}{3} \log_2 \left( \frac{2}{3} \right) - \frac{1}{3} \log_2 \left( \frac{1}{3} \right) \approx 0.918
    $$

**Weighted Entropy for Weather**

The weighted entropy for weather is calculated as:

$$
H(S_{\text{Weather}}) = \frac{2}{6} (0) + \frac{1}{6} (0) + \frac{3}{6} (0.918) = 0.459
$$

**Information Gain for Weather**

The information gain for weather is calculated as:

$$
IG(S, \text{Weather}) = H(S) - H(S_{\text{Weather}}) = 1 - 0.459 = 0.541
$$
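
The Weather calculation can be reproduced in code. The sketch below is illustrative only: `DATA`, `entropy`, and `weighted_entropy` are names introduced here, and each row is assumed to be a `(Weather, Temperature, Play?)` tuple copied from the dataset table. It groups the labels by Weather value, weights each group's entropy by its share of the samples, and subtracts the result from the root entropy.

```python
import math
from collections import Counter, defaultdict

# The six samples from the dataset table: (Weather, Temperature, Play?)
DATA = [
    ("Sunny", "Hot", "No"),
    ("Sunny", "Mild", "No"),
    ("Overcast", "Cool", "Yes"),
    ("Rainy", "Mild", "Yes"),
    ("Rainy", "Cool", "Yes"),
    ("Rainy", "Cool", "No"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def weighted_entropy(rows, feature_index):
    """Average entropy of the subsets produced by splitting on one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_index]].append(row[-1])  # collect labels per value
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

labels = [row[-1] for row in DATA]
h_weather = weighted_entropy(DATA, 0)        # index 0 is the Weather column
gain_weather = entropy(labels) - h_weather
print(round(h_weather, 3), round(gain_weather, 3))  # 0.459 0.541
```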


**Feature: Temperature**
The values for Temperature are: Hot, Mild, Cool.<br/>

Subsets for Temperature:<br/>
* Hot:
    * Total samples: 1
    * Play? = No: 1
    * Entropy:

    $$
    H(S_{\text{Hot}}) = - (1) \log_2(1)
    $$

    Since \(\log_2(1) = 0\), we have:

    $$
    H(S_{\text{Hot}}) = 0
    $$

* Mild:
    * Total samples: 2
    * Play? = Yes: 1, Play? = No: 1
    * Entropy:

    $$
    H(S_{\text{Mild}}) = - \frac{1}{2} \log_2 \left( \frac{1}{2} \right) - \frac{1}{2} \log_2 \left( \frac{1}{2} \right)
    $$

    Since \(\log_2 \left( \frac{1}{2} \right) = -1\), we have:

    $$
    H(S_{\text{Mild}}) = - \frac{1}{2} (-1) - \frac{1}{2} (-1) = 1
    $$

* Cool:
    * Total samples: 3
    * Play? = Yes: 2, Play? = No: 1
    * Entropy:

    $$
    H(S_{\text{Cool}}) = - \frac{2}{3} \log_2 \left( \frac{2}{3} \right) - \frac{1}{3} \log_2 \left( \frac{1}{3} \right) \approx 0.918
    $$

**Weighted Entropy for Temperature**

The weighted entropy for temperature is calculated as:
$$
H(S_{\text{Temperature}}) = \frac{1}{6} (0) + \frac{2}{6} (1) + \frac{3}{6} (0.918) \approx 0.333 + 0.459 = 0.792
$$

**Information Gain for Temperature**

The information gain for temperature is calculated as:

$$
IG(S, \text{Temperature}) = H(S) - H(S_{\text{Temperature}}) = 1 - 0.792 = 0.208
$$

**Step 3: Choose the Best Feature for the Root Node**

Information Gain for Weather: 0.541 <br/>
Information Gain for Temperature: 0.208 <br/>
Since Weather has the higher information gain, it becomes the root node.<br/>
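
Selecting the root feature amounts to computing the gain of every candidate and taking the maximum. The following sketch assumes the rows are stored as dictionaries keyed by the table's column names. `ROWS`, `FEATURES`, and `information_gain` are hypothetical names used only for this illustration.

```python
import math
from collections import Counter

FEATURES = ("Weather", "Temperature")
ROWS = [
    {"Weather": "Sunny", "Temperature": "Hot", "Play?": "No"},
    {"Weather": "Sunny", "Temperature": "Mild", "Play?": "No"},
    {"Weather": "Overcast", "Temperature": "Cool", "Play?": "Yes"},
    {"Weather": "Rainy", "Temperature": "Mild", "Play?": "Yes"},
    {"Weather": "Rainy", "Temperature": "Cool", "Play?": "Yes"},
    {"Weather": "Rainy", "Temperature": "Cool", "Play?": "No"},
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target="Play?"):
    """Parent entropy minus the weighted entropy after splitting on `feature`."""
    parent = entropy([r[target] for r in rows])
    children = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        children += len(subset) / len(rows) * entropy(subset)
    return parent - children

gains = {f: information_gain(ROWS, f) for f in FEATURES}
best = max(gains, key=gains.get)
for f in FEATURES:
    print(f, round(gains[f], 3))  # Weather 0.541, then Temperature 0.208
print("Best:", best)              # Best: Weather
```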


**Step 4: Split the Tree on Weather**
* Branch: Sunny
  * All samples are: No.
  * Leaf node: No.
* Branch: Overcast
  * All samples are: Yes.
  * Leaf node: Yes.
* Branch: Rainy
  * Remaining samples:
    <table border="1">
    <thead>
        <tr>
            <th>Weather</th>
            <th>Temperature</th>
            <th>Play?</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Rainy</td>
            <td>Mild</td>
            <td>Yes</td>
        </tr>
        <tr>
            <td>Rainy</td>
            <td>Cool</td>
            <td>Yes</td>
        </tr>
        <tr>
            <td>Rainy</td>
            <td>Cool</td>
            <td>No</td>
        </tr>
    </tbody>
</table>
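
The split itself can be sketched as a simple grouping step. The code below is an illustration under the same assumption as before, namely that each row is a `(Weather, Temperature, Play?)` tuple. The names `DATA`, `branches`, and `status` are introduced here. A branch whose samples all share one label becomes a leaf, and anything else is marked as mixed.

```python
from collections import defaultdict

# The six samples from the dataset table: (Weather, Temperature, Play?)
DATA = [
    ("Sunny", "Hot", "No"),
    ("Sunny", "Mild", "No"),
    ("Overcast", "Cool", "Yes"),
    ("Rainy", "Mild", "Yes"),
    ("Rainy", "Cool", "Yes"),
    ("Rainy", "Cool", "No"),
]

# Group the remaining columns by Weather value, one group per branch.
branches = defaultdict(list)
for weather, temperature, play in DATA:
    branches[weather].append((temperature, play))

# A branch is a leaf when it contains a single distinct label.
status = {}
for weather, samples in branches.items():
    labels = {play for _, play in samples}
    status[weather] = labels.pop() if len(labels) == 1 else "mixed"
    print(weather, "->", status[weather])
# Sunny -> No
# Overcast -> Yes
# Rainy -> mixed
```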



**Step 5: Repeat for the Rainy Subset**

The Rainy subset still has mixed labels, so we calculate the information gain for splitting it further using Temperature, the only remaining feature.

* Entropy of the Rainy subset (the parent node for this split):
  $$H(S_\text{Rainy})=0.918$$

Split on Temperature (for Rainy):
* Mild:
  * Total samples: 1
  * Play? = Yes: 1
  * Entropy: 0
* Cool:
  * Total samples: 2
  * Play? = Yes: 1, Play? = No: 1
  * Entropy: 1

Weighted entropy for Temperature:
$$
H(S_{\text{Rainy}, \text{Temperature}}) = \frac{1}{3} (0) + \frac{2}{3} (1) = 0.667
$$

**Information Gain for Temperature (Rainy subset)**

The information gain for the "Rainy" subset with respect to temperature is calculated as:
$$
IG(S_{\text{Rainy}}, \text{Temperature}) = H(S_{\text{Rainy}}) - H(S_{\text{Rainy}, \text{Temperature}}) = 0.918 - 0.667 = 0.251
$$
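
The Rainy-subset calculation can be verified the same way. This is a sketch with assumed names (`RAINY`, `entropy`) that works on the three Rainy rows only. Because the code does not round intermediate values, it reports a gain of about 0.2516, which differs in the third decimal from the hand calculation, where 0.918 and 0.667 were rounded before subtracting.

```python
import math
from collections import Counter

# (Temperature, Play?) for the three Rainy samples
RAINY = [
    ("Mild", "Yes"),
    ("Cool", "Yes"),
    ("Cool", "No"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

# Entropy of the Rainy node before splitting.
parent = entropy([play for _, play in RAINY])

# Weighted entropy after splitting the Rainy node on Temperature.
children = 0.0
for value in ("Mild", "Cool"):
    subset = [play for temp, play in RAINY if temp == value]
    children += len(subset) / len(RAINY) * entropy(subset)

gain = parent - children
print(round(parent, 3), round(children, 3), round(gain, 4))  # 0.918 0.667 0.2516
```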

**Step 6: Final Decision Tree**

The final decision tree for this dataset, built from the features selected by information gain, is shown below:<br />

<img src="manual_decision_tree.png">

The Rainy/Cool leaf still contains mixed labels (one Yes and one No), and both features have already been used on that path, so the tree cannot reach a pure decision for those samples.
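
One way to write down the tree derived in the steps above is as a nested dictionary. This is a hypothetical representation, not necessarily how the figure presents it: `TREE` and `leaf_counts` are names invented for this sketch, and each leaf stores the label counts of its training samples so that the impure Rainy/Cool leaf stays visible instead of being resolved by an arbitrary tie-break.

```python
# Internal nodes name the feature to test; leaves hold training label counts.
TREE = {
    "feature": "Weather",
    "branches": {
        "Sunny": {"leaf": {"No": 2}},
        "Overcast": {"leaf": {"Yes": 1}},
        "Rainy": {
            "feature": "Temperature",
            "branches": {
                "Mild": {"leaf": {"Yes": 1}},
                "Cool": {"leaf": {"Yes": 1, "No": 1}},  # impure leaf
            },
        },
    },
}

def leaf_counts(tree, sample):
    """Follow the sample's feature values down to a leaf and return its counts."""
    node = tree
    while "leaf" not in node:
        node = node["branches"][sample[node["feature"]]]
    return node["leaf"]

print(leaf_counts(TREE, {"Weather": "Sunny", "Temperature": "Hot"}))   # {'No': 2}
print(leaf_counts(TREE, {"Weather": "Rainy", "Temperature": "Cool"}))  # {'Yes': 1, 'No': 1}
```

A feature value that never appeared on a path in the training data, such as Rainy with Hot, raises a `KeyError` in this sketch. A practical implementation would need a fallback, for example the majority label of the parent node.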