In this task, I am to create a decision tree manually, using entropy and Gini impurity to find the best splits. The first 12 rows will be used for training, while the remaining 3 will be used for testing. 

Looking at the data, the features are numerical, not categorical. This is not a problem, because decision trees can handle numerical features as well. However, the procedure is slightly different from the one used for categorical features.

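First, each numerical feature is sorted and its duplicate values are removed. The candidate thresholds are then the midpoints between consecutive distinct values, and each threshold $t$ defines a binary split: samples with $x \leq t$ go to the left branch, and samples with $x > t$ go to the right branch.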



To find the thresholds, I wrote a small Python program. I could have done this by hand as well, but chose to use a program instead.

```python
# Program to compute thresholds for the numerical feature values in the dataset.

# Feature values of the 12 training rows.
feature0 = [4.4, 4.9, 4.6, 4.4, 6.0, 5.8, 6.1, 5.7, 6.7, 6.8, 6.8, 6.3]
feature1 = [3.2, 3.1, 3.2, 2.9, 2.7, 2.7, 2.9, 3.0, 2.5, 3.2, 3.0, 3.4]
feature2 = [1.3, 1.5, 1.4, 1.4, 5.1, 3.9, 4.7, 4.2, 5.8, 5.9, 5.5, 5.6]
feature3 = [0.2, 0.1, 0.2, 0.2, 1.6, 1.2, 1.4, 1.2, 1.8, 2.3, 2.1, 2.4]

# set() removes duplicate values and sorted() returns them in ascending order.
feat0 = sorted(set(feature0))
feat1 = sorted(set(feature1))
feat2 = sorted(set(feature2))
feat3 = sorted(set(feature3))

# Each threshold is the average of two consecutive distinct values.
# enumerate() provides the feature index, starting from 0.
for indx, feat in enumerate([feat0, feat1, feat2, feat3]):
    out = ""
    for i in range(1, len(feat)):
        out += f"[{(feat[i] + feat[i-1]) / 2:.2f}] "
    print(f"Feature {indx}: {out}")
```

Output:

```
Feature 0: [4.50] [4.75] [5.30] [5.75] [5.90] [6.05] [6.20] [6.50] [6.75]
Feature 1: [2.60] [2.80] [2.95] [3.05] [3.15] [3.30]
Feature 2: [1.35] [1.45] [2.70] [4.05] [4.45] [4.90] [5.30] [5.55] [5.70] [5.85]
Feature 3: [0.15] [0.70] [1.30] [1.50] [1.70] [1.95] [2.20] [2.35]
```


Now that I have the thresholds, I can proceed with calculating the parent entropy:

## Parent Entropy

$$
H(S) = -\sum_{i=1}^{3} p_i \log_2(p_i)
$$

Each class has $p = \tfrac{4}{12} = \tfrac{1}{3}$:

$$
H(S) = -3 \cdot \frac{1}{3}\log_2\left(\frac{1}{3}\right) = \log_2(3) \approx 1.585
$$
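The parent entropy can also be checked numerically. A minimal sketch, assuming (as the class counts in the calculations below imply) that rows 0-3 are setosa, rows 4-7 versicolor and rows 8-11 virginica; the helper name `entropy` is illustrative, not part of the original program.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Assumed training labels: 4 rows of each class, in row order.
y_train = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
print(f"{entropy(y_train):.3f}")  # 1.585
```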

---

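With the parent entropy calculated, I now compute the left and right entropies, the weighted split entropy and the information gain for each threshold of Feature 0 (sepal length). For a split of $S$ into $S_L$ and $S_R$:

$$
H_{split} = \tfrac{|S_L|}{|S|}\, H_L + \tfrac{|S_R|}{|S|}\, H_R, \qquad IG = H(S) - H_{split}
$$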

## Threshold 1: $x \leq 4.5$

Left (2 Setosa):

$$
H_L = -1 \cdot \log_2(1) = 0
$$

Right (2 Setosa, 4 Versicolor, 4 Virginica):

$$
H_R = -\left(\tfrac{2}{10}\log_2\tfrac{2}{10} + \tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{4}{10}\log_2\tfrac{4}{10}\right) \approx 1.522
$$

Weighted:

$$
H_{split} = \tfrac{2}{12}\cdot 0 + \tfrac{10}{12}\cdot 1.522 \approx 1.268
$$

Information Gain:

$$
IG = 1.585 - 1.268 = 0.317
$$
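The same steps (partition at the threshold, weight each child's entropy by its share of the samples, subtract from the parent) can be sketched as one function. This is a hedged illustration: `split_gain` is a hypothetical helper, and the labels again assume rows 0-3 setosa, 4-7 versicolor and 8-11 virginica.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_gain(values, labels, threshold):
    """Return (weighted child entropy, information gain) for x <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    h_split = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return h_split, entropy(labels) - h_split

# Feature 0 (sepal length) of the 12 training rows.
feature0 = [4.4, 4.9, 4.6, 4.4, 6.0, 5.8, 6.1, 5.7, 6.7, 6.8, 6.8, 6.3]
# Assumed labels: rows 0-3 setosa, 4-7 versicolor, 8-11 virginica.
y_train = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4

h_split, ig = split_gain(feature0, y_train, 4.5)
print(f"H_split = {h_split:.3f}, IG = {ig:.3f}")  # H_split = 1.268, IG = 0.317
```

Calling `split_gain(feature0, y_train, t)` for the other thresholds should reproduce the remaining Feature 0 rows, for example an IG of about 0.918 at $t = 5.3$.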

---

## Threshold 2: $x \leq 4.75$

Left (3 Setosa):

$$
H_L = 0
$$

Right (1 Setosa, 4 Versicolor, 4 Virginica):

$$
H_R = -\left(\tfrac{1}{9}\log_2\tfrac{1}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9}\right) \approx 1.392
$$

Weighted:

$$
H_{split} = \tfrac{3}{12}\cdot 0 + \tfrac{9}{12}\cdot 1.392 \approx 1.044
$$

Information Gain:

$$
IG = 1.585 - 1.044 = 0.541
$$

---

## Threshold 3: $x \leq 5.3$

Left (4 Setosa):

$$
H_L = 0
$$

Right (4 Versicolor, 4 Virginica):

$$
H_R = -\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1.000
$$

Weighted:

$$
H_{split} = \tfrac{4}{12}\cdot 0 + \tfrac{8}{12}\cdot 1.000 = 0.667
$$

Information Gain:

$$
IG = 1.585 - 0.667 = 0.918
$$

---

## Threshold 4: $x \leq 5.75$

Left (4 Setosa, 1 Versicolor):

$$
H_L = -\left(\tfrac{4}{5}\log_2\tfrac{4}{5} + \tfrac{1}{5}\log_2\tfrac{1}{5}\right) \approx 0.722
$$

Right (3 Versicolor, 4 Virginica):

$$
H_R = -\left(\tfrac{3}{7}\log_2\tfrac{3}{7} + \tfrac{4}{7}\log_2\tfrac{4}{7}\right) \approx 0.985
$$

Weighted:

$$
H_{split} = \tfrac{5}{12}\cdot 0.722 + \tfrac{7}{12}\cdot 0.985 \approx 0.876
$$

Information Gain:

$$
IG = 1.585 - 0.876 = 0.709
$$

---

## Threshold 5: $x \leq 5.9$

Left (4 Setosa, 2 Versicolor):

$$
H_L = -\left(\tfrac{4}{6}\log_2\tfrac{4}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6}\right) \approx 0.918
$$

Right (2 Versicolor, 4 Virginica):

$$
H_R = -\left(\tfrac{2}{6}\log_2\tfrac{2}{6} + \tfrac{4}{6}\log_2\tfrac{4}{6}\right) \approx 0.918
$$

Weighted:

$$
H_{split} = \tfrac{6}{12}\cdot 0.918 + \tfrac{6}{12}\cdot 0.918 = 0.918
$$

Information Gain:

$$
IG = 1.585 - 0.918 = 0.667
$$

---

## Threshold 6: $x \leq 6.05$

Left (4 Setosa, 3 Versicolor):

$$
H_L = -\left(\tfrac{4}{7}\log_2\tfrac{4}{7} + \tfrac{3}{7}\log_2\tfrac{3}{7}\right) \approx 0.985
$$

Right (1 Versicolor, 4 Virginica):

$$
H_R = -\left(\tfrac{1}{5}\log_2\tfrac{1}{5} + \tfrac{4}{5}\log_2\tfrac{4}{5}\right) \approx 0.722
$$

Weighted:

$$
H_{split} = \tfrac{7}{12}\cdot 0.985 + \tfrac{5}{12}\cdot 0.722 \approx 0.876
$$

Information Gain:

$$
IG = 1.585 - 0.876 = 0.709
$$

---

## Threshold 7: $x \leq 6.2$

Left (4 Setosa, 4 Versicolor):

$$
H_L = -\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1.000
$$

Right (4 Virginica):

$$
H_R = 0
$$

Weighted:

$$
H_{split} = \tfrac{8}{12}\cdot 1.000 + \tfrac{4}{12}\cdot 0 = 0.667
$$

Information Gain:

$$
IG = 1.585 - 0.667 = 0.918
$$

---

## Threshold 8: $x \leq 6.5$

Left (4 Setosa, 4 Versicolor, 1 Virginica):

$$
H_L = -\left(\tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{1}{9}\log_2\tfrac{1}{9}\right) \approx 1.392
$$

Right (3 Virginica):

$$
H_R = 0
$$

Weighted:

$$
H_{split} = \tfrac{9}{12}\cdot 1.392 + \tfrac{3}{12}\cdot 0 \approx 1.044
$$

Information Gain:

$$
IG = 1.585 - 1.044 = 0.541
$$

---

## Threshold 9: $x \leq 6.75$

Left (4 Setosa, 4 Versicolor, 2 Virginica):

$$
H_L = -\left(\tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{2}{10}\log_2\tfrac{2}{10}\right) \approx 1.522
$$

Right (2 Virginica):

$$
H_R = 0
$$

Weighted:

$$
H_{split} = \tfrac{10}{12}\cdot 1.522 + \tfrac{2}{12}\cdot 0 \approx 1.268
$$

Information Gain:

$$
IG = 1.585 - 1.268 = 0.317
$$

---



The table below summarizes the weighted split entropy and information gain for each Feature 0 threshold.

## Feature 0 — Summary of thresholds

$$
\begin{array}{|c|c|c|}
\hline
\textbf{Threshold} & \mathbf{H_{\text{split}}} & \mathbf{IG} \\
\hline
4.50 & 1.268 & 0.317 \\
4.75 & 1.044 & 0.541 \\
\mathbf{5.30} & \mathbf{0.667} & \mathbf{0.918} \\
5.75 & 0.876 & 0.709 \\
5.90 & 0.918 & 0.667 \\
6.05 & 0.876 & 0.709 \\
\mathbf{6.20} & \mathbf{0.667} & \mathbf{0.918} \\
6.50 & 1.044 & 0.541 \\
6.75 & 1.268 & 0.317 \\
\hline
\end{array}
$$

*The two thresholds with the highest information gain have been highlighted*

Based on Feature 0 alone, the two best thresholds are 5.30 and 6.20, and I could have started the tree with either split, since they give the same information gain. However, the ID3-style greedy procedure chooses the root by comparing the best information gain across all features, not just within one. Another feature may give a higher gain or a simpler tree overall, so the best threshold within a single feature is not necessarily the best split overall. I therefore computed thresholds and gains for the remaining features as well, and only then selected the root as the best (feature, threshold) combination.
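The cross-feature comparison can be sketched as a small greedy search: score every (feature, threshold) pair by information gain and report all pairs that tie for the maximum. This is a minimal illustration, not the original program; the helper names and the row-order labels (rows 0-3 setosa, 4-7 versicolor, 8-11 virginica) are assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, t):
    """Information gain of the binary split x <= t versus x > t."""
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def midpoints(values):
    """Candidate thresholds: midpoints between consecutive distinct values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

features = {
    0: [4.4, 4.9, 4.6, 4.4, 6.0, 5.8, 6.1, 5.7, 6.7, 6.8, 6.8, 6.3],
    1: [3.2, 3.1, 3.2, 2.9, 2.7, 2.7, 2.9, 3.0, 2.5, 3.2, 3.0, 3.4],
    2: [1.3, 1.5, 1.4, 1.4, 5.1, 3.9, 4.7, 4.2, 5.8, 5.9, 5.5, 5.6],
    3: [0.2, 0.1, 0.2, 0.2, 1.6, 1.2, 1.4, 1.2, 1.8, 2.3, 2.1, 2.4],
}
# Assumed labels: rows 0-3 setosa, 4-7 versicolor, 8-11 virginica.
y_train = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4

scores = [
    (f, t, info_gain(vals, y_train, t))
    for f, vals in features.items()
    for t in midpoints(vals)
]
best = max(ig for _, _, ig in scores)
# Keep every pair within a tiny tolerance of the maximum, so exact ties show up together.
ties = [(f, round(t, 2)) for f, t, ig in scores if abs(ig - best) < 1e-9]
print(f"best IG = {best:.3f}")  # best IG = 0.918
print(ties)
```

Keeping all tied pairs, rather than letting `max` silently pick the first one, makes the tie-break an explicit decision instead of an accident of iteration order.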



Moving on to Feature 1:

## Feature 1 (Sepal Width) — Full threshold calculations

Parent entropy:
$$
H(S)=\log_2 3 \approx 1.585
$$

---

## Threshold 1: $x \leq 2.6$

Left (1 Virginica):
$$
H_L = 0
$$

Right (4 Setosa, 4 Versicolor, 3 Virginica):
$$
H_R = -\!\left(\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right) \approx 1.573
$$

Weighted:
$$
H_{\text{split}} = \tfrac{1}{12}\cdot 0 + \tfrac{11}{12}\cdot 1.573 \approx 1.442
$$

Information Gain:
$$
IG = 1.585 - 1.442 = 0.143
$$

---

## Threshold 2: $x \leq 2.8$

Left (2 Versicolor, 1 Virginica):
$$
H_L = -\!\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \approx 0.918
$$

Right (4 Setosa, 2 Versicolor, 3 Virginica):
$$
H_R = -\!\left(\tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{2}{9}\log_2\tfrac{2}{9} + \tfrac{3}{9}\log_2\tfrac{3}{9}\right) \approx 1.530
$$

Weighted:
$$
H_{\text{split}} = \tfrac{3}{12}\cdot 0.918 + \tfrac{9}{12}\cdot 1.530 \approx 1.377
$$

Information Gain:
$$
IG = 1.585 - 1.377 = 0.208
$$

---

## Threshold 3: $x \leq 2.95$

Left (1 Setosa, 3 Versicolor, 1 Virginica):
$$
H_L = -\!\left(\tfrac{1}{5}\log_2\tfrac{1}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5} + \tfrac{1}{5}\log_2\tfrac{1}{5}\right) \approx 1.371
$$

Right (3 Setosa, 1 Versicolor, 3 Virginica):
$$
H_R = -\!\left(\tfrac{3}{7}\log_2\tfrac{3}{7} + \tfrac{1}{7}\log_2\tfrac{1}{7} + \tfrac{3}{7}\log_2\tfrac{3}{7}\right) \approx 1.449
$$

Weighted:
$$
H_{\text{split}} = \tfrac{5}{12}\cdot 1.371 + \tfrac{7}{12}\cdot 1.448 \approx 1.416
$$

Information Gain:
$$
IG = 1.585 - 1.416 = 0.169
$$

---

## Threshold 4: $x \leq 3.05$

Left (1 Setosa, 4 Versicolor, 2 Virginica):
$$
H_L = -\!\left(\tfrac{1}{7}\log_2\tfrac{1}{7} + \tfrac{4}{7}\log_2\tfrac{4}{7} + \tfrac{2}{7}\log_2\tfrac{2}{7}\right) \approx 1.379
$$

Right (3 Setosa, 0 Versicolor, 2 Virginica):
$$
H_R = -\!\left(\tfrac{3}{5}\log_2\tfrac{3}{5} + \tfrac{2}{5}\log_2\tfrac{2}{5}\right) \approx 0.971
$$

Weighted:
$$
H_{\text{split}} = \tfrac{7}{12}\cdot 1.379 + \tfrac{5}{12}\cdot 0.971 \approx 1.209
$$

Information Gain:
$$
IG = 1.585 - 1.209 = 0.376
$$

---

## Threshold 5: $x \leq 3.15$

Left (2 Setosa, 4 Versicolor, 2 Virginica):
$$
H_L = -\!\left(\tfrac{2}{8}\log_2\tfrac{2}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8}\right) = 1.500
$$

Right (2 Setosa, 0 Versicolor, 2 Virginica):
$$
H_R = -\!\left(\tfrac{2}{4}\log_2\tfrac{2}{4} + \tfrac{2}{4}\log_2\tfrac{2}{4}\right) = 1.000
$$

Weighted:
$$
H_{\text{split}} = \tfrac{8}{12}\cdot 1.500 + \tfrac{4}{12}\cdot 1.000 = 1.333
$$

Information Gain:
$$
IG = 1.585 - 1.333 = 0.252
$$

---

## Threshold 6: $x \leq 3.3$

Left (4 Setosa, 4 Versicolor, 3 Virginica):
$$
H_L = -\!\left(\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right) \approx 1.573
$$

Right (0 Setosa, 0 Versicolor, 1 Virginica):
$$
H_R = 0
$$

Weighted:
$$
H_{\text{split}} = \tfrac{11}{12}\cdot 1.573 + \tfrac{1}{12}\cdot 0 \approx 1.442
$$

Information Gain:
$$
IG = 1.585 - 1.442 = 0.143
$$

---

## Summary (Feature 1)

$$
\begin{array}{|c|c|c|}
\hline
\textbf{Threshold} & \mathbf{H_{\text{split}}} & \mathbf{IG} \\
\hline
2.60 & 1.442 & 0.143 \\
2.80 & 1.377 & 0.208 \\
2.95 & 1.416 & 0.169 \\
\mathbf{3.05} & \mathbf{1.209} & \mathbf{0.376} \\
3.15 & 1.333 & 0.252 \\
3.30 & 1.442 & 0.143 \\
\hline
\end{array}
$$




Among all of the candidate thresholds, the best split is at 3.05, with a weighted entropy of approximately 1.209 and an information gain of 0.376. This split sends 7 samples to the left and 5 to the right. Although 3.05 is the most helpful cut for this feature, neither branch is pure: the left side is still a mix dominated by versicolor (1 setosa, 4 versicolor, 2 virginica), and the right side is a setosa-virginica mixture (3 setosa, 2 virginica) without any versicolor. The other thresholds for this feature are noticeably weaker.

Moving on to Feature 2:

## Feature 2 (Petal Length) — Full threshold calculations

Parent entropy:
$$
H(S)=\log_2 3 \approx 1.585
$$

---

## Threshold 1: $x \leq 1.35$

Left (1 Setosa):
$$
H_L = 0
$$

Right (3 Setosa, 4 Versicolor, 4 Virginica):
$$
H_R = -\!\left(\tfrac{3}{11}\log_2\tfrac{3}{11} + \tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{4}{11}\log_2\tfrac{4}{11}\right) \approx 1.573
$$

Weighted:
$$
H_{\text{split}} = \tfrac{1}{12}\cdot 0 + \tfrac{11}{12}\cdot 1.573 \approx 1.442
$$

Information Gain:
$$
IG = 1.585 - 1.442 = 0.143
$$

---

## Threshold 2: $x \leq 1.45$

Left (3 Setosa):
$$
H_L = 0
$$

Right (1 Setosa, 4 Versicolor, 4 Virginica):
$$
H_R = -\!\left(\tfrac{1}{9}\log_2\tfrac{1}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9}\right) \approx 1.392
$$

Weighted:
$$
H_{\text{split}} = \tfrac{3}{12}\cdot 0 + \tfrac{9}{12}\cdot 1.392 \approx 1.044
$$

Information Gain:
$$
IG = 1.585 - 1.044 = 0.541
$$

---

## Threshold 3: $x \leq 2.70$

Left (4 Setosa):
$$
H_L = 0
$$

Right (4 Versicolor, 4 Virginica):
$$
H_R = -\!\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1.000
$$

Weighted:
$$
H_{\text{split}} = \tfrac{4}{12}\cdot 0 + \tfrac{8}{12}\cdot 1.000 = 0.667
$$

Information Gain:
$$
IG = 1.585 - 0.667 = 0.918
$$

---

## Threshold 4: $x \leq 4.05$

Left (4 Setosa, 1 Versicolor):
$$
H_L = -\!\left(\tfrac{4}{5}\log_2\tfrac{4}{5} + \tfrac{1}{5}\log_2\tfrac{1}{5}\right) \approx 0.722
$$

Right (3 Versicolor, 4 Virginica):
$$
H_R = -\!\left(\tfrac{3}{7}\log_2\tfrac{3}{7} + \tfrac{4}{7}\log_2\tfrac{4}{7}\right) \approx 0.985
$$

Weighted:
$$
H_{\text{split}} = \tfrac{5}{12}\cdot 0.722 + \tfrac{7}{12}\cdot 0.985 \approx 0.876
$$

Information Gain:
$$
IG = 1.585 - 0.876 = 0.709
$$

---

## Threshold 5: $x \leq 4.45$

Left (4 Setosa, 2 Versicolor):
$$
H_L = -\!\left(\tfrac{4}{6}\log_2\tfrac{4}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6}\right) \approx 0.918
$$

Right (2 Versicolor, 4 Virginica):
$$
H_R = -\!\left(\tfrac{2}{6}\log_2\tfrac{2}{6} + \tfrac{4}{6}\log_2\tfrac{4}{6}\right) \approx 0.918
$$

Weighted:
$$
H_{\text{split}} = \tfrac{6}{12}\cdot 0.918 + \tfrac{6}{12}\cdot 0.918 = 0.918
$$

Information Gain:
$$
IG = 1.585 - 0.918 = 0.667
$$

---

## Threshold 6: $x \leq 4.90$

Left (4 Setosa, 3 Versicolor):
$$
H_L = -\!\left(\tfrac{4}{7}\log_2\tfrac{4}{7} + \tfrac{3}{7}\log_2\tfrac{3}{7}\right) \approx 0.985
$$

Right (1 Versicolor, 4 Virginica):
$$
H_R = -\!\left(\tfrac{1}{5}\log_2\tfrac{1}{5} + \tfrac{4}{5}\log_2\tfrac{4}{5}\right) \approx 0.722
$$

Weighted:
$$
H_{\text{split}} = \tfrac{7}{12}\cdot 0.985 + \tfrac{5}{12}\cdot 0.722 \approx 0.876
$$

Information Gain:
$$
IG = 1.585 - 0.876 = 0.709
$$

---

## Threshold 7: $x \leq 5.30$

Left (4 Setosa, 4 Versicolor):
$$
H_L = -\!\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1.000
$$

Right (4 Virginica):
$$
H_R = 0
$$

Weighted:
$$
H_{\text{split}} = \tfrac{8}{12}\cdot 1.000 + \tfrac{4}{12}\cdot 0 = 0.667
$$

Information Gain:
$$
IG = 1.585 - 0.667 = 0.918
$$

---

## Threshold 8: $x \leq 5.55$

Left (4 Setosa, 4 Versicolor, 1 Virginica):
$$
H_L = -\!\left(\tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{1}{9}\log_2\tfrac{1}{9}\right) \approx 1.392
$$

Right (3 Virginica):
$$
H_R = 0
$$

Weighted:
$$
H_{\text{split}} = \tfrac{9}{12}\cdot 1.392 + \tfrac{3}{12}\cdot 0 \approx 1.044
$$

Information Gain:
$$
IG = 1.585 - 1.044 = 0.541
$$

---

## Threshold 9: $x \leq 5.70$

Left (4 Setosa, 4 Versicolor, 2 Virginica):
$$
H_L = -\!\left(\tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{2}{10}\log_2\tfrac{2}{10}\right) \approx 1.522
$$

Right (2 Virginica):
$$
H_R = 0
$$

Weighted:
$$
H_{\text{split}} = \tfrac{10}{12}\cdot 1.522 + \tfrac{2}{12}\cdot 0 \approx 1.268
$$

Information Gain:
$$
IG = 1.585 - 1.268 = 0.317
$$

---

## Threshold 10: $x \leq 5.85$

Left (4 Setosa, 4 Versicolor, 3 Virginica):
$$
H_L = -\!\left(\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right) \approx 1.573
$$

Right (1 Virginica):
$$
H_R = 0
$$

Weighted:
$$
H_{\text{split}} = \tfrac{11}{12}\cdot 1.573 + \tfrac{1}{12}\cdot 0 \approx 1.442
$$

Information Gain:
$$
IG = 1.585 - 1.442 = 0.143
$$

---

## Summary (Feature 2)

$$
\begin{array}{|c|c|c|}
\hline
\textbf{Threshold} & \mathbf{H_{\text{split}}} & \mathbf{IG} \\
\hline
1.35 & 1.442 & 0.143 \\
1.45 & 1.044 & 0.541 \\
\mathbf{2.70} & \mathbf{0.667} & \mathbf{0.918} \\
4.05 & 0.876 & 0.709 \\
4.45 & 0.918 & 0.667 \\
4.90 & 0.876 & 0.709 \\
\mathbf{5.30} & \mathbf{0.667} & \mathbf{0.918} \\
5.55 & 1.044 & 0.541 \\
5.70 & 1.268 & 0.317 \\
5.85 & 1.442 & 0.143 \\
\hline
\end{array}
$$


Feature 2, unlike Feature 1, turned out to be a strong splitter. Two of its thresholds, 2.70 and 5.30, tie for the highest information gain (0.918). At 2.70, setosa becomes a pure leaf and the remainder is versicolor and virginica. At 5.30, virginica becomes a pure leaf, leaving setosa and versicolor on the left. The remaining thresholds are noticeably worse than these two.

## Feature 3 (Petal Width) — Full threshold calculations

Parent entropy:
$$
H(S)=\log_2 3 \approx 1.585
$$

---

## Threshold 1: $x \leq 0.15$

Left: 1 Setosa; Right: 3 Setosa, 4 Versicolor, 4 Virginica.

$$
H_L=0,\quad
H_R=-\!\left(\tfrac{3}{11}\log_2\tfrac{3}{11}+\tfrac{4}{11}\log_2\tfrac{4}{11}+\tfrac{4}{11}\log_2\tfrac{4}{11}\right)\approx 1.573
$$

Weighted:
$$
H_{\text{split}}=\tfrac{1}{12}\cdot 0+\tfrac{11}{12}\cdot 1.573 \approx 1.442
$$

Information Gain:
$$
IG=1.585-1.442=0.143
$$

---

## Threshold 2: $x \leq 0.70$

Left: 4 Setosa; Right: 4 Versicolor, 4 Virginica.

$$
H_L=0,\quad
H_R=-\!\left(\tfrac{4}{8}\log_2\tfrac{4}{8}+\tfrac{4}{8}\log_2\tfrac{4}{8}\right)=1.000
$$

Weighted:
$$
H_{\text{split}}=\tfrac{4}{12}\cdot 0+\tfrac{8}{12}\cdot 1.000=0.667
$$

Information Gain:
$$
IG=1.585-0.667=0.918
$$

---

## Threshold 3: $x \leq 1.30$

Left: 4 Setosa, 2 Versicolor; Right: 2 Versicolor, 4 Virginica.

$$
H_L=-\!\left(\tfrac{4}{6}\log_2\tfrac{4}{6}+\tfrac{2}{6}\log_2\tfrac{2}{6}\right)\approx 0.918,\quad
H_R=-\!\left(\tfrac{2}{6}\log_2\tfrac{2}{6}+\tfrac{4}{6}\log_2\tfrac{4}{6}\right)\approx 0.918
$$

Weighted:
$$
H_{\text{split}}=\tfrac{6}{12}\cdot 0.918+\tfrac{6}{12}\cdot 0.918=0.918
$$

Information Gain:
$$
IG=1.585-0.918=0.667
$$

---

## Threshold 4: $x \leq 1.50$

Left: 4 Setosa, 3 Versicolor; Right: 1 Versicolor, 4 Virginica.

$$
H_L=-\!\left(\tfrac{4}{7}\log_2\tfrac{4}{7}+\tfrac{3}{7}\log_2\tfrac{3}{7}\right)\approx 0.985,\quad
H_R=-\!\left(\tfrac{1}{5}\log_2\tfrac{1}{5}+\tfrac{4}{5}\log_2\tfrac{4}{5}\right)\approx 0.722
$$

Weighted:
$$
H_{\text{split}}=\tfrac{7}{12}\cdot 0.985+\tfrac{5}{12}\cdot 0.722 \approx 0.876
$$

Information Gain:
$$
IG=1.585-0.876=0.709
$$

---

## Threshold 5: $x \leq 1.70$

Left: 4 Setosa, 4 Versicolor; Right: 4 Virginica.

$$
H_L=-\!\left(\tfrac{4}{8}\log_2\tfrac{4}{8}+\tfrac{4}{8}\log_2\tfrac{4}{8}\right)=1.000,\quad
H_R=0
$$

Weighted:
$$
H_{\text{split}}=\tfrac{8}{12}\cdot 1.000+\tfrac{4}{12}\cdot 0=0.667
$$

Information Gain:
$$
IG=1.585-0.667=0.918
$$

---

## Threshold 6: $x \leq 1.95$

Left: 4 Setosa, 4 Versicolor, 1 Virginica; Right: 3 Virginica.

$$
H_L=-\!\left(\tfrac{4}{9}\log_2\tfrac{4}{9}+\tfrac{4}{9}\log_2\tfrac{4}{9}+\tfrac{1}{9}\log_2\tfrac{1}{9}\right)\approx 1.392,\quad
H_R=0
$$

Weighted:
$$
H_{\text{split}}=\tfrac{9}{12}\cdot 1.392+\tfrac{3}{12}\cdot 0 \approx 1.044
$$

Information Gain:
$$
IG=1.585-1.044=0.541
$$

---

## Threshold 7: $x \leq 2.20$

Left: 4 Setosa, 4 Versicolor, 2 Virginica; Right: 2 Virginica.

$$
H_L=-\!\left(\tfrac{4}{10}\log_2\tfrac{4}{10}+\tfrac{4}{10}\log_2\tfrac{4}{10}+\tfrac{2}{10}\log_2\tfrac{2}{10}\right)\approx 1.522,\quad
H_R=0
$$

Weighted:
$$
H_{\text{split}}=\tfrac{10}{12}\cdot 1.522+\tfrac{2}{12}\cdot 0 \approx 1.268
$$

Information Gain:
$$
IG=1.585-1.268=0.317
$$

---

## Threshold 8: $x \leq 2.35$

Left: 4 Setosa, 4 Versicolor, 3 Virginica; Right: 1 Virginica.

$$
H_L=-\!\left(\tfrac{4}{11}\log_2\tfrac{4}{11}+\tfrac{4}{11}\log_2\tfrac{4}{11}+\tfrac{3}{11}\log_2\tfrac{3}{11}\right)\approx 1.573,\quad
H_R=0
$$

Weighted:
$$
H_{\text{split}}=\tfrac{11}{12}\cdot 1.573+\tfrac{1}{12}\cdot 0 \approx 1.442
$$

Information Gain:
$$
IG=1.585-1.442=0.143
$$

---

## Summary (Feature 3)

$$
\begin{array}{|c|c|c|}
\hline
\textbf{Threshold} & \mathbf{H_{\text{split}}} & \mathbf{IG} \\
\hline
0.15 & 1.442 & 0.143 \\
\mathbf{0.70} & \mathbf{0.667} & \mathbf{0.918} \\
1.30 & 0.918 & 0.667 \\
1.50 & 0.876 & 0.709 \\
\mathbf{1.70} & \mathbf{0.667} & \mathbf{0.918} \\
1.95 & 1.044 & 0.541 \\
2.20 & 1.268 & 0.317 \\
2.35 & 1.442 & 0.143 \\
\hline
\end{array}
$$



Feature 3, just like Features 0 and 2, turned out to be one of the strongest splitters. Two of its thresholds, 0.70 and 1.70, tie for the top information gain at 0.918. At 0.70, all setosa immediately form a pure leaf, leaving a clean two-class remainder (versicolor and virginica). At 1.70, virginica becomes a pure leaf on the right, with setosa and versicolor on the left. The remaining thresholds are noticeably worse.

## Best splits per feature

$$
\begin{array}{|c|c|c|c|}
\hline
\textbf{Feature} & \textbf{Best Threshold(s)} & \mathbf{H_{\text{split}}} & \mathbf{IG} \\
\hline
\text{Feature 0 (Sepal Length)} & 5.30,\; 6.20 & 0.667 & 0.918 \\
\hline
\text{Feature 1 (Sepal Width)} & 3.05 & 1.209 & 0.376 \\
\hline
\text{Feature 2 (Petal Length)} & 2.70,\; 5.30 & 0.667 & 0.918 \\
\hline
\text{Feature 3 (Petal Width)} & 0.70,\; 1.70 & 0.667 & 0.918 \\
\hline
\end{array}
$$
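The introduction mentions Gini impurity, but the calculations above use entropy only. As a hedged cross-check, a minimal sketch that computes the Gini gain, the parent impurity $G(S) = 1 - \sum_i p_i^2$ minus the size-weighted child impurity, for the best threshold of each feature in the table above. The helper names and the row-order labels (rows 0-3 setosa, 4-7 versicolor, 8-11 virginica) are assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(values, labels, t):
    """Parent impurity minus the size-weighted impurity of x <= t and x > t."""
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted

# Assumed labels: rows 0-3 setosa, 4-7 versicolor, 8-11 virginica.
y_train = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
best_splits = {
    "sepal length <= 5.30": ([4.4, 4.9, 4.6, 4.4, 6.0, 5.8, 6.1, 5.7, 6.7, 6.8, 6.8, 6.3], 5.30),
    "sepal width <= 3.05": ([3.2, 3.1, 3.2, 2.9, 2.7, 2.7, 2.9, 3.0, 2.5, 3.2, 3.0, 3.4], 3.05),
    "petal length <= 2.70": ([1.3, 1.5, 1.4, 1.4, 5.1, 3.9, 4.7, 4.2, 5.8, 5.9, 5.5, 5.6], 2.70),
    "petal width <= 0.70": ([0.2, 0.1, 0.2, 0.2, 1.6, 1.2, 1.4, 1.2, 1.8, 2.3, 2.1, 2.4], 0.70),
}
for name, (values, t) in best_splits.items():
    print(f"{name}: Gini gain = {gini_gain(values, y_train, t):.3f}")
```

On these four splits Gini gain gives the same ordering as information gain: the sepal length, petal length and petal width splits each reach 1/3 (about 0.333), while sepal width at 3.05 reaches only about 0.133.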


I chose petal width ≤ 0.70 as the root of the decision tree. It ties for the highest information gain (0.918), and it sends all setosa to a pure leaf in one step while leaving a compact two-class problem on the other branch. On these 12 training rows, the six tied thresholds produce only two distinct partitions: {setosa} versus {versicolor, virginica}, and {setosa, versicolor} versus {virginica}. Sepal length at 5.30 or 6.20 therefore gives exactly the same split as one of the petal thresholds, so information gain alone cannot separate them; I preferred to build the tree on petal measurements, which I consider a better match for the structure of the data. Petal length at 2.70 or 5.30 is equally strong numerically, so choosing petal width at the root is a tie-break: isolating setosa with a single petal-width threshold keeps the tree short and interpretable.
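The tie at the root can be checked directly. A small sketch that groups the six top-scoring thresholds by the set of training rows they send to the left branch; the feature lists are copied from the threshold program, and row indices follow the same order.

```python
features = {
    "sepal length": [4.4, 4.9, 4.6, 4.4, 6.0, 5.8, 6.1, 5.7, 6.7, 6.8, 6.8, 6.3],
    "petal length": [1.3, 1.5, 1.4, 1.4, 5.1, 3.9, 4.7, 4.2, 5.8, 5.9, 5.5, 5.6],
    "petal width": [0.2, 0.1, 0.2, 0.2, 1.6, 1.2, 1.4, 1.2, 1.8, 2.3, 2.1, 2.4],
}
tied = [("sepal length", 5.30), ("sepal length", 6.20),
        ("petal length", 2.70), ("petal length", 5.30),
        ("petal width", 0.70), ("petal width", 1.70)]

# Map each distinct left-branch row set to the thresholds that produce it.
partitions = {}
for name, t in tied:
    left_rows = frozenset(i for i, x in enumerate(features[name]) if x <= t)
    partitions.setdefault(left_rows, []).append(f"{name} <= {t:.2f}")

for left_rows, splits in partitions.items():
    print(sorted(left_rows), splits)
```

Only two row sets appear (rows 0-3 and rows 0-7), each produced by one threshold from every feature, so the training data gives no numerical reason to prefer one of the tied splits over another.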

<pre>
                 ┌──────────────────────────────┐
                 │ Petal Width ≤ 0.70 ?         │
                 └──────────────────────────────┘
                        /              
                       /                 
           ┌──────────────────┐
           │     Setosa       │ 
           └──────────────────┘ 
</pre>


After choosing petal width ≤ 0.70 as the root, I chose petal length ≤ 5.30 for the right branch because it cleanly separates the two remaining classes (versicolor and virginica) with one rule while keeping the overall depth at two decisions. I did not choose petal width ≤ 1.70 there, although it also ties on information gain (both split the right branch perfectly on the training rows), because using a different but complementary petal measurement produced a slightly clearer rule set and avoids repeating the same feature at consecutive levels. I avoided sepal width for the early splits because its best information gain was far weaker, which in turn would lead to a messier tree with less pure nodes.
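For the right branch, information gain is measured against that branch's own entropy rather than the root's 1.585. A worked check using the 8 training rows with petal width > 0.70 (4 versicolor, 4 virginica):

$$
H(S_R) = -\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1.000
$$

Petal length ≤ 5.30 sends all 4 versicolor (3.9, 4.2, 4.7, 5.1) left and all 4 virginica (5.5, 5.6, 5.8, 5.9) right, so both children are pure:

$$
IG = 1.000 - \left(\tfrac{4}{8}\cdot 0 + \tfrac{4}{8}\cdot 0\right) = 1.000
$$

Petal width ≤ 1.70 also sends versicolor (1.2, 1.2, 1.4, 1.6) left and virginica (1.8, 2.1, 2.3, 2.4) right, so it reaches the same $IG = 1.000$ on the training rows, and the choice between the two is a design preference rather than a numerical one.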
 

The final decision tree is as follows:

<pre>
                 ┌──────────────────────────────┐
                 │ Petal Width ≤ 0.70 ?         │
                 └──────────────────────────────┘
                        /                 \
                       /                   \
           ┌──────────────────┐      ┌──────────────────────────┐
           │     Setosa       │      │ Petal Length ≤ 5.30 ?    │
           └──────────────────┘      └──────────────────────────┘
                                         /               \
                                        /                 \
                              ┌──────────────────┐   ┌──────────────────┐
                              │   Versicolor     │   │    Virginica     │
                              └──────────────────┘   └──────────────────┘

</pre>


## Chosen rules

$$
\hat{y}(\text{PW},\text{PL})=
\begin{cases}
\text{Setosa}, & \text{if } \text{PW}\le 0.70,\\[6pt]
\text{Versicolor}, & \text{if } \text{PW}>0.70 \ \land\ \text{PL}\le 5.30,\\[6pt]
\text{Virginica}, & \text{otherwise.}
\end{cases}
$$
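The chosen rules translate directly into a prediction function. A minimal sketch, with the three test rows copied from the table below; the function name `predict` is illustrative.

```python
def predict(petal_width, petal_length):
    """Decision tree from the chosen rules: PW <= 0.70, then PL <= 5.30."""
    if petal_width <= 0.70:
        return "Setosa"
    if petal_length <= 5.30:
        return "Versicolor"
    return "Virginica"

# Test rows 12-14: (sepal L, sepal W, petal L, petal W, true class).
test_rows = [
    (4.6, 3.2, 1.4, 0.2, "Setosa"),
    (5.1, 2.5, 3.0, 1.1, "Versicolor"),
    (7.2, 3.0, 5.8, 1.6, "Virginica"),
]
predictions = [predict(pw, pl) for _, _, pl, pw, _ in test_rows]
accuracy = sum(p == row[4] for p, row in zip(predictions, test_rows)) / len(test_rows)
print(predictions)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 1.00
```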

## Test instances (rows 12–14) with predictions

$$
\begin{array}{|c|c|c|c|c|c|c|}
\hline
\textbf{Row} & \textbf{Sepal L} & \textbf{Sepal W} & \textbf{Petal L (PL)} & \textbf{Petal W (PW)} & \textbf{True} & \textbf{Pred} \\
\hline
12 & 4.6 & 3.2 & 1.4 & 0.2 & \text{Setosa} & \text{Setosa} \\
13 & 5.1 & 2.5 & 3.0 & 1.1 & \text{Versicolor} & \text{Versicolor} \\
14 & 7.2 & 3.0 & 5.8 & 1.6 & \text{Virginica} & \text{Virginica} \\
\hline
\end{array}
$$

## Test accuracy

$$
\text{Accuracy}=\frac{3}{3}=1.00\;(100\%).
$$


As the accuracy shows, this small test set is classified perfectly because the petal features in this subset are separated by clear margins: setosa has very small petals, and virginica has consistently longer petals than versicolor. The tree is also very shallow, which helps interpretability and reduces the risk of overfitting on this small problem. The second split choice matters, however: if I had split the right branch on petal width ≤ 1.70 instead of petal length, sample 14 (petal width 1.6) would have been misclassified as versicolor, which would reduce the accuracy to 2/3.
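The effect of the second split choice can be reproduced with a small sketch that keeps the root fixed at PW ≤ 0.70 and swaps only the right-branch rule. The parameterized `predict` and `accuracy` helpers are hypothetical; the test rows are copied from the table above.

```python
test_rows = [  # (petal L, petal W, true class) for test rows 12-14
    (1.4, 0.2, "Setosa"),
    (3.0, 1.1, "Versicolor"),
    (5.8, 1.6, "Virginica"),
]

def predict(pl, pw, right_feature, right_threshold):
    """Root PW <= 0.70; the right branch splits on the named petal feature."""
    if pw <= 0.70:
        return "Setosa"
    value = pl if right_feature == "PL" else pw
    return "Versicolor" if value <= right_threshold else "Virginica"

def accuracy(right_feature, right_threshold):
    hits = sum(predict(pl, pw, right_feature, right_threshold) == y for pl, pw, y in test_rows)
    return hits / len(test_rows)

print(f"PL <= 5.30: {accuracy('PL', 5.30):.3f}")  # PL <= 5.30: 1.000
print(f"PW <= 1.70: {accuracy('PW', 1.70):.3f}")  # PW <= 1.70: 0.667
```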

This shows that even when several splits tie on information gain on the training data, it is still important to pick splits that better reflect the class geometry. With only three test instances, the accuracy estimate has high variance; a larger test set or cross-validation would give a more reliable assessment.

