# [Machine Learning](https://github.com/marcocrowe/learn-machine-learning)

## Sample Question - Weather Entropy

The dataset below contains the descriptive features (`Humid`, `Cloudy`, `Windy`) used to predict whether it will `Rain` (the target feature):

| #  | Humid | Cloudy | Windy | Rain |
|----|-------|--------|-------|------|
| 1  | True  | False  | True  | Yes  |
| 2  | True  | True   | False | Yes  |
| 3  | True  | True   | False | Yes  |
| 4  | False | True   | True  | No   |
| 5  | False | False  | False | No   |
| 6  | False | False  | False | No   |

### Part 1 - Discuss what is meant by Entropy. [7 Marks]

Use both the definition and formula for Entropy to clarify your answer.

#### Answer

Entropy is a measure of randomness or uncertainty in a dataset. In the context of decision trees and classification problems, entropy is used to quantify the impurity of a collection of examples. A dataset with high entropy has a lot of disorder, meaning the classes are distributed randomly or evenly. On the other hand, a dataset with low entropy has less disorder, indicating that the examples belong predominantly to one class.

In lay terms, entropy measures how mixed a dataset is. For a two-class problem it ranges from 0 to 1 (in general, from 0 to $\log_2(c)$ for $c$ classes). If the entropy is 0, the dataset is perfectly homogeneous: all examples belong to the same class. If the entropy is 1, the two classes are evenly split, so the dataset is as mixed as it can be.

The formula for entropy is given by:

$$ \text{Entropy}(S) = - \sum_{i=1}^{c} p_i \log_2(p_i) $$

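where:
- $S$ is the dataset
- $c$ is the number of classes in the dataset
- $p_i$ is the proportion of examples in $S$ that belong to class $i$
- $\sum_{i=1}^{c}$ sums the term $p_i \log_2(p_i)$ over all classes, with $0 \log_2(0)$ taken as 0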
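
The formula can be checked numerically. The following is a minimal sketch, not part of the original answer; the function name `entropy_from_probabilities` is a hypothetical helper, and it assumes the class proportions are already known and sum to 1.

```python
import math

def entropy_from_probabilities(probabilities):
    """Shannon entropy in bits for a list of class proportions."""
    # Terms with p == 0 are skipped: by convention 0 * log2(0) counts as 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

pure = entropy_from_probabilities([1.0, 0.0])      # every example in one class
balanced = entropy_from_probabilities([0.5, 0.5])  # even two-class split
four_way = entropy_from_probabilities([0.25] * 4)  # even four-class split

print(pure, balanced, four_way)
```

The pure split gives 0, the even two-class split gives 1, and the even four-class split gives 2, which illustrates that the upper bound is $\log_2(c)$ rather than always 1.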

### Part 2 - Calculate the Entropy [6 Marks]

Calculate the Entropy for the entire dataset above using the target feature.


#### Answer

$|S| = 6$ (total number of examples)

The classes for the target feature `Rain` are `Yes` and `No`, so the number of classes is $c = 2$.

There are 3 `Yes` examples and 3 `No` examples.

$p_1 = P(Rain = Yes) = \frac{3}{6} = 0.5$  
$p_2 = P(Rain = No) = \frac{3}{6} = 0.5$

$Entropy(S) = - \sum_{i=1}^{c} p_i \log_2(p_i)$

$Entropy(S) = - (p_1 \log_2(p_1) + p_2 \log_2(p_2))$ (since $c = 2$)  

Plug the values into the formula:  

$Entropy(S) = - (0.5 \log_2(0.5) + 0.5 \log_2(0.5))$

$Entropy(S) = - (0.5 \times (-1) + 0.5 \times (-1))$

$Entropy(S) = - ((-0.5) + (-0.5))$

$Entropy(S) = - (-1)$

$Entropy(S) = 1$
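
The same calculation can be reproduced from the raw labels instead of pre-computed proportions. This is a small sketch under the assumption that the `Rain` column is copied by hand from the table; `entropy_of_labels` is a hypothetical helper name.

```python
from collections import Counter
import math

def entropy_of_labels(labels):
    """Shannon entropy in bits of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)  # e.g. {"Yes": 3, "No": 3}
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

rain = ["Yes", "Yes", "Yes", "No", "No", "No"]  # Rain column, rows 1 to 6
print(entropy_of_labels(rain))  # 1.0
```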



### Alt Sample Question


Calculate the Entropy for the entire dataset above using the `Windy` feature.

$|S| = 6$ (total number of examples)

Here `Windy` takes the place of the target feature. Its values are `True` and `False`, so the number of classes is $c = 2$.

There are 2 `True` examples and 4 `False` examples.

$p_1 = P(Windy = True) = \frac{2}{6} \approx 0.33$  
$p_2 = P(Windy = False) = \frac{4}{6} \approx 0.67$  

$Entropy(S) = - \sum_{i=1}^{c} p_i \log_2(p_i)$

Plug in the values:

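$Entropy(S) = - \left( \frac{2}{6} \log_2\left(\frac{2}{6}\right) + \frac{4}{6} \log_2\left(\frac{4}{6}\right) \right)$

$Entropy(S) \approx - \left( 0.333 \times (-1.585) + 0.667 \times (-0.585) \right)$

$Entropy(S) \approx 0.92$

On a calculator without a $\log_2$ key, use the change of base $\log_2(x) = \frac{\ln(x)}{\ln(2)}$:

$Entropy(S) = - \left( \frac{2}{6} \cdot \frac{\ln\left(\frac{2}{6}\right)}{\ln\left(2\right)} + \frac{4}{6} \cdot \frac{\ln\left(\frac{4}{6}\right)}{\ln\left(2\right)} \right) \approx 0.92$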
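
The change of base from $\log_2$ to natural logarithms, and the effect of rounding the proportions too early, can be checked with a short sketch. The helper names `log2_via_ln` and `binary_entropy` are hypothetical and only illustrate the arithmetic.

```python
import math

def log2_via_ln(x):
    """Change of base: log2(x) = ln(x) / ln(2)."""
    return math.log(x) / math.log(2)

def binary_entropy(p1, p2):
    """Entropy in bits of a two-class split with proportions p1 and p2."""
    return -(p1 * log2_via_ln(p1) + p2 * log2_via_ln(p2))

exact = binary_entropy(2 / 6, 4 / 6)  # unrounded Windy proportions
rounded = binary_entropy(0.33, 0.67)  # proportions rounded to two decimals

print(round(exact, 3), round(rounded, 3))
```

With the unrounded fractions the entropy is about 0.918. Rounding the proportions to 0.33 and 0.67 first gives a slightly lower value, so it is safer to keep the fractions until the final step.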


### Part 3 - Calculate the information gain [12 Marks]

Demonstrate how you would calculate the information gain for each of the above features (`Humid`, `Cloudy`, `Windy`).

#### Answer

Information gain measures the effectiveness of a feature in classifying the dataset. It indicates how much entropy is reduced when a dataset is split on a particular feature.

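To calculate the information gain for a feature, split the dataset on that feature, calculate the entropy of each resulting subset, and take the weighted average of those entropies (weighted by subset size). The information gain is the entropy of the full dataset minus this weighted average.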

Our feature set is `Humid`, `Cloudy`, and `Windy`. We will calculate the information gain for each feature.

The formula for information gain is given by:

$$ \text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \times \text{Entropy}(S_v) $$

where:

- $S$ is the dataset
- $A$ is the feature
- $Values(A)$ is the set of possible values of feature $A$
- $S_v$ is the subset of $S$ for which feature $A$ has value $v$
- $|S|$ is the total number of examples in $S$
- $|S_v|$ is the number of examples in $S_v$
- $Entropy(S)$ is the entropy of the dataset $S$
- $Entropy(S_v)$ is the entropy of the subset $S_v$
- $\text{Information Gain}(S, A)$ is the information gain of feature $A$ on dataset $S$
- $v$ is a value of feature $A$
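
The formula translates fairly directly into code. The sketch below is illustrative rather than the author's method: the tuple layout of `ROWS`, the `FEATURES` index mapping, and the function names are all assumptions made for this example, with the data copied from the weather table.

```python
from collections import Counter
import math

# Rows of the weather table: (Humid, Cloudy, Windy, Rain).
ROWS = [
    (True,  False, True,  "Yes"),
    (True,  True,  False, "Yes"),
    (True,  True,  False, "Yes"),
    (False, True,  True,  "No"),
    (False, False, False, "No"),
    (False, False, False, "No"),
]
FEATURES = {"Humid": 0, "Cloudy": 1, "Windy": 2}  # column index per feature
TARGET = 3  # column index of Rain

def entropy(labels):
    """Shannon entropy in bits of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(rows, feature_index, target_index=TARGET):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    all_labels = [row[target_index] for row in rows]
    # Group the target labels by the value v of the chosen feature.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature_index], []).append(row[target_index])
    # Weighted average of subset entropies: sum of |S_v| / |S| * Entropy(S_v).
    remainder = sum(len(labels) / len(rows) * entropy(labels)
                    for labels in subsets.values())
    return entropy(all_labels) - remainder

for name, index in FEATURES.items():
    print(name, round(information_gain(ROWS, index), 3))
```

On this table the sketch reports a gain of 1 for `Humid`, roughly 0.082 for `Cloudy`, and 0 for `Windy`.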

Let's calculate the information gain for each feature:

1. **Humid**:

- Split the dataset on the `Humid` feature, which has two values, `True` and `False`. The weight of each subset is:
  - $\frac{|S_{Humid=True}|}{|S|} = \frac{3}{6} = 0.5$
  - $\frac{|S_{Humid=False}|}{|S|} = \frac{3}{6} = 0.5$
- Calculate the entropy of the subsets:
  - For `Humid = True`:
      - Number of examples = 3
      - Number of `Yes` = 3
      - Number of `No` = 0
      - $Entropy(S_{Humid=True}) = 0$ (since all examples are of the same class)
  - For `Humid = False`:
      - Number of examples = 3
      - Number of `Yes` = 0
      - Number of `No` = 3
      - $Entropy(S_{Humid=False}) = 0$ (since all examples are of the same class)
- Calculate the information gain:
  - $\text{Information Gain}(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \times Entropy(S_v)$
  - $\text{Information Gain}(S, Humid) = Entropy(S) - \left( \frac{3}{6} \times 0 + \frac{3}{6} \times 0 \right) = 1 - 0 = 1$
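
The worked answer above covers only `Humid`. Applying the same steps to the remaining two features gives the following sketch, derived from the table rather than taken from the original answer.

For `Cloudy`, the `True` subset (rows 2, 3, 4) has 2 `Yes` and 1 `No`, and the `False` subset (rows 1, 5, 6) has 1 `Yes` and 2 `No`:

$$ Entropy(S_{Cloudy=True}) = Entropy(S_{Cloudy=False}) = - \left( \frac{2}{3} \log_2\left(\frac{2}{3}\right) + \frac{1}{3} \log_2\left(\frac{1}{3}\right) \right) \approx 0.918 $$

$$ \text{Information Gain}(S, Cloudy) \approx 1 - \left( \frac{3}{6} \times 0.918 + \frac{3}{6} \times 0.918 \right) \approx 0.082 $$

For `Windy`, the `True` subset (rows 1, 4) has 1 `Yes` and 1 `No`, and the `False` subset (rows 2, 3, 5, 6) has 2 `Yes` and 2 `No`, so both subsets are evenly split:

$$ Entropy(S_{Windy=True}) = Entropy(S_{Windy=False}) = 1 $$

$$ \text{Information Gain}(S, Windy) = 1 - \left( \frac{2}{6} \times 1 + \frac{4}{6} \times 1 \right) = 0 $$

On these figures `Humid` has the highest information gain, so it would be the natural first split for a decision tree built on this dataset.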

---

Copyright &copy; 2024 Mark Crowe <https://github.com/marcocrowe>. All rights reserved.
