# [Machine Learning](https://github.com/marcocrowe/learn-machine-learning "Machine Learning")

## Sample Question - Weather Entropy

The following dataset contains the descriptive features (`Humid`, `Cloudy`, `Windy`) which determine whether it will `Rain` (target feature). Given the dataset:

| #  | Humid | Cloudy | Windy | Rain |
|----|-------|--------|-------|------|
| 1  | True  | False  | True  | Yes  |
| 2  | True  | True   | False | Yes  |
| 3  | True  | True   | False | Yes  |
| 4  | False | True   | True  | No   |
| 5  | False | False  | False | No   |
| 6  | False | False  | False | No   |

### Part 1 - Discuss what is meant by Entropy. [7 Marks]

Use both the definition and formula for Entropy to clarify your answer.

#### Answer

Entropy is a measure of randomness or uncertainty in a dataset. In the context of decision trees and classification problems, entropy is used to quantify the impurity of a collection of examples. A dataset with high entropy has a lot of disorder, meaning the classes are distributed randomly or evenly. On the other hand, a dataset with low entropy has less disorder, indicating that the examples belong predominantly to one class.

In lay terms, entropy measures how mixed a dataset is. If the entropy is 0, the dataset is perfectly homogeneous: all examples belong to the same class. For a two-class target such as `Rain`, entropy ranges from 0 to 1, and an entropy of 1 means the examples are split evenly between the two classes. With $c$ classes the maximum is $\log_2(c)$, so entropy can exceed 1 when there are more than two classes.

The formula for entropy is given by:

$$ \text{Entropy}(S) = - \sum_{i=1}^{c} p_i \log_2(p_i) $$

where:
- $S$ is the dataset
- the classes of the target feature `Rain` are `Yes` and `No`
- $i$ is the index of a class, e.g. $i=1$ for `Yes` and $i=2$ for `No`
- $p_i$ is the proportion of examples in $S$ that belong to class $i$
- $c$ is the number of classes in the feature/target variable of interest
- $\sum_{i=1}^{c}$ sums the term $p_i \log_2(p_i)$ over all classes; by convention $0 \log_2(0) = 0$
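As a minimal sketch of the formula, the function below takes the class proportions directly. The function name and the three example distributions are illustrative assumptions, chosen to show the two extremes for a binary target and the larger maximum for three classes.

```python
from math import log2


def entropy(proportions):
    """Entropy in bits of a class distribution given as proportions summing to 1."""
    # Skip zero proportions: 0 * log2(0) is treated as 0 by convention.
    return sum(-p * log2(p) for p in proportions if p > 0)


pure = entropy([1.0])                       # every example in one class
even_binary = entropy([0.5, 0.5])           # two classes, evenly split
even_three = entropy([1 / 3, 1 / 3, 1 / 3])  # three classes, evenly split

print(pure, even_binary, round(even_three, 3))
```

A pure set gives 0 and an even two-class split gives 1, while an even three-class split gives $\log_2(3) \approx 1.585$.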

### Part 2 - Calculate the Entropy [6 Marks]

Calculate the Entropy for the entire dataset above using the target feature.


$|S| = 6$ (total number of examples)  

The classes of the target feature `Rain` are `Yes` and `No`, so the number of classes is $c = 2$.

There are 3 `Yes` examples and 3 `No` examples.

$p_1 = P(\text{Rain} = Yes) = \frac{3}{6} = 0.5$  
$p_2 = P(\text{Rain} = No) = \frac{3}{6} = 0.5$

$ \text{Entropy}(S) = - \sum_{i=1}^{c} p_i \log_2(p_i) $

since $c = 2$
 
$ \text{Entropy}(S) = - (p_1 \log_2(p_1) + p_2 \log_2(p_2)) $ 

Plug the values into the formula:  

$ \text{Entropy}(S) = - (0.5 \log_2(0.5) + 0.5 \log_2(0.5)) $

$ \text{Entropy}(S) = - (0.5 \times -1 + 0.5 \times -1) $

$ \text{Entropy}(S) = - (-0.5 + -0.5) $

$ \text{Entropy}(S) = - (-1) $

$ \text{Entropy}(S) = 1 $
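The same steps (count the classes, convert the counts to proportions, apply the formula) can be checked with a short script. This is only a sketch; the variable names are illustrative and the `rain` list is copied by hand from the table.

```python
from collections import Counter
from math import log2

# Target feature `Rain` for the six examples in the table.
rain = ["Yes", "Yes", "Yes", "No", "No", "No"]

counts = Counter(rain)        # number of examples per class
total = sum(counts.values())  # |S|
proportions = {label: n / total for label, n in counts.items()}
rain_entropy = -sum(p * log2(p) for p in proportions.values())

print(counts, proportions, rain_entropy)
```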



### Alt Sample Question


Calculate the Entropy for the entire dataset above using the `Windy` feature.

$|S| = 6$ (total number of examples)

Here `Windy` is the feature of interest. Its values are `True` and `False`, so the number of classes is $c = 2$.

There are 2 `True` examples and 4 `False` examples.

$p_1 = P(Windy = True) = \frac{2}{6} \approx 0.33$  
$p_2 = P(Windy = False) = \frac{4}{6} \approx 0.67$  

$Entropy(S) = - \sum_{i=1}^{c} p_i \log_2(p_i)$

Plug in the values:

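$Entropy(S) = - \left( \frac{2}{6} \log_2\left(\frac{2}{6}\right) + \frac{4}{6} \log_2\left(\frac{4}{6}\right) \right)$  
$Entropy(S) = - ((0.333 \times -1.585) + (0.667 \times -0.585))$  
$Entropy(S) = - (-0.528 + -0.390)$  
$Entropy(S) \approx 0.92$

On a calculator without a $\log_2$ key, use the change of base $\log_2(x) = \frac{\ln(x)}{\ln(2)}$:

$Entropy(S) = - \left( \frac{2}{6} \cdot \frac{\ln\left(\frac{2}{6}\right)}{\ln\left(2\right)} + \frac{4}{6} \cdot \frac{\ln\left(\frac{4}{6}\right)}{\ln\left(2\right)} \right) \approx 0.92$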


### Part 3 - Calculate the information gain [12 Marks]

Demonstrate how you would calculate the information gain for each of the above features (`Humid`, `Cloudy`, `Windy`).

#### Answer

Information gain measures the effectiveness of a feature in classifying the dataset. It indicates how much entropy is reduced when a dataset is split on a particular feature.

To calculate the information gain for a feature, we split the dataset on that feature, calculate the weighted average of the entropies of the resulting subsets, and subtract that average from the entropy of the whole dataset.

Our feature set is `Humid`, `Cloudy`, and `Windy`. We will calculate the information gain for each feature.

The formula for information gain is given by:

$$ \text{Information Gain}(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \times Entropy(S_v) $$

where:

- $S$ is the dataset
- $A$ is the feature
- $Values(A)$ is the set of possible values of feature $A$
- $S_v$ is the subset of $S$ for which feature $A$ has value $v$
- $|S|$ is the total number of examples in $S$
- $|S_v|$ is the number of examples in $S_v$
- $Entropy(S)$ is the entropy of the dataset $S$
- $Entropy(S_v)$ is the entropy of the subset $S_v$
- $\text{Information Gain}(S, A)$ is the information gain of feature $A$ on dataset $S$
- $v$ is a value of feature $A$

Let's calculate the information gain for each feature:

1. **Humid**:

- Split the dataset on the `Humid` feature, which has two values, `True` and `False`. The proportion of examples with each value gives the weight $\frac{|S_v|}{|S|}$ of each subset:
  - $P(Humid = True) = \frac{3}{6} = 0.5$  
  - $P(Humid = False) = \frac{3}{6} = 0.5$  
- Calculate the entropy of the subsets:
  - For `Humid = True`:
      - Number of examples = 3
      - Number of `Yes` = 3
      - Number of `No` = 0
      - $Entropy(S_{Humid=True}) = 0$ (since all examples are of the same class)
  - For `Humid = False`:
      - Number of examples = 3
      - Number of `Yes` = 0
      - Number of `No` = 3
      - $Entropy(S_{Humid=False}) = 0$ (since all examples are of the same class)
- Calculate the information gain:
  - $\text{Information Gain}(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \times Entropy(S_v)$
  - $\text{Information Gain}(S, Humid) = Entropy(S) - \left( \frac{3}{6} \times 0 + \frac{3}{6} \times 0 \right) = 1 - 0 = 1$
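The same procedure applies to `Cloudy` and `Windy`. The sketch below applies the information gain formula to all three features; the row tuples are copied by hand from the table, and the helper names (`entropy`, `information_gain`) are illustrative choices rather than part of any library.

```python
from collections import Counter
from math import log2

# Weather dataset from the table above: (Humid, Cloudy, Windy, Rain)
ROWS = [
    (True, False, True, "Yes"),
    (True, True, False, "Yes"),
    (True, True, False, "Yes"),
    (False, True, True, "No"),
    (False, False, False, "No"),
    (False, False, False, "No"),
]
FEATURES = {"Humid": 0, "Cloudy": 1, "Windy": 2}
TARGET = 3


def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the target minus the weighted entropy after splitting on `column`."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row[TARGET])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[TARGET] for row in rows]) - remainder


gains = {name: information_gain(ROWS, col) for name, col in FEATURES.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
```

On this dataset the gains come out at 1 for `Humid`, roughly 0.08 for `Cloudy`, and 0 for `Windy`, so `Humid` alone separates the `Yes` and `No` examples.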

## Lab Question 9

```python
import matplotlib.pyplot as plt
from pandas import DataFrame, read_csv
from sklearn.tree import DecisionTreeClassifier

dataframe = read_csv("data.csv")
dataframe
```

Output:

| Unnamed: 0 | Age | Education   | Marital Status | Occupation   | Annual Income |
|------------|-----|-------------|----------------|--------------|---------------|
| 0          | 39  | bachelors   | never married  | transport    | 25K-50K       |
| 1          | 50  | bachelors   | married        | professional | 25K-50K       |
| 2          | 18  | high school | never married  | agriculture  | <25K          |
| 3          | 28  | bachelors   | married        | professional | 25K-50K       |
| 4          | 37  | high school | married        | agriculture  | 25K-50K       |
| 5          | 24  | high school | never married  | armed forces | <25K          |
| 6          | 52  | high school | divorced       | transport    | 25K-50K       |
| 7          | 40  | doctorate   | married        | professional | >50K          |

```python
features = dataframe.columns[0:-1]
target = dataframe.columns[-1]
decisionTreeClassifier = DecisionTreeClassifier(criterion="entropy")
decisionTreeClassifier.fit(dataframe[features], dataframe[target])
```

Output:

```text
ValueError: could not convert string to float: 'bachelors'
```
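The `ValueError` arises because scikit-learn's `DecisionTreeClassifier` expects numeric input, while `Education`, `Marital Status` and `Occupation` hold strings. One possible workaround, sketched below, is to one-hot encode those columns with `pandas.get_dummies` before fitting. The data is typed inline as an assumption so the example does not depend on `data.csv`, the `Unnamed: 0` index column is left out, and `random_state=0` is an arbitrary choice for repeatability.

```python
from pandas import DataFrame, get_dummies
from sklearn.tree import DecisionTreeClassifier

# Inline copy of the lab data so the sketch does not depend on data.csv.
dataframe = DataFrame({
    "Age": [39, 50, 18, 28, 37, 24, 52, 40],
    "Education": ["bachelors", "bachelors", "high school", "bachelors",
                  "high school", "high school", "high school", "doctorate"],
    "Marital Status": ["never married", "married", "never married", "married",
                       "married", "never married", "divorced", "married"],
    "Occupation": ["transport", "professional", "agriculture", "professional",
                   "agriculture", "armed forces", "transport", "professional"],
    "Annual Income": ["25K-50K", "25K-50K", "<25K", "25K-50K",
                      "25K-50K", "<25K", "25K-50K", ">50K"],
})

target = "Annual Income"
categorical = ["Education", "Marital Status", "Occupation"]

# One-hot encode the string columns; Age is already numeric.
encoded = get_dummies(dataframe.drop(columns=[target]), columns=categorical, dtype=int)

classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(encoded, dataframe[target])

predictions = classifier.predict(encoded)
print((predictions == dataframe[target]).mean())  # accuracy on the training rows
```

scikit-learn builds binary splits on numeric thresholds, so the fitted tree need not have the same shape as a hand-built ID3 tree even with `criterion="entropy"`.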


# Information Based Learning - ID3 Algorithm



The dataset below is used to predict the annual income of individuals from the descriptive features `Age`, `Education`, `Marital Status` and `Occupation`.



## Question 1

Calculate the entropy for the entire dataset. The `Annual Income` is the target feature.



### Answer 1

The entropy of the entire dataset is calculated as follows:

$$ \text{Entropy}(\text{Annual Income}) = - \sum_{i=1}^{n} p_i \log_2 p_i $$

where $p_i$ is the probability of the $i$th class and $n = 3$ is the number of classes.

The probability of each class of `Annual Income` is calculated as follows:

$p(\text{<25K}) = \frac{2}{8} = 0.25$  
$p(\text{25K-50K}) = \frac{5}{8} = 0.625$  
$p(\text{>50K}) = \frac{1}{8} = 0.125$  

Substituting these probabilities into the formula (with AI short for `Annual Income`):

$\text{Entropy}(\text{AI}) = - (p(\text{<25K}) \log_2 p(\text{<25K}) + p(\text{25K-50K}) \log_2 p(\text{25K-50K}) + p(\text{>50K}) \log_2 p(\text{>50K}))$  
$\text{Entropy}(\text{AI}) = - ((0.25 \log_2 0.25) + (0.625 \log_2 0.625) + (0.125 \log_2 0.125))$  
$\text{Entropy}(\text{AI}) = - ((0.25 \times -2) + (0.625 \times -0.6781) + (0.125 \times -3))$  
$\text{Entropy}(\text{AI}) = - (-0.5 + -0.4238 + -0.375)$  
$\text{Entropy}(\text{AI}) = - (-1.2988)$  
$\text{Entropy}(\text{AI}) = 1.2988$
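To check the arithmetic, the sketch below prints the contribution $-p_i \log_2 p_i$ of each class separately before summing. The `annual_income` list is copied by hand from the dataset and the variable names are illustrative.

```python
from collections import Counter
from math import log2

# `Annual Income` column of the eight-row dataset.
annual_income = ["25K-50K", "25K-50K", "<25K", "25K-50K",
                 "25K-50K", "<25K", "25K-50K", ">50K"]

counts = Counter(annual_income)
total = len(annual_income)

# Contribution of each class to the entropy: -p * log2(p).
terms = {label: -(n / total) * log2(n / total) for label, n in counts.items()}
income_entropy = sum(terms.values())

for label, term in terms.items():
    print(f"{label}: {term:.4f}")
print(f"Entropy: {income_entropy:.4f}")
```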



| Id | Age | Education   | Marital Status | Occupation   | Annual Income |
|----|-----|-------------|----------------|--------------|---------------|
| 1  | 39  | bachelors   | never married  | transport    | 25K-50K       |
| 2  | 50  | bachelors   | married        | professional | 25K-50K       |
| 3  | 18  | high school | never married  | agriculture  | <25K          |
| 4  | 28  | bachelors   | married        | professional | 25K-50K       |
| 5  | 37  | high school | married        | agriculture  | 25K-50K       |
| 6  | 24  | high school | never married  | armed forces | <25K          |
| 7  | 52  | high school | divorced       | transport    | 25K-50K       |
| 8  | 40  | doctorate   | married        | professional | >50K          |



## Question 2



Using this dataset, construct the decision tree that would be generated by the ID3 algorithm using entropy-based information gain. Only use the `Education`, `Marital Status` and `Occupation` descriptive features.

Clearly show the entropy and information gain for each feature that was generated by the ID3 algorithm.



### Answer 2

The decision tree generated by the ID3 algorithm is as follows:

1. **Root Node**: The root node is the feature with the highest information gain.  
   - Calculate the information gain for each feature: `Education`, `Marital Status`, and `Occupation`.

$$ \text{Information Gain} = \text{Entropy}(\text{Annual Income}) - \text{Entropy}(\text{Annual Income} \mid \text{Feature}) $$

   - The conditional entropy is the weighted average of the entropies of the subsets $S_i$ produced by splitting on the feature:

$$ \text{Entropy}(\text{Annual Income} \mid \text{Feature}) = \sum_{i=1}^{n} \left( \frac{|S_i|}{|S|} \times \text{Entropy}(S_i) \right) $$

   - **Education**:
     - Split the dataset on `Education` and calculate the weight of each subset.
       - $|S_{\text{bachelors}}| = 3$, so $p(\text{bachelors}) = \frac{3}{8} = 0.375$
       - $|S_{\text{high school}}| = 4$, so $p(\text{high school}) = \frac{4}{8} = 0.5$
       - $|S_{\text{doctorate}}| = 1$, so $p(\text{doctorate}) = \frac{1}{8} = 0.125$
     - Calculate the entropy of `Annual Income` within each subset.
       - bachelors: all 3 examples are `25K-50K`, so $\text{Entropy}(S_{\text{bachelors}}) = 0$
       - high school: 2 examples are `<25K` and 2 are `25K-50K`, so $\text{Entropy}(S_{\text{high school}}) = - (0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$
       - doctorate: the single example is `>50K`, so $\text{Entropy}(S_{\text{doctorate}}) = 0$
     - Calculate the weighted entropy and the information gain for `Education`.
       - $\text{Entropy}(\text{AI} \mid \text{Education}) = (0.375 \times 0) + (0.5 \times 1) + (0.125 \times 0) = 0.5$
       - $\text{Information Gain}(\text{Education}) = 1.2988 - 0.5 = 0.7988$
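The weighted-entropy formula above can be applied to every feature and then repeated on each subset, which is the core of ID3. The following is a minimal sketch rather than a reference implementation: the rows are copied by hand from the dataset (with `Age` omitted, as the question instructs), the tree is represented as nested dictionaries, and the helper names are illustrative.

```python
from collections import Counter
from math import log2

FEATURES = ["Education", "Marital Status", "Occupation"]
TARGET = "Annual Income"

# Rows copied from the dataset table (Age omitted, as the question instructs).
ROWS = [
    dict(zip(FEATURES + [TARGET], values))
    for values in [
        ("bachelors", "never married", "transport", "25K-50K"),
        ("bachelors", "married", "professional", "25K-50K"),
        ("high school", "never married", "agriculture", "<25K"),
        ("bachelors", "married", "professional", "25K-50K"),
        ("high school", "married", "agriculture", "25K-50K"),
        ("high school", "never married", "armed forces", "<25K"),
        ("high school", "divorced", "transport", "25K-50K"),
        ("doctorate", "married", "professional", ">50K"),
    ]
]


def entropy(rows):
    """Entropy (base 2) of the target feature over `rows`."""
    counts = Counter(row[TARGET] for row in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())


def split(rows, feature):
    """Group rows by their value of `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups


def information_gain(rows, feature):
    """Entropy of `rows` minus the weighted entropy of its subsets."""
    remainder = sum(
        len(subset) / len(rows) * entropy(subset)
        for subset in split(rows, feature).values()
    )
    return entropy(rows) - remainder


def id3(rows, features):
    """Return a leaf label, or {feature: {value: subtree}} for an internal node."""
    labels = Counter(row[TARGET] for row in rows)
    if len(labels) == 1 or not features:
        return labels.most_common(1)[0][0]  # pure node, or majority class
    best = max(features, key=lambda f: information_gain(rows, f))
    rest = [f for f in features if f != best]
    return {best: {v: id3(s, rest) for v, s in split(rows, best).items()}}


gains = {f: information_gain(ROWS, f) for f in FEATURES}
tree = id3(ROWS, FEATURES)
print(gains)
print(tree)
```

On these eight rows the gains come out at roughly 0.80 for `Education`, 0.70 for `Occupation` and 0.55 for `Marital Status`, so the sketch places `Education` at the root and splits the `high school` branch on `Marital Status`.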


## Question 3

A colleague suggests that the feature `Marital Status` should be the root of the tree. Would you agree with this? Clearly explain your reasoning.


---

Copyright &copy; 2024 Mark Crowe <https://github.com/marcocrowe>. All rights reserved.
