# Project 1 - Information measures

The goal of this first project is to get accustomed to the information and uncertainty measures. We ask you to write a brief report (pdf format) collecting your answers to the different questions. All codes must be written in Python inside this Jupyter Notebook. No other code file will be accepted. Note that you can not change the content of locked cells or import any extra Python library than the ones already imported (numpy and pandas).

## Implementation

In this project, you will need to use information measures to answer several questions. Therefore, in this first part, you are asked to write several functions that implement some of the main measures seen in the first theoretical lectures. Remember that you need to fill in this Jupyter Notebook to answer these questions. Pay particular attention to the required output format of each function.

In [50]:
# [Locked Cell] You can not import any extra Python library in this Notebook.
import numpy as np
import pandas as pd

### Question 1

Write a function entropy that computes the entropy $\mathcal{H(X)}$ of a random variable $\mathcal{X}$ from its probability distribution $P_\mathcal{X} = (p_1, p_2, . . . , p_n)$. Give the mathematical formula that you are using and explain the key parts of your implementation. Intuitively, what is measured by the entropy?

In [51]:
def entropy(Px):
    """
    Computes the entropy from the marginal probability distribution. 
    Arguments:
    ----------
    - Px :  Marginal probability distribution of the random 
            variable X in a numpy array where Px[i]=P(X=i)
    Return:
    -------
    - The entropy of X (H(X)) as a number (integer, float or double).
    """
    Px_no_0 = Px.copy()
    # replace the probability zero with a value for the log not to raise an error
    # this is not a big deal since the value of the log will be absorbed by the multiplication with zero
    Px_no_0[Px_no_0 == 0] = 1 
    return -np.sum(Px * np.log2(Px_no_0)) # We use the Shannon Entropy formula

### Question 2

Write a function joint_entropy that computes the joint entropy $\mathcal{H(X,Y)}$ of two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Compare the entropy and joint_entropy functions (and their corresponding formulas), what do you notice?

In [52]:
def joint_entropy(Pxy):
    """
    Computes the joint entropy from the joint probability distribution.  
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The joint entropy H(X,Y) as a number (integer, float or double).
    """
    Pxy_no_0 = Pxy.copy()
    for i in range(len(Pxy_no_0)):
        Pxy_no_0[i][Pxy_no_0[i] == 0] = 1 # same as before replace the zeroes by a value
    # Thanks to numpy 2D-arrays and 1D-arrays are handle the same way using the same functions
    return -np.sum(Pxy * np.log2(Pxy_no_0))

### Question 3

Write a function conditional_entropy that computes the conditional entropy $\mathcal{H(X|Y)}$ of a discrete random variable $\mathcal{X}$ given another discrete random variable $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Describe an equivalent way of computing that quantity.

In [53]:
def conditional_entropy(Pxy):
    """
    Computes the conditional entropy from the joint probability distribution.
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The conditional entropy H(X|Y) as a number (integer, float or double)
    """
    Py = np.sum(Pxy, axis=0)
    Py[Py == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    Px_given_y = Pxy / Py # P[x|y] = P[x, y]/P[y]
    Px_given_y[Px_given_y == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    return -np.sum(Pxy * np.log2(Px_given_y)) # the formula is self explanatory

### Question 4

Write a function mutual_information that computes the mutual information $\mathcal{I(X;Y)}$ between two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$ . Give the mathematical formula that you are using and explain the key parts of your implementation. What can you deduce from the mutual information $\mathcal{I(X;Y)}$ on the relationship between $\mathcal{X}$ and $\mathcal{Y}$? Discuss.

In [54]:
def mutual_information(Pxy):
    """
    Computes the mutual information I(X;Y) from joint probability distribution
    
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The mutual information I(X;Y) as a number (integer, float or double)
    """
    Px = np.sum(Pxy, axis=1)
    Py = np.sum(Pxy, axis=0)
    return entropy(Px) + entropy(Py) - joint_entropy(Pxy) # apply the formula I(X, Y) = H(X) + H(Y) - H(X, Y)

### Question 5

Let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be three discrete random variables. Write the functions cond_joint_entropy and cond_mutual_information that respectively compute $\mathcal{H(X,Y|Z)}$ and $\mathcal{I(X;Y|Z)}$ of two discrete random variable $\mathcal{X}$, $\mathcal{Y}$ given another discrete random variable $\mathcal{Z}$ from their joint probability distribution $P_\mathcal{X,Y,Z}$. Give the mathematical formulas that you are using and explain the key parts of your implementation.
Suggestion: Observe the mathematical definitions of these quantities and think how you could derive them from the joint entropy and the mutual information.

In [55]:
def cond_joint_entropy(Pxyz):
    """
    Computes the conditional joint entropy of X, Y knowing Z 
    from the joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - The conditional joint entropy H(X,Y|Z) as a number (integer, float or double)
    
    """
    Pz = np.sum(Pxyz, axis=(0,1))
    Pz[Pz == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    Pxy_given_z = Pxyz / Pz
    Pxy_given_z[Pxy_given_z == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    return -np.sum(Pxyz*np.log2(Pxy_given_z)) # H(X,Y|Z) = -sum_xyz(P[X,Y,Z]*log_2(P[X,Y|Z]))

In [56]:
def cond_mutual_information(Pxyz):
    """
    Computes the conditional mutual information of X, Y knowing Z 
    from joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - I(X;Y|Z): The conditional mutual information as a number (integer, float or double)
    
    """
    # Compute the conditional entropy of X given Z
    H_X_given_Z = conditional_entropy(np.sum(Pxyz, axis=1))
    
    # Compute the conditional entropy of Y given Z
    H_Y_given_Z = conditional_entropy(np.sum(Pxyz, axis=0))
    
    # Compute the conditional joint entropy of X and Y given Z
    H_XY_given_Z = cond_joint_entropy(Pxyz)
    
    # Compute the conditional mutual information
    return H_X_given_Z + H_Y_given_Z - H_XY_given_Z


In [57]:
# [Locked Cell] Evaluation of your functions by the examiner. 
# You don't have access to the evaluation, this will be done by the examiner.
# Therefore, this cell will return nothing for the students.
import os
if os.path.isfile("private_evaluation.py"):
    from private_evaluation import unit_tests
    unit_tests(entropy, joint_entropy, conditional_entropy, mutual_information, cond_joint_entropy, cond_mutual_information)

### Campaign outcome

You may create cells below to answer the different questions related to campaign outcome. Unlike in the first part (Implementation), you are free to define as many cells as you need below to answer the different questions. Try to be structured and clear in your code (comment it if necessary). Note that you have to answer the questions in the pdf report, including the numbers you get!

### Question 5

Compute and report the entropy of each variable, and compare each value with its
corresponding variable cardinality. What do you notice? Justify theoretically.

In [58]:
"""
This cell only retrieve the data for later and defines a very useful function
that computes a probability of n variables P[X1, ..., Xn].
"""
# retrieve the data
df = pd.read_csv("data.csv")

# define useful function
def get_probability_of(dataframe:pd.DataFrame, *variables:str):
    shape = []
    label_to_index = {}
    for variable in variables:
        uniques = dataframe[variable].unique()
        for i, unique in enumerate(uniques):
            label_to_index[unique] = i
        shape.append(len(uniques))
    probability_matrix = np.zeros(shape=shape)
    labeled_joint_proba = (dataframe[list(variables)].value_counts() / len(df)).to_dict()
    for key, value in labeled_joint_proba.items():
        indexes = [label_to_index[k] for k in key]
        probability_matrix.itemset(tuple(indexes), value)
    return probability_matrix

In [59]:
for column_name in df.columns:
    proba = get_probability_of(df, column_name)
    cardinality = len(df[column_name].unique())
    print(f"Entropy of {column_name} = {entropy(proba):.3f} (card. {cardinality})")

print("\nKnowing that:")
for i in [2, 3, 4, 6]:
    print(f"log_2({i}) = {np.log2(i)}")

Entropy of campaign_outcome = 0.999 (card. 2)
Entropy of target_demographic = 1.585 (card. 3)
Entropy of budget_allocation = 1.495 (card. 3)
Entropy of reach = 1.563 (card. 3)
Entropy of conversion_rate = 1.543 (card. 3)
Entropy of campaign_duration = 1.546 (card. 3)
Entropy of time_of_year = 2.000 (card. 4)
Entropy of content_type = 1.548 (card. 3)
Entropy of platform_used = 2.439 (card. 6)
Entropy of weather = 1.455 (card. 3)
Entropy of previous_campaign_outcome = 1.000 (card. 2)

Knowing that:
log_2(2) = 1.0
log_2(3) = 1.584962500721156
log_2(4) = 2.0
log_2(6) = 2.584962500721156


### Question 7

Compute and report the conditional entropy of outcome given each of the other
variables. Considering the variable descriptions, what do you notice when the
conditioning variable is (a) weather and (b) previous_outcome?

In [60]:
for column_name in df.columns[1:]:
    proba = get_probability_of(df, "campaign_outcome", column_name)
    print(f"Joint entropy of campaign_outcome and {column_name} = {joint_entropy(proba):.3f}")

Joint entropy of campaign_outcome and target_demographic = 2.584
Joint entropy of campaign_outcome and budget_allocation = 2.273
Joint entropy of campaign_outcome and reach = 2.404
Joint entropy of campaign_outcome and conversion_rate = 2.260
Joint entropy of campaign_outcome and campaign_duration = 2.369
Joint entropy of campaign_outcome and time_of_year = 2.999
Joint entropy of campaign_outcome and content_type = 2.547
Joint entropy of campaign_outcome and platform_used = 3.438
Joint entropy of campaign_outcome and weather = 2.454
Joint entropy of campaign_outcome and previous_campaign_outcome = 1.858


### Question 8

Compute the mutual information between the variables target_demographic and
budget. What can you deduce about the relationship between these two variables?
What about the variables duration and reach?

In [61]:
proba = get_probability_of(df, "target_demographic", "budget_allocation")
print(f"Mutual information of target_demographic and budget_allocation = {mutual_information(proba)}")
proba = get_probability_of(df, "campaign_duration", "reach")
print(f"Mutual information of campaign_duration and reach = {mutual_information(proba)}")

Mutual information of target_demographic and budget_allocation = 0.00031025276388252365
Mutual information of campaign_duration and reach = 0.6039885926525139


### Question 9

A student in Computer Science from the University of Liège bets his friends that he
can predict the upcoming election by accessing the dataset. However, his hacking
skills are still weak. Therefore, he can only access a single variable of the dataset to
make its prediction. Using only the mutual information, which variable should he
choose to get? Would using conditional entropy lead to another choice?

In [62]:
for column_name in df.columns[1:]:
    proba = get_probability_of(df, "campaign_outcome", column_name)
    print(f"Mutual information of {column_name} = {mutual_information(proba)}")
    print(f"Conditional entropy of {column_name} = {conditional_entropy(proba)}")
    print()

Mutual information of target_demographic = 0.00046794421336437964
Conditional entropy of target_demographic = 0.9985969873642466

Mutual information of budget_allocation = 0.22078866087442206
Conditional entropy of budget_allocation = 0.7782762707031883

Mutual information of reach = 0.15847633111073822
Conditional entropy of reach = 0.8405886004668729

Mutual information of conversion_rate = 0.28282926976373135
Conditional entropy of conversion_rate = 0.71623566181388

Mutual information of campaign_duration = 0.17570790739621378
Conditional entropy of campaign_duration = 0.8233570241813968

Mutual information of time_of_year = 0.0004451007626644632
Conditional entropy of time_of_year = 0.9986198308149461

Mutual information of content_type = 0.0003241099012418758
Conditional entropy of content_type = 0.9987408216763687

Mutual information of platform_used = 0.0003861009113430569
Conditional entropy of platform_used = 0.9986788306662672

Mutual information of weather = 0.0001013582378

The greater the mutual information the better (the other variable brings more information). The less the conditional entropy the better (the combination is less chaotic).

### Question 10

With the outcome of the previous campaign considered as known, would you change
your answer from the previous question? What can you say about the amount of
information provided by this variable? Compare this value with previous results.

In [63]:
def joint_cond_entropy(Pxyz):
    """
    Computes the conditional entropy of X knowing the joint probability of Y and Z 
    from the joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - The joint conditional entropy H(X|Y,Z) as a number (integer, float or double)
    
    """
    # Compute the sum along the X axis to get P[Y, Z]
    Pyz = np.sum(Pxyz, axis=0)
    for i in range(len(Pyz)):
        Pyz[Pyz == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect

    # Compute the conditional probability
    Px_given_yz = Pxyz / Pyz
    for i in range(len(Px_given_yz)):
        Px_given_yz[Px_given_yz == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    return -np.sum(Pxyz * np.log2(Px_given_yz)) # H(X|Y,Z) = -sum( P[X,Y,Z] * log_2(P[X|Y,Z]) )

for column_name in df.columns[1:-1]:
    proba = get_probability_of(df, "campaign_outcome", column_name, "previous_campaign_outcome")
    print(f"I(campaign_outcome ; {column_name} | previous_campaign_outcome) = {cond_mutual_information(proba):.3f}")
print()
for column_name in df.columns[1:-1]:
    proba = get_probability_of(df, "campaign_outcome", column_name, "previous_campaign_outcome")
    print(f"H(campaign_outcome, {column_name} | previous_campaign_outcome) = {joint_cond_entropy(proba):.3f}")



I(campaign_outcome ; target_demographic | previous_campaign_outcome) = 0.000
I(campaign_outcome ; budget_allocation | previous_campaign_outcome) = 0.080
I(campaign_outcome ; reach | previous_campaign_outcome) = 0.073
I(campaign_outcome ; conversion_rate | previous_campaign_outcome) = 0.146
I(campaign_outcome ; campaign_duration | previous_campaign_outcome) = 0.067
I(campaign_outcome ; time_of_year | previous_campaign_outcome) = 0.001
I(campaign_outcome ; content_type | previous_campaign_outcome) = 0.000
I(campaign_outcome ; platform_used | previous_campaign_outcome) = 0.001
I(campaign_outcome ; weather | previous_campaign_outcome) = 0.001

H(campaign_outcome, target_demographic | previous_campaign_outcome) = 0.858
H(campaign_outcome, budget_allocation | previous_campaign_outcome) = 0.778
H(campaign_outcome, reach | previous_campaign_outcome) = 0.786
H(campaign_outcome, conversion_rate | previous_campaign_outcome) = 0.712
H(campaign_outcome, campaign_duration | previous_campaign_outcome

### Playing with information theory-based strategy
Let us first consider a fictional game in which players have to guess all numbers (N, from 1
to N) of a C x C grid. The same number may appear several times. At each turn, the player
chooses to guess a single number in one of the fields (or cells). The game then lets the
player know if the number is in the correct spot (green), if it is not in the grid (red), or if it is in
the grid but at the wrong spot (orange).

### Question 11
In this simple version of the game, what is the entropy of each of the fields for a grid with C=3 and N=9? Also, what is the entropy of the whole game (the 9 fields combined) ? How are these two quantities linked? Justify.

In [64]:
from math import comb
xiProbs = [1/9] * 9
entropy_value = entropy(np.array(xiProbs))
print (entropy_value)

3.169925001442312


In [65]:
totalCombinations = 9**9
entropy_original = np.log2(totalCombinations)
sum_entropies = entropy_value*9
print(sum_entropies)
print(entropy_original)

28.52932501298081
28.529325012980813


### QUESTION 12. 
In this simple version of the game with C=3 and N=9, let us assume that your first guess is a 9 in the center. What is now the entropy of each field, and the entropy of the game at this stage for each of the three possible outcomes (red, green, orange)? How much information has this guess brought you in each case (in bits)?

In [66]:
# Case red
totalCombinations = 8**9
entropy_red = np.log2(totalCombinations)
infogain = entropy_original - entropy_red
print(entropy_red/9)
print(entropy_red)
print(infogain)

3.0
27.0
1.529325012980813


In [67]:
# Case green
totalCombinations = 9**8
entropy_green = np.log2(totalCombinations)
infogain = entropy_original - entropy_green
print(entropy_green/8)
print(entropy_green)
print(infogain)

3.169925001442312
25.359400011538497
3.1699250014423157


In [68]:
# Case orange
x5Probs = [1/8]*8
entropy_x5 = entropy(np.array(x5Probs))
i = 1
posXi = 0
combinationsRestCells = 0
for i in range (1,9): # size of subset
    combinationsRestCells += comb(7,i)*(8**2)*(9**7) # all combinations of
    posXi += comb(7,i-1)*8*(9**7)
 
probYi = posXi / (posXi + combinationsRestCells) 
probNYi = 1 - probYi

xiProbs = [1/8]*8 # case Xi!=9
# H(Xi|Y) = p(Y=0) * H (Xi| Y = 0) + p(Y=1) * H (Xi| Y = 1)
# Second term is zero since knowing Y gives Xi
entropy_cond_Xi_Y = probNYi * entropy (xiProbs) 
entropy_Y = -(probYi* np.log2(probYi) + probNYi*np.log2(probNYi))
entropy_Xi = entropy_Y + entropy_cond_Xi_Y
entropy_orange = entropy_x5 + 8 * entropy_Xi
infogain = entropy_original - entropy_orange

print(entropy_original)
print (entropy_orange)
print(infogain)

28.529325012980813
25.695029143454242
2.8342958695265708


### QUESTION 13
In this simple version of the game with C=3 and N=9, let us now assume that your second guess is a 2 in the top-left corner. What is now the entropy of each field, and the entropy of the game at this stage for the following outcome? How much information has this second guess brought you (in bits)?

In [69]:
entropy_X5 = 0 # we know it's a 9
prob_X1 = [1/8]*8 # all same prob except 2
entropy_X1= entropy(prob_X1)
print(entropy_X1)

2.625


In [70]:
# same procedure as before
combinationsRestCells = 0
posXi = 0
for i in range (1,8): # size of subset of possible combinations of 2s in 6 slots
    combinationsRestCells += comb(6,i)*(8**2)*(9**6) # all combinations of 2s 
    posXi += comb(6,i-1)*8*(9**6)
    
probYi = posXi / (posXi + combinationsRestCells) # case: Xi = 2
probNYi = 1 - probYi
xiProbs = [1/8]*8 # case: Xi != 2
# H(Xi|Y) = p(Y=0) * H (Xi| Y = 0) + p(Y=1) * H (Xi| Y = 1)
entropy_cond_Xi_Y = probNYi * entropy (xiProbs)  # Knowing Y gives Xi
entropy_Y = -(probYi * np.log2(probYi) + probNYi * np.log2(probNYi))
entropy_Xi = entropy_Y + entropy_cond_Xi_Y
entropy_GO = entropy_X5 + entropy_X1 + 7 * entropy_Xi
infogain = entropy_green - entropy_GO

print (entropy_green)
print (entropy_GO)
print(infogain)

25.359400011538497
22.48512495958259
2.874275051955909


Let us now consider an advanced game in which the numbers in the grid to guess follows the same constraints as the Sudoku game (i.e., the same number can not appear twice on the same row, the same column, and the same sub 3x3 grid).


### QUESTION 14
In this advanced version of the game with C=3 and N=9, let us assume that your first guess is a 9 in the center and the second guess is a 2 in the top-left corner. What is now the entropy of each field, and the entropy of the game at every stage (including the initial stage) for the following outcome? How much information have these guesses brought you in this case (in bits)?

In [71]:
# Initial case
totalcases = 1
for i in range(1,10):
    totalcases *= i 
entropy_original_dep = np.log2(totalcases)
entropy_Xi = entropy_original_dep/9
print (entropy_Xi)# only for visualization
print (entropy_original_dep)

2.0521258910921767
18.46913301982959


In [72]:
# First guess
entropy_X5 = 0 # we know it is a 9
entropy_green_dep = entropy_X5
totalcases = 1
for i in range(1,9):
    totalcases *= i
    
entropy_green_dep += np.log2(totalcases)
fieldEntropy = entropy_green_dep/8 
infogain = entropy_original_dep - entropy_green_dep

print(entropy_green_dep)
print (entropy_original_dep)
print(infogain)

15.29920801838728
18.46913301982959
3.169925001442312


In [73]:
# Second guess
x1Probs = [1/7]*7 
entropy_X1 = entropy(x1Probs)# we known it is neither a 2 nor a 9 
unknownCells = 7
entropy_GO_dep = entropy_X1
for i in range(1,7):
    rest = unknownCells -i 
    posYi = (rest**rest)
    posNYi = rest*((rest+1)**(rest-1))
    probYi = posYi/(posYi+posNYi)
    probNYi = 1- probYi
    
    xiProbs = [1/rest]*rest
    entropy_cond_Xi_Y = probNYi * entropy(xiProbs) 
    entropy_Y = -(probYi* np.log2(probYi) + probNYi*np.log2(probNYi))
    entropy_Xi = entropy_Y + entropy_cond_Xi_Y
    entropy_GO_dep += entropy_Xi

infogain = entropy_green_dep - entropy_GO_dep

print(entropy_GO_dep)
print(entropy_green_dep)
print(infogain)

12.748120982967706
15.29920801838728
2.551087035419574


### QUESTION 15
In this advanced version of the game with C=9 and N=9, what is the entropy of an
empty grid? How does it compare to the grid in the simple version?

In [74]:
from math import log2
totalCombinations = 6670903752021072936960 # looked up on wiki
entropy = log2(6670903752021072936960) # number is too big that numpy cannot process it
print(entropy)

72.49837221884734
