# Project 1 - Information measures

The goal of this first project is to get accustomed to the information and uncertainty measures. We ask you to write a brief report (pdf format) collecting your answers to the different questions. All codes must be written in Python inside this Jupyter Notebook. No other code file will be accepted. Note that you can not change the content of locked cells or import any extra Python library than the ones already imported (numpy and pandas).

## Implementation

In this project, you will need to use information measures to answer several questions. Therefore, in this first part, you are asked to write several functions that implement some of the main measures seen in the first theoretical lectures. Remember that you need to fill in this Jupyter Notebook to answer these questions. Pay particular attention to the required output format of each function.

In [286]:
# [Locked Cell] You can not import any extra Python library in this Notebook.
import numpy as np
import pandas as pd

### Question 1

Write a function entropy that computes the entropy $\mathcal{H(X)}$ of a random variable $\mathcal{X}$ from its probability distribution $P_\mathcal{X} = (p_1, p_2, . . . , p_n)$. Give the mathematical formula that you are using and explain the key parts of your implementation. Intuitively, what is measured by the entropy?

In [287]:
def entropy(Px):
    """
    Computes the entropy from the marginal probability distribution. 
    Arguments:
    ----------
    - Px :  Marginal probability distribution of the random 
            variable X in a numpy array where Px[i]=P(X=i)
    Return:
    -------
    - The entropy of X (H(X)) as a number (integer, float or double).
    """
    Px_no_0 = Px.copy()
    # replace the probability zero with a value for the log not to raise an error
    # this is not a big deal since the value of the log will be absorbed by the multiplication with zero
    Px_no_0[Px_no_0 == 0] = 1 
    return -np.sum(Px * np.log2(Px_no_0)) # We use the Shannon Entropy formula

### Question 2

Write a function joint_entropy that computes the joint entropy $\mathcal{H(X,Y)}$ of two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Compare the entropy and joint_entropy functions (and their corresponding formulas), what do you notice?

In [288]:
def joint_entropy(Pxy):
    """
    Computes the joint entropy from the joint probability distribution.  
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The joint entropy H(X,Y) as a number (integer, float or double).
    """
    # Thanks to numpy 2D-arrays and 1D-arrays are handle the same way using the same functions
    # so the algorithm is strictly the same, only the shape of the input changes.
    # But here is the math we should have done :
    Pxy_no_0 = Pxy.copy()
    for i in range(len(Pxy_no_0)):
        Pxy_no_0[i][Pxy_no_0[i] == 0] = 1 # same as before replace the zeroes by a value
    H_xy = 0
    for i in range(Pxy.shape[0]):
        for j in range(Pxy.shape[1]):
            # Check if the probability is non-zero
            if Pxy[i][j] > 0:
                # Compute the contribution to entropy
                H_xy -= Pxy[i][j] * np.log2(Pxy[i][j])
    return H_xy


### Question 3

Write a function conditional_entropy that computes the conditional entropy $\mathcal{H(X|Y)}$ of a discrete random variable $\mathcal{X}$ given another discrete random variable $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Describe an equivalent way of computing that quantity.

In [289]:
def conditional_entropy(Pxy):
    """
    Computes the conditional entropy from the joint probability distribution.
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The conditional entropy H(X|Y) as a number (integer, float or double)
    """
    Py = np.sum(Pxy, axis=0)
    Py[Py == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    Px_given_y = Pxy / Py # P[x|y] = P[x, y]/P[y]
    Px_given_y[Px_given_y == 0] = 1 # replace the zeroes by non-zeroes, it will have no effect
    return -np.sum(Pxy * np.log2(Px_given_y)) # the formula is self explanatory

### Question 4

Write a function mutual_information that computes the mutual information $\mathcal{I(X;Y)}$ between two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$ . Give the mathematical formula that you are using and explain the key parts of your implementation. What can you deduce from the mutual information $\mathcal{I(X;Y)}$ on the relationship between $\mathcal{X}$ and $\mathcal{Y}$? Discuss.

In [290]:
def mutual_information(Pxy):
    """
    Computes the mutual information I(X;Y) from joint probability distribution
    
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The mutual information I(X;Y) as a number (integer, float or double)
    """
    Py = np.sum(Pxy, axis=0)
    Px = np.sum(Pxy, axis=1)
    return entropy(Px) + entropy(Py) - joint_entropy(Pxy) # apply the formula I(X, Y) = H(X) + H(Y) - H(X, Y)

### Question 5

Let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be three discrete random variables. Write the functions cond_joint_entropy and cond_mutual_information that respectively compute $\mathcal{H(X,Y|Z)}$ and $\mathcal{I(X;Y|Z)}$ of two discrete random variable $\mathcal{X}$, $\mathcal{Y}$ given another discrete random variable $\mathcal{Z}$ from their joint probability distribution $P_\mathcal{X,Y,Z}$. Give the mathematical formulas that you are using and explain the key parts of your implementation.
Suggestion: Observe the mathematical definitions of these quantities and think how you could derive them from the joint entropy and the mutual information.

In [291]:
def cond_joint_entropy(Pxyz):
    """
    Computes the conditional joint entropy of X, Y knowing Z 
    from the joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - The conditional joint entropy H(X,Y|Z) as a number (integer, float or double)
    
    """
    # P[X,Y|Z] = P[X,Y,Z] / P[Z]
    Pxy_given_z = Pxyz / np.sum(Pxyz, axis=(0, 1)) 
    print(Pxy_given_z)
    Pxy_given_z[Pxy_given_z == 0] = 1
    print(Pxy_given_z)
    # H(X,Y|Z) = -sum(Pxyz * log(pxy_given_z))
    return -sum(Pxyz * np.log2(Pxy_given_z))

In [292]:
def cond_mutual_information(Pxyz):
    """
    Computes the conditional mutual information of X, Y knowing Z 
    from joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - I(X;Y|Z): The conditional mutual information as a number (integer, float or double)
    
    """
    Pxy_given_z = Pxyz / np.sum(Pxyz, axis=(0, 1)) # P[X,Y|Z] = P[X,Y,Z] / P[Z]
    print(Pxy_given_z)
    Px_given_z = np.sum(Pxyz, axis=1) # P[X|Z] = P[X,Z] / P[Z]
    print(Px_given_z)
    Py_given_z = np.sum(Pxyz, axis=0)
    print(Py_given_z)
    # I(X;Y|Z) = sum(P[X, Y, Z] * log(P[X,Y|Z]/(P[X|Z] * P[Y|Z])))
    return np.sum(Pxyz * np.log2(Pxy_given_z / (Px_given_z * Py_given_z)))

In [293]:
# [Locked Cell] Evaluation of your functions by the examiner. 
# You don't have access to the evaluation, this will be done by the examiner.
# Therefore, this cell will return nothing for the students.
import os
if os.path.isfile("private_evaluation.py"):
    from private_evaluation import unit_tests
    unit_tests(entropy, joint_entropy, conditional_entropy, mutual_information, cond_joint_entropy, cond_mutual_information)

### Campaign outcome

You may create cells below to answer the different questions related to campaign outcome. Unlike in the first part (Implementation), you are free to define as many cells as you need below to answer the different questions. Try to be structured and clear in your code (comment it if necessary). Note that you have to answer the questions in the pdf report, including the numbers you get!

### Question 5

Compute and report the entropy of each variable, and compare each value with its
corresponding variable cardinality. What do you notice? Justify theoretically.

In [294]:
# retrieve the data
df = pd.read_csv("data.csv")

# define useful function
def get_probability_of(dataframe:pd.DataFrame, *variables:str):
    shape = []
    label_to_index = {}
    for variable in variables:
        uniques = dataframe[variable].unique()
        for i, unique in enumerate(uniques):
            label_to_index[unique] = i
        shape.append(len(uniques))
    probability_matrix = np.zeros(shape=shape)
    labeled_joint_proba = (dataframe[list(variables)].value_counts() / len(df)).to_dict()
    for key, value in labeled_joint_proba.items():
        indexes = [label_to_index[k] for k in key]
        probability_matrix.itemset(tuple(indexes), value)
    # print(labeled_joint_proba)
    # print(probability_matrix)
    return probability_matrix

In [295]:
for column_name in df.columns:
    proba = get_probability_of(df, column_name)
    print(f"Entropy of {column_name} = {entropy(proba)}")

Entropy of campaign_outcome = 0.9990649315776107
Entropy of target_demographic = 1.5849156232031731
Entropy of budget_allocation = 1.4945090722547374
Entropy of reach = 1.5632810900389758
Entropy of conversion_rate = 1.5433257785229548
Entropy of campaign_duration = 1.5459277368460542
Entropy of time_of_year = 1.9999292608210553
Entropy of content_type = 1.5480461237498753
Entropy of platform_used = 2.4388925243564703
Entropy of weather = 1.4545497590677807
Entropy of previous_campaign_outcome = 0.9995123141215677


What do I notice...

### Question 7

Compute and report the conditional entropy of outcome given each of the other
variables. Considering the variable descriptions, what do you notice when the
conditioning variable is (a) weather and (b) previous_outcome?

In [296]:
for column_name in df.columns[1:]:
    proba = get_probability_of(df, "campaign_outcome", column_name)
    print(f"Joint entropy of campaign_outcome and {column_name} = {joint_entropy(proba)}")

Joint entropy of campaign_outcome and target_demographic = 2.5835126105674195
Joint entropy of campaign_outcome and budget_allocation = 2.272785342957926
Joint entropy of campaign_outcome and reach = 2.4038696905058483
Joint entropy of campaign_outcome and conversion_rate = 2.2595614403368343
Joint entropy of campaign_outcome and campaign_duration = 2.3692847610274512
Joint entropy of campaign_outcome and time_of_year = 2.998549091636002
Joint entropy of campaign_outcome and content_type = 2.546786945426244
Joint entropy of campaign_outcome and platform_used = 3.437571355022738
Joint entropy of campaign_outcome and weather = 2.453513332407548
Joint entropy of campaign_outcome and previous_campaign_outcome = 1.8575661545610016


Interpretation...

### Question 8

Compute the mutual information between the variables target_demographic and
budget. What can you deduce about the relationship between these two variables?
What about the variables duration and reach?

In [297]:
proba = get_probability_of(df, "target_demographic", "budget_allocation")
print(f"Mutual information = {mutual_information(proba)}")

Mutual information = 0.0003102527638816355


### Question 9

A student in Computer Science from the University of Liège bets his friends that he
can predict the upcoming election by accessing the dataset. However, his hacking
skills are still weak. Therefore, he can only access a single variable of the dataset to
make its prediction. Using only the mutual information, which variable should he
choose to get? Would using conditional entropy lead to another choice?

In [298]:
for column_name in df.columns[1:]:
    proba = get_probability_of(df, "campaign_outcome", column_name)
    print(f"Mutual information of {column_name} = {mutual_information(proba)}")
    print(f"Conditional entropy of {column_name} = {conditional_entropy(proba)}")
    print()

Mutual information of target_demographic = 0.00046794421336437964
Conditional entropy of target_demographic = 0.9985969873642466

Mutual information of budget_allocation = 0.22078866087442206
Conditional entropy of budget_allocation = 0.7782762707031883

Mutual information of reach = 0.15847633111073822
Conditional entropy of reach = 0.8405886004668729

Mutual information of conversion_rate = 0.28282926976373135
Conditional entropy of conversion_rate = 0.71623566181388

Mutual information of campaign_duration = 0.17570790739621378
Conditional entropy of campaign_duration = 0.8233570241813968

Mutual information of time_of_year = 0.0004451007626640191
Conditional entropy of time_of_year = 0.9986198308149461

Mutual information of content_type = 0.0003241099012418758
Conditional entropy of content_type = 0.9987408216763687

Mutual information of platform_used = 0.0003861009113430569
Conditional entropy of platform_used = 0.9986788306662672

Mutual information of weather = 0.0001013582378

Discussion

### Question 10

With the outcome of the previous campaign considered as known, would you change
your answer from the previous question? What can you say about the amount of
information provided by this variable? Compare this value with previous results.

In [299]:
for column_name in df.columns[1:]:
    proba = get_probability_of(df, "campaign_outcome", column_name, "budget_allocation")
    print(f"Conditional mutual information of {column_name} = {cond_mutual_information(proba)}")
    print(f"Conditional entropy of {column_name} = {cond_joint_entropy(proba)}")

[[[0.14807959 0.27709611 0.0396376 ]
  [0.13465988 0.27300613 0.02944507]
  [0.13558538 0.26533742 0.03397508]]

 [[0.19204072 0.06186094 0.2785957 ]
  [0.18787598 0.0603272  0.30917327]
  [0.20175845 0.06237219 0.30917327]]]
[[0.1808 0.319  0.0182]
 [0.2514 0.0722 0.1584]]
[[0.147  0.1326 0.0562]
 [0.1394 0.1304 0.0598]
 [0.1458 0.1282 0.0606]]


ValueError: operands could not be broadcast together with shapes (2,3) (3,3) 