# Module 4 - What is Information? II

Author: Julio Correa, 2020; based on the original Matlab tutorials.<br/>
Adaptations by: J. Lizier, 2023-

The following block imports the libraries we need to analyse data.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import math

# Preparing your environment

As in the previous modules, we want to reuse functions that we defined in earlier notebooks.

You have several options for handling this, as before. I suggest you update the `simpleinfotheory.py` script to **add the new functions you wrote in the last module**, and import the required functions from it.

In [2]:
# Option 1: your notebook from Module 1 is complete:
# from ipynb.fs.full.Module_1_notebook import entropy
# Option 2: you use the Module 1 notebook solutions: (if so, ignore the out
# from ipynb.fs.full.Module_1_notebook_solutions import entropy
# Option 3: edit simpleinfotheory.py and paste your functions into it as you write them
from simpleinfotheory import entropy, entropyempirical, jointentropy, jointentropyempirical, conditionalentropy, conditionalentropyempirical, mutualinformation, mutualinformationempirical

# 4. Coding conditional mutual information

In this exercise we continue to alter the Python code to measure the conditional mutual information between variables $x$ and $y$, conditional on variable $z$, for a distribution $p(x,y,z)$:

$I\left(X;Y\mid Z\right)=H\left(X\mid Z\right)+H\left(Y\mid Z\right)-H\left(X,Y\mid Z\right)$
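Using $H(A\mid Z)=H(A,Z)-H(Z)$ for each of the three conditional entropies, the definition can be rewritten purely in terms of joint entropies. This form can be handy for cross-checking an implementation:

$$
\begin{aligned}
I\left(X;Y\mid Z\right) &= \left[H(X,Z)-H(Z)\right] + \left[H(Y,Z)-H(Z)\right] - \left[H(X,Y,Z)-H(Z)\right] \\
&= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
\end{aligned}
$$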

For the conditional mutual information, we will focus only on its _empirical_ calculation (for the most part). We will code conditional mutual information $I(X;Y|Z)$ for empirical samples `xn` and `yn` and `zn` in the cell below.

1. Find the lines where you need to add code, and do so. _Hint_: you can call your existing `conditionalentropyempirical` function to compute $H(X,Y|Z)$, $H(X|Z)$ and $H(Y|Z)$ by passing in `np.append(xn, yn, axis=1),zn`, `xn,zn` and `yn,zn` as the arguments, respectively.

In [3]:
"""function conditionalmutualinformationempirical(xn,yn,zn)
Computes the conditional mutual information over all samples xn of a
random variable X with samples yn of a random variable Y, conditioned on
samples zn of a random variable Z.

Inputs:
- xn - matrix of samples of outcomes x. May be a 1D vector of samples, or
    a 2D matrix, where each row is a vector sample for a multivariate X.
- yn - matrix of samples of outcomes y. May be a 1D vector of samples, or
    a 2D matrix, where each row is a vector sample for a multivariate Y.
    Must have the same number of rows as X.
- zn - matrix of samples of outcomes z. May be a 1D vector of samples, or
    a 2D matrix, where each row is a vector sample for a multivariate Z
    which will be conditioned on.
    Must have the same number of rows as X.

Outputs:
- result - conditional mutual information of X with Y, given Z

Copyright (C) 2020-, Julio Correa, Joseph T. Lizier
Distributed under GNU General Public License v3
"""
def conditionalmutualinformationempirical(xn, yn, zn):
    
    # First, error checking, and converting the arguments into standard form:
    xn = np.array(xn)
    # Convert to column vectors if not already:
    if xn.ndim == 1:
        xn = np.reshape(xn,(len(xn),1))
    yn = np.array(yn)
    if yn.ndim == 1:
        yn = np.reshape(yn,(len(yn),1))
    zn = np.array(zn)
    if zn.ndim == 1:
        zn = np.reshape(zn,(len(zn),1))
    [rx,cx] = xn.shape
    [ry,cy] = yn.shape
    [rz,cz] = zn.shape

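    # Should we check any potential error conditions on the input?
    # Check that they all have the same number of rows (samples):
    assert rx == ry, "xn and yn must have the same number of rows"
    assert rx == rz, "xn and zn must have the same number of rows"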

    # We need to compute H(X|Z) + H(Y|Z) - H(X,Y|Z):
    # 1. conditional joint entropy H(X,Y|Z), stacking X and Y column-wise as one joint variable:
    H_XY_given_Z = conditionalentropyempirical(np.append(xn, yn, axis=1), zn)
    # 2. conditional entropy H(Y|Z):
    H_Y_given_Z = conditionalentropyempirical(yn, zn)
    # 3. conditional entropy H(X|Z):
    H_X_given_Z = conditionalentropyempirical(xn, zn)

    # Alternatively, note that we could compute I(X;Y,Z) - I(X;Z)

    result = H_X_given_Z + H_Y_given_Z - H_XY_given_Z
    return result
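As an independent cross-check that does not rely on the `simpleinfotheory` functions, here is a minimal sketch that computes the same plug-in estimate through the joint-entropy form $I(X;Y|Z)=H(X,Z)+H(Y,Z)-H(X,Y,Z)-H(Z)$. The helper names `joint_entropy` and `cmi_from_joint_entropies` are hypothetical, written only for this illustration, and assume discrete samples stored one per row.

```python
import numpy as np

def joint_entropy(*variables):
    """Plug-in joint entropy in bits (hypothetical helper). Each argument holds
    N samples with shape (N,) or (N, d); row i of every argument is one joint sample."""
    # Stack all variables side by side so each row is one joint outcome
    data = np.column_stack([np.asarray(v).reshape(len(v), -1) for v in variables])
    # Count how often each distinct joint outcome (row) occurs
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def cmi_from_joint_entropies(xn, yn, zn):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), all plug-in estimates."""
    return (joint_entropy(xn, zn) + joint_entropy(yn, zn)
            - joint_entropy(xn, yn, zn) - joint_entropy(zn))

# The three boundary cases tested in the next step
print(cmi_from_joint_entropies([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 1]))  # expect 0 bits
print(cmi_from_joint_entropies([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 0]))  # expect 1 bit
print(cmi_from_joint_entropies([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]))  # expect 1 bit
```

Both this sketch and `conditionalmutualinformationempirical` are plug-in estimates of the same quantity, so on the same samples they should agree up to floating-point rounding.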

2. Test that your code works by running, e.g.:
    1. `conditionalmutualinformationempirical([0,0,1,1],[0,1,0,1],[0,1,0,1])` and validating that you get the result 0 bits.
    1. `conditionalmutualinformationempirical([0,0,1,1],[0,0,1,1],[0,1,1,0])` and validating that you get the result 1 bit.
    1. `conditionalmutualinformationempirical([0,0,1,1],[0,1,0,1],[0,1,1,0])` and validating that you get the result 1 bit.
    1. Can you explain the expected results for these boundary cases?
    1. _Challenge_: Let's make a larger empirical test of case c above. First, generate a large sample of binary values for variable $X$, `X = np.random.randint(0, 2, (1000,1))`, and likewise for $Z$, `Z = np.random.randint(0, 2, (1000,1))`; then construct the samples of $Y$ as the exclusive OR (XOR) of these two, `Y = np.logical_xor(X, Z)`. Use `mutualinformationempirical` to validate that there is (almost) no mutual information between $Y$ and either $X$ or $Z$, yet use `conditionalmutualinformationempirical` to validate that there is (almost) one bit of conditional mutual information between $X$ and $Y$ given $Z$ (and likewise with the roles of $X$ and $Z$ swapped). Explain what it means that conditioning on $Z$ increases the apparent mutual information between $X$ and $Y$ (see part 5 of the lecture, below). (Also: why are the bit values not quite 0 and 1 in this example?)

In [4]:
# Test the code here
print( conditionalmutualinformationempirical([0,0,1,1],[0,1,0,1],[0,1,0,1]) )
print( conditionalmutualinformationempirical([0,0,1,1],[0,0,1,1],[0,1,1,0]) )
print( conditionalmutualinformationempirical([0,0,1,1],[0,1,0,1],[0,1,1,0]) )

X = np.random.randint(0, 2, (1000,1))
Z = np.random.randint(0, 2, (1000,1))
Y = np.logical_xor(X,Z)
print( "I(X;Y) = %.4f bits" % mutualinformationempirical(X,Y)[0] ) # My solution code returns multiple values; take just the result part
print( "I(Z;Y) = %.4f bits" % mutualinformationempirical(Z,Y)[0] ) # My solution code returns multiple values; take just the result part
print( "I(X;Y|Z) = %.4f bits" % conditionalmutualinformationempirical(X,Y,Z) )
print( "I(Z;Y|X) = %.4f bits" % conditionalmutualinformationempirical(Z,Y,X) )

0.0
1.0
1.0
I(X;Y) = 0.0004 bits
I(Z;Y) = 0.0004 bits
I(X;Y|Z) = 0.9995 bits
I(Z;Y|X) = 0.9995 bits
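Why are the values above not exactly 0 and 1? They are plug-in (maximum-likelihood) estimates from a finite sample, and such estimates are biased. For two independent discrete variables, $2N\ln 2$ times the plug-in $I(X;Y)$ in bits equals the G-test statistic, which is approximately $\chi^2$-distributed with $(|X|-1)(|Y|-1)$ degrees of freedom. The expected estimate is therefore roughly $(|X|-1)(|Y|-1)/(2N\ln 2)$ bits, about 0.0007 bits for binary variables at $N=1000$: the same order as the 0.0004 bits printed above. The sketch below checks this by simulation. `plugin_mi` is a hypothetical helper written for this illustration, and the seed, sample sizes and 200 repetitions are arbitrary choices.

```python
import numpy as np

def plugin_mi(xn, yn):
    """Plug-in estimate of I(X;Y) in bits for 1D discrete samples (hypothetical helper)."""
    def entropy_bits(data):
        data = np.asarray(data).reshape(len(data), -1)   # one joint outcome per row
        _, counts = np.unique(data, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    xn, yn = np.asarray(xn), np.asarray(yn)
    return entropy_bits(xn) + entropy_bits(yn) - entropy_bits(np.column_stack((xn, yn)))

rng = np.random.default_rng(seed=0)
mean_estimate = {}
for N in (100, 1000, 10000):
    # X and Y are independent fair coins, so the true I(X;Y) is exactly 0 bits
    estimates = [plugin_mi(rng.integers(0, 2, N), rng.integers(0, 2, N))
                 for _ in range(200)]
    mean_estimate[N] = np.mean(estimates)
    predicted = 1 / (2 * N * np.log(2))   # (|X|-1)(|Y|-1) / (2 N ln 2) with |X| = |Y| = 2
    print(f"N = {N:5d}: mean plug-in I(X;Y) = {mean_estimate[N]:.5f} bits, "
          f"approx. bias = {predicted:.5f} bits")
```

The mean estimate shrinks roughly in proportion to $1/N$, which is why larger samples push the printed values closer to the exact 0 and 1 bits.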


3. _Challenge_: Can you alter the code in `conditionalmutualinformationempirical` to compute conditional mutual information $I(X;Y|Z)$ using the expression $I(X;Y|Z) = I(X;Y,Z) - I(X;Z)$?

4. _Challenge_: We did not code a `conditionalmutualinformation` function in this exercise; however, an implementation is provided in the solutions (see below). Can you read the code and understand how it calculates the conditional mutual information for a given probability table `p`? Note that the argument `p` is a 3D matrix representing the probability $p(x,y,z)$.
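For challenges 3 and 4, here is a minimal sketch, assuming discrete samples stored one per row and a probability table indexed as `p[x, y, z]` that sums to 1. The names `joint_entropy`, `mutual_info`, `cmi_via_chain_rule` and `cmi_from_table` are hypothetical helpers written for this illustration; they are neither the `simpleinfotheory` functions nor the implementation in the solutions. The sample-based version uses $I(X;Y|Z) = I(X;Y,Z) - I(X;Z)$, while the table version evaluates $I(X;Y\mid Z)=\sum_{x,y,z} p(x,y,z)\log_2\frac{p(x,y,z)\,p(z)}{p(x,z)\,p(y,z)}$, skipping zero-probability cells.

```python
import numpy as np

# --- Challenge 3: I(X;Y|Z) = I(X;Y,Z) - I(X;Z), from samples ---

def joint_entropy(*variables):
    """Plug-in joint entropy in bits; each argument holds N samples, shape (N,) or (N, d)."""
    data = np.column_stack([np.asarray(v).reshape(len(v), -1) for v in variables])
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_info(xn, *others):
    """Plug-in I(X; others), treating all of `others` together as one joint variable."""
    return joint_entropy(xn) + joint_entropy(*others) - joint_entropy(xn, *others)

def cmi_via_chain_rule(xn, yn, zn):
    """I(X;Y|Z) = I(X;Y,Z) - I(X;Z)."""
    return mutual_info(xn, yn, zn) - mutual_info(xn, zn)

# --- Challenge 4: I(X;Y|Z) from a probability table p[x, y, z] ---

def cmi_from_table(p):
    """Sum of p(x,y,z) log2[p(x,y,z) p(z) / (p(x,z) p(y,z))] over nonzero cells."""
    p = np.asarray(p, dtype=float)
    p_z = p.sum(axis=(0, 1))   # p(z), shape (|Z|,)
    p_xz = p.sum(axis=1)       # p(x, z), shape (|X|, |Z|)
    p_yz = p.sum(axis=0)       # p(y, z), shape (|Y|, |Z|)
    result = 0.0
    for x, y, z in zip(*np.nonzero(p)):   # 0 log 0 is taken as 0, so zero cells are skipped
        result += p[x, y, z] * np.log2(p[x, y, z] * p_z[z] / (p_xz[x, z] * p_yz[y, z]))
    return result

# Exact XOR distribution: X and Z fair and independent, Y = X xor Z
p_xor = np.zeros((2, 2, 2))
for x in (0, 1):
    for z in (0, 1):
        p_xor[x, x ^ z, z] = 0.25

print(cmi_from_table(p_xor))                                          # exactly 1 bit
print(cmi_via_chain_rule([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]))  # case c: 1 bit
```

Building an empirical table from samples (counting each $(x,y,z)$ triple and dividing by $N$) and passing it to `cmi_from_table` should reproduce `cmi_via_chain_rule` on the same samples, since both are plug-in estimates of the same quantity.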