# Continuous-variable MI in JIDT -- Kernel estimator

In this activity, we continue to analyse the 2coupledRandomCols-1.txt data set (in which variable 1 is a noisy lagged copy of Gaussian variable 0). We measured the MI of this data set earlier using a linear-Gaussian estimator during Part 2 of the lecture.
<br>
1. Select Kernel estimator. Ensure that the 2coupledRandomCols-1.txt data set is still selected.
<br>

2. Set the TIME_DIFF parameter to 1 again. (Note that parameters are always reset to their defaults when you change the estimator.) Leave the KERNEL_WIDTH parameter at its default (0.25).
<br>

3. Press Compute.<br>
    a. Note the answer (it's in bits).<br>
    b. How close is it to the answer we measured (in nats) using the linear-Gaussian estimator?<br>
    c. Why might it be different? Hint: Think about the assumptions of each estimator and their properties.


4. Change the kernel width from 0.1 up to 1.0 in increments of 0.05, pressing Compute and noting the answer each time.<br>
    a. You should alter the code template to do this in a for loop, then save and plot the results (e.g. in Matlab). (Recall that you need to call <b>initialise()</b> after <b>setProperty()</b> when changing properties such as the kernel width.)<br>
    
    b. How does the answer vary as a function of the kernel width? How is the question the estimator is asking varying as we change the kernel width? Is there an obvious kernel width that you should use?
    
Sample solutions (try it first!): code (see solutions), and sample plot below:

![image.png](attachment:image.png)
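
To see what the kernel width is doing in step 4, here is a minimal pure-Python sketch of a box-kernel MI estimate scanned over the same widths. It is not JIDT's implementation and not the course's sample solution: the synthetic data is only a stand-in for 2coupledRandomCols-1.txt, and the function names (`normalise`, `box_kernel_mi_bits`) and the coupling strength (0.8) are assumptions made for illustration.

```python
import math
import random


def normalise(values):
    """Shift to zero mean and scale to unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def box_kernel_mi_bits(xs, ys, width):
    """Box-kernel MI estimate in bits; width is in units of standard deviations."""
    xs, ys = normalise(xs), normalise(ys)
    n = len(xs)
    total = 0.0
    for i in range(n):
        count_x = count_y = count_xy = 0
        for j in range(n):  # j == i is included, so every count is at least 1
            in_x = abs(xs[j] - xs[i]) < width
            in_y = abs(ys[j] - ys[i]) < width
            count_x += in_x
            count_y += in_y
            count_xy += in_x and in_y
        # log2 of p(x, y) / (p(x) p(y)), with each p estimated as count / n
        total += math.log2(count_xy * n / (count_x * count_y))
    return total / n


# Hypothetical stand-in data: dest is a noisy copy of source, lagged by one step.
random.seed(0)
n_samples = 300
source = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
dest = [random.gauss(0.0, 1.0)] + [
    0.8 * source[t - 1] + 0.6 * random.gauss(0.0, 1.0) for t in range(1, n_samples)
]

time_diff = 1  # pair source[t] with dest[t + time_diff]
xs, ys = source[:-time_diff], dest[time_diff:]

widths = [round(0.1 + 0.05 * i, 2) for i in range(19)]  # 0.1, 0.15, ..., 1.0
mi_by_width = {w: box_kernel_mi_bits(xs, ys, w) for w in widths}
for w, mi in mi_by_width.items():
    print(f"kernel width {w:.2f}: MI = {mi:.3f} bits")
```

Each width defines a different box around every sample, so each width is effectively asking about dependence at a different resolution. Compare the trend printed here with the one you obtain from JIDT on the real data.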

## Continuous-variable MI in JIDT -- KSG estimator

In this activity, we continue to analyse the 2coupledRandomCols-1.txt data set, this time with the KSG estimator.

1. Select Kraskov (KSG) alg. 2 estimator. Ensure that the 2coupledRandomCols-1.txt data set is still selected.
<br>

2. Set the TIME_DIFF parameter to 1 again. (Note that parameters are always reset to their defaults when you change the estimator.) Leave the k (nearest neighbours) parameter at its default (4).
<br>

3. Press Compute.

    a. Note the answer (it's in nats). Press Compute again -- note that the answer changes slightly due to some stochastic noise we add to the data (to keep the KSG algorithm stable).<br>
    b. How close is it to the answer we measured (in nats) using the linear-Gaussian estimator?<br>
    c. Why might this estimate be closer than that with the Kernel estimator?<br>
    d. Given that we know the underlying data are linearly coupled Gaussians, which estimator do you think is best to use?


4. Change the number of nearest neighbours parameter k from 4 up to 15, pressing Compute and noting the answer each time.

    a. You should alter the code template to do this in a for loop, then save and plot the results (e.g. in Matlab). (Recall that you need to call initialise() after setProperty() when changing properties such as k.)<br>
    b. How does the answer vary as a function of k? Does it appear more stable to parameters than the Kernel estimator? How is the question the estimator is asking varying as we change k? Our lecture notes suggested leaving k at 4 was a good default option: review the results here and reflect on that.
    
Sample solutions (try it first!): code (see solutions), and sample plot below:

![image.png](attachment:image.png)
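
The scan over k in step 4 can be sketched without JIDT as follows. This is a brute-force pure-Python version of KSG algorithm 2, written from the published formula I = ψ(k) − 1/k − ⟨ψ(n_x) + ψ(n_y)⟩ + ψ(N); it is not JIDT's code or the course's sample solution. The synthetic data, the function names (`digamma_table`, `ksg2_mi_nats`) and the coupling strength (0.8) are assumptions made for illustration. Unlike JIDT, this sketch adds no noise to the data, so it returns the same value on every run.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_table(max_n):
    """psi(n) for n = 1..max_n, using psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    table = [None, -EULER_GAMMA]  # index 0 is unused
    for n in range(2, max_n + 1):
        table.append(table[-1] + 1.0 / (n - 1))
    return table


def ksg2_mi_nats(xs, ys, k=4):
    """KSG (algorithm 2) MI estimate in nats, by brute-force neighbour search."""
    n = len(xs)
    psi = digamma_table(n)
    total = 0.0
    for i in range(n):
        # (max-norm distance, x distance, y distance) from point i to every other point
        dists = []
        for j in range(n):
            if j != i:
                dx, dy = abs(xs[j] - xs[i]), abs(ys[j] - ys[i])
                dists.append((max(dx, dy), dx, dy))
        nearest = sorted(dists)[:k]  # the k nearest neighbours in the joint space
        eps_x = max(d[1] for d in nearest)  # x extent of those k neighbours
        eps_y = max(d[2] for d in nearest)  # y extent of those k neighbours
        n_x = sum(1 for d in dists if d[1] <= eps_x)
        n_y = sum(1 for d in dists if d[2] <= eps_y)
        total += psi[n_x] + psi[n_y]
    return psi[k] - 1.0 / k - total / n + psi[n]


# Hypothetical stand-in data: dest is a noisy copy of source, lagged by one step.
random.seed(0)
n_samples = 300
source = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
dest = [random.gauss(0.0, 1.0)] + [
    0.8 * source[t - 1] + 0.6 * random.gauss(0.0, 1.0) for t in range(1, n_samples)
]

time_diff = 1  # pair source[t] with dest[t + time_diff]
xs, ys = source[:-time_diff], dest[time_diff:]

mi_by_k = {k: ksg2_mi_nats(xs, ys, k) for k in range(4, 16)}
for k, mi in mi_by_k.items():
    print(f"k = {k:2d}: MI = {mi:.3f} nats")
```

For this stand-in data the linear-Gaussian value is −0.5 ln(1 − 0.8²) ≈ 0.51 nats, which gives a reference point when judging how much the estimates move with k.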

## Heart-breath interaction MI analysis
In this activity, we analyse the <b>SFI-heartRate_breathVol_bloodOx-extract.txt</b> data set (in the folder demos/data/). This includes (in each column) the heart rate, breath rate and blood oxygen concentration data for a sleeping patient who has sleep apnoea. Full data credits are available in the header comments in the data file. The first column is heart rate, the second is chest volume, and the third is blood oxygen concentration.

You can plot e.g. the heart rate in Python via:

1. Change directory to demos/data.
2. Load the data: data = readFloatsFile.readFloatsFile("../SFI-heartRate_breathVol_bloodOx-extract.txt")
3. Plot the heart-rate data from the first column: plt.plot(data[:,0])

![image.png](attachment:image.png)

Selecting the first column here selects the heart rate data only. You can see the apnoea incidents when the heart rate dramatically rises and falls.
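
If the readFloatsFile helper is not to hand, the loading and column-selection steps above can be approximated in plain Python. This is only a sketch: `parse_floats` and `column` are hypothetical helpers, the assumption that header comment lines start with `%` or `#` should be checked against the actual data file, and the sample values below are made up rather than taken from the data set.

```python
def parse_floats(text, comment_prefixes=("%", "#")):
    """Parse whitespace-delimited numbers into rows, skipping comment and blank lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefixes):
            continue
        rows.append([float(token) for token in stripped.split()])
    return rows


def column(rows, index):
    """Return one column (0-based index) from a list of rows."""
    return [row[index] for row in rows]


# Hypothetical three-column extract; the numbers are made up for illustration.
sample_text = """% heart rate, chest volume, blood oxygen (made-up values)
70.5 1200 8000
71.0 1350 7990

69.8 1100 8010
"""

rows = parse_floats(sample_text)
heart_rate = column(rows, 0)  # column 0 holds the heart rate
print(heart_rate)
```

To use this on the real file, read the file contents into a string and pass it to `parse_floats`; the resulting `heart_rate` list can then be handed to `plt.plot`.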

Here we wish to investigate whether there is a relationship between heart and breath rate in the data set.

**A. Linear-Gaussian estimator**

1. Using the MI AutoAnalyser, select Gaussian estimator.
2. Select the SFI-heartRate_breathVol_bloodOx-extract.txt data set.
3. Set the source column to 0 (heart rate) and destination column to 1 (breath rate).
4. Press Compute.<br>
 a. Note the answer (it's in nats).<br>
 b. Does this appear to indicate a relationship between heart rate and breath rate?<br>
 c. What happens if you scan the TIME_DIFF parameter to look for a lagged relationship? (Optional Challenge: you could alter the code template to do this in a for loop, then save and plot the results, e.g. in Matlab.)<br>
 d. What can you conclude from the results? Is there no relationship at all between the variables?<br>
 
Sample solutions (try it first!): code (see solutions), and sample plot below:

![image.png](attachment:image.png)
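
The TIME_DIFF scan in step 4c can be sketched in plain Python. For a bivariate Gaussian with correlation ρ, the MI is −0.5 ln(1 − ρ²) nats, so the sketch below simply computes the lagged Pearson correlation at each lag. It is an illustration rather than the course's sample solution: the synthetic data (a source coupled to the destination at a lag of 1) stands in for the heart and breath columns, the function names are hypothetical, and the convention of pairing source[t] with dest[t + time_diff] is an assumption about how TIME_DIFF is applied.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi_nats(source, dest, time_diff=0):
    """MI (nats) between source[t] and dest[t + time_diff] under a Gaussian model."""
    if time_diff > 0:
        source, dest = source[:-time_diff], dest[time_diff:]
    rho = pearson(source, dest)
    return -0.5 * math.log(1.0 - rho * rho)


# Hypothetical stand-in data with a linear coupling at a lag of one step.
random.seed(0)
n_samples = 2000
source = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
dest = [random.gauss(0.0, 1.0)] + [
    0.8 * source[t - 1] + 0.6 * random.gauss(0.0, 1.0) for t in range(1, n_samples)
]

mi_by_lag = {lag: gaussian_mi_nats(source, dest, lag) for lag in range(0, 6)}
best_lag = max(mi_by_lag, key=mi_by_lag.get)

for lag, mi in mi_by_lag.items():
    # Dividing by ln(2) converts nats to bits.
    print(f"TIME_DIFF {lag}: MI = {mi:.4f} nats = {mi / math.log(2):.4f} bits")
print("lag with the largest MI:", best_lag)
```

Because this estimator depends only on the correlation coefficient, it can report a value near zero even when the variables are strongly related in a nonlinear way, which is worth keeping in mind for question d.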

**B. KSG estimator**

1. Select Kraskov (KSG) alg. 2 estimator. Ensure that the <b>SFI-heartRate_breathVol_bloodOx-extract.txt</b> data set is still selected.
2. Set the TIME_DIFF parameter to 0 again. (Note that parameters are always reset to their defaults when you change the estimator.) Leave the k (nearest neighbours) parameter at its default (4).
3. Press Compute.<br>
 a. Notice whether it is faster or slower than the linear-Gaussian calculator was. You may not notice a difference on this short data set; if not, try both estimators on the full SFI-heartRate_breathVol_bloodOx.txt data set.<br>
 b. Note the answer (it's in nats).<br>
 c. How different is it to the answer we measured (in nats) using the linear-Gaussian estimator?<br>
 d. Why might this estimate be different to that obtained with the linear-Gaussian estimator?<br>
 e. Does this change your conclusions on whether there is a relationship between the variables?<br>
 f. Reflect on the differences between the estimator types, their properties and what they can measure. You could plot the data to check for any relationship yourself (e.g. in Matlab: plot(data(:,1), data(:,2), 'x');).<br>
4. What happens if you scan the TIME_DIFF parameter to look for a lagged relationship?<br>
 a. Optional Challenge: you could alter the code template to do this in a <b>for</b> loop, then save and plot the results (e.g. in Python).<br>
 b. What does it mean that MI(heart; breath) is large over not just one TIME_DIFF but several? Are these earlier values of heart rate providing the same or different information about the destination breath variable? How could you investigate this further? (Think about how a conditional mutual information could be used to investigate this.)<br>

Sample solutions (try it first!): code (see solutions), and sample plot below:

![image.png](attachment:image.png)
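
As a hint towards question 4b, the sketch below shows how a conditional MI can separate redundant from new information across lags. It uses a linear-Gaussian conditional MI computed from the partial correlation, −0.5 ln(1 − ρ²ₓᵧ|z) nats, purely because that is easy to write in plain Python; for the heart and breath data a nonlinear conditional MI estimator would be the more natural choice. The synthetic data (an autocorrelated source driving the destination at a lag of 1 only) and all function names are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi_nats(xs, ys):
    """MI (nats) between xs and ys under a bivariate Gaussian model."""
    rho = pearson(xs, ys)
    return -0.5 * math.log(1.0 - rho * rho)


def gaussian_cmi_nats(xs, ys, zs):
    """Conditional MI I(X;Y|Z) in nats under a Gaussian model, via partial correlation."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


# Hypothetical data: an autocorrelated source that drives dest at a lag of 1 only.
random.seed(1)
n_samples = 3000
source = [random.gauss(0.0, 1.0)]
for t in range(1, n_samples):
    source.append(0.9 * source[-1] + random.gauss(0.0, 1.0))
dest = [0.0] + [source[t - 1] + random.gauss(0.0, 1.0) for t in range(1, n_samples)]

# Align dest[t] with source[t-1] and source[t-2] for t >= 2.
dest_now = dest[2:]
source_lag1 = source[1:-1]
source_lag2 = source[:-2]

mi_lag1 = gaussian_mi_nats(source_lag1, dest_now)
mi_lag2 = gaussian_mi_nats(source_lag2, dest_now)
cmi_lag2_given_lag1 = gaussian_cmi_nats(source_lag2, dest_now, source_lag1)

print(f"MI(source[t-1]; dest[t])                = {mi_lag1:.3f} nats")
print(f"MI(source[t-2]; dest[t])                = {mi_lag2:.3f} nats")
print(f"MI(source[t-2]; dest[t] | source[t-1])  = {cmi_lag2_given_lag1:.4f} nats")
```

In this constructed example the MI at a lag of 2 is large only because the source is autocorrelated; conditioning on the lag-1 value removes almost all of it. Whether the same holds for the heart and breath data is exactly what the conditional MI would let you test.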