# Demo on unbalanced designs

In this demo we will have a look at calculating RDMs from data which either contains different numbers of measurements per stimulus or where the measurements for different stimuli do not contain the same measurement channels. Such data is common in neural recordings, where individual neurons may become unmeasurable during an experiment, so that we do not have measurements of all neurons for all stimuli. Common reasons for different numbers of repetitions per stimulus are trials that have to be removed due to artifacts, aborted experiments, or an experimenter who did not have full control over which stimuli were shown.

In [2]:
import numpy as np
import pyrsa

First we need some data to work on. This data is deliberately very small, so that we can follow every step of the calculation. As balanced data we have two repetitions of three different stimuli, each measured on 5 measurement channels.

In [9]:
data_balanced_array = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, 2.9, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, 1.0],
    [1.7, 0.5, 0.4, 1.4, 0.3],
    ])
runs = [0, 1, 0, 1, 0, 1]
stimulus = [0, 0, 1, 1, 2, 2]
descriptors = {'runs': runs, 'stimulus': stimulus}
data_balanced = pyrsa.data.Dataset(data_balanced_array, obs_descriptors=descriptors)

For the unbalanced data we can use similar data, but let's assume that we now measured stimulus 0 only once and stimulus 1 three times. In addition, in two measurements we had to discard the value of a single channel due to technical problems; these values are marked as np.nan:

In [10]:
data_unbalanced_array = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, np.nan, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, np.nan],
    [1.7, 0.5, 0.4, 1.4, 0.3],
    ])
runs = [0, 0, 1, 2, 0, 1]
stimulus = [0, 1, 1, 1, 2, 2]
descriptors = {'runs': runs, 'stimulus': stimulus}
data_unbalanced = pyrsa.data.Dataset(data_unbalanced_array, obs_descriptors=descriptors)
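
Before computing anything, it can help to confirm how unbalanced the design is. The following is a minimal sketch in plain NumPy (it does not use pyrsa); the variable names `n_repetitions` and `n_valid_channels` are chosen here for illustration only.

```python
import numpy as np

# Same values as data_unbalanced_array above; NaN marks a discarded channel.
data = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, np.nan, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, np.nan],
    [1.7, 0.5, 0.4, 1.4, 0.3],
])
stimulus = np.array([0, 1, 1, 1, 2, 2])

# Number of measurements (rows) recorded for each stimulus.
conditions, n_repetitions = np.unique(stimulus, return_counts=True)

# Number of valid (non-NaN) channels in each measurement.
n_valid_channels = np.sum(~np.isnan(data), axis=1)

print(dict(zip(conditions.tolist(), n_repetitions.tolist())))  # {0: 1, 1: 3, 2: 2}
print(n_valid_channels.tolist())  # [5, 4, 5, 5, 4, 5]
```

Both sources of imbalance are visible here: the stimuli have 1, 3 and 2 repetitions, and two of the six measurements have only 4 valid channels.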

For the balanced data we can use the normal function to get an RDM:

In [19]:
balanced_rdm = pyrsa.rdm.calc_rdm(data_balanced, descriptor='stimulus')
print(balanced_rdm)

pyrsa.rdm.RDMs
1 RDM(s) over 3 conditions

dissimilarity_measure = 
euclidean

dissimilarities[0] = 
[[0.     1.088  0.4875]
 [1.088  0.     0.4035]
 [0.4875 0.4035 0.    ]]

descriptors: 

rdm_descriptors: 
index = [0]

pattern_descriptors: 
index = [0 1 2]
stimulus = [0, 1, 2]
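
The values printed above can be reproduced by hand. The sketch below assumes that the default measure is the squared Euclidean distance between the stimulus means, divided by the number of channels. This is inferred from the printed numbers rather than taken from the pyrsa source, and it uses plain NumPy only.

```python
import numpy as np

# Same values as data_balanced_array above.
data = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, 2.9, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, 1.0],
    [1.7, 0.5, 0.4, 1.4, 0.3],
])
stimulus = np.array([0, 0, 1, 1, 2, 2])

# Step 1: average the repetitions of each stimulus.
means = np.array([data[stimulus == s].mean(axis=0) for s in np.unique(stimulus)])

# Step 2: squared differences between all pairs of mean patterns,
# summed over channels and divided by the number of channels.
diff = means[:, None, :] - means[None, :, :]
rdm = np.sum(diff ** 2, axis=2) / data.shape[1]

print(np.round(rdm, 4))
```

Under this assumption the off-diagonal entries come out as 1.088, 0.4875 and 0.4035, matching the printed RDM.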




For the unbalanced data this does not work, because the missing values break the calculation. In addition, the normal calc_rdm function averages the stimulus responses first and therefore disregards the different numbers of measurements:

In [15]:
unbalanced_rdm_broken = pyrsa.rdm.calc_rdm(data_unbalanced, descriptor='stimulus')
print(unbalanced_rdm_broken)

pyrsa.rdm.RDMs(
dissimilarity_measure = 
euclidean
dissimilarities = 
[[nan nan nan]]
descriptors = 
{}
rdm_descriptors = 
{'index': array([0])}
pattern_descriptors = 
{'index': array([0, 1, 2]), 'stimulus': [0, 1, 2]}
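
The all-NaN result above can be traced with a few lines of plain NumPy. This is a sketch under the assumption that the normal function averages the repetitions per stimulus before computing distances, as described above; it is not the library's implementation.

```python
import numpy as np

# Same values as data_unbalanced_array above.
data = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, np.nan, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, np.nan],
    [1.7, 0.5, 0.4, 1.4, 0.3],
])
stimulus = np.array([0, 1, 1, 1, 2, 2])

# A single NaN in any repetition makes the mean of that channel NaN.
means = np.array([data[stimulus == s].mean(axis=0) for s in np.unique(stimulus)])
nan_in_mean = np.isnan(means).any(axis=1)
print(nan_in_mean.tolist())  # [False, True, True]

# Every pair of stimuli involves at least one mean pattern with a NaN,
# so every distance summed over channels is NaN as well.
diff = means[:, None, :] - means[None, :, :]
rdm = np.sum(diff ** 2, axis=2) / data.shape[1]
off_diagonal = rdm[np.triu_indices(3, k=1)]
print(off_diagonal)  # [nan nan nan]
```

Only stimulus 0 has a complete mean pattern, but a distance always needs two patterns, so no entry of the RDM survives.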

Instead, we can use the slightly slower variant for unbalanced designs, which calculates the RDM correctly:

In [17]:
unbalanced_rdm = pyrsa.rdm.calc_rdm_unbalanced(data_unbalanced, descriptor='stimulus')
print(unbalanced_rdm)

pyrsa.rdm.RDMs
1 RDM(s) over 3 conditions

dissimilarity_measure = 
euclidean

dissimilarities[0] = 
[[0.         0.51142857 0.98555556]
 [0.51142857 0.         1.0868    ]
 [0.98555556 1.0868     0.        ]]

descriptors: 

rdm_descriptors: 
index = [0]
weights = [[14, 9, 25]]

pattern_descriptors: 
index = [0 1 2]
stimulus = {0, 1, 2}




This RDM now contains valid values for all dissimilarities. It also gained an rdm_descriptor called 'weights', which contains the weight carried by each dissimilarity. The weights differ here because the stimuli differ in their numbers of valid measurement channels and repetitions.
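
To make the weights concrete, here is a minimal NumPy sketch that reproduces the numbers printed above. It assumes that each dissimilarity is the mean squared difference over all pairs of single measurements and all channels that are valid in both measurements, and that the weight is the number of terms in that mean. This reading is inferred from the printed output, not taken from the pyrsa source, and `pairwise_dissimilarity` is a hypothetical helper name.

```python
from itertools import combinations

import numpy as np

# Same values as data_unbalanced_array above.
data = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, np.nan, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, np.nan],
    [1.7, 0.5, 0.4, 1.4, 0.3],
])
stimulus = np.array([0, 1, 1, 1, 2, 2])


def pairwise_dissimilarity(data, labels, a, b):
    """Hypothetical helper: mean squared difference over all valid terms."""
    rows_a = data[labels == a]
    rows_b = data[labels == b]
    # Squared differences for every pair of measurements and every channel;
    # an entry is NaN if the channel is missing in either measurement.
    sq_diff = (rows_a[:, None, :] - rows_b[None, :, :]) ** 2
    weight = int(np.sum(~np.isnan(sq_diff)))  # number of valid terms
    return np.nansum(sq_diff) / weight, weight


dissimilarities, weights = [], []
for a, b in combinations(np.unique(stimulus).tolist(), 2):
    d, w = pairwise_dissimilarity(data, stimulus, a, b)
    dissimilarities.append(d)
    weights.append(w)

print(np.round(dissimilarities, 4))  # [0.5114 0.9856 1.0868]
print(weights)  # [14, 9, 25]
```

For stimuli 0 and 1, for example, the single measurement of stimulus 0 is paired with three measurements of stimulus 1 that have 4, 5 and 5 channels in common with it, which gives the weight 14.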

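As a sanity check we can also compute the RDM for the balanced data with this method and compare it with balanced_rdm from above. Note that in the output shown here the values are larger than those returned by calc_rdm (for example 1.436 instead of 1.088 for stimuli 0 and 1), so the two functions do not produce identical numbers for this dataset, although all weights are now equal: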

In [18]:
sanity_rdm = pyrsa.rdm.calc_rdm_unbalanced(data_balanced, descriptor='stimulus')
print(sanity_rdm)

pyrsa.rdm.RDMs
1 RDM(s) over 3 conditions

dissimilarity_measure = 
euclidean

dissimilarities[0] = 
[[0.    1.436 1.205]
 [1.436 0.    0.94 ]
 [1.205 0.94  0.   ]]

descriptors: 

rdm_descriptors: 
index = [0]
weights = [[20, 20, 20]]

pattern_descriptors: 
index = [0 1 2]
stimulus = {0, 1, 2}
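
The values printed here are larger than those in balanced_rdm. One possible explanation, sketched below in plain NumPy, is that averaging squared differences over all pairs of single measurements also picks up the variability between repetitions of the same stimulus, whereas averaging the repetitions first does not. The pairwise formula is an assumption inferred from the printed output, not taken from the pyrsa source.

```python
import numpy as np

# Same values as data_balanced_array above.
data = np.array([
    [0.7, 0.8, 0.9, 1.0, 1.1],
    [0.2, 1.8, 2.9, 1.0, 1.3],
    [2.7, 0.8, 0.2, 1.2, 1.1],
    [1.7, 0.5, 0.9, 1.5, 1.1],
    [1.7, 2.8, 2.2, 1.2, 1.0],
    [1.7, 0.5, 0.4, 1.4, 0.3],
])
stimulus = np.array([0, 0, 1, 1, 2, 2])
rows_0, rows_1 = data[stimulus == 0], data[stimulus == 1]

# (a) Average the repetitions first, then compare the mean patterns.
between_means = np.mean((rows_0.mean(axis=0) - rows_1.mean(axis=0)) ** 2)

# (b) Average the squared differences over all pairs of single measurements.
all_pairs = np.mean((rows_0[:, None, :] - rows_1[None, :, :]) ** 2)

# Variance across repetitions within each stimulus, averaged over channels.
within = rows_0.var(axis=0).mean() + rows_1.var(axis=0).mean()

print(f"{between_means:.3f} {all_pairs:.3f} {within:.3f}")  # 1.088 1.436 0.348
```

For stimuli 0 and 1 the gap between the two results, 1.436 - 1.088 = 0.348, equals the summed within-stimulus variance, which is consistent with this explanation for the present data.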


