## Logic for calculating the false positive rate

#### Case 1 repeats (8 experiments, 100 repeats each):

- None of the structures are actually affected
- Record whether at least one structure has p < 0.05

#### Case 2 repeats (8 experiments, 100 repeats each):

- Left hippocampus (levels 1&2) is actually affected
- Record whether at least one of the left amygdala structures (levels 1&2) has p < 0.05

#### Case 3 repeats (8 experiments, 100 repeats each):

- Left hippocampus (levels 1&2) and left amygdala (levels 1&2) are actually affected
- No structure in this case can yield a false positive (only true positives), so case 3 repeats are excluded from the false positive calculation

#### Case 4 repeats (8 experiments, 100 repeats each):

- Use the saved Zs to see which structures are actually affected:

    - If the left hippocampus is actually affected, record if there’s at least one structure among the left amygdala (levels 1&2) that has p<0.05
    - If the left amygdala is affected, record if there’s at least one structure among the left hippocampus (levels 1&2) that has p<0.05

#### In total, there are 4 cases * 8 experiments * 100 repeats = 3,200 repeats
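
The per-case rules above can be summarized as a small lookup. This is only a sketch: the function name `false_positive_candidates`, the name fragments `"Hippo"` and `"Amyg"` used as keys, and the `affected` argument (standing in for whatever the saved Zs indicate in case 4) are assumptions made for illustration, not part of the original analysis.

```python
from itertools import product

# Every (case, experiment, repeat) combination described above
all_repeats = list(product(range(1, 5), range(1, 9), range(100)))
print(len(all_repeats))  # 4 * 8 * 100 = 3200

def false_positive_candidates(case, affected=frozenset()):
    """Name fragments of structures whose rejection counts as a false positive.

    Returns None when every structure is a candidate (case 1).
    `affected` is used only for case 4 and is assumed to hold "Hippo" or "Amyg",
    depending on which structure the saved Zs mark as affected.
    """
    if case == 1:
        return None  # Nothing is affected, so any rejection is a false positive
    if case == 2:
        return {"Amyg"}  # Hippocampus is affected; only the amygdala can be a false positive
    if case == 3:
        return set()  # Both are affected; no false positives are possible
    if case == 4:
        # Assumption: whichever of the two structures is not affected is the candidate
        return {"Hippo", "Amyg"} - set(affected)
    raise ValueError(f"Unknown case: {case}")

print(false_positive_candidates(4, affected={"Hippo"}))  # {'Amyg'}
```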

## Viewing the contents of one of the .npz files

In [31]:
import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd

import scipy
from scipy import stats
from scipy.stats import binom

In [2]:
# View the contents of one of the .npz files
# np.load returns a dictionary-like NpzFile object, assigned here to "test"
test = np.load("case_1_experiment_1_repeat_0000.npz")

# Print each key in the dictionary 
for key in test: 
    print(key)

# Print the values corresponding to this key
print(test["arr_0"])

arr_0
['Limbic_L_434_3, P[Z=1|X]=0.037920449823881666, p=0.647'
 'CerebralCortex_L_482_4, P[Z=1|X]=0.037920449823881666, p=0.647'
 'Telencephalon_L_501_5, P[Z=1|X]=0.037920449823881666, p=0.647'
 'Everything, P[Z=1|X]=0.037920449823881666, p=0.647'
 'Hippo_L_75_1, P[Z=1|X]=0.0003457965129371767, p=0.88'
 'Hippo_L_338_2, P[Z=1|X]=0.0003457965129371767, p=0.88'
 'Amyg_L_73_1, P[Z=1|X]=8.875288279312776e-05, p=0.671'
 'Amyg_L_336_2, P[Z=1|X]=8.875288279312776e-05, p=0.671']
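
Each entry in `arr_0` is a single string holding the structure name, the posterior `P[Z=1|X]`, and the p-value. As a sketch of how all three fields could be pulled apart at once, the helper below uses a regular expression; `parse_entry` and `ENTRY_PATTERN` are hypothetical names, and the pattern assumes every entry follows the format printed above.

```python
import re

# Matches entries such as
# 'Hippo_L_75_1, P[Z=1|X]=0.0003457965129371767, p=0.88'
ENTRY_PATTERN = re.compile(
    r"^(?P<name>.+?), P\[Z=1\|X\]=(?P<posterior>[^,]+), p=(?P<pval>.+)$"
)

def parse_entry(entry):
    """Split one entry into (structure name, P[Z=1|X], p-value)."""
    match = ENTRY_PATTERN.match(str(entry).strip())
    if match is None:
        raise ValueError(f"Unrecognized entry format: {entry!r}")
    return match["name"], float(match["posterior"]), float(match["pval"])

example = "Hippo_L_75_1, P[Z=1|X]=0.0003457965129371767, p=0.88"
name, posterior, pval = parse_entry(example)
print(name, posterior, pval)
```

Raising on an unrecognized entry, rather than returning a default, makes a change in the file format visible immediately.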


## Extracting the p-value for a structure in a given repeat

Run the cell under "Case 1: Obtaining a list of file names" first, since it defines `case_1`.

In [5]:
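entries = np.load(case_1[0])["arr_0"] # Array of strings, one per structure
start = entries[0].find("p=") # Index of "p=" in the first entry
float(entries[0][start + 2:]) # Everything after "p=" is the p-value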

0.647

## Case 1 repeats

General format of case 1 repeat files: case_1_experiment_x_repeat_xxxx.npz

Case 1 experiments 1-8, repeats 0000 to 0099 → 8 * 100 = 800 files

### Case 1: Obtaining a list of file names

In [4]:
case_1 = []

for i in range(1, 9): # Experiments 1-8
    for j in range(100): # Repeats 0000-0099
        case_1.append(f"case_1_experiment_{i}_repeat_{j:04d}.npz") # Zero-pad the repeat number to 4 digits

case_1[0:4], case_1[796:800]

(['case_1_experiment_1_repeat_0000.npz',
  'case_1_experiment_1_repeat_0001.npz',
  'case_1_experiment_1_repeat_0002.npz',
  'case_1_experiment_1_repeat_0003.npz'],
 ['case_1_experiment_8_repeat_0096.npz',
  'case_1_experiment_8_repeat_0097.npz',
  'case_1_experiment_8_repeat_0098.npz',
  'case_1_experiment_8_repeat_0099.npz'])
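
Because the list is ordered by experiment and then by repeat, the files for experiment `e` occupy indices `(e - 1) * 100` through `e * 100 - 1`, which is why the cells that follow loop over `range(0, 100)`, `range(100, 200)`, and so on. A minimal sketch of that mapping, using the hypothetical helpers `experiment_files` and `experiment_slice`:

```python
N_REPEATS = 100

def experiment_files(case, experiment, n_repeats=N_REPEATS):
    """File names for one case and experiment, repeats 0000 onward."""
    return [
        f"case_{case}_experiment_{experiment}_repeat_{r:04d}.npz"
        for r in range(n_repeats)
    ]

def experiment_slice(experiment, n_repeats=N_REPEATS):
    """Slice of the flat file list that belongs to one experiment (1-based)."""
    start = (experiment - 1) * n_repeats
    return slice(start, start + n_repeats)

# Rebuild the flat case 1 list and check that the slice picks out experiment 2
all_files = [name for e in range(1, 9) for name in experiment_files(1, e)]
assert all_files[experiment_slice(2)] == experiment_files(1, 2)
print(all_files[experiment_slice(2)][0])  # case_1_experiment_2_repeat_0000.npz
```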

### Case 1 experiment 1: Calculating the false positive rate

Case 1: nothing is affected

Experiment 1: clip the probabilities at 0.999 and 0.001 (original)

In [51]:
count = 0

for i in range(0, 100): # For each of the 100 case 1 experiment 1 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.03
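
The loop above applies a sequential rule: structures are visited in sorted order and testing stops at the first p-value that is not below 0.05. The sketch below isolates that rule so it can be checked on plain lists of p-values. The names `count_rejections` and `false_positive_rate` are hypothetical, and the toy p-values are made up for illustration rather than taken from the experiment files.

```python
def count_rejections(pvals, alpha=0.05):
    """Number of leading p-values below alpha, stopping at the first failure."""
    rejected = 0
    for pval in pvals:
        if pval >= alpha:
            break  # Structures are sorted, so nothing after this one is tested
        rejected += 1
    return rejected

def false_positive_rate(repeats, alpha=0.05):
    """Fraction of repeats in which at least one structure is rejected."""
    hits = sum(count_rejections(pvals, alpha) > 0 for pvals in repeats)
    return hits / len(repeats)

# Toy data: four repeats, each a list of p-values in sorted structure order
toy_repeats = [
    [0.647, 0.88],  # First structure not rejected, so no rejections
    [0.01, 0.5],    # One rejection
    [0.2, 0.01],    # Stops at the first structure; the later 0.01 is never tested
    [0.04, 0.04],   # Two rejections
]
print(false_positive_rate(toy_repeats))  # 0.5
```

When nothing is affected, a repeat counts as a false positive exactly when its first structure is rejected, since the rule never looks past the first failure.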

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [52]:
binom.cdf(k = count, n = 100, p = 0.05)

0.2578386591160152
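
`binom.cdf(k, n, p)` returns the lower tail P(X ≤ k), the probability of observing `k` or fewer false positives in `n` repeats. The upper tail P(X ≥ k) is the complementary quantity and corresponds to `binom.sf(k - 1, n, p)`. The sketch below computes both directly from the binomial formula with the standard library; `binom_pmf`, `lower_tail`, and `upper_tail` are hypothetical helper names.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def lower_tail(k, n, p):
    """P(X <= k), the quantity binom.cdf(k, n, p) computes."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def upper_tail(k, n, p):
    """P(X >= k), equal to binom.sf(k - 1, n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# 3 false positives in 100 repeats, true rate 0.05
print(lower_tail(3, 100, 0.05))  # Agrees with the binom.cdf value above
print(upper_tail(3, 100, 0.05))  # Probability of 3 or more false positives
```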

### Case 1 experiment 2: Calculating the false positive rate

Case 1: nothing is affected

Experiment 2: clip the probabilities at 0.9999 and 0.0001

In [53]:
count = 0

for i in range(100, 200): # For each of the 100 case 1 experiment 2 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.01

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [54]:
binom.cdf(k = count, n = 100, p = 0.05)

0.037081209327355036

### Case 1 experiment 3: Calculating the false positive rate

Case 1: nothing is affected

Experiment 3: run 20 iterations of the EM algorithm (original)

In [55]:
count = 0

for i in range(200, 300): # For each of the 100 case 1 experiment 3 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.01

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [56]:
binom.cdf(k = count, n = 100, p = 0.05)

0.037081209327355036

### Case 1 experiment 4: Calculating the false positive rate

Case 1: nothing is affected

Experiment 4: run 50 iterations of the EM algorithm

In [57]:
count = 0

for i in range(300, 400): # For each of the 100 case 1 experiment 4 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.04

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [58]:
binom.cdf(k = count, n = 100, p = 0.05)

0.43598130068571

### Case 1 experiment 5: Calculating the false positive rate

Case 1: nothing is affected

Experiment 5: run 100 iterations of the EM algorithm

In [59]:
count = 0

for i in range(400, 500): # For each of the 100 case 1 experiment 5 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.06

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [60]:
binom.cdf(k = count, n = 100, p = 0.05)

0.7660139840148319

### Case 1 experiment 6: Calculating the false positive rate

Case 1: nothing is affected

Experiment 6: initial probabilities are 0.5 (original)

In [61]:
count = 0

for i in range(500, 600): # For each of the 100 case 1 experiment 6 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.04

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [62]:
binom.cdf(k = count, n = 100, p = 0.05)

0.43598130068571

### Case 1 experiment 7: Calculating the false positive rate

Case 1: nothing is affected

Experiment 7: initial probabilities are 0.25

In [63]:
count = 0

for i in range(600, 700): # For each of the 100 case 1 experiment 7 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.05

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [64]:
binom.cdf(k = count, n = 100, p = 0.05)

0.6159991279561414

### Case 1 experiment 8: Calculating the false positive rate

Case 1: nothing is affected

Experiment 8: initial probabilities are 0.75

In [65]:
count = 0

for i in range(700, 800): # For each of the 100 case 1 experiment 8 files (00-99)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each of the 8 structures in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 100

0.04

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [66]:
binom.cdf(k = count, n = 100, p = 0.05)

0.43598130068571

## Case 2 repeats

General format of case 2 repeat files: case_2_experiment_x_repeat_xxxx.npz

Case 2 experiments 1-8, repeats 0000 to 0099 → 8 * 100 = 800 files

### Case 2: Obtaining a list of file names

In [21]:
case_2 = []

for i in range(1, 9): # Experiments 1-8
    for j in range(100): # Repeats 0000-0099
        case_2.append(f"case_2_experiment_{i}_repeat_{j:04d}.npz") # Zero-pad the repeat number to 4 digits

case_2[0:4], case_2[796:800]

(['case_2_experiment_1_repeat_0000.npz',
  'case_2_experiment_1_repeat_0001.npz',
  'case_2_experiment_1_repeat_0002.npz',
  'case_2_experiment_1_repeat_0003.npz'],
 ['case_2_experiment_8_repeat_0096.npz',
  'case_2_experiment_8_repeat_0097.npz',
  'case_2_experiment_8_repeat_0098.npz',
  'case_2_experiment_8_repeat_0099.npz'])

### Case 2 experiment 1: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 1: clip the probabilities at 0.999 and 0.001 (original)

In [67]:
count = 0

for i in range(0, 100): # For each of the 100 case 2 experiment 1 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.07
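
In case 2 a repeat counts as a false positive only if a left amygdala structure is rejected before the sequential rule stops. The sketch below expresses that check as one function over the entry strings. The name `amygdala_false_positive` is hypothetical, and the toy entries (including their probabilities and p-values) are made up for illustration.

```python
def amygdala_false_positive(entries, alpha=0.05):
    """True if an 'Amyg' structure is rejected before the first failure to reject."""
    for entry in entries:
        pval = float(entry.split("p=")[-1])  # Text after "p=" is the p-value
        if pval >= alpha:
            return False  # Sequential rule stops here; later structures are not tested
        if "Amyg" in entry:
            return True  # Rejected an unaffected structure
    return False

# Toy repeat 1: hippocampus rejected, then amygdala rejected
repeat_a = [
    "Hippo_L_75_1, P[Z=1|X]=0.9, p=0.01",
    "Amyg_L_73_1, P[Z=1|X]=0.2, p=0.03",
]
# Toy repeat 2: the rule stops at the second structure, so the amygdala is never tested
repeat_b = [
    "Hippo_L_75_1, P[Z=1|X]=0.9, p=0.01",
    "Limbic_L_434_3, P[Z=1|X]=0.3, p=0.2",
    "Amyg_L_73_1, P[Z=1|X]=0.2, p=0.01",
]
print(amygdala_false_positive(repeat_a), amygdala_false_positive(repeat_b))  # True False
```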

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [68]:
binom.cdf(k = count, n = 100, p = 0.05)

0.872039521379621

### Case 2 experiment 2: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 2: clip the probabilities at 0.9999 and 0.0001

In [69]:
count = 0

for i in range(100, 200): # For each of the 100 case 2 experiment 2 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.06

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [70]:
binom.cdf(k = count, n = 100, p = 0.05)

0.7660139840148319

### Case 2 experiment 3: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 3: run 20 iterations of the EM algorithm (original)

In [71]:
count = 0

for i in range(200, 300): # For each of the 100 case 2 experiment 3 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.05

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [72]:
binom.cdf(k = count, n = 100, p = 0.05)

0.6159991279561414

### Case 2 experiment 4: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 4: run 50 iterations of the EM algorithm

In [73]:
count = 0

for i in range(300, 400): # For each of the 100 case 2 experiment 4 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.07

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [74]:
binom.cdf(k = count, n = 100, p = 0.05)

0.872039521379621

### Case 2 experiment 5: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 5: run 100 iterations of the EM algorithm

In [75]:
count = 0

for i in range(400, 500): # For each of the 100 case 2 experiment 5 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.08

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [76]:
binom.cdf(k = count, n = 100, p = 0.05)

0.9369104093725512

### Case 2 experiment 6: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 6: initial probabilities are 0.5 (original)

In [77]:
count = 0

for i in range(500, 600): # For each of the 100 case 2 experiment 6 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.04

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [78]:
binom.cdf(k = count, n = 100, p = 0.05)

0.43598130068571

### Case 2 experiment 7: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 7: initial probabilities are 0.25

In [79]:
count = 0

for i in range(600, 700): # For each of the 100 case 2 experiment 7 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.05

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [80]:
binom.cdf(k = count, n = 100, p = 0.05)

0.6159991279561414

### Case 2 experiment 8: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 8: initial probabilities are 0.75

In [81]:
count = 0

for i in range(700, 800): # For each of the 100 case 2 experiment 8 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    
    for j in range(len(file)): # For each structure, in sorted order, until the first one we fail to reject...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 100

0.07

#### Probability of estimating a false positive rate at most this large when the true false positive rate is 0.05

In [82]:
binom.cdf(k = count, n = 100, p = 0.05)

0.872039521379621

## Case 4 repeats

General format of case 4 repeat files: case_4_experiment_x_repeat_xxxx.npz

Case 4 experiments 1-8, repeats 0000 to 0099 → 8 * 100 = 800 files

### Case 4: Obtaining a list of file names

In [6]:
case_4 = []

for i in range(1, 9): # Experiments 1-8
    for j in range(100): # Repeats 0000-0099
        case_4.append(f"case_4_experiment_{i}_repeat_{j:04d}.npz") # Zero-pad the repeat number to 4 digits

case_4[0:4], case_4[796:800]

(['case_4_experiment_1_repeat_0000.npz',
  'case_4_experiment_1_repeat_0001.npz',
  'case_4_experiment_1_repeat_0002.npz',
  'case_4_experiment_1_repeat_0003.npz'],
 ['case_4_experiment_8_repeat_0096.npz',
  'case_4_experiment_8_repeat_0097.npz',
  'case_4_experiment_8_repeat_0098.npz',
  'case_4_experiment_8_repeat_0099.npz'])

### Case 4: Calculating the false positive rate