## Logic for calculating the false positive rate

#### Case 1 repeats (8 experiments, 1000 repeats each):

- None of the structures are actually affected 
- Record if there’s at least one structure that has p<0.05

#### Case 2 repeats (8 experiments, 1000 repeats each):

- Left hippocampus (levels 1&2) is actually affected 
- Record if there’s at least one structure among the left amygdala (levels 1&2) that has p<0.05

#### Case 3 repeats (8 experiments, 1000 repeats each):

- Left hippocampus (levels 1&2) and left amygdala (levels 1&2) are actually affected 
- There are no structures in this case that could yield false positives (only true positives), so don’t consider case 3 repeats for false positives

#### Case 4 repeats (8 experiments, 1000 repeats each):

- Use the saved Zs to see which structures are actually affected:

    - If the left hippocampus is actually affected, record if there’s at least one structure among the left amygdala (levels 1&2) that has p<0.05
    - If the left amygdala is affected, record if there’s at least one structure among the left hippocampus (levels 1&2) that has p<0.05

#### In total, there are 8000 * 4 = 32,000 repeats
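The per-case rules above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the structure names are the ones that appear in the .npz records, and the affected leaves are passed in as name prefixes because the format of the saved Zs is not shown here.

```python
# Hypothetical helper, not part of the original notebook.
ALL_STRUCTURES = [
    "Everything", "Telencephalon_L_501_5", "CerebralCortex_L_482_4",
    "Limbic_L_434_3", "Hippo_L_338_2", "Amyg_L_336_2",
    "Hippo_L_75_1", "Amyg_L_73_1",
]
LEAF_PREFIXES = ("Hippo", "Amyg")


def null_structures(affected_prefixes):
    """Return the structures whose rejection would count as a false positive."""
    affected = tuple(affected_prefixes)
    if not affected:
        # Case 1: nothing is affected, so every structure is truly null.
        return set(ALL_STRUCTURES)
    # Cases 2-4: only the unaffected leaf structures (levels 1&2) are considered.
    return {
        s for s in ALL_STRUCTURES
        if s.startswith(LEAF_PREFIXES) and not s.startswith(affected)
    }


def is_false_positive(rejected, affected_prefixes):
    """True if at least one rejected structure is truly null."""
    return bool(set(rejected) & null_structures(affected_prefixes))
```

For example, `is_false_positive(["Amyg_L_73_1"], ["Hippo"])` corresponds to a case 2 repeat in which a left amygdala structure was rejected.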

## Viewing the contents of one of the .npz files

In [1]:
import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd

import scipy
from scipy import stats
from scipy.stats import binom

In [29]:
# Viewing the contents of one of the .npz files 
# np.load returns an NpzFile object, assigned to "test," which can be indexed like a dictionary
test = np.load("fwer_experiments/case_1_experiment_1_repeat_0000.npz")

# Print each key in the dictionary 
for key in test: 
    print(key)

# Print the values corresponding to this key
print(test["arr_0"])

arr_0
['Limbic_L_434_3, P[Z=1|X]=0.28751672624651625, p=0.125'
 'CerebralCortex_L_482_4, P[Z=1|X]=0.28751672624651625, p=0.125'
 'Telencephalon_L_501_5, P[Z=1|X]=0.28751672624651625, p=0.125'
 'Everything, P[Z=1|X]=0.28751672624651625, p=0.125'
 'Amyg_L_73_1, P[Z=1|X]=0.1537784202057456, p=0.158'
 'Amyg_L_336_2, P[Z=1|X]=0.1537784202057456, p=0.158'
 'Hippo_L_75_1, P[Z=1|X]=0.0002833414807840441, p=0.418'
 'Hippo_L_338_2, P[Z=1|X]=0.0002833414807840441, p=0.418']
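Each record is a single comma-separated string, so it can help to split it into its three fields once. The sketch below assumes every record has exactly the layout printed above; `parse_record` is a hypothetical name.

```python
def parse_record(record):
    """Split 'name, P[Z=1|X]=prob, p=pval' into (name, prob, pval)."""
    name, prob_part, p_part = [part.strip() for part in str(record).split(",")]
    prob = float(prob_part.split("=")[-1])  # text after the last "=" in P[Z=1|X]=...
    pval = float(p_part.split("=")[-1])     # text after "p="
    return name, prob, pval


example = "Amyg_L_73_1, P[Z=1|X]=0.1537784202057456, p=0.158"
name, prob, pval = parse_record(example)
```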


## Addressing ties in the probability of a structure being affected

In the example above, Limbic_L_434_3, CerebralCortex_L_482_4, Telencephalon_L_501_5, and Everything all have a probability of 0.28751672624651625 of being affected.

This is a four-way tie in the probability of being affected.

We want the order of these tied structures to be from parents to children.

In [3]:
dictionary = {"Everything" : 1, 
              "Telencephalon_L_501_5" : 2,
              "CerebralCortex_L_482_4": 3, 
              "Limbic_L_434_3" : 4,
              "Hippo_L_338_2" : 5,
              "Amyg_L_336_2" : 6,
              "Hippo_L_75_1" : 7,
              "Amyg_L_73_1" : 8
             }

# Print the alphabetically sorted structures and their indices
for name in sorted(dictionary):
    print(dictionary[name], name)
    
# Print the structures sorted by index (sorted by parent to child)
dict(sorted(dictionary.items(), key=lambda item: item[1]))

6 Amyg_L_336_2
8 Amyg_L_73_1
3 CerebralCortex_L_482_4
1 Everything
5 Hippo_L_338_2
7 Hippo_L_75_1
4 Limbic_L_434_3
2 Telencephalon_L_501_5


{'Everything': 1,
 'Telencephalon_L_501_5': 2,
 'CerebralCortex_L_482_4': 3,
 'Limbic_L_434_3': 4,
 'Hippo_L_338_2': 5,
 'Amyg_L_336_2': 6,
 'Hippo_L_75_1': 7,
 'Amyg_L_73_1': 8}

In [4]:
# List of strings
my_list = test["arr_0"]

# Round 1: sort structures from parent to child
# Extract the name and sort by key values in the dictionary
my_list = sorted(my_list, key = lambda x : dictionary[x.split(",")[0]])  

# Round 2: sort by the second comma-separated field of each string (the probability), in descending order
# Convert the probabilities to floats so that the numbers are sorted numerically (not alphabetically)
# sorted() is stable, so if there's a tie in probabilities, the order from round 1 (parent to child) is kept
my_list = sorted(my_list, key = lambda x : float(x.split(",")[1].split("=")[-1]), reverse = True) 

for i in range(0, len(my_list)):
    print(my_list[i], "\n")

Everything, P[Z=1|X]=0.28751672624651625, p=0.125 

Telencephalon_L_501_5, P[Z=1|X]=0.28751672624651625, p=0.125 

CerebralCortex_L_482_4, P[Z=1|X]=0.28751672624651625, p=0.125 

Limbic_L_434_3, P[Z=1|X]=0.28751672624651625, p=0.125 

Amyg_L_336_2, P[Z=1|X]=0.1537784202057456, p=0.158 

Amyg_L_73_1, P[Z=1|X]=0.1537784202057456, p=0.158 

Hippo_L_338_2, P[Z=1|X]=0.0002833414807840441, p=0.418 

Hippo_L_75_1, P[Z=1|X]=0.0002833414807840441, p=0.418 



In [45]:
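def sorting_function(records):
    """Sort records by descending P[Z=1|X]; ties keep parent-to-child order."""
    # Round 1: parent-to-child order, using the ranks in `dictionary`
    my_list = sorted(records, key = lambda x : dictionary[x.split(",")[0]])  
    # Round 2: descending probability; sorted() is stable, so ties keep the round 1 order
    my_list = sorted(my_list, key = lambda x : float(x.split(",")[1].split("=")[-1]), reverse = True) 
    return my_list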
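The two-round sort relies on Python's `sorted` being stable. As a possible alternative, the same order can be obtained in one pass with a tuple key (negated probability first, parent-to-child rank second). The sketch below is self-contained; `RANK`, `two_pass_sort`, and `one_pass_sort` are hypothetical names, and the records are the ones from the example file.

```python
RANK = {"Everything": 1, "Telencephalon_L_501_5": 2, "CerebralCortex_L_482_4": 3,
        "Limbic_L_434_3": 4, "Hippo_L_338_2": 5, "Amyg_L_336_2": 6,
        "Hippo_L_75_1": 7, "Amyg_L_73_1": 8}


def prob_of(record):
    """Extract P[Z=1|X] from a record string as a float."""
    return float(record.split(",")[1].split("=")[-1])


def two_pass_sort(records):
    """Stable two-round sort: rank first, then descending probability."""
    by_rank = sorted(records, key=lambda r: RANK[r.split(",")[0]])
    return sorted(by_rank, key=prob_of, reverse=True)


def one_pass_sort(records):
    """Single sort: descending probability, ties broken by parent-to-child rank."""
    return sorted(records, key=lambda r: (-prob_of(r), RANK[r.split(",")[0]]))


records = [
    "Limbic_L_434_3, P[Z=1|X]=0.28751672624651625, p=0.125",
    "CerebralCortex_L_482_4, P[Z=1|X]=0.28751672624651625, p=0.125",
    "Telencephalon_L_501_5, P[Z=1|X]=0.28751672624651625, p=0.125",
    "Everything, P[Z=1|X]=0.28751672624651625, p=0.125",
    "Amyg_L_73_1, P[Z=1|X]=0.1537784202057456, p=0.158",
    "Amyg_L_336_2, P[Z=1|X]=0.1537784202057456, p=0.158",
    "Hippo_L_75_1, P[Z=1|X]=0.0002833414807840441, p=0.418",
    "Hippo_L_338_2, P[Z=1|X]=0.0002833414807840441, p=0.418",
]
ordered = one_pass_sort(records)
```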

## Case 1 repeats

General format of case 1 repeat files: case_1_experiment_x_repeat_xxxx.npz

Case 1 experiments 1-8, repeats 0000 to 0999 → 8 * 1000 = 8000 files

### Case 1: Obtaining a list of file names

In [13]:
case_1 = []

for i in range(1, 9):
    for j in range(0, 1000):
        if j < 10:
            string = "fwer_experiments/case_1_experiment_" + str(i) + "_repeat_" + "000" + str(j) + ".npz"
        elif j < 100:
            string = "fwer_experiments/case_1_experiment_" + str(i) + "_repeat_" + "00" + str(j) + ".npz"
        else:
            string = "fwer_experiments/case_1_experiment_" + str(i) + "_repeat_" + "0" + str(j) + ".npz"
        case_1.append(string)

case_1[0:4], case_1[7996:8001]

(['fwer_experiments/case_1_experiment_1_repeat_0000.npz',
  'fwer_experiments/case_1_experiment_1_repeat_0001.npz',
  'fwer_experiments/case_1_experiment_1_repeat_0002.npz',
  'fwer_experiments/case_1_experiment_1_repeat_0003.npz'],
 ['fwer_experiments/case_1_experiment_8_repeat_0996.npz',
  'fwer_experiments/case_1_experiment_8_repeat_0997.npz',
  'fwer_experiments/case_1_experiment_8_repeat_0998.npz',
  'fwer_experiments/case_1_experiment_8_repeat_0999.npz'])
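The zero-padding branches above can also be written with a format specifier. A minimal sketch, assuming the same folder and naming pattern; `repeat_paths` is a hypothetical helper that works for any case number.

```python
def repeat_paths(case, n_experiments=8, n_repeats=1000, folder="fwer_experiments"):
    """Build the file names for one case, experiment by experiment."""
    return [
        # {r:04d} pads the repeat number with zeros to four digits (0000-0999)
        f"{folder}/case_{case}_experiment_{e}_repeat_{r:04d}.npz"
        for e in range(1, n_experiments + 1)
        for r in range(n_repeats)
    ]


paths = repeat_paths(1)
```

With this ordering, experiment `e` occupies the slice `paths[(e - 1) * 1000 : e * 1000]`, which matches the index ranges used in the loops for each experiment.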

### Case 1 experiment 1: Calculating the false positive rate

Case 1: nothing is affected

Experiment 1: clip the probabilities at 0.999 and 0.001 (original)

In [50]:
count = 0

for i in range(0, 1000): # For each of the 1000 case 1 experiment 1 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.048

Note: the false positive rate comes out to be 0.048 regardless of whether I comment out the line `file = sorting_function(file)` or not
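The inner loop walks the sorted records and stops at the first structure that is not rejected. A minimal sketch of that step on in-memory strings, assuming the records are already sorted; `step_down_rejections` is a hypothetical name and the toy records are made up for illustration.

```python
def step_down_rejections(records, alpha=0.05):
    """Return the names rejected before the first record with p >= alpha."""
    rejected = []
    for rec in records:
        name = rec.split(",")[0]
        pval = float(rec.split("p=")[-1])  # text after the lowercase "p="
        if pval >= alpha:
            break  # later records are not tested once one fails
        rejected.append(name)
    return rejected


# Toy records (made-up values), already in sorted order
toy = [
    "Everything, P[Z=1|X]=0.9, p=0.01",
    "Limbic_L_434_3, P[Z=1|X]=0.8, p=0.03",
    "Amyg_L_73_1, P[Z=1|X]=0.5, p=0.2",
    "Hippo_L_75_1, P[Z=1|X]=0.1, p=0.001",
]
rejected = step_down_rejections(toy)
```

Here the last record has a small p-value but is never reached, because the loop stops at the third record.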

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [52]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.42201383886607474

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [53]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.5779861611339252
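Note that `binom.cdf(k, n, p)` is P(X ≤ k), so `1 - binom.cdf(k, n, p)` is P(X > k) rather than P(X ≥ k); in scipy, P(X ≥ k) is `binom.sf(k - 1, n, p)`. The sketch below recomputes the three quantities with the standard library only, for count = 48 (0.048 × 1000); the helper names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial pmf, computed in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def binom_tails(count, n=1000, p=0.05):
    """Return P(X <= count), P(X > count), and P(X >= count)."""
    at_most = sum(binom_pmf(k, n, p) for k in range(count + 1))
    strictly_more = 1 - at_most
    at_least = strictly_more + binom_pmf(count, n, p)
    return at_most, strictly_more, at_least


at_most, strictly_more, at_least = binom_tails(48)
```

The first value should agree with the `binom.cdf` output above; the second and third differ by exactly P(X = 48).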

### Case 1 experiment 2: Calculating the false positive rate

Case 1: nothing is affected

Experiment 2: clip the probabilities at 0.9999 and 0.0001

In [55]:
count = 0

for i in range(1000, 2000): # For each of the 1000 case 1 experiment 2 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.04

Note: the false positive rate comes out to be 0.04 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [56]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.08063657325669037

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [58]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.9193634267433096

### Case 1 experiment 3: Calculating the false positive rate

Case 1: nothing is affected

Experiment 3: run 20 iterations of the EM algorithm (original)

In [60]:
count = 0

for i in range(2000, 3000): # For each of the 1000 case 1 experiment 3 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.052

Note: the false positive rate comes out to be 0.052 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [61]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.6486024357111534

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [62]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.3513975642888466

### Case 1 experiment 4: Calculating the false positive rate

Case 1: nothing is affected

Experiment 4: run 50 iterations of the EM algorithm

In [64]:
count = 0

for i in range(3000, 4000): # For each of the 1000 case 1 experiment 4 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.042

Note: the false positive rate comes out to be 0.042 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [65]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.1371327045494613

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [66]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.8628672954505388

### Case 1 experiment 5: Calculating the false positive rate

Case 1: nothing is affected

Experiment 5: run 100 iterations of the EM algorithm

In [68]:
count = 0

for i in range(4000, 5000): # For each of the 1000 case 1 experiment 5 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.04

Note: the false positive rate comes out to be 0.04 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [69]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.08063657325669037

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [70]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.9193634267433096

### Case 1 experiment 6: Calculating the false positive rate

Case 1: nothing is affected

Experiment 6: initial probabilities are 0.5 (original)

In [72]:
count = 0

for i in range(5000, 6000): # For each of the 1000 case 1 experiment 6 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.049

Note: the false positive rate comes out to be 0.049 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [73]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.47974105708732373

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [74]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.5202589429126763

### Case 1 experiment 7: Calculating the false positive rate

Case 1: nothing is affected

Experiment 7: initial probabilities are 0.25

In [76]:
count = 0

for i in range(6000, 7000): # For each of the 1000 case 1 experiment 7 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.047

Note: the false positive rate comes out to be 0.047 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [77]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.36556001516442105

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [78]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.634439984835579

### Case 1 experiment 8: Calculating the false positive rate

Case 1: nothing is affected

Experiment 8: initial probabilities are 0.75

In [86]:
count = 0

for i in range(7000, 8000): # For each of the 1000 case 1 experiment 8 files (0000-0999)...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_1[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else: # The test statistic is the probabilities, which were sorted, so stop after the first structure we fail to reject
            break
     
    if any(reject_p > 0): # If there's at least one structure with p < 0.05...
        count += 1 # Add 1 to the false positive count

count / 1000

0.058

Note: the false positive rate comes out to be 0.058 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [87]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.8894384782391881

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [88]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.11056152176081191

## Case 2 repeats

General format of case 2 repeat files: case_2_experiment_x_repeat_xxxx.npz

Case 2 experiments 1-8, repeats 0000 to 0999 → 8 * 1000 = 8000 files

### Case 2: Obtaining a list of file names

In [82]:
case_2 = []

for i in range(1, 9):
    for j in range(0, 1000):
        if j < 10:
            string = "fwer_experiments/case_2_experiment_" + str(i) + "_repeat_" + "000" + str(j) + ".npz"
        elif j < 100:
            string = "fwer_experiments/case_2_experiment_" + str(i) + "_repeat_" + "00" + str(j) + ".npz"
        else:
            string = "fwer_experiments/case_2_experiment_" + str(i) + "_repeat_" + "0" + str(j) + ".npz"
        case_2.append(string)

case_2[0:4], case_2[7996:8001]

(['fwer_experiments/case_2_experiment_1_repeat_0000.npz',
  'fwer_experiments/case_2_experiment_1_repeat_0001.npz',
  'fwer_experiments/case_2_experiment_1_repeat_0002.npz',
  'fwer_experiments/case_2_experiment_1_repeat_0003.npz'],
 ['fwer_experiments/case_2_experiment_8_repeat_0996.npz',
  'fwer_experiments/case_2_experiment_8_repeat_0997.npz',
  'fwer_experiments/case_2_experiment_8_repeat_0998.npz',
  'fwer_experiments/case_2_experiment_8_repeat_0999.npz'])

### Case 2 experiment 1: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 1: clip the probabilities at 0.999 and 0.001 (original)

In [89]:
count = 0

for i in range(0, 1000): # For each of the 1000 case 2 experiment 1 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat, in sorted order...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 1000

0.057

Note: the false positive rate comes out to be 0.057 regardless of whether I comment out the line `file = sorting_function(file)` or not
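In case 2 a repeat only counts as a false positive when a rejected structure is a left amygdala structure. A minimal sketch of that counting rule on in-memory records, assuming each repeat is an already-sorted list of strings; `false_positive_rate` and the toy values are hypothetical.

```python
def false_positive_rate(repeats, null_prefixes=None, alpha=0.05):
    """Fraction of repeats in which a truly null structure is rejected.

    null_prefixes=None treats every structure as null (case 1);
    null_prefixes=("Amyg",) counts only amygdala rejections (case 2).
    """
    count = 0
    for records in repeats:
        for rec in records:
            name = rec.split(",")[0]
            pval = float(rec.split("p=")[-1])
            if pval >= alpha:
                break  # stop at the first structure that is not rejected
            if null_prefixes is None or name.startswith(tuple(null_prefixes)):
                count += 1  # one false positive per repeat is enough
                break
    return count / len(repeats)


# Two toy repeats with made-up values
toy_repeats = [
    ["Hippo_L_75_1, P[Z=1|X]=0.9, p=0.001", "Amyg_L_73_1, P[Z=1|X]=0.5, p=0.02"],
    ["Hippo_L_75_1, P[Z=1|X]=0.9, p=0.001", "Limbic_L_434_3, P[Z=1|X]=0.6, p=0.2",
     "Amyg_L_73_1, P[Z=1|X]=0.5, p=0.01"],
]
case_2_rate = false_positive_rate(toy_repeats, null_prefixes=("Amyg",))
```

In the second toy repeat the amygdala record is never reached, so only the first repeat contributes.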

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [90]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.8610810536909479

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [91]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.1389189463090521

### Case 2 experiment 2: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 2: clip the probabilities at 0.9999 and 0.0001

In [93]:
count = 0

for i in range(1000, 2000): # For each of the 1000 case 2 experiment 2 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat, in sorted order...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 1000

0.047

Note: the false positive rate comes out to be 0.047 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [94]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.36556001516442105

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [95]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.634439984835579

### Case 2 experiment 3: Calculating the false positive rate

Case 2: left hippocampus is affected

Experiment 3: run 20 iterations of the EM algorithm (original)

In [97]:
count = 0

for i in range(2000, 3000): # For each of the 1000 case 2 experiment 3 files...
    reject_p = np.array([0, 0, 0, 0, 0, 0, 0, 0]) # For each repeat, assume no structures are affected (null hypothesis)
    file = np.load(case_2[i]) # Load the file
    file = file["arr_0"] # Values corresponding to the key
    file = sorting_function(file)
    
    for j in range(len(file)): # For each structure in each repeat, in sorted order...
        p = file[j].find("p=") # Index of "p" in the string
        pval = float(file[j][p::][2::]) # Extract the p-value
        if pval < 0.05: # If the p-value is < 0.05...
            reject_p[j] = 1
        else:
            break
    
    for r, s in zip(reject_p, file):
        if "Amyg" in s and r: # If "Amyg" is in the string and we decided to reject H0...
            count += 1
            break


count / 1000

0.06

Note: the false positive rate comes out to be 0.06 regardless of whether I comment out the line `file = sorting_function(file)` or not

#### Probability of estimating a false positive rate at least this small, when the true false positive rate is 0.05. 

In [98]:
binom.cdf(k = count, n = 1000, p = 0.05)

0.9329374813666923

#### Probability of estimating a false positive rate strictly larger than this, when the true false positive rate is 0.05.

In [99]:
1 - binom.cdf(k = count, n = 1000, p = 0.05)

0.0670625186333077