## aggregating every test in an analysis

For complex analyses we often perform many different tests (e.g. computing the pairwise correlations between every pair of variables in each of several datasets). The `Union()` object provides a convenient abstraction for aggregating all of these p-values and adjusting them to correct for multiple testing.
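
A quick count shows why correction matters at this scale. Assuming each pairwise exploration tests every unordered pair of variables exactly once (consistent with the `frozenset` keys used in the cells below), a dataset with d variables yields d(d-1)/2 tests. If every null hypothesis is true, p-values are uniform on [0, 1], so about a fraction `alpha` of the tests fall below `alpha` by chance. The dimensions here are those of the example datasets; `alpha = 0.05` is the conventional threshold.

```python
from math import comb

# Dimensions of the four example datasets used in this notebook
dims = [10, 20, 30, 40]
alpha = 0.05

# Number of unordered variable pairs per dataset: C(d, 2) = d * (d - 1) / 2
tests_per_dataset = [comb(d, 2) for d in dims]
total_tests = sum(tests_per_dataset)

# Under the global null, roughly alpha * m tests look "significant" by chance
expected_false_positives = alpha * total_tests

print(tests_per_dataset)         # [45, 190, 435, 780]
print(total_tests)               # 1450
print(expected_false_positives)  # 72.5
```

With roughly 70 spurious "hits" expected from raw p-values alone, an individual p-value below 0.05 says very little without a multiple-testing correction.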

In [1]:
import numpy as np
import pandas as pd
%matplotlib inline

from explore.Base import Union
from explore.SingleBlock import SingleBlock

In [2]:
# sample 4 datasets
# the variables within each dataset are all mutually independent,
# so in principle we should not see any associations
dims = [10, 20, 30, 40]
n_samples = 100

np.random.seed(3423)
datasets = []
for dim in dims:
    X = np.random.normal(size=(n_samples, dim))
    datasets.append(X)

In [3]:
# SingleBlock() makes all pairwise comparisons of variables within a dataset
# (e.g. computes all pairwise correlations)
pairwise_explorations = []
for d in range(len(datasets)):
    pairwise_explorations.append(SingleBlock().fit(datasets[d]))
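
To make "all pairwise comparisons" concrete, here is a rough sketch of what such an exploration might compute. It is not the library's implementation: the function name `pairwise_correlation_pvals` is hypothetical, and the p-values come from Fisher's z-transform with a normal approximation, which may differ from whatever exact test `SingleBlock` uses. Results are keyed by `frozenset({i, j})` to mirror `comparisons_`.

```python
import itertools
import math

import numpy as np


def pairwise_correlation_pvals(X):
    """Hypothetical stand-in for a SingleBlock-style exploration.

    Tests every unordered pair of columns of X for a nonzero Pearson
    correlation and returns {frozenset({i, j}): two-sided p-value}.
    """
    n, dim = X.shape
    corr = np.corrcoef(X, rowvar=False)  # columns are variables -> dim x dim matrix
    pvals = {}
    for i, j in itertools.combinations(range(dim), 2):
        r = corr[i, j]
        # Fisher z-transform: atanh(r) * sqrt(n - 3) is approximately
        # standard normal when the true correlation is zero
        z = math.atanh(r) * math.sqrt(n - 3)
        # two-sided tail probability of a standard normal: erfc(|z| / sqrt(2))
        pvals[frozenset({i, j})] = math.erfc(abs(z) / math.sqrt(2))
    return pvals


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
pvals = pairwise_correlation_pvals(X)
print(len(pvals))  # 45 pairs for 10 variables
```

`rowvar=False` tells `np.corrcoef` to treat columns as variables, matching the `(n_samples, dim)` layout of the datasets above.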

In [4]:
# (raw) p-value for comparing variables 0 and 1 in dataset 0
# variables 0 and 1 are independent, so the fact that this p-value
# is below 0.05 means we have found a spurious correlation!
pairwise_explorations[0].comparisons_[frozenset({0, 1})].pval_

0.044014173512076225

In [5]:
# the Union() object aggregates every test, which in this case
# means all 4 SingleBlock() objects. We can then control for
# multiple testing using the Benjamini-Hochberg procedure
agg = Union(multi_test='fdr_bh')

# put all tests in a list of tuples where the first element is
# a name and the second element is the test object
all_tests = [('dataset_{}'.format(d), pairwise_explorations[d])
             for d in range(len(datasets))]

agg.add_tests(all_tests)
     
# correct the p-values of every aggregated test
agg.correct_multi_tests()

Union(alpha=0.05, multi_test='fdr_bh')
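
Conceptually, aggregating and correcting amounts to pooling every p-value from every test into one array, applying the Benjamini-Hochberg step-up adjustment to the pooled array, and writing the adjusted values back under their original names. The NumPy sketch below illustrates that idea; the helper names `bh_adjust` and `pool_and_adjust` are hypothetical, and whether `Union` works exactly this way internally is an assumption. (The string `'fdr_bh'` is also the method name used by statsmodels' `multipletests`.)

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                  # indices that sort p ascending
    ranks = np.arange(1, m + 1)
    scaled = p[order] * m / ranks          # p_(k) * m / k
    # step-up monotonicity: adjusted p_(k) = min over j >= k of p_(j) * m / j
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted      # undo the sort
    return adjusted


def pool_and_adjust(named_tests):
    """Hypothetical Union-style correction.

    named_tests: list of (name, {comparison_key: raw p-value}) tuples.
    Every p-value from every test is corrected together as one family.
    Returns {name: {comparison_key: adjusted p-value}}.
    """
    flat_keys, flat_raw = [], []
    for name, pvals in named_tests:
        for key, p in pvals.items():
            flat_keys.append((name, key))
            flat_raw.append(p)
    adjusted = bh_adjust(flat_raw)
    result = {name: {} for name, _ in named_tests}
    for (name, key), p_adj in zip(flat_keys, adjusted):
        result[name][key] = float(p_adj)
    return result


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # [0.02 0.04 0.04 0.02]

tests = [("dataset_0", {frozenset({0, 1}): 0.01, frozenset({0, 2}): 0.04}),
         ("dataset_1", {frozenset({0, 1}): 0.03, frozenset({0, 2}): 0.005})]
adjusted = pool_and_adjust(tests)
print(adjusted["dataset_0"][frozenset({0, 1})])  # 0.02
```

Pooling matters because the scaling factor `m` is the total number of tests across all datasets, not the number within one dataset, so the correction reflects the full family of comparisons made in the analysis.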

In [6]:
# (adjusted) p-value for comparing variables 0 and 1 in dataset 0
# after correcting for multiple testing, variables 0 and 1 no longer
# show a significant association!
pairwise_explorations[0].comparisons_[frozenset({0, 1})].pval_

0.913103155104317

In [7]:
# we can still access the raw p-value through .pval_raw_
pairwise_explorations[0].comparisons_[frozenset({0, 1})].pval_raw_

0.044014173512076225
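
Since the adjusted value replaces `.pval_` while `.pval_raw_` keeps the original, it can be handy to count how many comparisons are significant before and after correction. The helper below is a hypothetical sketch that relies only on the attributes used in this notebook (`comparisons_`, `pval_`, `pval_raw_`); to keep it runnable on its own, the demo feeds it stand-in objects built with `types.SimpleNamespace` (with illustrative values) rather than real `SingleBlock` results.

```python
from types import SimpleNamespace


def count_significant(named_tests, alpha=0.05):
    """Count comparisons below alpha before and after correction.

    named_tests: list of (name, exploration) tuples, where each exploration
    has a comparisons_ dict whose values expose pval_raw_ and pval_.
    Returns {name: (n_significant_raw, n_significant_adjusted)}.
    """
    summary = {}
    for name, exploration in named_tests:
        comparisons = exploration.comparisons_.values()
        n_raw = sum(c.pval_raw_ < alpha for c in comparisons)
        n_adj = sum(c.pval_ < alpha for c in comparisons)
        summary[name] = (n_raw, n_adj)
    return summary


# Stand-in object mimicking the attributes shown above (illustrative values)
fake = SimpleNamespace(comparisons_={
    frozenset({0, 1}): SimpleNamespace(pval_raw_=0.04, pval_=0.91),
    frozenset({0, 2}): SimpleNamespace(pval_raw_=0.30, pval_=0.95),
})
print(count_significant([("dataset_0", fake)]))  # {'dataset_0': (1, 0)}
```

With the real objects, the same `all_tests` list passed to `agg.add_tests` could be given to this helper after calling `agg.correct_multi_tests()`.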