# CorALS - Quickstart

## Prepare parallelization

Before running anything, we make sure that `numpy` does not oversubscribe CPUs and slow things down.
Note that this must be executed **before importing `numpy`**.

* For the full correlation matrix calculation, set `n_threads > 1` to parallelize the computation.
* For the top-k approaches, `n_threads=1` makes the most sense, since their parallelization is controlled separately through `n_jobs`.

In [1]:
from corals.threads import set_threads_for_external_libraries
set_threads_for_external_libraries(n_threads=1)
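
To illustrate why the call has to come before `import numpy`, here is a minimal sketch of the usual mechanism for capping BLAS/OpenMP threads: environment variables that the numerical backends read once at load time. The function name `limit_blas_threads` is hypothetical, and it is an assumption that the CorALS helper works along these lines; its internals are not shown in this quickstart.

```python
import os

def limit_blas_threads(n_threads: int = 1) -> dict:
    """Hypothetical stand-in: cap the thread pools of common numeric backends.

    The variables below are read when the backend library is loaded,
    so setting them after `import numpy` has no effect.
    """
    names = (
        "OMP_NUM_THREADS",         # OpenMP runtimes
        "OPENBLAS_NUM_THREADS",    # OpenBLAS
        "MKL_NUM_THREADS",         # Intel MKL
        "VECLIB_MAXIMUM_THREADS",  # Apple Accelerate
        "NUMEXPR_NUM_THREADS",     # numexpr
    )
    for name in names:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in names}

settings = limit_blas_threads(1)
# `import numpy` would go here, after the variables are set
```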

## Load data

In [2]:
import numpy as np

# create random data
n_features = 20000
n_samples = 50
X = np.random.random((n_samples, n_features))

## Full correlation matrix computation

In [3]:
# runtime: ~2 sec
from corals.correlation.full.default import cor_full
cor_values = cor_full(X)
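
As a point of reference for what a full correlation matrix contains, the following sketch computes a Pearson correlation matrix between the columns (features) of a small array using plain `numpy`. It is an illustration only, not necessarily how `cor_full` is implemented; `pearson_full` and `X_small` are names introduced here.

```python
import numpy as np

def pearson_full(X):
    """Pearson correlation between all pairs of columns of X (samples x features)."""
    Xc = X - X.mean(axis=0)               # center each feature
    Xn = Xc / np.linalg.norm(Xc, axis=0)  # scale each feature to unit length
    return Xn.T @ Xn                      # dot products of unit vectors = correlations

rng = np.random.default_rng(0)
X_small = rng.random((50, 200))           # 50 samples, 200 features
C = pearson_full(X_small)                 # shape (200, 200), ones on the diagonal
```

The result has one entry per feature pair. With `n_features = 20000` that is 4 × 10^8 entries, roughly 3.2 GB if stored as 64-bit floats, which is one reason to look at top-k approaches for larger feature counts.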

## Top-k correlation matrix computation using Spearman correlation

In [4]:
# runtime: ~20 sec with `n_jobs=8`
from corals.correlation.topk.default import cor_topk
cor_topk_result = cor_topk(X, k=0.001, correlation_type="spearman", n_jobs=8)
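
To make the idea concrete, here is a brute-force sketch of top-k Spearman correlation on a small array: Spearman correlation is Pearson correlation on ranks, and the top-k step keeps only the k strongest feature pairs. Several things are assumptions of this sketch rather than statements about `cor_topk`: it takes an integer `k`, ranks pairs by absolute value, counts each unordered pair once, and materializes the full matrix (which a dedicated top-k routine would presumably avoid). All function names are introduced here for illustration.

```python
import numpy as np

def rank_columns(X):
    # Ordinal ranks per column via double argsort.
    # Ties are not averaged, which is fine for continuous random data.
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)

def topk_spearman_bruteforce(X, k):
    R = rank_columns(X)
    Rc = R - R.mean(axis=0)
    Rn = Rc / np.linalg.norm(Rc, axis=0)
    C = Rn.T @ Rn                                    # Spearman matrix (Pearson on ranks)
    rows, cols = np.triu_indices(C.shape[0], k=1)    # each feature pair once, no diagonal
    vals = C[rows, cols]
    top = np.argpartition(-np.abs(vals), k - 1)[:k]  # k largest |r|, unordered
    top = top[np.argsort(-np.abs(vals[top]))]        # order them, strongest first
    return vals[top], (rows[top], cols[top])

rng = np.random.default_rng(0)
X_small = rng.random((50, 200))
top_vals, (top_rows, top_cols) = topk_spearman_bruteforce(X_small, k=10)
```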

## Top-k differential correlation matrix computation using Spearman correlation

In [None]:
# generate some more data
X1 = X
X2 = np.random.random((n_samples, n_features))

In [None]:
from corals.correlation.topkdiff.default import cor_topkdiff
cor_topkdiff_result = cor_topkdiff(X1, X2, k=0.001, correlation_type="spearman", n_jobs=8)
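
For intuition, a brute-force sketch of differential correlation on small arrays: compute the Spearman matrix for each dataset, subtract, and keep the k feature pairs whose correlation changes the most. Treating the differential as a plain difference ranked by absolute value is an assumption of this sketch, and the function names are hypothetical; `cor_topkdiff` may define or approximate it differently.

```python
import numpy as np

def spearman_matrix(X):
    # Pearson correlation on per-column ordinal ranks (ties not averaged)
    R = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    R -= R.mean(axis=0)
    R /= np.linalg.norm(R, axis=0)
    return R.T @ R

def topk_diff_bruteforce(X1, X2, k):
    D = spearman_matrix(X1) - spearman_matrix(X2)  # change in correlation per pair
    rows, cols = np.triu_indices(D.shape[0], k=1)  # each feature pair once
    diffs = D[rows, cols]
    top = np.argsort(-np.abs(diffs))[:k]           # largest absolute changes first
    return diffs[top], (rows[top], cols[top])

rng = np.random.default_rng(0)
A = rng.random((50, 100))                          # condition 1
B = rng.random((50, 100))                          # condition 2, same features
top_diffs, (diff_rows, diff_cols) = topk_diff_bruteforce(A, B, k=5)
```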

## Calculating p-values

In [None]:

# same top-k computation as in the example above, unpacked into values and coordinates
# runtime: ~5 sec with `n_jobs=8`
from corals.correlation.topk.default import cor_topk
cor_topk_values, cor_topk_coo = cor_topk(X, correlation_type="spearman", k=0.001, n_jobs=8)

from corals.correlation.utils import derive_pvalues, multiple_test_correction
n_samples, n_features = X.shape

# calculate p-values
pvalues = derive_pvalues(cor_topk_values, n_samples)

# multiple hypothesis correction
pvalues_corrected = multiple_test_correction(pvalues, n_features, method="fdr_bh")
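
The correction step takes more than the p-values themselves because a top-k result only contains a subset of all tested pairs. The sketch below shows a Benjamini-Hochberg adjustment in plain `numpy` where the total number of tests can be passed explicitly. `bh_adjust` and its `n_tests` parameter are hypothetical names, and it is an assumption that `multiple_test_correction` with `method="fdr_bh"` behaves comparably.

```python
import numpy as np

def bh_adjust(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values.

    n_tests: total number of hypotheses tested; defaults to len(pvalues).
    Passing a larger value accounts for tests whose p-values were not kept.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size if n_tests is None else n_tests
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, p.size + 1)  # p_(i) * m / i
    # enforce monotonicity: running minimum from the largest p-value down
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty_like(p)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

p_example = [0.01, 0.04, 0.03, 0.005]
adj_subset = bh_adjust(p_example)             # treats these 4 as all tests
adj_total = bh_adjust(p_example, n_tests=8)   # same p-values, 8 tests in total
```

When only the smallest p-values are available, the running minimum cannot see the omitted ones, so the adjusted values are at least as large as those from a correction over all tests. How the total number of tests is derived from `n_features` (for example, as the number of distinct feature pairs) is not stated in this quickstart.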