# Description
This notebook demonstrates how to:

1. compute correlation coefficients (CCC, Pearson, and Spearman) between pairs of genes
2. correlate gene expression data with categorical metadata

using CCC-GPU with public data from GTEx v8.

Follow the instructions in the "Quick Install with pip" section of the [README](../README.md) to install CCC-GPU in a conda environment named `ccc-gpu-env`.

Then activate the environment and start the Jupyter Notebook server to run this notebook:

```bash
conda activate ccc-gpu-env
pip install notebook
jupyter notebook
```

In [22]:
import os
import re
import pandas as pd
import urllib.request
from tqdm import tqdm
from pathlib import Path

from ccc.utils import simplify_string
from ccc import conf

In [39]:
# Set this path to the directory where you want to save the intermediate data and results
ANALYSIS_DIR = Path("/mnt/data/proj_data/ccc-gpu/data/tutorial")

## Data Fetching and Preprocessing
This section downloads:
1. the public GTEx v8 gene TPMs data (https://www.gtexportal.org/home/downloads/adult-gtex/bulk_tissue_expression)
2. the GTEx sample attributes file (https://www.gtexportal.org/home/downloads/adult-gtex/metadata)
3. the GTEx subject attributes file (https://www.gtexportal.org/home/downloads/adult-gtex/metadata)

and preprocesses them for the analysis.

### Download GTEx v8 gene expression data and split by tissue

In [24]:
# Create analysis directory if it doesn't exist
os.makedirs(ANALYSIS_DIR, exist_ok=True)

# Define files to download
files_to_download = {
    "gtex_all_sample_ids_with_expr_data": "https://storage.googleapis.com/adult-gtex/bulk-gex/v8/rna-seq/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz",
    "gtex_sample_attrs": "https://storage.googleapis.com/adult-gtex/annotations/v8/metadata-files/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt",
    "gtex_subject_attrs": "https://storage.googleapis.com/adult-gtex/annotations/v8/metadata-files/GTEx_Analysis_v8_Annotations_SubjectPhenotypesDS.txt"
}

# Dictionary to store file paths
file_paths = {}

# Download files
for var_name, url in files_to_download.items():
    filename = Path(url).name
    file_path = Path(ANALYSIS_DIR) / filename
    file_paths[var_name] = file_path
    
    if not file_path.exists():
        print(f"Downloading {var_name} to {file_path}")
        urllib.request.urlretrieve(url, file_path)
        print("Download completed!")
    else:
        print(f"{var_name} already exists at {file_path}")

gtex_all_sample_ids_with_expr_data already exists at /mnt/data/proj_data/ccc-gpu/data/tutorial/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz
gtex_sample_attrs already exists at /mnt/data/proj_data/ccc-gpu/data/tutorial/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
Downloading gtex_subject_attrs to /mnt/data/proj_data/ccc-gpu/data/tutorial/GTEx_Analysis_v8_Annotations_SubjectPhenotypesDS.txt
Download completed!
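
The download cell above treats any existing file as complete, so an interrupted download would be skipped on the next run. A minimal sketch of one way to guard against that, assuming it is acceptable to write to a temporary `.part` file first and rename it once the transfer finishes. The helper name `download_if_missing` is hypothetical and not part of CCC-GPU.

```python
import os
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the file was already there.
    """
    dest = Path(dest)
    if dest.exists():
        return False

    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name in the same directory, so the final rename
    # stays on one filesystem and replaces the target in a single step.
    tmp_path = dest.with_name(dest.name + ".part")
    try:
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, dest)
    finally:
        # Remove a leftover partial file if the download failed midway.
        if tmp_path.exists():
            tmp_path.unlink()
    return True
```

In the loop above, `download_if_missing(url, file_path)` could stand in for the `exists()` check and the `urlretrieve` call.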


In [25]:
gtex_sample_attrs = pd.read_csv(file_paths["gtex_sample_attrs"], sep="\t")
print(f"GTEx sample attributes shape: {gtex_sample_attrs.shape}")
print(f"GTEx sample attributes columns: {gtex_sample_attrs.columns}")

GTEx sample attributes shape: (22951, 63)
GTEx sample attributes columns: Index(['SAMPID', 'SMATSSCR', 'SMCENTER', 'SMPTHNTS', 'SMRIN', 'SMTS', 'SMTSD',
       'SMUBRID', 'SMTSISCH', 'SMTSPAX', 'SMNABTCH', 'SMNABTCHT', 'SMNABTCHD',
       'SMGEBTCH', 'SMGEBTCHD', 'SMGEBTCHT', 'SMAFRZE', 'SMGTC', 'SME2MPRT',
       'SMCHMPRS', 'SMNTRART', 'SMNUMGPS', 'SMMAPRT', 'SMEXNCRT', 'SM550NRM',
       'SMGNSDTC', 'SMUNMPRT', 'SM350NRM', 'SMRDLGTH', 'SMMNCPB', 'SME1MMRT',
       'SMSFLGTH', 'SMESTLBS', 'SMMPPD', 'SMNTERRT', 'SMRRNANM', 'SMRDTTL',
       'SMVQCFL', 'SMMNCV', 'SMTRSCPT', 'SMMPPDPR', 'SMCGLGTH', 'SMGAPPCT',
       'SMUNPDRD', 'SMNTRNRT', 'SMMPUNRT', 'SMEXPEFF', 'SMMPPDUN', 'SME2MMRT',
       'SME2ANTI', 'SMALTALG', 'SME2SNSE', 'SMMFLGTH', 'SME1ANTI', 'SMSPLTRD',
       'SMBSMMRT', 'SME1SNSE', 'SME1PCTS', 'SMRRNART', 'SME1MPRT', 'SMNUM5CD',
       'SMDPMPRT', 'SME2PCTS'],
      dtype='object')


In [26]:
# Get tissue names
gtex_tissues = gtex_sample_attrs["SMTSD"].unique()
print(len(gtex_tissues))
print(gtex_tissues)

55
['Whole Blood' 'Brain - Frontal Cortex (BA9)' 'Adipose - Subcutaneous'
 'Muscle - Skeletal' 'Artery - Tibial' 'Artery - Coronary'
 'Heart - Atrial Appendage' 'Adipose - Visceral (Omentum)' 'Ovary'
 'Uterus' 'Vagina' 'Breast - Mammary Tissue'
 'Skin - Not Sun Exposed (Suprapubic)' 'Minor Salivary Gland'
 'Brain - Cortex' 'Adrenal Gland' 'Thyroid' 'Lung' 'Spleen' 'Pancreas'
 'Esophagus - Muscularis' 'Esophagus - Mucosa'
 'Esophagus - Gastroesophageal Junction' 'Stomach' 'Colon - Sigmoid'
 'Small Intestine - Terminal Ileum' 'Colon - Transverse' 'Prostate'
 'Testis' 'Skin - Sun Exposed (Lower leg)' 'Nerve - Tibial'
 'Heart - Left Ventricle' 'Pituitary' 'Brain - Cerebellum'
 'Cells - Cultured fibroblasts' 'Artery - Aorta'
 'Cells - EBV-transformed lymphocytes' 'Brain - Cerebellar Hemisphere'
 'Brain - Caudate (basal ganglia)'
 'Brain - Nucleus accumbens (basal ganglia)'
 'Brain - Putamen (basal ganglia)' 'Brain - Hypothalamus'
 'Brain - Spinal cord (cervical c-1)' 'Liver' 'Brain - Hippoc

#### Get sample IDs for each tissue

In [27]:
# first, get all sample IDs with expression data
gtex_all_sample_ids_with_expr_data = set(
    pd.read_csv(
        file_paths["gtex_all_sample_ids_with_expr_data"],
        sep="\t",
        skiprows=2,
        nrows=1,
        usecols=lambda x: x not in ("Name", "Description"),
    ).columns
)

print(f"Number of samples with expression data: {len(gtex_all_sample_ids_with_expr_data)}")
print(f"Sample IDs with expression data: {list(gtex_all_sample_ids_with_expr_data)[:10]}")

Number of samples with expression data: 17382
Sample IDs with expression data: ['GTEX-1HFI7-2426-SM-B2LXV', 'GTEX-11TTK-0226-SM-5N9EC', 'GTEX-11UD2-1226-SM-5EQMI', 'GTEX-X4EO-0006-SM-3P5ZF', 'GTEX-13O21-0326-SM-5J1N9', 'GTEX-XBED-1526-SM-4AT5W', 'GTEX-13NZ8-0011-R8b-SM-5KM48', 'GTEX-1H3O1-0005-SM-ACKV8', 'GTEX-13JVG-0011-R5a-SM-5MR4O', 'GTEX-1F88F-1126-SM-7MKHL']
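
The `skiprows=2` argument above reflects the GCT layout: a version line, a line with the matrix dimensions, and then the header row with `Name`, `Description`, and one column per sample. A small sketch that reads only those three lines with the standard library; the function name `read_gct_header` is hypothetical.

```python
import gzip


def read_gct_header(path):
    """Return (version, n_genes, n_samples, sample_ids) from a gzipped GCT file."""
    with gzip.open(path, "rt") as handle:
        # Line 1: format version, e.g. "#1.2"
        version = handle.readline().strip()
        # Line 2: number of rows (genes) and number of sample columns
        n_genes, n_samples = (int(v) for v in handle.readline().split()[:2])
        # Line 3: column names
        columns = handle.readline().rstrip("\n").split("\t")
    # The first two columns hold the gene ID and gene symbol, not samples.
    sample_ids = columns[2:]
    return version, n_genes, n_samples, sample_ids
```

For the GTEx file, `len(sample_ids)` would be expected to equal `n_samples` and the sample count printed above.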


In [28]:
# get sample IDs by tissue
sample_ids_by_tissue = {
    tissue_name: sorted(
        gtex_all_sample_ids_with_expr_data.intersection(
            gtex_sample_attrs.loc[gtex_sample_attrs["SMTSD"] == tissue_name, "SAMPID"]
        )
    )
    for tissue_name in gtex_tissues
}

assert len(gtex_tissues) == len(sample_ids_by_tissue)

In [29]:
sample_ids_by_tissue["Whole Blood"][:10]

['GTEX-111YS-0006-SM-5NQBE',
 'GTEX-1122O-0005-SM-5O99J',
 'GTEX-1128S-0005-SM-5P9HI',
 'GTEX-113IC-0006-SM-5NQ9C',
 'GTEX-113JC-0006-SM-5O997',
 'GTEX-117XS-0005-SM-5PNU6',
 'GTEX-117YW-0005-SM-5NQ8Z',
 'GTEX-1192W-0005-SM-5NQBQ',
 'GTEX-1192X-0005-SM-5NQC3',
 'GTEX-11DXW-0006-SM-5NQ7Y']

In [30]:
# Ensure all IDs are unique
assert all(
    [
        len(sample_ids_by_tissue[tissue_name])
        == len(set(sample_ids_by_tissue[tissue_name]))
        for tissue_name in sample_ids_by_tissue.keys()
    ]
)

#### Show sample size by tissue

In [31]:
tissue_sample_size = pd.DataFrame(
    [{"tissue": k, "sample_size": len(v)} for k, v in sample_ids_by_tissue.items()]
)

tissue_sample_size = tissue_sample_size.sort_values("sample_size", ascending=False)
display(tissue_sample_size)

Unnamed: 0,tissue,sample_size
3,Muscle - Skeletal,803
0,Whole Blood,755
29,Skin - Sun Exposed (Lower leg),701
4,Artery - Tibial,663
2,Adipose - Subcutaneous,663
16,Thyroid,653
30,Nerve - Tibial,619
12,Skin - Not Sun Exposed (Suprapubic),604
17,Lung,578
21,Esophagus - Mucosa,555


In [32]:
# Simple validations
_tmp = tissue_sample_size.set_index("tissue").squeeze()
assert _tmp.loc["Muscle - Skeletal"] == 803
assert _tmp.loc["Whole Blood"] == 755
assert _tmp.loc["Skin - Not Sun Exposed (Suprapubic)"] == 604
assert _tmp.loc["Kidney - Medulla"] == 4

These numbers match the per-tissue sample counts reported by the GTEx Portal: https://gtexportal.org/home/tissueSummaryPage#sampleCountsPerTissue

### Split expression data by tissue

In [33]:
TISSUE_DATA_DIR = ANALYSIS_DIR / "data_by_tissue"
TISSUE_DATA_DIR.mkdir(parents=True, exist_ok=True)

pbar = tqdm(tissue_sample_size["tissue"])

gene_id_symbol_map_tuples = set()

for tissue_name in pbar:
    pbar.set_description(tissue_name)

    tissue_ids = sample_ids_by_tissue[tissue_name]
    if len(tissue_ids) == 0:
        continue

    # Generate output filename
    tissue_name_simple = simplify_string(simplify_string(tissue_name.lower()))
    output_file = TISSUE_DATA_DIR / f"gtex_v8_data_{tissue_name_simple}.pkl"
    output_gene_mappings = ANALYSIS_DIR / "gtex_gene_id_symbol_mappings.pkl"
    
    # Skip if file already exists
    if output_file.exists() and output_gene_mappings.exists():
        print(f"Skipping {tissue_name} - file already exists")
        continue

    try:
        tissue_data = pd.read_csv(
            file_paths["gtex_all_sample_ids_with_expr_data"],
            sep="\t",
            skiprows=2,
            usecols=["Name", "Description"] + tissue_ids,
        )

        tissue_data = tissue_data.rename(
            columns={
                "Name": "gene_ens_id",
                "Description": "gene_symbol",
            }
        )

        # Validate data before processing
        if tissue_data.empty:
            print(f"Warning: No data found for {tissue_name}")
            continue

        # add gene id / gene symbol to mapping variable
        gene_id_symbol_map_tuples.update(
            tissue_data[["gene_ens_id", "gene_symbol"]].itertuples(index=False)
        )

        tissue_data = tissue_data.drop(columns=["gene_symbol"]).set_index("gene_ens_id")

        # Data quality checks
        assert not tissue_data.isna().any().any(), f"NaN values found in {tissue_name}"
        assert tissue_data.index.is_unique, f"Non-unique gene IDs in {tissue_name}"
        assert tissue_data.columns.is_unique, f"Non-unique sample IDs in {tissue_name}"

        # save
        tissue_data.to_pickle(path=output_file)
        
    except Exception as e:
        print(f"Error processing {tissue_name}: {str(e)}")
        continue

Cells - Leukemia cell line (CML): 100%|█████████████████████████████████████████████████████████████████████████████████| 55/55 [00:00<00:00, 4357.51it/s]

Skipping Muscle - Skeletal - file already exists
Skipping Whole Blood - file already exists
Skipping Skin - Sun Exposed (Lower leg) - file already exists
Skipping Artery - Tibial - file already exists
Skipping Adipose - Subcutaneous - file already exists
Skipping Thyroid - file already exists
Skipping Nerve - Tibial - file already exists
Skipping Skin - Not Sun Exposed (Suprapubic) - file already exists
Skipping Lung - file already exists
Skipping Esophagus - Mucosa - file already exists
Skipping Adipose - Visceral (Omentum) - file already exists
Skipping Esophagus - Muscularis - file already exists
Skipping Cells - Cultured fibroblasts - file already exists
Skipping Breast - Mammary Tissue - file already exists
Skipping Heart - Left Ventricle - file already exists
Skipping Artery - Aorta - file already exists
Skipping Heart - Atrial Appendage - file already exists
Skipping Colon - Transverse - file already exists
Skipping Esophagus - Gastroesophageal Junction - file already exists
Ski




In [34]:
# Simple validations
_tmp = pd.read_pickle(TISSUE_DATA_DIR / "gtex_v8_data_brain_cerebellar_hemisphere.pkl")

assert "GTEX-11DXY-0011-R11a-SM-DNZZN" in _tmp.columns
assert "GTEX-WL46-0011-R11A-SM-3MJFT" in _tmp.columns
assert "GTEX-ZF28-0011-R11a-SM-4WWEI" in _tmp.columns

_v = _tmp.loc["ENSG00000223972.5", "GTEX-11DXY-0011-R11a-SM-DNZZN"]
assert _v == 0.04045, _v
_v = _tmp.loc["ENSG00000278267.1", "GTEX-11DXY-0011-R11a-SM-DNZZN"]
assert _v == 0.0, _v

_v = _tmp.loc["ENSG00000233327.10", "GTEX-WL46-0011-R11A-SM-3MJFT"]
assert _v == 146.4000, _v
_v = _tmp.loc["ENSG00000237118.2", "GTEX-WL46-0011-R11A-SM-3MJFT"]
assert _v == 0.3357, _v

_v = _tmp.loc["ENSG00000233327.10", "GTEX-ZF28-0011-R11a-SM-4WWEI"]
assert _v == 30.7200, _v
_v = _tmp.loc["ENSG00000186907.7", "GTEX-ZF28-0011-R11a-SM-4WWEI"]
assert _v == 0.94720, _v

### Save gene mappings

In [37]:
output_gene_mappings = ANALYSIS_DIR / "gtex_gene_id_symbol_mappings.pkl"

if output_gene_mappings.exists():
    gene_mappings = pd.read_pickle(output_gene_mappings)
    print(f"Loaded existing gene mappings from {output_gene_mappings}")
else:
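    # pd.DataFrame does not accept a set (it is unordered), so sort it into a list first
    gene_mappings = pd.DataFrame(
        sorted(gene_id_symbol_map_tuples), columns=["gene_ens_id", "gene_symbol"]
    )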
    gene_mappings.to_pickle(output_gene_mappings)
    print(f"Created and saved gene mappings to {output_gene_mappings}")

print(f"gene_mappings.shape: {gene_mappings.shape}")
print(gene_mappings.head())

Loaded existing gene mappings from /mnt/data/proj_data/ccc-gpu/data/tutorial/gtex_gene_id_symbol_mappings.pkl
gene_mappings.shape: (56200, 2)
          gene_ens_id  gene_symbol
0  ENSG00000144278.14      GALNT13
1   ENSG00000260976.1    LINC01633
2  ENSG00000186660.14        ZFP91
3  ENSG00000123560.13         PLP1
4   ENSG00000227371.1  RP11-3L10.2


In [43]:
# Simple validations
# no null
assert gene_mappings.dropna(how="any").shape == gene_mappings.shape
# no duplicates
assert gene_mappings.drop_duplicates().shape == gene_mappings.shape

_tmp = gene_mappings.set_index("gene_ens_id").squeeze()
assert _tmp.loc["ENSG00000223972.5"] == "DDX11L1"
assert _tmp.loc["ENSG00000243485.5"] == "MIR1302-2HG"
assert _tmp.loc["ENSG00000274059.1"] == "5S_rRNA"  # repeated gene
assert _tmp.loc["ENSG00000275305.1"] == "5S_rRNA"  # repeated gene
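
Because some gene symbols map to more than one Ensembl ID (the `5S_rRNA` case above), a lookup from symbol to ID has to return a list rather than a single value. A minimal sketch using a few pairs from the validations above; the helper name `build_symbol_index` is hypothetical.

```python
from collections import defaultdict


def build_symbol_index(pairs):
    """Map each gene symbol to the sorted list of Ensembl IDs that carry it."""
    index = defaultdict(list)
    for ens_id, symbol in pairs:
        index[symbol].append(ens_id)
    return {symbol: sorted(ids) for symbol, ids in index.items()}


# A few (Ensembl ID, symbol) pairs taken from the validations above.
pairs = [
    ("ENSG00000223972.5", "DDX11L1"),
    ("ENSG00000243485.5", "MIR1302-2HG"),
    ("ENSG00000274059.1", "5S_rRNA"),
    ("ENSG00000275305.1", "5S_rRNA"),
]

symbol_index = build_symbol_index(pairs)

# Symbols that resolve to more than one Ensembl ID need special handling.
ambiguous = {s: ids for s, ids in symbol_index.items() if len(ids) > 1}
print(ambiguous)
```

With the full table, `gene_mappings.itertuples(index=False)` could supply the pairs.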

## Compute correlation coefficients

We provide a command-line tool, `nbs/common/compute_single_gene_pair_correlations_cli.py`, that computes the CCC, Spearman, and Pearson correlation coefficients between two genes in a given tissue.

```bash
usage: compute_single_gene_pair_correlations_cli.py [-h] [--tissue TISSUE] [--data-dir DATA_DIR] [--gene-mapping GENE_MAPPING] [--list-tissues] [--show-genes TISSUE] [--n-genes N_GENES] [--debug] [genes ...]
```

In [53]:
# Make sure you start the notebook from the ROOT directory of the project

# Preview genes in a tissue
%run ./nbs/common/compute_single_gene_pair_correlations_cli.py --show-genes whole_blood --data-dir {TISSUE_DATA_DIR} --gene-mapping {ANALYSIS_DIR}/gtex_gene_id_symbol_mappings.pkl

[2025-09-25 11:33:38,498 - root] INFO: Loading tissue data from: /mnt/data/proj_data/ccc-gpu/data/tutorial/data_by_tissue/gtex_v8_data_whole_blood.pkl
[2025-09-25 11:33:38,644 - root] INFO: Tissue data shape: (56200, 755)
[2025-09-25 11:33:38,644 - root] INFO: Loading gene mapping from: /mnt/data/proj_data/ccc-gpu/data/tutorial/gtex_gene_id_symbol_mappings.pkl
[2025-09-25 11:33:38,649 - root] INFO: Loaded 56200 gene mappings



=== Tissue: whole_blood ===
Total genes: 56,200
Total samples: 755

First 20 genes:
------------------------------------------------------------
#    Gene Symbol     Ensembl ID          
------------------------------------------------------------
1    DDX11L1         ENSG00000223972.5   
2    WASH7P          ENSG00000227232.5   
3    MIR6859-1       ENSG00000278267.1   
4    MIR1302-2HG     ENSG00000243485.5   
5    FAM138A         ENSG00000237613.2   
6    OR4G4P          ENSG00000268020.3   
7    OR4G11P         ENSG00000240361.1   
8    OR4F5           ENSG00000186092.4   
9    RP11-34P13.7    ENSG00000238009.6   
10   CICP27          ENSG00000233750.3   
11   RP11-34P13.15   ENSG00000268903.1   
12   RP11-34P13.16   ENSG00000269981.1   
13   RP11-34P13.14   ENSG00000239906.1   
14   RP11-34P13.13   ENSG00000241860.6   
15   RNU6-1100P      ENSG00000222623.1   
16   RP11-34P13.9    ENSG00000241599.1   
17   ABC7-43046700E7.1 ENSG00000279928.2   
18   RP11-34P13.18   ENSG0000027945

In [56]:
# Compute CCC, Spearman, and Pearson correlations between two genes in a given tissue
%run ./nbs/common/compute_single_gene_pair_correlations_cli.py DDX11L1 WASH7P --tissue whole_blood --data-dir {TISSUE_DATA_DIR} --gene-mapping {ANALYSIS_DIR}/gtex_gene_id_symbol_mappings.pkl

[2025-09-25 11:33:38,676 - root] INFO: Loading gene mapping from: /mnt/data/proj_data/ccc-gpu/data/tutorial/gtex_gene_id_symbol_mappings.pkl
[2025-09-25 11:33:38,681 - root] INFO: Loaded 56200 gene mappings
[2025-09-25 11:33:38,686 - root] INFO: Loading tissue data from: /mnt/data/proj_data/ccc-gpu/data/tutorial/data_by_tissue/gtex_v8_data_whole_blood.pkl
[2025-09-25 11:33:38,824 - root] INFO: Tissue data shape: (56200, 755)
[2025-09-25 11:33:38,827 - root] INFO: Computing correlations for 755 samples
[2025-09-25 11:33:38,832 - root] INFO: Computing CCC correlation...
[2025-09-25 11:33:38,857 - root] INFO: Computing Pearson correlation...
[2025-09-25 11:33:38,871 - root] INFO: Computing Spearman correlation...



GENE PAIR CORRELATION RESULTS
Gene 1: DDX11L1 (ENSG00000223972.5)
Gene 2: WASH7P (ENSG00000227232.5)
Tissue: whole_blood
Samples: 755
------------------------------------------------------------
         CCC: 0.005060
     PEARSON: 0.063041
    SPEARMAN: 0.040069
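
`%run` is an IPython magic, so the cells above only work inside a notebook. A sketch of how the same call might be made from a plain Python script with `subprocess`, using only the flags listed in the usage text above. The helper names are hypothetical, and the script path assumes the working directory is the project root.

```python
import subprocess
import sys
from pathlib import Path

# Path relative to the project root, as used in the cells above.
SCRIPT = Path("nbs/common/compute_single_gene_pair_correlations_cli.py")


def build_gene_pair_command(gene_a, gene_b, tissue, data_dir, gene_mapping):
    """Assemble the argument list for the gene-pair CLI."""
    return [
        sys.executable,
        str(SCRIPT),
        gene_a,
        gene_b,
        "--tissue", tissue,
        "--data-dir", str(data_dir),
        "--gene-mapping", str(gene_mapping),
    ]


def run_gene_pair(*args, **kwargs):
    """Run the CLI and return whatever it wrote to stdout."""
    result = subprocess.run(
        build_gene_pair_command(*args, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_gene_pair("DDX11L1", "WASH7P", "whole_blood", TISSUE_DATA_DIR, ANALYSIS_DIR / "gtex_gene_id_symbol_mappings.pkl")` would mirror the cell above.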



## Metadata Correlation
This section computes the correlation between gene expression and the sample and subject metadata for each tissue. The metadata files were already downloaded above from: https://www.gtexportal.org/home/downloads/adult-gtex/metadata

### Data Preparation

In [None]:
# Load GTEx samples info
gtex_samples = pd.read_csv(file_paths["gtex_sample_attrs"], sep="\t", index_col="SAMPID")
print(gtex_samples.shape)
assert gtex_samples.index.is_unique

(22951, 62)


In [None]:
# Load GTEx subject attributes
gtex_phenotypes = pd.read_csv(file_paths["gtex_subject_attrs"], sep="\t")
print(gtex_phenotypes.shape)
# The default RangeIndex is always unique, so check the subject IDs instead
assert gtex_phenotypes["SUBJID"].is_unique

(980, 4)


In [None]:
# Get GTEx sample IDs
gtex_samples_ids = gtex_samples.index.to_list()
print(gtex_samples_ids[:5])

['GTEX-1117F-0003-SM-58Q7G', 'GTEX-1117F-0003-SM-5DWSB', 'GTEX-1117F-0003-SM-6WBT7', 'GTEX-1117F-0011-R10a-SM-AHZ7F', 'GTEX-1117F-0011-R10b-SM-CYKQ8']


In [None]:
gtex_samples_ids = pd.Series(gtex_samples_ids).rename("SAMPID")
gtex_samples_ids

0             GTEX-1117F-0003-SM-58Q7G
1             GTEX-1117F-0003-SM-5DWSB
2             GTEX-1117F-0003-SM-6WBT7
3        GTEX-1117F-0011-R10a-SM-AHZ7F
4        GTEX-1117F-0011-R10b-SM-CYKQ8
                     ...              
22946                   K-562-SM-E9EZC
22947                   K-562-SM-E9EZI
22948                   K-562-SM-E9EZO
22949                   K-562-SM-E9EZT
22950                   K-562-SM-E9EZZ
Name: SAMPID, Length: 22951, dtype: object

In [None]:
gtex_subjects_ids = gtex_samples_ids.str.extract(
    r"([\w\d]+\-[\w\d]+)", flags=re.IGNORECASE, expand=True
)[0].rename("SUBJID")

gtex_subjects_ids

0        GTEX-1117F
1        GTEX-1117F
2        GTEX-1117F
3        GTEX-1117F
4        GTEX-1117F
            ...    
22946         K-562
22947         K-562
22948         K-562
22949         K-562
22950         K-562
Name: SUBJID, Length: 22951, dtype: object
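
The extraction above keeps the first two hyphen-separated fields of each sample ID. A standalone sketch of the same idea with the standard library, assuming every sample ID starts with its subject ID. The pattern here is a simplified, start-anchored variant of the one in the cell above, and the function name is hypothetical.

```python
import re

# Leading "<prefix>-<donor>" part of a sample ID, e.g. "GTEX-1117F".
SUBJECT_ID_PATTERN = re.compile(r"^([^-]+-[^-]+)")


def subject_id_from_sample_id(sample_id):
    """Return the subject ID embedded in a sample ID, or None if absent."""
    match = SUBJECT_ID_PATTERN.match(sample_id)
    return match.group(1) if match else None


# Sample IDs taken from the outputs above.
for sample_id in ("GTEX-1117F-0011-R10a-SM-AHZ7F", "K-562-SM-E9EZC"):
    print(sample_id, "->", subject_id_from_sample_id(sample_id))
```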

In [None]:
gtex_metadata = pd.concat([gtex_samples_ids, gtex_subjects_ids], axis=1)
gtex_metadata

Unnamed: 0,SAMPID,SUBJID
0,GTEX-1117F-0003-SM-58Q7G,GTEX-1117F
1,GTEX-1117F-0003-SM-5DWSB,GTEX-1117F
2,GTEX-1117F-0003-SM-6WBT7,GTEX-1117F
3,GTEX-1117F-0011-R10a-SM-AHZ7F,GTEX-1117F
4,GTEX-1117F-0011-R10b-SM-CYKQ8,GTEX-1117F
...,...,...
22946,K-562-SM-E9EZC,K-562
22947,K-562-SM-E9EZI,K-562
22948,K-562-SM-E9EZO,K-562
22949,K-562-SM-E9EZT,K-562


In [None]:
gtex_phenotypes

Unnamed: 0,SUBJID,SEX,AGE,DTHHRDY
0,GTEX-1117F,2,60-69,4.0
1,GTEX-111CU,1,50-59,0.0
2,GTEX-111FC,1,60-69,1.0
3,GTEX-111VG,1,60-69,3.0
4,GTEX-111YS,1,60-69,0.0
...,...,...,...,...
975,GTEX-ZYY3,2,60-69,4.0
976,GTEX-ZZ64,1,20-29,0.0
977,GTEX-ZZPT,1,50-59,4.0
978,GTEX-ZZPU,2,50-59,0.0


In [None]:
# SUBJID is the only shared column; name it explicitly as the join key
gtex_metadata = pd.merge(gtex_metadata, gtex_phenotypes, on="SUBJID").set_index("SAMPID")
gtex_metadata

SAMPID,SUBJID,SEX,AGE,DTHHRDY
GTEX-1117F-0003-SM-58Q7G,GTEX-1117F,2,60-69,4.0
GTEX-1117F-0003-SM-5DWSB,GTEX-1117F,2,60-69,4.0
GTEX-1117F-0003-SM-6WBT7,GTEX-1117F,2,60-69,4.0
GTEX-1117F-0011-R10a-SM-AHZ7F,GTEX-1117F,2,60-69,4.0
GTEX-1117F-0011-R10b-SM-CYKQ8,GTEX-1117F,2,60-69,4.0
...,...,...,...,...
K-562-SM-E9EZC,K-562,2,50-59,
K-562-SM-E9EZI,K-562,2,50-59,
K-562-SM-E9EZO,K-562,2,50-59,
K-562-SM-E9EZT,K-562,2,50-59,
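
The merge above silently drops any sample whose subject is missing from the phenotype table, because `pd.merge` defaults to an inner join. A sketch of a stricter variant on toy data, assuming a left join with pandas' `validate` and `indicator` options is acceptable here. The sample IDs `S1` to `S3` are hypothetical; the phenotype rows are copied from the table above.

```python
import pandas as pd

# Toy sample table; S1..S3 are placeholder sample IDs.
samples = pd.DataFrame(
    {
        "SAMPID": ["S1", "S2", "S3"],
        "SUBJID": ["GTEX-1117F", "GTEX-1117F", "GTEX-111CU"],
    }
)

# Two phenotype rows as shown in the table above.
phenotypes = pd.DataFrame(
    {
        "SUBJID": ["GTEX-1117F", "GTEX-111CU"],
        "SEX": [2, 1],
        "AGE": ["60-69", "50-59"],
        "DTHHRDY": [4.0, 0.0],
    }
)

merged = pd.merge(
    samples,
    phenotypes,
    on="SUBJID",
    how="left",              # keep every sample, even without a phenotype row
    validate="many_to_one",  # fail if a subject appears twice in phenotypes
    indicator=True,          # adds a "_merge" column: both / left_only
)

# Report samples that found no matching subject instead of dropping them.
unmatched = merged.loc[merged["_merge"] != "both", "SAMPID"].tolist()
assert not unmatched, f"Samples without phenotype data: {unmatched}"

merged = merged.drop(columns="_merge").set_index("SAMPID")
print(merged)
```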


In [None]:
gtex_metadata = pd.merge(gtex_metadata, gtex_samples, left_index=True, right_index=True)

gtex_metadata = gtex_metadata.replace(
    {
        "SEX": {
            1: "Male",
            2: "Female",
        }
    }
)

gtex_metadata = gtex_metadata.sort_index()

gtex_metadata.head()

SAMPID,SUBJID,SEX,AGE,DTHHRDY,SMATSSCR,SMCENTER,SMPTHNTS,SMRIN,SMTS,SMTSD,...,SME1ANTI,SMSPLTRD,SMBSMMRT,SME1SNSE,SME1PCTS,SMRRNART,SME1MPRT,SMNUM5CD,SMDPMPRT,SME2PCTS
GTEX-1117F-0003-SM-58Q7G,GTEX-1117F,Female,60-69,4.0,,B1,,,Blood,Whole Blood,...,,,,,,,,,,
GTEX-1117F-0003-SM-5DWSB,GTEX-1117F,Female,60-69,4.0,,B1,,,Blood,Whole Blood,...,,,,,,,,,,
GTEX-1117F-0003-SM-6WBT7,GTEX-1117F,Female,60-69,4.0,,B1,,,Blood,Whole Blood,...,,,,,,,,,,
GTEX-1117F-0011-R10a-SM-AHZ7F,GTEX-1117F,Female,60-69,4.0,,"B1, A1",,,Brain,Brain - Frontal Cortex (BA9),...,,,,,,,,,,
GTEX-1117F-0011-R10b-SM-CYKQ8,GTEX-1117F,Female,60-69,4.0,,"B1, A1",,7.2,Brain,Brain - Frontal Cortex (BA9),...,,,,,,,,,,


In [None]:
# Simple validations
assert not gtex_metadata["SUBJID"].isna().any()

assert not gtex_metadata["SMTS"].isna().any()
assert not gtex_metadata["SMTSD"].isna().any()

assert not gtex_metadata["SEX"].isna().any()
assert gtex_metadata["SEX"].unique().shape[0] == 2
assert set(gtex_metadata["SEX"].unique()) == {"Female", "Male"}

In [None]:
# Save metadata
gtex_metadata_filename = ANALYSIS_DIR / "gtex_v8-sample_metadata.pkl"
gtex_metadata.to_pickle(gtex_metadata_filename)

### Metadata correlation
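We also provide a command-line tool, `nbs/common/metadata_corr_cli.py`, that computes the correlation between gene expression and each metadata column, tissue by tissue. Its default `--expr-data-dir` and `--data-dir` values are lab-specific paths, so pass both explicitly, as in the cell below.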

```bash
usage: metadata_corr_cli.py [-h] [--expr-data-dir EXPR_DATA_DIR] [--include [INCLUDE ...]] [--exclude [EXCLUDE ...]] [--permutations PERMUTATIONS]
                            [--n-jobs N_JOBS] [--list-metadata-columns] [--list-tissues] [--output-dir OUTPUT_DIR] [--quiet] [--no-csv-output]
                            [--no-individual-logs] [--data-dir DATA_DIR]
                            gene_symbols [gene_symbols ...]

Analyze gene expression correlations with metadata using CCC across multiple tissues

positional arguments:
  gene_symbols          Gene symbol(s) to analyze (e.g., RASSF2 TP53 BRCA1)

options:
  -h, --help            show this help message and exit
  --expr-data-dir EXPR_DATA_DIR
                        Directory containing expression data files (default: /pividori_lab/haoyu_projects/ccc-gpu/data/gtex/gene_selection/all)
  --include [INCLUDE ...]
                        Include only tissues matching these patterns (fuzzy match on tissue name) (default: None)
  --exclude [EXCLUDE ...]
                        Exclude tissues matching these patterns (fuzzy match on tissue name) (default: None)
  --permutations PERMUTATIONS
                        Number of permutations for p-value calculation (default: 100000)
  --n-jobs N_JOBS       Number of parallel jobs for computation (default: 4)
  --list-metadata-columns
                        List available metadata columns and exit (default: False)
  --list-tissues        List available tissue files and exit (default: False)
  --output-dir OUTPUT_DIR
                        Directory to save output files (default: current directory) (default: .)
  --quiet               Reduce output verbosity for batch processing (default: False)
  --no-csv-output       Skip CSV file generation (only create pickle files) (default: False)
  --no-individual-logs  Skip individual tissue log files (only keep summary logs) (default: False)
  --data-dir DATA_DIR   Directory containing GTEx data files (metadata and gene mappings) (default: /pividori_lab/haoyu_projects/ccc-gpu/data/gtex)
```
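
The `--permutations` option indicates that p-values are estimated by permutation testing. The sketch below shows the general idea, assuming the common `(count + 1) / (n + 1)` estimator; whether the CLI uses exactly this estimator is an assumption. Pearson correlation stands in for CCC to keep the example dependency-free, and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation, used here as a stand-in statistic for CCC."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_pvalue(x, y, statistic, n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling `y` and recomputing the statistic."""
    rng = random.Random(seed)
    observed = abs(statistic(x, y))
    y_perm = list(y)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # break any association between x and y
        if abs(statistic(x, y_perm)) >= observed:
            count += 1
    # Adding 1 to both terms keeps the estimate strictly above zero.
    return (count + 1) / (n_permutations + 1)


x = list(range(30))
y = [2 * v + 1 for v in x]  # perfectly linear in x
print(permutation_pvalue(x, y, pearson, n_permutations=999))
```

Under this estimator the p-value cannot fall below `1 / (n_permutations + 1)`. With the default of 100000 permutations that floor is about 1.00e-05, the value reported for many columns in the log output further down.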

In [None]:
METADATA_CORRELATIONS_RESULT_DIR = ANALYSIS_DIR / "metadata_correlations"
os.makedirs(METADATA_CORRELATIONS_RESULT_DIR, exist_ok=True)

In [None]:
%run ./nbs/common/metadata_corr_cli.py RASSF2 CYTIP --include whole_blood --expr-data-dir {TISSUE_DATA_DIR} --data-dir {ANALYSIS_DIR} --output-dir {METADATA_CORRELATIONS_RESULT_DIR}

Output directory: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations
Summary log file: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/_RASSF2_CYTIP_summary_execution.log
Summary tables file: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/_RASSF2_CYTIP_summary_tables.log
Gene symbols to analyze: RASSF2, CYTIP
Found 1 expression files to process:
  whole_blood: gtex_v8_data_whole_blood.pkl
Loading metadata and gene mapping files...
Loaded metadata: (22951, 66)
Loaded gene mapping: (56200, 2)

PROCESSING GENE 1/2: RASSF2

[1/1] Starting processing for RASSF2 in whole_blood...

Processing tissue: whole_blood
File: gtex_v8_data_whole_blood.pkl
Log file: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/RASSF2_whole_blood.log
Loading expression data...
Expression data shape: (56200, 755)
Gene ID for RASSF2: ENSG00000101265.15
Number of samples: 755
Common samples: 755
Computing CCC between RASSF2 expression and all metadata columns

  CCC: 0.000000, p-value: 1.00e+00
Processing column 2/66: SEX
  CCC: 0.007134, p-value: 1.23e-02
Processing column 3/66: AGE
  CCC: 0.039824, p-value: 1.00e-05
Processing column 4/66: DTHHRDY


  CCC: 0.464582, p-value: 1.00e-05
Processing column 5/66: SMATSSCR
  Skipping SMATSSCR: all values are NaN
Processing column 6/66: SMCENTER
  CCC: 0.108148, p-value: 1.00e-05
Processing column 7/66: SMPTHNTS
  Skipping SMPTHNTS: all values are NaN
Processing column 8/66: SMRIN


  CCC: 0.048847, p-value: 1.00e-05
Processing column 9/66: SMTS
  Skipping SMTS: only 1 unique value(s)
Processing column 10/66: SMTSD
  Skipping SMTSD: only 1 unique value(s)
Processing column 11/66: SMUBRID
  Skipping SMUBRID: only 1 unique value(s)
Processing column 12/66: SMTSISCH


  CCC: 0.528125, p-value: 1.00e-05
Processing column 13/66: SMTSPAX
  Skipping SMTSPAX: all values are NaN
Processing column 14/66: SMNABTCH
  CCC: 0.000884, p-value: 1.00e-05
Processing column 15/66: SMNABTCHT
  Skipping SMNABTCHT: only 1 unique value(s)
Processing column 16/66: SMNABTCHD
  CCC: 0.000900, p-value: 1.00e-05
Processing column 17/66: SMGEBTCH
  CCC: 0.003663, p-value: 1.00e-05
Processing column 18/66: SMGEBTCHD


  CCC: 0.005827, p-value: 1.00e-05
Processing column 19/66: SMGEBTCHT
  Skipping SMGEBTCHT: only 1 unique value(s)
Processing column 20/66: SMAFRZE
  Skipping SMAFRZE: only 1 unique value(s)
Processing column 21/66: SMGTC
  Skipping SMGTC: all values are NaN
Processing column 22/66: SME2MPRT


  CCC: 0.172974, p-value: 1.00e-05
Processing column 23/66: SMCHMPRS


  CCC: 0.143365, p-value: 1.00e-05
Processing column 24/66: SMNTRART


  CCC: 0.243071, p-value: 1.00e-05
Processing column 25/66: SMNUMGPS
  Skipping SMNUMGPS: all values are NaN
Processing column 26/66: SMMAPRT


  CCC: 0.168576, p-value: 1.00e-05
Processing column 27/66: SMEXNCRT


  CCC: 0.040140, p-value: 1.00e-05
Processing column 28/66: SM550NRM
  Skipping SM550NRM: all values are NaN
Processing column 29/66: SMGNSDTC


  CCC: 0.043013, p-value: 1.00e-05
Processing column 30/66: SMUNMPRT
  Skipping SMUNMPRT: only 1 unique value(s)
Processing column 31/66: SM350NRM
  Skipping SM350NRM: all values are NaN
Processing column 32/66: SMRDLGTH


  CCC: 0.000028, p-value: 1.73e-01
Processing column 33/66: SMMNCPB
  Skipping SMMNCPB: all values are NaN
Processing column 34/66: SME1MMRT


  CCC: 0.018125, p-value: 1.40e-04
Processing column 35/66: SMSFLGTH


  CCC: 0.047258, p-value: 1.00e-05
Processing column 36/66: SMESTLBS
  Skipping SMESTLBS: only 1 unique value(s)
Processing column 37/66: SMMPPD


  CCC: 0.007761, p-value: 3.43e-02
Processing column 38/66: SMNTERRT


  CCC: 0.250997, p-value: 1.00e-05
Processing column 39/66: SMRRNANM


  CCC: 0.036631, p-value: 1.00e-05
Processing column 40/66: SMRDTTL


  CCC: 0.010388, p-value: 6.63e-03
Processing column 41/66: SMVQCFL


  CCC: 0.001442, p-value: 9.24e-01
Processing column 42/66: SMMNCV
  Skipping SMMNCV: all values are NaN
Processing column 43/66: SMTRSCPT


  CCC: 0.042714, p-value: 1.00e-05
Processing column 44/66: SMMPPDPR


  CCC: 0.007761, p-value: 3.48e-02
Processing column 45/66: SMCGLGTH
  Skipping SMCGLGTH: all values are NaN
Processing column 46/66: SMGAPPCT
  Skipping SMGAPPCT: all values are NaN
Processing column 47/66: SMUNPDRD
  Skipping SMUNPDRD: only 1 unique value(s)
Processing column 48/66: SMNTRNRT


  CCC: 0.202936, p-value: 1.00e-05
Processing column 49/66: SMMPUNRT


  CCC: 0.168576, p-value: 1.00e-05
Processing column 50/66: SMEXPEFF


  CCC: 0.059931, p-value: 1.00e-05
Processing column 51/66: SMMPPDUN


  CCC: 0.007761, p-value: 3.43e-02
Processing column 52/66: SME2MMRT


  CCC: 0.003990, p-value: 4.05e-01
Processing column 53/66: SME2ANTI


  CCC: 0.020742, p-value: 1.10e-04
Processing column 54/66: SMALTALG


  CCC: 0.177009, p-value: 1.00e-05
Processing column 55/66: SME2SNSE


  CCC: 0.019048, p-value: 1.80e-04
Processing column 56/66: SMMFLGTH


  CCC: 0.019296, p-value: 1.50e-04
Processing column 57/66: SME1ANTI


  CCC: 0.021058, p-value: 1.20e-04
Processing column 58/66: SMSPLTRD


  CCC: 0.057786, p-value: 1.00e-05
Processing column 59/66: SMBSMMRT


  CCC: 0.005333, p-value: 1.83e-01
Processing column 60/66: SME1SNSE


  CCC: 0.022008, p-value: 5.00e-05
Processing column 61/66: SME1PCTS


  CCC: 0.032073, p-value: 2.00e-05
Processing column 62/66: SMRRNART


  CCC: 0.048437, p-value: 1.00e-05
Processing column 63/66: SME1MPRT


  CCC: 0.181940, p-value: 1.00e-05
Processing column 64/66: SMNUM5CD
  Skipping SMNUM5CD: all values are NaN
Processing column 65/66: SMDPMPRT
  Skipping SMDPMPRT: only 1 unique value(s)
Processing column 66/66: SME2PCTS


  CCC: 0.029344, p-value: 1.00e-05

Completed processing whole_blood:
  Total metadata columns: 66
  Successful analyses: 44
  Skipped/Failed: 22
Results for RASSF2 in whole_blood saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/RASSF2_whole_blood_correlation_results.pkl
Log file for RASSF2 in whole_blood saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/RASSF2_whole_blood.log
Runtime for RASSF2 in whole_blood: 9.96 seconds (0.17 minutes)

COMBINED RESULTS SUMMARY
Gene Symbol: RASSF2
Gene ID: ENSG00000101265.15
Permutations: 100,000
Tissues processed: 1
Combined results saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/RASSF2_all_tissues_correlation_results.pkl
Combined results (CSV) saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/RASSF2_all_tissues_correlation_results.csv

Total successful analyses across all tissues: 44

TOP CORRELATIONS ACROSS ALL TISSUES (by absolute CCC value)
Tissue

  CCC: 0.000000, p-value: 1.00e+00
Processing column 2/66: SEX
  CCC: 0.001409, p-value: 3.98e-01
Processing column 3/66: AGE
  CCC: 0.018997, p-value: 1.00e-05
Processing column 4/66: DTHHRDY


  CCC: 0.184226, p-value: 1.00e-05
Processing column 5/66: SMATSSCR
  Skipping SMATSSCR: all values are NaN
Processing column 6/66: SMCENTER
  CCC: 0.084684, p-value: 1.00e-05
Processing column 7/66: SMPTHNTS
  Skipping SMPTHNTS: all values are NaN
Processing column 8/66: SMRIN


  CCC: 0.003196, p-value: 5.68e-01
Processing column 9/66: SMTS
  Skipping SMTS: only 1 unique value(s)
Processing column 10/66: SMTSD
  Skipping SMTSD: only 1 unique value(s)
Processing column 11/66: SMUBRID
  Skipping SMUBRID: only 1 unique value(s)
Processing column 12/66: SMTSISCH


  CCC: 0.215092, p-value: 1.00e-05
Processing column 13/66: SMTSPAX
  Skipping SMTSPAX: all values are NaN
Processing column 14/66: SMNABTCH
  CCC: 0.000304, p-value: 1.00e-05
Processing column 15/66: SMNABTCHT
  Skipping SMNABTCHT: only 1 unique value(s)
Processing column 16/66: SMNABTCHD
  CCC: 0.000256, p-value: 1.00e-05
Processing column 17/66: SMGEBTCH
  CCC: 0.001533, p-value: 1.00e-05
Processing column 18/66: SMGEBTCHD


  CCC: 0.002104, p-value: 1.00e-05
Processing column 19/66: SMGEBTCHT
  Skipping SMGEBTCHT: only 1 unique value(s)
Processing column 20/66: SMAFRZE
  Skipping SMAFRZE: only 1 unique value(s)
Processing column 21/66: SMGTC
  Skipping SMGTC: all values are NaN
Processing column 22/66: SME2MPRT


  CCC: 0.021744, p-value: 5.00e-05
Processing column 23/66: SMCHMPRS


  CCC: 0.015946, p-value: 4.50e-04
Processing column 24/66: SMNTRART


  CCC: 0.024407, p-value: 3.00e-05
Processing column 25/66: SMNUMGPS
  Skipping SMNUMGPS: all values are NaN
Processing column 26/66: SMMAPRT


  CCC: 0.021052, p-value: 7.00e-05
Processing column 27/66: SMEXNCRT


  CCC: 0.126241, p-value: 1.00e-05
Processing column 28/66: SM550NRM
  Skipping SM550NRM: all values are NaN
Processing column 29/66: SMGNSDTC


  CCC: 0.050841, p-value: 1.00e-05
Processing column 30/66: SMUNMPRT
  Skipping SMUNMPRT: only 1 unique value(s)
Processing column 31/66: SM350NRM
  Skipping SM350NRM: all values are NaN
Processing column 32/66: SMRDLGTH


  CCC: 0.000003, p-value: 9.42e-01
Processing column 33/66: SMMNCPB
  Skipping SMMNCPB: all values are NaN
Processing column 34/66: SME1MMRT


  CCC: 0.005089, p-value: 2.16e-01
Processing column 35/66: SMSFLGTH


  CCC: 0.015322, p-value: 7.10e-04
Processing column 36/66: SMESTLBS
  Skipping SMESTLBS: only 1 unique value(s)
Processing column 37/66: SMMPPD


  CCC: 0.007147, p-value: 5.32e-02
Processing column 38/66: SMNTERRT


  CCC: 0.023433, p-value: 4.00e-05
Processing column 39/66: SMRRNANM


  CCC: 0.005677, p-value: 1.45e-01
Processing column 40/66: SMRDTTL


  CCC: 0.008033, p-value: 2.92e-02
Processing column 41/66: SMVQCFL


  CCC: 0.003136, p-value: 6.00e-01
Processing column 42/66: SMMNCV
  Skipping SMMNCV: all values are NaN
Processing column 43/66: SMTRSCPT


  CCC: 0.051533, p-value: 1.00e-05
Processing column 44/66: SMMPPDPR


  CCC: 0.005880, p-value: 1.29e-01
Processing column 45/66: SMCGLGTH
  Skipping SMCGLGTH: all values are NaN
Processing column 46/66: SMGAPPCT
  Skipping SMGAPPCT: all values are NaN
Processing column 47/66: SMUNPDRD
  Skipping SMUNPDRD: only 1 unique value(s)
Processing column 48/66: SMNTRNRT


  CCC: 0.261762, p-value: 1.00e-05
Processing column 49/66: SMMPUNRT


  CCC: 0.021052, p-value: 7.00e-05
Processing column 50/66: SMEXPEFF


  CCC: 0.086945, p-value: 1.00e-05
Processing column 51/66: SMMPPDUN


  CCC: 0.007147, p-value: 5.32e-02
Processing column 52/66: SME2MMRT


  CCC: 0.004187, p-value: 3.68e-01
Processing column 53/66: SME2ANTI


  CCC: 0.007334, p-value: 4.67e-02
Processing column 54/66: SMALTALG


  CCC: 0.038381, p-value: 1.00e-05
Processing column 55/66: SME2SNSE


  CCC: 0.006734, p-value: 7.08e-02
Processing column 56/66: SMMFLGTH


  CCC: 0.010863, p-value: 4.63e-03
Processing column 57/66: SME1ANTI


  CCC: 0.007210, p-value: 5.04e-02
Processing column 58/66: SMSPLTRD


  CCC: 0.030117, p-value: 1.00e-05
Processing column 59/66: SMBSMMRT


  CCC: 0.004293, p-value: 3.48e-01
Processing column 60/66: SME1SNSE


  CCC: 0.007285, p-value: 4.87e-02
Processing column 61/66: SME1PCTS


  CCC: 0.004663, p-value: 2.81e-01
Processing column 62/66: SMRRNART


  CCC: 0.013729, p-value: 1.10e-03
Processing column 63/66: SME1MPRT


  CCC: 0.021952, p-value: 4.00e-05
Processing column 64/66: SMNUM5CD
  Skipping SMNUM5CD: all values are NaN
Processing column 65/66: SMDPMPRT
  Skipping SMDPMPRT: only 1 unique value(s)
Processing column 66/66: SME2PCTS


  CCC: 0.007812, p-value: 3.38e-02

Completed processing whole_blood:
  Total metadata columns: 66
  Successful analyses: 44
  Skipped/Failed: 22
Results for CYTIP in whole_blood saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/CYTIP_whole_blood_correlation_results.pkl
Log file for CYTIP in whole_blood saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/CYTIP_whole_blood.log
Runtime for CYTIP in whole_blood: 9.93 seconds (0.17 minutes)

COMBINED RESULTS SUMMARY
Gene Symbol: CYTIP
Gene ID: ENSG00000115165.9
Permutations: 100,000
Tissues processed: 1
Combined results saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/CYTIP_all_tissues_correlation_results.pkl
Combined results (CSV) saved to: /mnt/data/proj_data/ccc-gpu/data/tutorial/metadata_correlations/CYTIP_all_tissues_correlation_results.csv

Total successful analyses across all tissues: 44

TOP CORRELATIONS ACROSS ALL TISSUES (by absolute CCC value)
Tissue         
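
Many of the p-values in the output above are exactly `1.00e-05`, and the summary reports `Permutations: 100,000`. One plausible reading is that `1.00e-05` is simply the smallest p-value a run with that many permutations can report. The sketch below illustrates this under the assumption that the p-value follows the common add-one permutation estimator, `(k + 1) / (n + 1)`; the output shown here does not confirm that CCC-GPU uses this exact formula, and `permutation_pvalue` is a hypothetical helper, not part of the `ccc` package.

```python
def permutation_pvalue(n_at_least_as_extreme: int, n_permutations: int) -> float:
    """Add-one permutation p-value estimator (an assumption, see the text above).

    n_at_least_as_extreme: number of permuted coefficients that are greater
        than or equal to the observed coefficient.
    n_permutations: total number of permutations performed.
    """
    return (n_at_least_as_extreme + 1) / (n_permutations + 1)


n_perms = 100_000  # matches "Permutations: 100,000" in the summary output

# No permuted coefficient reached the observed one: the smallest possible p-value.
floor_pvalue = permutation_pvalue(0, n_perms)
print(f"{floor_pvalue:.2e}")

# Four permuted coefficients reached the observed one.
print(f"{permutation_pvalue(4, n_perms):.2e}")
```

Under this assumption, a reported `1.00e-05` means "no permutation matched the observed coefficient" rather than an exact probability, so increasing the number of permutations is the only way to resolve smaller p-values.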

In [None]:
# You can find the results in the `METADATA_CORRELATIONS_RESULT_DIR` directory
os.listdir(METADATA_CORRELATIONS_RESULT_DIR)

['RASSF2_all_tissues_correlation_results.csv',
 'CYTIP_whole_blood.log',
 'RASSF2_whole_blood_correlation_results.pkl',
 'RASSF2_all_tissues_correlation_results.pkl',
 'RASSF2_whole_blood.log',
 'CYTIP_all_tissues_correlation_results.pkl',
 'CYTIP_whole_blood_correlation_results.pkl',
 '_all_genes_all_tissues_correlation_results.csv',
 '_RASSF2_CYTIP_summary_tables.log',
 '_RASSF2_CYTIP_summary_execution.log',
 'CYTIP_all_tissues_correlation_results.csv',
 '_all_genes_all_tissues_correlation_results.pkl']
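
Once the result files exist, a natural next step is to rank the metadata columns by coefficient. The sketch below shows one way to do that with pandas. The column names `metadata_column`, `ccc`, and `pvalue` are assumptions made for illustration (the actual schema of the saved CSV files is not shown in this notebook), and `top_correlations` is a hypothetical helper. The toy table reuses four RASSF2 values printed in the log output above.

```python
import pandas as pd


def top_correlations(results: pd.DataFrame, value_col: str = "ccc", n: int = 3) -> pd.DataFrame:
    """Return the n rows with the largest absolute value in `value_col`."""
    ranked = results.sort_values(value_col, ascending=False, key=lambda s: s.abs())
    return ranked.head(n).reset_index(drop=True)


# Toy table with assumed column names; values copied from the RASSF2 log output.
toy = pd.DataFrame(
    {
        "metadata_column": ["SMNTERRT", "SMNTRART", "SMRDLGTH", "SMNTRNRT"],
        "ccc": [0.250997, 0.243071, 0.000028, 0.202936],
        "pvalue": [1.00e-05, 1.00e-05, 1.73e-01, 1.00e-05],
    }
)

print(top_correlations(toy))
```

To apply this to real results, load one of the files listed above, for example with `pd.read_csv(os.path.join(METADATA_CORRELATIONS_RESULT_DIR, "RASSF2_all_tissues_correlation_results.csv"))`, inspect `df.columns` first, and pass the actual coefficient column name as `value_col`.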