# Description
This notebook demonstrates how to use CCC-GPU with public GTEx v8 data to:

1. compute coefficient values
2. correlate gene expression data with categorical metadata
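
For readers new to the coefficient: CCC (the Clustermatch Correlation Coefficient) compares two features by partitioning each one into clusters (quantile bins for numerical data, the categories themselves for categorical data), scoring every pair of partitions with the adjusted Rand index (ARI), and keeping the best score. The NumPy code below is a simplified, CPU-only sketch of that idea under our own assumptions (rank-based equal-frequency bins, up to `max_k=10` clusters, negative scores floored at 0). It is not CCC-GPU's implementation or API, and all function names in it are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the CCC idea; NOT the ccc / CCC-GPU API.


def quantile_partition(x, k):
    """Split numerical values into (at most) k equal-frequency bins, labelled 0..k-1."""
    x = np.asarray(x)
    # 0-based rank of each value; tied values share the lowest rank, hence a bin
    ranks = np.searchsorted(np.sort(x), x, side="left")
    return ranks * k // len(x)


def partitions(v, max_k):
    """All candidate partitions of one feature."""
    v = np.asarray(v)
    if not np.issubdtype(v.dtype, np.number):
        # Categorical feature: its categories form the only partition
        return [v]
    return [quantile_partition(v, k) for k in range(2, max_k + 1)]


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same samples."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Contingency table: number of samples in each (cluster of a, cluster of b)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)

    def pairs(counts):
        # Number of unordered sample pairs within each count, summed
        return (counts * (counts - 1) / 2).sum()

    sum_ij = pairs(table)
    sum_a = pairs(table.sum(axis=1))
    sum_b = pairs(table.sum(axis=0))
    expected = sum_a * sum_b / pairs(np.array([len(a_idx)]))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every sample in one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def ccc_sketch(x, y, max_k=10):
    """Maximum ARI over all partition pairs of x and y, floored at 0."""
    best = max(
        adjusted_rand_index(px, py)
        for px in partitions(x, max_k)
        for py in partitions(y, max_k)
    )
    return max(best, 0.0)


x = np.arange(100, dtype=float)
print(ccc_sketch(x, x ** 3))          # monotonic relationship: 1.0
print(ccc_sketch(x, (x - 50) ** 2))   # U-shaped relationship: clearly above 0
tissue = np.array(["liver"] * 50 + ["lung"] * 50)
print(ccc_sketch(np.r_[x[:50], x[:50] + 100], tissue))  # numerical vs categorical: 1.0
```

The same function handles the notebook's two use cases: numerical vs. numerical (coefficient values) and numerical vs. categorical (metadata), because a categorical feature simply contributes its own labels as a single partition.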

Please follow the instructions in the [README](../README.md), section "Quick Install with pip", to install CCC-GPU in a conda environment named `ccc-gpu-env`.

Then activate the environment, install Jupyter Notebook, and start the notebook server to run this notebook:

```bash
conda activate ccc-gpu-env
pip install notebook
jupyter notebook
```
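
Before running the cells below, it can help to confirm that the notebook kernel is actually using the `ccc-gpu-env` interpreter. The snippet below is a small convenience check (not part of CCC-GPU; `check_modules` is a hypothetical helper): it prints the running Python executable and reports whether `ccc` (the package imported later in this notebook), `pandas`, and `tqdm` can be found.

```python
import importlib.util
import sys


def check_modules(names):
    """Map each top-level module name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python executable:", sys.executable)
for name, found in check_modules(["ccc", "pandas", "tqdm"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If `ccc` is reported as missing, the kernel is most likely not running inside `ccc-gpu-env`.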

In [1]:
import pandas as pd
from tqdm import tqdm

from ccc.utils import simplify_string
from ccc import conf

In [2]:
# Set this path to the directory where you want to save the intermediate data and results
ANALYSIS_DIR = "/mnt/data/proj_data/ccc-gpu/data/tutorial"
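
If you run this notebook on more than one machine, one option is to let an environment variable override the hard-coded path. `CCC_TUTORIAL_DIR` and `resolve_analysis_dir` below are hypothetical names chosen for this sketch; CCC-GPU itself does not read this variable.

```python
import os
from pathlib import Path

# Fallback used when the hypothetical CCC_TUTORIAL_DIR variable is not set
DEFAULT_ANALYSIS_DIR = "/mnt/data/proj_data/ccc-gpu/data/tutorial"


def resolve_analysis_dir(env_var="CCC_TUTORIAL_DIR", default=DEFAULT_ANALYSIS_DIR):
    """Return the analysis directory, preferring the environment variable if set."""
    return Path(os.environ.get(env_var, default)).expanduser()


ANALYSIS_DIR = str(resolve_analysis_dir())
print(f"Using analysis directory: {ANALYSIS_DIR}")
```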

## Data Fetching and Preprocessing
This section downloads the public GTEx v8 gene TPM data (https://www.gtexportal.org/home/downloads/adult-gtex/bulk_tissue_expression) and preprocesses it for the analysis.

In [None]:
import os
import urllib.request
from pathlib import Path

# Create the analysis directory if it doesn't exist
os.makedirs(ANALYSIS_DIR, exist_ok=True)

# Download the GTEx v8 gene TPM matrix (GCT format, gzip-compressed)
gtex_url = "https://storage.googleapis.com/adult-gtex/bulk-gex/v8/rna-seq/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz"
gtex_file = Path(ANALYSIS_DIR) / "GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz"

if not gtex_file.exists():
    print(f"Downloading GTEx v8 gene TPM data to {gtex_file}")
    # Write to a temporary ".part" file and rename it only after the download
    # finishes, so an interrupted run is not mistaken for a complete file later
    tmp_file = gtex_file.with_name(gtex_file.name + ".part")
    urllib.request.urlretrieve(gtex_url, tmp_file)
    tmp_file.replace(gtex_file)
    print("Download completed!")
else:
    print(f"GTEx data already exists at {gtex_file}")
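
The downloaded file is a gzip-compressed GCT table. Assuming the standard GCT v1.2 layout, the first line is the version tag (`#1.2`), the second gives the numbers of rows and columns, and the third holds the column names: `Name` (gene ID), `Description` (gene symbol), then one column per sample. A minimal loader could look like the sketch below; `read_gct` is a hypothetical helper, not a CCC-GPU function.

```python
import pandas as pd


def read_gct(path):
    """Read a GCT v1.2 file into (genes x samples TPM DataFrame, gene symbols)."""
    # skiprows=2 drops the '#1.2' version line and the '<rows>\t<cols>' line;
    # gzip compression is inferred by pandas from the '.gz' extension.
    df = pd.read_csv(path, sep="\t", skiprows=2, index_col="Name")
    # 'Description' holds gene symbols; split it off so only numbers remain
    gene_symbols = df.pop("Description")
    return df, gene_symbols


# Usage (loads the full matrix, which can take a while and needs plenty of RAM):
# gene_tpm, gene_symbols = read_gct(gtex_file)
```

The full file covers every GTEx v8 sample, so if memory is tight, passing `usecols` to `pd.read_csv` to load only the samples you need is one way to reduce the footprint.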
