# 02: Cluster analysis: DBSCAN

**Author:** Grace Akatsu

**Class:** CPBS 7602, Fall 2025

---
## Overview
This notebook clusters the GTEx data prepared in notebook 01 (top 10 tissues, top 5,000 most variable genes, standardized) using DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
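
For orientation, DBSCAN groups points that lie in dense regions and labels isolated points as noise. The sketch below is a minimal illustration on made-up 2-D points, assuming scikit-learn's `DBSCAN` is available. The toy data and the `eps` and `min_samples` values are illustrative assumptions, not the settings used in this notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (hypothetical): two tight groups and one isolated point.
toy_points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],  # dense group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],  # dense group B
    [20.0, 20.0],                                    # isolated point
])

# eps: neighborhood radius; min_samples: number of points (including the
# point itself) required within eps for a point to count as a core point.
toy_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(toy_points)

# Points labeled -1 are treated as noise; other integers are cluster IDs.
n_clusters = len(set(toy_labels) - {-1})
n_noise = int(np.sum(toy_labels == -1))
print(toy_labels, n_clusters, n_noise)
```

Because the number of clusters is not specified up front, the result depends entirely on how `eps` and `min_samples` are chosen, which is why parameter tuning matters for this method.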

## Table of Contents
*   [Import libraries](#import_libraries)
*   [Set paths and seed](#set_paths)
*   [Read in data](#read_data)
*   [Parameter tuning: consensus](#read_data)
---

## Import libraries <a class="anchor" id="import_libraries"></a>

In [None]:
import os
import numpy as np
import pandas as pd

## Set paths and seed <a class="anchor" id="set_paths"></a>

In [2]:
# Project root for this assignment (machine-specific; adjust for your environment)
BASE_DIR = "/Users/akatsug/OneDrive - The University of Colorado Denver/CPBS_7602_big_data_in_biomedical_informatics/assignment01"

# Input: standardized expression data produced by notebook 01
DATA_FILE = os.path.join(BASE_DIR, "clean_data", "gtex_top10_tissues_top5000_variable_genes_standardized.csv")

# Output directory for DBSCAN results; created if it does not already exist
DBSCAN_OUTPUTS = os.path.join(BASE_DIR, "DBSCAN_outputs")
os.makedirs(DBSCAN_OUTPUTS, exist_ok=True)

In [None]:
# Seed NumPy's global random number generator for reproducibility
np.random.seed(0)
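
`np.random.seed` sets NumPy's global random state. As a hedged illustration of what seeding buys, the sketch below checks that repeated draws match after reseeding, and shows a local `numpy.random.Generator` as an alternative that keeps the seed explicit. The `SEED` name is an assumption introduced for illustration and does not appear in the notebook.

```python
import numpy as np

SEED = 0  # hypothetical constant name; the notebook seeds with 0 directly

# Two generators created from the same seed produce identical draws.
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)
draws_a = rng_a.normal(size=5)
draws_b = rng_b.normal(size=5)
same_draws = bool(np.array_equal(draws_a, draws_b))

# Reseeding the global state, as the notebook does, also repeats the draws.
np.random.seed(SEED)
first = np.random.rand(3)
np.random.seed(SEED)
second = np.random.rand(3)
same_global = bool(np.array_equal(first, second))

print(same_draws, same_global)
```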

## Read in data <a class="anchor" id="read_data"></a>

In [None]:
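# Load the expression data, using the sample ID column as the row index
data = pd.read_csv(
    DATA_FILE,
    index_col="SAMPID",
)

# Preview the first five rows
data.head()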

| Unnamed: 0 | SAMPID | Tissue | ENSG00000244734.3 | ENSG00000188536.12 | ENSG00000198804.2 | ENSG00000198938.2 | ENSG00000163220.10 | ENSG00000198899.2 | ENSG00000198886.2 | ENSG00000198712.1 | ... | ENSG00000261236.7 | ENSG00000188112.8 | ENSG00000170035.15 | ENSG00000024862.17 | ENSG00000213619.9 | ENSG00000176087.14 | ENSG00000115596.3 | ENSG00000138386.16 | ENSG00000182872.15 | ENSG00000070669.16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | GTEX-1117F-0226-SM-5GZZ7 | Adipose - Subcutaneous | -0.317682 | -0.320624 | -0.822484 | -0.623593 | -0.278153 | -0.845429 | -0.920044 | -0.94437 | ... | -0.259958 | 0.087855 | 0.714803 | 0.76994 | 0.143898 | -0.47663 | -0.245165 | -0.547999 | -0.027921 | -0.037272 |
| 1 | GTEX-1117F-0426-SM-5EGHI | Muscle - Skeletal | -0.319849 | -0.32218 | 0.219063 | 1.660759 | -0.278587 | 1.28664 | 0.307966 | 0.663645 | ... | 1.470513 | -0.589025 | -0.033767 | -0.57209 | 0.467826 | -0.715005 | -0.381079 | -1.292842 | -0.339644 | -1.050707 |
| 2 | GTEX-1117F-0526-SM-5EGHJ | Artery - Tibial | -0.31943 | -0.321842 | -0.872736 | -0.647149 | -0.278998 | -0.710659 | -0.839425 | -0.911312 | ... | -0.509545 | -0.527738 | -0.000939 | -0.229877 | -0.396138 | -1.214898 | -0.315997 | -0.447965 | 1.078165 | 0.048887 |
| 3 | GTEX-1117F-2926-SM-5GZYI | Skin - Not Sun Exposed (Suprapubic) | -0.320043 | -0.321507 | -0.525812 | -0.609139 | -0.276681 | -0.63397 | -0.646396 | -0.870145 | ... | -0.344541 | 0.708257 | -0.008799 | 0.094763 | -0.049997 | -0.492368 | -0.070677 | 1.017824 | 0.276392 | 0.139678 |
| 4 | GTEX-111CU-0226-SM-5GZXC | Thyroid | -0.321842 | -0.323617 | -0.064829 | 0.022578 | -0.267405 | 0.605461 | 0.046808 | 0.438473 | ... | 0.941758 | 0.128075 | 0.421663 | 0.986367 | 0.335941 | 1.52109 | -0.36826 | -0.127483 | 0.448233 | 5.554269 |
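
The preview shows a `Tissue` label column alongside the gene columns, plus an `Unnamed: 0` column, which usually appears when a CSV was saved together with its row index. Since DBSCAN operates on numeric features, one plausible next step is to separate the labels from the expression values. The sketch below does this on a tiny made-up CSV with the same layout. The gene names `GENE_A` and `GENE_B`, the sample IDs, and the variable names `toy_data`, `tissue_labels`, and `expression` are all hypothetical.

```python
import io
import pandas as pd

# Tiny stand-in CSV (hypothetical values) mimicking the layout of the real file.
toy_csv = io.StringIO(
    ",SAMPID,Tissue,GENE_A,GENE_B\n"
    "0,S1,Thyroid,0.5,-1.2\n"
    "1,S2,Muscle - Skeletal,-0.3,0.8\n"
    "2,S3,Thyroid,1.1,0.1\n"
)
toy_data = pd.read_csv(toy_csv, index_col="SAMPID")

# Drop the leftover saved-index column if it is present.
toy_data = toy_data.drop(columns=["Unnamed: 0"], errors="ignore")

# Keep tissue labels aside for later comparison with cluster assignments,
# and keep only the numeric gene columns as clustering input.
tissue_labels = toy_data["Tissue"]
expression = toy_data.drop(columns=["Tissue"])

print(expression.shape)
print(tissue_labels.value_counts())
```

Keeping the tissue labels in a separate Series makes it straightforward to compare cluster assignments against known tissue types afterwards without the labels leaking into the distance computation.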
