[Step 3] Database structure
Mikhail Dozmorov edited this page Feb 24, 2016 · 1 revision
The database structure is organized to simplify navigation and the selection of cell type-specific regulatory datasets, or categories of such datasets.
The ENCODE data is organized using the `[category]/([subcategory])/[tier]` schema.
- The DNase, Histone, and TFBS_cellspecific categories contain the corresponding cell type-specific regulatory data. The TFBS_combined category contains a summary, not specific to any cell type, of the binding of 161 transcription factors. The [chromStates](ENCODE chromStates) category contains cell type-specific chromatin states obtained using different methods.
- The `tier` system reflects the [cell type specificity and quality](ENCODE cell types) of the data.
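As a rough illustration of the ENCODE schema above, the sketch below splits a relative folder path into its parts. It assumes that the parentheses around `[subcategory]` mean the level is optional; the function name `parse_encode_path` and the example folder names are hypothetical and are not taken from the database itself.

```python
from typing import NamedTuple, Optional


class EncodePath(NamedTuple):
    category: str
    subcategory: Optional[str]  # None when the optional level is absent
    tier: str


def parse_encode_path(path: str) -> EncodePath:
    """Split a relative path laid out as [category]/([subcategory])/[tier]."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) == 2:
        # No subcategory level: [category]/[tier]
        return EncodePath(parts[0], None, parts[1])
    if len(parts) == 3:
        # Full form: [category]/[subcategory]/[tier]
        return EncodePath(parts[0], parts[1], parts[2])
    raise ValueError(f"expected 2 or 3 path components, got {len(parts)}: {path!r}")


# Hypothetical folder names, used only to exercise the function.
print(parse_encode_path("Histone/SomeTier"))
print(parse_encode_path("chromStates/SomeMethod/SomeTier"))
```

Returning `None` for a missing subcategory keeps the two path shapes in one record type, so downstream filtering can always refer to `category` and `tier` by name.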
The Roadmap Epigenomics data follows the `[category]/[cell/tissue type]` schema.
- The DNase/Histone categories contain the corresponding cell type-specific regulatory data. The _bPk/_gPk/_nPk suffixes correspond to peaks called using the broad/gapped/narrow peak settings, respectively. See the "c. Peak Calling" section for more details. We recommend using the _bPk data. The processed/imputed suffixes correspond to experimentally obtained/computationally imputed regulatory data, respectively. See "Imputed signal tracks" for more details. We recommend using the processed data.
- The `cell/tissue type` system organizes data derived from [general anatomical categories](Roadmap cell types).
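The suffix conventions above can be checked programmatically when choosing a Roadmap category. The following is a minimal sketch that only looks for the documented `_bPk`/`_gPk`/`_nPk` and `processed`/`imputed` markers inside a category name. How the markers are joined in real folder names is an assumption here: the example name `Histone_bPk-processed` and the function `describe_roadmap_category` are hypothetical.

```python
from typing import Dict, Optional, Union

# Peak-calling suffixes and data-type markers as described in the text above.
PEAK_SUFFIXES = {"_bPk": "broad", "_gPk": "gapped", "_nPk": "narrow"}
DATA_TYPES = ("processed", "imputed")


def describe_roadmap_category(name: str) -> Dict[str, Union[Optional[str], bool]]:
    """Report which peak setting and data type a category name mentions."""
    peaks = next((label for suffix, label in PEAK_SUFFIXES.items() if suffix in name), None)
    data = next((kind for kind in DATA_TYPES if kind in name), None)
    return {
        "peaks": peaks,
        "data": data,
        # The text recommends broad peaks (_bPk) and processed data.
        "recommended": peaks == "broad" and data == "processed",
    }


# Hypothetical category name, used only for illustration.
print(describe_roadmap_category("Histone_bPk-processed"))
# {'peaks': 'broad', 'data': 'processed', 'recommended': True}
```

Matching on substrings rather than a fixed delimiter keeps the sketch independent of the exact way the suffixes are combined in the folder names.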
The file names generally follow the `[cell]-[factor]-[category]` schema, so regulatory datasets can be identified quickly without consulting a detailed description.
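To show how the file-name schema can be used, here is a small sketch that splits a file name into its cell, factor, and category parts. Because names only "generally" follow the schema, it returns `None` for anything that does not have exactly three hyphen-separated fields. The file extension handling, the function name `parse_dataset_name`, and the example file name are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    cell: str
    factor: str
    category: str


def parse_dataset_name(file_name: str) -> Optional[DatasetName]:
    """Split a name shaped like [cell]-[factor]-[category]; None if it does not fit."""
    base = PurePosixPath(file_name).name  # drop any leading folders
    stem = base.split(".", 1)[0]          # assumption: everything after the first dot is an extension
    parts = stem.split("-")
    if len(parts) != 3 or not all(parts):
        return None
    return DatasetName(*parts)


# Hypothetical file name, used only for illustration.
print(parse_dataset_name("Gm12878-CTCF-TFBS.bed.gz"))
# DatasetName(cell='Gm12878', factor='CTCF', category='TFBS')
```

A cell or factor name that itself contains a hyphen would not fit this simple split and would come back as `None`, which is a reasonable signal to fall back on the detailed description.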