zouzhaonan edited this page Apr 17, 2024 · 95 revisions

ChIP-Atlas / Documents

Documents for computational processing in ChIP-Atlas.

Table of Contents

  1. Data source
  2. Primary processing
  3. Data Annotation
  4. Peak Browser
  5. Target Genes
  6. Colocalization
  7. Enrichment Analysis
  8. Diff Analysis
  9. Downloads
  10. External Genome Browser

1. Data source

Currently, most academic journals require authors of studies involving high-throughput sequencing to submit their raw sequence data as SRAs (Sequence Read Archives) to public repositories (NCBI, DDBJ or ENA). Each experiment is assigned an ID, called an experimental accession, beginning with SRX, DRX, or ERX (hereafter ‘SRXs’). ChIP-Atlas refers to the corresponding ‘experiment’ and ‘biosample’ metadata in XML format (available from the NCBI FTP site) and uses the SRXs that meet the following criteria:

  • LIBRARY_STRATEGY == ChIP-Seq, ATAC-Seq, DNase-Hypersensitivity, or Bisulfite-Seq
  • LIBRARY_SOURCE == GENOMIC
  • taxonomy_name == Homo sapiens, Mus musculus, Rattus norvegicus, Caenorhabditis elegans, Drosophila melanogaster, or Saccharomyces cerevisiae
  • INSTRUMENT_MODEL ~ Illumina, NextSeq or HiSeq
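
The four criteria above can be read as a simple predicate over one metadata record. The following is a minimal sketch, not the actual ChIP-Atlas code: the dict layout and the function name `is_eligible` are assumptions made for illustration, and the `~` operator is interpreted here as a substring match.

```python
# Hypothetical sketch of the inclusion criteria listed above.
# The record layout (a flat dict keyed by field name) is an assumption.
ALLOWED_STRATEGIES = {
    "ChIP-Seq", "ATAC-Seq", "DNase-Hypersensitivity", "Bisulfite-Seq",
}
ALLOWED_ORGANISMS = {
    "Homo sapiens", "Mus musculus", "Rattus norvegicus",
    "Caenorhabditis elegans", "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
}
INSTRUMENT_KEYWORDS = ("Illumina", "NextSeq", "HiSeq")


def is_eligible(record: dict) -> bool:
    """Return True if a metadata record meets all four criteria."""
    return (
        record.get("LIBRARY_STRATEGY") in ALLOWED_STRATEGIES
        and record.get("LIBRARY_SOURCE") == "GENOMIC"
        and record.get("taxonomy_name") in ALLOWED_ORGANISMS
        # '~' in the criteria is read here as "contains one of these words".
        and any(k in record.get("INSTRUMENT_MODEL", "")
                for k in INSTRUMENT_KEYWORDS)
    )
```

A record with `LIBRARY_SOURCE` set to anything other than `GENOMIC`, or with a non-Illumina instrument model, is rejected by this predicate.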

2. Primary processing

Introduction

Raw sequence data from SRXs as shown above were aligned to reference genomes with Bowtie2 before being analyzed for coverage in BigWig format and peak-calls in BED format.

Methods

  1. Binarized sequence raw data (.sra) for each SRX were downloaded and decoded into Fastq format with the fastq-dump command of SRA Toolkit (ver 2.3.2-4) with a default mode, except paired-end reads, which were decoded with the --split-files option. In an SRX including multiple runs, decoded Fastq files were concatenated into a single one.

  2. Fastq files were then aligned with Bowtie 2 (ver 2.2.2) with a default mode, except paired-end reads, for which two Fastq files were specified with -1 and -2 options. The following genome assemblies were used for the alignment and subsequent processing:

    • hg38, hg19 (H. sapiens)
    • mm10, mm9 (M. musculus)
    • rn6 (R. norvegicus)
    • dm6, dm3 (D. melanogaster)
    • ce11, ce10 (C. elegans)
    • sacCer3 (S. cerevisiae)
  3. Resultant SAM-formatted files were then binarized into BAM format with SAMtools (ver 0.1.19; samtools view) and sorted (samtools sort) before removing PCR duplicates (samtools rmdup).

  4. BedGraph-formatted coverage scores were calculated with bedtools (ver 2.17.0; genomeCoverageBed) in RPM (Reads Per Million mapped reads) units with the -scale 1000000/N option, where N is the mapped read count after removing PCR duplicates as described in step 3.

  5. BedGraph files were binarized into BigWig format with UCSC bedGraphToBigWig tool (ver 4). BAM files made in (3) were used to peak-call with MACS2 (ver 2.1.0; macs2 callpeak) in BED4 format. Options for Q-value threshold were set (-q 1e-05, 1e-10, or 1e-20), with the options for genome sizes as follows:

    • hg38, hg19: -g hs
    • mm10, mm9: -g mm
    • rn6: -g 2.15e9
    • dm6, dm3: -g dm
    • ce11, ce10: -g ce
    • sacCer3: -g 12100000

    Each row in the BED4 files includes the genomic location in the 1st to 3rd columns and MACS2 score (-10*Log10[MACS2 Q-value]) in the 4th column.

  6. BED4 files were binarized into BigBed format with UCSC bedToBigBed tool (ver 2.5).
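
The arithmetic in steps 4 and 5 can be sketched as follows. This is only an illustration of the formulas stated above (the RPM scale factor 1000000/N and the MACS2 score -10*Log10[Q-value]); the function and variable names are hypothetical and are not part of the ChIP-Atlas pipeline.

```python
import math

# Genome-size arguments for `macs2 callpeak -g`, copied from the list above.
MACS2_GENOME_SIZE = {
    "hg38": "hs", "hg19": "hs",
    "mm10": "mm", "mm9": "mm",
    "rn6": "2.15e9",
    "dm6": "dm", "dm3": "dm",
    "ce11": "ce", "ce10": "ce",
    "sacCer3": "12100000",
}


def rpm_scale(mapped_reads: int) -> float:
    """Scale factor 1000000/N used with the -scale option in step 4."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def macs2_score(q_value: float) -> float:
    """MACS2 score as defined above: -10 * log10(Q-value)."""
    return -10 * math.log10(q_value)


def q_value_from_score(score: float) -> float:
    """Inverse of macs2_score: recover the Q-value from a BED4 score."""
    return 10 ** (-score / 10)
```

For example, an experiment with 20,000,000 mapped reads after duplicate removal gets a scale factor of 0.05, and the three Q-value thresholds 1e-05, 1e-10, and 1e-20 correspond to scores of about 50, 100, and 200.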

3. Data Annotation

Introduction

Experimental materials used for each SRX were manually annotated to allow for extracting data via keywords for track types and cell types.

Methods

  1. Sample metadata for all SRXs (biosample_set.xml) were downloaded from the NCBI FTP site to extract the attributes for antigens and antibodies (see here) as well as cell types and tissues (see here).

  2. According to the attribute values ascribed to each SRX, antigens and cell types used were manually annotated by curators who have been fully trained in molecular and developmental biology. Each annotation has a ‘Class’ and ‘Subclass’ as shown in antigenList.tab (Download, Table schema) and celltypeList.tab (Download, Table schema).

     Guidelines for antigen annotation:

    • Histones: Based on the Brno nomenclature (PMID: 15702071) (e.g., H3K4me3, H3K27ac).
    • Gene-encoded proteins
      • Gene symbols were recorded according to the following gene nomenclature databases:

        • HGNC (H. sapiens)
        • MGI (M. musculus)
        • RGD (R. norvegicus)
        • FlyBase (D. melanogaster)
        • WormBase (C. elegans)
        • SGD (S. cerevisiae)

        (e.g., OCT3/4 → POU5F1; p53 → TP53)
      • Modifications such as phosphorylation were ignored. (e.g., phospho-SMAD3 → SMAD3)

      • If an antibody recognizes multiple molecules in a family, the first in an ascending order was chosen. (e.g., Anti-SMAD2/3 antibody → SMAD2)

  3. Criteria for cell types annotation:

    • H. sapiens, M. musculus and R. norvegicus: Cell types were mainly classified by the tissues from which they were derived. ES and iPS cells were exceptionally classified in the ‘Pluripotent stem cell’ class.

      Cell-type class | Cell type
      Blood | K-562; CD4-Positive T-Lymphocytes
      Breast | MCF-7; T-47D
      Pluripotent stem cell | hESC H1; iPS cells

    • D. melanogaster: Cell types were mainly classified by cell lines and developmental stages.
    • C. elegans: Mainly classified by developmental stages.
    • S. cerevisiae: Classified by yeast strains.
    • Standardized nomenclatures: Nomenclatures of cell lines and tissue names were standardized according to the following frameworks and resources:
      • Supplementary Table S2 in Yu et al. 2015 (PMID: 25877200), proposing unified cell-line names
      • ATCC, a nonprofit repository of cell lines
      • MeSH (Medical Subject Headings) for tissue names
      • FlyBase for cell lines of D. melanogaster

      (e.g., MDA-231, MDA231, MDAMB231 → MDA-MB-231)
  4. Antigens or cell types were classified in the ‘Uncategorized’ class if the curators could not interpret the attribute values.

  5. Antigens or cell types were classified in ‘No description’ class if there was no attribute value.

4. Peak Browser

ChIP-Atlas Peak Browser allows users to browse multiple ChIP-seq peak-calls, such as transcription factors and histone modifications, along with ATAC-seq, DNase-seq, and Bisulfite-seq data on the genome browser IGV. This is useful for predicting cis-regulatory elements, as well as for finding regulatory proteins and the epigenetic status of given regions. BED4-formatted peak-call data from 2.5 were concatenated and converted to BED9 + GFF3 format for browsing on IGV. The BED9 files can be downloaded from the Peak Browser web site, and the table schema is as follows:

Column | Description | Example
Header | Track name and link URL | (Strings)
Column 1 | Chromosome | chr12
Column 2 | Begin | 1234
Column 3 | End | 5678
Column 4* | Sample metadata | (Strings)
Column 5 | -10Log10(MACS2 Q-value) | 345
Column 6 | . | .
Column 7 | Begin (= Column 2) | 1234
Column 8 | End (= Column 3) | 5678
Column 9** | Color code | 255,61,0

  • *Column 4: Sample metadata described in GFF3 format to show annotated antigens and cell types on IGV. Furthermore, mousing over a peak displays the accession number, title, and all attribute values described in the Biosample metadata for the SRX.
  • **Column 9: Heatmap color codes for Column 5. (If Column 5 is 0, 500, or 1000, then colors are blue, green, or red, respectively.)

To find the URLs of the BED9 files, see the Assembled Peak-call data used in “Peak Browser” section of the 9. Downloads chapter.

Annotation tracks

In addition to browsing the above experimental tracks, users can use the “Annotation Tracks” in the Peak Browser tool to visualize functional annotations in the genomic regions of their interest. Available annotation tracks are as follows:

Genome hg38 hg19 mm10 mm9 rn6 dm6 dm3 ce11 ce10 sacCer3
ENCODE Hi-C
GTEx eQTL
ChromHMM
FANTOM5 enhancers
JASPAR TF motif
GWAS Catalog
ClinVar
Orphanet
MGI Phenotype
PhastCons
RepeatMasker
RNA-seq1,2,3,4
Ensembl genes
GENCODE genes
ENCODE Blacklist
CpG Islands

5. Target Genes

Introduction

The ChIP-Atlas Target Genes feature predicts genes directly regulated by given proteins, based on binding profiles of all public ChIP-seq data for particular gene loci. Target genes were accepted if the peak-call intervals of a given protein overlapped with a transcription start site (TSS) ± N kb (N = 1, 5, or 10).

Methods

  1. Peak-call data: BED4-formatted peak-call data of each SRX made in section 2.5 were used (MACS2 Q-value < 1E-05; antigen class = ‘TFs and others’).
  2. Preparation of TSS library: Location of TSSs and gene symbols were according to refFlat files (at UCSC FTP site); only protein-coding genes were used for this analysis.
  3. Preparation of STRING library: STRING is a comprehensive database recording protein-protein and protein-gene interactions based on experimental evidence. A file describing all interactions was downloaded from protein.actions.v10.txt.gz, and the protein IDs were converted to gene symbols with protein.aliases.v10.txt.gz.
  4. Processing: The bedtools window command (bedtools ver 2.17.0) was used to search for target genes by comparing the peak-call data (5.1) with the TSS library (5.2), with a window size option (-w 1000, 5000, or 10000). Peak-call data of the same antigens were collected, and MACS2 scores (-10*Log10[MACS2 Q-value]) were indicated as heatmap colors on the web browser (MACS2 score = 0, 500, 1000 → color = blue, green, red) (see example). If a gene intersected with multiple peaks of a single SRX, the highest MACS2 score was chosen for the color indication. The ‘Average’ column at the far left of the table shows the means of the MACS2 scores in the same row. The ‘STRING’ column on the far right indicates the STRING scores for the protein-gene interaction according to the STRING library (5.3). In more detail, protein-gene pairs in the protein.actions.v10.txt.gz file were extracted when they met all of the following conditions:
    • 1st column (item_id_a) == Query antigen
    • 2nd column (item_id_b) == Target gene
    • 3rd column (mode) == "expression"
    • 5th column (a_is_acting) == "1"
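
The window search and the "highest score per gene" rule in step 4 can be sketched in plain Python. This is not the bedtools implementation: the function name, the tuple layouts, and the exact handling of interval edges are assumptions chosen for illustration.

```python
from collections import defaultdict


def target_gene_scores(peaks, tss_list, window_bp):
    """Return {gene symbol: highest peak score within TSS +/- window_bp}.

    peaks:    iterable of (chrom, start, end, score) from one SRX (BED4).
    tss_list: iterable of (chrom, tss_position, gene_symbol).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, score in peaks:
        by_chrom[chrom].append((start, end, score))

    best = {}
    for chrom, tss, gene in tss_list:
        lo, hi = tss - window_bp, tss + window_bp
        for start, end, score in by_chrom.get(chrom, ()):
            # Half-open BED interval [start, end) against the window [lo, hi].
            if start <= hi and end > lo:
                # Keep only the highest score when several peaks hit a gene.
                if gene not in best or score > best[gene]:
                    best[gene] = score
    return best
```

Calling the function with `window_bp` set to 1000, 5000, or 10000 mirrors the three -w settings; genes without any peak in the window are simply absent from the returned dict. The linear scan per gene is kept for readability and would be too slow for genome-wide data without an index.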

6. Colocalization

Introduction

Many transcription factors (TFs) form complexes to promote or enhance transcriptional activity (e.g., Pou5f1, Nanog, and Sox2 in mouse ES cells). ChIP-seq profiles of such TFs are often similar, showing colocalization on multiple genomic regions. The ChIP-Atlas Colocalization predicts colocalization partners of given TFs, evaluated through comprehensive and combinatorial similarity analyses of all public ChIP-seq data.

Algorithms

BED4-formatted peak-call data made in section 2.5 were analyzed to evaluate the similarities to other peak-call data in identical cell-type classes. Their similarities were analyzed with CoLo, a tool to evaluate the colocalization of transcription factors (TFs) with multiple ChIP-seq peak-call data. Advantages of CoLo are:

(a) it compensates for biases derived from different experimental conditions.
(b) it adjusts the difference of the peak numbers and distributions coming from innate characteristics of the TFs.

The function (a) is programmed so that MACS2 scores in each BED4 file were fitted to a Gaussian distribution, dividing the BED4 files into three groups:

  • H (High binding; Z-score > 0.5)
  • M (Middle binding; -0.5 ≤ Z-score ≤ 0.5)
  • L (Low binding; Z-score < -0.5)

These three groups are used as independent data to evaluate similarity through the function (b). Thus, CoLo evaluates the similarity of two SRXs (e.g., SRX_1 and SRX_2) with nine combinations:

[H/M/L of SRX_1] x [H/M/L of SRX_2]

Eventually, a set of nine Boolean results (similar or not) is returned to indicate the similarity of SRX_1 and SRX_2.

Methods

  1. Peak-call data: Same as (5.1).
  2. STRING library: Same as (5.3).
  3. Processing: Peak-call data in identical cell-type classes were processed through CoLo. The scores between the two BED files were calculated by multiplication of the combination of the H (= 3), M (= 2), or L (= 1) as follows:
SRX_1 SRX_2 Scores
H H 9
H M 6
H L 3
M H 6
M M 4
M L 2
L H 3
L M 2
L L 1

If multiple H/M/L combinations were returned from SRX_1 and SRX_2, the highest score was adopted. The scores (1 to 9) were colored on a scale from blue through green to red, and gray was used if all nine H/M/L combinations were false (see example). The ‘Average’ column on the far left of the table shows the means of the CoLo scores in the same row. The ‘STRING’ column on the far right indicates the STRING scores for the protein-protein interaction (6.2). In more detail, protein-protein pairs in the protein.actions.v10.txt.gz file were extracted if they met all of the following conditions:

  • 1st column (item_id_a) == query antigen
  • 2nd column (item_id_b) == co-association partner
  • 3rd column (mode) == "binding"
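
The H/M/L grouping and the score table above reduce to a few lines of code. The sketch below only reproduces the thresholds and the multiplication rule stated in this section; it does not reproduce CoLo's similarity test itself, and the function names are hypothetical.

```python
# Weights for the three binding groups, as given in the score table above.
WEIGHT = {"H": 3, "M": 2, "L": 1}


def binding_group(z_score: float) -> str:
    """Assign a peak to H, M, or L from the Z-score of its MACS2 score."""
    if z_score > 0.5:
        return "H"
    if z_score < -0.5:
        return "L"
    return "M"  # -0.5 <= Z-score <= 0.5


def colo_score(similar_pairs):
    """Highest score among the H/M/L combinations judged similar.

    similar_pairs: iterable of (group of SRX_1, group of SRX_2) for which
    the similarity test returned true. Returns None if there are none
    (the gray case).
    """
    scores = [WEIGHT[a] * WEIGHT[b] for a, b in similar_pairs]
    return max(scores) if scores else None
```

For instance, if only the combinations (H, M) and (L, L) are flagged as similar, the adopted score is 6.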

7. Enrichment Analysis

Introduction

ChIP-Atlas Enrichment Analysis accepts users’ data in the following three formats:

  • Genomic regions in BED format (to search proteins bound to the regions)
  • Sequence motif (to search proteins bound to the motif)
  • Gene list (to search proteins bound to the genes)

In addition, the following analyses are possible by specifying the data for comparison on the submission form of Enrichment Analysis:

Data in panel 4 | Data in panel 5 | Aims and analyses
BED | Random permutation | Proteins bound to BED intervals more often than by chance.
BED | BED | Proteins differentially bound between the two sets of BED intervals.
Motif | Random permutation | Proteins bound to a sequence motif more often than by chance.
Motif | Motif | Proteins differentially bound between the two motifs.
Genes | RefSeq coding genes | Proteins bound to genes more often than other RefSeq genes.
Genes | Genes | Proteins differentially bound between the two sets of gene lists.

Requirements and acceptable data

  • Reference peak-call data (upper panels (1 to 3) of the submission form): Comprehensive peak-call data as described above (4. Peak browser). The result will be returned more quickly if the classes of antigens and cell-types are specified.

  • BED (lower panels (4 and 5) of the submission form): UCSC BED format, minimally requiring three tab-delimited columns describing chromosome, and starting and ending positions.

    chr1<tab>1435385<tab>1436458
    chrX<tab>4634643<tab>4635798

    A header and column 4 or later can be included, but they are ignored for the analysis. Note that only BED files in the following genome assemblies are acceptable:

    • hg38, hg19 (H. sapiens)
    • mm10, mm9 (M. musculus)
    • rn6 (R. norvegicus)
    • dm6, dm3 (D. melanogaster)
    • ce11, ce10 (C. elegans)
    • sacCer3 (S. cerevisiae)

    If the BED file is in another genome assembly, convert it to a suitable one with the UCSC liftOver tool.

  • Motif (lower panels (4 and 5) of the submission form): A sequence motif described in IUPAC nucleic acid notation. In addition to normal codes (ATGC), ambiguity codes are also acceptable (WSMKRYBDHVN).

  • Gene list (lower panels (4 and 5) of the submission form): Gene symbols must be entered according to the following nomenclatures:

    • HGNC (H. sapiens)
    • MGI (M. musculus)
    • RGD (R. norvegicus)
    • FlyBase (D. melanogaster)
    • WormBase (C. elegans)
    • SGD (S. cerevisiae)

    (e.g., OCT3/4 → POU5F1; p53 → TP53)
    If the gene lists are described in any other format (e.g., gene IDs in RefSeq or Ensembl format), use a batch conversion tool such as DAVID (Convert into OFFICIAL_GENE_SYMBOL with Gene ID Conversion Tool).
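
For the motif input described above, the IUPAC ambiguity codes can be expanded into a regular expression to see which concrete sequences a motif covers. This is a small illustrative sketch using Python's `re` module, not the Bowtie-based search that ChIP-Atlas uses; it scans the forward strand only, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile an IUPAC motif into a regular expression."""
    try:
        return re.compile("".join(IUPAC[base] for base in motif.upper()))
    except KeyError as exc:
        raise ValueError(f"not an IUPAC nucleotide code: {exc.args[0]}") from None


def find_motif(motif: str, sequence: str) -> list:
    """Return 0-based start positions of all forward-strand matches."""
    pattern = motif_to_regex(motif)
    seq = sequence.upper()
    # Test every start position so that overlapping matches are reported.
    return [i for i in range(len(seq) - len(motif) + 1) if pattern.match(seq, i)]
```

A motif containing a character outside the accepted codes (ATGC and WSMKRYBDHVN) raises `ValueError`, which mirrors the input restriction stated above.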

Methods

  1. Submitted data are converted to BED files depending on the data types.
  • BED: Submitted BED files are used only for further processing. If ‘Random permutation’ is set for the comparison, the submitted BED intervals are permuted to a random position on a random chromosome the specified number of times with the bedtools shuffle command (bedtools; ver 2.17.0).

  • Motif: Genomic locations perfectly matching the submitted sequence are searched with Bowtie (ver 0.12.8) and converted to BED format. If ‘Random permutation’ is set for the comparison, the BED is used for random permutation as described above.

  • Gene list: Unique TSSs of submitted genes are defined with the xxxCanonical.txt.gz* library distributed from the UCSC FTP site.
    * xxx is a placeholder for:

    • “known” (H. sapiens and M. musculus)
    • “flyBase” (D. melanogaster)
    • “sanger” (C. elegans)
    • “sgd” (S. cerevisiae)

    Unique TSSs of R. norvegicus genes are defined with a gene list distributed from RGD.

    The locations of TSSs are converted to BED format with the addition of widths specified in ‘Distance range from TSS’ on the submission form. If ‘RefSeq coding gene’ is set for the comparison, RefSeq coding genes excluding those in submitted list are processed to BED format as mentioned above.

  2. The overlaps between the BED (originating from panels 4 and 5 of the submission form) and the reference peak-call data (specified in upper panels 1 to 3 of the submission form) are counted with the bedtools intersect command (BedTools2; ver 2.23.0).
  3. P-values are calculated with the two-tailed Fisher’s exact probability test (see example). The null hypothesis is that reference peaks intersect with the submitted data in panel 4 in the same proportion as with the data in panel 5 of the submission form. Q-values are calculated with the Benjamini & Hochberg method.
  4. Fold enrichment is calculated as (column 6) / (column 7) of the same row. If the ratio is > 1, the rightmost column is ‘TRUE’, meaning that the proteins from column 3 bind to the data of panel 4 in a greater proportion than to those of panel 5 specified in the submission form.
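
The two statistical steps named above, the two-tailed Fisher's exact test and the Benjamini & Hochberg correction, can be sketched with the standard library alone. This is an illustration of the textbook methods, not the code run by ChIP-Atlas; the 2x2 table layout (overlapping vs. non-overlapping intervals for panel 4 in the first row and for panel 5 in the second) and the function names are assumptions.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (a small relative tolerance guards against rounding error).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini & Hochberg Q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic Q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

In this layout, one P-value is computed per reference experiment, and the whole list of P-values is then passed to `benjamini_hochberg` once.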

API of Enrichment Analysis

An API is available to perform Enrichment Analysis programmatically. Please see here for the details.

8. Diff Analysis

Introduction

The Diff Analysis tool can be used to identify differential peak regions (DPRs) or differentially methylated regions (DMRs) from two given sets of ChIP/ATAC/DNase-seq or Bisulfite-seq data, respectively.

Requirements and acceptable data

Experiment type (panel (1) of the submission form):

  • ChIP/ATAC/DNase-seq: to obtain DPRs from two given sets of ChIP-seq, ATAC-seq, or DNase-seq data.
  • Bisulfite-Seq: to obtain DMRs from two given sets of Bisulfite-seq data.

Experiment ID(s) (panels (2 and 3) of the submission form):

  • ID(s) of NCBI, ENA, and DDBJ (e.g., SRX18419259, ERX1103210, DRX335588)
  • ID(s) of NCBI GEO (e.g., GSM6765200)
  • Publicly accessible URL(s) of data on custom web server
    • ChIP/ATAC/DNase-seq: The URLs of raw read coverage (UCSC bigWig) in integer values and peak-call (UCSC BED) data, and total number of mapped reads. Try with the following example in hg38.
      • Dataset A (tab-separated)
        https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bed	205201674
        https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bed	208332830
      • Dataset B (tab-separated)
        https://chip-atlas.dbcls.jp/data/manual/examples/sample_B1.bw	https://chip-atlas.dbcls.jp/data/manual/example/sample_B1.bed	87753996
        https://chip-atlas.dbcls.jp/data/manual/examples/sample_B2.bw	https://chip-atlas.dbcls.jp/data/manual/example/sample_B2.bed	90562744
    • Bisulfite-seq: BigWig data of methylation rate (between 0 and 1). Try with the following example in rn6.
      • Dataset A
        https://chip-atlas.dbcls.jp/data/manual/examples/sample_A3.bw
      • Dataset B
        https://chip-atlas.dbcls.jp/data/manual/examples/sample_B3.bw
    The Dataset Search tool is useful to find the IDs of your interest by keyword search. NCBI GEO is also useful for finding the IDs of interest within an identical series of experiments.

Methods

DPR detection

  1. Alignment data (BigWig) and peak calling data (BED) for ChIP-seq, ATAC-seq, and DNase-seq recorded in ChIP-Atlas are identified based on submitted IDs and used for comparative analysis.

  2. To detect DPRs from ChIP-seq, ATAC-seq, or DNase-seq datasets, the BigWig file of each query experiment is converted to bedGraph format.

  3. The raw sequence read coverage is reconstructed with reference to the total mapped sequencing read count of the experiment.

  4. The entire genome is fragmented based on the peak calling data from the query experiments.

  5. The number of sequence reads aligning with each genome fragment is aggregated and the result is organized into an m × n matrix, with m representing the number of genome fragments and n representing the number of query experiments.

  6. The matrix is entered into the R package edgeR, and the difference in read counts between the two sets of query experiments is assessed for each genome fragment using the standard algorithm used for detecting differentially expressed genes in comparative transcriptome analysis.

  7. The outcomes are further summarized and documented in BED format, which includes the coordinates of the genome fragment in columns 1 to 3 and the corresponding intergroup statistical values in columns 4 and beyond. Our approach refers to the R package DiffBind but differs in that we do not utilize the BAM files required for directly applying DiffBind.
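
Steps 3 to 5 above can be illustrated with a small sketch that turns RPM coverage back into raw coverage and fills the m × n matrix. How ChIP-Atlas actually aggregates reads per fragment is not specified here, so summing the per-base raw coverage over each fragment is an assumption made purely for illustration, as are all function names. The edgeR step is not reproduced.

```python
def raw_coverage(rpm_value: float, total_mapped_reads: int) -> int:
    """Undo RPM scaling: raw = RPM * N / 1,000,000, rounded to an integer."""
    return round(rpm_value * total_mapped_reads / 1_000_000)


def fragment_signal(bedgraph, fragment, total_mapped_reads):
    """Sum raw per-base coverage over one genome fragment.

    bedgraph: list of (start, end, rpm) on the fragment's chromosome.
    fragment: (start, end) of the genome fragment.
    """
    f_start, f_end = fragment
    total = 0
    for start, end, rpm in bedgraph:
        overlap = min(end, f_end) - max(start, f_start)
        if overlap > 0:
            total += overlap * raw_coverage(rpm, total_mapped_reads)
    return total


def count_matrix(fragments, experiments):
    """Build the m x n matrix (m fragments, n query experiments).

    experiments: list of (bedgraph, total_mapped_reads), one per experiment.
    """
    return [
        [fragment_signal(bg, frag, n_reads) for bg, n_reads in experiments]
        for frag in fragments
    ]
```

Each row of the returned matrix corresponds to one genome fragment and each column to one query experiment, which is the shape described in step 5.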

DMR detection

  1. Methylation level data (BigWig) for Bisulfite-seq recorded in ChIP-Atlas are identified based on submitted IDs and used for comparative analysis.

  2. To detect DMRs from Bisulfite-seq datasets, the BigWig file of each query experiment is converted to bedGraph format.

  3. A DMR detector named metilene (PMID: 26631489) is used. In particular, the metilene_input.pl* script provided with metilene aggregates methylation rates per genomic base for each query experiment, using the bedGraph files generated in the prior step as input.

  4. The resulting TSV file is used as the input into the main metilene* program, which returns DMRs along with statistics such as mean methylation differences and Q-values in a BED format.

  • * Default parameters are applied when executing both the metilene_input.pl script and the metilene command.

Download and view the results

Once the calculations are complete, the results (ZIP format) can be downloaded from the results page.

.zip: Compressed file consisting of the following files.

  • .igv.xml: IGV session file to display DPRs or DMRs together with the original data to be compared.
  • .log: Analysis log to identify DPRs or DMRs.
  • .bed: DPRs or DMRs in BED9 format suitable for further analysis.
  • .igv.bed: DPRs or DMRs in BED9 + GFF3 format suitable for browsing on IGV

API of Diff Analysis

An API is available to perform Diff Analysis programmatically. Please see here for the details.

9. Downloads

Data for each SRX

All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas are described in experimentList.tab (Download, Table schema)

  • BigWig Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bw/Experimental_ID.bw

    Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bw/SRX097088.bw

  • Peak-call (BED) Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bedThreshold/Experimental_ID.Threshold.bed (Threshold = 05, 10, or 20)

    Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed (Peak-call data of SRX097088 with Q-value < 1E-05.)

  • Peak-call (BigBed) Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bbThreshold/Experimental_ID.Threshold.bb (Threshold = 05, 10, or 20)

    Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bb05/SRX097088.05.bb (Peak-call data of SRX097088 with Q-value < 1E-05.)
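
The URL templates above can be filled in programmatically. The helper below is a minimal sketch that only assembles strings according to those templates; the function names are hypothetical, and no request is sent.

```python
BASE = "https://chip-atlas.dbcls.jp/data"
THRESHOLDS = ("05", "10", "20")  # Q-value < 1E-05, 1E-10, 1E-20


def bigwig_url(genome: str, experiment_id: str) -> str:
    """URL of the BigWig coverage file for one experiment."""
    return f"{BASE}/{genome}/eachData/bw/{experiment_id}.bw"


def peak_url(genome: str, experiment_id: str,
             threshold: str = "05", fmt: str = "bed") -> str:
    """URL of the peak-call file in BED ('bed') or BigBed ('bb') format."""
    if threshold not in THRESHOLDS:
        raise ValueError(f"threshold must be one of {THRESHOLDS}")
    if fmt not in ("bed", "bb"):
        raise ValueError("fmt must be 'bed' or 'bb'")
    return (f"{BASE}/{genome}/eachData/{fmt}{threshold}/"
            f"{experiment_id}.{threshold}.{fmt}")
```

With `genome="hg19"` and `experiment_id="SRX097088"`, these functions reproduce the three example URLs listed above.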


Assembled Peak-call data used in “Peak Browser”

Download URL: https://chip-atlas.dbcls.jp/data/Genome/assembled/File_name.bed (Genome and File_name are listed in fileList.tab [Download, Table schema])

Example: https://chip-atlas.dbcls.jp/data/hg19/assembled/Oth.ALL.05.GATA2.AllCell.bed (All peak-call data of GATA2 in all cell types with Q-value < 1E-05.)

Note: As the assembled peak-call data used in “Peak Browser” are very large, we recommend downloading the lighter version of all peak-call data (see below URLs and table schema) and joining the SRXs with the sample metadata described in experimentList.tab (Download, Table schema) on a command-line interface.


  • Table schema of the lighter version of all peak-call data:

    Column | Description | Example
    Column 1 | Chromosome | chr12
    Column 2 | Begin | 1234
    Column 3 | End | 5678
    Column 4 | SRX | SRX344646
    Column 5 | -10Log10(MACS2 Q-value) | 345
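
Joining the lighter peak-call data with experimentList.tab amounts to a lookup on the SRX in column 4. The sketch below assumes tab-separated input lines laid out as in the two table schemas in this chapter; the function names are hypothetical, and any iterable of lines (an open file, a list of strings) can be passed in.

```python
def load_experiment_list(lines):
    """Map (experiment ID, genome) to columns 3-6 of experimentList.tab.

    The value is (track type class, track type, cell type class, cell type).
    """
    meta = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 6:
            meta[(cols[0], cols[1])] = tuple(cols[2:6])
    return meta


def annotate_peaks(lines, meta, genome):
    """Yield each peak row followed by its sample annotation.

    Peaks whose SRX is missing from `meta` get four empty strings.
    """
    for line in lines:
        chrom, start, end, srx, score = line.rstrip("\n").split("\t")[:5]
        annotation = meta.get((srx, genome), ("", "", "", ""))
        yield (chrom, int(start), int(end), srx, float(score)) + annotation
```

The genome assembly is part of the lookup key because the same experiment ID can be listed once per assembly in experimentList.tab.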

Analyzed data used in “Target Genes”

Download URL: https://chip-atlas.dbcls.jp/data/Genome/target/Protein.Distance.tsv (Proteins are listed in analysisList.tab [Download, Table schema]) (Distance = 1, 5, or 10, indicating the distance [kb] from TSS.)

Example: https://chip-atlas.dbcls.jp/data/hg19/target/POU5F1.5.tsv (TSV file describing the genes bound by POU5F1 at TSS ± 5 kb.)


Analyzed data used in “Colocalization”

Download URL: https://chip-atlas.dbcls.jp/data/Genome/colo/Protein.Cell_type_class.tsv (Protein and Cell_type_class are listed in analysisList.tab [Download, Table schema])

Example: https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv (TSV file describing the proteins colocalizing with POU5F1 in Pluripotent stem cell.) (Spaces in the name of cell type class must be replaced with underscores _.)
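
The download URLs for “Target Genes” and “Colocalization” follow the templates above and can be assembled as shown in this minimal sketch. The function names are hypothetical; the only logic beyond string formatting is the replacement of spaces with underscores in the cell type class.

```python
BASE = "https://chip-atlas.dbcls.jp/data"


def target_genes_url(genome: str, protein: str, distance_kb: int) -> str:
    """URL of the Target Genes TSV for TSS +/- distance_kb (1, 5, or 10)."""
    if distance_kb not in (1, 5, 10):
        raise ValueError("distance_kb must be 1, 5, or 10")
    return f"{BASE}/{genome}/target/{protein}.{distance_kb}.tsv"


def colocalization_url(genome: str, protein: str, cell_type_class: str) -> str:
    """URL of the Colocalization TSV for a protein and cell type class."""
    # Spaces in the cell type class must be replaced with underscores.
    class_name = cell_type_class.replace(" ", "_")
    return f"{BASE}/{genome}/colo/{protein}.{class_name}.tsv"
```

For POU5F1 in hg19, these functions reproduce the two example URLs given above.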


Tables summarizing metadata and files

  • experimentList.tab (Download) All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas.
Column | Description | Example
1 | Experimental ID (SRX, ERX, DRX) | SRX097088
2 | Genome assembly | hg19
3 | Track type class | TFs and others
4 | Track type | GATA2
5 | Cell type class | Blood
6 | Cell type | K-562
7 | Cell type description | Primary Tissue=Blood|Tissue Diagnosis=Leukemia Chronic Myelogenous
8 | Processing logs of ChIP, ATAC, DNase-seq (# of reads, % mapped, % duplicates, # of peaks [Q < 1E-05]) | 30180878,82.3,42.1,6691
8 | Processing logs of Bisulfite-seq (# of reads, % mapped, × coverage, # of hyper MR) | 132179672,88.1,3.4,311292
9 | Title | GSM722415: GATA2 K562bmp r1 110325 3
10- | Meta data submitted by authors | source_name=GATA2 ChIP-seq K562 BMP; cell line=K562; chip antibody=GATA2; antibody catalog number=Santa Cruz SC-9008

  • fileList.tab (Download) All assembled peak-call data used in Peak Browser.
Column | Description | Example
1 | File name | Oth.ALL.05.GATA2.AllCell
2 | Genome assembly | hg19
3 | Track type class | TFs and others
4 | Track type | GATA2
5 | Cell type class | All cell types
6 | Cell type | -
7 | Threshold | 05 (indicating Q-value < 1E-05)
8 | Experimental IDs included | SRX070877,SRX150427,SRX092303,SRX070876,SRX150668,...

  • analysisList.tab (Download) All proteins shown in “Target Genes” and “Colocalization”.
Column | Description | Example
1 | Antigen | POU5F1
2 | Cell type class in Colocalization | Epidermis,Pluripotent stem cell
3 | Recorded (+) or not (-) in Target Genes | +
4 | Genome assembly | hg19

  • antigenList.tab (Download) All track types recorded in ChIP-Atlas.
Column | Description | Example
1 | Genome assembly | hg19
2 | Track type class | TFs and others
3 | Track type | POU5F1
4 | Number of experiments | 24
5 | Experimental IDs included | SRX011571,SRX011572,SRX017276,SRX021069,SRX021070,...


  • celltypeList.tab (Download) All cell types recorded in ChIP-Atlas.
Column | Description | Example
1 | Genome assembly | hg19
2 | Cell type class | Prostate
3 | Cell type | VCaP
4 | Number of experiments | 185
5 | Experimental IDs included | SRX020917,SRX020918,SRX020919,SRX020920,SRX020921,...

10. External Genome Browser

BigBed and BigWig format files in the ChIP-Atlas database can now be browsed on the UCSC Genome Browser. Use the links below to jump to the UCSC Genome Browser.