
ChIP-Atlas / Documents

Documents for computational processing in ChIP-Atlas.

1. Data source

Currently, most academic journals require authors of studies that include high-throughput sequencing to submit their raw sequence data as SRAs (Sequence Read Archives) to public repositories (NCBI, DDBJ, or ENA). Each experiment is assigned an ID, called an experimental accession, beginning with SRX, DRX, or ERX (hereafter ‘SRXs’). Using the corresponding ‘experiment’ and ‘biosample’ metadata in XML format (available from the NCBI FTP site), ChIP-Atlas selects SRXs that meet the following criteria:

LIBRARY_STRATEGY == ChIP-Seq, ATAC-Seq, DNase-Hypersensitivity, or Bisulfite-Seq
LIBRARY_SOURCE == GENOMIC
taxonomy_name == Homo sapiens, Mus musculus, Rattus norvegicus, Caenorhabditis elegans, Drosophila melanogaster, or Saccharomyces cerevisiae
INSTRUMENT_MODEL ~ Illumina, NextSeq, or HiSeq (i.e., the model name contains one of these terms)
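The selection can be expressed as a single predicate over one SRX's metadata. This is a minimal sketch, assuming the XML has already been flattened into a dictionary keyed by the field names above (that flattening step is hypothetical and not shown), that `~` means "the model name contains one of these terms", and that the other fields are compared as exact strings.

```python
from typing import Mapping

# Allowed values, copied from the criteria listed above.
LIBRARY_STRATEGIES = {"ChIP-Seq", "ATAC-Seq", "DNase-Hypersensitivity", "Bisulfite-Seq"}
TAXA = {
    "Homo sapiens",
    "Mus musculus",
    "Rattus norvegicus",
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
}
# Assumption: "~" means the instrument model name contains one of these terms.
INSTRUMENT_TERMS = ("Illumina", "NextSeq", "HiSeq")


def passes_criteria(meta: Mapping[str, str]) -> bool:
    """Return True if one SRX's flattened metadata meets every selection criterion.

    `meta` is a hypothetical dict keyed by the XML field names used above.
    """
    model = meta.get("INSTRUMENT_MODEL", "")
    return (
        meta.get("LIBRARY_STRATEGY") in LIBRARY_STRATEGIES
        and meta.get("LIBRARY_SOURCE") == "GENOMIC"
        and meta.get("taxonomy_name") in TAXA
        and any(term in model for term in INSTRUMENT_TERMS)
    )


example = {
    "LIBRARY_STRATEGY": "ChIP-Seq",
    "LIBRARY_SOURCE": "GENOMIC",
    "taxonomy_name": "Homo sapiens",
    "INSTRUMENT_MODEL": "Illumina HiSeq 2000",
}
print(passes_criteria(example))  # True
```

Real SRA records may spell strategies with different capitalization (e.g., "ATAC-seq"), so a production filter would likely need case-insensitive comparison.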

2. Primary processing

Introduction

Raw sequence data of the SRXs selected above were aligned to reference genomes with Bowtie 2, and the alignments were then used to compute coverage (BigWig format) and peak-calls (BED format).

Methods

1. Raw sequence data in binary format (.sra) for each SRX were downloaded and decoded into Fastq format with the fastq-dump command of SRA Toolkit (ver 2.3.2-4) in default mode, except for paired-end reads, which were decoded with the --split-files option. For an SRX comprising multiple runs, the decoded Fastq files were concatenated into a single file.
2. Fastq files were then aligned with Bowtie 2 (ver 2.2.2) in default mode, except for paired-end reads, for which the two Fastq files were specified with the -1 and -2 options. The following genome assemblies were used for the alignment and subsequent processing:
   - hg38, hg19 (H. sapiens)
   - mm10, mm9 (M. musculus)
   - rn6 (R. norvegicus)
   - dm6, dm3 (D. melanogaster)
   - ce11, ce10 (C. elegans)
   - sacCer3 (S. cerevisiae)
3. The resulting SAM files were converted to binary BAM format with SAMtools (ver 0.1.19; samtools view) and sorted (samtools sort), and PCR duplicates were then removed (samtools rmdup).
4. BedGraph-formatted coverage scores were calculated with bedtools (ver 2.17.0; genomeCoverageBed) in RPM (Reads Per Million mapped reads) units with the -scale 1000000/N option, where N is the number of mapped reads remaining after PCR duplicate removal (samtools rmdup). The BedGraph files were then converted to BigWig format with the UCSC bedGraphToBigWig tool (ver 4).
5. The deduplicated BAM files were used for peak-calling with MACS2 (ver 2.1.0; macs2 callpeak), producing BED4 files. The Q-value threshold was set with the -q option (1e-05, 1e-10, or 1e-20), and the genome size with the -g option as follows:
   - hg38, hg19: -g hs
   - mm10, mm9: -g mm
   - rn6: -g 2.15e9
   - dm6, dm3: -g dm
   - ce11, ce10: -g ce
   - sacCer3: -g 12100000
   Each row of a BED4 file contains the genomic location in columns 1 to 3 and the MACS2 score (-10*Log₁₀[MACS2 Q-value]) in column 4.
   The BED4 files were converted to BigBed format with the UCSC bedToBigBed tool (ver 2.5).
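The arithmetic behind the RPM scaling and the MACS2 score, together with the genome-size table, can be sketched in Python as follows. This is an illustration only (Python 3.9+); `callpeak_args` is a hypothetical helper that assembles just the -t, -g, and -q options named above and does not run MACS2, and any further options ChIP-Atlas may pass are not documented here.

```python
import math

# Values passed to `macs2 callpeak -g`, as listed above.
MACS2_GENOME_SIZE = {
    "hg38": "hs", "hg19": "hs",
    "mm10": "mm", "mm9": "mm",
    "rn6": "2.15e9",
    "dm6": "dm", "dm3": "dm",
    "ce11": "ce", "ce10": "ce",
    "sacCer3": "12100000",
}


def rpm_scale(mapped_reads: int) -> float:
    """Scale factor for genomeCoverageBed -scale: 1,000,000 / N (N = deduplicated mapped reads)."""
    return 1_000_000 / mapped_reads


def macs2_score(q_value: float) -> float:
    """MACS2 score stored in column 4 of the BED4 files: -10 * log10(Q-value)."""
    return -10 * math.log10(q_value)


def callpeak_args(bam: str, genome: str, threshold: str) -> list[str]:
    """Hypothetical helper: argument list for one threshold ('05', '10' or '20')."""
    if threshold not in {"05", "10", "20"}:
        raise ValueError("threshold must be '05', '10' or '20'")
    return [
        "macs2", "callpeak",
        "-t", bam,                        # deduplicated, sorted BAM file
        "-g", MACS2_GENOME_SIZE[genome],  # effective genome size
        "-q", f"1e-{threshold}",          # Q-value cutoff, e.g. 1e-05
    ]


print(rpm_scale(20_000_000))  # 0.05
print(macs2_score(1e-05))     # approximately 50
print(callpeak_args("SRX097088.bam", "hg19", "05"))
```

Under this definition, a Q-value threshold of 1e-05 corresponds to a MACS2 score of 50, so peaks retained at Q < 1E-05 should carry scores of at least about 50.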

3. Data Annotation

Introduction

The experimental materials used in each SRX were manually annotated so that data can be extracted by keywords for track types and cell types.

Methods

Sample metadata for all SRXs (biosample_set.xml) were downloaded from the NCBI FTP site to extract the attributes describing antigens and antibodies, as well as cell types and tissues.
According to the attribute values of each SRX, the antigens and cell types used were manually annotated by curators fully trained in molecular and developmental biology. Each annotation has a ‘Class’ and a ‘Subclass’, as shown in antigenList.tab (Download, Table schema) and celltypeList.tab (Download, Table schema).
Guidelines for antigen annotation:
- Histones: based on the Brno nomenclature (PMID: 15702071) (e.g., H3K4me3, H3K27ac).
- Gene-encoded proteins:
  - Gene symbols were recorded according to the following gene nomenclature databases (e.g., OCT3/4 → POU5F1; p53 → TP53):
    - HGNC (H. sapiens)
    - MGI (M. musculus)
    - RGD (R. norvegicus)
    - FlyBase (D. melanogaster)
    - WormBase (C. elegans)
    - SGD (S. cerevisiae)
  - Modifications such as phosphorylation were ignored (e.g., phospho-SMAD3 → SMAD3).
  - If an antibody recognizes multiple members of a family, the first in ascending order was chosen (e.g., anti-SMAD2/3 antibody → SMAD2).
Criteria for cell type annotation:
- H. sapiens, M. musculus, and R. norvegicus: cell types were mainly classified by the tissues from which they were derived. As an exception, ES and iPS cells were classified in the ‘Pluripotent stem cell’ class. Examples:

Cell-type class	Cell type
Blood	K-562; CD4-Positive T-Lymphocytes
Breast	MCF-7; T-47D
Pluripotent stem cell	hESC H1; iPS cells

- D. melanogaster: cell types were mainly classified by cell lines and developmental stages.
- C. elegans: cell types were mainly classified by developmental stages.
- S. cerevisiae: cell types were classified by yeast strains.
- Standardized nomenclature: names of cell lines and tissues were standardized according to the following frameworks and resources (e.g., MDA-231, MDA231, MDAMB231 → MDA-MB-231):
  - Supplementary Table S2 in Yu et al. 2015 (PMID: 25877200), which proposes unified cell-line names
  - ATCC, a nonprofit repository of cell lines
  - MeSH (Medical Subject Headings) for tissue names
  - FlyBase for cell lines of D. melanogaster
Antigens or cell types were classified in the ‘Uncategorized’ class if the curators could not interpret the attribute values.
Antigens or cell types were classified in the ‘No description’ class if no attribute value was provided.

4. Peak Browser

ChIP-Atlas Peak Browser allows users to browse multiple ChIP-seq peak-calls (e.g., for transcription factors and histone modifications), along with ATAC-seq, DNase-seq, and Bisulfite-seq data, on the genome browser IGV. This is useful for predicting cis-regulatory elements, as well as for finding regulatory proteins and the epigenetic status of given regions. The BED4-formatted peak-call data made in section 2.5 were concatenated and converted to BED9 + GFF3 format for browsing on IGV. The BED9 files can be downloaded from the Peak Browser website; the table schema is as follows:

Column	Description	Example
Header	Track name and link URL	(Strings)
Column 1	Chromosome	chr12
Column 2	Begin	1234
Column 3	End	5678
Column 4*	Sample metadata	(Strings)
Column 5	-10Log₁₀(MACS2 Q-value)	345
Column 6	Strand (unspecified)	.
Column 7	Begin (= Column 2)	1234
Column 8	End (= Column 3)	5678
Column 9**	Color code	255,61,0

*Column 4: sample metadata described in GFF3 format, so that the annotated antigens and cell types are shown on IGV. In addition, hovering the mouse over a peak displays the accession number, title, and all attribute values described in the BioSample metadata for the SRX.
**Column 9: heatmap color code for Column 5 (if Column 5 is 0, 500, or 1000, the color is blue, green, or red, respectively).
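The color rule can be sketched as a function from score to an RGB triple in the "R,G,B" form of Column 9. Only the three anchor colors (0 → blue, 500 → green, 1000 → red) come from the text above; the linear interpolation between them and the clamping to [0, 1000] are assumptions, so intermediate colors may differ from those produced by ChIP-Atlas.

```python
def score_to_rgb(score: float) -> tuple[int, int, int]:
    """Map a MACS2 score to RGB: 0 -> blue, 500 -> green, 1000 -> red.

    Assumption: linear interpolation between the anchors, scores clamped to [0, 1000].
    """
    s = min(max(score, 0.0), 1000.0)
    if s <= 500:
        t = s / 500  # 0 at blue, 1 at green
        return (0, round(255 * t), round(255 * (1 - t)))
    t = (s - 500) / 500  # 0 at green, 1 at red
    return (round(255 * t), round(255 * (1 - t)), 0)


def bed9_color(score: float) -> str:
    """Format the color the way BED9 column 9 stores it: 'R,G,B'."""
    return ",".join(str(channel) for channel in score_to_rgb(score))


print(bed9_color(0))     # 0,0,255
print(bed9_color(1000))  # 255,0,0
```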

To find the URLs of the BED9 files, see “Assembled Peak-call data used in ‘Peak Browser’” in section 9 (Downloads).

Annotation tracks

In addition to browsing the experimental tracks above, users can use the “Annotation Tracks” in the Peak Browser tool to visualize functional annotations in genomic regions of interest. Available annotation tracks are as follows:

Genome	hg38	hg19	mm10	mm9	rn6	dm6	dm3	ce11	ce10	sacCer3
ENCODE Hi-C	○	○	○	○
GTEx eQTL	○	○
ChromHMM	○	○	○	○
FANTOM5 enhancers	○	○
JASPAR TF motif	○	○	○	○		○	○	○	○	○
GWAS Catalog	○	○
ClinVar	○	○
Orphanet	○	○
MGI Phenotype			○	○
PhastCons	○	○	○	○	○	○	○	○	○	○
RepeatMasker	○	○	○	○	○	○	○	○	○
RNA-seq^1,2,3,4	○	○	○	○		○	○
Ensembl genes	○	○	○	○	○	○	○	○	○	○
GENCODE genes	○	○	○	○
ENCODE Blacklist	○	○	○	○		○	○	○	○
CpG Islands	○	○	○	○	○	○	○	○	○

5. Target Genes

Introduction

The ChIP-Atlas Target Genes feature predicts genes directly regulated by given proteins, based on binding profiles of all public ChIP-seq data for particular gene loci. Target genes were accepted if the peak-call intervals of a given protein overlapped with a transcription start site (TSS) ± N kb (N = 1, 5, or 10).
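The acceptance criterion can be written as a simple interval test. This sketch assumes 0-based, half-open BED coordinates and treats the TSS as a single base; it approximates what `bedtools window -w` reports for one peak and one TSS rather than reproducing the bedtools implementation.

```python
def overlaps_tss_window(peak_start: int, peak_end: int, tss: int, n_kb: int) -> bool:
    """True if the peak [peak_start, peak_end) overlaps TSS ± n_kb kb.

    The TSS is treated as the 1-bp interval [tss, tss + 1); the window start is
    clipped at 0 because genomic coordinates cannot be negative.
    """
    w = n_kb * 1000
    win_start = max(0, tss - w)
    win_end = tss + 1 + w
    # two half-open intervals overlap if each starts before the other ends
    return peak_start < win_end and win_start < peak_end


print(overlaps_tss_window(3900, 4100, 5000, 1))  # True: the peak reaches the 1-kb window
print(overlaps_tss_window(1000, 1200, 5000, 1))  # False: the peak lies 2.8 kb upstream of it
```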

Methods

1. Peak-call data: BED4-formatted peak-call data of each SRX made in section 2.5 were used (MACS2 Q-value < 1E-05; antigen class = ‘TFs and others’).
2. Preparation of TSS library: TSS locations and gene symbols were taken from refFlat files (UCSC FTP site); only protein-coding genes were used for this analysis.
3. Preparation of STRING library: STRING is a comprehensive database recording protein-protein and protein-gene interactions based on experimental evidence. A file describing all interactions (protein.actions.v10.txt.gz) was downloaded, and the protein IDs were converted to gene symbols with protein.aliases.v10.txt.gz.
4. Processing: The bedtools window command (bedtools ver 2.17.0) was used to search the TSS library (5.2) for target genes of the peak-call data (5.1), with the window size option -w set to 1000, 5000, or 10000. Peak-call data of the same antigen were collected, and MACS2 scores (-10*Log₁₀[MACS2 Q-value]) are displayed as heatmap colors in the web browser (MACS2 score = 0, 500, 1000 → color = blue, green, red). If a gene intersected with multiple peaks of a single SRX, the highest MACS2 score was used for the color. The ‘Average’ column at the far left of the table shows the mean of the MACS2 scores in the same row. The ‘STRING’ column at the far right shows the STRING score for the protein-gene interaction according to the STRING library (5.3). Specifically, protein-gene pairs in protein.actions.v10.txt.gz were extracted when they met all of the following conditions:
- 1st column (item_id_a) == Query antigen
- 2nd column (item_id_b) == Target gene
- 3rd column (mode) == "expression"
- 5th column (a_is_acting) == "1"
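The per-gene aggregation and the STRING filter described above can be sketched as two small functions. The input shapes are hypothetical (tuples from the window search, and STRING rows already split on tabs and mapped to gene symbols); counting SRXs without a nearby peak as 0 in the ‘Average’ column is an assumption, and so is reading the STRING score from the 6th column.

```python
from collections import defaultdict


def target_gene_table(hits, srxs):
    """Build Target Genes rows for one antigen.

    hits: iterable of (srx, gene, macs2_score) produced by the window search.
    srxs: all SRXs of the antigen (the table columns).
    Returns {gene: (average, {srx: best_score})}.
    """
    best = defaultdict(dict)
    for srx, gene, score in hits:
        # several peaks of one SRX near the same gene: keep the highest score
        if score > best[gene].get(srx, float("-inf")):
            best[gene][srx] = score
    table = {}
    for gene, per_srx in best.items():
        # Assumption: SRXs without a peak near the gene count as 0 in the average.
        average = sum(per_srx.get(s, 0.0) for s in srxs) / len(srxs)
        table[gene] = (average, dict(per_srx))
    return table


def string_expression_scores(rows, query):
    """Apply the four STRING conditions above to pre-split protein.actions rows.

    Assumptions: IDs are already gene symbols and the 6th column holds the score.
    Returns {target_gene: highest STRING score}.
    """
    scores = {}
    for row in rows:
        if row[0] == query and row[2] == "expression" and row[4] == "1":
            scores[row[1]] = max(scores.get(row[1], 0), int(row[5]))
    return scores
```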

6. Colocalization

Introduction

Many transcription factors (TFs) form complexes to promote or enhance transcriptional activity (e.g., Pou5f1, Nanog, and Sox2 in mouse ES cells). ChIP-seq profiles of such TFs are often similar, showing colocalization on multiple genomic regions. ChIP-Atlas Colocalization predicts the colocalization partners of given TFs through comprehensive, combinatorial similarity analyses of all public ChIP-seq data.

Algorithms

The BED4-formatted peak-call data made in section 2.5 were compared with other peak-call data in the same cell-type class using CoLo, a tool that evaluates the colocalization of transcription factors (TFs) from multiple ChIP-seq peak-call datasets. CoLo has two advantages:

(a) It compensates for biases arising from different experimental conditions.
(b) It adjusts for differences in peak numbers and distributions arising from the innate characteristics of the TFs.

Function (a) is implemented by fitting the MACS2 scores in each BED4 file to a Gaussian distribution and dividing the peaks of each BED4 file into three groups:

H (High binding; Z-score > 0.5)
M (Middle binding; -0.5 ≤ Z-score ≤ 0.5)
L (Low binding; Z-score < -0.5)
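The H/M/L grouping can be sketched as a Z-score cut on the MACS2 scores of one BED4 file. As an assumption, the Gaussian fit is approximated here by the mean and population standard deviation of the scores; CoLo's actual fitting procedure may differ.

```python
from statistics import mean, pstdev


def hml_groups(scores: list[float]) -> list[str]:
    """Label each peak of one BED4 file as H, M or L from the Z-score of its MACS2 score."""
    mu = mean(scores)
    sd = pstdev(scores)
    labels = []
    for s in scores:
        # Assumption: if all scores are equal, every peak falls into M.
        z = (s - mu) / sd if sd > 0 else 0.0
        if z > 0.5:
            labels.append("H")  # High binding
        elif z < -0.5:
            labels.append("L")  # Low binding
        else:
            labels.append("M")  # Middle binding, -0.5 <= z <= 0.5
    return labels


print(hml_groups([10, 20, 30]))  # ['L', 'M', 'H']
```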

These three groups are treated as independent datasets when similarity is evaluated with function (b). Thus, CoLo evaluates the similarity of two SRXs (e.g., SRX_1 and SRX_2) across nine combinations:

[H/M/L of SRX_1] x [H/M/L of SRX_2]

Finally, a set of nine Boolean results (similar or not) is returned to indicate the similarity of SRX_1 and SRX_2.

Methods

1. Peak-call data: Same as (5.1).
2. STRING library: Same as (5.3).
3. Processing: Peak-call data in the same cell-type class were processed with CoLo. The score between two BED files was calculated by multiplying the values assigned to the H (= 3), M (= 2), or L (= 1) groups as follows:

SRX_1	SRX_2	Scores
H	H	9
H	M	6
H	L	3
M	H	6
M	M	4
M	L	2
L	H	3
L	M	2
L	L	1

If multiple H/M/L combinations were returned as similar for SRX_1 and SRX_2, the highest score was adopted. Scores from 1 to 9 are colored on a scale from blue through green to red, and gray indicates that all nine H/M/L combinations were false. The ‘Average’ column at the far left of the table shows the mean of the CoLo scores in the same row. The ‘STRING’ column at the far right shows the STRING score for the protein-protein interaction (6.2). Specifically, protein-protein pairs in protein.actions.v10.txt.gz were extracted if they met all of the following conditions:

1st column (item_id_a) == query antigen
2nd column (item_id_b) == co-association partner
3rd column (mode) == "binding"
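Given CoLo's nine Boolean results, the score described above (the highest product of H = 3, M = 2, L = 1 among the similar combinations, or gray when none is similar) can be computed as in this sketch. The input layout, a dictionary keyed by (SRX_1 group, SRX_2 group), is hypothetical, and `None` stands in for the gray cell.

```python
from itertools import product
from typing import Optional

# Weights assigned to the three binding groups.
WEIGHT = {"H": 3, "M": 2, "L": 1}


def colo_score(similar: dict) -> Optional[int]:
    """Return the CoLo score for one SRX pair.

    `similar` maps (group of SRX_1, group of SRX_2), e.g. ("H", "M"), to the Boolean
    result for that combination; missing keys are treated as False. The score is the
    highest weight product among similar combinations, or None (gray) if none is.
    """
    products = [
        WEIGHT[g1] * WEIGHT[g2]
        for g1, g2 in product("HML", repeat=2)
        if similar.get((g1, g2), False)
    ]
    return max(products) if products else None


print(colo_score({("M", "H"): True, ("L", "L"): True}))  # 6
```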

7. Enrichment Analysis

Introduction

ChIP-Atlas Enrichment Analysis accepts users’ data in the following three formats:

Genomic regions in BED format (to search proteins bound to the regions)
Sequence motif (to search proteins bound to the motif)
Gene list (to search proteins bound to the genes)

In addition, the following analyses are possible by specifying the data for comparison on the submission form of Enrichment Analysis:

Data in panel 4	Data in panel 5	Aims and analyses
BED	Random permutation	Proteins bound to BED intervals more often than by chance.
BED	BED	Proteins differentially bound between the two sets of BED intervals.
Motif	Random permutation	Proteins bound to a sequence motif more often than by chance.
Motif	Motif	Proteins differentially bound between the two motifs.
Genes	RefSeq coding genes	Proteins bound to the genes more often than to other RefSeq coding genes.
Genes	Genes	Proteins differentially bound between the two sets of gene lists.

Requirements and acceptable data

Reference peak-call data (upper panels (1 to 3) of the submission form): comprehensive peak-call data as described in section 4 (Peak Browser). Results are returned more quickly if the antigen and cell-type classes are specified.
BED (lower panels (4 and 5) of the submission form): UCSC BED format, requiring at least three tab-delimited columns: chromosome, start position, and end position.
```
chr1<tab>1435385<tab>1436458
chrX<tab>4634643<tab>4635798
```
A header line and columns 4 and beyond may be included, but they are ignored in the analysis. Note that only BED files in the following genome assemblies are accepted:
- hg38, hg19 (H. sapiens)
- mm10, mm9 (M. musculus)
- rn6 (R. norvegicus)
- dm6, dm3 (D. melanogaster)
- ce11, ce10 (C. elegans)
- sacCer3 (S. cerevisiae)
If the BED file uses another genome assembly, convert it to a supported one with the UCSC liftOver tool.
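A minimal reader for submitted BED data, keeping only the first three columns and rejecting unsupported assemblies, might look like this. Which lines count as headers (blank lines and lines starting with `#`, `track`, or `browser`) is an assumption based on common UCSC conventions, not a description of the ChIP-Atlas parser.

```python
SUPPORTED_ASSEMBLIES = {"hg38", "hg19", "mm10", "mm9", "rn6", "dm6", "dm3", "ce11", "ce10", "sacCer3"}


def read_bed3(text: str, assembly: str) -> list[tuple[str, int, int]]:
    """Return (chrom, start, end) from BED text, ignoring header lines and columns 4+.

    Assumption: header lines are blank or start with '#', 'track' or 'browser'.
    """
    if assembly not in SUPPORTED_ASSEMBLIES:
        raise ValueError(f"{assembly} is not supported; convert the BED file with liftOver first")
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-delimited columns: {line!r}")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


bed = "track name=mydata\nchr1\t1435385\t1436458\tpeak1\nchrX\t4634643\t4635798\n"
print(read_bed3(bed, "hg38"))  # [('chr1', 1435385, 1436458), ('chrX', 4634643, 4635798)]
```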
Motif (lower panels (4 and 5) of the submission form): A sequence motif described in IUPAC nucleic acid notation. In addition to normal codes (ATGC), ambiguity codes are also acceptable (WSMKRYBDHVN).
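To illustrate how ambiguity codes widen a motif, the sketch below expands an IUPAC motif into a Python regular expression. This is a swapped-in technique for illustration only: ChIP-Atlas searches motif locations with Bowtie, whereas this regex scans only the given strand and reports non-overlapping matches, so the reverse complement and overlapping sites would need separate handling.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif into a regex that matches every sequence it encodes."""
    parts = []
    for base in motif.upper():
        if base not in IUPAC:
            raise ValueError(f"not an IUPAC nucleotide code: {base!r}")
        codes = IUPAC[base]
        # single bases stay literal; ambiguity codes become character classes
        parts.append(codes if len(codes) == 1 else f"[{codes}]")
    return re.compile("".join(parts))


print(motif_to_regex("WGATAR").pattern)  # [AT]GATA[AG]
```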
Gene list (lower panels (4 and 5) of the submission form): gene symbols must be entered according to the following nomenclatures:
- HGNC (H. sapiens)
- MGI (M. musculus)
- RGD (R. norvegicus)
- FlyBase (D. melanogaster)
- WormBase (C. elegans)
- SGD (S. cerevisiae)
(e.g., OCT3/4 → POU5F1; p53 → TP53)
If the gene list uses any other identifiers (e.g., RefSeq or Ensembl gene IDs), convert them with a batch conversion tool such as DAVID (convert to OFFICIAL_GENE_SYMBOL with the Gene ID Conversion Tool).

Methods

Submitted data are converted to BED files depending on the data types.

BED: submitted BED files are used as-is for further processing. If ‘Random permutation’ is set for the comparison, the submitted BED intervals are shuffled to random positions on random chromosomes the specified number of times with the bedtools shuffle command (bedtools ver 2.17.0).
Motif: genomic locations perfectly matching the submitted sequence are searched with Bowtie (ver 0.12.8) and converted to BED format. If ‘Random permutation’ is set for the comparison, the resulting BED is randomly permuted as described above.
Gene list: unique TSSs of the submitted genes are defined with the xxxCanonical.txt.gz* library distributed from the UCSC FTP site.
* xxx is a placeholder for:
- “known” (H. sapiens and M. musculus)
- “flyBase” (D. melanogaster)
- “sanger” (C. elegans)
- “sgd” (S. cerevisiae)
Unique TSSs of R. norvegicus genes are defined with a gene list distributed from RGD.

The TSS locations are converted to BED format by extending them by the width specified in ‘Distance range from TSS’ on the submission form. If ‘RefSeq coding gene’ is set for the comparison, RefSeq coding genes not included in the submitted list are converted to BED format in the same way.
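Converting TSSs to BED intervals and building the RefSeq background list can be sketched as follows, assuming a symmetric distance range, 0-based TSS coordinates, and a hypothetical (chrom, tss, gene) input layout; how the submission form handles asymmetric ranges or strand is not covered here.

```python
def tss_to_bed_lines(tss_records, distance):
    """tss_records: iterable of (chrom, tss, gene) with 0-based TSS positions.

    Returns BED lines covering TSS ± distance bp; the start is clipped at 0 and the
    end is exclusive, so the TSS base itself is included.
    """
    lines = []
    for chrom, tss, gene in tss_records:
        start = max(0, tss - distance)
        end = tss + distance + 1
        lines.append(f"{chrom}\t{start}\t{end}\t{gene}")
    return lines


def background_genes(refseq_coding, submitted):
    """RefSeq coding genes that are not in the submitted list, sorted for reproducibility."""
    return sorted(set(refseq_coding) - set(submitted))


print(tss_to_bed_lines([("chr2", 10000, "B")], 500))  # ['chr2\t9500\t10501\tB']
```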

The overlaps between the BED (originated from panels 4 and 5 of the submission form) and reference peak-call data (specified on upper panels 1 to 3 of the submission form) are counted with bedtools intersect command (BedTools2; ver 2.23.0).
P-values are calculated with the two-tailed Fisher’s exact test. The null hypothesis is that reference peaks intersect the submitted data in panel 4 in the same proportion as they intersect the data in panel 5 of the submission form. Q-values are calculated with the Benjamini & Hochberg method.
Fold enrichment is calculated as (column 6) / (column 7) in the same row of the result table. If the ratio is > 1, the rightmost column is ‘TRUE’, meaning that the protein in column 3 binds to the data of panel 4 in a greater proportion than to the data of panel 5 specified in the submission form.
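The statistics above can be sketched without external libraries: a two-tailed Fisher's exact test on a 2 × 2 table and Benjamini & Hochberg Q-values. The table layout is an assumption, [[panel-4 items overlapping peaks, panel-4 items not overlapping], [panel-5 items overlapping, panel-5 items not overlapping]], and the two-tailed P-value uses the common definition that sums all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are not lost to rounding
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini & Hochberg Q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


pvalues = [fisher_two_sided(30, 70, 10, 90), fisher_two_sided(12, 88, 10, 90)]
qvalues = benjamini_hochberg(pvalues)
```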

API of Enrichment Analysis

An API is available to perform Enrichment Analysis programmatically. Please see here for the details.

8. Diff Analysis

Introduction

The Diff Analysis tool can be used to identify differential peak regions (DPRs) or differentially methylated regions (DMRs) from two given sets of ChIP/ATAC/DNase-seq or Bisulfite-seq data, respectively.

Requirements and acceptable data

Experiment type (panel (1) of the submission form):

ChIP/ATAC/DNase-seq: to obtain DPRs from two given sets of ChIP-seq, ATAC-seq, or DNase-seq data.
Bisulfite-Seq: to obtain DMRs from given two sets of Bisulfite-seq data.

Experiment ID(s) (panels (2 and 3) of the submission form):

ID(s) of NCBI, ENA, and DDBJ (e.g., SRX18419259, ERX1103210, DRX335588)
ID(s) of NCBI GEO (e.g., GSM6765200)

Publicly accessible URL(s) of data on a custom web server

ChIP/ATAC/DNase-seq: for each experiment, the URL of the raw read coverage (UCSC BigWig, integer values), the URL of the peak-call data (UCSC BED), and the total number of mapped reads, separated by tabs. Try the following example in hg38.

Dataset A (tab-separated)

https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bed	205201674
https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bw	https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bed	208332830

Dataset B (tab-separated)

https://chip-atlas.dbcls.jp/data/manual/examples/sample_B1.bw	https://chip-atlas.dbcls.jp/data/manual/example/sample_B1.bed	87753996
https://chip-atlas.dbcls.jp/data/manual/examples/sample_B2.bw	https://chip-atlas.dbcls.jp/data/manual/example/sample_B2.bed	90562744

Bisulfite-seq: BigWig data of methylation rates (between 0 and 1). Try the following example in rn6.

Dataset A

https://chip-atlas.dbcls.jp/data/manual/examples/sample_A3.bw

Dataset B

https://chip-atlas.dbcls.jp/data/manual/examples/sample_B3.bw

The Dataset Search tool is useful for finding IDs of interest by keyword search. NCBI GEO is also useful for finding IDs of interest within the same series of experiments.

Methods

DPR detection

Alignment data (BigWig) and peak calling data (BED) for ChIP-seq, ATAC-seq, and DNase-seq recorded in ChIP-Atlas are identified based on submitted IDs and used for comparative analysis.
The BigWig files of each query experiment are converted to bedGraph format.
The raw sequence read coverage is reconstructed with reference to the total mapped sequencing read count of the experiment.
The entire genome is fragmented based on the peak calling data from the query experiments.
The number of sequence reads aligning with each genome fragment is aggregated and the result is organized into an m × n matrix, with m representing the number of genome fragments and n representing the number of query experiments.
The matrix is entered into the R package edgeR, and the difference in read counts between the two sets of query experiments is assessed for each genome fragment using the standard algorithm used for detecting differentially expressed genes in comparative transcriptome analysis.
The outcomes are further summarized and documented in BED format, which includes coordinates of the genome fragment in columns 1–3 and corresponding intergroup statistical values in columns 4 and beyond. Our approach refers to the R package DiffBind but differs in that we do not utilize the BAM files required for directly applying DiffBind.
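The reconstruction of raw coverage and the m × n matrix can be sketched as below. The RPM inversion follows from the -scale 1000000/N option in section 2; summing coverage × overlapping bases per fragment is an assumed proxy for the read count, and the resulting matrix would then be passed to edgeR, which is not reproduced here.

```python
def rpm_to_raw(rpm: float, total_mapped: int) -> int:
    """Undo RPM scaling: raw = RPM * N / 1,000,000, rounded to an integer."""
    return round(rpm * total_mapped / 1_000_000)


def fragment_signal(bedgraph, fragments):
    """Sum coverage x overlapping bp of bedGraph records within each fragment.

    bedgraph: list of (chrom, start, end, value); fragments: list of (chrom, start, end).
    A quadratic scan, fine for a toy example but not for whole genomes.
    """
    out = []
    for fchrom, fstart, fend in fragments:
        total = 0
        for chrom, start, end, value in bedgraph:
            if chrom == fchrom:
                overlap = min(end, fend) - max(start, fstart)
                if overlap > 0:
                    total += value * overlap
        out.append(total)
    return out


def count_matrix(samples, fragments):
    """samples: list of (bedgraph_in_rpm, total_mapped). Returns m rows (fragments) x n columns."""
    columns = []
    for bedgraph, total in samples:
        raw = [(c, s, e, rpm_to_raw(v, total)) for c, s, e, v in bedgraph]
        columns.append(fragment_signal(raw, fragments))
    return [list(row) for row in zip(*columns)]


samples = [([("chr1", 0, 10, 0.5)], 2_000_000), ([("chr1", 5, 15, 1.0)], 1_000_000)]
print(count_matrix(samples, [("chr1", 0, 10), ("chr1", 10, 20)]))  # [[10, 5], [0, 5]]
```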

DMR detection

Methylation level data (BigWig) for Bisulfite-seq recorded in ChIP-Atlas are identified based on submitted IDs and used for comparative analysis.
The BigWig files of each query experiment are converted to bedGraph format.
The DMR detector metilene (PMID: 26631489) is used; specifically, the metilene_input.pl* script provided with metilene aggregates the methylation rates per genomic base for each query experiment, taking the bedGraph files generated in the previous step as input.
The resulting TSV file is used as the input into the main metilene* program, which returns DMRs along with statistics such as mean methylation differences and Q-values in a BED format.

* Default parameters are applied when executing both the metilene_input.pl script and the metilene command.

Download and view the results

Once the calculations are complete, the results (ZIP format) can be downloaded from the results page.

.zip: Compressed file consisting of the following files.

.igv.xml: IGV session file to display DPRs or DMRs together with the original data to be compared.
.log: Analysis log to identify DPRs or DMRs.
.bed: DPRs or DMRs in BED9 format suitable for further analysis.
.igv.bed: DPRs or DMRs in BED9 + GFF3 format suitable for browsing on IGV

API of Diff Analysis

An API is available to perform Diff Analysis programmatically. Please see here for the details.

9. Downloads

Data for each SRX

All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas are described in experimentList.tab (Download, Table schema)

BigWig Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bw/Experimental_ID.bw

Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bw/SRX097088.bw
Peak-call (BED) Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bedThreshold/Experimental_ID.Threshold.bed (Threshold = 05, 10, or 20)

Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed (Peak-call data of SRX097088 with Q-value < 1E-05.)
Peak-call (BigBed) Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bbThreshold/Experimental_ID.Threshold.bb (Threshold = 05, 10, or 20)

Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bb05/SRX097088.05.bb (Peak-call data of SRX097088 with Q-value < 1E-05.)
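The URL patterns above can be filled in programmatically. A minimal sketch with a hypothetical helper name, which reproduces the example URLs for SRX097088:

```python
BASE = "https://chip-atlas.dbcls.jp/data"
THRESHOLDS = {"05", "10", "20"}  # Q-value < 1E-05, 1E-10, 1E-20


def each_data_urls(genome: str, srx: str, threshold: str = "05") -> dict[str, str]:
    """Build the BigWig, BED, and BigBed URLs for one experiment."""
    if threshold not in THRESHOLDS:
        raise ValueError("threshold must be '05', '10' or '20'")
    return {
        "bigwig": f"{BASE}/{genome}/eachData/bw/{srx}.bw",
        "bed": f"{BASE}/{genome}/eachData/bed{threshold}/{srx}.{threshold}.bed",
        "bigbed": f"{BASE}/{genome}/eachData/bb{threshold}/{srx}.{threshold}.bb",
    }


print(each_data_urls("hg19", "SRX097088")["bed"])
# https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed
```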

Assembled Peak-call data used in “Peak Browser”

Download URL: https://chip-atlas.dbcls.jp/data/Genome/assembled/File_name.bed (Genome and File_name are listed in fileList.tab [Download, Table schema])

Example: https://chip-atlas.dbcls.jp/data/hg19/assembled/Oth.ALL.05.GATA2.AllCell.bed (All peak-call data of GATA2 in all cell types with Q-value < 1E-05.)

Note: Because the assembled peak-call files used in “Peak Browser” are very large, we recommend downloading the lighter version of all peak-call data (see the URLs and table schema below) and joining the SRXs with the sample metadata described in experimentList.tab (Download, Table schema) on the command line.

Download the lighter version of all peak-call data (Q: MACS2 Q-value thresholds).

Genome	Q < 1E-05	Q < 1E-10	Q < 1E-20	Q < 1E-50	WGBS
hg38	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
hg19	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
mm10	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
mm9	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
rn6	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
dm6	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
dm3	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
ce11	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
ce10	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎
sacCer3	⬇︎	⬇︎	⬇︎	⬇︎	⬇︎

Table schema of the lighter version of all peak-call data:

Column	Description	Example
Column 1	Chromosome	chr12
Column 2	Begin	1234
Column 3	End	5678
Column 4	SRX	SRX344646
Column 5	-10Log₁₀(MACS2 Q-value)	345
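The recommended join between the lighter peak-call file and experimentList.tab can also be sketched in Python. The sketch assumes that experimentList.tab has the column layout described under ‘Tables summarizing metadata and files’, that both inputs are iterables of tab-separated lines, and uses hypothetical function names.

```python
def srxs_for_track(experiment_lines, genome, track_type):
    """Collect SRXs of one genome and track type from experimentList.tab lines.

    Columns used: 1 = experimental ID, 2 = genome assembly, 4 = track type.
    """
    srxs = set()
    for line in experiment_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[1] == genome and fields[3] == track_type:
            srxs.add(fields[0])
    return srxs


def filter_peaks(peak_lines, srxs):
    """Yield (chrom, start, end, srx, score) rows of the lighter peak file for the given SRXs."""
    for line in peak_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[3] in srxs:
            yield fields[0], int(fields[1]), int(fields[2]), fields[3], float(fields[4])


experiments = [
    "SRX097088\thg19\tTFs and others\tGATA2\tBlood\tK-562\n",
    "SRX000001\thg19\tHistone\tH3K4me3\tBlood\tK-562\n",
]
peaks = ["chr12\t1234\t5678\tSRX097088\t345\n", "chr1\t10\t20\tSRX000001\t60\n"]
gata2 = srxs_for_track(experiments, "hg19", "GATA2")
print(list(filter_peaks(peaks, gata2)))  # [('chr12', 1234, 5678, 'SRX097088', 345.0)]
```

With real files, pass open file objects (text mode) instead of lists, so that the large peak file is streamed line by line rather than loaded into memory.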

Analyzed data used in “Target Genes”

Download URL: https://chip-atlas.dbcls.jp/data/Genome/target/Protein.Distance.tsv (Proteins are listed in analysisList.tab [Download, Table schema]) (Distance = 1, 5, or 10, indicating the distance [kb] from TSS.)

Example: https://chip-atlas.dbcls.jp/data/hg19/target/POU5F1.5.tsv (TSV file describing the genes bound by POU5F1 at TSS ± 5 kb.)

Analyzed data used in “Colocalization”

Download URL: https://chip-atlas.dbcls.jp/data/Genome/colo/Protein.Cell_type_class.tsv (Protein and Cell_type_class are listed in analysisList.tab [Download, Table schema])

Example: https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv (TSV file describing the proteins colocalizing with POU5F1 in Pluripotent stem cell.) (Spaces in the name of cell type class must be replaced with underscores _.)
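Both URL patterns can be built with a couple of hypothetical helpers, including the replacement of spaces in cell type class names with underscores:

```python
BASE = "https://chip-atlas.dbcls.jp/data"


def target_genes_url(genome: str, protein: str, distance_kb: int) -> str:
    """URL of the Target Genes TSV for TSS ± distance_kb kb (1, 5, or 10)."""
    if distance_kb not in (1, 5, 10):
        raise ValueError("distance must be 1, 5 or 10 (kb)")
    return f"{BASE}/{genome}/target/{protein}.{distance_kb}.tsv"


def colocalization_url(genome: str, protein: str, cell_type_class: str) -> str:
    """URL of the Colocalization TSV; spaces in the cell type class become underscores."""
    return f"{BASE}/{genome}/colo/{protein}.{cell_type_class.replace(' ', '_')}.tsv"


print(colocalization_url("hg19", "POU5F1", "Pluripotent stem cell"))
# https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv
```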

Tables summarizing metadata and files

experimentList.tab (Download) All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas.

Column	Description	Example
1	Experimental ID (SRX, ERX, DRX)	SRX097088
2	Genome assembly	hg19
3	Track type class	TFs and others
4	Track type	GATA2
5	Cell type class	Blood
6	Cell type	K-562
7	Cell type description	Primary Tissue=Blood|Tissue Diagnosis=Leukemia Chronic Myelogenous
8	Processing logs of ChIP, ATAC, DNase-seq (# of reads, % mapped, % duplicates, # of peaks [Q < 1E-05])	30180878,82.3,42.1,6691
8	Processing logs of Bisulfite-seq (# of reads, % mapped, × coverage, # of hyper MR)	132179672,88.1,3.4,311292
9	Title	GSM722415: GATA2 K562bmp r1 110325 3
10-	Metadata submitted by authors	source_name=GATA2 ChIP-seq K562 BMP
		cell line=K562
		chip antibody=GATA2
		antibody catalog number=Santa Cruz SC-9008

fileList.tab (Download) All assembled peak-call data used in Peak Browser.

Column	Description	Example
1	File name	Oth.ALL.05.GATA2.AllCell
2	Genome assembly	hg19
3	Track type class	TFs and others
4	Track type	GATA2
5	Cell type class	All cell types
6	Cell type	-
7	Threshold	05 (indicating Q-value < 1E-05)
8	Experimental IDs included	SRX070877,SRX150427,SRX092303,SRX070876,SRX150668,...

analysisList.tab (Download) All proteins shown in “Target Genes” and “Colocalization”.

Column	Description	Example
1	Antigen	POU5F1
2	Cell type class in Colocalization	Epidermis,Pluripotent stem cell
3	Recorded (+) or not (-) in Target Genes	+
4	Genome assembly	hg19

antigenList.tab (Download) All track types recorded in ChIP-Atlas.

Column	Description	Example
1	Genome assembly	hg19
2	Track type class	TFs and others
3	Track type	POU5F1
4	Number of experiments	24
5	Experimental IDs included	SRX011571,SRX011572,SRX017276,SRX021069,SRX021070,...

celltypeList.tab (Download) All cell types recorded in ChIP-Atlas.

Column	Description	Example
1	Genome assembly	hg19
2	Cell type class	Prostate
3	Cell type	VCaP
4	Number of experiments	185
5	Experimental IDs included	SRX020917,SRX020918,SRX020919,SRX020920,SRX020921,...

10. External Genome Browser

BigBed and BigWig files in the ChIP-Atlas database can be browsed on the UCSC Genome Browser. Use the links below to open the UCSC Genome Browser.

genome.ucsc.edu
genome-asia.ucsc.edu (Asian mirror)

Currently, the track hub feature is provided only for files of individual experiments; we are working on making the files assembled by antigen and cell type browsable as well. See Using UCSC Genome Browser Track Hubs for more details.
