Home
Documentation of the computational processing in ChIP-Atlas.
- Data source
- Primary processing
- Data Annotation
- Peak Browser
- Target Genes
- Colocalization
- Enrichment Analysis
- Diff Analysis
- Downloads
- External Genome Browser
Currently, most academic journals require authors of studies that include high-throughput sequencing to deposit their raw sequence data in the Sequence Read Archive (SRA) through one of the public repositories (NCBI, DDBJ, or ENA). Each experiment is assigned an ID, called an experimental accession, beginning with SRX, DRX, or ERX (hereafter ‘SRXs’). ChIP-Atlas refers to the corresponding ‘experiment’ and ‘biosample’ metadata in XML format (available from the NCBI FTP site) and uses SRXs that meet the following criteria:
- LIBRARY_STRATEGY == `ChIP-Seq`, `ATAC-Seq`, `DNase-Hypersensitivity`, or `Bisulfite-Seq`
- LIBRARY_SOURCE == `GENOMIC`
- taxonomy_name == `Homo sapiens`, `Mus musculus`, `Rattus norvegicus`, `Caenorhabditis elegans`, `Drosophila melanogaster`, or `Saccharomyces cerevisiae`
- INSTRUMENT_MODEL ~ `Illumina`, `NextSeq`, or `HiSeq`
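The selection criteria can be read as a simple predicate over each SRX's metadata. The sketch below assumes the relevant SRA XML fields have already been parsed into a flat dictionary (a hypothetical layout, not ChIP-Atlas code) and interprets `~` as a substring match on INSTRUMENT_MODEL.

```python
# Sketch only: the flat dict layout is a hypothetical representation of
# the SRA 'experiment' / 'biosample' XML fields named in the criteria above.
ALLOWED_STRATEGIES = {"ChIP-Seq", "ATAC-Seq", "DNase-Hypersensitivity", "Bisulfite-Seq"}
ALLOWED_TAXA = {
    "Homo sapiens",
    "Mus musculus",
    "Rattus norvegicus",
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
}
INSTRUMENT_PATTERNS = ("Illumina", "NextSeq", "HiSeq")


def is_eligible(record: dict) -> bool:
    """Return True if one SRX record satisfies all four selection criteria."""
    return (
        record.get("LIBRARY_STRATEGY") in ALLOWED_STRATEGIES
        and record.get("LIBRARY_SOURCE") == "GENOMIC"
        and record.get("taxonomy_name") in ALLOWED_TAXA
        # Assumption: '~' means "contains", so "Illumina HiSeq 2500" matches.
        and any(p in record.get("INSTRUMENT_MODEL", "") for p in INSTRUMENT_PATTERNS)
    )


example = {
    "LIBRARY_STRATEGY": "ChIP-Seq",
    "LIBRARY_SOURCE": "GENOMIC",
    "taxonomy_name": "Homo sapiens",
    "INSTRUMENT_MODEL": "Illumina HiSeq 2000",
}
print(is_eligible(example))  # True
```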
Raw sequence data from the SRXs selected above were aligned to reference genomes with Bowtie 2 and then processed into coverage tracks (BigWig format) and peak calls (BED format), as follows:
1. Binary raw sequence data (.sra) for each SRX were downloaded and decoded into Fastq format with the `fastq-dump` command of SRA Toolkit (ver 2.3.2-4) in default mode, except for paired-end reads, which were decoded with the `--split-files` option. For SRXs comprising multiple runs, the decoded Fastq files were concatenated into a single file.
2. Fastq files were then aligned with Bowtie 2 (ver 2.2.2) in default mode, except for paired-end reads, for which the two Fastq files were specified with the `-1` and `-2` options. The following genome assemblies were used for the alignment and subsequent processing:
   - hg38, hg19 (H. sapiens)
   - mm10, mm9 (M. musculus)
   - rn6 (R. norvegicus)
   - dm6, dm3 (D. melanogaster)
   - ce11, ce10 (C. elegans)
   - sacCer3 (S. cerevisiae)
3. The resulting SAM files were converted to BAM format with SAMtools (ver 0.1.19; `samtools view`) and sorted (`samtools sort`), and PCR duplicates were then removed (`samtools rmdup`).
4. BedGraph-formatted coverage scores were calculated with bedtools (ver 2.17.0; `genomeCoverageBed`) in RPM (Reads Per Million mapped reads) units with the `-scale 1000000/N` option, where N is the number of mapped reads after removing PCR duplicates in step 3. The BedGraph files were then converted to BigWig format with the UCSC `bedGraphToBigWig` tool (ver 4).
5. The BAM files made in step 3 were used for peak calling with MACS2 (ver 2.1.0; `macs2 callpeak`), producing BED4 files. The Q-value threshold option was set to `-q 1e-05`, `1e-10`, or `1e-20`, and the genome-size option was set as follows:
   - hg38, hg19: `-g hs`
   - mm10, mm9: `-g mm`
   - rn6: `-g 2.15e9`
   - dm6, dm3: `-g dm`
   - ce11, ce10: `-g ce`
   - sacCer3: `-g 12100000`

   Each row in the BED4 files contains the genomic location in columns 1 to 3 and the MACS2 score (-10*Log10[MACS2 Q-value]) in column 4.
6. BED4 files were converted to BigBed format with the UCSC `bedToBigBed` tool (ver 2.5).
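Two numeric conventions in these steps, the RPM scale factor (step 4) and the MACS2 score (step 5), plus the per-assembly genome-size argument, can be expressed in a few lines. This is a sketch: `-t` (treatment) and `-n` (name) are standard `macs2 callpeak` options, but their use here is an assumption, since the text does not list every option ChIP-Atlas passes, and no control sample is included.

```python
import math

# Genome-size (-g) values for each assembly, as listed in step 5.
MACS2_GSIZE = {
    "hg38": "hs", "hg19": "hs",
    "mm10": "mm", "mm9": "mm",
    "rn6": "2.15e9",
    "dm6": "dm", "dm3": "dm",
    "ce11": "ce", "ce10": "ce",
    "sacCer3": "12100000",
}


def rpm_scale(mapped_reads_after_dedup: int) -> float:
    """Value for genomeCoverageBed -scale (1000000/N) so coverage is in RPM."""
    return 1_000_000 / mapped_reads_after_dedup


def macs2_command(bam: str, genome: str, q: str, name: str) -> list:
    """Build a macs2 callpeak argument list; -t/-n usage and the lack of a control are assumptions."""
    return ["macs2", "callpeak", "-t", bam, "-g", MACS2_GSIZE[genome], "-q", q, "-n", name]


def macs2_score(q_value: float) -> float:
    """MACS2 score stored in column 4 of the BED4 files: -10 * log10(Q-value)."""
    return -10 * math.log10(q_value)


print(rpm_scale(20_000_000))  # 0.05
print(macs2_command("SRX097088.bam", "rn6", "1e-05", "SRX097088"))
print(round(macs2_score(1e-05), 6))  # 50.0
```

Under this definition, the Q-value thresholds 1e-05, 1e-10, and 1e-20 correspond to MACS2 scores of 50, 100, and 200.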
Experimental materials used for each SRX were manually annotated so that data can be extracted by keywords for track types and cell types.
- Sample metadata for all SRXs (biosample_set.xml) were downloaded from the NCBI FTP site to extract the attributes for antigens and antibodies (see here) as well as for cell types and tissues (see here).
- Based on the attribute values of each SRX, the antigens and cell types were manually annotated by curators fully trained in molecular and developmental biology. Each annotation has a ‘Class’ and a ‘Subclass’, as shown in antigenList.tab (Download, Table schema) and celltypeList.tab (Download, Table schema).
- Guidelines for antigen annotation:
  - Histones: based on Brno nomenclature (PMID: 15702071) (e.g., H3K4me3, H3K27ac).
  - Gene-encoded proteins:
    - Gene symbols were recorded according to the following gene nomenclature databases:
    - Modifications such as phosphorylation were ignored (e.g., phospho-SMAD3 → SMAD3).
    - If an antibody recognizes multiple molecules in a family, the first in ascending order was chosen (e.g., anti-SMAD2/3 antibody → SMAD2).
- Criteria for cell type annotation:
  - H. sapiens, M. musculus, and R. norvegicus: Cell types were mainly classified by tissue of origin. As an exception, ES and iPS cells were classified in the ‘Pluripotent stem cell’ class.

    Cell-type class | Cell type |
    ---|---|
    Blood | K-562; CD4-Positive T-Lymphocytes |
    Breast | MCF-7; T-47D |
    Pluripotent stem cell | hESC H1; iPS cells |

  - D. melanogaster: Cell types were mainly classified by cell lines and developmental stages.
  - C. elegans: Mainly classified by developmental stages.
  - S. cerevisiae: Classified by yeast strains.
- Standardized nomenclatures: Names of cell lines and tissues were standardized (e.g., MDA-231, MDA231, MDAMB231 → MDA-MB-231) according to the following frameworks and resources:
  - Supplementary Table S2 in Yu et al. 2015 (PMID: 25877200), which proposes unified cell-line names
  - ATCC, a nonprofit repository of cell lines
  - MeSH (Medical Subject Headings) for tissue names
  - FlyBase for D. melanogaster cell lines
- Antigens or cell types were classified in the ‘Uncategorized’ class if the curators could not interpret the attribute values.
- Antigens or cell types were classified in the ‘No description’ class if no attribute value was given.
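The rule-based parts of these guidelines can be sketched as string normalization. This is not ChIP-Atlas curation code (annotation was done manually by curators): the regular expression for family antibodies and the synonym table are hypothetical, and the ‘Uncategorized’ decision, which needs human judgment, is not modeled.

```python
import re
from typing import Optional

# Hypothetical synonym table for illustration; actual curation follows
# Yu et al. 2015, ATCC, MeSH, and FlyBase as described above.
CELL_LINE_SYNONYMS = {
    "MDA-231": "MDA-MB-231",
    "MDA231": "MDA-MB-231",
    "MDAMB231": "MDA-MB-231",
}


def normalize_antigen(raw: Optional[str]) -> str:
    """Apply two antigen guidelines and the 'No description' rule to one attribute value."""
    if raw is None or not raw.strip():
        return "No description"
    name = raw.strip()
    # Modifications are ignored: phospho-SMAD3 -> SMAD3
    name = re.sub(r"^phospho-", "", name, flags=re.IGNORECASE)
    # Family antibodies: SMAD2/3 -> members 2 and 3 -> first in ascending order (SMAD2)
    m = re.fullmatch(r"([A-Za-z]+)(\d+(?:/\d+)+)", name)
    if m:
        first = min(int(n) for n in m.group(2).split("/"))
        name = f"{m.group(1)}{first}"
    return name


def normalize_cell_line(raw: str) -> str:
    """Map a cell-line spelling to its standardized name, if one is known."""
    return CELL_LINE_SYNONYMS.get(raw.strip(), raw.strip())


print(normalize_antigen("phospho-SMAD3"))  # SMAD3
print(normalize_antigen("SMAD2/3"))  # SMAD2
print(normalize_cell_line("MDAMB231"))  # MDA-MB-231
```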
ChIP-Atlas Peak Browser lets users browse multiple ChIP-seq peak calls, such as those for transcription factors and histone modifications, along with ATAC-seq, DNase-seq, and Bisulfite-seq data on the genome browser IGV. This is useful for predicting cis-regulatory elements, as well as for finding regulatory proteins and the epigenetic status of given regions. The BED4-formatted peak-call data from section 2.5 were concatenated and converted to BED9 + GFF3 format for browsing on IGV. The BED9 files can be downloaded from the Peak Browser website; their table schema is as follows:
Column | Description | Example |
---|---|---|
Header | Track name and link URL | (Strings) |
Column 1 | Chromosome | chr12 |
Column 2 | Begin | 1234 |
Column 3 | End | 5678 |
Column 4* | Sample metadata | (Strings) |
Column 5 | -10Log10(MACS2 Q-value) | 345 |
Column 6 | . | . |
Column 7 | Begin (= Column 2) | 1234 |
Column 8 | End (= Column 3) | 5678 |
Column 9** | Color code | 255,61,0 |
- *Column 4: Sample metadata written in GFF3 format so that the annotated antigens and cell types are shown on IGV. In addition, mousing over a peak displays the accession number, title, and all attribute values described in the BioSample metadata for the SRX.
- **Column 9: Heatmap color codes for Column 5 (if Column 5 is 0, 500, or 1000, the color is blue, green, or red, respectively).
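A minimal sketch for reading one BED9 data row and mapping a MACS2 score to a heatmap color. Only the three anchor colors (0 → blue, 500 → green, 1000 → red) come from the text; the linear interpolation between them and the clamping above 1000 are assumptions. The header line must be skipped before calling `parse_bed9_line`, and the Column 4 value in the example row is a hypothetical placeholder.

```python
def heatmap_rgb(score: float) -> tuple:
    """Blue (0) -> green (500) -> red (1000), linear between anchors, clamped to [0, 1000]."""
    s = max(0.0, min(float(score), 1000.0))
    if s <= 500:
        t = s / 500
        return (0, round(255 * t), round(255 * (1 - t)))
    t = (s - 500) / 500
    return (round(255 * t), round(255 * (1 - t)), 0)


def parse_bed9_line(line: str) -> dict:
    """Split one tab-delimited BED9 data row into the fields of the schema above."""
    cols = line.rstrip("\n").split("\t")
    return {
        "chrom": cols[0],
        "start": int(cols[1]),
        "end": int(cols[2]),
        "metadata": cols[3],  # GFF3-style sample metadata
        "score": int(cols[4]),  # -10*Log10(MACS2 Q-value)
        "rgb": tuple(int(c) for c in cols[8].split(",")),
    }


# Hypothetical row; "ID=SRX097088" stands in for the real Column 4 metadata.
row = parse_bed9_line("chr12\t1234\t5678\tID=SRX097088\t345\t.\t1234\t5678\t255,61,0")
print(row["score"], row["rgb"])  # 345 (255, 61, 0)
print(heatmap_rgb(0), heatmap_rgb(500), heatmap_rgb(1000))
```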
To find the URLs of the BED9 files, see “Assembled peak-call data used in Peak Browser” in the Downloads chapter (8).
In addition to browsing the above experimental tracks, users can use the “Annotation Tracks” in the Peak Browser to visualize functional annotations in genomic regions of interest. The available annotation tracks are as follows:
Genome | hg38 | hg19 | mm10 | mm9 | rn6 | dm6 | dm3 | ce11 | ce10 | sacCer3 |
---|---|---|---|---|---|---|---|---|---|---|
ENCODE Hi-C | ○ | ○ | ○ | ○ | ||||||
GTEx eQTL | ○ | ○ | ||||||||
ChromHMM | ○ | ○ | ○ | ○ | ||||||
FANTOM5 enhancers | ○ | ○ | ||||||||
JASPAR TF motif | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |
GWAS Catalog | ○ | ○ | ||||||||
ClinVar | ○ | ○ | ||||||||
Orphanet | ○ | ○ | ||||||||
MGI Phenotype | ○ | ○ | ||||||||
PhastCons | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
RepeatMasker | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |
RNA-seq1,2,3,4 | ○ | ○ | ○ | ○ | ○ | ○ | ||||
Ensembl genes | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
GENCODE genes | ○ | ○ | ○ | ○ | ||||||
ENCODE Blacklist | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ||
CpG Islands | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
The ChIP-Atlas Target Genes feature predicts genes directly regulated by given proteins, based on binding profiles of all public ChIP-seq data for particular gene loci. Target genes were accepted if the peak-call intervals of a given protein overlapped with a transcription start site (TSS) ± N kb (N = 1, 5, or 10).
- 5.1 Peak-call data: BED4-formatted peak-call data of each SRX made in section 2.5 were used (MACS2 Q-value < 1E-05; antigen class = ‘TFs and others’).
- 5.2 Preparation of TSS library: TSS locations and gene symbols were taken from refFlat files (UCSC FTP site); only protein-coding genes were used for this analysis.
- 5.3 Preparation of STRING library: STRING is a comprehensive database of protein-protein and protein-gene interactions based on experimental evidence. A file describing all interactions (protein.actions.v10.txt.gz) was downloaded, and the protein IDs were converted to gene symbols with protein.aliases.v10.txt.gz.
- 5.4 Processing: The `bedtools window` command (bedtools ver 2.17.0) was used to search the TSS library (5.2) for target genes near the peak-call data (5.1), with the window-size option `-w 1000`, `5000`, or `10000`. Peak-call data for the same antigen were collected, and MACS2 scores (-10*Log10[MACS2 Q-value]) are shown as heatmap colors on the web browser (MACS2 score = 0, 500, 1000 → color = blue, green, red) (see example). If a gene intersected multiple peaks from a single SRX, the highest MACS2 score was used for the color. The ‘Average’ column at the far left of the table shows the mean of the MACS2 scores in the same row. The ‘STRING’ column at the far right shows the STRING score for the protein-gene interaction according to the STRING library (5.3). Specifically, protein-gene pairs in the protein.actions.v10.txt.gz file were extracted when they met all of the following conditions:
  - 1st column (item_id_a) == query antigen
  - 2nd column (item_id_b) == target gene
  - 3rd column (mode) == "expression"
  - 5th column (a_is_acting) == "1"
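The core of the Target Genes processing can be sketched in plain Python: find peaks within TSS ± N bp, keep the highest MACS2 score per gene and SRX, and average across the SRXs of one antigen. The nested loop is for clarity only (bedtools uses indexed interval search), and counting SRXs without a nearby peak as 0 in the average is an assumption about how the ‘Average’ column is filled.

```python
from collections import defaultdict
from statistics import mean


def target_gene_scores(peaks, tss, window_bp):
    """
    peaks: iterable of (chrom, start, end, srx, macs2_score), BED half-open coordinates
    tss:   list of (chrom, tss_position, gene_symbol)
    Returns {gene: {srx: highest MACS2 score among peaks within TSS +/- window_bp}}.
    """
    best = defaultdict(dict)
    for chrom, start, end, srx, score in peaks:
        for t_chrom, pos, gene in tss:
            # Overlap of the peak [start, end) with the window
            # [pos - window_bp, pos + 1 + window_bp) around the 1-bp TSS.
            if chrom == t_chrom and start < pos + 1 + window_bp and end > pos - window_bp:
                best[gene][srx] = max(score, best[gene].get(srx, score))
    return best


def average_row(scores_by_srx, all_srx):
    """'Average' column; SRXs with no peak near the gene are assumed to count as 0."""
    return mean(scores_by_srx.get(s, 0) for s in all_srx)


peaks = [
    ("chr1", 1500, 1600, "SRX1", 120),
    ("chr1", 2500, 2600, "SRX1", 300),
    ("chr1", 9000, 9100, "SRX2", 80),
]
tss = [("chr1", 2000, "GENE_A")]
scores = target_gene_scores(peaks, tss, window_bp=1000)
print(dict(scores))  # {'GENE_A': {'SRX1': 300}}
print(average_row(scores["GENE_A"], ["SRX1", "SRX2"]))
```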
Many transcription factors (TFs) form complexes to promote or enhance transcriptional activity (e.g., Pou5f1, Nanog, and Sox2 in mouse ES cells). ChIP-seq profiles of such TFs are often similar, showing colocalization on multiple genomic regions. The ChIP-Atlas Colocalization tool predicts colocalization partners of given TFs, evaluated through comprehensive and combinatorial similarity analyses of all public ChIP-seq data.
BED4-formatted peak-call data made in section 2.5 were compared with other peak-call data in the same cell-type class. Similarities were evaluated with CoLo, a tool that assesses the colocalization of transcription factors (TFs) from multiple ChIP-seq peak-call datasets. The advantages of CoLo are that:
- (a) it compensates for biases derived from different experimental conditions;
- (b) it adjusts for differences in peak numbers and distributions arising from innate characteristics of the TFs.

For function (a), the MACS2 scores in each BED4 file are fitted to a Gaussian distribution, and the peaks of each file are divided into three groups:
- H (High binding; Z-score > 0.5)
- M (Middle binding; -0.5 ≤ Z-score ≤ 0.5)
- L (Low binding; Z-score < -0.5)
These three groups are used as independent data to evaluate similarity through function (b). Thus, CoLo evaluates the similarity of two SRXs (e.g., SRX_1 and SRX_2) with nine combinations:
[H/M/L of SRX_1] × [H/M/L of SRX_2]
Finally, a set of nine Boolean results (similar or not) is returned to indicate the similarity of SRX_1 and SRX_2.
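The H/M/L split of function (a) can be sketched as follows. The sketch assumes the Gaussian fit is simply the mean and population standard deviation of the MACS2 scores in one file; CoLo's actual fitting procedure and its similarity test for function (b) are not reproduced here.

```python
from itertools import product
from statistics import mean, pstdev


def hml_groups(scores):
    """Split the MACS2 scores of one BED4 file into H/M/L groups by Z-score (peak indices)."""
    mu, sd = mean(scores), pstdev(scores)  # assumption: Gaussian fit = mean and SD
    groups = {"H": [], "M": [], "L": []}
    for i, s in enumerate(scores):
        z = (s - mu) / sd if sd > 0 else 0.0
        if z > 0.5:
            groups["H"].append(i)
        elif z < -0.5:
            groups["L"].append(i)
        else:
            groups["M"].append(i)
    return groups


print(hml_groups([10, 20, 30, 40, 50]))  # {'H': [3, 4], 'M': [2], 'L': [0, 1]}
# The nine group pairs compared between SRX_1 and SRX_2:
print(list(product("HML", repeat=2)))
```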
- 6.1 Peak-call data: Same as (5.1).
- 6.2 STRING library: Same as (5.3).
- 6.3 Processing: Peak-call data in the same cell-type class were processed through CoLo. The score between two BED files was calculated as the product of the weights of each H/M/L combination, with H = 3, M = 2, and L = 1, as follows:
SRX_1 | SRX_2 | Scores |
---|---|---|
H | H | 9 |
H | M | 6 |
H | L | 3 |
M | H | 6 |
M | M | 4 |
M | L | 2 |
L | H | 3 |
L | M | 2 |
L | L | 1 |
If multiple H/M/L combinations were judged similar for SRX_1 and SRX_2, the highest score was adopted. Scores from 1 to 9 are colored on a gradient from blue through green to red, and gray indicates that all nine H/M/L combinations were false (see example). The ‘Average’ column at the far left of the table shows the mean of the CoLo scores in the same row. The ‘STRING’ column at the far right shows the STRING score for the protein-protein interaction (6.2). Specifically, protein-protein pairs in the protein.actions.v10.txt.gz file were extracted if they met all of the following conditions:
- 1st column (item_id_a) == query antigen
- 2nd column (item_id_b) == co-association partner
- 3rd column (mode) == "binding"
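The scoring rule and the STRING filters can be sketched together. `colo_score` takes the nine Boolean results as input (the similarity test itself belongs to CoLo and is not shown). `string_pairs` assumes tab-delimited rows with a header line and the column order given in the conditions above; mapping protein IDs to gene symbols with protein.aliases.v10.txt.gz is left out. Use `mode="binding"` for Colocalization and `mode="expression", require_acting=True` for Target Genes.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

WEIGHT = {"H": 3, "M": 2, "L": 1}


def colo_score(similar: Dict[Tuple[str, str], bool]) -> Optional[int]:
    """Highest H/M/L product among the combinations judged similar; None means gray."""
    hits = [WEIGHT[a] * WEIGHT[b] for (a, b), ok in similar.items() if ok]
    return max(hits) if hits else None


def string_pairs(rows: Iterable[str], mode: str, require_acting: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (item_id_a, item_id_b) from protein.actions.v10.txt rows matching the conditions."""
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if cols[0] == "item_id_a":  # skip the header line
            continue
        if cols[2] != mode:
            continue
        if require_acting and cols[4] != "1":
            continue
        yield cols[0], cols[1]


print(colo_score({("H", "M"): True, ("M", "L"): True, ("H", "H"): False}))  # 6
print(colo_score({("H", "H"): False}))  # None
```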
ChIP-Atlas Enrichment Analysis accepts users’ data in the following three formats:
- Genomic regions in BED format (to search proteins bound to the regions)
- Sequence motif (to search proteins bound to the motif)
- Gene list (to search proteins bound to the genes)
In addition, the following analyses are possible by specifying the data for comparison on the submission form of Enrichment Analysis:
Data in panel 4 | Data in panel 5 | Aims and analyses |
---|---|---|
BED | Random permutation | Proteins bound to BED intervals more often than by chance. |
BED | BED | Proteins differentially bound between the two sets of BED intervals. |
Motif | Random permutation | Proteins bound to a sequence motif more often than by chance. |
Motif | Motif | Proteins differentially bound between the two motifs. |
Genes | RefSeq coding genes | Proteins bound to the submitted genes more often than to other RefSeq coding genes. |
Genes | Genes | Proteins differentially bound between the two sets of gene lists. |
- Reference peak-call data (upper panels (1 to 3) of the submission form): Comprehensive peak-call data as described above (4. Peak Browser). Results are returned more quickly if the antigen and cell-type classes are specified.
- BED (lower panels (4 and 5) of the submission form): UCSC BED format, minimally requiring three tab-delimited columns describing the chromosome and the start and end positions:

  `chr1<tab>1435385<tab>1436458`
  `chrX<tab>4634643<tab>4635798`

  A header line and columns 4 onward may be included, but they are ignored in the analysis. NOTE that only BED files in the following genome assemblies are accepted:
- hg38, hg19 (H. sapiens)
- mm10, mm9 (M. musculus)
- rn6 (R. norvegicus)
- dm6, dm3 (D. melanogaster)
- ce11, ce10 (C. elegans)
- sacCer3 (S. cerevisiae)
If the BED file is in another genome assembly, convert it to a supported one with the UCSC liftOver tool.
- Motif (lower panels (4 and 5) of the submission form): A sequence motif in IUPAC nucleic acid notation. In addition to the standard codes (ATGC), the ambiguity codes (WSMKRYBDHVN) are also accepted.
- Gene list (lower panels (4 and 5) of the submission form):
  - Gene symbols must be entered according to the following nomenclatures (e.g., OCT3/4 → POU5F1; p53 → TP53):
  - In addition to official gene symbols (e.g., POU5F1), Ensembl IDs (ENSG00000204531), UniProt IDs (Q01860), and RefSeq IDs (NM_002701) are also accepted.
  - If the gene list uses any other format, convert it with a batch conversion tool such as DAVID (use its Gene ID Conversion Tool to convert into OFFICIAL_GENE_SYMBOL or one of the IDs mentioned above).
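To check input before submission, the BED and motif formats can be handled as sketched below. This is not how ChIP-Atlas processes motifs (it maps perfect matches with Bowtie); the regex scan here covers only the given strand, and the set of skipped header prefixes (`track`, `browser`, `#`) is an assumption.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif: str):
    """Compile an IUPAC motif; raises KeyError for characters outside the code table."""
    return re.compile("".join(IUPAC[c] for c in motif.upper()))


def read_bed_intervals(lines):
    """Keep columns 1-3 of each BED row; skip blank, 'track', 'browser' and '#' lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


print(read_bed_intervals(["chr1\t1435385\t1436458", "chrX\t4634643\t4635798\tname"]))
print([m.start() for m in motif_to_regex("CANNTG").finditer("GGCACGTGAACAGCTGTT")])  # [2, 10]
```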
- Submitted data are converted to BED files depending on the data type:
  - BED: Submitted BED files are used as-is for further processing. If ‘Random permutation’ is set for the comparison, the submitted BED intervals are shuffled to random positions on random chromosomes the specified number of times with the `bedtools shuffle` command (bedtools ver 2.17.0).
  - Motif: Genomic locations perfectly matching the submitted sequence are searched with Bowtie (ver 0.12.8) and converted to BED format. If ‘Random permutation’ is set for the comparison, this BED file is randomly permuted as described above.
  - Gene list: Unique TSSs of the submitted genes are defined with the xxxCanonical.txt.gz* library distributed from the UCSC FTP site. * xxx is a placeholder for:
    - “known” (H. sapiens and M. musculus)
    - “flyBase” (D. melanogaster)
    - “sanger” (C. elegans)
    - “sgd” (S. cerevisiae)

    Unique TSSs of R. norvegicus genes are defined with a gene list distributed from RGD.
    The TSS locations are converted to BED format, extended by the width specified in ‘Distance range from TSS’ on the submission form. If ‘RefSeq coding gene’ is set for the comparison, RefSeq coding genes not in the submitted list are converted to BED format in the same way.
- The overlaps between the BED files (from panels 4 and 5 of the submission form) and the reference peak-call data (specified in upper panels 1 to 3 of the submission form) are counted with the `bedtools intersect` command (BedTools2 ver 2.23.0).
- P-values are calculated with the two-tailed Fisher’s exact test (see example). The null hypothesis is that reference peaks intersect the data in panel 4 in the same proportion as they intersect the data in panel 5 of the submission form. Q-values are calculated with the Benjamini & Hochberg method.
- Fold enrichment is calculated as (column 6) / (column 7) of the same row. If the ratio is > 1, the rightmost column is ‘TRUE’, meaning that the protein in column 3 binds to the data in panel 4 in a greater proportion than to the data in panel 5 of the submission form.
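The statistics can be sketched as follows, assuming each comparison is a 2 × 2 table of [overlapping, non-overlapping] counts for panel 4 (a, b) and panel 5 (c, d), and that columns 6 and 7 of the result hold the two overlap fractions. That table layout is an assumption. In practice a library routine such as SciPy's `fisher_exact` would normally be used instead of this pure-Python version.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for [[a, b], [c, d]]: sum of tables no more likely than observed."""
    r1, r2, c1 = a + b, c + d, a + c
    log_denom = _log_comb(r1 + r2, c1)

    def log_p(x: int) -> float:  # hypergeometric log-probability of x in the top-left cell
        return _log_comb(r1, x) + _log_comb(r2, c1 - x) - log_denom

    cutoff = log_p(a) + 1e-7  # small tolerance for floating-point ties
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return min(1.0, sum(exp(log_p(x)) for x in range(lo, hi + 1) if log_p(x) <= cutoff))


def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg adjusted Q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # from the largest P-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


def fold_enrichment(a: int, b: int, c: int, d: int) -> float:
    """(overlap fraction in panel 4) / (overlap fraction in panel 5)."""
    frac5 = c / (c + d)
    return float("inf") if frac5 == 0 else (a / (a + b)) / frac5


print(round(fisher_two_sided(3, 1, 1, 3), 4))  # 0.4857
print(benjamini_hochberg([0.01, 0.04, 0.03]))
print(round(fold_enrichment(30, 70, 10, 90), 6))  # 3.0
```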
An API is available to perform Enrichment Analysis programmatically. Please see here for the details.
The Diff Analysis tool can be used to identify differential peak regions (DPRs) or differentially methylated regions (DMRs) from two given sets of ChIP/ATAC/DNase-seq or Bisulfite-seq data, respectively.
Experiment type (panel (1) of the submission form):
- ChIP/ATAC/DNase-seq: to obtain DPRs from given two sets of ChIP-seq, ATAC-seq, or DNase-seq data.
- Bisulfite-Seq: to obtain DMRs from given two sets of Bisulfite-seq data.
Experiment ID(s) (panels (2 and 3) of the submission form):
- ID(s) of NCBI, ENA, and DDBJ (e.g., SRX18419259, ERX1103210, DRX335588)
- ID(s) of NCBI GEO (e.g., GSM6765200)
- Publicly accessible URL(s) of data on custom web server
- ChIP/ATAC/DNase-seq: URLs of the raw read coverage (UCSC bigWig, in integer values) and peak-call (UCSC BED) data, plus the total number of mapped reads. Try the following example in hg38.
- Dataset A (tab-separated)
  - https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bw https://chip-atlas.dbcls.jp/data/manual/examples/sample_A1.bed 205201674
  - https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bw https://chip-atlas.dbcls.jp/data/manual/examples/sample_A2.bed 208332830
- Dataset B (tab-separated)
  - https://chip-atlas.dbcls.jp/data/manual/examples/sample_B1.bw https://chip-atlas.dbcls.jp/data/manual/example/sample_B1.bed 87753996
  - https://chip-atlas.dbcls.jp/data/manual/examples/sample_B2.bw https://chip-atlas.dbcls.jp/data/manual/example/sample_B2.bed 90562744
- Bisulfite-seq: BigWig data of methylation rate (between 0 and 1). Try with the following example in rn6.
- Dataset A
https://chip-atlas.dbcls.jp/data/manual/examples/sample_A3.bw
- Dataset B
https://chip-atlas.dbcls.jp/data/manual/examples/sample_B3.bw
- Alignment data (BigWig) and peak-call data (BED) for ChIP-seq, ATAC-seq, and DNase-seq recorded in ChIP-Atlas are identified from the submitted IDs and used for the comparative analysis.
- For DPR detection, the BigWig file of each query experiment is converted to bedGraph format.
- The raw sequence read coverage is reconstructed using the total number of mapped reads of the experiment.
- The entire genome is fragmented based on the peak-call data of the query experiments.
- The number of sequence reads aligned to each genome fragment is aggregated, and the result is organized into an m × n matrix, where m is the number of genome fragments and n is the number of query experiments.
- The matrix is passed to the R package edgeR, and the difference in read counts between the two sets of query experiments is assessed for each genome fragment with the standard algorithm used to detect differentially expressed genes in comparative transcriptome analysis.
- The results are summarized in BED format, with the coordinates of each genome fragment in columns 1 to 3 and the corresponding intergroup statistics in columns 4 onward. This approach is modeled on the R package DiffBind but differs in that it does not use the BAM files that DiffBind requires.
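A rough sketch of the fragment-and-count steps, under clearly stated assumptions: "fragmenting the genome based on the peak calls" is modeled here as merging the peaks of all query experiments into union intervals (similar in spirit to a DiffBind consensus peak set), RPM-scaled coverage is converted back using the total mapped read count, and each matrix cell holds summed coverage × bases rather than an exact read count. The resulting matrix would then go to edgeR in R, which is not shown.

```python
def merge_peaks(peak_sets):
    """Union of (chrom, start, end) peaks from all query experiments, merged per chromosome."""
    merged = []
    for chrom, start, end in sorted(iv for peaks in peak_sets for iv in peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def rpm_to_raw(rpm: float, total_mapped_reads: int) -> float:
    """Undo RPM scaling: raw coverage = RPM * N / 1e6."""
    return rpm * total_mapped_reads / 1_000_000


def coverage_matrix(fragments, bedgraphs):
    """m x n matrix: one row per fragment, one column per experiment (coverage x bases)."""
    matrix = []
    for chrom, f_start, f_end in fragments:
        row = []
        for bedgraph in bedgraphs:  # each bedgraph: list of (chrom, start, end, value)
            total = 0.0
            for c, s, e, value in bedgraph:
                overlap = min(e, f_end) - max(s, f_start)
                if c == chrom and overlap > 0:
                    total += value * overlap
            row.append(total)
        matrix.append(row)
    return matrix


fragments = merge_peaks([[("chr1", 100, 200)], [("chr1", 150, 300), ("chr2", 10, 20)]])
print(fragments)  # [('chr1', 100, 300), ('chr2', 10, 20)]
print(coverage_matrix(fragments[:1], [[("chr1", 0, 150, 2.0)], [("chr1", 250, 400, 1.0)]]))  # [[100.0, 50.0]]
```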
- Methylation level data (BigWig) for Bisulfite-seq recorded in ChIP-Atlas are identified from the submitted IDs and used for the comparative analysis.
- For DMR detection, the BigWig file of each query experiment is converted to bedGraph format.
- The DMR detector metilene (PMID: 26631489) is used. First, its `metilene_input.pl`* script aggregates methylation rates per genomic base for each query experiment, taking the bedGraph files generated in the previous step as input.
- The resulting TSV file is used as input to the main `metilene`* program, which returns DMRs in BED format along with statistics such as mean methylation differences and Q-values.

\* Default parameters are applied when executing both the `metilene_input.pl` script and the `metilene` command.
Once the calculations are complete, the results (ZIP format) can be downloaded from the results page.
- `.zip`: Compressed file containing the following files:
  - `.igv.xml`: IGV session file to display DPRs or DMRs together with the original data being compared.
  - `.log`: Analysis log of the DPR or DMR detection.
  - `.bed`: DPRs or DMRs in BED9 format, suitable for further analysis.
  - `.igv.bed`: DPRs or DMRs in BED9 + GFF3 format, suitable for browsing on IGV.
An API is available to perform Diff Analysis programmatically. Please see here for the details.
All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas are described in experimentList.tab (Download, Table schema).
- BigWig
  - Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bw/Experimental_ID.bw
  - Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bw/SRX097088.bw
- Peak-call (BED)
  - Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bedThreshold/Experimental_ID.Threshold.bed (Threshold = 05, 10, or 20)
  - Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bed05/SRX097088.05.bed (Peak-call data of SRX097088 with Q-value < 1E-05.)
- Peak-call (BigBed)
  - Download URL: https://chip-atlas.dbcls.jp/data/Genome/eachData/bbThreshold/Experimental_ID.Threshold.bb (Threshold = 05, 10, or 20)
  - Example: https://chip-atlas.dbcls.jp/data/hg19/eachData/bb05/SRX097088.05.bb (Peak-call data of SRX097088 with Q-value < 1E-05.)
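The per-experiment URL patterns are regular enough to generate programmatically. The helper names below are hypothetical; the patterns are taken directly from the templates above and checked against the examples.

```python
BASE = "https://chip-atlas.dbcls.jp/data"
THRESHOLDS = ("05", "10", "20")


def bigwig_url(genome: str, srx: str) -> str:
    """Coverage track of one experiment."""
    return f"{BASE}/{genome}/eachData/bw/{srx}.bw"


def peak_url(genome: str, srx: str, threshold: str, fmt: str = "bed") -> str:
    """Peak calls of one experiment; fmt is 'bed' (BED) or 'bb' (BigBed)."""
    if threshold not in THRESHOLDS:
        raise ValueError(f"threshold must be one of {THRESHOLDS}")
    if fmt not in ("bed", "bb"):
        raise ValueError("fmt must be 'bed' or 'bb'")
    return f"{BASE}/{genome}/eachData/{fmt}{threshold}/{srx}.{threshold}.{fmt}"


print(bigwig_url("hg19", "SRX097088"))
print(peak_url("hg19", "SRX097088", "05"))
print(peak_url("hg19", "SRX097088", "05", fmt="bb"))
```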
Download URL: https://chip-atlas.dbcls.jp/data/Genome/assembled/File_name.bed (Genome and File_name are listed in fileList.tab [Download, Table schema])
Example: https://chip-atlas.dbcls.jp/data/hg19/assembled/Oth.ALL.05.GATA2.AllCell.bed (All peak-call data of GATA2 in all cell types with Q-value < 1E-05.)
Note: Because the assembled peak-call files used in “Peak Browser” are very large, we recommend downloading the lighter version of all peak-call data (see the URLs and table schema below) and joining the SRXs with the sample metadata in experimentList.tab (Download, Table schema) on the command line.
- Download the lighter version of all peak-call data (Q: MACS2 Q-value thresholds).

  Genome | Q < 1E-05 | Q < 1E-10 | Q < 1E-20 | Q < 1E-50 | WGBS |
  ---|---|---|---|---|---|
  hg38 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  hg19 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  mm10 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  mm9 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  rn6 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  dm6 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  dm3 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  ce11 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  ce10 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |
  sacCer3 | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ | ⬇︎ |

- Table schema of the lighter version of all peak-call data:

  Column | Description | Example |
  ---|---|---|
  Column 1 | Chromosome | chr12 |
  Column 2 | Begin | 1234 |
  Column 3 | End | 5678 |
  Column 4 | SRX | SRX344646 |
  Column 5 | -10Log10(MACS2 Q-value) | 345 |
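As a stand-in for a command-line join, the sketch below attaches the track type (column 4) and cell type (column 6) from experimentList.tab to each row of the lighter peak-call file, keyed on the SRX in column 4 of that file. Both inputs are consumed line by line; opening the real files (for example with `open(path, newline="")`) is left to the caller, and skipping rows with fewer than five columns is an assumption about headers.

```python
import csv


def load_experiment_metadata(lines):
    """Map SRX -> (track type, cell type) from experimentList.tab rows (columns 1, 4, 6)."""
    meta = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 6:
            meta[row[0]] = (row[3], row[5])
    return meta


def annotate_peaks(peak_lines, meta):
    """Append track type and cell type to each row of the lighter peak-call file."""
    for line in peak_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # assumption: skip header or malformed lines
        chrom, start, end, srx, score = cols[:5]
        track_type, cell_type = meta.get(srx, ("", ""))
        yield chrom, int(start), int(end), srx, int(score), track_type, cell_type


experiment_rows = ["SRX097088\thg19\tTFs and others\tGATA2\tBlood\tK-562"]
peak_rows = ["chr12\t1234\t5678\tSRX097088\t345"]
print(list(annotate_peaks(peak_rows, load_experiment_metadata(experiment_rows))))
# [('chr12', 1234, 5678, 'SRX097088', 345, 'GATA2', 'K-562')]
```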
Download URL: https://chip-atlas.dbcls.jp/data/Genome/target/Protein.Distance.tsv (Proteins are listed in analysisList.tab [Download, Table schema]) (Distance = 1, 5, or 10, indicating the distance [kb] from TSS.)
Example: https://chip-atlas.dbcls.jp/data/hg19/target/POU5F1.5.tsv (TSV file describing the genes bound by POU5F1 at TSS ± 5 kb.)
Download URL: https://chip-atlas.dbcls.jp/data/Genome/colo/Protein.Cell_type_class.tsv (Protein and Cell_type_class are listed in analysisList.tab [Download, Table schema])
Example: https://chip-atlas.dbcls.jp/data/hg19/colo/POU5F1.Pluripotent_stem_cell.tsv (TSV file describing the proteins colocalizing with POU5F1 in pluripotent stem cells.)
(Spaces in the name of the cell type class must be replaced with underscores `_`.)
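The remaining download patterns (assembled peak-call data, Target Genes, and Colocalization) can be generated the same way. The function names are hypothetical; the URL shapes follow the templates above, including the space-to-underscore rule for cell type classes.

```python
BASE = "https://chip-atlas.dbcls.jp/data"


def assembled_url(genome: str, file_name: str) -> str:
    """file_name comes from column 1 of fileList.tab, e.g. Oth.ALL.05.GATA2.AllCell."""
    return f"{BASE}/{genome}/assembled/{file_name}.bed"


def target_genes_url(genome: str, protein: str, distance_kb: int) -> str:
    """Target Genes table for TSS +/- 1, 5, or 10 kb."""
    if distance_kb not in (1, 5, 10):
        raise ValueError("distance_kb must be 1, 5, or 10")
    return f"{BASE}/{genome}/target/{protein}.{distance_kb}.tsv"


def colocalization_url(genome: str, protein: str, cell_type_class: str) -> str:
    # Spaces in the cell type class name are replaced with underscores.
    return f"{BASE}/{genome}/colo/{protein}.{cell_type_class.replace(' ', '_')}.tsv"


print(assembled_url("hg19", "Oth.ALL.05.GATA2.AllCell"))
print(target_genes_url("hg19", "POU5F1", 5))
print(colocalization_url("hg19", "POU5F1", "Pluripotent stem cell"))
```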
- experimentList.tab (Download) All ChIP-seq, ATAC-seq, DNase-seq, and Bisulfite-seq experiments recorded in ChIP-Atlas.
Column | Description | Example |
---|---|---|
1 | Experimental ID (SRX, ERX, DRX) | SRX097088 |
2 | Genome assembly | hg19 |
3 | Track type class | TFs and others |
4 | Track type | GATA2 |
5 | Cell type class | Blood |
6 | Cell type | K-562 |
7 | Cell type description | Primary Tissue=Blood\|Tissue Diagnosis=Leukemia Chronic Myelogenous |
8 | Processing logs of ChIP, ATAC, DNase-seq (# of reads, % mapped, % duplicates, # of peaks [Q < 1E-05]) | 30180878,82.3,42.1,6691 |
8 | Processing logs of Bisulfite-seq (# of reads, % mapped, × coverage, # of hyper MR) | 132179672,88.1,3.4,311292 |
9 | Title | GSM722415: GATA2 K562bmp r1 110325 3 |
10- | Metadata submitted by authors | source_name=GATA2 ChIP-seq K562 BMP<br>cell line=K562<br>chip antibody=GATA2<br>antibody catalog number=Santa Cruz SC-9008 |
- fileList.tab (Download) All assembled peak-call data used in Peak Browser.
Column | Description | Example |
---|---|---|
1 | File name | Oth.ALL.05.GATA2.AllCell |
2 | Genome assembly | hg19 |
3 | Track type class | TFs and others |
4 | Track type | GATA2 |
5 | Cell type class | All cell types |
6 | Cell type | - |
7 | Threshold | 05 (indicating Q-value < 1E-05) |
8 | Experimental IDs included | SRX070877,SRX150427,SRX092303,SRX070876,SRX150668,... |
- analysisList.tab (Download) All proteins shown in “Target Genes” and “Colocalization”.
Column | Description | Example |
---|---|---|
1 | Antigen | POU5F1 |
2 | Cell type class in Colocalization | Epidermis,Pluripotent stem cell |
3 | Recorded (+) or not (-) in Target Genes | + |
4 | Genome assembly | hg19 |
- antigenList.tab (Download) All track types recorded in ChIP-Atlas.
Column | Description | Example |
---|---|---|
1 | Genome assembly | hg19 |
2 | Track type class | TFs and others |
3 | Track type | POU5F1 |
4 | Number of experiments | 24 |
5 | Experimental IDs included | SRX011571,SRX011572,SRX017276,SRX021069,SRX021070,... |
- celltypeList.tab (Download) All cell types recorded in ChIP-Atlas.
Column | Description | Example |
---|---|---|
1 | Genome assembly | hg19 |
2 | Cell type class | Prostate |
3 | Cell type | VCaP |
4 | Number of experiments | 185 |
5 | Experimental IDs included | SRX020917,SRX020918,SRX020919,SRX020920,SRX020921,... |
BigBed and BigWig files in the ChIP-Atlas database can now be browsed on the UCSC Genome Browser. Use the links below to jump to the UCSC Genome Browser.
- genome.ucsc.edu
- genome-asia.ucsc.edu (Asian mirror)

Currently, the track hub is provided only for files of individual experiments; support for browsing files assembled by antigen and cell type is in progress. See Using UCSC Genome Browser Track Hubs for more details.