6693c72 May 17, 2016
@nellore @mikelove


intropolis is a list of exon-exon junctions found across 21,504 human RNA-seq samples on the Sequence Read Archive (SRA). The junctions were obtained by spliced read alignment to hg19 with Rail-RNA. Two files are provided:

A. intropolis.v1.hg19.tsv.gz : a 6.6-GB gzipped TSV (18.3 GB uncompressed) with fields

  1. chromosome
  2. intron start position (1-based; inclusive)
  3. intron end position (1-based; inclusive)
  4. strand (+ or -)
  5. donor dinucleotide (e.g., GT)
  6. acceptor dinucleotide (e.g., AG)
  7. comma-separated list of indexes of samples in which junction was found
  8. comma-separated list of corresponding numbers of reads mapping across junction in samples from field 7
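
As an illustration of the field layout above, here is a minimal Python sketch for streaming the gzipped TSV one junction at a time. It assumes the file is tab-separated with no header row and that the sample indexes in field 7 are integers; neither is stated explicitly above. The names `Junction`, `parse_junction_line`, and `iter_junctions` are hypothetical helpers, not part of any intropolis or Rail-RNA tooling.

```python
import gzip
from typing import Dict, Iterator, NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    donor: str      # e.g. "GT"
    acceptor: str   # e.g. "AG"
    coverage: Dict[int, int]  # sample index -> reads mapping across junction


def parse_junction_line(line: str) -> Junction:
    """Parse one line of the junction TSV (assumed tab-separated, no header)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    chrom, start, end, strand, donor, acceptor, samples, counts = fields
    # Assumption: sample indexes are integers.
    sample_ids = [int(s) for s in samples.split(",")]
    read_counts = [int(c) for c in counts.split(",")]
    if len(sample_ids) != len(read_counts):
        raise ValueError("fields 7 and 8 have different lengths")
    return Junction(chrom, int(start), int(end), strand, donor, acceptor,
                    dict(zip(sample_ids, read_counts)))


def iter_junctions(path: str) -> Iterator[Junction]:
    """Stream junctions from the gzipped TSV without loading it all into memory."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            yield parse_junction_line(line)


def intron_length(junction: Junction) -> int:
    """Both coordinates are 1-based and inclusive, so add 1 to the difference."""
    return junction.end - junction.start + 1
```

Streaming matters here because the uncompressed file is 18.3 GB. A call such as `iter_junctions("intropolis.v1.hg19.tsv.gz")` reads it line by line, so per-junction filters (for example, keeping junctions found in at least some number of samples via `len(junction.coverage)`) can run in constant memory.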

B. intropolis.idmap.v1.hg19.tsv : a small TSV with fields

  1. sample index used in field 7 of intropolis.v1.hg19.tsv.gz
  2. SRA project accession number
  3. SRA sample accession number
  4. SRA experiment accession number
  5. SRA run accession number
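
To connect field 7 of the junction file to SRA accessions, the id map can be loaded into a dictionary keyed by sample index. The sketch below is a rough illustration, assuming the id map is tab-separated with no header row and that the sample index is an integer; `SampleRecord` and `load_idmap` are hypothetical names introduced only for this example.

```python
import csv
from typing import Dict, Iterable, NamedTuple


class SampleRecord(NamedTuple):
    project: str     # SRA project accession
    sample: str      # SRA sample accession
    experiment: str  # SRA experiment accession
    run: str         # SRA run accession


def load_idmap(lines: Iterable[str]) -> Dict[int, SampleRecord]:
    """Build a sample index -> accessions lookup from the id map TSV.

    Assumes five tab-separated fields per line and no header row.
    """
    idmap: Dict[int, SampleRecord] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        index, project, sample, experiment, run = row
        idmap[int(index)] = SampleRecord(project, sample, experiment, run)
    return idmap


def runs_for_samples(idmap: Dict[int, SampleRecord],
                     sample_indexes: Iterable[int]) -> list:
    """Translate sample indexes (field 7 of the junction file) to run accessions."""
    return [idmap[i].run for i in sample_indexes]
```

In practice the function would be called on an open file handle, e.g. `with open("intropolis.idmap.v1.hg19.tsv") as f: idmap = load_idmap(f)`. Because the id map is small, holding it fully in memory is reasonable even while the much larger junction file is streamed.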

Metadata on SRA specifying e.g. tissue and cell type is incomplete and does not have a controlled vocabulary. Some is available in this file derived from the fantastic SRAdb R package by Jack Zhu and Sean Davis. Still more metadata taken from Biosample is available in this file. But probably the best effort to infer metadata for SRA RNA-seq (with a controlled vocabulary for tissues!) is SHARQ, by Darya Filippova while in Carl Kingsford's group.

Expect new versions of intropolis spanning more samples as they are added to SRA. If you use intropolis, cite Human splicing diversity across the Sequence Read Archive, by