Data Repository

All reference files should be pre-processed with ref.py, as explained in the Manual.

SAM Samples

Organism	Technology	Coverage	Uncompressed size	Link	Reference
E.coli	Illumina MiSeq	420x	5.2 GB	MiSeq_Ecoli_DH10B_110721_PF¹ (1.3 GB)	CP000948
H.sapiens	IonTorrent	0.6x	5.6 GB	sample-2-10_sorted (1.4 GB)	Homo_sapiens_assembly19
H.sapiens	Illumina HiSeq	2x	20 GB	9827_2#49 (6.1 GB)	hs37d5
D.melanogaster	PacBio	75x	29 GB	dm3PacBio (12 GB)	dm3
H.sapiens	RNASeq	6x	71 GB	K562_cytosol_LID8465_TopHat_v (12.8 GB)	hg19²
H.sapiens	PacBio	15x	118 GB	NA12878.pacbio.bwa-sw.20140202 (53.8 GB)	hs37d5
H.sapiens	Illumina-like Cancer Cell	30x	398 GB	HCC1954.mix1.n80t20³ (122.5 GB)	Homo_sapiens_assembly19
H.sapiens	Illumina HiSeq	50x	549 GB	NA12878_S1 (113.3 GB)	hg19

¹ All BAM files must be decompressed via samtools view -h <bam> -o <sam>.
² You can concatenate separate chromosome files into a large FASTA file via cat chr*.fa > hg19.fa.
³ You need GeneTorrent to download this sample. After you obtain it, you can fetch this particular sample via gtdownload -vv -c https://cghub.ucsc.edu/software/downloads/cghub_public.key -d 360b4736-6c5e-48df-af58-c1cf51609350.
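
For environments without a POSIX shell, the chromosome concatenation described in the notes (cat chr*.fa > hg19.fa) could be approximated in Python. This is only a sketch: the function name `concat_fasta` is hypothetical, and `sorted()` is assumed to be an acceptable stand-in for the shell's glob ordering, which can vary with locale.

```python
import glob
import os
import shutil


def concat_fasta(pattern, out_path):
    """Concatenate all files matching `pattern` into `out_path`.

    Roughly equivalent to `cat <pattern> > <out_path>`.
    Returns the list of input paths in the order they were written.
    """
    out_abs = os.path.abspath(out_path)
    # sorted() gives plain lexicographic order (chr1, chr10, ..., chr2, ...);
    # the output file is skipped in case it also matches the pattern.
    paths = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs
    )
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                # Stream in chunks so large chromosome files are never
                # loaded fully into memory.
                shutil.copyfileobj(src, out)
    return paths
```

A call such as `concat_fasta("chr*.fa", "hg19.fa")` would then produce the combined file in the current directory.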

FASTQ Samples

Organism	Technology	Paired Coverage	Uncompressed size	Link	Reference
P.aeruginosa	Illumina GAIIx	50x	1 GB	SRR554369_1¹ (119 MB) SRR554369_2 (120 MB)	NC_002516.2
E.coli	PacBio	140x	1.3 GB	SRR1284073³ (2.2 GB)	Arabidopsis
H.sapiens gut	Illumina GAII	Unknown	3.6 GB	MH0001_081026_clean.1² (478 MB) MH0001_081026_clean.2 (550 MB)	hg19
S.cerevisiae	Illumina GAII	175x	7.7 GB	SRR327342_1 (792 MB) SRR327342_2 (947 MB)	ACFL01000033
T.cacao	Illumina GAIIx	35x	39 GB	SRR870667_1 (5.2 GB) SRR870667_2 (4.0 GB)	Cacao
H.sapiens	Illumina HiSeq	13x	102 GB	ERR174310_1 (17.3 GB) ERR174310_2 (16.8 GB)	hg19
H.sapiens	Illumina HiSeq	120x (single-end)	887 GB	ERR174324⁴ (17.5 GB) ERR174325 (16.7 GB) ERR174326 (16.3 GB) ERR174327 (16.3 GB) ERR174328 (16.3 GB) ERR174329 (16.3 GB) ERR174330 (16.1 GB) ERR174331 (17.3 GB) ERR174332 (15.7 GB) ERR174333 (15.4 GB) ERR174334 (15.6 GB) ERR174335 (15.6 GB) ERR174336 (15.9 GB) ERR174337 (16.0 GB) ERR174338 (16.0 GB) ERR174339 (15.6 GB) ERR174340 (11.2 GB) ERR174341 (14.8 GB) Total 284.6 GB	hg19

¹ All bzip2 files must be decompressed via bzip2 -d <bz> -c > <fastq>.
² All gzip files must be decompressed via gzip -d <gz> -c > <fastq>.
³ You need NCBI SRA Toolkit to download this sample. After you obtain it, you can fetch this particular sample via fastq-dump SRR1284073.
⁴ For this sample, only first library mate (_1) files are used.
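
The bzip2 and gzip decompression steps in the notes can also be done with the Python standard library. The following is a minimal sketch, not part of the original tooling; the helper name `decompress_to` and the suffix-based format detection are assumptions made for illustration.

```python
import bz2
import gzip
import shutil

# Map file suffixes to the standard-library opener that can read them.
OPENERS = {".bz2": bz2.open, ".gz": gzip.open}


def decompress_to(src_path, dst_path):
    """Decompress a .bz2 or .gz file into `dst_path`.

    Mirrors `bzip2 -d <bz> -c > <fastq>` and `gzip -d <gz> -c > <fastq>`:
    the compressed input is left in place.
    """
    for suffix, opener in OPENERS.items():
        if src_path.endswith(suffix):
            with opener(src_path, "rb") as src, open(dst_path, "wb") as dst:
                # Chunked copy keeps memory use flat for multi-GB files.
                shutil.copyfileobj(src, dst)
            return
    raise ValueError(f"unsupported compression suffix: {src_path}")
```

For example, `decompress_to("SRR554369_1.fastq.bz2", "SRR554369_1.fastq")` would write the plain FASTQ next to the archive; the exact downloaded file names may differ from the ones used here.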

Coverage calculation

Coverage was calculated by dividing the total number of nucleotides in the SAM or FASTQ file by the rounded reference genome size. These numbers are not intended to be exact; they are rough estimates of the coverage in each sample.

The following reference genome sizes were used:

Organism	Size
H.sapiens	3,100,000,000
T.cacao	345,000,000
D.melanogaster	168,000,000
S.cerevisiae	12,000,000
P.aeruginosa	6,300,000
E.coli	4,700,000
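
The calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used to produce the tables: it assumes four-line FASTQ records, counts every SAM record with a stored sequence (including any secondary or unmapped records), and the function names are hypothetical. The genome sizes are the rounded values from the table.

```python
# Rounded reference genome sizes, in nucleotides (from the table above).
GENOME_SIZES = {
    "H.sapiens": 3_100_000_000,
    "T.cacao": 345_000_000,
    "D.melanogaster": 168_000_000,
    "S.cerevisiae": 12_000_000,
    "P.aeruginosa": 6_300_000,
    "E.coli": 4_700_000,
}


def fastq_bases(handle):
    """Count nucleotides in a FASTQ stream (assumes 4 lines per record)."""
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += len(line.strip())
    return total


def sam_bases(handle):
    """Count nucleotides in a SAM stream, skipping '@' header lines."""
    total = 0
    for line in handle:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SEQ is the 10th mandatory SAM field; '*' means no sequence stored.
        if len(fields) > 9 and fields[9] != "*":
            total += len(fields[9])
    return total


def coverage(total_bases, organism):
    """Rough coverage: total nucleotides divided by rounded genome size."""
    return total_bases / GENOME_SIZES[organism]
```

As a usage example, `coverage(9_400_000, "E.coli")` returns `2.0`. For a real sample one would pass an open file to `fastq_bases` or `sam_bases` and feed the result to `coverage`.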
