All reference files should be pre-processed with `ref.py`, as explained in the Manual.
Organism | Technology | Coverage | Uncompressed size | Link | Reference |
---|---|---|---|---|---|
E.coli | Illumina MiSeq | 420x | 5.2 GB | MiSeq_Ecoli_DH10B_110721_PF (1) (1.3 GB) | CP000948 |
H.sapiens | IonTorrent | 0.6x | 5.6 GB | sample-2-10_sorted (1.4 GB) | Homo_sapiens_assembly19 |
H.sapiens | Illumina HiSeq | 2x | 20 GB | 9827_2#49 (6.1 GB) | hs37d5 |
D.melanogaster | PacBio | 75x | 29 GB | dm3PacBio (12 GB) | dm3 |
H.sapiens | RNASeq | 6x | 71 GB | K562_cytosol_LID8465_TopHat_v (12.8 GB) | hg19 (2) |
H.sapiens | PacBio | 15x | 118 GB | NA12878.pacbio.bwa-sw.20140202 (53.8 GB) | hs37d5 |
H.sapiens | Illumina-like Cancer Cell | 30x | 398 GB | HCC1954.mix1.n80t20 (3) (122.5 GB) | Homo_sapiens_assembly19 |
H.sapiens | Illumina HiSeq | 50x | 549 GB | NA12878_S1 (113.3 GB) | hg19 |
1. All BAM files must be decompressed via `samtools view -h <bam> -o <sam>`.
2. You can concatenate separate chromosome files into a large FASTA file via `cat chr*.fa > hg19.fa`.
3. You need GeneTorrent to download this sample. After you obtain it, you can fetch this particular sample via `gtdownload -vv -c https://cghub.ucsc.edu/software/downloads/cghub_public.key -d 360b4736-6c5e-48df-af58-c1cf51609350`.
Organism | Technology | Paired Coverage | Uncompressed size | Link | Reference |
---|---|---|---|---|---|
P.aeruginosa | Illumina GAIIx | 50x | 1 GB | SRR554369_1 (1) (119 MB) SRR554369_2 (120 MB) | NC_002516.2 |
E.coli | PacBio | 140x | 1.3 GB | SRR1284073 (3) (2.2 GB) | Arabidopsis |
H.sapiens gut | Illumina GAII | Unknown | 3.6 GB | MH0001_081026_clean.1 (2) (478 MB) MH0001_081026_clean.2 (550 MB) | hg19 |
S.cerevisiae | Illumina GAII | 175x | 7.7 GB | SRR327342_1 (792 MB) SRR327342_2 (947 MB) | ACFL01000033 |
T.cacao | Illumina GAIIx | 35x | 39 GB | SRR870667_1 (5.2 GB) SRR870667_2 (4.0 GB) | Cacao |
H.sapiens | Illumina HiSeq | 13x | 102 GB | ERR174310_1 (17.3 GB) ERR174310_2 (16.8 GB) | hg19 |
H.sapiens | Illumina HiSeq | 120x (single-end) | 887 GB | ERR174324 (4) (17.5 GB) ERR174325 (16.7 GB) ERR174326 (16.3 GB) ERR174327 (16.3 GB) ERR174328 (16.3 GB) ERR174329 (16.3 GB) ERR174330 (16.1 GB) ERR174331 (17.3 GB) ERR174332 (15.7 GB) ERR174333 (15.4 GB) ERR174334 (15.6 GB) ERR174335 (15.6 GB) ERR174336 (15.9 GB) ERR174337 (16.0 GB) ERR174338 (16.0 GB) ERR174339 (15.6 GB) ERR174340 (11.2 GB) ERR174341 (14.8 GB) Total: 284.6 GB | hg19 |
1. All bzip2 files must be decompressed via `bzip2 -d <bz> -c > <fastq>`.
2. All Gzip files must be decompressed via `gzip -d <gz> -c > <fastq>`.
3. You need NCBI SRA Toolkit to download this sample. After you obtain it, you can fetch this particular sample via `fastq-dump SRR1284073`.
4. For this sample, only the first library mate (`_1`) files are used.
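When several archives have been downloaded, the two decompression commands in the notes above can be wrapped in one small helper. The following is only a sketch: the function name `decompress_fastq` is hypothetical, and it assumes the archive name ends in `.bz2` or `.gz` and that the output should be written next to the archive with that suffix removed.

```shell
# Hypothetical helper: decompress a .bz2 or .gz FASTQ archive.
# The archive itself is kept; the output goes next to it,
# with the compression suffix stripped from the name.
decompress_fastq() {
    archive="$1"
    case "$archive" in
        *.bz2) bzip2 -d -c "$archive" > "${archive%.bz2}" ;;
        *.gz)  gzip -d -c "$archive" > "${archive%.gz}" ;;
        *)     echo "unsupported extension: $archive" >&2; return 1 ;;
    esac
}
```

For example, `decompress_fastq SRR554369_1.fastq.bz2` would write `SRR554369_1.fastq`, assuming the download was saved under that name.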
Coverage was calculated by dividing the total number of nucleotides in the SAM or FASTQ file by the rounded reference genome size. These numbers are not exact; they are rough estimates of the coverage in each sample.
The following reference genome sizes were used:
Organism | Size |
---|---|
H.sapiens | 3,100,000,000 |
T.cacao | 345,000,000 |
D.melanogaster | 168,000,000 |
S.cerevisiae | 12,000,000 |
P.aeruginosa | 6,300,000 |
E.coli | 4,700,000 |
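The coverage estimate described above can be reproduced roughly as follows. This is a minimal sketch, not the script that produced the tables: the function names are hypothetical, the FASTQ counter assumes plain four-line records, and the SAM counter assumes that every alignment line with a stored sequence (column 10) is counted once.

```shell
# Hypothetical helper: total bases in a FASTQ file with 4-line records
# (the sequence is the 2nd line of each record).
count_fastq_bases() {
    awk 'NR % 4 == 2 { bases += length($0) }
         END { printf "%.0f\n", bases }' "$1"
}

# Hypothetical helper: total bases in a SAM file.
# Header lines start with "@"; SEQ is column 10 and "*" means "not stored".
count_sam_bases() {
    awk -F '\t' '!/^@/ && $10 != "*" { bases += length($10) }
         END { printf "%.0f\n", bases }' "$1"
}

# Coverage = total bases / rounded reference genome size.
estimate_coverage() {
    awk -v bases="$1" -v size="$2" 'BEGIN { printf "%.1f\n", bases / size }'
}
```

A possible invocation, using the rounded P.aeruginosa genome size from the table and an assumed local file name, is `estimate_coverage "$(count_fastq_bases SRR554369_1.fastq)" 6300000`.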