seqtk sample gives empty output #145

Closed
Gil-marquez opened this issue Aug 8, 2019 · 3 comments

Comments

@Gil-marquez

I've been using seqtk sample to randomly subsample FASTQ files of about 75M reads.
I tried subsampling 5M, 10M, 15M, ..., 30M, 35M, ..., 70M reads, but from the 35M-read subsample onward the output files were empty. Does anyone know what the problem could be?

seqtk sample -s12 H3K27-1.fastq 10000000 > subsampling/H3K27-1.10M.fastq
seqtk sample -s14 H3K27-1.fastq 15000000 > subsampling/H3K27-1.15M.fastq
seqtk sample -s16 H3K27-1.fastq 20000000 > subsampling/H3K27-1.20M.fastq
seqtk sample -s18 H3K27-1.fastq 25000000 > subsampling/H3K27-1.25M.fastq
seqtk sample -s20 H3K27-1.fastq 30000000 > subsampling/H3K27-1.30M.fastq
seqtk sample -s22 H3K27-1.fastq 35000000 > subsampling/H3K27-1.35M.fastq
seqtk sample -s24 H3K27-1.fastq 40000000 > subsampling/H3K27-1.40M.fastq
seqtk sample -s26 H3K27-1.fastq 45000000 > subsampling/H3K27-1.45M.fastq
seqtk sample -s28 H3K27-1.fastq 50000000 > subsampling/H3K27-1.50M.fastq
seqtk sample -s30 H3K27-1.fastq 55000000 > subsampling/H3K27-1.55M.fastq
seqtk sample -s32 H3K27-1.fastq 60000000 > subsampling/H3K27-1.60M.fastq
seqtk sample -s32 H3K27-1.fastq 65000000 > subsampling/H3K27-1.65M.fastq
seqtk sample -s32 H3K27-1.fastq 70000000 > subsampling/H3K27-1.70M.fastq

I also tried changing the seed value, but the output did not change.

@tseemann

tseemann commented Oct 18, 2019

@Gil-marquez I had a look at the code, and it seems it needs to allocate RAM for N sequence structs when you want to sample N records. I suspect you might be running out of RAM.
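For context, memory that grows with N is what you would expect from reservoir sampling, a common way to draw a fixed number of records in a single pass over input of unknown length. Below is a minimal Python sketch of that technique (Algorithm R), written for illustration and not taken from seqtk's C source; `read_fastq`, `reservoir_sample` and the toy input are hypothetical, and the parser assumes plain 4-line FASTQ records. It shows two things: the reservoir holds up to N complete records at once, and nothing can be written until the whole input has been read. If such a process dies partway, for example because it is killed for using too much memory, the file already created by the shell's `>` redirection stays empty, which is one plausible (unconfirmed) explanation for the empty outputs.

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple

# One FASTQ record: header, sequence, '+' separator line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes no wrapped sequence lines)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def reservoir_sample(records: Iterable[Record], n: int, seed: int = 11) -> List[Record]:
    """Uniformly sample up to n records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[Record] = []  # holds up to n full records, so memory grows with n
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)  # the first n records fill the reservoir
        else:
            j = rng.randint(0, i)  # uniform over the i + 1 records seen so far
            if j < n:
                reservoir[j] = rec  # replace a randomly chosen slot
    return reservoir  # the sample is only known now, so output cannot start earlier


# Toy input: 10 records named @r0 .. @r9.
fastq = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(10)))
sample = reservoir_sample(read_fastq(fastq), 3, seed=12)
for rec in sample:
    print("\n".join(rec))
```

The seed defaults to 11 only to mirror the default in the seqtk usage text; Python's random number generator is different, so the same seed will not select the same reads as seqtk.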

Use the -2 option to run in 2-pass mode instead.

If this works, please close this issue. Thanks!

Usage:   seqtk sample [-2] [-s seed=11] <in.fa> <frac>|<number>

Options: -s INT       RNG seed [11]
         -2           2-pass mode: twice as slow but with much reduced memory
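The 2-pass mode trades a second read of the input for memory. One way to do that, sketched below in Python as an assumption rather than a transcription of seqtk, is to run the same reservoir sampling over record positions in the first pass, so the reservoir holds N integers instead of N full records, then rewind the file and emit only the records at the chosen positions in the second pass. `two_pass_sample` and `read_fastq` are hypothetical names. A consequence of this design is that the input must be a seekable file: this sketch cannot work on data piped through standard input.

```python
import io
import random
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes no wrapped sequence lines)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def two_pass_sample(handle: TextIO, n: int, seed: int = 11) -> Iterator[Record]:
    """Sample up to n records while keeping only n integers in memory."""
    rng = random.Random(seed)
    # Pass 1: reservoir-sample record positions; sequences are read and discarded.
    chosen: List[int] = []
    for i, _ in enumerate(read_fastq(handle)):
        if i < n:
            chosen.append(i)
        else:
            j = rng.randint(0, i)  # uniform over the i + 1 positions seen so far
            if j < n:
                chosen[j] = i
    chosen.sort()  # distinct positions, now in file order

    # Pass 2: rewind (requires a seekable input) and stream the chosen records.
    handle.seek(0)
    k = 0
    for i, rec in enumerate(read_fastq(handle)):
        if k == len(chosen):
            break  # every chosen record has been emitted
        if i == chosen[k]:
            yield rec
            k += 1


# Toy input: 10 records named @r0 .. @r9.
fastq = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(10)))
picked = list(two_pass_sample(fastq, 4, seed=12))
for rec in picked:
    print("\n".join(rec))
```

Because the chosen positions are sorted before the second pass, the sampled records come out in their original file order, and the second pass can stop as soon as the last chosen record has been emitted.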

@Acribbs

Acribbs commented Feb 20, 2020

FYI, I had a similar issue and @tseemann's solution fixed it.

@tseemann

@Gil-marquez please close this issue now.
