seqtk sample not working as expected #193

antoine4ucsd · 2022-06-09T16:09:51Z

Hello,
I am trying to subsample a fastq.gz file, but I am not sure it works as expected above a certain number of reads.

My source file contains about 150k reads:

awk '{s++}END{print s/4}' ./BA922J_barcode16_run5_merged.fastq.gz
150626

But when I try to subsample:

seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 > BA922J_10000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 40000 > BA922J_40000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 60000 > BA922J_60000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 80000 > BA922J_80000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 100000 > BA922J_100000.gz

the output file size plateaus:

-rwxrwxrwx  1  staff    82M Jun  9 08:36 BA922J_10000.gz
-rwxrwxrwx  1  staff   314M Jun  9 08:36 BA922J_40000.gz
-rwxrwxrwx  1  staff   314M Jun  9 08:36 BA922J_60000.gz
-rwxrwxrwx  1  staff   314M Jun  9 08:36 BA922J_80000.gz
-rwxrwxrwx  1  staff   314M Jun  9 08:37 BA922J_100000.gz

I also need to make sure this is not resampling the same reads. Can you confirm (for example, if I set the sample size to 200k)?

Not sure what I am doing wrong...
Thank you!


shenwei356 · 2022-11-18T04:41:05Z

What's the size of the original file? Also, check the number of reads with seqkit stats:

seqkit stats -j 10 *.gz
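
If seqkit is not at hand, a rough cross-check is to decompress the file first and then count lines. The snippet below is only a minimal sketch, assuming standard four-line FASTQ records; `count_reads` is a hypothetical helper name, not part of seqtk or seqkit. Keep in mind that awk run directly on a .gz file counts newline bytes in the compressed stream rather than FASTQ lines, so a count obtained that way may not match the real number of reads.

```shell
# count_reads is a hypothetical helper, not part of seqtk or seqkit.
# Assumes standard 4-line FASTQ records (no wrapped sequence lines).
count_reads() {
    # gzip -dc decompresses to stdout, so awk sees the real FASTQ lines.
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Usage: count_reads BA922J_barcode16_run5_merged.fastq.gz
```

If this number is well below the requested sample size, a plateau in output size would be consistent with the input simply running out of reads.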

PS: The command below does not write gzip-compressed output, despite the .gz extension.

seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 > BA922J_10000.gz

# this does
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 | pigz -c > BA922J_10000.gz
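
To check whether an existing output file is really compressed, one option is to let gzip test it. This is a small sketch, not something from seqtk itself; `is_gzip` is a hypothetical helper name. Plain `gzip -c` can stand in for `pigz -c` in the pipe above if pigz is not installed.

```shell
# is_gzip is a hypothetical helper name used only for illustration.
is_gzip() {
    # gzip -t tests the file and exits non-zero when it is not valid gzip data.
    gzip -t "$1" 2>/dev/null
}

# Usage: is_gzip BA922J_10000.gz && echo "gzip" || echo "not gzip"
```

A file written with a plain `>` redirect from seqtk should fail this check, since it holds uncompressed FASTQ text whatever its extension says.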

antoine4ucsd · 2022-11-18T15:24:16Z

Thank you. Good catch on the typo in the command line.

SplitInf mentioned this issue Nov 18, 2022

Problem with seqtk sample #199

Closed

antoine4ucsd closed this as completed Nov 18, 2022
