Hello, I am trying to subsample a fastq.gz file, but I am not sure it works as expected above a given limit.
My source file contains about 150k reads:
awk '{s++}END{print s/4}' ./BA922J_barcode16_run5_merged.fastq.gz
150626
but when trying to subset:
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 > BA922J_10000.gz
seqtk sample -s100 BA922Jl_barcode16_run5_merged.fastq.gz 40000 > BA922J_40000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 60000 > BA922J_60000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 80000 > BA922J_80000.gz
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 100000 > BA922J_100000.gz
then the file size is plateauing...
-rwxrwxrwx 1 staff 82M Jun 9 08:36 BA922J_10000.gz
-rwxrwxrwx 1 staff 314M Jun 9 08:36 BA922J_40000.gz
-rwxrwxrwx 1 staff 314M Jun 9 08:36 BA922J_60000.gz
-rwxrwxrwx 1 staff 314M Jun 9 08:36 BA922J_80000.gz
-rwxrwxrwx 1 staff 314M Jun 9 08:37 BA922J_100000.gz
I also need to make sure this is not resampling the same reads. Can you confirm (for example, if I set the sample size to 200k)?
I am not sure what I am doing wrong... Thank you!
What's the size of the original file? And also, check the number of reads with seqkit stats.
seqkit stats -j 10 *.gz
PS: The command below does not output gzip format.
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 > BA922J_10000.gz
# this does
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 | pigz -c > BA922J_10000.gz
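To make the point about the output format concrete, here is a minimal sketch that tells a real gzip file apart from plain text that merely carries a .gz name. It assumes only standard gzip and awk are installed; the temp directory and the demo file names are hypothetical stand-ins, not files from this thread. It also counts reads after decompressing, since running awk directly on a .gz file counts newline bytes in the compressed stream rather than FASTQ lines.

```shell
#!/bin/sh
# Sketch under assumptions: all files here are hypothetical demo data.
tmp=$(mktemp -d)

# Build a tiny 3-read FASTQ as stand-in data.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' > "$tmp/demo.fastq"

# Plain text saved under a .gz name (the effect of redirecting uncompressed output to *.gz).
cp "$tmp/demo.fastq" "$tmp/plain_named.gz"

# Actually compressed output (the effect of piping through gzip or pigz first).
gzip -c "$tmp/demo.fastq" > "$tmp/real.gz"

# gzip -t exits non-zero when the file is not valid gzip data.
if gzip -t "$tmp/plain_named.gz" 2>/dev/null; then plain_status=gzip; else plain_status=not_gzip; fi
if gzip -t "$tmp/real.gz" 2>/dev/null; then real_status=gzip; else real_status=not_gzip; fi
echo "plain_named.gz: $plain_status"
echo "real.gz: $real_status"

# Count reads after decompressing; FASTQ uses 4 lines per record.
n_reads=$(gzip -cd "$tmp/real.gz" | awk 'END{print NR/4}')
echo "reads in real.gz: $n_reads"
```

Running the same `gzip -t` check on the subsampled files would show whether they were written compressed or not.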
Thank you. Good catch on the typo in the command line.
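On the question of whether the same reads get sampled more than once: one way to check this directly, rather than rely on any assumption about how the sampler works, is to look for repeated read IDs in the output. The sketch below uses a hypothetical four-record demo file in which read r2 is deliberately duplicated. It assumes read IDs are unique in the source file and that each record spans exactly four lines.

```shell
#!/bin/sh
# Sketch under assumptions: sample.fastq.gz is hypothetical demo data
# in which read r2 appears twice on purpose.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' \
  | gzip -c > "$tmp/sample.fastq.gz"

# Header lines are lines 1, 5, 9, ...; keep only the read ID (first field),
# then list the IDs that occur more than once.
dup_ids=$(gzip -cd "$tmp/sample.fastq.gz" | awk 'NR % 4 == 1 {print $1}' | sort | uniq -d)

# Count the non-empty lines in the duplicate list.
n_dup=$(printf '%s\n' "$dup_ids" | grep -c .)
echo "duplicated read IDs: $n_dup"
```

A count of 0 on a real subsample would indicate that no read was drawn twice. For an uncompressed output file, replace `gzip -cd` with `cat`.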