Only the first 100,000 reads analyzed? #64
Comments
No, not really. It's only true for the modules that deal with duplication (the duplication and overrepresented sequences modules), where we have to store the sequences that have been seen. If you don't put a limit on that, you (potentially) end up storing every sequence in the file, which goes badly when you have a big, diverse dataset. Even then, we still look at all of the data: we take the first 100,000 different sequences, track them through the entire dataset, and then extrapolate to get the expected duplication figures for the whole dataset. Modules like quality and composition use all reads.
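To illustrate the idea described above, here is a minimal Python sketch of bounded tracking: every read is still scanned, but only the first N distinct sequences are counted. This is not FastQC's actual code. The function names, the `max_tracked` parameter, and the in-memory list of reads are assumptions for illustration, and the extrapolation step is left out.

```python
from collections import Counter


def track_duplication(sequences, max_tracked=100_000):
    """Count occurrences of the first `max_tracked` distinct sequences.

    Hypothetical helper: every read is examined, but sequences first seen
    after the cap is reached are never stored, so memory stays bounded.
    """
    counts = {}
    for seq in sequences:
        if seq in counts:
            # Already tracked: keep counting through the whole dataset.
            counts[seq] += 1
        elif len(counts) < max_tracked:
            # Still under the cap: start tracking this new sequence.
            counts[seq] = 1
        # Otherwise: a new sequence seen after the cap, ignored.
    return counts


def duplication_levels(counts):
    """Map each duplication level to the number of tracked sequences at it."""
    return Counter(counts.values())


# Tiny example with a cap of 2 distinct sequences.
reads = ["A", "B", "A", "C", "A", "B"]
tracked = track_duplication(reads, max_tracked=2)
levels = duplication_levels(tracked)
```

In this toy run, "C" first appears after the cap is reached, so it is never stored, while "A" and "B" are counted across all six reads.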
Thank you so much for the clarification. Also, am I correct that only a limited set of adapters is searched by default, but that one can provide additional adapter sequences to be searched against the library?
Yes, you can add new sequences to Configuration/contaminant_list.txt if you have more sequences you want to search against.
Is it true that FastQC only analyzes the first 100,000 reads? Is this true of all versions of the tool? Also, is this true for all the reported metrics (adapter content, read quality, etc.)?