Only the first 100,000 reads analyzed? #64

Closed
smb20200615 opened this issue Nov 14, 2020 · 3 comments

@smb20200615

smb20200615 commented Nov 14, 2020

Is it true that FastQC only analyzes the first 100,000 reads? Is this true of all versions of the tool? Also, is this true for all of the reported metrics (adapter content, read quality, etc.)?

@s-andrews
Owner

No, not really. It's only true for the modules that deal with duplication (the duplication and overrepresented sequences modules). For those we have to store the sequences that have been seen, and without a limit you could end up storing every sequence in the file, which goes badly when you have a big, diverse dataset.

Even then, we still look at all of the data - we just take the first 100,000 different sequences and track them through the entire dataset and then extrapolate to get the overall expected duplication figures for the whole dataset.
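
A simplified Python sketch of the strategy described above: only the first `limit` distinct sequences are stored, but they are counted through every read in the file, so memory stays bounded while all of the data is still scanned. This is not FastQC's code (FastQC is written in Java), and the function names are hypothetical. The "percent remaining if deduplicated" figure here is computed over the tracked sequences only, i.e. it simply assumes they are representative of the whole dataset; that is a naive stand-in, not FastQC's actual extrapolation.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def track_duplicates(reads: Iterable[str], limit: int = 100_000) -> Tuple[Counter, int, Optional[int]]:
    """Count every occurrence of the first `limit` distinct sequences.

    Hypothetical helper: memory is bounded by `limit`, yet every read is scanned.
    Returns (counts, total_reads, reads_seen_when_limit_was_hit).
    """
    counts = Counter()
    total = 0
    reads_at_limit = None
    for seq in reads:
        total += 1
        if seq in counts:
            # Already tracked: keep counting it through the entire file.
            counts[seq] += 1
        elif len(counts) < limit:
            # Still under the limit: start tracking this new sequence.
            counts[seq] = 1
            if len(counts) == limit:
                reads_at_limit = total
        # Otherwise: a new sequence first seen after the limit is never stored.
    return counts, total, reads_at_limit


def percent_remaining_if_deduplicated(counts: Counter) -> float:
    """Naive estimate: distinct / total occurrences among the tracked sequences only."""
    tracked_reads = sum(counts.values())
    return 100.0 * len(counts) / tracked_reads if tracked_reads else 100.0


reads = ["ACGT", "TTGA", "ACGT", "GGCC", "ACGT", "TTGA", "CATG"]
counts, total, at_limit = track_duplicates(reads, limit=2)
print(dict(counts), total, at_limit)              # {'ACGT': 3, 'TTGA': 2} 7 2
print(percent_remaining_if_deduplicated(counts))  # 40.0
```

With `limit=2`, "GGCC" and "CATG" are never stored, but the two tracked sequences are still counted across all seven reads.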

Modules such as the quality and base composition modules use every read.
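
By contrast, a per-position summary needs only a fixed amount of state per base position, so it can consume every read without memory growing with the read count. The sketch below (hypothetical function name) keeps running sums of Phred scores. It assumes Phred+33 encoding and computes only means, whereas FastQC's per-base quality module summarizes more than the mean.

```python
from typing import Iterable, List


def mean_quality_per_position(quality_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, accumulated over every read.

    State grows with the longest read length, not with the number of reads.
    Assumes Phred+33 encoding (offset=33).
    """
    sums: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(sums):
                # First read long enough to reach this position.
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


print(mean_quality_per_position(["II", "5"]))  # [30.0, 40.0]
```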

@smb20200615
Author

Thank you so much for the clarification. Also, am I correct that only a limited set of adapters is searched, but that one can provide additional adapter sequences to be searched against the library?

@s-andrews
Owner

Yes, you can add new sequences to Configuration/contaminant_list.txt if you have more sequences you want to search against.
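
A small Python sketch for reading and extending a file like Configuration/contaminant_list.txt. It assumes one entry per line, with a name and a sequence separated by one or more tab characters, and lines starting with `#` treated as comments; check the header comments in your own copy of the file before relying on that layout. The function names and the example entries are hypothetical.

```python
import re
from typing import List, Tuple


def parse_contaminant_list(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs, skipping blank lines and '#' comments."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed layout: name, then one or more tabs, then the sequence.
        parts = re.split(r"\t+", line, maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected name<TAB>sequence, got {raw!r}")
        name, seq = parts
        entries.append((name.strip(), seq.strip().upper()))
    return entries


def format_entry(name: str, sequence: str) -> str:
    """Format one new entry for appending to the file (assumed name<TAB>sequence layout)."""
    if "\t" in name:
        raise ValueError("name must not contain a tab")
    return f"{name}\t{sequence.upper()}\n"


example = "# comment line\nExample adapter\tACGTACGTACGT\n\nMy custom adapter\t\tacgtacgt\n"
entries = parse_contaminant_list(example)
entries.append(parse_contaminant_list(format_entry("Extra", "ttagg"))[0])
print(entries)
# [('Example adapter', 'ACGTACGTACGT'), ('My custom adapter', 'ACGTACGT'), ('Extra', 'TTAGG')]
```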
