Only the first 100,000 reads analyzed? #64

smb20200615 · 2020-11-14T02:21:57Z

Is it true that FastQC only analyzes the first 100,000 reads? Is this true of all versions of the tool? And is it true for all the reported metrics (adapter content, read quality, etc.)?

s-andrews · 2020-11-16T09:04:47Z

No, not really. It's only true for the modules which deal with duplication (duplication levels and overrepresented sequences), where we have to store the sequences which have been seen. If you don't put a limit on that, you can end up having to store every sequence in the file, which goes badly when you have a big, diverse dataset.

Even then, we still look at all of the data - we just take the first 100,000 different sequences and track them through the entire dataset and then extrapolate to get the overall expected duplication figures for the whole dataset.
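
A minimal sketch of the tracking idea described above, not FastQC's actual code: remember only the first `max_tracked` distinct sequences, but keep counting their occurrences through every read. The function names and the `naive_dedup_fraction` estimate are assumptions made for illustration; FastQC's real extrapolation step is not reproduced here.

```python
from collections import Counter
from typing import Iterable


def track_duplication(reads: Iterable[str], max_tracked: int = 100_000) -> Counter:
    """Count occurrences of the first `max_tracked` distinct sequences.

    Every read is inspected, but only sequences first seen before the
    limit was reached are stored, so memory use stays bounded.
    """
    counts: Counter = Counter()
    for seq in reads:
        if seq in counts:
            counts[seq] += 1          # already tracked: count it wherever it occurs
        elif len(counts) < max_tracked:
            counts[seq] = 1           # still room: start tracking this sequence
        # otherwise: a new sequence seen after the limit; it is not stored
    return counts


def naive_dedup_fraction(counts: Counter) -> float:
    """Hypothetical estimate: fraction of reads left after deduplication,
    assuming the tracked sequences are representative of the whole file."""
    total = sum(counts.values())
    return len(counts) / total if total else 0.0
```

With `max_tracked=2` and the reads `A, B, A, C, A, B, D`, only `A` and `B` are tracked, yet all of their later occurrences are still counted.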

Modules such as the quality and composition ones use all reads.

smb20200615 · 2020-12-07T21:05:30Z

Thank you so much for the clarification. Also, am I correct that only a limited set of adapters is searched by default, but that one can provide additional adapter sequences to be searched against the library?

s-andrews · 2020-12-08T08:42:42Z

Yes, you can add new sequences to Configuration/contaminant_list.txt if you have more sequences you want to search against.
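
As a rough illustration of what such an entry looks like: to my understanding, the contaminant file holds one named sequence per line in the form name, tab, sequence, and lines starting with `#` are ignored. The helper below writes a separate file in that layout instead of editing the bundled one. The function name, the file name, and the entry name are hypothetical, and the layout should be checked against the comments at the top of your own `contaminant_list.txt`.

```python
from pathlib import Path


def write_contaminant_file(path: Path, entries: dict[str, str]) -> None:
    """Write entries as `name<TAB>sequence` lines (assumed file layout)."""
    lines = ["# Custom contaminants, one per line: name<TAB>sequence"]
    for name, seq in entries.items():
        # Sequences are upper-cased so entries are written consistently.
        lines.append(f"{name}\t{seq.upper()}")
    path.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical entry name; replace the sequence with your own.
    write_contaminant_file(
        Path("custom_contaminants.txt"),
        {"My custom adapter": "AGATCGGAAGAGC"},
    )
```

The lines written this way could then be appended to `Configuration/contaminant_list.txt`, or, if your FastQC version supports it, the file could be passed with the `--contaminants` option.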

s-andrews closed this as completed Nov 16, 2020

tamuanand mentioned this issue Oct 2, 2023

[Question]; How is % duplication calculated compared to FastQC smithlabcode/falco#54

Closed
