
No results for Kmer content module in HTML and fastqc_data.txt #15

Closed
RoadRoller opened this issue Aug 27, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@RoadRoller

Falco version is 0.2.4. I turned on the Kmer module in the configuration by changing this line in limits.txt to:

kmer 				ignore 		0

Falco reports Kmer Content as FAIL, but fastqc_data.txt contains no k-mer results:

>>Kmer Content	fail
#Sequence	Count	PValue	Obs/Exp Max	Max Obs/Exp Position
>>END_MODULE
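One way to spot this symptom across many reports is to parse fastqc_data.txt and flag any module that has a non-pass status but no data rows. This is a minimal sketch that assumes only the `>>Module<TAB>status` / `#header` / `>>END_MODULE` layout shown above; `read_module` and `is_empty_failure` are hypothetical helpers, not part of Falco or FastQC.

```python
from typing import List, Optional, Tuple


def read_module(text: str, name: str) -> Tuple[Optional[str], List[List[str]]]:
    """Return (status, data_rows) for module `name` in fastqc_data.txt content.

    Blank lines and comment lines starting with '#' (the column header) are
    skipped. Returns (None, []) if the module is absent.
    """
    status: Optional[str] = None
    rows: List[List[str]] = []
    inside = False
    for line in text.splitlines():
        if not inside:
            # Module headers look like ">>Kmer Content\tfail"
            if line.startswith(">>") and line != ">>END_MODULE":
                fields = line[2:].split("\t")
                if fields[0] == name:
                    inside = True
                    status = fields[1] if len(fields) > 1 else None
            continue
        if line.startswith(">>END_MODULE"):
            break  # end of the module we were reading
        if not line or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return status, rows


def is_empty_failure(text: str, name: str = "Kmer Content") -> bool:
    """True if the module reports a non-pass status but has no data rows."""
    status, rows = read_module(text, name)
    return status is not None and status != "pass" and not rows
```

Running `is_empty_failure` on the excerpt above would return `True`, while a Kmer Content module that lists at least one k-mer row would return `False`.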

The section is also missing from the HTML report:
[screenshot]

For the same FASTQ file, FastQC displays the Kmer Content module as expected:
[screenshot]

@guilhermesena1
Collaborator

Hello,

Would you be able to help me reproduce the error on my end? Specifically, could you share:

(1) whether you cloned the repository, used the release, or installed from Conda;
(2) the exact command you used to run falco, and what it printed in the output; and
(3) if at all possible, the first 40,000 lines of your input FASTQ file?

Thank you!

@RoadRoller
Author

Hello,
thank you for the response!
(1) I used the release and installed it according to README.md with these commands:
$ ./configure CXXFLAGS="-O3 -Wall"
$ make all
$ make install
(2) The exact command is:
$ falco -l Configuration/limits.txt test.fastq
(Before running the command, I turned on the kmer module in limits.txt.)
(3) Here are the config file and the head of the FASTQ file:
limits.txt
test.fastq.gz

@guilhermesena1 added the bug label on Sep 8, 2021
@guilhermesena1
Collaborator

guilhermesena1 commented Sep 9, 2021

So essentially, the k-mer content functionality wasn't implemented at all! The data was being collected but never summarized. I went through the whole code and tested it on your dataset. The fix in e115b55 should at least report and plot the k-mer contents, but it does not exactly match FastQC yet. Specifically, I haven't implemented the p-value calculations, because I'll have to dig through the Java BinomialDistribution package to see how they are done, and that will take a little longer.
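For reference, a binomial upper-tail p-value can be computed as below. This is a minimal sketch assuming a k-mer's observed count x is modeled as Binomial(n, p), where n is the number of k-mers observed and p is the k-mer's expected proportion; whether this is exactly how FastQC applies Java's BinomialDistribution is not confirmed here. `binomial_upper_tail` is a hypothetical name.

```python
import math


def binomial_upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p).

    Terms are summed in log space (via lgamma) so that large n does not
    overflow the binomial coefficients.
    """
    if x <= 0:
        return 1.0
    if x > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(x, n + 1)
    ]
    # log-sum-exp: factor out the largest term before exponentiating
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))
```

For fixed n and p this tail probability shrinks as x grows, so over-represented k-mers get small p-values; the loop is O(n - x), which is fine for a sketch but not tuned for very large counts.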

That said, I think log(p-values) are a monotonic function of the obs/exp ratio. In the current implementation, I sort k-mers by obs/exp ratio and report the top ones (top 20 in fastqc_data.txt, top 10 in the plots), which in most practical cases should be the same as the ones with the lowest p-values.
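The ranking described in the comment above can be sketched as follows. The expected-count model used here (each k-mer's overall share of all k-mers, applied to the number of k-mers seen at each position) is an assumption for illustration and may differ from what Falco or FastQC actually compute; `top_kmers_by_obs_exp` is hypothetical.

```python
from typing import Dict, List, Tuple


def top_kmers_by_obs_exp(counts: Dict[str, List[int]], top_n: int = 20
                         ) -> List[Tuple[str, int, float, int]]:
    """Rank k-mers by their maximum obs/exp ratio over positions.

    counts[kmer][pos] is the observed count of `kmer` starting at `pos`.
    Returns (kmer, total_count, max_obs_exp, position_of_max), best first.
    """
    if not counts:
        return []
    n_pos = max(len(v) for v in counts.values())
    # Total number of k-mers observed at each position.
    pos_totals = [0] * n_pos
    for per_pos in counts.values():
        for i, c in enumerate(per_pos):
            pos_totals[i] += c
    grand_total = sum(pos_totals)

    results = []
    for kmer, per_pos in counts.items():
        total = sum(per_pos)
        if total == 0:
            continue
        share = total / grand_total  # assumed expected share at every position
        best_ratio, best_pos = 0.0, 0
        for i, c in enumerate(per_pos):
            expected = pos_totals[i] * share
            if expected > 0 and c / expected > best_ratio:
                best_ratio, best_pos = c / expected, i
        results.append((kmer, total, best_ratio, best_pos))

    # Highest obs/exp first; ties keep insertion order (sort is stable).
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:top_n]
```

With `top_n=20` this mirrors the list size mentioned for fastqc_data.txt, and `top_n=10` the number shown in the plots.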

If you try this on your data, I'd be happy to hear whether it works.

EDIT: I made some changes in 7426ae2 to match FastQC's behavior. It's also faster now because, like FastQC, we only check for k-mers once every 50 reads.
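Sampling once every 50 reads can be sketched as counting k-mers only in every 50th read. Which read in each block of 50 is used, the k-mer length (7 here), and skipping k-mers that contain N are assumptions for illustration; `count_kmers_sampled` is a hypothetical helper, not Falco's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def count_kmers_sampled(reads: Iterable[str], k: int = 7, every: int = 50
                        ) -> Dict[str, List[int]]:
    """Count k-mers by start position, using only every `every`-th read.

    Returns counts[kmer][pos] = number of sampled reads with `kmer` at `pos`.
    """
    counts: Dict[str, List[int]] = defaultdict(list)
    for idx, seq in enumerate(reads):
        if idx % every != 0:
            continue  # this read is not sampled
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            if "N" in kmer:
                continue  # skip k-mers containing ambiguous bases
            per_pos = counts[kmer]
            if len(per_pos) <= pos:
                # grow the per-position list so index `pos` exists
                per_pos.extend([0] * (pos + 1 - len(per_pos)))
            per_pos[pos] += 1
    return dict(counts)
```

The speedup comes from doing the inner k-mer loop for only about 1 in 50 reads; the trade-off is that rare k-mers are counted less precisely.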

@RoadRoller
Author

Thank you! We will try the 0.3.0 release.
