-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault on fastq data #9
Comments
Hello, Thank you so much for reporting the bug and taking the time to create a small test set that allows us to reproduce the segfault. With that I was able to find a problem in datasets with uneven read length distribution: When a longer read is processed, sometimes the tile matrix was not long enough to store the tile quality data for the new read. Because this is a very serious problem, I have decided to add a new release for falco that addresses this fix + some of the fixes brought up by previous issues. If at all possible, I'd truly appreciate it if you are able to test the new release and follow up on whether you are able to run it in your full test dataset. The new release can be found in the releases page. Thank you in advance! |
Happy to have helped diagnose a problem. I will be testing it on the dataset that originally produced the error I uploaded and many more. I will follow up with results. Thank you for being extremely punctual in addressing this issue. |
I've tested the new release on the same dataset. It completes the QC for the data. However, there is an issue with one of the HTML reports. The data output text file looks correct to me, but the HTML file doesn't display the visual component of most of the reports, only the grade. I am attaching the data and the report generated from it. As an aside - there are a few quality of life issues I've noticed.
Thank you, Kenji |
Hi Kenji, My sincerest apologies for not reaching out sooner. I was unable to reproduce the error on several fastqs, but I know from the report what is breaking it: Your fastq has varying read lengths, and some tiles don't have reads that reach a certain length, which was causing a division by zero that broke the html. I pushed a fix to the main repository if you are still interested, and I'll try to create some examples that may help me guarantee the issue has been addressed. Thank you as well for bringing the other two issues to my attention. The threads argument is a dummy argument for compatibility with fastqc, which is currently not being used. Other users brought this up and I understand the confusion, so I will update the default message to emphasize that this flag is currently being ignored, and create another issue for the multiple output case, which indeed is not faithfully replicating fastqc behavior. |
Oh, no problem. I was going to open another issue to specifically address the html report issue, but a fix ahead of time works, too. And I have the datasets that were generating the problem here, too, so I'll attach that, Thank you for addressing these, I appreciate it. |
Perfect! Using this I think I can confirm that the commit I referenced above should displays the html and the output is consistent with fastqc, with empty tiles showing "average" (0 deviations from the mean) tile quality. I also pushed fixes to the outputs overriding each other when multiple files are provided and a temporary warning on the readme that the threads flag is currently ignored. Thank you once again for the very detailed feedback on the problems and for providing the data that made debugging so much simpler. |
I'm more than happy to help. I'm with the Konstantinidis lab at GT, and we're in the process of replacing fastqc in one our software developments with Falco, so we've been testing it on a smorgasbord of data and situations. I do have a quick question - I've written a wrapper script around Falco for one of my own projects, and it relies on the naming structure that Falco previously had. If a user only submits one file to Falco, will the resulting names be the same as in previous versions? Or is there a new naming convention regardless of how many files are given to Falco at once? |
Yes, when a single file is provided the names will be the same as before, so if you run
it will generate If the user runs
it will generate |
Great! Then my script should continue working with the new version. Thank you for your work and responsiveness on all of this. I really appreciate it. Looking forward to testing the new version when it's ready for release. |
I've encountered a segfault occurring with some data I was trimming. Falco works fine on the untrimmed data, but fails on the post-trim results.
I was trying to narrow down the break to a single read, but I've encountered a strange result - the data attached is a collection of 16 reads, broken into two groupings of 8. Oddly, Falco works on either grouping of 8, but not on the concatenated 16. Even more strangely, it works on all 16 reads when I change the ordering of the reads in the file.
segfault_fq.zip
The text was updated successfully, but these errors were encountered: