Improve run reports for rotary #157

LeeBergstrand · 2024-04-19T22:47:06Z

No description provided.

LeeBergstrand · 2024-04-19T23:01:19Z

@jmtsuji Can you update the description for this with more explanation of what to do?

jmtsuji · 2024-04-22T03:18:44Z

Problem description

The run summary report currently generated by rotary is very basic. To understand what was done to each contig assembled by rotary and why (e.g., whether the contig was filtered by coverage, and why; how aggressively it was polished), a user currently needs to dig through several log files. It would be nice if we could generate more informative summary reports for rotary that collect this information for the user. This would also allow the user (and us, in end-to-end testing) to sanity-check the performance of rotary.

Proposed solution

I think the following types of reports could be helpful to create:

- Read QC report: summary of % reads retained vs. discarded during short and long read polishing
- Assembly report: the report already made by Flye could be used for this
- Contig stats report from assembly, polishing, and circularization: could include per-contig info on contig length, whether it is circular, end repair status, changed bases during read polishing (if not too hard to add), coverage filtration status, and whether it was rotated to a start gene like dnaA
  - the current contig report generated by rotary (at "{sample}/stats/{sample}_contig_info.tsv") already kind of accomplishes this, but it only includes a simple Yes/No summary for each contig of whether it is circular, was end-repaired successfully, and was retained during coverage filtration. More info could be added to this file to make a better contig report.
- Optional: Contig rotation report, to confirm that each contig was rotated to a new start position and then re-polished at least once during the pipeline. (It might actually be better to just confirm this by internal logic... a user might not need to see a report of this.)

After these reports are created, the Post-run tips section at the end of the README can be deleted, because this section summarizes which log files to manually check to obtain the above information.

Possible caveats

Because rotary has a lot of options that can be turned on or off (e.g., short read polishing; users can also just run up until the end of one module), we will need to consider how to make the reports modular so that the user can get reports at a variety of run endpoints. Rather than waiting to summarize all run info until the annotation module, we might need to add report summary rules within each module that summarize the information currently available. The next module can then take the run report from the previous module and add to it. I kind of do this already with "{sample}/stats/{sample}_contig_info.tsv"
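To illustrate the "each module adds to the previous module's report" idea, here is a minimal sketch in plain Python. The function names (`extend_report`, `to_tsv`), the `contig` key column, and the example column names are all hypothetical and not taken from rotary's code; the sketch only assumes that each module can produce per-contig rows keyed by contig ID.

```python
import csv
import io


def extend_report(previous_rows, module_rows, key="contig"):
    """Add the current module's columns to the previous module's report rows.

    Contigs missing from the current module (e.g., because they were
    filtered out) get "NA" in the new columns. All names are hypothetical.
    """
    new_by_key = {row[key]: row for row in module_rows}

    # Collect the columns contributed by this module, in first-seen order.
    new_columns = []
    for row in module_rows:
        for col in row:
            if col != key and col not in new_columns:
                new_columns.append(col)

    merged = []
    for row in previous_rows:
        out = dict(row)
        extra = new_by_key.get(row[key], {})
        for col in new_columns:
            out[col] = extra.get(col, "NA")
        merged.append(out)
    return merged


def to_tsv(rows):
    """Serialize a list of row dicts as tab-separated text with a header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With something like this, a run that stops after any module would still have a valid report containing whatever columns were available up to that point.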

jmtsuji · 2024-04-22T03:19:17Z

> @jmtsuji Can you update the description for this with more explanation of what to do?

@LeeBergstrand Done! Thanks for breaking up the to-do file into individual issues.

jmtsuji · 2024-04-23T02:24:22Z

Quick update about the formats of the reports

To start, I was thinking that TSV files might be the most straightforward way to summarize the results from the different analysis steps. In general, one report could be made for each major bullet point above, with some exceptions (e.g., two reports might work better for read QC: one report for short reads and one report for long reads). In the longer term, we could consider making HTML reports with embedded tables and/or plots to summarize the rotary results (e.g., using info from the TSV files), if it's not too difficult to do this. I personally am happy with TSV files, but I wonder if users of the published version of the tool might appreciate a slightly more polished HTML report... @LeeBergstrand what are your thoughts?
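As a rough sketch of what a TSV read QC report could look like, the snippet below builds one row per sample from read counts before and after QC. The function names and column names are hypothetical, and it assumes the read counts have already been obtained from whichever QC tool is used.

```python
import csv


def read_qc_row(sample, reads_in, reads_out):
    """Summarize reads retained vs. discarded for one sample (hypothetical columns)."""
    retained = 100.0 * reads_out / reads_in if reads_in else 0.0
    return {
        "sample": sample,
        "reads_in": reads_in,
        "reads_retained": reads_out,
        "reads_discarded": reads_in - reads_out,
        "percent_retained": f"{retained:.2f}",
        "percent_discarded": f"{100.0 - retained:.2f}",
    }


def write_read_qc_report(rows, handle):
    """Write the summary rows to an open text handle as a TSV file."""
    writer = csv.DictWriter(
        handle, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

The same two functions could be called once for long reads and once for short reads to produce the two separate reports.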

LeeBergstrand · 2024-04-23T07:51:15Z

> Quick update about the formats of the reports
>
> To start, I was thinking that TSV files might be the most straightforward way to summarize the results from the different analysis steps. In general, one report could be made for each major bullet point above, with some exceptions (e.g., two reports might work better for read QC: one report for short reads and one report for long reads). In the longer term, we could consider making HTML reports with embedded tables and/or plots to summarize the rotary results (e.g., using info from the TSV files), if it's not too difficult to do this. I personally am happy with TSV files, but I wonder if users of the published version of the tool might appreciate a slightly more polished HTML report... @LeeBergstrand what are your thoughts?

@jmtsuji I want to use off-the-shelf tools to do much of the stats and make the HTML reports. For example, most of the reports ATLAS makes could have been generated by third-party tools and aggregated by MultiQC. This leads to less maintenance, and it's less likely that the reports will break.
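For stats that no third-party tool covers, one possible route is MultiQC's custom content feature, which (as far as I understand it) picks up files named `*_mqc.tsv` and reads optional `#`-prefixed header lines for the section configuration. The sketch below only formats such a file; the function name, section ID, and columns are hypothetical, and the header keys should be checked against the MultiQC documentation before relying on them.

```python
def to_multiqc_table(section_id, section_name, header, rows):
    """Format rows as TSV text with a MultiQC-style custom content header.

    Assumption: MultiQC reads '# key: value' header lines and treats the
    first column as the row identifier. Names here are hypothetical.
    """
    lines = [
        f"# id: '{section_id}'",
        f"# section_name: '{section_name}'",
        "# plot_type: 'table'",
        "\t".join(header),
    ]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"
```

The returned text would then be written to a file such as `rotary_contigs_mqc.tsv` (hypothetical name) inside the directory that MultiQC scans.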

jmtsuji · 2024-04-23T08:20:39Z

@LeeBergstrand Agreed - using off-the-shelf tools for both stats (as much as possible) and HTML reports sounds like a good idea for maintenance, etc. Relevant for #91

LeeBergstrand mentioned this issue Apr 19, 2024

Task list for rotary #15

Closed

11 tasks

jmtsuji changed the title ~~Add a report/summary for key steps in the run so that all the log files do not need to be checked manually (as currently described in the README)~~ Improve run reports for rotary Apr 22, 2024

jmtsuji mentioned this issue Apr 23, 2024

Report read QC stats #91

Open
