
Create a function that generates a report comparing results (e.g. between SCT versions) #1603

Open
benjamindeleener opened this issue Feb 15, 2018 · 10 comments
Labels: enhancement, sct_pipeline

Comments

@benjamindeleener (Contributor)

Description

When there is a new release or a new function, it would be good to be able to generate a PDF report from sct_pipeline that compares the results of the current functions against a specific release or a previous set of results.

For example, to test the results of a new version of sct_deepseg_sc and compare them to an older version (let's say master), we can currently run the command below with both versions of SCT, then gather and compare the results (using the pickle files that are generated).

sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2","-i t1/t1.nii.gz -c t1"

This new tool would provide an easy way to generate a PDF report that compares two versions of SCT, by simply adding a parameter to sct_pipeline. Examples of commands could be:

sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare my_previous_results.pkl
sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare master
sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare v3.1.0
sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare v3.1.0,master

The report should contain tables with the results for each subject as well as graphs that provide a quick visual assessment of the results (like violin plots).
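As a rough illustration of what such a report generator might look like, here is a minimal sketch that loads two pickled Pandas dataframes (one per SCT version) and writes a two-page PDF with a violin-plot comparison and a per-subject table. The file names and the "dice" column are hypothetical, not part of the actual sct_pipeline output format:

    # Hypothetical sketch: compare two sct_pipeline result dataframes and write a PDF report.
    # Assumes each pickle contains a Pandas dataframe indexed by subject, with a "dice" column.
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.backends.backend_pdf import PdfPages

    def generate_report(pickle_old, pickle_new, pdf_out, metric="dice"):
        df_old = pd.read_pickle(pickle_old)
        df_new = pd.read_pickle(pickle_new)

        with PdfPages(pdf_out) as pdf:
            # Page 1: violin plots of the metric for both versions, side by side
            fig, ax = plt.subplots()
            ax.violinplot([df_old[metric].dropna().values, df_new[metric].dropna().values])
            ax.set_xticks([1, 2])
            ax.set_xticklabels(["old version", "new version"])
            ax.set_ylabel(metric)
            pdf.savefig(fig)
            plt.close(fig)

            # Page 2: per-subject table of the metric for both versions
            table = pd.concat({"old": df_old[metric], "new": df_new[metric]}, axis=1)
            fig, ax = plt.subplots()
            ax.axis("off")
            ax.table(cellText=table.round(3).values,
                     rowLabels=list(table.index), colLabels=list(table.columns), loc="center")
            pdf.savefig(fig)
            plt.close(fig)

    # Hypothetical paths, for illustration only
    generate_report("results_old.pkl", "results_new.pkl", "report.pdf")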

Points to discuss:

  • Should we use sct_pipeline or should we create a new function?
  • What would be the best format to store the results? Pickle, CSV, H5, others?
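On the storage-format question, a quick illustration of the candidates, all of which Pandas handles natively (the HDF5 writer additionally requires the PyTables package); the file names and columns are made up:

    import pandas as pd

    df = pd.DataFrame({"subject": ["sub-01", "sub-02"], "dice": [0.91, 0.88]})

    df.to_pickle("results.pkl")            # compact, Python-only, not safe to load from untrusted sources
    df.to_csv("results.csv", index=False)  # human-readable, diff-friendly, loses dtypes
    df.to_hdf("results.h5", key="results") # binary, good for large tables; needs the "tables" package

    df_csv = pd.read_csv("results.csv")    # round-trip example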
@benjamindeleener added the enhancement and sct_pipeline labels on Feb 15, 2018
@zougloub (Contributor)

My feedback:

  • Most methods implemented in SCT would have been developed against some reference (e.g. training/testing) data, so each one could have a validation script (take a look at "Ensure SOPs for reproducing the downloadable data are available and could work" #1604).
  • Obviously some results are subject to change due to improvements in methods, so there would be no choice but to use visual comparison. The "best" way to appreciate a method's performance would probably differ for each method, though. For fast validation, 2D review would be a must.
  • Some code can be subject to simple unit testing with boolean results.
  • Pickle is a bad idea in 99.97% of the applicable cases, and I wouldn't recommend it.
    And IMHO we should also remove its current use in downloadable data.
  • sct_pipeline is very poorly documented and I'm not sure it has been stress-tested enough.
  • Manually diffing PDFs can be done rather quickly with tools such as diffpdf, but many differences can be due solely to changes in the PDF-generation code, and it's not as fast as performing A/B comparison on bitmaps by swapping them.

@benjamindeleener (Contributor, Author)

@zougloub Thank you for your feedback.

I would like to stress that this issue reflects my intention to offer a quick, incremental solution for comparing our methods against older/other versions of the software on large test datasets, based on existing tools in SCT (such as sct_pipeline). I do not intend to develop a completely new testing framework, which would take months of development.

Specific answers to your comments:

  • Most methods already have their own validation scripts that generate their own validation metrics. For example, sct_register_to_template is tested by test_sct_register_to_template, which calculates several validation metrics specific to template registration (e.g., Dice coefficients). When sct_pipeline is run for a specific SCT function on a large dataset (multiple subjects), these validation metrics are calculated for each subject and saved in a Pandas dataframe.
  • The purpose of this PDF report is to visualize changes in the metrics generated by sct_pipeline against previous/other versions of SCT. I agree that the best visualization of the changes could differ slightly for each method. However, violin plots are a good initial solution and would be enough for most methods.
  • The purpose of this issue is to assess the changes of high-level methods only, such as sct_propseg, sct_label_vertebrae, and sct_register_to_template.
  • Could you please recommend an alternative to Pickle?
  • I agree with you that sct_pipeline is not well documented. Again, the purpose of this issue is not to reinvent the wheel, but to offer a quick solution based on existing tools. I've been using sct_pipeline to assess the results of methods for a long time now, but we lack a comparison tool, which is the purpose of this issue.
  • I don't understand the last comment. Since sct_pipeline provides a dataframe with the results for each version of SCT being tested, it is very easy to use Pandas tools to generate figures and comparison results. The PDF report is just a good way of presenting these results.

Here is an example of comparison results that were generated using sct_pipeline for the segmentation when using two different methods for detecting the spinal cord:
#1249 (comment)

As you can see, it is fairly easy to assess which method is better when comparing two versions of the software. The objective of this issue is simply to generate a PDF report that provides this kind of comparison.

@jcohenadad (Member)

@benjamindeleener I like the overall approach you describe; however, I would not integrate the result management/visualisation inside sct_pipeline, which is already a fairly complicated function.

My suggestion would be to create a third-party function which takes as input the Pandas structure (output of sct_pipeline) and spits out a PDF. There could be more than two inputs if we want to compare three results, for example.

So, something like:

sct_generate_report -i results_propseg_3.1.0.pkl,results_propseg_3.1.1.pkl -o report.pdf

This approach would also enable possible use of this function for other purposes, not necessarily specific to sct_pipeline.

Another advantage is that we could generate the report independently from running sct_pipeline, which takes a long time. Example: if we want red violin plots instead of green, we don't have to re-run the whole thing for 10 hours.
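For illustration, a minimal command-line entry point along those lines could look like the following. The script name, argument names, and the plotting step are hypothetical; it simply reuses the kind of dataframe comparison sketched earlier in this thread:

    #!/usr/bin/env python
    # Hypothetical CLI sketch for an sct_generate_report-style tool.
    import argparse
    import pandas as pd

    def main():
        parser = argparse.ArgumentParser(
            description="Generate a PDF report comparing sct_pipeline results.")
        parser.add_argument("-i", required=True,
                            help="Comma-separated list of pickled result dataframes (two or more).")
        parser.add_argument("-o", default="report.pdf", help="Output PDF file.")
        args = parser.parse_args()

        dataframes = [pd.read_pickle(path) for path in args.i.split(",")]
        # ... build violin plots and per-subject tables from `dataframes`
        # and save them to args.o (see the PdfPages sketch above).

    if __name__ == "__main__":
        main()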

@benjamindeleener (Contributor, Author)

@jcohenadad Agreed. Let's do this!

@zougloub (Contributor)

I concur.

As to "let's do this!" the thing is, in order to do something that's not a future liability, it would be good to be starting off from a solid base:

@zougloub (Contributor)

When it comes to the actual report generation, once we have gathered the figures/KPIs/values from the magical metadata structure filled by the process execution, I'd be more inclined to piggy-back on docutils, i.e. generating reStructuredText and compiling that into PDF (or another format), than, say, using reportlab.
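As an illustration of that docutils-based route, a sketch assuming the rst2pdf package is installed (docutils alone targets HTML/LaTeX); the file names and figure are made up. The report generator would only need to emit reStructuredText and shell out to a converter:

    # Hypothetical sketch: emit reStructuredText and convert it to PDF with rst2pdf.
    import subprocess

    rst = """\
    Comparison report
    =================

    .. figure:: violin_dice.png

       Dice coefficients, old vs. new version.
    """

    with open("report.rst", "w") as f:
        f.write(rst)

    # rst2pdf is a separate package (pip install rst2pdf).
    subprocess.run(["rst2pdf", "report.rst", "-o", "report.pdf"], check=True)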

@jcohenadad (Member)

@benjamindeleener do you have a working branch on this? If not I can take care of this, as we need it soon (#1757, #1746).

@jcohenadad (Member)

I see that many of you are using seaborn instead of matplotlib for generating violin plots. @charleygros, did you check whether matplotlib can generate violin plots similar to seaborn's, i.e., with the individual inner points randomly distributed along the x axis?
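For reference, a matplotlib-only sketch that approximates seaborn's violin plot with inner points jittered along the x axis (random data and version labels, just to illustrate the technique):

    # Sketch: violin plot with individual points jittered along x, using only matplotlib.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    groups = [rng.normal(0.90, 0.03, 25), rng.normal(0.93, 0.02, 25)]  # fake Dice values

    fig, ax = plt.subplots()
    ax.violinplot(groups, positions=[1, 2], showmedians=True)
    for pos, values in zip([1, 2], groups):
        jitter = rng.uniform(-0.08, 0.08, size=len(values))  # random x offset per point
        ax.scatter(pos + jitter, values, s=10, color="k", alpha=0.6)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["v3.1.0", "master"])
    ax.set_ylabel("Dice")
    plt.show()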

@charleygros (Member)

@jcohenadad: Indeed matplotlib can generate similar violin plots, but I guess I am answering this question a bit late (#1759) : )

@joshuacwnewton changed the title from "Create a function to automatically generate a PDF report for large testing" to "Create a function that generates a report comparing results (e.g. between SCT versions)" on Jan 14, 2023
@joshuacwnewton (Member)

It seems like we implemented this feature for sct_pipeline in #1759, but we never closed this issue.

(However, we then transitioned from sct_pipeline to sct_run_batch, and in the process, lost the sct_pipeline_makefig script that was created!)

So, in the context of current SCT, this issue is actually more akin to "Add database creation and results pickling to sct_run_batch, then revive the makefig function"... which is a much, much larger task to take on!
