
Create a function that generates a report comparing results (e.g. between SCT versions) #1603

Open
benjamindeleener opened this issue Feb 15, 2018 · 10 comments
Labels: enhancement, sct_pipeline

Comments

@benjamindeleener (Contributor)

Description

When there is a new release or a new function, it would be good to be able to generate a PDF report from sct_pipeline that compares the results of the current functions against a specific release or a previous set of results.

For example, to test the results of a new version of sct_deepseg_sc and compare them to an older version (let's say master), we can currently run the command below with both versions of SCT, then gather and compare the results (using the pickle files that are generated).

sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2","-i t1/t1.nii.gz -c t1"

This new tool would provide an easy way to generate a PDF report that compares two versions of SCT, by simply adding a parameter to sct_pipeline. Examples of commands could be:

sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare my_previous_results.pkl
sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare master
sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare v3.1.0
sct_pipeline -f sct_deepseg_sc -d /Users/gustave/data/sct_test_function/ -p "-i t2/t2.nii.gz -c t2" -compare v3.1.0,master

The report should contain tables with the results for each subject as well as graphs that provide a quick visual assessment of the results (like violin plots).
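As a rough illustration of what such a report generator might look like, here is a minimal sketch that loads two pickled Pandas dataframes (one per SCT version) and writes a two-page PDF with a violin-plot comparison and a per-subject table. The file names and the "dice" column are hypothetical, not part of the actual sct_pipeline output format:

    # Hypothetical sketch: compare two sct_pipeline result dataframes and write a PDF report.
    # Assumes each pickle contains a Pandas dataframe indexed by subject, with a "dice" column.
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.backends.backend_pdf import PdfPages

    def generate_report(pickle_old, pickle_new, pdf_out, metric="dice"):
        df_old = pd.read_pickle(pickle_old)
        df_new = pd.read_pickle(pickle_new)

        with PdfPages(pdf_out) as pdf:
            # Page 1: violin plots of the metric for both versions, side by side
            fig, ax = plt.subplots()
            ax.violinplot([df_old[metric].dropna().values, df_new[metric].dropna().values])
            ax.set_xticks([1, 2])
            ax.set_xticklabels(["old version", "new version"])
            ax.set_ylabel(metric)
            pdf.savefig(fig)
            plt.close(fig)

            # Page 2: per-subject table of the metric for both versions
            table = pd.concat({"old": df_old[metric], "new": df_new[metric]}, axis=1)
            fig, ax = plt.subplots()
            ax.axis("off")
            ax.table(cellText=table.round(3).values,
                     rowLabels=list(table.index), colLabels=list(table.columns), loc="center")
            pdf.savefig(fig)
            plt.close(fig)

    # Hypothetical paths, for illustration only
    generate_report("results_old.pkl", "results_new.pkl", "report.pdf")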

Points to discuss:

  • Should we use sct_pipeline or should we create a new function?
  • What would be the best format to store the results? Pickle, CSV, H5, others?
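On the storage-format question, a quick illustration of the candidates, all of which Pandas handles natively (the HDF5 writer additionally requires the PyTables package); the file names and columns are made up:

    import pandas as pd

    df = pd.DataFrame({"subject": ["sub-01", "sub-02"], "dice": [0.91, 0.88]})

    df.to_pickle("results.pkl")            # compact, Python-only, not safe to load from untrusted sources
    df.to_csv("results.csv", index=False)  # human-readable, diff-friendly, loses dtypes
    df.to_hdf("results.h5", key="results") # binary, good for large tables; needs the "tables" package

    df_csv = pd.read_csv("results.csv")    # round-trip example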
@benjamindeleener added the enhancement and sct_pipeline labels on Feb 15, 2018
@zougloub (Contributor)

My feedback:

  • Most methods implemented in SCT would have been developed against some reference (e.g. training/testing) data, so each one could have a validation script (take a look at "Ensure SOPs for reproducing the downloadable data are available and could work" #1604).
  • Obviously some results are subject to change due to improvements in methods, so there would be no choice but to use visual comparison. The "best" way to appreciate a method's performance would probably differ for each method, though. For fast validation, 2D review would be a must.
  • Some code can be subject to simple unit testing with boolean results.
  • Pickle is a bad idea in 99.97% of the applicable cases, and I wouldn't recommend it.
    And IMHO we should also remove its current use in downloadable data.
  • sct_pipeline is very poorly documented and I'm not sure it has been stress-tested enough.
  • Manually diffing PDFs can be done rather quickly with tools such as diffpdf, but many differences can be due solely to changes in the PDF-generation code, and it's not as fast as performing A/B comparison on bitmaps by swapping them.

@benjamindeleener (Contributor, Author)

@zougloub Thank you for your feedback.

I would like to stress that this issue reflects my intention to offer a quick, incremental solution for comparing our methods against older/other versions of the software on large test datasets, based on existing tools in SCT (such as sct_pipeline). I do not intend to develop a completely new testing framework, which would take months of development.

Specific answers to your comments:

  • Most methods already have their own validation scripts that generate their own validation metrics. For example, sct_register_to_template is tested by test_sct_register_to_template, which calculates several validation metrics specific to template registration (e.g., Dice coefficients). When sct_pipeline is run for a specific SCT function on a large dataset (multiple subjects), these validation metrics are calculated for each subject and saved in a Pandas dataframe.
  • The purpose of this PDF report is to visualize changes in the metrics generated by sct_pipeline against previous/other versions of SCT. I agree that the best visualization of the changes could differ slightly for each method. However, violin plots are a good initial solution and would be enough for most methods.
  • The purpose of this issue is to assess the changes of high-level methods only, such as sct_propseg, sct_label_vertebrae, and sct_register_to_template.
  • Could you please recommend an alternative to Pickle?
  • I agree with you that sct_pipeline is not well documented. Again, the purpose of this issue is not to reinvent the wheel, but to offer a quick solution based on existing tools. I've been using sct_pipeline to assess the results of methods for a long time now, but we lack a comparison tool, which is the purpose of this issue.
  • I don't understand the last comment. Since sct_pipeline provides a dataframe with the results for each version of SCT being tested, it is very easy to use Pandas tools to generate figures and comparison results. The PDF report is just a good way of presenting these results.

Here is an example of comparison results that were generated using sct_pipeline for the segmentation when using two different methods for detecting the spinal cord:
#1249 (comment)

As you can see, it is fairly easy to assess which method is better when comparing two versions of the software. The objective of this issue is simply to generate a PDF report that provides this kind of comparison.

@jcohenadad (Member)

@benjamindeleener I like the overall approach you describe; however, I would not integrate the result management/visualisation inside sct_pipeline, which is already a fairly complicated function.

My suggestion would be to create a third-party function which takes as input the Pandas structure (output of sct_pipeline) and spits out a PDF. There could be more than two inputs if we want to compare three results, for example.

So, something like:

sct_generate_report -i results_propseg_3.1.0.pkl,results_propseg_3.1.1.pkl -o report.pdf

This approach would also enable possible use of this function for other purposes, not necessarily specific to sct_pipeline.

Another advantage is that we could generate the report independently from running sct_pipeline, which takes a long time. Example: if we want red violin plots instead of green, we don't have to re-run the whole thing for 10 hours.
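For illustration, a minimal command-line entry point along those lines could look like the following. The script name, argument names, and the plotting step are hypothetical; it simply reuses the kind of dataframe comparison sketched earlier in this thread:

    #!/usr/bin/env python
    # Hypothetical CLI sketch for an sct_generate_report-style tool.
    import argparse
    import pandas as pd

    def main():
        parser = argparse.ArgumentParser(
            description="Generate a PDF report comparing sct_pipeline results.")
        parser.add_argument("-i", required=True,
                            help="Comma-separated list of pickled result dataframes (two or more).")
        parser.add_argument("-o", default="report.pdf", help="Output PDF file.")
        args = parser.parse_args()

        dataframes = [pd.read_pickle(path) for path in args.i.split(",")]
        # ... build violin plots and per-subject tables from `dataframes`
        # and save them to args.o (see the PdfPages sketch above).

    if __name__ == "__main__":
        main()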

@benjamindeleener (Contributor, Author)

@jcohenadad Agreed. Let's do this!

@zougloub (Contributor)

I concur.

As to "let's do this!" the thing is, in order to do something that's not a future liability, it would be good to be starting off from a solid base:

@zougloub (Contributor)

When it comes to the actual report generation, once we have gathered the figures/KPIs/values from the magical metadata structure filled by the process execution, I'd be more inclined to piggy-back on docutils, i.e. generating reStructuredText and compiling that into PDF (or another format), than, say, using reportlab.
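As an illustration of that docutils-based route, a sketch assuming the rst2pdf package is installed (docutils alone targets HTML/LaTeX); the file names and figure are made up. The report generator would only need to emit reStructuredText and shell out to a converter:

    # Hypothetical sketch: emit reStructuredText and convert it to PDF with rst2pdf.
    import subprocess

    rst = """\
    Comparison report
    =================

    .. figure:: violin_dice.png

       Dice coefficients, old vs. new version.
    """

    with open("report.rst", "w") as f:
        f.write(rst)

    # rst2pdf is a separate package (pip install rst2pdf).
    subprocess.run(["rst2pdf", "report.rst", "-o", "report.pdf"], check=True)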

@jcohenadad (Member)

@benjamindeleener do you have a working branch on this? If not I can take care of this, as we need it soon (#1757, #1746).

@jcohenadad (Member)

I see that many of you are using seaborn instead of matplotlib for generating violin plots. @charleygros, did you check whether matplotlib can generate violin plots similar to seaborn's, i.e., with the individual inner points randomly distributed along the x axis?
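For reference, a matplotlib-only sketch that approximates seaborn's violin plot with inner points jittered along the x axis (random data and version labels, just to illustrate the technique):

    # Sketch: violin plot with individual points jittered along x, using only matplotlib.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    groups = [rng.normal(0.90, 0.03, 25), rng.normal(0.93, 0.02, 25)]  # fake Dice values

    fig, ax = plt.subplots()
    ax.violinplot(groups, positions=[1, 2], showmedians=True)
    for pos, values in zip([1, 2], groups):
        jitter = rng.uniform(-0.08, 0.08, size=len(values))  # random x offset per point
        ax.scatter(pos + jitter, values, s=10, color="k", alpha=0.6)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["v3.1.0", "master"])
    ax.set_ylabel("Dice")
    plt.show()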

@charleygros (Member)

@jcohenadad: Indeed matplotlib can generate similar violin plots, but I guess I am answering this question a bit late (#1759) : )

@joshuacwnewton changed the title from "Create a function to automatically generate a PDF report for large testing" to "Create a function that generates a report comparing results (e.g. between SCT versions)" on Jan 14, 2023
@joshuacwnewton (Member)

It seems like we implemented this feature for sct_pipeline in #1759, but we never closed this issue.

(However, we then transitioned from sct_pipeline to sct_run_batch, and in the process, lost the sct_pipeline_makefig script that was created!)

So, in the context of current SCT, this issue is actually more akin to "Add database creation and results pickling to sct_run_batch, then revive the makefig function"... which is a much, much larger task to take on!
