Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diffused transformations in docs to preprocessing, DA, and utilities #1061

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

uzaymacar
Copy link
Contributor

@uzaymacar uzaymacar commented Jan 29, 2022

Checklist

GitHub

  • I've given this PR a concise, self-descriptive, and meaningful title
  • I've linked relevant issues in the PR body
  • I've applied the relevant labels to this PR
  • I've assigned a reviewer

PR contents

Description

This PR diffuses the "Transformations" heading into (i) "Preprocessing", (ii) "Data Augmentations", and (iii) "Utility Transformations" headings at the same level.

In doing so, I also alphabetized the transformations under the new "Data Augmentations" as previously mentioned here.

In the reviewing of this PR, it would be great to get some feedback for

  • perhaps a better / shorter name for "Utility Transformations"
  • currently customizable parameters (e.g. "applied_to", "dataset_type", etc.) are repeated in all three headings which is not ideal

Linked issues

Resolves #1059

@uzaymacar uzaymacar added the documentation category: readthedocs or user doc label Jan 29, 2022
@kanishk16
Copy link
Contributor

Would it make sense to have a subsection as transformations with subsubsections as (i) "Preprocessing", (ii) "Data Augmentations", and (iii) "Utilities" as shown below:

image

(Why?) ivadomed at its core uses PyTorch and this aligns with torchvision.transforms.Compose where we eventually chain several transformations together ranging from preprocessing to data augmentations. This even resolves repetition of customizable parameters that could be stated just once under Transformations subsection.

ref: Sphinx subsubsections

@uzaymacar
Copy link
Contributor Author

uzaymacar commented Jan 30, 2022

@kanishk16 It would definitely make sense, thanks for your suggestion! Another supporting argument for your idea is that the key used in the config file is "transformations" and it would be good to retain a "Transformations" heading (similar to how we have it with "training_parameters" key and "Training Parameters" heading and so on).

A counter-argument for this idea is non-consistency with "Preprocessing" heading as @jcohenadad noted here, but I would be in favor of your idea because it solves more problems.

@kanishk16
Copy link
Contributor

kanishk16 commented Jan 30, 2022

A counter-argument for this idea is non-consistency with "Preprocessing" heading as @jcohenadad noted here, but I would be in favor of your idea because it solves more problems.

@uzaymacar Thanks for highlighting this, I completely missed this. Apart from this trade-off, another plausible solution could be to have a different key (preprocessing) and a corresponding subsection for it to be more consistent. And, data augmentations along with utility transformations under a single subsection of transformations. I guess this solution is more intuitive (feedback appreciated on this thought). The solution we choose depends upon our target audience for ivadomed.

@jcohenadad
Copy link
Member

@uzaymacar Thanks for highlighting this, I completely missed this. Apart from this trade-off, another plausible solution could be to have a different key (preprocessing) and a corresponding subsection for it to be more consistent. And, data augmentations along with utility transformations under a single subsection of transformations. I guess this solution is more intuitive (feedback appreciated on this thought). The solution we choose depends upon our target audience for ivadomed.

@kanishk16 could you please show us in a 'tree'-like display what your TOC suggestion would look like?

@naga-karthik
Copy link
Member

Thanks, @uzaymacar for restructuring the docs! It is much required! I am in favor of keeping the name "Transformations" as it is currently in the TOC and it also corresponds well with the keys that we have in the config file. However, as "Preprocessing" shouldn't technically be under "Transformations", I agree with @kanishk16's comment on creating new key-value pairs for preprocessing in the config file and having another subsection (I think preprocessing keys are important enough to justify a new subsection!).

At the risk of extending the discussion, here are my 2 more cents on this:

  1. Personally, I would do away with the NumpyToTensor key in the config file in the first place, because this conversion is something I expect to be done by default without having the user know about this (I mean, the basic building block in PyTorch is a Tensor). Moreover, in all the examples I have seen, the value for that key is always empty ie NumpyToTensor: {}, which suggests that removing it at some point should be considered.
  2. If (1) above is implemented, then we can push "Normalize" under "transformations". This is because, from here, I see that normalize is indeed under torchvision.transforms and we also do not have to worry about "utilities" or "utility transformations" anymore.

Copy link
Member

@jcohenadad jcohenadad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally rename "Configuration File" into "Usage"

currently, "Usage" is pretty light and only has the CLI info, whereas Configuration file should be part of Usage (moreover, some of the params could also be called via the CLI, right?).

But maybe that should be part of another PR?

@uzaymacar
Copy link
Contributor Author

Thanks all for the excellent feedback! I would like to pick a few points to handle in separate issues + PRs if possible:

  • I would personally rename "Configuration File" into "Usage" >

    We should think of this together with Consider merging "Usage" and "Scripts" into a single documentation entry #1029.

  • I would do away with the NumpyToTensor key

    Agreed. Why was this implemented in the first place -- if no answer let's proceed with your suggestion!

  • have a different key (preprocessing)

    creating new key-value pairs for preprocessing in the config file

    Agreed.

Moving onto this PR, IIUC currently we have two proposals on top of the current state of the branch:

  • h1: Transformations, h2: Preprocessing, h2: Data Augmentations, h2: Utility by @kanishk16's first suggestion
  • h1: Preprocessing, h1: Transformations by @kanishk16's second suggestion + @naga-karthik's suggestion

I was in favor of first option because all of preprocessing, data augmentations, and utility are implemented as and are transformations: (a) being implemented in ivadomed/transforms.py, (b) based on the ImedTransform class, and (c) have common parameters as mentioned here. I like the second option because it (a) doesn't hide everything under "Transformations", (b) establish consistency with "Postprocessing", and (c) remove Utility as a heading.

How can we fuse the two? LMK if I have missed anything! Perhaps first taking care of the tangential but excellent points mentioned at the beginning might shine some light on this discussion.

@mariehbourget
Copy link
Member

Very interesting discussion!
Here is my two-cents:

Regarding Usage, Configuration File and Scripts:

I would be in favor of keeping the term and section "configuration file" but reorganizing the "Usage" section as:
h1: Usage, h2: Command line tools, h2: Configuration file, h2: Scripts

About NumpyToTensor:

  • I would do away with the NumpyToTensor key

    Agreed. Why was this implemented in the first place -- if no answer let's proceed with your suggestion!

As per #818, #826 and #830, the NumpyToTensor key is not necessary in the config file and is applied directly in the code here (for transforms that are not preprocessing). The code is retro-compatible for config files that contains the key here (the key is ignored).
So I think it's safe to remove it from the transforms since it is not used anymore on a config file basis.

About preprocessing:

  • have a different key (preprocessing)
    creating new key-value pairs for preprocessing in the config file

    Agreed.

I'm less in favor of this one because preprocessing transforms ARE transforms (same arguments as Uzay for option 1).
They are in the transforms.py file and share classes with data augmentation (ex: Compose). I don't see them as the opposite of postprocessing since preprocessing applies to the image + GT whereas postprocessing applies to the predictions.
Moreover, we have a preprocessing.py file that is something else completely.
I'm not against it completely but would be careful moving forward with this refactoring.

About the doc hierarchy:

Moving onto this PR, IIUC currently we have two proposals on top of the current state of the branch:

  • h1: Transformations, h2: Preprocessing, h2: Data Augmentations, h2: Utility by @kanishk16's first suggestion
  • h1: Preprocessing, h1: Transformations by @kanishk16's second suggestion + @naga-karthik's suggestion

Because of my previous point, I will suggest a third option.

  • h1: Transformations, h2: Preprocessing, h2: Data augmentation (including NormalizeInstance), NumpyToTensor is removed.

@uzaymacar
Copy link
Contributor Author

Latest commit 3035fa5 implements h1: Transformations, h2: Preprocessing, h2: Data Augmentations and removes NumPyToTensor as suggested by @mariehbourget -- I think it is a good conclusion of all of the ideas presented in this discussion, thanks Marie-Hélène! I am marking this PR ready-to-review as a result.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 1815266603

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 171 unchanged lines in 11 files lost coverage.
  • Overall coverage remained the same at 79.043%

Files with Coverage Reduction New Missed Lines %
ivadomed/scripts/compare_models.py 2 95.35%
ivadomed/scripts/extract_small_dataset.py 5 91.4%
ivadomed/scripts/visualize_transforms.py 5 91.78%
ivadomed/loader/slice_filter.py 8 73.33%
ivadomed/scripts/training_curve.py 9 85.71%
ivadomed/loader/bids_dataset.py 10 82.47%
ivadomed/loader/mri2d_segmentation_dataset.py 13 89.52%
ivadomed/utils.py 22 76.19%
ivadomed/main.py 24 55.67%
ivadomed/testing.py 31 63.45%
Totals Coverage Status
Change from base Build 1764193488: 0%
Covered Lines: 4643
Relevant Lines: 5874

💛 - Coveralls

@jcohenadad
Copy link
Member

jcohenadad commented Feb 9, 2022

from @mariehbourget

I would be in favor of keeping the term and section "configuration file" but reorganizing the "Usage" section as:
h1: Usage, h2: Command line tools, h2: Configuration file, h2: Scripts

👍

from @uzaymacar

Latest commit 3035fa5 implements h1: Transformations, h2: Preprocessing, h2: Data Augmentations and removes NumPyToTensor as suggested by @mariehbourget -- I think it is a good conclusion of all of the ideas presented in this discussion, thanks Marie-Hélène! I am marking this PR ready-to-review as a result.

where would 'postprocessing' be located? I would suggest using this more intuitive display to get the discussion going:

├── GETTING STARTED
│   ├── Installation
│   ├── Data
│   ├── Usage
│   │   ├── Command line tools
│   │   ├── Configuration file
│   │   └── Scripts
│   ├── Transformations
│       ├── Preprocessing
│       ├── Data Augmentations
│       └── Postprocessing
├── Architectures
├── Pre-trained models
├── Help

@mariehbourget
Copy link
Member

where would 'postprocessing' be located? I would suggest using this more intuitive display to get the discussion going:

├── GETTING STARTED
│   ├── Installation
│   ├── Data
│   ├── Usage
│   │   ├── Command line tools
│   │   ├── Configuration file
│   │   └── Scripts
│   ├── Transformations
│       ├── Preprocessing
│       ├── Data Augmentations
│       └── Postprocessing
├── Architectures
├── Pre-trained models
├── Help

Just to clarify: "Transformations" is nested under "Configuration file" with all the other config parameters (including Postprocessing), see updated structure below.

As for "Postprocessing", I don't think it should be inside "Transformations" because:

  • They are not tranformations i.e. not based on ImedTransform class or torchvision_transforms
  • Transformations apply to images + GT whereas Postprocessing apply to predictions
├── GETTING STARTED
│   ├── Installation
│   ├── Data
│   ├── Usage
│   │   ├── Command line tools
│   │   ├── Configuration file
│   │   │   ├── <other previous configuration file categories>
│   │   │   ├── Transformations
│   │   │   │   ├── Preprocessing
│   │   │   │   └── Data Augmentations
│   │   │   ├── Uncertainty
│   │   │   ├── Postprocessing
│   │   │   └── Evaluation Parameters
│   │   └── Scripts
│   ├── Architectures
│   ├── Pre-trained models
│   ├── Help

@jcohenadad
Copy link
Member

from #1061 (comment), i feel that an h3 nesting for 'Transformations' (which is a pretty important section) will likely be missed.

Maybe we should reconsider some of the proposed changes about 'transformations'? And if we wish to merge 'scripts' with usage, if we consider that ivadomed is a script (which it is), then we could maybe do something like this:

├── GETTING STARTED
│   ├── Installation
│   ├── Data
│   ├── Usage  --> list all scripts here at the same level: 'ivadomed', 'ivadomed_visualize_transforms', etc.
│   ├── Configuration file
│   │   ├── <other previous configuration file categories>
│   │   ├── Transformations
│   │   │   ├── Preprocessing
│   │   │   └── Data Augmentations
│   │   ├── Uncertainty
│   │   ├── Postprocessing
│   │   └── Evaluation Parameters
│   ├── Architectures
│   ├── Pre-trained models
│   ├── Help

to sum up the proposed changes are:

  • Move 'Configuration file' after 'Usage', because I fear people will get overwhelmed by the 'Configuration file' and should first get the 'big picture' of how ivadomed works, and then get into the details of how exactly they should use it
  • Consider ivadomed a script
  • Move all scripts to 'Usage'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation category: readthedocs or user doc Ready for review Mark PR that are ready to be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transformations documentation: separate between preprocessing and data augmentation
6 participants