
Refactor image samplers #175

Merged
merged 8 commits into master from 6-generalize-samplers on Jun 2, 2020

Conversation

fepegar
Owner

@fepegar fepegar commented May 29, 2020

The weighted sampler uses a probability map included in the subject sample and computes the possible patch locations once, so it's as efficient as the ImageSampler (which will disappear) and much more efficient than the current LabelSampler.

Weighted sampler

Add a generic sampler that uses a probability map to extract patches.

Use Crop to extract patches.

At the moment there is legacy code in WeightedSampler that could be easily replaced by a call to Crop. The only difficulty is figuring out the arguments to Crop, but that should be doable in five minutes with pen and paper.

Move sample argument to __call__

This is important so that it can be used with the queue. The current implementation of Queue takes the sampler class as argument (not an instance of it), so no arguments can be passed directly to the sampler (e.g. the probability map name). The class is initialized inside Queue so that it can be passed the sample from which the patches will be extracted.

An alternative is passing an instance of the sampler and moving the sample/subject argument to __call__. It would look a bit like this:

class Sampler:
    def __init__(self, prob_map_name, patch_size):
        self.prob_map_name = prob_map_name
        self.patch_size = patch_size
    
    def __call__(self, sample):
        self.preprocess_prob_map(sample)
        while True:
            yield self.get_patch(sample)

Another option, maybe simpler, is just writing a setter for the sample/subject. Then, sampler.set_subject(subject) can be called in the queue before looping to extract the patches.
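That setter option could look roughly like this (a sketch only; the class name, set_subject, get_patch, and the dict-based subject are illustrative, not the actual torchio API):

```python
class SetterSampler:
    """Sketch of the setter-based alternative (hypothetical names)."""

    def __init__(self, prob_map_name, patch_size):
        self.prob_map_name = prob_map_name
        self.patch_size = patch_size
        self.subject = None

    def set_subject(self, subject):
        # The queue calls this once, before looping to extract patches
        self.subject = subject

    def __call__(self):
        # Infinite generator of patches from the stored subject
        while True:
            yield self.get_patch(self.subject)

    def get_patch(self, subject):
        # Placeholder: a real sampler would crop a random patch here
        return {'subject': subject['name'], 'size': self.patch_size}


sampler = SetterSampler('prob_map', patch_size=8)
sampler.set_subject({'name': 'colin27'})
patch = next(sampler())
```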

Uniform sampler (current ImageSampler)

The UniformSampler could be a thin wrapper around WeightedSampler, where the probability map is always None (so a map of ones will be used).
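A minimal sketch of that relationship (class bodies are simplified to show only the constructor wiring; everything except the class names is an assumption):

```python
import numpy as np

class WeightedSampler:
    """Simplified stand-in for the weighted sampler."""

    def __init__(self, patch_size, probability_map_name=None):
        self.patch_size = patch_size
        self.probability_map_name = probability_map_name

    def get_probability_map(self, data):
        # No map name means uniform sampling: a map of ones
        if self.probability_map_name is None:
            return np.ones_like(data, dtype=float)
        raise NotImplementedError  # real map lookup omitted in this sketch


class UniformSampler(WeightedSampler):
    """Thin wrapper: always use a uniform (all-ones) probability map."""

    def __init__(self, patch_size):
        super().__init__(patch_size, probability_map_name=None)


sampler = UniformSampler(16)
prob_map = sampler.get_probability_map(np.zeros((4, 4, 4)))
```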

Refactor LabelSampler

Similarly, this sampler could just be a wrapper around the weighted sampler. Some fancier options could be added, for example sampling each class with equal probability. Or that could be a new sampler, BalancedSampler. This is nice because it follows NiftyNet syntax, to which new users might be accustomed.

Handle how transforms are applied to probability maps

If an image is somehow marked as a sampling/probability map, e.g. using the currently unused

SAMPLING_MAP = 'sampling_map'

then intensity transforms shouldn't be applied to it. I guess using that type (or anything else) would make this work as intended. Using torchio.LABEL would work, but it doesn't make a lot of sense.
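As a sketch of the intended behavior (the dispatch helper and the dict-based image with a 'type' key are assumptions for illustration, not torchio's actual transform machinery):

```python
SAMPLING_MAP = 'sampling_map'  # the currently unused constant

def maybe_apply_intensity_transform(image, transform):
    """Apply an intensity transform unless the image is a sampling map."""
    if image.get('type') == SAMPLING_MAP:
        return image  # leave probability maps untouched
    transformed = dict(image)
    transformed['data'] = transform(image['data'])
    return transformed


def double(data):
    # Toy "intensity transform" for the example
    return [2 * x for x in data]

mri = {'type': 'intensity', 'data': [1, 2, 3]}
prob = {'type': SAMPLING_MAP, 'data': [1, 2, 3]}

mri_out = maybe_apply_intensity_transform(mri, double)
prob_out = maybe_apply_intensity_transform(prob, double)
```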

  • Weighted sampler
  • Use Crop to extract patches
  • Move sample argument to __call__
  • Uniform sampler (current ImageSampler)
  • Refactor LabelSampler
  • Handle how transforms are applied to probability maps
  • Unit tests
  • Type hints
  • Docs

This was referenced May 29, 2020
@romainVala
Contributor

then intensity transforms shouldn't be applied to it. I guess using that type (or anything else) would make this work as intended. Using torchio.LABEL would work, but it doesn't make a lot of sense.

It's because in a segmentation task we want to use the label as the target and also use it for the sampling strategy, so it's the same volume with two uses...

@fepegar
Owner Author

fepegar commented May 30, 2020

Sure, I understand! But that's a very specific case in which the label happens to be the sampling map. If the user defines a sampling map that is not a label, what should its type be? That's what I'm wondering.

@fepegar fepegar marked this pull request as ready for review May 31, 2020 17:54
Add weighted sampler

Add some docs for weighted sampler

Rename probability map argument

Add weighted sampler to docs

Add features to samplers

Add abstract method get_probability_map()

Move tests

Add features, tests and docs for samplers

Use crop transform to extract patches

Add comment to bounds transform

Add type hint for samplers

Fix TypeError for Python <3.8
Contributor

@GFabien left a comment


Hi @fepegar! Great job with this PR! I was just wondering about the best way to do the following: imagine you have a segmentation task with 2 classes (tumor and non tumor for example) and you want to have the same number of patches in the background and in the object of interest. I think such a scenario would make sense, what do you think?

To implement such a scenario with your implementation you would have to compute the number of points in your volume, the number of points in your object of interest and set the value of all points in the object of interest to be the ratio and in the background 1 - ratio. And you would have to precompute that before giving the sample to the Sampler. Could it be a good idea to modify the LabelSampler to work this way or to create a new Sampler? The proportion of the different classes could also be a parameter of the Sampler.

tests/data/sampler/test_weighted_sampler.py
torchio/data/sampler/weighted.py (outdated)
torchio/data/sampler/weighted.py
@fepegar
Owner Author

fepegar commented Jun 1, 2020

Hi @fepegar! Great job with this PR! I was just wondering about the best way to do the following: imagine you have a segmentation task with 2 classes (tumor and non tumor for example) and you want to have the same number of patches in the background and in the object of interest. I think such a scenario would make sense, what do you think?
To implement such a scenario with your implementation you would have to compute the number of points in your volume, the number of points in your object of interest and set the value of all points in the object of interest to be the ratio and in the background 1 - ratio. And you would have to precompute that before giving the sample to the Sampler. Could it be a good idea to modify the LabelSampler to work this way or to create a new Sampler? The proportion of the different classes could also be a parameter of the Sampler.

Thanks for your feedback! I've been thinking of adding something similar to NiftyNet's "balanced" sampler. At the moment, the user can achieve that behavior by precomputing a sampling map. But of course, we want to be friendly, so I think we can create a new BalancedSampler or add a kwarg to LabelSampler. What do you think is better? Maybe a new class is not necessary.

The class would take e.g. a label_probability dict:

label_probability = {
    0: 0.5,  # background
    1: 0.5,  # tumor
}

And the actual probability map can be quickly computed in __call__ by get_probability_map. Actually, this can be a generalization of LabelSampler, which would now use a dict:

label_probability = {
    0: 0,  # background
    1: 1,  # tumor
}

I've implemented something like this in the past, so hopefully I can reuse some code if needed.
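For illustration, one way get_probability_map could turn such a dict into a voxelwise map (a sketch under my own assumptions, not the final implementation): spread each class's probability mass uniformly over its voxels, so classes are sampled with the requested probabilities regardless of their size.

```python
import numpy as np

def get_probability_map(label, label_probability):
    """Build a voxelwise sampling map from per-class probabilities.

    Each class's probability mass is divided evenly among its voxels,
    so small structures are not undersampled. (Illustrative helper.)
    """
    prob_map = np.zeros(label.shape, dtype=float)
    for value, probability in label_probability.items():
        mask = label == value
        n_voxels = mask.sum()
        if n_voxels:
            prob_map[mask] = probability / n_voxels
    return prob_map / prob_map.sum()  # normalize to sum to 1


label = np.array([0, 0, 0, 1])  # three background voxels, one tumor voxel
prob_map = get_probability_map(label, {0: 0.5, 1: 0.5})
```

Here the single tumor voxel gets probability 0.5, the same total mass as the three background voxels combined.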

@GFabien
Contributor

GFabien commented Jun 1, 2020

Yes that would be perfect!

@fepegar fepegar added the enhancement (New feature or request) label Jun 1, 2020
@fepegar
Owner Author

fepegar commented Jun 1, 2020

Done. Maybe there are more efficient ways to do this, but it seems to work.

Contributor

@GFabien left a comment


It seems good, well done @fepegar!

@fepegar
Owner Author

fepegar commented Jun 2, 2020

Thanks for the feedback, @GFabien. I'll add some more tests and merge soon.

@fepegar fepegar merged commit 25d6cfa into master Jun 2, 2020
@fepegar fepegar deleted the 6-generalize-samplers branch June 2, 2020 14:30
JulianKlug added a commit to JulianKlug/torchio that referenced this pull request Jun 20, 2020
IterableDataset seems to have been removed as of fepegar#175, therefore torch>=1.1 could be sufficient.
This is in the interest of many people still relying on CUDA 9.

Resolves fepegar#195
fepegar pushed a commit that referenced this pull request Jun 20, 2020
* Torch 1.1 is sufficient

IterableDataset seems to have been removed as of #175, therefore torch>=1.1 could be sufficient.
This is in the interest of many people still relying on CUDA 9.

Resolves #195

* Update setup.py
@fepegar fepegar mentioned this pull request Jul 1, 2020
fepegar added a commit that referenced this pull request Jul 1, 2020
Using sitk.Crop is slow because sitk.GetImageFromArray is slow.
Using copy.deepcopy(patch) is slow because it copies the whole array,
not just the patch. The solution is to use slicing for cropping and
patch.clone() for copying.

Related to #175, #183, 89acf63.
Fixes #212.
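The fix described in that commit message can be sketched like this (NumPy stands in for torch here, and the function name is illustrative; a torch tensor would use patch.clone() instead of copy()):

```python
import numpy as np

def extract_patch(data, index_ini, patch_size):
    """Crop with plain slicing, then copy only the patch."""
    i, j, k = index_ini
    si, sj, sk = patch_size
    # Slicing returns a cheap view; no whole-volume conversion or deepcopy
    patch = data[i:i + si, j:j + sj, k:k + sk]
    return patch.copy()  # copy just the patch (patch.clone() for tensors)


volume = np.arange(64).reshape(4, 4, 4)
patch = extract_patch(volume, (1, 1, 1), (2, 2, 2))
patch[0, 0, 0] = -1  # mutating the copy must not touch the original volume
```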
fepegar added a commit that referenced this pull request Jul 1, 2020
* Remove unused attribute

* Use slicing and clone() to crop images in samplers

Using sitk.Crop is slow because sitk.GetImageFromArray is slow.
Using copy.deepcopy(patch) is slow because it copies the whole array,
not just the patch. The solution is to use slicing for cropping and
patch.clone() for copying.

Related to #175, #183, 89acf63.
Fixes #212.

* Set proper affine when cropping patches
Labels: enhancement (New feature or request)
Projects: None yet
3 participants