
Refactor image samplers #175

Merged
merged 8 commits into master from 6-generalize-samplers on Jun 2, 2020

Conversation

fepegar
Owner

@fepegar fepegar commented May 29, 2020

The weighted sampler uses a probability map included in the subject sample and computes the possible patch locations once, so it's as efficient as the ImageSampler (which will disappear) and much more efficient than the current LabelSampler.

Weighted sampler

Add a generic sampler that uses a probability map to extract patches.

Use Crop to extract patches.

At the moment there is legacy code in WeightedSampler that could be easily replaced by a call to Crop. The only difficulty is figuring out the arguments to Crop, but that should be doable in five minutes with pen and paper.

Move sample argument to __call__

This is important so that it can be used with the queue. The current implementation of Queue takes the sampler class as argument (not an instance of it), so no arguments can be passed directly to the sampler (e.g. the probability map name). The class is initialized inside Queue so that it can be passed the sample from which the patches will be extracted.

An alternative is passing an instance of the sampler and moving the sample/subject argument to __call__. It would look a bit like this:

class Sampler:
    def __init__(self, prob_map_name, patch_size):
        self.prob_map_name = prob_map_name
        self.patch_size = patch_size
    
    def __call__(self, sample):
        self.preprocess_prob_map(sample)
        while True:
            yield self.get_patch(sample)

Another option, maybe simpler, is just writing a setter for the sample/subject. Then, sampler.set_subject(subject) can be called in the queue before looping to extract the patches.
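That setter option could look roughly like this (a sketch only; the class name, set_subject, get_patch, and the dict-based subject are illustrative, not the actual torchio API):

```python
class SetterSampler:
    """Sketch of the setter-based alternative (hypothetical names)."""

    def __init__(self, prob_map_name, patch_size):
        self.prob_map_name = prob_map_name
        self.patch_size = patch_size
        self.subject = None

    def set_subject(self, subject):
        # The queue calls this once, before looping to extract patches
        self.subject = subject

    def __call__(self):
        # Infinite generator of patches from the stored subject
        while True:
            yield self.get_patch(self.subject)

    def get_patch(self, subject):
        # Placeholder: a real sampler would crop a random patch here
        return {'subject': subject['name'], 'size': self.patch_size}


sampler = SetterSampler('prob_map', patch_size=8)
sampler.set_subject({'name': 'colin27'})
patch = next(sampler())
```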

Uniform sampler (current ImageSampler)

The UniformSampler could be a thin wrapper around WeightedSampler, where the probability map is always None (so a map of ones will be used).
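A minimal sketch of that relationship (class bodies are simplified to show only the constructor wiring; everything except the class names is an assumption):

```python
import numpy as np

class WeightedSampler:
    """Simplified stand-in for the weighted sampler."""

    def __init__(self, patch_size, probability_map_name=None):
        self.patch_size = patch_size
        self.probability_map_name = probability_map_name

    def get_probability_map(self, data):
        # No map name means uniform sampling: a map of ones
        if self.probability_map_name is None:
            return np.ones_like(data, dtype=float)
        raise NotImplementedError  # real map lookup omitted in this sketch


class UniformSampler(WeightedSampler):
    """Thin wrapper: always use a uniform (all-ones) probability map."""

    def __init__(self, patch_size):
        super().__init__(patch_size, probability_map_name=None)


sampler = UniformSampler(16)
prob_map = sampler.get_probability_map(np.zeros((4, 4, 4)))
```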

Refactor LabelSampler

Similarly, this sampler could just be a wrapper around the weighted sampler. Some fancier options could be added, for example sampling each class with equal probability. Or that could be a new sampler, BalancedSampler. This is nice because it follows NiftyNet syntax, to which new users might be accustomed.

Handle how transforms are applied to probability maps

If an image is somehow marked as a sampling/probability map, e.g. using the currently unused

SAMPLING_MAP = 'sampling_map'

then intensity transforms shouldn't be applied to it. I guess using that type (or anything else) would make this work as intended. Using torchio.LABEL would work, but it doesn't make a lot of sense.
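As a sketch of the intended behavior (the dispatch helper and the dict-based image with a 'type' key are assumptions for illustration, not torchio's actual transform machinery):

```python
SAMPLING_MAP = 'sampling_map'  # the currently unused constant

def maybe_apply_intensity_transform(image, transform):
    """Apply an intensity transform unless the image is a sampling map."""
    if image.get('type') == SAMPLING_MAP:
        return image  # leave probability maps untouched
    transformed = dict(image)
    transformed['data'] = transform(image['data'])
    return transformed


def double(data):
    # Toy "intensity transform" for the example
    return [2 * x for x in data]

mri = {'type': 'intensity', 'data': [1, 2, 3]}
prob = {'type': SAMPLING_MAP, 'data': [1, 2, 3]}

mri_out = maybe_apply_intensity_transform(mri, double)
prob_out = maybe_apply_intensity_transform(prob, double)
```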

  • Weighted sampler
  • Use Crop to extract patches
  • Move sample argument to __call__
  • Uniform sampler (current ImageSampler)
  • Refactor LabelSampler
  • Handle how transforms are applied to probability maps
  • Unit tests
  • Type hints
  • Docs

This was referenced May 29, 2020
@romainVala
Contributor

then intensity transforms shouldn't be applied to it. I guess using that type (or anything else) would make this work as intended. Using torchio.LABEL would work, but it doesn't make a lot of sense.

It's because in a segmentation task we want to use the label as the target and also use it for the sampling strategy, so it's the same volume with two uses...

@fepegar
Owner Author

fepegar commented May 30, 2020

Sure, I understand! But that's a very specific case in which the label happens to be the sampling map. If the user defines a sampling map that is not a label, what should its type be? That's what I'm wondering.

@fepegar fepegar marked this pull request as ready for review May 31, 2020 17:54
Add weighted sampler

Add some docs for weighted sampler

Rename probability map argument

Add weighted sampler to docs

Add features to samplers

Add abstract method get_probability_map()

Move tests

Add features, tests and docs for samplers

Use crop transform to extract patches

Add comment to bounds transform

Add type hint for samplers

Fix TypeError for Python <3.8
Contributor

@GFabien left a comment


Hi @fepegar! Great job with this PR! I was just wondering about the best way to do the following: imagine you have a segmentation task with 2 classes (tumor and non tumor for example) and you want to have the same number of patches in the background and in the object of interest. I think such a scenario would make sense, what do you think?

To implement such a scenario with your implementation you would have to compute the number of points in your volume, the number of points in your object of interest and set the value of all points in the object of interest to be the ratio and in the background 1 - ratio. And you would have to precompute that before giving the sample to the Sampler. Could it be a good idea to modify the LabelSampler to work this way or to create a new Sampler? The proportion of the different classes could also be a parameter of the Sampler.

tests/data/sampler/test_weighted_sampler.py
torchio/data/sampler/weighted.py (outdated)
torchio/data/sampler/weighted.py
@fepegar
Owner Author

fepegar commented Jun 1, 2020

Hi @fepegar! Great job with this PR! I was just wondering about the best way to do the following: imagine you have a segmentation task with 2 classes (tumor and non tumor for example) and you want to have the same number of patches in the background and in the object of interest. I think such a scenario would make sense, what do you think?
To implement such a scenario with your implementation you would have to compute the number of points in your volume, the number of points in your object of interest and set the value of all points in the object of interest to be the ratio and in the background 1 - ratio. And you would have to precompute that before giving the sample to the Sampler. Could it be a good idea to modify the LabelSampler to work this way or to create a new Sampler? The proportion of the different classes could also be a parameter of the Sampler.

Thanks for your feedback! I've been thinking of adding something similar to NiftyNet's "balanced" sampler. At the moment, the user can achieve that behavior by precomputing a sampling map. But of course, we want to be friendly, so I think we can create a new BalancedSampler or add a kwarg to LabelSampler. What do you think is better? Maybe a new class is not necessary.

The class would take e.g. a label_probability dict:

label_probability = {
    0: 0.5,  # background
    1: 0.5,  # tumor
}

And the actual probability map can be quickly computed in __call__ by get_probability_map. Actually, this can be a generalization of LabelSampler, which would now use a dict:

label_probability = {
    0: 0,  # background
    1: 1,  # tumor
}

I've implemented something like this in the past, so hopefully I can reuse some code if needed.
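For illustration, one way get_probability_map could turn such a dict into a voxelwise map (a sketch under my own assumptions, not the final implementation): spread each class's probability mass uniformly over its voxels, so classes are sampled with the requested probabilities regardless of their size.

```python
import numpy as np

def get_probability_map(label, label_probability):
    """Build a voxelwise sampling map from per-class probabilities.

    Each class's probability mass is divided evenly among its voxels,
    so small structures are not undersampled. (Illustrative helper.)
    """
    prob_map = np.zeros(label.shape, dtype=float)
    for value, probability in label_probability.items():
        mask = label == value
        n_voxels = mask.sum()
        if n_voxels:
            prob_map[mask] = probability / n_voxels
    return prob_map / prob_map.sum()  # normalize to sum to 1


label = np.array([0, 0, 0, 1])  # three background voxels, one tumor voxel
prob_map = get_probability_map(label, {0: 0.5, 1: 0.5})
```

Here the single tumor voxel gets probability 0.5, the same total mass as the three background voxels combined.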

@GFabien
Contributor

GFabien commented Jun 1, 2020

Yes that would be perfect!

@fepegar fepegar added the enhancement (New feature or request) label Jun 1, 2020
@fepegar
Owner Author

fepegar commented Jun 1, 2020

Done. Maybe there are more efficient ways to do this, but it seems to work.

Contributor

@GFabien left a comment


It seems good, well done @fepegar!

@fepegar
Owner Author

fepegar commented Jun 2, 2020

Thanks for the feedback, @GFabien. I'll add some more tests and merge soon.

@fepegar fepegar merged commit 25d6cfa into master Jun 2, 2020
@fepegar fepegar deleted the 6-generalize-samplers branch June 2, 2020 14:30
JulianKlug added a commit to JulianKlug/torchio that referenced this pull request Jun 20, 2020
IterableDataset seems to have been removed as of fepegar#175, therefore torch>=1.1 could be sufficient.
This is in the interest of many people still relying on CUDA 9.

Resolves fepegar#195
fepegar pushed a commit that referenced this pull request Jun 20, 2020
* Torch 1.1 is sufficient

IterableDataset seems to have been removed as of #175, therefore torch>=1.1 could be sufficient.
This is in the interest of many people still relying on CUDA 9.

Resolves #195

* Update setup.py
@fepegar fepegar mentioned this pull request Jul 1, 2020
fepegar added a commit that referenced this pull request Jul 1, 2020
Using sitk.Crop is slow because sitk.GetImageFromArray is slow.
Using copy.deepcopy(patch) is slow because it copies the whole array,
not just the patch. The solution is to use slicing for cropping and
patch.clone() for copying.

Related to #175, #183, 89acf63.
Fixes #212.
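The fix described in that commit message can be sketched like this (NumPy stands in for torch here, and the function name is illustrative; a torch tensor would use patch.clone() instead of copy()):

```python
import numpy as np

def extract_patch(data, index_ini, patch_size):
    """Crop with plain slicing, then copy only the patch."""
    i, j, k = index_ini
    si, sj, sk = patch_size
    # Slicing returns a cheap view; no whole-volume conversion or deepcopy
    patch = data[i:i + si, j:j + sj, k:k + sk]
    return patch.copy()  # copy just the patch (patch.clone() for tensors)


volume = np.arange(64).reshape(4, 4, 4)
patch = extract_patch(volume, (1, 1, 1), (2, 2, 2))
patch[0, 0, 0] = -1  # mutating the copy must not touch the original volume
```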
fepegar added a commit that referenced this pull request Jul 1, 2020
* Remove unused attribute

* Use slicing and clone() to crop images in samplers

Using sitk.Crop is slow because sitk.GetImageFromArray is slow.
Using copy.deepcopy(patch) is slow because it copies the whole array,
not just the patch. The solution is to use slicing for cropping and
patch.clone() for copying.

Related to #175, #183, 89acf63.
Fixes #212.

* Set proper affine when cropping patches
Labels: enhancement (New feature or request)
Projects: None yet
3 participants