RasterDataset support for nodata masks #1078
Comments
I agree that we should have a way to access a nodata mask. However, I'm not sure of the best way to do this. Your approach would work, but if you then want to combine NAIP with another mask dataset, the mask will go from B x H x W to B x C x H x W. Our current semantic segmentation trainers can't handle this, and I've been trying to standardize the output of our datasets more: #985 Reading through https://rasterio.readthedocs.io/en/latest/topics/masks.html, it seems like the nodata mask is simply the set of locations in the image where the value equals the nodata value. So if nodata=0, then all pixels in the image equal to zero are nodata pixels. One option would be to parse this information from the image yourself during training instead of having the data loader return a separate mask. The problem with this is that you have to track carefully how this value changes if you apply image normalization. Another option would be to use what you described and either forbid combining this kind of dataset with a mask dataset or make all of our masks B x C x H x W, as Kornia expects. What do you think?
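To make the normalization concern concrete, here is a minimal NumPy sketch. It assumes a nodata value of 0 and a plain (x - mean) / std normalization with made-up statistics; torchgeo's actual transforms may differ. Once the image is normalized, comparing against the original nodata value no longer finds the nodata pixels.

```python
import numpy as np

NODATA = 0  # assumed nodata value for this illustration


def normalize(image: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Plain (x - mean) / std normalization, used here only as an example."""
    return (image.astype(np.float32) - mean) / std


# A tiny 1-band, 2 x 3 "image" containing two nodata pixels.
raw = np.array([[[0, 120, 200], [80, 0, 255]]], dtype=np.uint8)
mean, std = 100.0, 50.0  # made-up statistics
normed = normalize(raw, mean, std)

# Computing the mask before normalization finds both nodata pixels.
mask_before = raw == NODATA

# Comparing the normalized image to the original nodata value finds nothing,
# because 0 was mapped to (0 - 100) / 50 = -2.0.
mask_naive = normed == NODATA

# Alternatively, push the nodata value through the same transform.
# Float equality is fragile, so compare with a tolerance.
nodata_normed = (NODATA - mean) / std
mask_after = np.isclose(normed, nodata_normed)
```

Computing the mask before any normalization avoids having to track the transformed nodata value at all.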
Parsing the nodata mask from the image during training strikes me as breaking the abstractions of the dataloader and the Trainer. I would expect the dataloader to do all necessary disk access, particularly given that … One potential problem with pulling the … Rasterio stores masks per-band, so the dataset could return a mask of shape C x H x W to match the … What you write in #985 about standardizing sample keys makes a lot of sense to me. Am I correct in my understanding that the …
Just to clarify, I was suggesting creating the mask from the image in memory; no file I/O should be necessary. For example, if I load an image and I know the value 0 means nodata, it wouldn't be hard to create a mask from an image with …
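A minimal sketch of deriving a nodata mask in memory, assuming a C x H x W array and a known nodata value (this is an illustration, not the snippet from the comment above). The helper name `nodata_mask_from_image` is hypothetical. The same `==` comparison also works on a `torch.Tensor` and yields a boolean tensor.

```python
import numpy as np


def nodata_mask_from_image(image: np.ndarray, nodata: float, per_band: bool = False) -> np.ndarray:
    """Return a boolean mask that is True where pixels are nodata.

    image: array of shape (C, H, W).
    per_band=True returns a (C, H, W) mask. Otherwise a pixel counts as nodata
    only if every band equals the nodata value, giving an (H, W) mask; this
    avoids flagging genuine zeros that occur in a single band.
    """
    if np.isnan(nodata):
        # NaN never compares equal to itself, so use isnan instead of ==.
        band_mask = np.isnan(image)
    else:
        band_mask = image == nodata
    if per_band:
        return band_mask
    return band_mask.all(axis=0)


# Usage: a 2-band, 2 x 2 image with nodata = 0.
image = np.array([[[0, 5], [0, 7]],
                  [[0, 3], [9, 0]]])
band_mask = nodata_mask_from_image(image, nodata=0, per_band=True)  # shape (2, 2, 2)
pixel_mask = nodata_mask_from_image(image, nodata=0)  # shape (2, 2)
```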
If that's the case, it may not be possible to create a mask from the image in memory.
In what scenarios does the mask differ by band?
This actually sounds like a pretty good idea! I would be fine with this solution. It removes most of my concerns, and it's not particularly obtrusive. My only remaining concern is Kornia support: we want to make sure RandomCrop affects not only the image but also the nodata mask. Part of the reason for standardizing the keys is to match what Kornia uses. In that sense, Kornia only recognizes mask; it wouldn't recognize nodata. I'm still working on kornia/kornia#2119, but it should be possible to add support for names like "nodata_mask" or "mask_nodata". If you want to submit a PR to implement this, let's just call it …
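The RandomCrop concern boils down to applying one random offset to every spatial array in a sample. The plain NumPy sketch below illustrates that requirement; it is not Kornia's API. The function `random_crop_sample` and the "nodata" key are hypothetical, since the final key name is not settled here.

```python
import numpy as np


def random_crop_sample(sample: dict, size: int, rng: np.random.Generator) -> dict:
    """Crop every array in `sample` at the same random location.

    Assumes each value has its spatial dimensions last (..., H, W) and that
    all values share the same H and W.
    """
    h, w = next(iter(sample.values())).shape[-2:]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return {key: value[..., top:top + size, left:left + size] for key, value in sample.items()}


# Usage: the first image band equals the label mask, so a consistent crop keeps them aligned.
rng = np.random.default_rng(0)
image = np.arange(2 * 6 * 6).reshape(2, 6, 6)
mask = np.arange(36).reshape(6, 6)
nodata = mask % 5 == 0  # arbitrary pattern standing in for a nodata mask
cropped = random_crop_sample({"image": image, "mask": mask, "nodata": nodata}, size=3, rng=rng)
```

If the nodata mask were cropped independently of the image and label mask, it would no longer mark the right pixels, which is why the augmentation library has to recognize its key.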
In our application it should be consistent between bands. My guess is that this feature exists in rasterio to allow composition of different data sources, which may have missing data at different locations in a raster depending on the observation method.
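When the per-band masks are expected to agree, they can be collapsed into a single H x W mask. This is a minimal sketch, assuming uint8 masks in rasterio's convention (0 = nodata, 255 = valid), as returned by `DatasetReader.read_masks()`. The helper `collapse_band_masks` and its `strict` flag are hypothetical.

```python
import numpy as np


def collapse_band_masks(band_masks: np.ndarray, strict: bool = True) -> np.ndarray:
    """Collapse per-band validity masks into one (H, W) nodata mask.

    band_masks: uint8 array of shape (C, H, W), 0 = nodata, 255 = valid.
    Returns a boolean array that is True where any band is nodata.
    With strict=True, raise if the bands disagree instead of silently merging.
    """
    nodata = band_masks == 0
    if strict and not (nodata == nodata[0]).all():
        raise ValueError("per-band nodata masks disagree")
    return nodata.any(axis=0)


# Usage: three bands that all mark the same corner pixel as nodata.
masks = np.full((3, 2, 2), 255, dtype=np.uint8)
masks[:, 0, 0] = 0
nodata = collapse_band_masks(masks)
```

Using `any` when bands disagree is a conservative choice: a pixel is excluded if any of the composed sources is missing there.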
Great! I'll work on a PR for this and call it …
Summary
We suggest adding support for nodata masking in RasterDataset, allowing users to optionally include a nodata mask with each sample returned by __getitem__.
Rationale
In one of our applications, we take NAIP quarter quadrangles and sample the RasterDataset with a spatial sampler. We noticed that our labels occasionally fall at the edge of a raster where no data is present. We'd like to mask those regions during training and evaluation to prevent those labels from being included in the loss.
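Excluding nodata regions from the loss can be as simple as averaging an unreduced per-pixel loss over valid pixels only. The NumPy sketch below assumes a boolean H x W nodata mask; `masked_mean_loss` is a hypothetical helper, not part of torchgeo.

```python
import numpy as np


def masked_mean_loss(per_pixel_loss: np.ndarray, nodata: np.ndarray) -> float:
    """Average a per-pixel loss over valid pixels only.

    per_pixel_loss: (H, W) float array, e.g. an unreduced cross-entropy.
    nodata: (H, W) boolean array, True where there is no imagery.
    Returns 0.0 if every pixel is nodata, so the sample contributes nothing.
    """
    valid = ~nodata
    if not valid.any():
        return 0.0
    return float(per_pixel_loss[valid].mean())


# Usage: the large loss on the nodata pixel is ignored.
loss = np.array([[1.0, 2.0], [3.0, 100.0]])
nodata = np.array([[False, False], [False, True]])
value = masked_mean_loss(loss, nodata)
```

In PyTorch, an equivalent approach is to set label pixels under the nodata mask to the `ignore_index` of `torch.nn.CrossEntropyLoss` (default -100).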
Implementation
To produce optional "nodata" masks, we add a field to RasterDataset that defines whether to return these masks with each sample. When reading bands during a query, we additionally read the raster nodata masks and return them as a tensor. We previously implemented this feature on a forked version of RasterDataset and would be happy to submit a PR with the implementation.
Alternatives
No response
Additional information
No response