-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UNA: allow the LULC raster to not have a nodata value defined #1631
UNA: allow the LULC raster to not have a nodata value defined #1631
Conversation
# Guarantee that our nodata cannot be represented by the datatype - | ||
# select a nodata value that's out of range. | ||
target_nodata = pygeoprocessing.choose_nodata( | ||
source_raster_info['numpy_type']) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the need to add 1 here. I mean, I understand from the comment why you've done this, but I wonder if you think choose_nodata
should be updated to return the max + 1 instead of just the max? Would that make sense for other use cases, or could it cause problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question! Since we can assume that every pixel in the dataset is valid (because target_nodata is None
), the safest possible nodata value is one that isn't even in the range of values possible with the given datatype. For a float32
raster, it's pretty unlikely that we would have a dataset that would have every discrete value that is allowed by the float32 spec, but it could still happen. Since the nodata value is stored as a piece of metadata separate from the pixel values themselves, we can avoid this possibility completely by using a nodata value that is completely outside of the range of possibilities of the datatype of the pixel values themselves.
If we were to change the datatype of choose_nodata
, you are right that that would unfortunately break things. In raster_map
, the nodata values of the incoming raster are transformed to the nodata values of the target raster, where the target nodata has been selected by choose_nodata
. If choose_nodata
were to return a value of the (dtype's max value)+1, then we'd get an overflow in the pixel values, or we bump up the output raster's datatype to accommodate the extra needed precision, neither of which we want!
So this solution as implemented gets around the situation by only handling this one case, where the lulc doesn't have a nodata value defined, and we just set one (to avoid breaking other things later on in the model) that can never happen in this datatype.
# if there is no defined nodata, set a default value | ||
raster = gdal.OpenEx(target_lulc_path, gdal.GA_Update) | ||
band = raster.GetRasterBand(1) | ||
band.SetNoDataValue(target_nodata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me if I'm just misreading something, but could this SetNoDataValue
unintentionally overwrite an existing nodata value that was defined already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, it shouldn't be an issue because target_nodata
would just be set to the same value it was at before, but this isn't really a clean approach. I just updated the function in 94f7f00 to only set the nodata value if one isn't already defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, @phargogh; thanks for giving me the chance to review. Just had a couple of questions!
All checks passed and @emilyanndavis has approved, so I'll go ahead and merge. Thanks! |
UNA has a masking step that assumes we have a nodata value set on both the population and lulc rasters. If the lulc does not have a nodata value, the masking would raise an error. This PR fixes this behavior by updating the LULC warping step to set a nodata value that is outside of the range of the dtype used if the raster does not have a nodata value defined.
Fixes #1293
Checklist
- [ ] Updated the user's guide (if needed)- [ ] Tested the Workbench UI (if relevant)