-
Notifications
You must be signed in to change notification settings - Fork 115
Description
Bug Description
When computing the 'nodata' count as one of the statistics for zonal_stats, if a geometry's rasterization does not include any pixels at all then:
- a UserWarning will be raised by numpy and
- the
'nodata'value will be NaN.
Of course, it is fine to interpret NaN as zero, but for consistency this should probably be:
- None, which is the value returned for
'max','mean', etc., or - 0, which is the value returned for
'count', and probably makes conceptual sense in this case because'nodata'functions as a count of pixels similarly to'count'.
Example
I have created a small .tif and .shp file illustrating this:
example.zip
In this example, the polygon in the shapefile is too thin to fully encompass any pixels' centroids, using the default centroid rasterization strategy leads to an "empty" rasterization and the following:
import rasterstats
rasterstats.zonal_stats("example.shp", "small_raster.tif", stats='nodata')C:\Users\nathan\Anaconda3\lib\site-packages\rasterstats\main.py:260: UserWarning: Warning: converting a masked element to nan.
feature_stats['nodata'] = float((featmasked == fsrc.nodata).sum())
>>> [{'nodata': nan}]
Explanation for what is happening
The issue lies in the masking performed here:
python-rasterstats/src/rasterstats/main.py
Lines 261 to 264 in aa4130e
| featmasked = np.ma.MaskedArray(fsrc.array, mask=(~rv_array)) | |
| if 'nodata' in stats: | |
| feature_stats['nodata'] = float((featmasked == fsrc.nodata).sum()) |
When trying to take .sum() over a fully masked array, numpy Masked Arrays will return numpy.ma.masked instead of zero.
If rv_array is all False (i.e. the rasterization doesn't include any pixels at all), then featmasked will be entirely masked, and so (featmasked == 0).sum() will be equal to numpy.ma.masked. Then, trying to convert to a float using float(numpy.ma.masked) raises the warning and returns np.nan, instead of zero.
Examples:
import numpy as np
# Partially masked (usual case):
rv_array = np.array([[False,True,False,False],[False,False,False,False]])
fsrc_array = np.array([[0,1,2,3],[4,5,6,7]])
featmasked = np.ma.MaskedArray(fsrc_array, mask=(~rv_array))
float((featmasked == 0).sum())
>>> 0.0# Fully masked:
rv_array = np.array([[False,False,False,False],[False,False,False,False]])
fsrc_array = np.array([[0,1,2,3],[4,5,6,7]])
featmasked = np.ma.MaskedArray(fsrc_array, mask=(~rv_array))
x = (featmasked == 0).sum()
np.ma.is_masked(x)
>>> Truefloat(x)
UserWarning: Warning: converting a masked element to nan.
>>> nanInstall info
I'm using rasterstats 0.14.0 with numpy 1.19.2 on Anaconda.