Skip to content

nodata when geometry has empty rasterization #243

@nathanjmcdougall

Description

@nathanjmcdougall

Bug Description
When computing the 'nodata' count as one of the statistics for zonal_stats, if a geometry's rasterization does not include any pixels at all then:

  • a UserWarning will be raised by numpy and
  • the 'nodata' value will be NaN.

Of course, it is fine to interpret NaN as zero, but for consistency this should probably be:

  • None, which is the value returned for 'max', 'mean', etc., or
  • 0, which is the value returned for 'count', and probably makes conceptual sense in this case because 'nodata' functions as a count of pixels similarly to 'count'.

Example

I have created a small .tif and .shp file illustrating this:
example.zip

In this example, the polygon in the shapefile is too thin to fully encompass any pixels' centroids, using the default centroid rasterization strategy leads to an "empty" rasterization and the following:

import rasterstats
rasterstats.zonal_stats("example.shp", "small_raster.tif", stats='nodata')
C:\Users\nathan\Anaconda3\lib\site-packages\rasterstats\main.py:260: UserWarning: Warning: converting a masked element to nan.
  feature_stats['nodata'] = float((featmasked == fsrc.nodata).sum())
>>> [{'nodata': nan}]

Explanation for what is happening

The issue lies in the masking performed here:

featmasked = np.ma.MaskedArray(fsrc.array, mask=(~rv_array))
if 'nodata' in stats:
feature_stats['nodata'] = float((featmasked == fsrc.nodata).sum())

When trying to take .sum() over a fully masked array, numpy Masked Arrays will return numpy.ma.masked instead of zero.
If rv_array is all False (i.e. the rasterization doesn't include any pixels at all), then featmasked will be entirely masked, and so (featmasked == 0).sum() will be equal to numpy.ma.masked. Then, trying to convert to a float using float(numpy.ma.masked) raises the warning and returns np.nan, instead of zero.

Examples:

import numpy as np
# Partially masked (usual case):
rv_array = np.array([[False,True,False,False],[False,False,False,False]])
fsrc_array = np.array([[0,1,2,3],[4,5,6,7]])
featmasked = np.ma.MaskedArray(fsrc_array, mask=(~rv_array))
float((featmasked == 0).sum())
>>> 0.0
# Fully masked:
rv_array = np.array([[False,False,False,False],[False,False,False,False]])
fsrc_array = np.array([[0,1,2,3],[4,5,6,7]])
featmasked = np.ma.MaskedArray(fsrc_array, mask=(~rv_array))
x = (featmasked == 0).sum()
np.ma.is_masked(x)
>>> True
float(x)
UserWarning: Warning: converting a masked element to nan.
>>> nan

Install info
I'm using rasterstats 0.14.0 with numpy 1.19.2 on Anaconda.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions