Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems using pycno (handling of nulls) #176

Closed
beccasealy opened this issue Jul 4, 2023 · 7 comments · Fixed by #177
Closed

Problems using pycno (handling of nulls) #176

beccasealy opened this issue Jul 4, 2023 · 7 comments · Fixed by #177

Comments

@beccasealy
Copy link

I'm currently trying to use tobler for pycnophylactic interpolation, but I'm running into some problems. When I run pycno_interpolate normally, regardless of the cellsize I pick, I get this error from astropy: "WARNING: nan_treatment='interpolate', however, NaN values detected post convolution. A contiguous region of NaN values, larger than the kernel size, are present in the input array. Increase the kernel size to avoid this. [astropy.convolution.convolve]".

I thought there might be some conflict with how nulls are handled and the data I'm using, so I tried to use "handle_null = False" to turn this off, but I'm getting "NameError: name 'smooth2d' is not defined". I checked the source code, and it looks like the smooth2d method is sometimes referred to as smooth2D and sometimes as smooth2d, which seems to be causing issues.

@knaaptime
Copy link
Member

Are you able to share any of your data to create a reproducible example?

@beccasealy
Copy link
Author

sample_data.zip
Yeah, attached are the shapefiles I'm using! It's population data from the 1870 federal census, and I'm starting with Georgia's populations by county (I've been using the column titled AK3002, which is the Black population in Georgia) and trying to get interpolated estimates for the boxes in the 12km square grid.
Code:
pycno_ga = tobler.pycno.pycno_interpolate(ga_counties, ga_grid, ['AK3002'], 12000, verbose=True)
originally, and
pycno_ga = tobler.pycno.pycno_interpolate(ga_counties, ga_grid, ['AK3002'], 12000, handle_null = False, verbose=True)
is what throws the NameError.

@knaaptime
Copy link
Member

cool, thanks for raising this (and sharing your data).

So in the first example, I think everything is working as intended. There's no actual error, just a warning coming from astropy that there are still some sections of nan values in the intermediate raster produced during the convolution phase. In this case those nan's become zeros (which is appropriate here because the total area of your output kernel is the same as the total input variable). See a discussion of the nan_handle parameter in Dan's notebook here

The interpolation completes just fine (with the correct sum, so your original input volume is preserved)

Screenshot 2023-07-04 at 9 43 54 AM

so I don't think there's a real issue, you should be able to continue using that just fine.

That's definitely a bug-inducing typo when handle_null=False though. As you suggested, we just need call smooth2D instead of smooth2d on line 160 of pycno.py. If i make that change locally, there's no error in the second version either (i.e., when you pass handle_null=False), so that should be a quick fix. Thanks for the find

@knaaptime
Copy link
Member

(just as an aside, I'm sure you already know, but by interpolating historical county data down to a regular 12km grid, you're adopting a strong assumption that population is distributed uniformly within a county. If some of the results look a little funny, you might want to consider clipping out pieces of the original counties that you know weren't inhabited in 1870, then conducting the pycno interpolation using the clipped data as the source)

but there are also some odd things about your target grid, where it looks like little pieces of (maybe?) water features have been digitized in or something. Those slivers may cause some oddities as well

Screenshot 2023-07-04 at 2 25 53 PM Screenshot 2023-07-04 at 2 23 46 PM

@beccasealy
Copy link
Author

Okay, great! Yeah, we're planning to incorporate land use data in the future so that we don't assume uniformity in population distributions. Thanks for the heads up on the grid, too -- I generated it in QGIS, so I'll have to see if something went wrong there.

@knaaptime
Copy link
Member

sweet. hope it comes together nicely.

Not sure if you're wedded to a rectangular grid, but we also have hexagonal gridding here in tobler if you want to explore some other options. Not sure why qgis produces those artifacts

@knaaptime
Copy link
Member

those county boundaries may not be planar enforced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants