Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDR: sed_deposition has large swaths of nodata in Gura sample data #911

Closed
phargogh opened this issue Mar 22, 2022 · 10 comments · Fixed by #1435
Closed

SDR: sed_deposition has large swaths of nodata in Gura sample data #911

phargogh opened this issue Mar 22, 2022 · 10 comments · Fixed by #1435
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@phargogh
Copy link
Member

phargogh commented Mar 22, 2022

Reported by Stacie:

Hey James, following up from the SDR meeting... To reproduce my concern, just run SDR with the Gura sample data and look at the sed_deposition.tif output. There are large swaths of NoData. The only thing they correspond obviously with is the F intermediate output, no other intermediate files are missing data there, nor does it correspond obviously with LULC. I'd love to figure that out/fix it, since our sample data should not produce weird output. Thanks!

After running SDR, I overlayed the sed_deposition.tif on top of what_drains_to_stream.tif to show that these regions do indeed flow to the stream.

image

I can confirm that this is happening on InVEST 3.10.2 with the sample data.

@phargogh phargogh added the bug Something isn't working label Mar 22, 2022
@phargogh phargogh self-assigned this Mar 22, 2022
@phargogh phargogh assigned emlys and unassigned phargogh May 6, 2022
@newtpatrol
Copy link

It's the DEM causing the problem. In a desperate attempt to be able to use the SDR sample data for training, I downloaded a different DEM for the Gura sample data area, ran it with SDR version 3.12.0, and the output rasters do not have these weird holes! The original DEM data was SRTM, the new DEM data is HydroSHEDS (resampled from 90m to 15m to match the LULC, which is something I wouldn't typically do, but it's fine for this purpose).

@dcdenu4 dcdenu4 added the question Further information is requested label Oct 13, 2022
@emlys
Copy link
Member

emlys commented Oct 24, 2022

Thanks for the update @newtpatrol! Does this behavior still seem like a software bug, or do we just need to update the sample data?

@newtpatrol
Copy link

Hm, I think a little of both, since we did use the SRTM DEM for an analysis successfully years ago, and the same DEM does not cause similar problems with NDR. However, since I don't think anyone else has reported this problem, I'd be fine with shelving it as a bug for now, I'll just need to update the sample data - what's the best way to do that?

@emlys
Copy link
Member

emlys commented Jun 6, 2023

Investigating this now, some notes:

  • the unexpected nodata pixels are not being explicitly set to nodata. They are not being visited by the sediment deposition algorithm at all.
  • At least one of the nodata areas appears to originate at a pixel where flow direction is defined, but e_prime is not

@emlys
Copy link
Member

emlys commented Jun 7, 2023

The sediment deposition function takes three inputs: flow direction, sdr, and e_prime. flow_dir and sdr have valid data in the larger area (same as the LULC and DEM), while e_prime has valid_data in the smaller area (same as erosivity). The swaths of nodata originate from points along the edge of the watershed where flow_dir is defined and e_prime is undefined.

Because f is defined in terms of e_prime, sed_dep and f are undefined where e_prime is undefined.
Because sed_dep and f are defined recursively in terms upslope neighbors, they are undefined for all pixels that are downslope from a pixel where e_prime is undefined. The nodata propagates downslope from the edge of the watershed.

This is happening now because of a recent change to the sediment deposition function - we used to ignore upstream pixels that were missing data, which led to some potentially wrong results. Now that we require all upstream pixels to have data, we can't calculate results in all the same places.

So I think what's happening is correct mathematically, but we could handle this situation better. Maybe we should set flow direction to nodata where e_prime is nodata. (Though there's an assumption here that the flow direction raster has data over the entire watershed). This is worth discussing with Lisa and Rafa before deciding what to do.

@newtpatrol
Copy link

newtpatrol commented Jun 7, 2023

Thanks for doing all that sleuthing @emlys. Do you think it's due to something about the DEM sample data that could be fixed by the user (me)? If so, I might add related guidance to the User Guide as relevant.

I'll also thank you for another reason - it seems that I never updated the Gura sample data with the more functional HydroSHEDS DEM, so glad to have that reminder! What's the best way to update sample data? It's been a while since I've had to.

@emlys
Copy link
Member

emlys commented Jun 7, 2023

@newtpatrol I think it could be fixed in the short term by aligning the DEM and other raster inputs so that they all have nodata in the same places. But that's not something we normally expect from user data.

The sample data lives in this bitbucket repo: https://bitbucket.org/natcap/invest-sample-data/src/master/ You'd need to clone the repo, change the file, and commit and push the change back to the master branch. Or I can do it if you send me the file

@dcdenu4
Copy link
Member

dcdenu4 commented Jun 8, 2023

The sample data lives in this bitbucket repo: https://bitbucket.org/natcap/invest-sample-data/src/master/ You'd need to clone the repo, change the file, and commit and push the change back to the master branch. Or I can do it if you send me the file

Yep, I think it's easiest to send us the files for updating :)

This is happening now because of a recent change to the sediment deposition function - we used to ignore upstream pixels that were missing data, which led to some potentially wrong results.

I know we had this discussion previously but I wonder if there is a compromise between "ignoring undefined upstream pixels" and communicating how that could affect the results.

Thanks for picking this up Emily!

@newtpatrol
Copy link

newtpatrol commented Jun 8, 2023

Thanks y'all. I tried to update the SDR sample data myself in Bitbucket, but couldn't for the life of me successfully log in via any git tool, so gave up and sent an updated folder to @emlys. (For the record, there are several updates in that folder, along with the DEM).

@phargogh
Copy link
Member Author

Hey everyone, this issue came up on the forums (#1433 ) and I'm about to submit a patch that will fix it by mutually masking out the user's input rasters. That way, once alignment and masking are complete (which is early in model execution), all outputs should have data in the same places, except for regions that don't drain to the stream. I'll include you all on the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants