-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite simple and tiepoint modis interpolation in cython #38
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, makes it much easier to follow with split functions! just one inline question.
@mraspaud I think this is ready for re-review. Refactoring definitely fixed some bugs that were just "lucky" that they didn't break the final results. I'll make more PRs in the future when the coveralls github action is further along and merged with the upstream github action. I will merge this after I do some final performance checks to make sure my refactoring didn't break anything. |
Here is creation of a And here is this branch: So 25s faster, less memory, and faster processing. Note this is computing the true_color and throwing away the chunks. I'm not exactly sure why the new version shows a near constant 8GB of memory usage at the second half of processing, but that isn't the interpolation part of the processing anyway. |
Nnnoooo...the true_color geotiff is bad! It may be a nearest neighbor thing, let me check. |
False alarm, it is a bad default setting in the EWA resampler in pyresample. Separate PR coming for that in pyresample. |
So bottom line this PR makes this code better in every way. Let's just merge it. No need to review it. 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a couple refactoring requests
Regarding the performance results: did you test both modis_interpolator and simple_modis_interpolator? I think you presented just one result, which one is it? |
Good question. I realize now that this PR title is misleading as this PR has grown. I'll update that. The most recent profiling was all using Satpy's defaults which for the 1km data I was using is to use the Note I realized after doing it that some of these results were likely with Cython profiling/tracing enabled so likely slower than they will be in production. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR tiepoint: This PR simple: So faster and more memory efficient, but some of these results seemed heavily dependent on what the rest of my system was doing. The simple interpolation definitely doesn't benefit from this work as much as the tiepoint-based interpolation does as it uses scipy to do the majority of the interpolation/extrapolation. I'm still very surprised by the simple interpolation in |
The other thing to keep in mind with the above plots is that the interpolation is only the first 10 seconds or so since I am also doing EWA resampling and then computation of the arrays. So the peak memory at the right side of the plot is an EWA problem not a geotiepoint/modis interpolation problem. |
Great work, thanks for wrapping this up! |
This is a continuation of #36 but includes mostly changes to the simple modis interpolation since that PR. Doing this though required restructuring of the cython modules so I added a
_modis_utils.py
. The results of a simple test with some dask array based processing:This is about half the memory and half the processing time of a previous execution with the non-cython version. It also runs faster in the pure numpy case of a full granule, but not as fast as the #36 angle-based interpolation with pure numpy. So to put it another way, simple interpolation is faster and uses less memory than #36 if used with small and/or dask-based arrays, but it is slower than #36 for larger numpy arrays.
git diff origin/main **/*py | flake8 --diff