New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dot product implementation for overlaps breaks xagg for high res grids #31
Comments
The original hope was a performance boost by switching from the for-loop to the dot product implementation - though my own (limited) benchmarking so far has shown ambiguous results? @jsadler2 Do you have a take on that? Depending on where we are with this, we could return behavior to what it was before, but with an option to use the dot-product setup manually for situations where it could provide the largest speed boost, or if there are fewer memory allocation issues. |
When I first implemented it, I wasn't thinking about using any kind of weights. I think the benefits would be for really large datasets and when you can use a dask cluster to do the dot product operation in a distributed way. With the memory issue, I think it makes sense to rollback to the previous behavior for now. Maybe there is a way to implement the dot product approach that avoids the memory issues, but that can be explored offline. |
#32 restores the default behavior, but keeps the dot product implementation as Med-term it would be good to get some more detailed benchmarking under which circumstances one of the two approaches is 'better' (speed vs. memory), and decide on next steps from there (this fix is also a bit of a cludge with a few redundancies, so I can clean those up at that point as well) |
I'm finding that the implementation of the dot product for computing weighted averages in core.py/aggregrate eats up way too much memory for high res grids. It's the wm.overlap_da that requires way too much memory. I am unable to allocate enough memory to make it through core.py/aggregate for many of the datasets I'm processing on an HPC system. I had no issue with the previous aggregate function before commit 4c5cc65. Looks like the dot product method is a lot cleaner in the code, but is there another benefit?
The text was updated successfully, but these errors were encountered: