Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dot product implementation for overlaps breaks xagg for high res grids #31

Closed
kerriegeil opened this issue Mar 15, 2022 · 3 comments
Closed

Comments

@kerriegeil
Copy link
Contributor

I'm finding that the implementation of the dot product for computing weighted averages in core.py/aggregrate eats up way too much memory for high res grids. It's the wm.overlap_da that requires way too much memory. I am unable to allocate enough memory to make it through core.py/aggregate for many of the datasets I'm processing on an HPC system. I had no issue with the previous aggregate function before commit 4c5cc65. Looks like the dot product method is a lot cleaner in the code, but is there another benefit?

@ks905383
Copy link
Owner

The original hope was a performance boost by switching from the for-loop to the dot product implementation - though my own (limited) benchmarking so far has shown ambiguous results? @jsadler2 Do you have a take on that?

Depending on where we are with this, we could return behavior to what it was before, but with an option to use the dot-product setup manually for situations where it could provide the largest speed boost, or if there are fewer memory allocation issues.

@jsadler2
Copy link
Contributor

jsadler2 commented Mar 15, 2022

When I first implemented it, I wasn't thinking about using any kind of weights. I think the benefits would be for really large datasets and when you can use a dask cluster to do the dot product operation in a distributed way.

With the memory issue, I think it makes sense to rollback to the previous behavior for now. Maybe there is a way to implement the dot product approach that avoids the memory issues, but that can be explored offline.

@ks905383
Copy link
Owner

ks905383 commented Apr 1, 2022

#32 restores the default behavior, but keeps the dot product implementation as impl='dot_product' in pixel_overlaps() and aggregate(). Hopefully that covers all use cases for now.

Med-term it would be good to get some more detailed benchmarking under which circumstances one of the two approaches is 'better' (speed vs. memory), and decide on next steps from there (this fix is also a bit of a cludge with a few redundancies, so I can clean those up at that point as well)

@ks905383 ks905383 closed this as completed Apr 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants