How to reduce memory consumption of flow accumulation? #97
If possible, the easiest way would be to break up the DEM itself. Breaking it up into catchments, for instance, would allow you to run accumulation on each catchment independently (see the sketch below). Within the accumulation function, there are probably some places where redundant arrays could be eliminated; it depends on which line in the function triggers the memory error, though. Memory usage is probably largest after line 1249 is executed. I'll assume the number of cells in the DEM is greater than ~33,000 and smaller than ~2 billion. The big arrays are the ones allocated in the excerpt further below. I will need to think about what else can be eliminated.
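A minimal sketch of that per-catchment route, assuming the current pysheds `Grid` API (`catchment`, `clip_to`, `view`, `accumulation`); the file name and pour-point coordinates below are placeholders, not part of the original discussion:

```python
from pysheds.grid import Grid

# Placeholder file name and pour points: substitute your own data.
grid = Grid.from_raster('dem.tif')
dem = grid.read_raster('dem.tif')
fdir = grid.flowdir(dem)

pour_points = [(-97.29, 32.73), (-97.31, 32.75)]  # hypothetical outlets

for x, y in pour_points:
    # Delineate one catchment and restrict the grid's view to it, so
    # accumulation only allocates arrays for that subset of cells.
    catch = grid.catchment(x=x, y=y, fdir=fdir, xytype='coordinate')
    grid.clip_to(catch)
    acc = grid.accumulation(grid.view(fdir))
    # ... persist acc here (e.g. grid.to_raster), then drop the reference
    del acc
    grid.viewfinder = dem.viewfinder  # reset to the full extent for the next outlet
```

The peak working set then scales with the largest catchment rather than the whole DEM, at the cost of re-running the delineation per outlet.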
```python
...
startnodes, endnodes = self._construct_matching(fdir, domain,
                                                dirmap=dirmap)
del startnodes
# minlength guards against bincount coming up short of the full domain
indegree = np.bincount(endnodes, minlength=fdir.size)
del endnodes
# reshape against fdir.shape: acc is not defined yet at this point
indegree = indegree.reshape(fdir.shape).astype(np.uint8)
startnodes = np.flatnonzero(indegree == 0)
endnodes = fdir.flat[startnodes]
if weights is not None:
    assert weights.size == fdir.size
    # TODO: Why flatten? Does this prevent weights from being modified?
    acc = weights.flatten()
else:
    acc = (~nodata_cells).ravel().astype(int)
if efficiency is not None:
    assert efficiency.size == fdir.size
    eff = efficiency.flatten()  # must be flattened to avoid IndexError below
    acc = acc.astype(float)
    eff_max, eff_min = np.max(eff), np.min(eff)
    assert (eff_max <= 1) and (eff_min >= 0)
...
```
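To put rough numbers on those arrays, here is a back-of-the-envelope estimate. The dtypes are my assumptions from reading the code (pysheds appears to downcast node indices with `np.min_scalar_type`), and `ncells` is a placeholder:

```python
import numpy as np

# Rough peak-memory estimate for the arrays in the excerpt above.
# The dtypes are assumptions, not measurements.
ncells = 80_000_000  # e.g. the ~80M-cell DEM mentioned below

index_itemsize = np.dtype(np.min_scalar_type(ncells)).itemsize  # 4 bytes here
estimates = {
    'fdir (downcast index dtype)': index_itemsize * ncells,
    'startnodes':                  index_itemsize * ncells,
    'endnodes':                    index_itemsize * ncells,
    'indegree (uint8)':            1 * ncells,
    'acc (float64)':               8 * ncells,
    'eff (float64, if used)':      8 * ncells,
}
for name, nbytes in estimates.items():
    print(f'{name:<28} {nbytes / 1e9:5.2f} GB')
print(f'{"total":<28} {sum(estimates.values()) / 1e9:5.2f} GB')
```

Under these assumptions an 80M-cell DEM needs roughly 2-3 GB for these arrays alone, before any temporaries created by intermediate expressions.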
EDIT: A more recent issue was raised here under #187, and a related one in scikit-image. I wish I had a suggestion for flattening the allocation curve, but using a memory profiler helped me appreciate the complexity of the question "how much RAM do I need for X?" In my case, the DEM is already clipped to the basin I need to work within. I wasn't able to make a noticeable dent in peak memory allocation by changing steps in the codebase, so I used the profiling tool memray while processing a DEM of roughly 80M cells (380 MB file size), given these steps:

```python
filled_dem = grid.fill_pits(dem)
flooded_dem = grid.fill_depressions(filled_dem)
conditioned_dem = grid.resolve_flats(flooded_dem, max_iter=1E12, eps=1E-12)
```

The peak memory allocation is more than 30x the file size. I'm just learning how to read a flamegraph, but it looks like the ...
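For anyone who wants to reproduce that kind of measurement, memray can be attached from the command line (`memray run script.py`) or in-process. A sketch of the in-process form wrapped around the conditioning steps quoted above; the output file name is a placeholder:

```python
import memray
from pysheds.grid import Grid

# Record every allocation made inside the block into conditioning.bin.
with memray.Tracker("conditioning.bin"):
    grid = Grid.from_raster("dem.tif")
    dem = grid.read_raster("dem.tif")
    filled_dem = grid.fill_pits(dem)
    flooded_dem = grid.fill_depressions(filled_dem)
    conditioned_dem = grid.resolve_flats(flooded_dem, max_iter=1E12, eps=1E-12)

# Afterwards, render the interactive flamegraph with:
#   memray flamegraph conditioning.bin
```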
Hi @mdbartos,
pysheds is accumulating hard on my machine to produce urgently needed results. Unfortunately, it uses all my memory (with flow direction, efficiency, and flow accumulation), which sometimes results in MemoryErrors. So I was wondering whether some of the (flat) copies in the flow accumulation could be avoided. Are you aware of any ad-hoc solution to save precious memory? I will soon have to repeat the calculations, so this would really be helpful.
Thanks in advance,
Andreas
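One NumPy detail behind the "(flat) copies" question: `flatten()` always allocates a fresh copy, while `ravel()` returns a view whenever the array is contiguous, so swapping them where a copy is not actually needed saves one full-size array. A quick demonstration:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

v = a.ravel()    # contiguous input -> a view, no new allocation
c = a.flatten()  # always a fresh copy

print(v.base is a)  # True: shares memory with a
print(c.base is a)  # False: independent buffer
```

Note that in `accumulation` the copy of `weights` may well be deliberate, since `acc` is modified in place and a view would write back into the caller's array.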