
Reduce memory usage of MapEvaluator #4989

Merged — 5 commits merged into gammapy:main on Jan 18, 2024

Conversation

@QRemy (Contributor) commented Dec 14, 2023

Avoid caching the exposure cutout on MapEvaluator. The cutout creates a new array in memory for each source, so this does not scale well when the cutout region and the number of sources are large, while the cutout itself is very fast to compute.
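The trade-off can be illustrated with a minimal sketch (hypothetical names and shapes, not the actual MapEvaluator API): caching keeps one cutout array alive per source, while recomputing frees the array as soon as the evaluation returns.

```python
import numpy as np

# Hypothetical exposure map with (energy, lat, lon) axes.
exposure = np.ones((20, 100, 100))

def cutout(data, lat_sl, lon_sl):
    # cheap to compute, but allocates a fresh array on every call
    return data[:, lat_sl, lon_sl].copy()

# Caching the cutout: memory grows with the number of sources.
cached = [cutout(exposure, slice(0, 50), slice(0, 50)) for _ in range(3)]

# Recomputing on each evaluation: the cutout only lives for the call.
def npred(lat_sl, lon_sl):
    cut = cutout(exposure, lat_sl, lon_sl)
    return cut.sum()
```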

@QRemy added the performance (Performance improvement) label Dec 14, 2023
@QRemy (Contributor, Author) commented Dec 14, 2023

reference: [memory usage plot "newplot (6)"]
this PR: [memory usage plot "newplot (5)"]


codecov bot commented Dec 14, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison: base (ca22055) 75.69% vs. head (8f708ed) 75.69%.

Files Patch % Lines
gammapy/datasets/evaluator.py 95.65% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4989   +/-   ##
=======================================
  Coverage   75.69%   75.69%           
=======================================
  Files         228      228           
  Lines       33841    33851   +10     
=======================================
+ Hits        25616    25624    +8     
- Misses       8225     8227    +2     


@adonath (Member) commented Dec 14, 2023

Are you sure the difference comes from the exposure? The cutout should only return a view and not copy any values. However the cutout geom re-computes the array of coordinates. I would presume the memory usage comes from this.

@QRemy (Contributor, Author) commented Dec 14, 2023

> Are you sure the difference comes from the exposure? The cutout should only return a view and not copy any values. However the cutout geom re-computes the array of coordinates. I would presume the memory usage comes from this.

From what I understand, the cutout creates a new map with a new data array:

```python
# excerpt from Map.cutout: allocate a fresh array and copy the region into it
data = np.zeros(shape=geom_cutout.data_shape, dtype=self.data.dtype)
data[cutout_slices] = self.data[parent_slices]
return self._init_copy(geom=geom_cutout, data=data)
```

and the coordinates are not re-computed, as they are cached only once via the lru_cache of the original geom.
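The view-vs-copy distinction under discussion can be checked directly in NumPy: plain slicing shares the parent's buffer, while the allocate-and-copy pattern quoted above gives each cutout its own memory.

```python
import numpy as np

# Slicing a NumPy array returns a view that shares the parent's buffer...
data = np.zeros((100, 100))
view = data[10:20, 10:20]
shares_parent = view.base is data   # True: no new data buffer

# ...while the cutout pattern above allocates a new array and copies into it:
cut = np.zeros((10, 10), dtype=data.dtype)
cut[:] = data[10:20, 10:20]
owns_buffer = cut.base is None      # True: independent memory per cutout
```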

@QRemy
Copy link
Contributor Author

QRemy commented Dec 15, 2023

I updated the reference plot in my previous comment (it was wrong because I forgot to cherry-pick one of the other memory patches into my reference branch).

@adonath (Member) commented Dec 15, 2023

> I updated the reference plot from my previous #4989 (comment) (it was wrong because I forgot to cherry-pick one of the other memory patches in my reference branch).

Thanks, that's even worse. There is a memory leak...

@adonath (Member) commented Dec 15, 2023

> and the coordinates are not re-computed as they are cached only one time with the lru_cache of the original geom

Yes, but no...

As we create a cutout geom for each source, it is a new object. The first time we access the coordinates on the cutout geom, they are re-computed and cached. In the worst case, with a lot of sources and large support, we duplicate all the coordinate information. In this case it might be better to create a cutout from the original larger coordinate arrays. But this highly depends on the analysis scenario; for a few sources, computing the coordinates on the cutouts is probably much better.
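The caching behaviour described here can be sketched with functools.lru_cache (a hypothetical stand-in, not the actual gammapy geom code): the cache is keyed on the instance, so every new cutout geom triggers a fresh computation — and the cache also holds a reference to each instance, which is one way such cached cutouts can pile up in memory.

```python
from functools import lru_cache

class Geom:
    """Hypothetical stand-in for a map geometry (not the gammapy class)."""
    ncompute = 0  # counts actual coordinate computations

    @lru_cache(maxsize=None)
    def get_coord(self):
        Geom.ncompute += 1
        return "coordinate grid"

g = Geom()
g.get_coord()
g.get_coord()            # cached: the expensive computation runs once per instance
cutout_geom = Geom()     # a cutout produces a *new* geom object...
cutout_geom.get_coord()  # ...so its coordinates are computed (and cached) again
```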

> from what I understand the cutout creates a new map with a new data array

Yes, you are right. It might be worth working only with views and giving up on not-fully-contained cutouts, trimming instead.

@QRemy (Contributor, Author) commented Dec 15, 2023

> Thanks, that's even worse. There is a memory leak...

For this test I have 32 sources, and each new peak corresponds to the npred computation of one source.

@QRemy (Contributor, Author) commented Dec 15, 2023

> As we create a cutout geom for each source, it is a new object. The first time we access the coordinates on the cutout geom, they are re-computed and cached. In the worst case, with a lot of sources and large support, we duplicate all the coordinate information. In this case it might be better to create a cutout from the original larger coordinate arrays. But this highly depends on the analysis scenario; for a few sources, computing the coordinates on the cutouts is probably much better.

Right. At least we could save memory on coordinate caching by returning a 2D meshgrid for lon, lat (if the geom is regular) and a 1D array for each axis, instead of an ND meshgrid for every axis. I will try that in another PR.
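The size of that saving can be demonstrated with NumPy's sparse meshgrid option, which illustrates the same broadcasting idea (axis sizes here are small and purely illustrative, not gammapy defaults):

```python
import numpy as np

# Hypothetical axes of a regular geom.
energy = np.geomspace(0.1, 10.0, 10)
lat = np.linspace(-2.0, 2.0, 50)
lon = np.linspace(0.0, 8.0, 100)

# Dense ND meshgrid: all three outputs are broadcast to the full (10, 50, 100) shape.
dense = np.meshgrid(energy, lat, lon, indexing="ij")
dense_bytes = sum(a.nbytes for a in dense)    # 3 * 10*50*100 * 8 bytes

# Sparse variant: outputs keep size-1 axes and broadcast only when actually used.
sparse = np.meshgrid(energy, lat, lon, indexing="ij", sparse=True)
sparse_bytes = sum(a.nbytes for a in sparse)  # (10 + 50 + 100) * 8 bytes
```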

@adonath (Member) commented Dec 15, 2023

> For this test I have 32 sources, and each new peak corresponds to the npred computation of a source

Ok, thanks for clarifying. I thought it was multiple npred evaluations, but it is only one with 32 sources.

5 commits, each: Signed-off-by: Quentin Remy <quentin.remy@mpi-hd.mpg.de>
@QRemy QRemy force-pushed the memory_opti branch 2 times, most recently from 1758c23 to 945542f on January 1, 2024 15:18
@QRemy QRemy added this to the 1.2 milestone Jan 1, 2024
@registerrier (Contributor) left a comment


Thanks @QRemy . This looks good. No comment from my side.

@registerrier added the trigger-benchmarks (run profiler in gammapy-benchmarks) label Jan 18, 2024
@registerrier registerrier merged commit 9faaff3 into gammapy:main Jan 18, 2024
15 checks passed
Labels: performance (Performance improvement), trigger-benchmarks (run profiler in gammapy-benchmarks)
3 participants