Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing values interpolation for Background3D #3219

Merged
merged 14 commits into from
Apr 2, 2021

Conversation

QRemy
Copy link
Contributor

@QRemy QRemy commented Feb 3, 2021

This PR adds missing values interpolation for Background3D in order to fix a bug found in CTA background IRF where the clipping of missing values in the ante-last bins of the energy axis was causing aberrantly high extrapolated values.

@QRemy QRemy requested a review from adonath February 3, 2021 15:32
@QRemy QRemy added the bug label Feb 3, 2021
@adonath
Copy link
Member

adonath commented Feb 3, 2021

Related: #2432

@QRemy QRemy added this to To do in gammapy.irf via automation Feb 3, 2021
@QRemy QRemy added this to the v0.19 milestone Feb 3, 2021
@adonath
Copy link
Member

adonath commented Feb 5, 2021

Conclusion from the dev meeting today:

  • Split this out into a little helper function like bkg = interpolate_missing_bkg_values(bkg=bkg) and add an option to the MapDatasetMaker to apply this fix if needed (I would prefer a default of not applying it, as it still seems to be a rare problem...)
  • I think the methods can reside in gammapy/irf/background.py for now

gammapy/utils/interpolation.py Outdated Show resolved Hide resolved
@QRemy
Copy link
Contributor Author

QRemy commented Feb 5, 2021

I added options to use the fix or not in Background3D (default is False) and MapDatasetMaker (default is True, otherwise we have to set another option in the config file for the high-level interface).

@codecov
Copy link

codecov bot commented Feb 5, 2021

Codecov Report

Merging #3219 (49c6105) into master (76a7dfc) will increase coverage by 0.00%.
The diff coverage is 96.96%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3219   +/-   ##
=======================================
  Coverage   93.81%   93.82%           
=======================================
  Files         144      144           
  Lines       17754    17776   +22     
=======================================
+ Hits        16656    16678   +22     
  Misses       1098     1098           
Impacted Files Coverage Δ
gammapy/irf/core.py 90.70% <95.83%> (+0.66%) ⬆️
gammapy/makers/map.py 95.87% <100.00%> (+0.17%) ⬆️
gammapy/utils/interpolation.py 94.80% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76a7dfc...49c6105. Read the comment docs.

Copy link
Member

@adonath adonath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @QRemy, I have left some comments, I noticed the methods also extrapolates to low energies. Can this be avoided? I guess it would be the more safe behaviour. I think in most cases the background turns over at low energies, however I'm not sure this always the case...

gammapy/irf/background.py Outdated Show resolved Hide resolved
gammapy/irf/background.py Outdated Show resolved Hide resolved
gammapy/makers/map.py Outdated Show resolved Hide resolved
"""

tag = "MapDatasetMaker"
available_selection = ["counts", "exposure", "background", "psf", "edisp"]

def __init__(self, selection=None, background_oversampling=None):
def __init__(
self, selection=None, background_oversampling=None, fix_background=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick check (analysis_2.ipynb tutorial) it seems the interpolation of missing values slows down the data reduction by ~20%, maybe don't apply the fix by default? It's only required if data is extrapolated...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now only the missing values within the valid range are interpolated, the zeros outside are filled so it should be faster.

gammapy/irf/tests/test_background.py Show resolved Hide resolved
gammapy/irf/background.py Outdated Show resolved Hide resolved
gammapy/irf/background.py Outdated Show resolved Hide resolved
@adonath
Copy link
Member

adonath commented Mar 19, 2021

Sorry again @QRemy, for the late response here. It had been on my list forever to try this out and check the implementation. I finally managed to do so. Can you please take a look at the following notebook: https://github.com/adonath/notebooks-public/blob/master/missing_value_interpolation.ipynb ? It runs on your interp_missing_data branch...

Here are a few things I found:

  • Your method does not work as expected: it leaves out the interpolation of some values, e.g. when the first value is zero as well?
  • I think the method is still too slow...
  • In the notebook I provide an alternative implementation, which (a) works as expected, (b) is faster and (c) works on arbitrary axis and ND data

In the beginning I was hesitating a bit do add something like this at all, but now I see that more and more DL3 data is produced (VERITAS, MAGIC, LST etc.) and I doubt we will get fully stable IRFs in the first productions. So now I agree more such a method is useful, possibly for other IRFs as well.

So if you like to continue here, I would propose to copy the implementation from the notebook I linked above and move it to the IRF base class as .interp_missing_data(axis_name=) and add the minimal example with the log values as a unit test. In general I prefer np.nan as a placeholder for invalid values, but maybe it could be an argument to the .interp_missing_data(axis_name=) function as well.

@adonath
Copy link
Member

adonath commented Mar 19, 2021

I think the implementation in the MapDatasetMaker can stay as is. Let's only expose it for the background now, where we are aware of the issue...

@QRemy
Copy link
Contributor Author

QRemy commented Mar 19, 2021

I'm ok with your implementation but I think we need mask = ~np.isfinite(data) | (data == 0.0) instead of mask = np.isnan(self.data) because the existing IRFs do not necessarily contain NaN for missing values.

Copy link
Member

@adonath adonath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @QRemy, no further comments from my side.

@adonath adonath merged commit a08a237 into gammapy:master Apr 2, 2021
gammapy.irf automation moved this from To do to Done Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
gammapy.irf
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants