New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix min energy handling in SpectrumEnergyGrouper #1210

Merged
merged 8 commits into from Nov 28, 2017

Conversation

Projects
None yet
2 participants
@jjlk
Contributor

jjlk commented Nov 15, 2017

Hi @cdeil , @joleroi ,

I'm starting this PR to fix the SpectrumEnergyGrouper. First commit to simplify the existing tests by using dummy value and not by using simulated data.

OKay as a start?

@jjlk jjlk requested a review from cdeil Nov 15, 2017

@jjlk

This comment has been minimized.

Show comment
Hide comment
@jjlk

jjlk Nov 15, 2017

Contributor

@cdeil, failing test added

Contributor

jjlk commented Nov 15, 2017

@cdeil, failing test added

@cdeil cdeil added this to the 0.7 milestone Nov 15, 2017

@cdeil cdeil added bug tests labels Nov 15, 2017

@cdeil cdeil self-assigned this Nov 15, 2017

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Nov 15, 2017

Member

@jjlk - Thanks for improving these tests!

In test_edges you take a seg fixture as input, but then you don't use it, but make your own seg.

How about deleting the seg fixture, and instead in each test function create a seg from obs. Yes, it duplicates ~ 2 lines of codes in each test function, but there might be tests where we don't want the compute_range_safe method call which is now in the seg fixture function, and if we remove that it's just one line.

Apart from that, I still don't get what's up with test_edges. First of all, the start where you have
# Bin selection basically just tests Numpy, we don't need that. Instead you can just put one line, no?

ebounds = [1, 2, 3, 4, 5] * u.TeV

And then for the line where you have # not ok for the result: I see at https://travis-ci.org/gammapy/gammapy/jobs/302600538#L2686 that the first energy bin is put as underflow by the grouper, but you want it to be normal?

I think that's a reasonable expectation, maybe we can / should change the behaviour!

But it is relying on a float number comparison: 1 TeV is at the exact edge, and what's happening is that compute_groups_fixed at

self.groups.apply_energy_min(energy=ebounds[0])

calls self.groups.apply_energy_min(energy=ebounds[0]) (where ebounds[0] is 1 TeV`, which calls here:
if energy in group.energy_range:

For me, this is not a quick fix to do. I would have to take an hour or two, fully understand the problem and code again, and then think about what a good solution is. It could be just a change in logic and addition of unit-tests for the lower-level methods I linked to in this comment to make them well-understood and well-behaved, or it could be re-thinking the high-level interface: it's inherently tricky to take multiple float ebounds in the analysis and then do float-based comparisons and to rely on behaviour at the bin edges, and instead only doing the grouping once and then passing groups in the high-level interface around instead of ebounds might be a more robust solution?

@jjlk - Can you do this deep dive into the spectrum energy grouping and start to make it better by adding lower-level tests that show the current behaviour (and then feel free to change it if there are bugs or it's non-intuitive)? If yes, great! If no, assign to me and I'll try to find time next week.

Member

cdeil commented Nov 15, 2017

@jjlk - Thanks for improving these tests!

In test_edges you take a seg fixture as input, but then you don't use it, but make your own seg.

How about deleting the seg fixture, and instead in each test function create a seg from obs. Yes, it duplicates ~ 2 lines of codes in each test function, but there might be tests where we don't want the compute_range_safe method call which is now in the seg fixture function, and if we remove that it's just one line.

Apart from that, I still don't get what's up with test_edges. First of all, the start where you have
# Bin selection basically just tests Numpy, we don't need that. Instead you can just put one line, no?

ebounds = [1, 2, 3, 4, 5] * u.TeV

And then for the line where you have # not ok for the result: I see at https://travis-ci.org/gammapy/gammapy/jobs/302600538#L2686 that the first energy bin is put as underflow by the grouper, but you want it to be normal?

I think that's a reasonable expectation, maybe we can / should change the behaviour!

But it is relying on a float number comparison: 1 TeV is at the exact edge, and what's happening is that compute_groups_fixed at

self.groups.apply_energy_min(energy=ebounds[0])

calls self.groups.apply_energy_min(energy=ebounds[0]) (where ebounds[0] is 1 TeV`, which calls here:
if energy in group.energy_range:

For me, this is not a quick fix to do. I would have to take an hour or two, fully understand the problem and code again, and then think about what a good solution is. It could be just a change in logic and addition of unit-tests for the lower-level methods I linked to in this comment to make them well-understood and well-behaved, or it could be re-thinking the high-level interface: it's inherently tricky to take multiple float ebounds in the analysis and then do float-based comparisons and to rely on behaviour at the bin edges, and instead only doing the grouping once and then passing groups in the high-level interface around instead of ebounds might be a more robust solution?

@jjlk - Can you do this deep dive into the spectrum energy grouping and start to make it better by adding lower-level tests that show the current behaviour (and then feel free to change it if there are bugs or it's non-intuitive)? If yes, great! If no, assign to me and I'll try to find time next week.

@cdeil

Basically it should be possible to write a lower-level test (using the lower-level classes, not the grouper) with ~ 5 lines of code that just uses one energy bin and results in "underflow" where you expect "normal"?

@jjlk - You think this is a bug and that behaviour should change, no? Can you spot a place where to make a change to get more sensible behaviour?

Show outdated Hide outdated gammapy/spectrum/tests/test_energy_group.py Outdated
@jjlk

This comment has been minimized.

Show comment
Hide comment
@jjlk

jjlk Nov 16, 2017

Contributor

Hi @cdeil,
I removed the seg fixture and add a better test, meaning: what we should have when energy bounds corresponds to edges of the binning of the data.

I think that we want this to work, no? If my data is binned as [0.1, 0.2, 0.3...,1] TeV and that I want flux points for energy bins corresponding to [0.2, 0.3...,0.9] TeV, the current code remove the lowest and highest bin. Which is shitty no? This use case exists, for example, when someone wants to select bins according to sane criteria, e.g., minimal excess, minimal signifiance, systematics on background, ETC!!!

Do you agree?

I'm not sure it's easy to deal with that in the current implmentation. If I found a dirty solution, are you okay to 'pythoned' it?

Contributor

jjlk commented Nov 16, 2017

Hi @cdeil,
I removed the seg fixture and add a better test, meaning: what we should have when energy bounds corresponds to edges of the binning of the data.

I think that we want this to work, no? If my data is binned as [0.1, 0.2, 0.3...,1] TeV and that I want flux points for energy bins corresponding to [0.2, 0.3...,0.9] TeV, the current code remove the lowest and highest bin. Which is shitty no? This use case exists, for example, when someone wants to select bins according to sane criteria, e.g., minimal excess, minimal signifiance, systematics on background, ETC!!!

Do you agree?

I'm not sure it's easy to deal with that in the current implmentation. If I found a dirty solution, are you okay to 'pythoned' it?

@jjlk

This comment has been minimized.

Show comment
Hide comment
@jjlk

jjlk Nov 19, 2017

Contributor

Hi @cdeil,
I added lower level tests for the SpectrumEnergyGroups class and I added one function to flag and merge the bins with underflow/underflow. This function is then called in the SpectrumEnergyGroupMaker class in the compute_groups_fixedfunction, just before the actual merging.

What do you think?

Contributor

jjlk commented Nov 19, 2017

Hi @cdeil,
I added lower level tests for the SpectrumEnergyGroups class and I added one function to flag and merge the bins with underflow/underflow. This function is then called in the SpectrumEnergyGroupMaker class in the compute_groups_fixedfunction, just before the actual merging.

What do you think?

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Nov 20, 2017

Member

@jjlk - Thanks!

I'll have a look at this today and probably do some more coding / cleanup for the energy grouping. At least there's this one fail: https://travis-ci.org/gammapy/gammapy/jobs/304390957#L2667

Member

cdeil commented Nov 20, 2017

@jjlk - Thanks!

I'll have a look at this today and probably do some more coding / cleanup for the energy grouping. At least there's this one fail: https://travis-ci.org/gammapy/gammapy/jobs/304390957#L2667

@jjlk

This comment has been minimized.

Show comment
Hide comment
@jjlk

jjlk Nov 21, 2017

Contributor

Hi @cdeil ,
I fixed the bug.

Concerning the failling test, test_spectrum_pipe.py:

There is a test to assert the value of the first flux point for the energy binning:
`EnergyBounds.equal_log_spacing(1, 50, 4, 'TeV') which gives the following flux point:

FluxPoints(sed_type="dnde", n_points=4)
    e_ref         e_min         e_max     ...    dnde_errp       dnde_errn   
     TeV           TeV           TeV      ... 1 / (cm2 s TeV) 1 / (cm2 s TeV)
------------- ------------- ------------- ... --------------- ---------------
 1.6681005372 1.13646366639 2.44843674682 ...             nan             nan
4.08423865267 2.44843674682 6.81292069058 ...             nan             nan
10.6605049898 6.81292069058  16.681005372 ...             nan             nan
27.8255940221  16.681005372 46.4158883361 ...             nan             nan

The first bin starts at 1.13646366639 TeV. The binning for the spectral extraction is (keeping values above 1 TeV and 55 TeV):

  [... 1.00000000e+00   1.13646367e+00   1.29154967e+00   1.46779927e+00
  1.66810054e+00   1.89573565e+00   2.15443469e+00   2.44843675e+00
  2.78255940e+00   3.16227766e+00   3.59381366e+00   4.08423865e+00
  4.64158883e+00   5.27499706e+00 ...] TeV

After the correction of the bug, the output of the flux points gives:

FluxPoints(sed_type="dnde", n_points=4)
    e_ref         e_min         e_max     ...    dnde_errp       dnde_errn   
     TeV           TeV           TeV      ... 1 / (cm2 s TeV) 1 / (cm2 s TeV)
------------- ------------- ------------- ... --------------- ---------------
1.56474814166           1.0 2.44843674682 ...             nan             nan
4.08423865267 2.44843674682 6.81292069058 ...             nan             nan
10.6605049898 6.81292069058  16.681005372 ...             nan             nan
27.8255940221  16.681005372 46.4158883361 ...             nan             nan

So the test was also suffering of the bug.

Oké to update the value?

Contributor

jjlk commented Nov 21, 2017

Hi @cdeil ,
I fixed the bug.

Concerning the failling test, test_spectrum_pipe.py:

There is a test to assert the value of the first flux point for the energy binning:
`EnergyBounds.equal_log_spacing(1, 50, 4, 'TeV') which gives the following flux point:

FluxPoints(sed_type="dnde", n_points=4)
    e_ref         e_min         e_max     ...    dnde_errp       dnde_errn   
     TeV           TeV           TeV      ... 1 / (cm2 s TeV) 1 / (cm2 s TeV)
------------- ------------- ------------- ... --------------- ---------------
 1.6681005372 1.13646366639 2.44843674682 ...             nan             nan
4.08423865267 2.44843674682 6.81292069058 ...             nan             nan
10.6605049898 6.81292069058  16.681005372 ...             nan             nan
27.8255940221  16.681005372 46.4158883361 ...             nan             nan

The first bin starts at 1.13646366639 TeV. The binning for the spectral extraction is (keeping values above 1 TeV and 55 TeV):

  [... 1.00000000e+00   1.13646367e+00   1.29154967e+00   1.46779927e+00
  1.66810054e+00   1.89573565e+00   2.15443469e+00   2.44843675e+00
  2.78255940e+00   3.16227766e+00   3.59381366e+00   4.08423865e+00
  4.64158883e+00   5.27499706e+00 ...] TeV

After the correction of the bug, the output of the flux points gives:

FluxPoints(sed_type="dnde", n_points=4)
    e_ref         e_min         e_max     ...    dnde_errp       dnde_errn   
     TeV           TeV           TeV      ... 1 / (cm2 s TeV) 1 / (cm2 s TeV)
------------- ------------- ------------- ... --------------- ---------------
1.56474814166           1.0 2.44843674682 ...             nan             nan
4.08423865267 2.44843674682 6.81292069058 ...             nan             nan
10.6605049898 6.81292069058  16.681005372 ...             nan             nan
27.8255940221  16.681005372 46.4158883361 ...             nan             nan

So the test was also suffering of the bug.

Oké to update the value?

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Nov 21, 2017

Member

@jjlk - Yes!

Is the behaviour at the high end as you expect? I.e. is 46.4 the end of the last "fine bin" that fully falls into the last "requested" flux point?

Didn't we discuss that the behaviour should be to only look to the left edge of the "fin bin" when assigning to flux point energy bins? Is this how it works in the middle already in this case?

I don't really have an opinion what to do at the upper end, i.e. whether to keep or remove the last "fine bin". This is just a question to you @jjlk to please review this test while you update the numbers (and to maybe leave a one-line comment saying what is going on if it's non-obvious like with the last point).

Member

cdeil commented Nov 21, 2017

@jjlk - Yes!

Is the behaviour at the high end as you expect? I.e. is 46.4 the end of the last "fine bin" that fully falls into the last "requested" flux point?

Didn't we discuss that the behaviour should be to only look to the left edge of the "fin bin" when assigning to flux point energy bins? Is this how it works in the middle already in this case?

I don't really have an opinion what to do at the upper end, i.e. whether to keep or remove the last "fine bin". This is just a question to you @jjlk to please review this test while you update the numbers (and to maybe leave a one-line comment saying what is going on if it's non-obvious like with the last point).

@jjlk

This comment has been minimized.

Show comment
Hide comment
@jjlk

jjlk Nov 21, 2017

Contributor

Hi @cdeil,
No failures anymore. Merge?

Contributor

jjlk commented Nov 21, 2017

Hi @cdeil,
No failures anymore. Merge?

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Nov 22, 2017

Member

@jjlk - Thanks for all your work on this!

I'm working on improving this further now. Will merge later today!

Member

cdeil commented Nov 22, 2017

@jjlk - Thanks for all your work on this!

I'm working on improving this further now. Will merge later today!

@cdeil cdeil dismissed their stale review via bcf1082 Nov 22, 2017

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Nov 22, 2017

Member

@jjlk - It's only half-done, but I think this is an improvement: bcf1082

Especially I think that we should write grouping tests like this, where it's clear just from looking at the TestSpectrumEnergyGroups class what the test case is:
bcf1082#diff-5b08a003b96d7f613071e60b232107f2R123

What's missing is to reduce and rewrite the many methods that apply emin / emax or energy binnings to much fewer / simpler code (and to remove the EnergyRange class completely) and to work with groups objects in those methods and the tests. @jjlk - If you have time, go ahead. Otherwise I'll come back to this and finish it on Friday (tomorrow is HGPS).

Member

cdeil commented Nov 22, 2017

@jjlk - It's only half-done, but I think this is an improvement: bcf1082

Especially I think that we should write grouping tests like this, where it's clear just from looking at the TestSpectrumEnergyGroups class what the test case is:
bcf1082#diff-5b08a003b96d7f613071e60b232107f2R123

What's missing is to reduce and rewrite the many methods that apply emin / emax or energy binnings to much fewer / simpler code (and to remove the EnergyRange class completely) and to work with groups objects in those methods and the tests. @jjlk - If you have time, go ahead. Otherwise I'll come back to this and finish it on Friday (tomorrow is HGPS).

@jjlk

This comment has been minimized.

Show comment
Hide comment
@jjlk

jjlk Nov 22, 2017

Contributor

@cdeil ,
Yes, but can we merge to have a code that is less buggy? I don't have time to implement things this week.

Contributor

jjlk commented Nov 22, 2017

@cdeil ,
Yes, but can we merge to have a code that is less buggy? I don't have time to implement things this week.

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Nov 28, 2017

Member

As discussed with @jjlk on Slack, I'm merging this in now. There's quite a bit more work to do for the energy grouping and spectral point computation, but that will have to come later via follow-up PRs when someone has time.

Member

cdeil commented Nov 28, 2017

As discussed with @jjlk on Slack, I'm merging this in now. There's quite a bit more work to do for the energy grouping and spectral point computation, but that will have to come later via follow-up PRs when someone has time.

@cdeil cdeil merged commit d9b29ff into gammapy:master Nov 28, 2017

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@cdeil cdeil changed the title from Fix SpectrumEnergyGrouper to Fix min energy handling in SpectrumEnergyGrouper Nov 30, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment