New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in SpectrumStacker #1319

Merged
merged 9 commits into from Feb 28, 2018

Conversation

Projects
None yet
5 participants
@AnneLemiere

AnneLemiere commented Feb 21, 2018

This pull request is aked in order to fix a bug in spectrum/observation.py : in the stacking function, the alpha table was filled with 1 if NOFF=0. We now calculate the average alpha and fill the alpha table with this value when NOFF=0.
The bug appears fixed after several tests on MC and data.
There is a test (test_stack_backscal) in tests/test_observation.py

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 21, 2018

For reference: this PR should fix #1282, an there are other Gammapy issues that mention spectrum stacking, i.e. are possibly related: #1263 #958 #772. We can look at those again after this PR is merged, I'm not suggesting to do everything here, I just wanted to mention and thus "link" the issues with this PR.

@cdeil cdeil added the bug label Feb 21, 2018

@cdeil cdeil added this to the 0.7 milestone Feb 21, 2018

@joleroi

Thanks a lot for this PR, I left some inline comments. There are some technical problems that should be easy to fix.

I cannot judge the change in implementation, because I do not fully understand the problem. The best way to show that there is a problem is to add a failing test and then provide an implementation that fixes the test. Could you show me a simple example where the current SpectrumObservationStacker gives a wrong alpha value? I know this has been discussed before but a simple code example always helps

@@ -827,30 +827,41 @@ def stack_counts_spectrum(counts_spectrum_list):
quality=stacked_quality,
)
#modif AL : correct bug on alpha

This comment has been minimized.

@joleroi

joleroi Feb 21, 2018

Contributor

So far we did not put comments like this in the code base. If one wants to know who made a certain change at what time, one can rely on the git blame function, see for example
https://github.com/gammapy/gammapy/blame/master/gammapy/spectrum/observation.py

So unless there has been a change of policy that I am unaware of, I suggest to remove this line.

for obs in self.obs_list:
bkscal_on_data = obs.on_vector._backscal_array.copy()
bkscal_off_data = obs.off_vector._backscal_array.copy()
bkscal_off += (bkscal_on_data / bkscal_off_data) * obs.off_vector.counts_in_safe_range.value
alpha_sum += obs.alpha * obs.off_vector.counts_in_safe_range.sum()

This comment has been minimized.

@joleroi

joleroi Feb 21, 2018

Contributor

In this line you assume that alpha is not a function of energy, right? This need not be the case, and is not covered by the test you added. I don't fully understand the changes you made, but there could be a problem here

@@ -190,7 +244,23 @@ def test_verify_npred(self):
assert_allclose(npred_stacked.data.data, npred_summed)
def test_stack_backscal(self):

This comment has been minimized.

@joleroi

joleroi Feb 21, 2018

Contributor

This test reimplements the functionality of SpectrumObservationStacker, so it basically is a test if the two implementations are the same, not if they are correct.

A better way to set up tests is to come up with a test data set, for which we know what the test output should be and the assert to a given value, for example if you wanted to test

def add (a,b):
    return a + b

You would not setup the test like

val1 = 5
val2 = 10
result = val1 + val2
assert result == add(val1, va;2)

This test would pass even if the + operator in Python did something completely crazy, a better test would be

assert add(5, 10) == 15
data=np.arange(nbin)
data1=data
data2=data
data3=data

This comment has been minimized.

@joleroi

joleroi Feb 21, 2018

Contributor

This won't work since data1, data2, and data3 all point to the same object. This is because in Python what you are used to calling varibles, are not really varibales (i.e. memory locations) but just names for certain objects. This is also explained here.

Let me give you an example

>>> val = np.arange(10)
>>> a = val
>>> b = val
>>> a is b
True
>>> a[0] = -5
>>> b
array([-5,  1,  2,  3,  4,  5,  6,  7,  8,  9])

If you want to create two different objects you can do

>>> a = np.arange(10)
>>> b = a.copy()
>>> a is b
False
data3=data
data1[0]=0
data1[1]=0
data1[2]=0

This comment has been minimized.

@joleroi

joleroi Feb 21, 2018

Contributor

This could be simplified

data1[:3] = np.zeros(3)

@cdeil cdeil modified the milestones: 0.7, 0.8 Feb 23, 2018

@JouvinLea

This comment has been minimized.

Contributor

JouvinLea commented Feb 23, 2018

@joleroi :
Sorry I was not available this week. I reviewed the PR and I made changes myself on @AnneLemiere implementation especially to have an easier test. I also fixed some coding issues.

I think @AnneLemiere @AtreyeeS can comment on your question. The issue came from the fact that before when the number of OFF stacked events in a bin was equal to zero, the alpha of the bin was -1. This introduce biais in the final spectrum due to the calculation of the statistics for the fit containing (1+alpha) terms that was giving 0. @AnneLemiere and @AtreyeeS did some MC simulations where they saw this systematical bias so this PR fix that. Now you don’t put -1 for these bins but you calculate an alpha that is averaged on the sum of all the OFF events of each run. With the MC simulation, they get now a correct result.

@cdeil : When I am running all the test I have an issue related to gammapy extra whereas I am use to use the latest version. It concerns the test gammapy/irf/tests/test_psf_table.py : I got: E FileNotFoundError: [Errno 2] No such file or directory: '/Users/jouvin/Desktop/these/test_Gammapy/gammapy-extra/test_datasets/cta_1dc/caldb/data/cta/prod3b/bcf/South_z20_50h/irf_file.fits'

@cdeil

This comment has been minimized.

Member

cdeil commented Feb 23, 2018

@JouvinLea - You are probably not up-to-date with the gammapy-extra master.

From https://github.com/gammapy/gammapy-extra/commits/master I see that the latest commit there is gammapy/gammapy-extra@287d78d . So check your gammapy-extra branch / commit locally via git status or git log.

@AtreyeeS

This comment has been minimized.

Contributor

AtreyeeS commented Feb 23, 2018

@joleroi This is related to issue #1282

Simulating a simple spectrum (with a simple response and background), as in the snippet below, yields a consistently steeper index for the stacked spectrum. This was solved by @AnneLemiere in the PR.

#-------#

alpha1=0.2
time1=1

e_true = np.logspace(-2, 2.5, 109) * u.TeV
e_reco = np.logspace(-2,2, 79) * u.TeV

edisp1 = EnergyDispersion.from_gauss(e_true=e_true, e_reco=e_reco, sigma=0.2, bias=0,)
edisp2 = EnergyDispersion.from_gauss(e_true=e_true, e_reco=e_reco, sigma=0.2, bias=0,)

#Aeff
ee = EnergyBounds(np.logspace(-2,2.5,109)*u.TeV)
p1=6.85e9
p2=0.0891
p3=5.0e5

f = lambda x : p1 * (x/u.MeV)**(-p2) * np.exp(((-p3)*u.MeV)/x)
value=f(ee.log_centers.to('MeV'))
data = value * u.cm ** 2
aeff1=EffectiveAreaTable(ee.lower_bounds,ee.upper_bounds,data)

#======================================

#DEFINE MODEL

#Model for the source
index = 2.1 * u.Unit('')
amplitude = 3.5e-12 * u.Unit('cm-2 s-1 TeV-1')
reference = 1 * u.TeV
pwl = PowerLaw(index=index, amplitude=amplitude, reference=reference)
print(pwl)

Bkg :

bkg_index = 3 * u.Unit('')
bkg_amplitude = 4.31e-13 * u.Unit('cm-2 s-1 TeV-1')
reference = 1 * u.TeV
bkg_model = PowerLaw(index=bkg_index, amplitude=bkg_amplitude, reference=reference)

#======================================

#SIMULATE SPECTRA

livetime1 = time1 * u.h
livetime2= time2 * u.h

n_obs = 50
seeds = np.arange(n_obs)
sim1 = SpectrumSimulation(aeff=aeff1,
edisp=edisp1,
source_model=pwl,
livetime=livetime1,
background_model=bkg_model,
alpha=alpha1)

sim1.run(seeds)

sim2=sim1

Indiv_best_fit_index = []
Indiv_ind_err = []
Indiv_amp = []
Indiv_amp_err = []

best_fit_index = []
best_amp = []
best_ind_err = []
best_amp_err = []
i=0

while i<n_obs:
sim_result=(sim1.result[i],sim2.result[i])
#FIT SPECTRA:
pwl.parameters['index'].parmax = 10

for obs in sim_result:
    fit = SpectrumFit(obs, pwl.copy(), stat='wstat')
    fit.model.parameters['index'].value = 2
    fit.fit()
    fit.est_errors()
    Indiv_best_fit_index.append(fit.result[0].model.parameters['index'].value)
    Indiv_ind_err.append(np.sqrt(fit.result[0].model.parameters.covariance[0][0]))

    Indiv_amp.append(fit.result[0].model.parameters['amplitude'].value)      
    Indiv_amp_err.append(np.sqrt(fit.result[0].model.parameters.covariance[1][1]))
    #print(fit.result[0])
    #print(' Indiv_best_fit_index = ', Indiv_best_fit_index)
    
#i+=1    
#STACK SPECTRA 2 by 2
# Add the two spectra

obs_stacker = SpectrumObservationStacker(sim_result)
#print('sim_result is the list = ',sim_result)   
obs_stacker.run()

fit_stacked = SpectrumFit(obs_stacker.stacked_obs, pwl.copy(), stat='wstat')
fit_stacked.model.parameters['index'].value = 2
fit_stacked.fit()
fit_stacked.est_errors()
best_fit_index.append(fit_stacked.result[0].model.parameters['index'].value)
best_ind_err.append(np.sqrt(fit_stacked.result[0].model.parameters.covariance[0][0]))

best_amp.append(fit_stacked.result[0].model.parameters['amplitude'].value)
best_amp_err.append(np.sqrt(fit_stacked.result[0].model.parameters.covariance[1][1]))
#print('  best_fit_index =',best_fit_index)
print(i)
i+=1

#ax0, ax1 = Indiv_best_fit_index.plot(figsize=(8,8))
#ax0.set_ylim(0, 20)

#ax0, ax1 = best_fit_index.plot(figsize=(8,8))
#ax0.set_ylim(0, 20)

fix, axes = plt.subplots(1,2, figsize=(12, 4))
axes[0].hist(Indiv_best_fit_index)
axes[0].set_xlabel('Index for Indiv spectra')
axes[1].hist(best_fit_index)
axes[1].set_xlabel('Index for stacked spectra')
print("Indiv = ",np.mean(Indiv_best_fit_index),np.std(Indiv_best_fit_index))
print("Stacked = ",np.mean(best_fit_index),np.std(best_fit_index))

fix, axes = plt.subplots(1,2, figsize=(12, 4))
axes[0].hist(Indiv_amp)
axes[0].set_xlabel('Amp for Indiv spectra')
axes[1].hist(best_amp)
axes[1].set_xlabel('Amp for stacked spectra')
print("Indiv = ",np.mean(Indiv_amp),np.std(Indiv_best_fit_index))
print("Stacked = ",np.mean(best_amp),np.std(best_fit_index))

@joleroi

This comment has been minimized.

Contributor

joleroi commented Feb 28, 2018

@JouvinLea @AtreyeeS @AnneLemiere Thanks for the explanaition and fixing this bug!

@joleroi

This comment has been minimized.

Contributor

joleroi commented Feb 28, 2018

The remaining test failures are probably unrelated to this PR

@joleroi joleroi merged commit 48f0924 into gammapy:master Feb 28, 2018

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build failed
Details

@cdeil cdeil modified the milestones: 0.8, 0.7 Feb 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment