Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing Bugs #13924 - SegFault with fake_pha #56

Closed
anetasie opened this issue Jun 26, 2015 · 6 comments
Closed

Importing Bugs #13924 - SegFault with fake_pha #56

anetasie opened this issue Jun 26, 2015 · 6 comments

Comments

@anetasie
Copy link
Contributor

SegFault with fake_pha on some systems was reported.
The original bug and the data are located here:
/data/lenin2/Bugs/Sherpa/Nustar/HD_15934

and the script 'test2.sph' fails on MAC.

This bug is specific to X-ray spectral analysis.
In CIAO version the screen output when running on the command line shows the following:

% sherpa 
----------------------------------------------------- 
Welcome to Sherpa: CXC's Modeling and Fitting Package 
----------------------------------------------------- 
CIAO 4.6 Sherpa version 2 Wednesday, January 22, 2014 
sherpa-1> set_source("faked", "xsapec.A") 
sherpa-2> fake_pha("faked", arf="tvcol_A.arf", rmf="tvcol_A.rmf", exposure=1.e8, 
grouped=False) 
/soft/ciao-4.6/binexe/sherpa: line 199: 28184 Segmentation fault (core 
dumped) ipython --profile sherpa 
@olaurino
Copy link
Member

olaurino commented Jul 8, 2015

Might this be related to issue #55, in particular this comment from @DougBurke?

@DougBurke
Copy link
Contributor

The data files (tvcol_A.arf and tvcol_A.rmf) plus an example script are available at https://gist.github.com/DougBurke/04a41831a70e7a59d3e4

@DougBurke
Copy link
Contributor

So, it appears that this is related to #62. The problem here is that the ARF grid is not contiguous, that is the energ_hi value for row i is not always the same as the energ_lo column for row i+1. It appears that this makes Sherpa send in a non-contiguous grid of lo/hi values to the models. Since the apec model is in use here, it triggers #62.

Using the data from the gist at https://gist.github.com/DougBurke/04a41831a70e7a59d3e4 and the latest master branch (so using astropy for file I/O, but the same result is seen in CIAO 4.7 with crates as the I/O back end):

In [1]: import numpy as np

In [2]: from sherpa.astro import ui
WARNING: imaging routines will not be available, 
failed to import sherpa.image.ds9_backend due to 
'RuntimeErr: DS9Win unusable: Could not find ds9 on your PATH'

In [3]: arf = ui.unpack_arf('tvcol_A.arf')

In [4]: elo = arf.energ_lo

In [5]: ehi = arf.energ_hi

In [6]: elo.size
Out[6]: 4096

Are the energy bins for the ARF (the energ_lo and energ_hi columns) ordered correctly?

a) Are the low edge of each bin less than the high edge?

In [7]: (elo < ehi).all()
Out[7]: True

b) Are the low edges monotonically increasing?

In [8]: (elo[1:] > elo[:-1]).all()
Out[8]: True

c) Does the high edge of bin i match the low edge of the next bin (i+1)?

In [9]: (elo[1:] == ehi[:-1]).all()
Out[9]: False

How many are there like this?

In [10]: (elo[1:] == ehi[:-1]).sum()
Out[10]: 3079

In [11]: idx, = np.where(elo[1:] != ehi[:-1])

In [12]: idx.size
Out[12]: 1016

So there's a lot. Where are they (the first few)?

In [13]: idx[:10]
Out[13]: array([ 1,  5,  7, 13, 18, 27, 30, 40, 43, 54])

Let's look at the first 10 pairs since this contains three problem bins:

In [14]: zip(elo,ehi)[:10]
Out[14]: 
[(1.6000000238418579, 1.6399999856948853),
 (1.6399999856948853, 1.6799999475479126),
 (1.6800000667572021, 1.7200000286102295),
 (1.7200000286102295, 1.7599999904632568),
 (1.7599999904632568, 1.7999999523162842),
 (1.7999999523162842, 1.8399999141693115),
 (1.8400000333786011, 1.8799999952316284),
 (1.8799999952316284, 1.9199999570846558),
 (1.9200000762939453, 1.9600000381469727),
 (1.9600000381469727, 2.0)]

Looking at the second and third pairs, we have

 (1.6399999856948853, 1.6799999475479126),
 (1.6800000667572021, 1.7200000286102295),

and you can see that there is a gap: the first bin ends at 1.679999... and the next one starts at 1.68000....

It appears that the RMF does not have these issues, as shown below (this time using the response-model created by Sherpa to access the lo/hi bins used for evaluating the model, since this is what the unpack_pha example uses, but I manually create a DataPHA dataset). First with just a RMF, where the grid has no gaps:

In [1]: import numpy as np

In [2]: from sherpa.astro import ui
WARNING: imaging routines will not be available, 
failed to import sherpa.image.ds9_backend due to 
'RuntimeErr: DS9Win unusable: Could not find ds9 on your PATH'

In [3]: x = np.arange(1, 4097, 1)

In [4]: y = np.ones(x.size)

In [5]: ui.load_arrays(1, x, y, ui.DataPHA)

In [6]: ui.load_rmf(1, 'tvcol_A.rmf')

In [7]: ui.set_source(1, ui.powlaw1d.mdl1)

In [8]: eval_model = ui.get_model(1)

In [9]: print(eval_model)
apply_rmf(powlaw1d.mdl1)
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   mdl1.gamma   thawed            1          -10           10           
   mdl1.ref     frozen            1 -3.40282e+38  3.40282e+38           
   mdl1.ampl    thawed            1            0  3.40282e+38           

In [10]: xlo = eval_model.xlo

In [11]: xhi = eval_model.xhi

In [12]: (xlo < xhi).all()
Out[12]: True

In [13]: (xlo[1:] == xhi[:-1]).all()
Out[13]: True

The xlo/xhi grid is contiguous because the output of line 13 is True. If I now add in an ARF, we see gaps (the equivalent of line 13, which is line 20, is now False):

In [14]: ui.load_arf(1, 'tvcol_A.arf')

In [15]: eval_model2 = ui.get_model(1)

In [16]: print(eval_model2)
apply_rmf(apply_arf(powlaw1d.mdl1))
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   mdl1.gamma   thawed            1          -10           10           
   mdl1.ref     frozen            1 -3.40282e+38  3.40282e+38           
   mdl1.ampl    thawed            1            0  3.40282e+38           

In [17]: xlo2 = eval_model2.xlo

In [18]: xhi2 = eval_model2.xhi

In [19]: (xlo2 < xhi2).all()
Out[19]: True

In [20]: (xlo2[1:] == xhi2[:-1]).all()
Out[20]: False

@DougBurke
Copy link
Contributor

This suggests that there's two issues here

  1. the crash, which is being dealt with in XSpec model evaluation for non-continuous grids (Xspec models that call sumdem, such as apec, can crash) #62 (and is apparently fixed in the next XSpec release)

  2. what to do with ARF and RMF grids that do not match (I think there has previously been discussion of this, a long time ago, but don't have anything to point to). It may be worth a new ticket.

DougBurke added a commit to DougBurke/sherpa that referenced this issue Jul 14, 2015
When run with the code and data files given in issue sherpa#56 - e.g.
https://gist.github.com/DougBurke/04a41831a70e7a59d3e4 - then the model
now fails with the following message

ValueError: Grid cells overlap: cell 360 (16 to 16.04) and cell 361 (16.04 to 16.08)
@DougBurke
Copy link
Contributor

So, if PR #63 is used - using @6f1884e - then we get

a) with a 32-byte, FORTRAN-style XSpec model

The following works (note: as this was an "occasional crash" previously, just running to the end doesn't necessarily mean it is fixed, but it is now only calling the model once with "gaps" being dealt with):

import numpy as np
from sherpa.astro import ui

# this is a 32-byte float model
ui.set_source("faked", "xsapec.A")

A.abundanc=1.0
A.redshift=0.0

arfdata = ui.unpack_arf("tvcol_A.arf")
rmfdata = ui.unpack_rmf("tvcol_A.rmf")

A.kT=1.0
A.norm=1.0
ui.fake_pha("faked", arf=arfdata, rmf=rmfdata, exposure=1.e8, grouped=False)

b) with a 64-byte, C++ style XSpec model

If the set_source line is changed to

ui.set_source("faked", "xspowerlaw.mdl2")

then the fake_pha call will now error out (in fact, any attempt to evaluate the model will do so) with the following error message

ValueError: Grid cells overlap: cell 360 (16 to 16.04) and cell 361 (16.04 to 16.08)

This particular row in the ARF is

% dmlist tvcol_A.arf data,clean rows=359:363
#  ENERG_LO             ENERG_HI             SPECRESP
        15.9200000763        15.9600000381       199.3535614014
        15.9600000381                 16.0       198.8935852051
                 16.0        16.0400009155       198.4293060303
        16.0399990082        16.0799999237       197.9608459473
        16.0799999237        16.1200008392       197.4918365479

So you can see that there is an overlap: 15.96 - 16.0, 16.0 - 16.040xxx, 16.0399xxx - 16.07999

The output format used by the error message doesn't show the overlap, since it has rounded down to 2 decimal places, so you just see 16.04 repeated.

So, as there is this overlap, why doesn't the xsapec model trigger the warning? Well, the code for the 32-byte float models now deals with the energy array as 32-byte floats not 64-byte floats, including checking for overlap - so using FLT_EPSILON rather than DBL_EPSILON. For this ARF, the overlap bins, such as shown below, are all within FLT_EPSILON, but not DBL_EPSILON, hence the difference in behaviour.

Note that the columns are stored as 32-byte floats in the ARF:

% dmlist tvcol_A.arf cols

--------------------------------------------------------------------------------
Columns for Table Block SPECRESP
--------------------------------------------------------------------------------

ColNo  Name                 Unit        Type             Range
   1   ENERG_LO             keV          Real4          -Inf:+Inf            label for field 1
   2   ENERG_HI             keV          Real4          -Inf:+Inf            label for field 2
   3   SPECRESP             cm**2        Real4          -Inf:+Inf            label for field 3

@DougBurke
Copy link
Contributor

Fixed by #101

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants