bugfixes for the save_all command #98

DougBurke · 2015-09-09T20:04:50Z

Note

This PR is not going to be merged as is, since #100 has been merged (which includes a small number of the changes here, as well as removing the set_session code). The plan is to add a new PR with the remaining changes, once the CIAO 4.8 release has settled down.

Release Notes

This commit fixes and enhances the output of the save_all command.

User-visible changes:

added a new argument to save_all: if outfile is None then the outfh argument is used
to define the output handle (the argument can be any file-like argument, such as a file
handle like sys.stdout or the output of open, or a StringIO object)
setting the clobber argument to save_all now means that the output file (the outfile
argument, if not None) is deleted if it already exists; prior to this, the file would be
appended to instead
the source expression is now saved correctly for most cases (e.g. when not using
set_full_model); this is bug save_all does not save the source model for PHA data #97 but also affects non-PHA data sets
the background model expression was not always written out correctly when using PHA
data sets
quality and grouping arrays of PHA data sets are now stored as 16-byte integers rather
than a floating-point value (this has no affect on the results, but matches the OGIP standard)
fixed up saving the grouping and quality arrays of background PHA data sets (this would only
be an issue if the background is being fit, rather than subtracted)
basic data sets created with the load_arrays function are now written out by save_all
as part of the script; this is intended for small datasets and may have problems with
precision if used with floating-point arrays
calls to load_psf are now correctly restored (they may not have been written out correctly
if multiple data sets were loaded)
user models are now written out to disk; this consists of two parts:
- writing out the function that defines the model, which may or may not be possible (if
  not, a place-holder function is added to the output and a warning displayed).
- the necessary calls to load_user_model and add_user_pars are now included in
  the output
the Python code created by save all has undergone several minor changes:
- it now explicitly imports the sherpa.astro.ui module, so that it can be run from the
  IPython prompt using the %run <filename> command, or directly as python <filename>
- it uses the create_model_component function rather than eval to create model
  components (this is CXC bug 12146)
- many optional arguments to functions are now given as name=value rather than
  being a positional argument, to make it clearer what the script is doing.
- calls to load_data have been replaced by more-specific versions - e.g. load_pha
  and load_image - if appropriate
- there have seen several minor syntactic clean ups to better follow the suggestions from PEP8

When writing out code that defines a user-model, there is no attempt to make sure that
modules used by the function are available. These will need to be added, either directly
or imported, manually to the output.

Internal changes:

adds a test suite; this requires a small change to the test_astro.py test script just to make
sure that the default XSPEC settings are restored on test tear-down (to avoid the new tests
from failing). The test suite covers some of the basic situations expected to be served by
save_all - such as basic arrays, PHA datasets - but does not cover all functionality
moves the code out to sherpa/astro/ui/serialize.py from sherpa/astro/ui/utils.py, as the
latter is rather large
code has been refactored to promote sharing and improve readability
code has seen PEP 8 fixes and other simplifications, such as:
- replace x == True and y == False by x and not y
- remove extra brackets in conditionals
- use isinstance(x, basestring) rather than type(x) == str, which has been a problem for
  other parts of the Sherpa UI when given a string stored in a numpy array (since these
  aren't treated as str but do match the isinstance check)

Notes

I believe the save_session command is either redundant or an experiment that was never finished. It can probably be marked as deprecated, but that can be considered later. This is why the release notes do not mention save_session, even though virtually-all the changes are relevant for it.

The test suite has only been tested on one machine. It is likely that it will need changing to account for running on different configurations (in particular 32 vs 64 bit or different OS values); in particular, the regression tests that ensure the textual output is correct may need updating to allow numerical differences or changed to a different approach.

ADDED NOTE the tests include regression tests, where the output is compared against an expected string. These strings are currently stored within test_serialize.py itself, but it may make sense to move them into the sherpa-test-data repository, once the patch has been accepted (and the approach vindicated by more testing, to make sure it's worth approach).

Not all functionality is tested - e.g. serializing sessions that use image data, or an iterative fit, or more corner cases in various options. This set of changes is large enough as is, and significantly improves the overall coverage of Sherpa (from around 50% to 55%) that I think it's worth looking at now. The coverage of serialize.py is 68% (when the equivalent before this PR was 0% for the code that is now in serialize.py).

A lot of the logic in serialize.py should really be moved into the classes, so that rather than this code being changed whenever a new statistic needs to be serialized (say), the object itself can be queried. I'm wondering about a Serializable class which could be imported from to provide a serialize method (names open to discussion), or if not a class, just make sure that each object we want to serialize provides a method. This is way outside the scope of this PR, but some of the refactoring done in this PR was done to highlight places where this would be useful.

The code will conflict once #91 is accepted (and possibly other commits). The commit history is also not ideal (since the tests are not added straight away, and there's some changes that could perhaps be squashed together). I submitted this on my own branch to provide more freedom in rebase-ing the commit.

Example

With the following code

from sherpa.astro import ui
ui.load_arrays(1, [1,2,3], [5,6,7])
ui.set_source(ui.const1d.m1)

the difference in the output of save_all from CIAO 4.7 (out.ciao47, which is what master gives) and this PR (out.pr98) is:

% diff out.ciao47 out.pr98 
1a2
> from sherpa.astro.ui import *
5c6,9
< load_data(1,"")

---
> load_arrays(1,
>             [1, 2, 3],
>             [5, 6, 7],
>             Data1D)
7c11
< ######### Set Image Coordinates 

---
> ######### Set Image Coordinates
21c25
< notice_id(1,"1.0000:3.0000")

---
> notice_id(1, "1.0000:3.0000")
44c48
< eval("const1d.m1")

---
> create_model_component("const1d", "m1")
60c64
< set_full_model(1, const1d.m1)

---
> set_source(1, const1d.m1)

which shows several of the enhancements and bug fixes, namely:

import of the sherpa.astro.ui module
simple datasets created by load_arrays are now written out to file
create_model_component is used rather than eval
the source model is written out correctly (using set_source and not set_full_model)

The sherpa.astro.ui.utils module is large enough, so the two routines involved with state serialization (save_all and save_session) have been moved out to a separate module (sherpa.astro.ui.serialize).

Remove ` == True` from boolean checks and replace `x == False` by `not x` (the latter form was also used in the Python output and has been changed there too).

This change was also done to the Python code that is created by these routines.

More refactoring of common code. This revealed a bug in the serialization of PSF models when the wrong is was assigned to the load_psf call in the ASCII output if multiple data sets are loaded and the PSF is not for the last model.

This adds a warning to the user when the serialization code is run, as well as the existing message added to the output file. The message now includes the model name.

Change the order of the checks on the model typename to reduce redundancy and make it clearer the logic of the checks.

The original code referred to self.the_source in two places; these were changed to state.the_source in an earlier commit. This change removes the "state." prefix as it's obviously a local variable, rather than taken from an object.

Ensure that Sherpa is loaded, so that there are more ways that the script can be used (e.g. "python foo.py").

Fix up the remaining code that converts a Sherpa data set identifier into a string.

Refactored code for serializing PHA files and fixed what appears to be a bug, where the grouping and quality arrays for background files were being taken from the source data set instead. This is only an issue for people who fit the background, rather than subtract it (since in the latter case the backgroun grouping and quality arrays are ignored). The Python serialization is also a bit-more explicit about the optional function arguments, in that the form name=value is now used in several places related to PHA data files, rather than relying on position.

This should enable easier testing (but this has not been implemented yet).

The OGIP standard has these two columns as 2-byte ints, whereas the default serialization tends to write them out as 8-byte floats, presumably because this is how the DataPHA object stores them.

Allow the save_all to write to a file-like object, primarily for testing but it may be useful for advanced use cases. Includes a minor fix, going from `type(foo) == string` to `isinstance(foo, basestring)`.

Change `type(foo) == str` to `isintance(foo, basestring)`.

This adds minimal tests: - store/restore the basic Sherpa configuration - ditto but after changing a few statistic and method settings - a very-basic PHA case The `test_astro.py` script was changed so as to restore the XSPEC settings (if XSPEC is enabled), since the threads it runs change the default settings, which can cause the serialize tests to fail.

This is CXC ticket #12146.

First attempt at storing data set with load_arrays, rather than read in from a file. This does not handle all cases, but is a start.

This is a fix for 40b6d9d - we want add_user_pars not add_user_model - and also adds in a test of loading in the save_all output for this test.

Determining hoe the model was set should probably be recorded in the state itself, rather than trying to reconstruct it here.

The output of save_all is not correct in this case, which means that the output can not be run (the model expression for the background component is not written out correctly). The test checks for the current, incorrect, output. The "restore" test - which loads in the output of save_all - is skipped for now.

The background model wasn't being queried correctly (incorrect use of arguments) and so could be written out as using set_bkg_full_ model (which does not serialize correctly for PHA data sets at this time) instead of set_bkg_source. This was seen for data sets with a non-default id (or if multiple backgronud components).

Remove the skipif constraint on the test_restore_pha_back now that the background model is restored correctly. Clean up the test. The background filtering has been updated to match the current behavior (since setting the source filter also sets the background filter).

DougBurke · 2015-09-09T23:14:33Z

Oops: looks like I need a few skipIf(no-xspec) decorators...

DougBurke · 2015-09-10T20:25:43Z

I have created a greatly simplified version of this patch that only addresses two of these changes, but it does fix a bug which renders save_all fairly useless for most cases. See PR #100

olaurino · 2016-01-05T22:52:34Z

Replaced by #138

DougBurke added 30 commits August 31, 2015 08:48

Move save_all into serialize module.

ac9393a

The sherpa.astro.ui.utils module is large enough, so the two routines involved with state serialization (save_all and save_session) have been moved out to a separate module (sherpa.astro.ui.serialize).

Clean up boolean checks.

fe7b314

Remove ` == True` from boolean checks and replace `x == False` by `not x` (the latter form was also used in the Python output and has been changed there too).

Remove un-needed brackets in conditionals.

c461e07

This change was also done to the Python code that is created by these routines.

Refactored common routines.

e365951

Refactor serialization of XSPEC settings.

6a2fb4f

Refactor saving the statistic and method settings

0be40d6

clobber means over-write, not append

c922abd

bug fix serializing load_psf

f497134

More refactoring of common code. This revealed a bug in the serialization of PSF models when the wrong is was assigned to the load_psf call in the ASCII output if multiple data sets are loaded and the PSF is not for the last model.

Warn user when skipping a user model.

38a7932

This adds a warning to the user when the serialization code is run, as well as the existing message added to the output file. The message now includes the model name.

Improve code clarity

3ef6cf6

Change the order of the checks on the model typename to reduce redundancy and make it clearer the logic of the checks.

More refactoring

380e682

Bug fix: remove namespace on variables

90fb1be

The original code referred to self.the_source in two places; these were changed to state.the_source in an earlier commit. This change removes the "state." prefix as it's obviously a local variable, rather than taken from an object.

Refactoring.

2e2d89f

Add sherpa import

300050d

Ensure that Sherpa is loaded, so that there are more ways that the script can be used (e.g. "python foo.py").

Refactor data serialization

2359ab6

Clean up.

31f4e0d

Fix up the remaining code that converts a Sherpa data set identifier into a string.

Refactor the saving of ARF/RMF file names.

ca6c80c

Minor clean up.

b4d0eb6

Minor clean up.

4acf9de

Minor clean up

c85f147

Pass around a file handle, not a file name.

7e36514

Internal rework to support file handles.

69b40ed

This should enable easier testing (but this has not been implemented yet).

Grouping/Quality arrays serialized as np.int16

8e246cb

The OGIP standard has these two columns as 2-byte ints, whereas the default serialization tends to write them out as 8-byte floats, presumably because this is how the DataPHA object stores them.

Add outfh argument to save_all

9f0fd5c

Allow the save_all to write to a file-like object, primarily for testing but it may be useful for advanced use cases. Includes a minor fix, going from `type(foo) == string` to `isinstance(foo, basestring)`.

Improve type-checking

2c3a0cd

Change `type(foo) == str` to `isintance(foo, basestring)`.

Use isinstance rather than a check on type

3387925

Use create_model_component rather than eval

b2f8dad

This is CXC ticket #12146.

Very minor rewrite of model-chosing model

d280fac

DougBurke added 8 commits September 5, 2015 11:00

Initial support for load_arrays

0b0de8e

First attempt at storing data set with load_arrays, rather than read in from a file. This does not handle all cases, but is a start.

Fix for saving user models.

cee61d7

This is a fix for 40b6d9d - we want add_user_pars not add_user_model - and also adds in a test of loading in the save_all output for this test.

Only use set_full_model with PHA data sets.

dbe5635

Determining hoe the model was set should probably be recorded in the state itself, rather than trying to reconstruct it here.

Add explicit test of grouping and quality arrays

25746a1

Adds comments about tests to add

12aec52

DougBurke added type:bug type:enhancement area:code labels Sep 9, 2015

olaurino added the 3 - Working label Sep 9, 2015

DougBurke mentioned this pull request Sep 9, 2015

save_all does not save the source model for PHA data #97

Closed

Fix tests to work when XSPEC module is not available

f9200b6

DougBurke mentioned this pull request Sep 10, 2015

Fix set_source output from save_all #100

Merged

olaurino modified the milestone: 4.8rc1 Oct 6, 2015

olaurino self-assigned this Oct 6, 2015

olaurino modified the milestones: 4.8rc1, 4.8.0 Oct 8, 2015

DougBurke added note:obsolete pr:hold labels Oct 23, 2015

olaurino removed the pr:hold label Dec 17, 2015

olaurino mentioned this pull request Jan 5, 2016

Bug/improve save all #138

Merged

olaurino closed this Jan 5, 2016

olaurino removed the 3 - Working label Jan 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bugfixes for the save_all command #98

bugfixes for the save_all command #98

DougBurke commented Sep 9, 2015

DougBurke commented Sep 9, 2015

DougBurke commented Sep 10, 2015

olaurino commented Jan 5, 2016

bugfixes for the save_all command #98

bugfixes for the save_all command #98

Conversation

DougBurke commented Sep 9, 2015

Note

Release Notes

Notes

Example

DougBurke commented Sep 9, 2015

DougBurke commented Sep 10, 2015

olaurino commented Jan 5, 2016