Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfixes for the save_all command #98

Closed
wants to merge 45 commits into from

Conversation

DougBurke
Copy link
Contributor

Note

This PR is not going to be merged as is, since #100 has been merged (which includes a small number of the changes here, as well as removing the set_session code). The plan is to add a new PR with the remaining changes, once the CIAO 4.8 release has settled down.

Release Notes

This commit fixes and enhances the output of the save_all command.

User-visible changes:

  • added a new argument to save_all: if outfile is None then the outfh argument is used
    to define the output handle (the argument can be any file-like argument, such as a file
    handle like sys.stdout or the output of open, or a StringIO object)
  • setting the clobber argument to save_all now means that the output file (the outfile
    argument, if not None) is deleted if it already exists; prior to this, the file would be
    appended to instead
  • the source expression is now saved correctly for most cases (e.g. when not using
    set_full_model); this is bug save_all does not save the source model for PHA data #97 but also affects non-PHA data sets
  • the background model expression was not always written out correctly when using PHA
    data sets
  • quality and grouping arrays of PHA data sets are now stored as 16-byte integers rather
    than a floating-point value (this has no affect on the results, but matches the OGIP standard)
  • fixed up saving the grouping and quality arrays of background PHA data sets (this would only
    be an issue if the background is being fit, rather than subtracted)
  • basic data sets created with the load_arrays function are now written out by save_all
    as part of the script; this is intended for small datasets and may have problems with
    precision if used with floating-point arrays
  • calls to load_psf are now correctly restored (they may not have been written out correctly
    if multiple data sets were loaded)
  • user models are now written out to disk; this consists of two parts:
    • writing out the function that defines the model, which may or may not be possible (if
      not, a place-holder function is added to the output and a warning displayed).
    • the necessary calls to load_user_model and add_user_pars are now included in
      the output
  • the Python code created by save all has undergone several minor changes:
    • it now explicitly imports the sherpa.astro.ui module, so that it can be run from the
      IPython prompt using the %run <filename> command, or directly as python <filename>
    • it uses the create_model_component function rather than eval to create model
      components (this is CXC bug 12146)
    • many optional arguments to functions are now given as name=value rather than
      being a positional argument, to make it clearer what the script is doing.
    • calls to load_data have been replaced by more-specific versions - e.g. load_pha
      and load_image - if appropriate
    • there have seen several minor syntactic clean ups to better follow the suggestions from PEP8

When writing out code that defines a user-model, there is no attempt to make sure that
modules used by the function are available. These will need to be added, either directly
or imported, manually to the output.

Internal changes:

  • adds a test suite; this requires a small change to the test_astro.py test script just to make
    sure that the default XSPEC settings are restored on test tear-down (to avoid the new tests
    from failing). The test suite covers some of the basic situations expected to be served by
    save_all - such as basic arrays, PHA datasets - but does not cover all functionality
  • moves the code out to sherpa/astro/ui/serialize.py from sherpa/astro/ui/utils.py, as the
    latter is rather large
  • code has been refactored to promote sharing and improve readability
  • code has seen PEP 8 fixes and other simplifications, such as:
    • replace x == True and y == False by x and not y
    • remove extra brackets in conditionals
    • use isinstance(x, basestring) rather than type(x) == str, which has been a problem for
      other parts of the Sherpa UI when given a string stored in a numpy array (since these
      aren't treated as str but do match the isinstance check)

Notes

I believe the save_session command is either redundant or an experiment that was never finished. It can probably be marked as deprecated, but that can be considered later. This is why the release notes do not mention save_session, even though virtually-all the changes are relevant for it.

The test suite has only been tested on one machine. It is likely that it will need changing to account for running on different configurations (in particular 32 vs 64 bit or different OS values); in particular, the regression tests that ensure the textual output is correct may need updating to allow numerical differences or changed to a different approach.

ADDED NOTE the tests include regression tests, where the output is compared against an expected string. These strings are currently stored within test_serialize.py itself, but it may make sense to move them into the sherpa-test-data repository, once the patch has been accepted (and the approach vindicated by more testing, to make sure it's worth approach).

Not all functionality is tested - e.g. serializing sessions that use image data, or an iterative fit, or more corner cases in various options. This set of changes is large enough as is, and significantly improves the overall coverage of Sherpa (from around 50% to 55%) that I think it's worth looking at now. The coverage of serialize.py is 68% (when the equivalent before this PR was 0% for the code that is now in serialize.py).

A lot of the logic in serialize.py should really be moved into the classes, so that rather than this code being changed whenever a new statistic needs to be serialized (say), the object itself can be queried. I'm wondering about a Serializable class which could be imported from to provide a serialize method (names open to discussion), or if not a class, just make sure that each object we want to serialize provides a method. This is way outside the scope of this PR, but some of the refactoring done in this PR was done to highlight places where this would be useful.

The code will conflict once #91 is accepted (and possibly other commits). The commit history is also not ideal (since the tests are not added straight away, and there's some changes that could perhaps be squashed together). I submitted this on my own branch to provide more freedom in rebase-ing the commit.

Example

With the following code

from sherpa.astro import ui
ui.load_arrays(1, [1,2,3], [5,6,7])
ui.set_source(ui.const1d.m1)

the difference in the output of save_all from CIAO 4.7 (out.ciao47, which is what master gives) and this PR (out.pr98) is:

% diff out.ciao47 out.pr98 
1a2
> from sherpa.astro.ui import *
5c6,9
< load_data(1,"")

---
> load_arrays(1,
>             [1, 2, 3],
>             [5, 6, 7],
>             Data1D)
7c11
< ######### Set Image Coordinates 

---
> ######### Set Image Coordinates
21c25
< notice_id(1,"1.0000:3.0000")

---
> notice_id(1, "1.0000:3.0000")
44c48
< eval("const1d.m1")

---
> create_model_component("const1d", "m1")
60c64
< set_full_model(1, const1d.m1)

---
> set_source(1, const1d.m1)

which shows several of the enhancements and bug fixes, namely:

  • import of the sherpa.astro.ui module
  • simple datasets created by load_arrays are now written out to file
  • create_model_component is used rather than eval
  • the source model is written out correctly (using set_source and not set_full_model)

The sherpa.astro.ui.utils module is large enough, so the two
routines involved with state serialization (save_all and save_session)
have been moved out to a separate module (sherpa.astro.ui.serialize).
Remove ` == True` from boolean checks and replace `x == False` by
`not x` (the latter form was also used in the Python output and has
been changed there too).
This change was also done to the Python code that is created by these
routines.
More refactoring of common code. This revealed a bug in the serialization
of PSF models when the wrong is was assigned to the load_psf call in the
ASCII output if multiple data sets are loaded and the PSF is not for
the last model.
This adds a warning to the user when the serialization code is run,
as well as the existing message added to the output file. The message
now includes the model name.
Change the order of the checks on the model typename to reduce redundancy
and make it clearer the logic of the checks.
The original code referred to self.the_source in two places; these were
changed to state.the_source in an earlier commit. This change removes the
"state." prefix as it's obviously a local variable, rather than taken from
an object.
Ensure that Sherpa is loaded, so that there are more ways that the
script can be used (e.g. "python foo.py").
Fix up the remaining code that converts a Sherpa data set identifier
into a string.
Refactored code for serializing PHA files and fixed what appears to be
a bug, where the grouping and quality arrays for background files were
being taken from the source data set instead. This is only an issue for
people who fit the background, rather than subtract it (since in the
latter case the backgroun grouping and quality arrays are ignored).

The Python serialization is also a bit-more explicit about the optional
function arguments, in that the form name=value is now used in several
places related to PHA data files, rather than relying on position.
This should enable easier testing (but this has not been implemented yet).
The OGIP standard has these two columns as 2-byte ints, whereas the
default serialization tends to write them out as 8-byte floats,
presumably because this is how the DataPHA object stores them.
Allow the save_all to write to a file-like object, primarily for testing
but it may be useful for advanced use cases. Includes a minor fix, going
from `type(foo) == string` to `isinstance(foo, basestring)`.
Change `type(foo) == str` to `isintance(foo, basestring)`.
This adds minimal tests:

  - store/restore the basic Sherpa configuration
  - ditto but after changing a few statistic and method settings
  - a very-basic PHA case

The `test_astro.py` script was changed so as to restore the XSPEC
settings (if XSPEC is enabled), since the threads it runs change
the default settings, which can cause the serialize tests to fail.
First attempt at storing data set with load_arrays, rather than read
in from a file. This does not handle all cases, but is a start.
This is a fix for 40b6d9d - we want
add_user_pars not add_user_model - and also adds in a test of loading
in the save_all output for this test.
Determining hoe the model was set should probably be recorded in the
state itself, rather than trying to reconstruct it here.
The output of save_all is not correct in this case, which means that
the output can not be run (the model expression for the background
component is not written out correctly). The test checks for the current,
incorrect, output. The "restore" test - which loads in the output of
save_all - is skipped for now.
The background model wasn't being queried correctly (incorrect use
of arguments) and so could be written out as using set_bkg_full_ model
(which does not serialize correctly for PHA data sets at this time)
instead of set_bkg_source. This was seen for data sets with a non-default
id (or if multiple backgronud components).
Remove the skipif constraint on the test_restore_pha_back now that the
background model is restored correctly. Clean up the test. The background
filtering has been updated to match the current behavior (since setting
the source filter also sets the background filter).
@DougBurke
Copy link
Contributor Author

Oops: looks like I need a few skipIf(no-xspec) decorators...

@DougBurke
Copy link
Contributor Author

I have created a greatly simplified version of this patch that only addresses two of these changes, but it does fix a bug which renders save_all fairly useless for most cases. See PR #100

@olaurino olaurino modified the milestone: 4.8rc1 Oct 6, 2015
@olaurino olaurino self-assigned this Oct 6, 2015
@olaurino olaurino modified the milestones: 4.8rc1, 4.8.0 Oct 8, 2015
@olaurino olaurino removed the pr:hold label Dec 17, 2015
@olaurino olaurino mentioned this pull request Jan 5, 2016
@olaurino
Copy link
Member

olaurino commented Jan 5, 2016

Replaced by #138

@olaurino olaurino closed this Jan 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants