save_all improvements for set_psf and load_xstable_model #1874
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1874 +/- ##
==========================================
+ Coverage 81.38% 81.64% +0.26%
==========================================
Files 75 75
Lines 27155 27321 +166
Branches 4119 4163 +44
==========================================
+ Hits 22099 22307 +208
+ Misses 4866 4826 -40
+ Partials 190 188 -2
... and 9 files with indirect coverage changes
Add tests of the current behavior related to sherpa#1789 and sherpa#1880. These are regression tests so will need updating once we address these issues.
The problem is that we do not include the "set_psf" call in the ASCII serialization. Since it's not obvious what the text will be, we can't check the output serialization, but we can at least check that the PSF has not been applied when we restore the saved file.
Check:
- linked parameters
- PHA with no response
- PHA with type=counts (as part of the previous test)
- PHA from CSC
- "full model" handling (currently XFAIL)
- table model
- xspec table model (currently a regression test as the code does not create a valid save file)
- pileup model (currently XFAIL because of the inability to recreate data not read from a file)
- dataspace1d/2d use (currently XFAIL)
- example of load_data with an ascii file, which has led to viewing interesting differences in the two I/O backends
- PHA grating dataset (multiple backgrounds) using wavelength (XFAIL; see sherpa#320)
- PHA2 file (XFAIL; see sherpa#1882, which is issue sherpa#320 plus more)

For those tests where we know they fail because the save_all output is wrong in some way, we ensure they fail in the expected manner. This is to make sure that they don't suddenly start failing for a different reason (due to an error elsewhere) which we wouldn't see. This will only be noted when running with the --runxfail pytest flag.
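The "fail in the expected manner" idea can be sketched as a small regression test. The helper name here is hypothetical, not the actual Sherpa test code; the point is that the test asserts the *current*, known-wrong behavior so any change in failure mode is noticed immediately.

```python
# A minimal sketch (hypothetical helper names, not the real Sherpa test
# suite) of the regression-test pattern: assert the current, known-wrong
# behavior so a change in failure mode does not pass silently.
def restore_has_psf(script_text):
    """Stand-in for restoring a save_all script: report whether a
    set_psf call appears in the serialized output."""
    return "set_psf" in script_text

def test_set_psf_not_serialized_regression():
    # Today the serialization omits the set_psf call, so the restored
    # session must NOT have the PSF applied. Once the bug is fixed this
    # assertion flips, which is exactly the signal we want.
    saved = "load_data(1, 'img.fits')\n"  # no set_psf line is emitted
    assert restore_has_psf(saved) is False

test_set_psf_not_serialized_regression()
```

When run under pytest with an xfail marker and the --runxfail flag, a different failure mode would show up as an unexpected error rather than the expected assertion.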
In changing save_all I have found it useful to check that we are still failing in the same way for issues sherpa#320 and sherpa#1882, so make the current tests regression tests (so they pass, even though the output is incorrect) rather than marking them as xfail. We do not do this for all XFAIL tests.
This slightly changes the save_all output for PHA background data but semantically it has the same meaning. Internally it cleans up the code, as we were trying to support possibilities that we know can't happen (e.g. 2D data with backgrounds).
We now explicitly set the arguments (other than the dataset identifier) when calling set_analysis. There is no semantic change here, but it makes the code more future-proof (there are some changes I'd like to make to set_analysis at some point, so may as well make sure they won't mess up save_all output).
Since we have the same form for the "banner" lines, we can encode this as the _output_banner routine. Also some f-string changes.
This is mostly f-string changes but there are a few other code-style changes.
This was a logical error in the existing code, coupled with incomplete testing. The fix is simple: we just need to loop through the session's _psf dictionary and use the key/value pairs as the arguments to set_psf call(s). There is still something weird going on with the warning messages for this test; I don't understand it, but I have got it to a point where it works for me.
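The fix described above can be sketched roughly as follows, assuming the session stores PSFs in a dictionary mapping dataset identifier to PSF model; the helper and variable names are illustrative, not the actual Sherpa serialization code.

```python
# Sketch (illustrative names) of emitting one set_psf call per dataset
# with an assigned PSF, by looping over the session's _psf-style store.
def serialize_psf_calls(psf_store):
    """Emit one set_psf call per dataset identifier in the store."""
    lines = []
    for idval, psfname in psf_store.items():
        # e.g. idval=1, psfname="psf1" -> "set_psf(1, psf1)"
        lines.append(f"set_psf({idval!r}, {psfname})")
    return lines
```

With a store like `{1: "psf1"}` this emits `["set_psf(1, psf1)"]`, which is the line that was previously missing from the output.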
The DataIMG case avoided setting a filter if it didn't filter any data, so let's apply that to the other data types. We use the same scheme for the source and background filters.
Add basic support for calling save_all for a session where one or more datasets were created manually - e.g. with load_arrays, dataspace1d, dataspace2d, or set_data. This only provides basic support, as things like header keywords are not copied over and there are likely PHA settings that are not handled. The aim is to provide basic support and hope that is enough to be usable, with additions added later. This really needs changes to the Data object class, to identify when a dataset has been changed from its original / read-from-file values, as a) we can simplify a lot of the pha code if we don't need to reset grouping/quality, and b) there are cases where a user can have tweaked a file and it would be nice to catch that - although I think there will always be possible problems (e.g. an on-disk change to the file), so this will never be perfect. The DataIMG restoration is more complicated since we have to worry about data where the coord setting has been changed from logical, as we need to send logical coordinates to the DataIMG constructor. Fortunately this is relatively easy (although testing shows some unrelated issues like sherpa#1880 and sherpa#1789).
We now make sure that all imports are done first. This is only relevant for cases which need to write out manually-created image files, and so need to also import the sherpa.astro.io.wcs.WCS symbol. It is not clear whether we require this symbol to be always present or it is optional (this is only relevant for the code generation, not the code created by save_all). This is actually a large code change, as we now create the output contents before writing it out to a file. There's more that could be done [*], but for now leave as just this change. [*] for instance, the "output state" could be built up by a class rather than just manually adding to a dictionary, but we don't yet need the class (e.g. to over-ride it)
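The "build the contents first, write once" structure described above can be sketched like this; the dictionary keys and helper names are assumptions, not the actual Sherpa internals. Using a set for the imports means registering the WCS import repeatedly still emits the line only once, and all imports land before the body.

```python
# Illustrative sketch of accumulating the output state in a dictionary
# before writing anything, with a set to deduplicate import lines.
def build_output():
    out = {"imports": set(), "main": []}
    out["imports"].add("from sherpa.astro.io.wcs import WCS")
    out["imports"].add("from sherpa.astro.io.wcs import WCS")  # deduplicated
    out["main"].append("load_data(1, 'img.fits')")
    return out

def render(out):
    # Imports first, then the restored-session commands.
    return "\n".join(sorted(out["imports"]) + out["main"])

script = render(build_output())
```

A class-based "output state" (mentioned above as a possible follow-up) would replace the dictionary, but is not needed until something wants to override the behaviour.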
There is no "functional" change to the output as it only removes header sections that are not needed (related to data input).
This avoids writing out banners for components and model expressions, and removes some blank lines that were being added. The approach taken to avoid printing out the banner is rather hacky, but it works. The code could be re-written to only add the banner if we know we need it, but that is a larger change I didn't want to make here.
There's no need to add an "if <check>" for grouping and background subtraction since we know the result of the call, so we just make the calls directly.
Separate out some of the code for setting up datasets, and take advantage of knowing we have a DataPHA object to avoid quite so many try/except blocks. We now prefer to send the DataPHA object around, and use that API rather than the UI API, to access the data we need. As part of this we have removed more "blank" sections for PHA files (no semantic changes).
Rebased to address conflicts in the tests (so no code changes needed).
@anetasie - can you try this PR out and see if it makes sense (assuming you use save_all)?
@DougBurke I don't use save_all, but I should! I just tried with 2D image models and 2D psf and the save_all file looks good.
For @anetasie - you can compare the documentation at
to suggest improvements (technically you should change to the "current" version of the documentation but I don't believe it's changed since 4.15.1 for these two functions).
@DougBurke my only addition to the docs would be an example of how to 'restore' the session using the output of save_all.
So, re-running the output of save_all only works for ipython. If you are running python it fails. Python 3 explicitly took out an easy way to do what we want here!
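The removed "easy way" is presumably Python 2's execfile(); in a plain python session the closest equivalent for running a saved script is exec(). A self-contained sketch (the demo filename and variable are made up for illustration):

```python
# Python 3 dropped execfile(), so restoring a saved session script from
# plain python needs exec(). First write a tiny demo script so this
# example is self-contained:
with open("demo_session.py", "w") as fh:
    fh.write("restored_value = 42\n")

# Equivalent of the Python 2 execfile("demo_session.py"); passing
# globals() makes the restored names visible in the current namespace.
with open("demo_session.py") as fh:
    exec(compile(fh.read(), "demo_session.py", "exec"), globals())

print(restored_value)  # → 42
```

In ipython the same effect comes for free via %run, which is why the saved output "just works" there.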
This looks pretty good. I left some suggestions for typos in comments, just to prove to you that I indeed read the code. However, with thousands of lines of string formatting, I could still have missed something, but there are ample tests, so I feel that's sufficient.
There are ~2 actual comments on the code, but both are judgement calls that could be done either this way or that way. So, I don't expect them to be accepted, I just wanted to bring it up while I was reading through the code.
# The par parameter is hard to type as it's Union[Parameter,
# XSBaseParameter] but the latter is only defined if XSPEC support is
# enabled. One way around this is to create a Protocol for defining
I just read up on Protocols, as I wanted to know if there was a way to make it more obscure and more complicated! Didn't use to be in the old days of Python, but now there is!
cmd = 'load_user_model({}, "{}")'.format(
    mod.calc.__name__, modelname)
_output(cmd, fh)
_output(out, _reindent(pycode))
If this suggestion is applied it also needs an import textwrap at the top.
_output(out, _reindent(pycode))
_output(out, textwrap.dedent(pycode))
The behaviour is not exactly identical to _reindent, e.g.
    def f(a):
# Not sure why I would write this
        return a
would be dedented in lines 1 and 3 by _reindent, but not by textwrap.dedent. On the other hand, _reindent above (I'm making the comment here because this is the only line that uses it, so if we switch to textwrap.dedent the entire definition of _reindent could be removed) also has a comment that it's fragile, just in other ways, for example if the first line does not start with "def":
@some_decorator
def func(a):
    pass
or
class A():
    def __call__(self, a):
        return a
would be treated correctly by textwrap.dedent, but not by _reindent. I tried one of the examples, and we could actually get such a case here:
In [3]: inspect.getsource(func)
Out[3]: '@functools.cache\ndef func(a):\n    return a\n'
On balance, I don't care. I'm always for removing code and using the standard library where possible, but I'm not going to argue that this problem is better than that problem.
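To make the comparison concrete, a short demonstration of textwrap.dedent's common-prefix behaviour (the source strings here are made-up examples matching the cases discussed above):

```python
import textwrap

# textwrap.dedent only removes whitespace common to *every* line, so a
# block whose first line is already at column 0 comes back unchanged:
src = "def f(a):\n    # a comment\n    return a\n"
assert textwrap.dedent(src) == src

# A method pulled out of a class body is uniformly indented, and
# dedent strips that indentation regardless of what the first line
# starts with (decorator, "def", anything):
method_src = "    @some_decorator\n    def func(a):\n        pass\n"
assert textwrap.dedent(method_src).startswith("@some_decorator")
```

So dedent handles the decorator and class-method cases, but does nothing when any one line (such as a flush-left comment) blocks the common prefix.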
The problem with this code is that I haven't been able to easily trigger it, so it would be good if we could replace it with standard-library code. I was looking when I wrote it originally but missed textwrap. As I always do.
I'd rather not change this at this stage (better to go with the devil you know) given that although this line was changed in this PR it isn't related to the suggested change here. We can do this post the 4.16 release.
sherpa/astro/ui/serialize.py
# This case should not happen, but just in case.
#
if xspec is None:
    return
or maybe?
# This case should not happen, but just in case.
assert xspec is not None
potentially more disruptive to users if this case happens, but for us, we would know we've run into a problem.
For this use case I don't want the user to be disrupted in case something strange has happened.
"""Import the WCS symbol if not done already.
"""Import the WCS symbol.
There is no test if it's done already. But there is also little harm in putting out that import statement more than once.
But there is a test (implicitly) because out["imports"] is a set, so it only gets added once. So I'm keeping the original text.
Co-authored-by: Hans Moritz Günther <moritz.guenther@gmx.de>
Argh - one of the suggested changes broke a number of tests. Will have to fix (probably by reverting the problem change).
Co-authored-by: wmclaugh <wmclaugh@cfa.harvard.edu>
Summary
Ensure that save_all includes set_psf calls, supports XSPEC table models, and improves support for data sets created with load_arrays or dataspace1d/dataspace2d. Fixes #1873.
Details
This was meant to be a simple fix for #1873... It does not solve all known issues with save_all (e.g. #320 #1882), but they would require more time than I have to spend now (and they are not new problems).
There are a number of code clean ups here, including removing code that made no sense (bkg_id is only a parameter for PHA data sets), and changes to improve coverage testing of the save_all code. This lack of coverage was the reason the bug was not fixed earlier: the code was intended to work, but it was in the wrong place and ended up doing nothing, and with no tests we couldn't see it.
There is a minor change to the serialization output: set_analysis calls now include the parameter names, so set_analysis(1, quantity="energy", type="rate", factor=0), which does not change the behaviour of the script. I also added bkg_id argument names to group calls for background datasets (again no semantic difference). The actual fix for #1873 was "simple", and just required re-thinking the code organization slightly.
I also decided to simplify notice_id(1, None, None) to notice_id(1), and to not include a notice range if all the data is included (for backgrounds it's a bit different). Again, these do not change the behaviour of the script.
Then there are changes to improve support for data that was created with load_arrays or dataspace1d/dataspace2d rather than read from a file. One of the larger changes here is the load_arrays handling, as it was previously untested and now has better coverage.
Then there are commits to remove "banners" which are not needed (these are comments, so there are no semantic changes to the output). One slightly larger change is that we only include the XSPEC section if we actually use any XSPEC models.
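The notice_id simplification mentioned above can be sketched as follows; the helper name and the filtered flag are illustrative, not the actual serialization code.

```python
# Hypothetical sketch of the notice_id simplification: drop redundant
# None arguments, and skip the call entirely when no filter is applied.
def serialize_notice(idval, lo=None, hi=None, filtered=True):
    if not filtered:
        return None  # all data included: no notice_id call needed
    if lo is None and hi is None:
        return f"notice_id({idval})"
    return f"notice_id({idval}, {lo}, {hi})"
```

So `serialize_notice(1)` gives `"notice_id(1)"` rather than the old `"notice_id(1, None, None)"`, and a dataset with no filter contributes nothing to the output.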
There is clean up of grouping code, since the old version said "if you can group this data then group it" whereas we actually know that the data can be grouped, so we can ignore the check.
The internals keep on getting changed - but that's okay because the only routine we guarantee is save_all and everything else is private. There are more changes that could be done, but I'd want to do that once we work out how to deal with the remaining issues. The last commit adds support for XSPEC table models (which I don't think we have an issue about).
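Serializing an XSPEC table model could look roughly like this; the helper name, the parameter representation, and the file name are all assumptions for illustration (load_xstable_model itself is the real Sherpa UI call being emitted).

```python
# Hypothetical sketch of serializing an XSPEC table model: emit the
# load_xstable_model call and then the parameter values. The helper
# and its inputs are illustrative, not the actual Sherpa code.
def serialize_xstable(name, filename, pars):
    lines = [f'load_xstable_model("{name}", "{filename}")']
    for pname, pval in pars:
        lines.append(f"{name}.{pname} = {pval}")
    return "\n".join(lines)

out = serialize_xstable("tmod", "xspec.mod", [("norm", 1.0)])
```

The key point is that the generated script re-reads the table file, so the restored session only works if the file is still present on disk.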
I am relying on the relatively good coverage of this module to check the code clean ups and re-orderings I did. The mypy work actually found one or two minor issues, and I used it as a check of things, but it is somewhat limited in scope without adding annotations to other parts of the system. It's also got some interesting issues regarding testing optional behaviour (in this case related to XSPEC), which limit the expressiveness of the mypy tests.
Examples
These are to show the change in the comments more than anything.
load_arrays 1D
CIAO 4.15
PR
Simple PHA
There are things I'd like to improve here (we don't need to load the responses or background) but that's significantly more work, and is partly related to #320 and #1882 but also #1885.
What you can see here is the group(1) line.
CIAO 4.15
this PR
dataspace1d
In this case the old code was unable to save the "state" correctly:
CIAO 4.15
This PR