Allow a file handle or Path object to be sent to the outfile parameter of fit #2054
@DougBurke What would the new behavior be? The current option is:

    sherpa In [5]: fit(outfile="../out.txt")
@anetasie: @DougBurke gave an example of the new behavior in #2054 (comment): instead of a string "path/filename", you can now give it some other object that supports the same interface; for example, you can capture the output into a variable using the …
So, setting outfile to a string has not changed (i.e. this does not change existing behaviour). The new behaviour is if you set outfile to a Path object or a file-handle. For example, with a Path object:

    from pathlib import Path
    out = Path("/foo/bar/out.dat")
    fit(outfile=out)

You can instead pass it a file-handle, which is again a specialized feature. The reason I want it is that it makes testing easier, since you can say

    from io import StringIO
    buffer = StringIO()
    fit(outfile=buffer)
    out = buffer.getvalue()
    print(out)

[i.e. you don't need to write to a file just to get the data]. You can also send in an actual file-handle, e.g.

    with open("/foo/bar/baz.dat", "wt") as fh:
        fit(outfile=fh)
        fh.write("YOU CAN ADD EXTRA INFO TO THE FILE IF YOU REALLY WANT\n")

but again this is likely only useful for specialized cases.
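To make the three `fit(outfile=...)` cases above concrete, here is a minimal sketch of how a routine could normalise its `outfile` argument. The helper name `open_outfile` and the "row 1" output are hypothetical, not Sherpa's API: strings and path-like values are opened (and so must be closed later), while anything else is treated as a handle owned by the user and left open.

```python
import os
import tempfile
from io import StringIO
from pathlib import Path


def open_outfile(outfile):
    """Return (handle, close_on_exit) for an outfile argument.

    Hypothetical helper, not Sherpa's API: a str or path-like value is
    opened for writing (so the caller of this helper must close it), while
    any other object is assumed to be a file-like object owned by the user.
    """
    if outfile is None:
        return None, False
    if isinstance(outfile, (str, os.PathLike)):
        return open(outfile, "w"), True
    return outfile, False


# A StringIO buffer is passed straight through and left open.
buffer = StringIO()
fh, close_on_exit = open_outfile(buffer)
fh.write("row 1\n")
captured = buffer.getvalue()

# A Path is opened here, so it has to be closed once writing is done.
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "out.txt"
    fh, close_on_exit = open_outfile(path)
    fh.write("row 1\n")
    if close_on_exit:
        fh.close()
    from_file = path.read_text()
```

Returning the `close_on_exit` flag alongside the handle keeps the "who owns this file" decision in one place.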
I'm not a big fan of this chain from Fit to IterFit to IterFitCallback. It seems convoluted and I'm sure there must be a simpler way of organizing it, but that's not for this PR to litigate.
@DougBurke thanks for the description.
Rather than 'from numpy import x, y, z', use np.x, np.y, np.z directly. This is a style choice, but I think it makes the code clearer.
- use super().__init__() rather than hard-coding the super-class name
- add or remove the NoNewAttributesAfterInit class (for example, it is removed from IterFit because:
  - we never called the super-class, so never actually made use of it
  - it's an internal class we don't expect the user to come across
  - a subsequent change is awkward if we actually initialize the NoNewAttributesAfterInit class)
- avoid re-creating the list that get_thawed_pars() creates for us (this is due to a clean-up in sherpa#1974 that created this routine)
- create an n-element tuple by appending to a list and then converting that to a tuple, rather than repeatedly adding a single-element tuple (since mypy doesn't like this)
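The first and last bullet points can be illustrated with a small, hypothetical pair of classes (not the Sherpa code). `super().__init__()` avoids naming the parent class, and building the result in a list before a single `tuple()` call gives one `tuple[float, ...]` value; by contrast, something like `out = (); out += (v,)` can trip up mypy, which infers a fixed-length tuple type for the initial value.

```python
class Base:
    def __init__(self, name: str) -> None:
        self.name = name


class Child(Base):
    def __init__(self, name: str, values: list[float]) -> None:
        # super() avoids hard-coding "Base" in the call.
        super().__init__(name)
        self.values = values

    def as_tuple(self) -> tuple[float, ...]:
        # Build the n-element result in a list, then convert it once,
        # rather than growing a tuple with repeated "+= (v,)" statements.
        # (Doubling each value is just a placeholder transformation.)
        parts: list[float] = []
        for v in self.values:
            parts.append(2 * v)
        return tuple(parts)


child = Child("example", [1.0, 2.5])
doubled = child.as_tuple()
```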
Use isinstance rather than type. There are other uses of type in this module, but they are trickier since they need checking to make sure we don't accidentally change behaviour (as some classes may be sub-classed).
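The reason the switch needs care is that the two checks disagree for sub-classes. A minimal sketch, using hypothetical stand-in classes rather than the real Sherpa statistic classes:

```python
class Stat:
    pass


class Chi2(Stat):
    pass


class MyChi2(Chi2):
    """A user sub-class, which an exact type check would reject."""


stat = MyChi2()

# type() compares the exact class, so a sub-class does not match ...
exact_match = type(stat) is Chi2
# ... whereas isinstance() accepts the class or any sub-class of it.
instance_match = isinstance(stat, Chi2)
```

Code that relied on `type(...)` rejecting sub-classes would change behaviour after the switch, which is why each remaining use needs checking individually.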
We now explicitly allow the output file for fit calls to be a Path object (previously we allowed it, but not intentionally) and also allow a file-handle-like object to be used. This then lets you say

    out = StringIO()
    fit(outfile=out)
    txt = out.getvalue()

without having to create an on-disk file.

In doing this the internals of the IterFit class were changed, and the callback routine it uses was converted from a local definition to a top-level class. I think this change better separates the code logic, but it is also meant to help address issue sherpa#2007 (where the multiprocessing code doesn't seem to like our use of local function definitions when using anything but the "fork" method).

As part of this work some basic typing statements have been added. There is some fun in how best to indicate the type of a "file handle": the easy solution is typing.TextIO, but this is "too large", as all we need are the write() and close() methods, so we use a Protocol. See the discussion at python/typing#829. Now, this may not be ideal, as we don't currently export the protocol, so it is unclear how it will work with the documentation, but it would be easy to change to use TextIO.
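A minimal sketch of the Protocol approach, assuming only `write` and `close` are needed. The names `Writable`, `emit`, and `ListWriter` are hypothetical, not the names used in Sherpa. The `/` marks the argument as positional-only, which matches how typeshed declares `write` for the standard text streams, so a type checker should accept `StringIO` as well as any other object with compatible methods.

```python
from io import StringIO
from typing import Protocol, runtime_checkable


@runtime_checkable
class Writable(Protocol):
    """Only the methods the fit-output code needs (hypothetical name)."""

    def write(self, s: str, /) -> int: ...

    def close(self) -> None: ...


def emit(fh: Writable, text: str) -> None:
    # Any object with matching write/close methods is accepted; it does
    # not have to derive from typing.TextIO.
    fh.write(text)


class ListWriter:
    """A minimal non-file object that still satisfies the protocol."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    def write(self, s: str, /) -> int:
        self.chunks.append(s)
        return len(s)

    def close(self) -> None:
        pass


buf = StringIO()
emit(buf, "a")
writer = ListWriter()
emit(writer, "b")
```

`@runtime_checkable` only lets `isinstance` check that the methods exist; the signature checking happens in the static type checker.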
Clarify the internal fitting code, where a tuple was being converted to a list to be able to edit an element. We now explicitly decompose the tuple, so it is more obvious what is going on. I had hoped to be able to simplify this routine even more, but the difference between the statistic object stored locally and the one stored in the callback means it is not as simple as I had hoped.
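A hypothetical before/after sketch of this clean-up. The result layout (parameter values, statistic, then any other entries) and the function names are assumptions for illustration and are not taken from Sherpa's code.

```python
def replace_stat_via_list(result, new_stat):
    # Old style: convert to a list just so one element can be overwritten.
    tmp = list(result)
    tmp[1] = new_stat
    return tuple(tmp)


def replace_stat_explicit(result, new_stat):
    # New style: name each part, so it is obvious which element changes.
    pars, _old_stat, *rest = result
    return (pars, new_stat, *rest)


result = ([1.0, 2.0], 12.5, "converged", 42)
old_way = replace_stat_via_list(result, 3.0)
new_way = replace_stat_explicit(result, 3.0)
```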
The class used a boolean to determine whether to call a function when we can just check if the function is defined.
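A small sketch of the idea, using a hypothetical `Runner` class rather than the Sherpa one: storing `None` when there is no callback removes the need for a separate flag that could get out of step with the callback itself.

```python
from typing import Callable, Optional


class Runner:
    """Hypothetical stand-in for a class that optionally calls a callback."""

    def __init__(self, callback: Optional[Callable[[float], None]] = None) -> None:
        # No separate "use_callback" boolean: the callback itself says
        # whether there is anything to call.
        self.callback = callback

    def step(self, value: float) -> float:
        result = value * 2
        if self.callback is not None:
            self.callback(result)
        return result


seen: list[float] = []
Runner(seen.append).step(1.5)
no_callback_result = Runner().step(4.0)
```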
The issue at hand is that if you *change* the stat method of the Fit structure then you will get a mismatch with the actual fitted statistic, which still uses the _iterfit.stat method (i.e. the original statistic). We only have two test cases that expose this difference, so take advantage of the new "write the fit results to StringIO" capability and show that the stat value has changed (i.e. all but the last row use the original statistic, in this case Chi2DataVar, so the stat value is ~8500, and only the last row uses the Cash or CStat value). This is intended as a regression test.
We can't make use of a simple context manager for handling the fit output since

a) we may not have a handle/filename;
b) even if we do, we may not want to close it (i.e. if it was sent in by the user).

We can at least try to emulate the "close even on an error" handling of a context manager with try/finally, as suggested by @hamogu. This is a bit visually invasive, but it's not clear to me that breaking the code up into smaller chunks is worth it here.
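A minimal sketch of the try/finally pattern, assuming the handle and a `close_on_exit` flag have already been worked out (True when the routine opened the file itself, False for a user-supplied handle). The function `write_results` and the simulated failure are hypothetical.

```python
from io import StringIO


def write_results(fh, rows, close_on_exit):
    """Write rows to fh, closing it afterwards only if close_on_exit is set.

    Hypothetical helper: close_on_exit would be True when the routine opened
    the file itself (outfile was a str or Path) and False when the caller
    passed in a handle that they still own.
    """
    try:
        for row in rows:
            if row is None:
                raise ValueError("simulated failure part-way through")
            fh.write(f"{row}\n")
    finally:
        # Runs whether or not an error occurred, mimicking a context manager.
        if close_on_exit:
            fh.close()


# Caller-owned buffer: still open afterwards, so it can be read.
user_buf = StringIO()
write_results(user_buf, [1, 2], close_on_exit=False)

# Buffer we "own": closed even though the loop raised part-way through.
owned_buf = StringIO()
try:
    write_results(owned_buf, [1, None, 3], close_on_exit=True)
except ValueError:
    pass
```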
We can make sure that changing the Fit stat also changes the associated _iterfit.stat field, which seems like it should be the correct thing to do. There are only two tests where this is an issue, and it's unclear whether the changes are good, bad, or not significant, since

a) the tests appear to be regression tests;
b) we don't know how the test data was created, so we don't know what the true values are.

Note that the parameter values do not appear to have changed "hugely", but I haven't looked into the results too deeply. The tests do show the change, in that the statistic value reported in the "fit(outfile=...)" output now remains consistent (i.e. matches the expected value, assuming our new interpretation of the stat field of the Fit object is correct [which it should be, I claim]).
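One possible way to keep the two copies in step is a property setter. This is a sketch with hypothetical stand-in classes, and strings in place of statistic objects; the PR text does not say whether Sherpa uses a property for this.

```python
class IterFitSketch:
    """Stand-in for the internal object that also stores a statistic."""

    def __init__(self, stat: str) -> None:
        self.stat = stat


class FitSketch:
    """Hypothetical Fit-like class whose stat setter keeps the copy in sync."""

    def __init__(self, stat: str) -> None:
        self._iterfit = IterFitSketch(stat)
        self.stat = stat  # goes through the property setter below

    @property
    def stat(self) -> str:
        return self._stat

    @stat.setter
    def stat(self, value: str) -> None:
        self._stat = value
        # Without this line, _iterfit.stat would keep the original statistic.
        self._iterfit.stat = value


fit = FitSketch("chi2datavar")
fit.stat = "cstat"
```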
We can just call the callback routine after updating the model values, rather than manually setting the thawed pars, calculating the stat, and writing out the values to a file. This is only possible now that sherpa#2063 has been addressed.
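The callback mentioned here was converted from a local function to a top-level class, partly to help with sherpa#2007. A minimal sketch of why that matters, assuming the code is run as a script: the "spawn" and "forkserver" start methods of multiprocessing pickle what they send to worker processes, and pickle can refer to a top-level class by name but cannot do so for a function defined inside another function. `StatCallback` and `make_local_callback` are hypothetical names.

```python
import pickle


def make_local_callback(scale):
    # A closure: pickle cannot look this function up by name later.
    def callback(x):
        return scale * x
    return callback


class StatCallback:
    """Top-level callable class: pickled as a class reference plus its state."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x):
        return self.scale * x


def can_pickle(obj):
    try:
        pickle.dumps(obj)
    except (AttributeError, TypeError, pickle.PicklingError):
        return False
    return True


local_ok = can_pickle(make_local_callback(2.0))
class_ok = can_pickle(StatCallback(2.0))
restored = pickle.loads(pickle.dumps(StatCallback(2.0)))
```

With the "fork" start method nothing needs to be pickled, because the child process inherits the parent's memory, which fits the observation that only the other start methods had problems.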
Although this is primarily used in the UI code it is used often enough that it's useful to have in sherpa.utils.types.
Rebased
There have been no other changes to the code in the rebase.
Codecov Report
Attention: Patch coverage is …

    @@            Coverage Diff             @@
    ##              main    #2054      +/-   ##
    ==========================================
    + Coverage   86.64%   86.68%   +0.03%
    ==========================================
      Files          82       82
      Lines       29766    29786      +20
      Branches     4487     4492       +5
    ==========================================
    + Hits        25792    25821      +29
    + Misses       3863     3854       -9
      Partials      111      111

    ... and 1 file with indirect coverage changes
Summary
The fit call can now be sent a Path object or a file handle, as well as a string. Fix issue #2063 (ensure consistency when changing the statistic object used in a fit; this is not an issue for the UI code).
Details
Partly taken from #2015, #2022.
The idea is for a user at the UI layer to be able to say things like

To do this we need to be able to send in a Path or file-handle-like object to the Fit.fit method. This does change the IterFit interface, but this is an internal class which users do not interact with directly.

As part of this I've made some minor clean-up changes. This is part of trying to address #2007 and came out of #2022; I am trying to separate things to make the PRs easier to understand and review.
One of the fun things is in trying to add typing statements for this. We could use typing.TextIO for this, but there's actually some "interesting" discussion about this on the internet, since it's often too "large" a requirement. So I follow one piece of advice from the typing world and use a Protocol instead to indicate the functionality we need (close and write).

There is a similar issue here to #1929 (comment) in that I have had to add a close_on_exit flag to decide whether to call fh.close. Normally you would say "just use a context manager" but, as with the IO case, it's not that simple. There may be a better way to do this, and if anyone has an idea please holler, but the current approach, whilst ugly, seems to work.

I have then made some changes to address review comments, including discovering and "fixing" #2063.