Allow a file handle or Path object to be sent to the outfile parameter of fit #2054

DougBurke · 2024-05-29T14:20:41Z

Summary

The fit call can now be sent a Path object or a file handle, as well as a string. Fix issue #2063 (ensure consistency when changing the statistic object used in a fit; this is not an issue for the UI code).

Details

Partly taken from #2015, #2022.

The idea is for a user at the ui layer to be able to say things like

fit(outfile=Path(...))

from io import StringIO
store = StringIO()
fit(outfile=store)
out = store,getvalue()

with open("txt.txt") as fh:
    fit(outfile=fh)
    # this is not really sensible, but you can do it
    fit(outfile=fh)

To do this we need to be able to send in a Path or file-handle-like object to the Fit.fit method. This does change the IterFit interface but this is an internal class which users do not interact with directly.

To do this I've made some minor clean up changes. This is part of trying to address #2007 and came out of #2022 - I am trying to separate things to make the PRs easier to understand and review.

One of the fun things is in trying to add typing statements for this. We could use typing.TextIO for this but there's actually some "interesting" discussion about this on the internet, since it's often too "large" a requirement. So I follow one piece of advice from the typing world and use a Protocol instead to indicate the functionality we need (close and write).

There is a similar issue here to #1929 (comment) in that I have had to add a close_on_exit flag to decide whether to call fh.close. Normally you would say "just use a context manager" but, as with the IO case, it's not that simple. There may be a better way to do this, and if anyone has an idea please holler, but the current approach - whilst ugly - seems to work.

I have then made some changes to address review comments, including discovering and "fixing" #2063.

sherpa/fit.py

anetasie · 2024-06-17T15:39:24Z

@DougBurke What would be a new behavior?

Current option fit(outfile="path/filename") generates an output file with the record of the search for all the parameters.

sherpa In [2]: fit("out.txt")
IdentifierErr: data set out.txt has not been set

sherpa In [3]: fit(outfile="out.txt")
Dataset               = 1
Method                = neldermead
Statistic             = wstat
Initial fit statistic = 1.88051e+08
Final fit statistic   = 475.847 at function evaluation 339
Data points           = 446
Degrees of freedom    = 444
Probability [Q-value] = 0.143116
Reduced statistic     = 1.07173
Change in statistic   = 1.88051e+08
   p1.gamma       1.23288     
   p1.ampl        9.07174e-06 

sherpa In [4]: !more out.txt
# nfev statistic p1.gamma p1.ampl
0.000000e+00 1.880512e+08 1.000000e+00 1.000000e+00
1.000000e+00 7.535066e+07 2.200000e+00 1.000000e+00
2.000000e+00 4.137324e+08 1.000000e+00 2.200000e+00
3.000000e+00 2.309309e+08 1.300000e+00 1.600000e+00
4.000000e+00 3.656551e+07 1.900000e+00 4.000000e-01
5.000000e+00 1.906931e+07 3.100000e+00 4.000000e-01
6.000000e+00 3.386327e+06 4.150000e+00 1.000000e-01
7.000000e+00 3.736637e+07 2.612500e+00 6.250000e-01
8.000000e+00 2.362581e+07 2.818750e+00 4.375000e-01
9.000000e+00 3.975639e+06 5.068750e+00 1.375000e-01
'''

sherpa In [5]: fit(outfile="../out.txt")
Dataset = 1
Method = neldermead
Statistic = wstat
Initial fit statistic = 475.847
Final fit statistic = 475.847 at function evaluation 302
Data points = 446
Degrees of freedom = 444
Probability [Q-value] = 0.143116
Reduced statistic = 1.07173
Change in statistic = 0
p1.gamma 1.23288
p1.ampl 9.07174e-06
'''

hamogu · 2024-06-17T16:56:15Z

@anetasie : @DougBurke gave an example of the new behavior in #2054 (comment): Instead of a string "path/filename", you can now give it some other object that supports the same interface, for example you can capture the output into a variable using the StringIO module so you don't have to write it to a file first and then read it back in. (In the UI, that might not be what most users do, but it's useful when e/g/ you use Sherpa to build some pipeline and don't know what system it runs on or what directory you can use to write temporary files or you are multi-threading and don't want to bother to make code such that each output filename in unique etc. or other cases like that).

DougBurke · 2024-06-17T20:09:44Z

@anetasie

So, setting outfile to a string has not changed (i.e. this does not change existing behaviour).

The new behaviour is if you set outfile to a Path object or a file-handle. The Path version seems like it's just more to type than a string, but Path is meant to be the "better approach" to specifying file locations, so we may as well support it - e.g.

from pathlib import Path

out = Path("/foo/bar/out.dat")
fit(outfile=out)

You can instead pass it a file-handle, which is again a specialized feature. The one I want it for is because it makes testing things easier since you can say

from io import StringIO

buffer = StringO()
fit(outfile=buffer)
out = buffer.getvalue()
print(out)

[ie you don't need to write to a file just to get the data].

You can also send in an actual filehandle - e.g.

with open("/foo/bar/baz.dat", "wt") as fh:
    fit(outfile=fh)
    fh.write("YOU CAN ADD EXTRA INFO TO THE FILE IF YOU REALLY WANT\n")

but again this is likely only useful for specialized cases.

you can set outfile to something like

hamogu

I'm not a big fan of this chain from Fit to IterFit to IterFitCallback. It seems convoluted and I'm sure there must be a simpler way of organizing it, but that's not for this PR to litigate.

anetasie · 2024-06-18T02:03:03Z

@DougBurke thanks for the description.

Rather than 'from numpy import x, y, z' use np.x, np.y, np.z directly. This is a style choice, but I think makes the code clearer.

- use super().__init__() rather than hard-code the super-class name - add or remove the NoNewAttributesAfterInit class (for example, IterFit is removed because - we never called the super-class so never actually made use of it - it's an internal class we don't expect the user to come across - a subsequent change is awkward if we actually initialize the NewAttributesAfterInit class - avoid re-creating the list that get_thawed_pars() creates for us (this is due to a clean up in sherpa#1974 that created this routine) - create a n-element tuple by appending to a list and then converting that to a tuple, rather than just adding a single tuple (since mypy doesn't like this)

Use isinstance rather than type. There are other uses of type in this module but it't tricky because they need checking to make sure we don't accidentally change behaviour (as some classes may be sub-classed).

We know explicitly allow the output file for fit calls to be a Path object (previously we allowed it but not intentionally) and also allow a file-handle like object to be used. This then lets you say out = String() fit(outfile=out) txt = out.getvalue() without having to create an on-disk file. In doing this the internals of the IterFit class were changed, and the callback routine it uses was converted from a local definition to a top-level class. This change I think better separates the code logic, but is also to help address issue sherpa#2007 (where the multiprocessing code doesn't seem to like our use of local function definitions when using anything but the "fork" method). As part of this work some basic typing statements have been added. There is some fun in how best to indicate the type of a "file handle": the easy solution is typing.TextIO but this is "too large" as all we need are write() and close() methods, so we use a Protocol. See the discussion at python/typing#829 Now, this may not be ideal - as we don't currently export the protocol so how will it work with the documentation - but it would be easy to change to use TextIO.

Clarify the internal fitting code, where a tuple was being converted to a list to be able to edit an element. We now explicitly decompose the tuple, so it is more obvious what is going on. I had hoped to be able to simplify this routine even more, but the difference between the statistic object stored locally and the one stored in the callback means it is not as simple as I had hoped.

The class used a boolean to determine whether to call a function when we can just check if the function is defined.

The issue at hand is that if you *change* the stat method of the Fit structure then you will get a mis-match with the actual fitted statistic, which remains using the _iterfit.stat method (i.e. the original statistic). We only have two test cases that expose this difference, so take advantage of the new "write the fit results to StingIO" capability and show that the stat value has changed (i.e. all but the last row use the original statistic, in this case Chi2DataVar, so the stat value is ~ 8500, and only the last row uses the Cash or CStat value). This is intended as a regression test.

@hamogu

We can't make use of a simple context manager for handling the fit output since a) we may not have a handle/filename b) even if we do, we may not want to close it (i.e. if sent in by the user) We can at least try to emulate the "close even on an error" handling of a context manager with try/finally, as suggested by @hamogu This is a bit invasive visually, but it's not clear to me that breaking the code up into smaller chunks is worth it here.

We can make sure that changing the Fit stat also changes the associated _iterfit.stat field, which seems like it should be the correct thing to do. There are only two tests where this is an issue, and it's unclear whether the changes are good, bad, or not significant, since a) the tests appear to be regression tests b) we don't know how the test data was created, so we don't know what the true values are Note that the parameter values do not appear to have changed "hugely", but I haven't looked into the results too deeply. The tests do show the change - in that the statistic value reported in the "fit(outfile=...)" option now remains consistent (i.e. matches the expected value, assuming our new interpretation of the stat field of the Fit object is correct [which it should be, I claim]).

We can just call the callback routne after updating the model values, rather than manually setting the thawed pars, calculating the stat, and writing out the values to a file. This is ony possible now that sherpa#2063 has been addressed.

Although this is primarily used in the UI code it is used often enough that it's useful to have in sherpa.utils.types.

DougBurke · 2024-06-27T02:27:53Z

Rebased

as CI: enforce NumPy does not use version 2.0 for the builds #2065 has been merged we don't need to include it (so just removes one commit)
as Simple typing support for sherpa.data #2060 has been merged we can move IdType to sherpa.utils.types (so adds the final commit, "types: move IdType to sherpa.utils.types")

There's been no other changes to the code in the rebase.

codecov · 2024-06-27T02:57:57Z

Codecov Report

Attention: Patch coverage is 97.32143% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.68%. Comparing base (44442ee) to head (c32c81c).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2054      +/-   ##
==========================================
+ Coverage   86.64%   86.68%   +0.03%     
==========================================
  Files          82       82              
  Lines       29766    29786      +20     
  Branches     4487     4492       +5     
==========================================
+ Hits        25792    25821      +29     
+ Misses       3863     3854       -9     
  Partials      111      111

Files	Coverage Δ
sherpa/astro/ui/serialize.py	`89.67% <100.00%> (ø)`
sherpa/astro/ui/utils.py	`98.15% <100.00%> (+<0.01%)`	⬆️
sherpa/ui/utils.py	`97.25% <100.00%> (ø)`
sherpa/utils/types.py	`100.00% <100.00%> (ø)`
sherpa/fit.py	`83.12% <97.08%> (+0.42%)`	⬆️

... and 1 file with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44442ee...c32c81c. Read the comment docs.

DougBurke marked this pull request as draft May 29, 2024 14:20

DougBurke force-pushed the fit-cleanup branch 3 times, most recently from 72a4c67 to 0a206c4 Compare June 4, 2024 13:22

DougBurke force-pushed the fit-cleanup branch 2 times, most recently from 81ef507 to e0234a6 Compare June 8, 2024 18:20

DougBurke marked this pull request as ready for review June 8, 2024 18:29

DougBurke added type:enhancement area:code general:optimisation labels Jun 8, 2024

DougBurke changed the title ~~Fit cleanup~~ Allow a file handle or Path object to be sent to the outfile parameter of fit Jun 12, 2024

DougBurke requested a review from hamogu June 12, 2024 18:44

hamogu reviewed Jun 17, 2024

View reviewed changes

sherpa/fit.py Outdated Show resolved Hide resolved

sherpa/fit.py Outdated Show resolved Hide resolved

DougBurke mentioned this pull request Jun 17, 2024

Fit class: what to do about changing the stats class? #2063

Closed

DougBurke linked an issue Jun 17, 2024 that may be closed by this pull request

Fit class: what to do about changing the stats class? #2063

Closed

hamogu approved these changes Jun 18, 2024

View reviewed changes

DougBurke added 10 commits June 26, 2024 18:45

fit: use explicit symbol import from numpy

127b577

Rather than 'from numpy import x, y, z' use np.x, np.y, np.z directly. This is a style choice, but I think makes the code clearer.

fit: internal code change

b70a47e

Use isinstance rather than type. There are other uses of type in this module but it't tricky because they need checking to make sure we don't accidentally change behaviour (as some classes may be sub-classed).

fit: internal clean up to IterFit

6712a74

The class used a boolean to determine whether to call a function when we can just check if the function is defined.

fit: internal code cleanup

7f878e3

We can just call the callback routne after updating the model values, rather than manually setting the thawed pars, calculating the stat, and writing out the values to a file. This is ony possible now that sherpa#2063 has been addressed.

types: move IdType to sherpa.utils.types

c32c81c

Although this is primarily used in the UI code it is used often enough that it's useful to have in sherpa.utils.types.

DougBurke force-pushed the fit-cleanup branch from de3259d to c32c81c Compare June 27, 2024 02:25

wmclaugh merged commit 4e85332 into sherpa:main Jun 27, 2024
17 checks passed

DougBurke deleted the fit-cleanup branch June 27, 2024 20:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow a file handle or Path object to be sent to the outfile parameter of fit #2054

Allow a file handle or Path object to be sent to the outfile parameter of fit #2054

DougBurke commented May 29, 2024 •

edited

Loading

anetasie commented Jun 17, 2024

hamogu commented Jun 17, 2024

DougBurke commented Jun 17, 2024

hamogu left a comment

anetasie commented Jun 18, 2024

DougBurke commented Jun 27, 2024

codecov bot commented Jun 27, 2024

Allow a file handle or Path object to be sent to the outfile parameter of fit #2054

Allow a file handle or Path object to be sent to the outfile parameter of fit #2054

Conversation

DougBurke commented May 29, 2024 • edited Loading

Summary

Details

anetasie commented Jun 17, 2024

hamogu commented Jun 17, 2024

DougBurke commented Jun 17, 2024

hamogu left a comment

Choose a reason for hiding this comment

anetasie commented Jun 18, 2024

DougBurke commented Jun 27, 2024

codecov bot commented Jun 27, 2024

Codecov Report

DougBurke commented May 29, 2024 •

edited

Loading