More flexible fitting function, allow likelihood, remove uncertainties dependency #149

aminnj · 2021-03-15T23:16:46Z

Based on the discussion in #146, I added some features to plot_pull. The theme is making fitting more streamlined for exploration.

- Pull the curve_fit initial guess (p0) from default arguments, if they exist
- Allow a string as an alternative to a lambda function (plot_pull("a+b*x"))
- Cosmetic change to the band, and embed the fit result in the legend
- Likelihood fit (plot_pull(..., likelihood=True)); chi2 remains the default, as before
- Remove the uncertainties.unumpy dependency and construct the band by resampling the covariance matrix
- Introduce iminuit, which gets the covariance matrix right more reliably than scipy.optimize; the initial guesses for iminuit are still seeded from scipy
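
The band construction by resampling the covariance matrix can be sketched roughly as follows. This is only an illustration under assumptions: the helper name resampled_band, the sample count, and the use of a one-standard-deviation spread are not taken from the PR, and the popt/pcov values are made up.

```python
import numpy as np


def resampled_band(func, x, popt, pcov, n_samples=1000, seed=0):
    """Return (lower, upper) one-sigma band of func(x, *params).

    Hypothetical helper: parameters are drawn from a multivariate normal
    centred on the best fit, with the fitted covariance matrix.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(popt, pcov, size=n_samples)
    # One model curve per sampled parameter vector.
    curves = np.array([func(x, *params) for params in samples])
    center = func(x, *popt)
    spread = curves.std(axis=0)
    return center - spread, center + spread


def line(x, a, b):
    return a + b * x


x = np.linspace(-5, 5, 11)
popt = np.array([20.0, 0.5])                  # made-up best-fit values
pcov = np.array([[0.04, 0.0], [0.0, 0.01]])   # made-up covariance
lower, upper = resampled_band(line, x, popt, pcov)
```

Because the curves are evaluated numerically, the model function only needs plain NumPy operations, which is why the uncertainties-aware exp is no longer required.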

Setup

import numpy as np
from hist import Hist

np.random.seed(42)
hh = Hist.new.Reg(50, -5, 5).Double().fill(np.random.normal(0,1,int(1e5)))

Before (including a bug-fix for the variances from the above issue):

from uncertainties import unumpy as unp
def func(x, constant, mean, sigma):
    exp = unp.exp if constant.dtype == np.dtype("O") else np.exp
    return constant * exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))

hh.plot_pull(func)

After:

# as before, but no need for `uncertainties.unumpy` as the error band comes
# from resampling the covariance matrix
def func(x, constant, mean, sigma):
    return constant * np.exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))
hh.plot_pull(func)

# `curve_fit` `p0` extracted from defaults, if any
def func(x, constant=80, mean=0., sigma=1.):
    return constant * np.exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))
hh.plot_pull(func)

# strings are accepted, for more compactness than a lambda
# x is assumed to be the main variable
hh.plot_pull("constant*np.exp(-(x-mean)**2. / (2*sigma**2))")

# gaussian is a common/special function, so this also works
# reasonable guesses are made for constant/mean/sigma
hh.plot_pull("gaus")

# chi2 puts `a` around 5, but likelihood puts `a` around 1e3/50 = 20
hh.plot_pull("a+b*x", likelihood=True)
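
Why chi2 and likelihood disagree on `a` can be seen with a simpler, flat model f(x) = a, where both minima have closed forms. The sketch below is an illustration only: it uses 1000 entries so that the mean bin content is 1e3/50 = 20 (the figure quoted in the comment above), assumes chi2 variances equal to the bin counts, and drops empty bins from the chi2. How plot_pull itself treats empty bins is not shown in this thread.

```python
import numpy as np

rng = np.random.default_rng(42)
counts, edges = np.histogram(rng.normal(0, 1, 1000), bins=50, range=(-5, 5))

# Poisson likelihood for f(x) = a: minimising sum(a - n * log(a)) over all
# bins gives the arithmetic mean of the counts.
a_likelihood = counts.mean()

# Chi2 with variances equal to the counts: minimising sum((n - a)**2 / n)
# gives the harmonic mean of the counts. Empty bins have zero variance and
# are dropped here (an assumption of this sketch).
filled = counts[counts > 0]
a_chi2 = len(filled) / np.sum(1.0 / filled)

print(f"likelihood: a = {a_likelihood:.1f}, chi2: a = {a_chi2:.1f}")
```

The harmonic mean is dominated by the sparsely populated tail bins, so the chi2 estimate is pulled well below the average bin content.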


henryiii · 2021-03-17T21:30:40Z

Can you verify my changes are valid? Could you add a test with pytest-mpl, perhaps?

henryiii

Notes on my changes.

henryiii · 2021-03-17T21:31:52Z

src/hist/plot.py

@@ -45,7 +46,7 @@ def _filter_dict(
    }


-def _expr_to_lambda(expr: str) -> Callable:
+def _expr_to_lambda(expr: str) -> Callable[..., Any]:


This is the default for Callable - best to not leave empty Generics.
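
For context, a string-to-callable helper of this kind could look roughly like the sketch below. The PR's actual _expr_to_lambda body is not shown in this thread, so the name expr_to_callable, the parameter ordering (order of first appearance), and the use of eval are all assumptions.

```python
import ast

import numpy as np


def expr_to_callable(expr):
    """Build f(x, *params) from a string such as "a+b*x" (hypothetical helper)."""
    tree = ast.parse(expr, mode="eval")
    # Visit names left to right; a single-line expression can be ordered
    # by column offset.
    name_nodes = sorted(
        (node for node in ast.walk(tree) if isinstance(node, ast.Name)),
        key=lambda node: node.col_offset,
    )
    params = []
    for node in name_nodes:
        # `x` is the main variable and `np` is the NumPy module, not parameters.
        if node.id not in {"x", "np"} and node.id not in params:
            params.append(node.id)
    signature = ", ".join(["x", *params])
    # eval runs arbitrary code, so this is only suitable for trusted input.
    return eval(f"lambda {signature}: {expr}", {"np": np})


line = expr_to_callable("a+b*x")
print(line(2.0, 1.0, 3.0))  # 7.0
```

The returned lambda has a real signature, so inspect.signature can recover the parameter names from it.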

henryiii · 2021-03-17T21:32:50Z

src/hist/plot.py

-    ydata: ArrayLike,
-    yerr: ArrayLike,
+    func: Callable[..., Any],
+    xdata: np.ndarray,


ArrayLike can be a simple number, so xdata[stuff] was not valid. Either use np.asarray, or just require arrays.
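
A small illustration of the np.asarray option: inputs such as lists do not support mask indexing until they are converted. The function name select_positive is made up for this sketch.

```python
import numpy as np


def select_positive(xdata):
    """Keep only positive entries; accepts any array-like input."""
    xdata = np.asarray(xdata)  # lists and tuples become real arrays
    return xdata[xdata > 0]    # boolean-mask indexing needs an ndarray


print(select_positive([1, -2, 3]))  # [1 3]
```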

henryiii · 2021-03-17T21:33:27Z

src/hist/plot.py

    likelihood: bool = False,
-) -> Tuple[ArrayLike, ArrayLike]:
+) -> Tuple[Tuple[float, ...], ArrayLike]:


Makes the typing easier, because NumPy's typing is a bit spotty. It doesn't know that *array is valid, etc. This is small so was a simple fix.

henryiii · 2021-03-17T21:34:21Z

src/hist/plot.py

-    p0 = None
-    if func.__defaults__ and len(func.__defaults__) + 1 == func.__code__.co_argcount:
-        p0 = func.__defaults__
+    params = list(inspect.signature(func).parameters.values())
+    p0 = [
+        1 if arg.default is inspect.Parameter.empty else arg.default
+        for arg in params[1:]
+    ]
+


I'm using inspect.signature, as it's a more public interface; __defaults__ (and maybe __code__) are untyped, AFAICT (or maybe just not always present on Callables). This also supports partial defaults, while the other was all or nothing.
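
To illustrate the partial-defaults point, here is the same logic wrapped in a small function (the wrapper name initial_guess is made up), compared with the old all-or-nothing check:

```python
import inspect
import math


def initial_guess(func):
    """Defaults become p0 entries; parameters without a default get 1."""
    params = list(inspect.signature(func).parameters.values())
    return [
        1 if arg.default is inspect.Parameter.empty else arg.default
        for arg in params[1:]  # skip the main variable x
    ]


def gauss(x, constant, mean=0.0, sigma=1.0):
    return constant * math.exp(-((x - mean) ** 2) / (2 * sigma**2))


print(initial_guess(gauss))  # [1, 0.0, 1.0]

# The previous check only fires when every parameter after x has a default.
old_match = len(gauss.__defaults__) + 1 == gauss.__code__.co_argcount
print(old_match)  # False, so p0 would have stayed None
```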

henryiii · 2021-03-17T21:35:44Z

src/hist/plot.py

-        from iminuit import Minuit
-        from scipy.optimize import curve_fit
+        from iminuit import Minuit  # noqa: F401
+        from scipy.optimize import curve_fit  # noqa: F401
    except ImportError:


This could be a ModuleNotFoundError.
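
ModuleNotFoundError is a subclass of ImportError, so narrowing the except clause still catches a missing package while letting other import failures propagate. A quick check, using a deliberately nonexistent module name:

```python
try:
    import a_module_that_does_not_exist_xyz  # deliberately missing
except ModuleNotFoundError as err:
    missing = err.name  # name of the module that could not be found

print(missing)  # a_module_that_does_not_exist_xyz
print(issubclass(ModuleNotFoundError, ImportError))  # True
```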

henryiii · 2021-03-17T21:36:22Z

src/hist/plot.py

-    if type(func) in [str]:
+    if isinstance(func, str):


Never check the type exactly, always use isinstance. It supports subclassing and MyPy type narrowing.

Also, never build a list for membership testing; use a set, which is faster: x in {a, b}.
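
The subclassing difference is easy to demonstrate; the Expr class here is made up for illustration:

```python
class Expr(str):
    """A str subclass (hypothetical, for illustration)."""


func = Expr("a+b*x")

print(type(func) in [str])    # False: the exact type is Expr, not str
print(isinstance(func, str))  # True: subclasses are accepted
```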

henryiii · 2021-03-17T21:36:51Z

src/hist/plot.py

-            constant = ydata.max()
+            constant = float(ydata.max())


Not always fond of NumPy's choices here. Not sure what a number[Any] is. So just forcing it to a Python float.

henryiii · 2021-03-17T21:37:34Z

src/hist/plot.py

-    parnames = func.__code__.co_varnames[1:]
+    assert not isinstance(func, str)
+
+    parnames = list(inspect.signature(func).parameters)[1:]


Same higher-level usage of inspect.signature over .__code__.co_varnames.
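
There is a practical difference as well: co_varnames lists local variables after the arguments, so slicing it can pick up names that are not parameters. A short demonstration with a made-up model function:

```python
import inspect


def model(x, a, b):
    offset = a * 2  # local variable, not a fit parameter
    return offset + b * x


print(model.__code__.co_varnames[1:])                 # ('a', 'b', 'offset')
print(list(inspect.signature(model).parameters)[1:])  # ['a', 'b']
```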

henryiii · 2021-03-17T21:40:16Z

@all-contributors please add @aminnj for code

allcontributors · 2021-03-17T21:40:24Z

@henryiii

I've put up a pull request to add @aminnj! 🎉

henryiii · 2021-03-17T21:40:53Z

Also, would you like to fill in the changelog so we can clear that needs-changelog badge? :)

(I can do it if you prefer)

aminnj · 2021-03-17T21:41:32Z

I appreciate the comments. I haven't yet grasped all the subtleties of the type system...

Is the suggestion to use pytest-mpl to make a snapshot of an example plot and use it as a reference image in unit testing? What if the style is changed upstream?

henryiii · 2021-03-17T21:47:02Z

This also needs docs, but don't worry about that - I'll be reworking the docs in the near future to make them easier to edit. When I do, I'll add to this part.

There are two ways to test. The easier way is to use pytest-mpl to make a snapshot of an example plot and use it as a reference image in unit testing. If the style is changed upstream, we will immediately know and will have to change our test image.

The better way to test is to mock mpl and then verify the sequence of commands to the mpl functions do not change. That also gives much better error messages if something gets changed, rather than just an image diff. But that's much harder to set up. You can see what I did for mplhep here: https://github.com/scikit-hep/mplhep/blob/5effa0b857d7eea93da0299cbbcdc777bd3abaaf/tests/test_mock.py - but as you see, I didn't end up adding it for everything because it's a bit of work for each one.
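
The mocking approach can be sketched with the standard library alone. The draw_fit function below is a toy stand-in, not hist or mplhep code; the point is only that the test pins down the sequence of calls made on the axes object.

```python
from unittest import mock


def draw_fit(ax, x, y):
    """Toy plotting routine standing in for the function under test."""
    ax.plot(x, y, color="black", label="fit")
    ax.legend()


def test_draw_fit_calls():
    ax = mock.MagicMock()  # replaces a real matplotlib Axes
    draw_fit(ax, [0, 1], [2, 3])
    # Any change to the calls or their arguments fails with a readable diff.
    assert ax.mock_calls == [
        mock.call.plot([0, 1], [2, 3], color="black", label="fit"),
        mock.call.legend(),
    ]


test_draw_fit_calls()
```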


aminnj · 2021-03-17T22:44:42Z

I added pytest-mpl, and it seems CI is finally happy with it. I lowered the DPI for the test case, whose baseline was made with pytest tests/test_plot.py --mpl-generate-path=baseline. By default, it has a very matplotlib 1.x feel.

henryiii · 2021-03-18T01:10:50Z

Looks good to me, can you add a changelog entry, then we can merge? If not, let me know, and I'll add one; I usually lightly push contributors to add changelog entries but add them if not; original contributors are usually the best at advertising what they have done. :)

setup.py

aminnj · 2021-03-18T03:12:25Z

I added in some test cases that I had forgotten earlier (the h.plot_pull("gaus") alias and likelihood=True). The changelog entry I added is just a compact summary of the points in the original post of this thread, but let me know if I should include more details.

henryiii

Looks great, thanks!

eduardo-rodrigues · 2021-03-18T07:21:57Z

Hi @henryiii, @aminnj, I'm following your nice developments. I just wanted to make a comment now that I see h.plot_pull("gaus"): I know that ROOT does this, amputating Gauss's name to save a single letter, but I hate that. TBH I see no gain and consider it far more important to preserve the great mathematician's name. Could we please please do "gauss"?

aminnj · 2021-03-18T07:30:42Z

I personally have no strong attachment to "gaus", so I can offer a +1 to "gauss"

nsmith- · 2021-03-18T18:42:38Z

I was taking a look at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html today and noticed it has a fit function that does an unbinned max-likelihood fit. I wonder if there's a possibility we could add a binned_fit function to scipy? It could take a UHI plottable or numpy array as input.
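
A rough sketch of what such a binned fit might look like, built on a distribution's cdf. No binned_fit function exists in scipy; the name, signature, multinomial likelihood, and choice of Nelder-Mead are all assumptions of this illustration.

```python
import numpy as np
from scipy import optimize, stats


def binned_fit(dist, counts, edges, start):
    """Binned max-likelihood fit of a scipy.stats distribution (hypothetical)."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)

    def nll(params):
        # Probability content of each bin from the cdf at the bin edges.
        probs = np.diff(dist.cdf(edges, *params))
        if not np.all(np.isfinite(probs)):
            return np.inf  # e.g. a non-positive scale
        probs = np.clip(probs, 1e-300, None)
        return -np.sum(counts * np.log(probs))

    return optimize.minimize(nll, start, method="Nelder-Mead").x


rng = np.random.default_rng(42)
counts, edges = np.histogram(rng.normal(0, 1, 100_000), bins=50, range=(-5, 5))
loc, scale = binned_fit(stats.norm, counts, edges, start=(0.5, 1.5))
```

Here the input is a plain pair of NumPy arrays; accepting a UHI plottable would only require reading the counts and edges from it first.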

aminnj added 2 commits March 15, 2021 15:34

more flexible fitting function, allow likelihood

1ded859

add gaussian shortcut

93a59ab

github-actions bot added the needs changelog label Mar 15, 2021

pre-commit-ci bot and others added 3 commits March 15, 2021 23:17

[pre-commit.ci] auto fixes from pre-commit.com hooks

155ae98

for more information, see https://pre-commit.ci

change dependency

80cb32c

Merge branch 'fitting' of github.com:aminnj/hist into fitting

95fcff6

matthewfeickert mentioned this pull request Mar 16, 2021

[BUG] error bars in plot_pull #146

Closed

henryiii added 2 commits March 17, 2021 16:10

Merge branch 'master' into fitting

3d6ca17

fix(types): fix typing issues

0de0869

henryiii reviewed Mar 17, 2021


allcontributors bot mentioned this pull request Mar 17, 2021

docs: add aminnj as a contributor #152

Merged

aminnj and others added 8 commits March 17, 2021 14:56

add determinism to prevent rare fit failures

087a9ca

add pytest-mpl

322bd30

[pre-commit.ci] auto fixes from pre-commit.com hooks

150ed5d

for more information, see https://pre-commit.ci

lower dpi

49e02f2

fix reference

791194c

[pre-commit.ci] auto fixes from pre-commit.com hooks

88afa48

for more information, see https://pre-commit.ci

update manifest for ci

b4f4520

Merge branch 'fitting' of github.com:aminnj/hist into fitting

55176fc

henryiii reviewed Mar 18, 2021


aminnj added 2 commits March 17, 2021 19:58

test the gaussian alias

78b5070

verify that likelihood does not crash at least

3b373a7

add changelog entry

a149b98

github-actions bot removed the needs changelog label Mar 18, 2021

henryiii approved these changes Mar 18, 2021


henryiii merged commit 84a5d80 into scikit-hep:master Mar 18, 2021

henryiii mentioned this pull request Mar 18, 2021

rename gaus -> gauss #157

Closed

This was referenced Mar 18, 2021

[FEATURE] Remove fit result summary from pull plot legend as default #159

Closed

[BUG] Plot artifact in baseline test images #167

Closed
