Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible fitting function, allow likelihood, remove uncertainties dependency #149

Merged
merged 18 commits into from Mar 18, 2021

Conversation

aminnj
Copy link
Contributor

@aminnj aminnj commented Mar 15, 2021

Based on the discussion in #146, I added some features to plot_pull. The theme is making fitting more streamlined for exploration.

  • Pull curve_fit initial guess (p0) from default arguments, if they exist
  • Allow a string as an alternative to a lambda function (plot_pull("a+b*x"))
  • Cosmetic change to the band, and embedding fit result into the legend
  • Likelihood fit (plot_pull(..., likelihood=True)) (chi2 by default, as before)
  • Remove uncertainties.numpy dependency and construct band by resampling covariance matrix
  • Introduce iminuit (gets the covariance matrix right, unlike scipy.optimize most of the time), but the initial guesses for iminuit are seeded from scipy

Setup

import numpy as np
from hist import Hist

np.random.seed(42)
hh = Hist.new.Reg(50, -5, 5).Double().fill(np.random.normal(0,1,int(1e5)))

Before (including a bug-fix for the variances from the above issue):

from uncertainties import unumpy as unp
def func(x, constant, mean, sigma):
    exp = unp.exp if constant.dtype == np.dtype("O") else np.exp
    return constant * exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))

hh.plot_pull(func)

image

After:

# as before, but no need for `uncertainties.numpy` as the error band comes
# from resampling the covariance matrix
def func(x, constant, mean, sigma):
    return constant * np.exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))
hh.plot_pull(func)

# `curve_fit` `p0` extracted from defaults, if any
def func(x, constant=80, mean=0., sigma=1.):
    return constant * np.exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))
hh.plot_pull(func)

# strings are allowed to allow for more compactness than a lambda
# x is assumed to be the main variable
hh.plot_pull("constant*np.exp(-(x-mean)**2. / (2*sigma**2))")

# gaussian is a common/special function, so this also works
# reasonable guesses are made for constant/mean/sigma
hh.plot_pull("gaus")

image

# chi2 puts `a` around 5, but likelihood puts `a` around 1e3/50 = 20
hh.plot_pull("a+b*x", likelihood=True)

image

@henryiii
Copy link
Member

Can you verify my changes are valid? Could you add a test with pytest-mpl, perhaps?

Copy link
Member

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes on my changes.

@@ -45,7 +46,7 @@ def _filter_dict(
}


def _expr_to_lambda(expr: str) -> Callable:
def _expr_to_lambda(expr: str) -> Callable[..., Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default for Callable - best to not leave empty Generics.

ydata: ArrayLike,
yerr: ArrayLike,
func: Callable[..., Any],
xdata: np.ndarray,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayLike can be a simple number, so xdata[stuff] was not valid. Either use np.asarray, or just require arrays.

likelihood: bool = False,
) -> Tuple[ArrayLike, ArrayLike]:
) -> Tuple[Tuple[float, ...], ArrayLike]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes the typing easier, because NumPy's typing is a bit spotty. It doesn't know that *array is valid, etc. This is small so was a simple fix.

src/hist/plot.py Outdated
Comment on lines 92 to 98
p0 = None
if func.__defaults__ and len(func.__defaults__) + 1 == func.__code__.co_argcount:
p0 = func.__defaults__
params = list(inspect.signature(func).parameters.values())
p0 = [
1 if arg.default is inspect.Parameter.empty else arg.default
for arg in params[1:]
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using inspect.signature, as it's a more public interface; __defaults__ (and maybe __code__) are untyped, AFAICT (or maybe just not always present on Callables). This also supports partial defaults, while the other was all or nothing.

from iminuit import Minuit
from scipy.optimize import curve_fit
from iminuit import Minuit # noqa: F401
from scipy.optimize import curve_fit # noqa: F401
except ImportError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a ModuleNotFoundError.

src/hist/plot.py Outdated
Comment on lines 264 to 268
if type(func) in [str]:
if isinstance(func, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never check the type exactly, always use isinstance. It supports subclassing and MyPy type narrowing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also never make a list for containership testing, use a set, it's faster. x in {a, b}.

src/hist/plot.py Outdated
Comment on lines 267 to 271
constant = ydata.max()
constant = float(ydata.max())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always fond of NumPy's choices here. Not sure what a number[Any] is. So just forcing it to a Python float.

parnames = func.__code__.co_varnames[1:]
assert not isinstance(func, str)

parnames = list(inspect.signature(func).parameters)[1:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same higher level usage of inspect.signature over .__code__.co_varname.

@henryiii
Copy link
Member

@all-contributors please add @aminnj for code

@allcontributors
Copy link
Contributor

@henryiii

I've put up a pull request to add @aminnj! 🎉

@henryiii
Copy link
Member

henryiii commented Mar 17, 2021

Also, would you like to fill in the changelog so we can clear that needs-changelog badge? :)

(I can do it if you prefer)

@aminnj
Copy link
Contributor Author

aminnj commented Mar 17, 2021

I appreciate the comments. I haven't yet grasped all the subtleties of the type system...

Is the suggestion to use pytest-mpl to make a snapshot of an example plot and use it as a reference image in unit testing? What if the style is changed upstream?

@henryiii
Copy link
Member

This also need docs, but don't worry about that - I'll be reworking the docs in the near future to make them easier to edit. When I do, I'll add to this part.

There are two ways to test. The easier way is to use pytest-mpl to make a snapshot of an example plot and use it as a reference image in unit testing. If the style is changed upstream, we will immediately know and will have to change our test image.

The better way to test is to mock mpl and then verify the sequence of commands to the mpl functions do not change. That also gives much better error messages if something gets changed, rather than just an image diff. But that's much harder to set up. You can see what I did for mplhep here: https://github.com/scikit-hep/mplhep/blob/5effa0b857d7eea93da0299cbbcdc777bd3abaaf/tests/test_mock.py - but as you see, I didn't end up adding it for everything because it's a bit of work for each one.

@aminnj
Copy link
Contributor Author

aminnj commented Mar 17, 2021

I added in pytest-mpl and seems CI is finally happy with it. Lowered the DPI for the test case made with pytest tests/test_plot.py --mpl-generate-path=baseline. By default, it has a very matplotlib 1.x feel.

image

@henryiii
Copy link
Member

Looks good to me, can you add a changelog entry, then we can merge? If not, let me know, and I'll add one; I usually lightly push contributors to add changelog entries but add them if not; original contributors are usually the best at advertising what they have done. :)

setup.py Show resolved Hide resolved
@aminnj
Copy link
Contributor Author

aminnj commented Mar 18, 2021

I added in some test cases that I had forgotten earlier (the h.plot_pull("gaus") alias and likelihood=True). The changelog entry I added is just a compact summary of the points in the original post of this thread, but let me know if I should include more details.

Copy link
Member

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@henryiii henryiii merged commit 84a5d80 into scikit-hep:master Mar 18, 2021
@eduardo-rodrigues
Copy link
Member

Hi @henryiii, @aminnj, I'm following your nice developments. I just wanted to make a comment now that I see h.plot_pull("gaus"): I know that ROOT does this, amputating Gauss's name to save a little letter, but I hate that. TBH I see no gain and consider far more important to save the great mathemacian's name. Could we please please do "gauss"?

@aminnj
Copy link
Contributor Author

aminnj commented Mar 18, 2021

I personally have no strong attachment to "gaus", so I can offer a +1 to "gauss"

@nsmith-
Copy link
Contributor

nsmith- commented Mar 18, 2021

I was taking a look at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html today and noticed it has a fit function that does an unbinned max-likelihood fit. I wonder if there's a possibility we could add a binned_fit function to scipy? It could take a UHI plottable or numpy array as input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants