Curve fitting analysis helper functions #19
New file (`@@ -0,0 +1,15 @@`):

```python
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
#
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.

"""
Analysis helper functions
"""
```
New file (`@@ -0,0 +1,304 @@`):

```python
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
#
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.
"""
Curve fitting functions for experiment analysis
"""
# pylint: disable = invalid-name

from typing import List, Dict, Tuple, Callable, Optional, Union

import numpy as np
import scipy.optimize as opt
from qiskit.exceptions import QiskitError
from qiskit_experiments.base_analysis import AnalysisResult
from qiskit_experiments.analysis.data_processing import filter_data


def curve_fit(
    func: Callable,
    xdata: np.ndarray,
    ydata: np.ndarray,
    p0: Union[Dict[str, float], np.ndarray],
    sigma: Optional[np.ndarray] = None,
    bounds: Optional[Union[Dict[str, Tuple[float, float]], Tuple[np.ndarray, np.ndarray]]] = None,
    **kwargs,
) -> AnalysisResult:
    r"""Perform a non-linear least squares fit.

    This solves the optimization problem

    .. math::
        \Theta_{\mbox{opt}} = \arg\min_\Theta \sum_i
            \sigma_i^{-2} (f(x_i, \Theta) - y_i)^2

    using ``scipy.optimize.curve_fit``.

    Args:
        func: a fit function `f(x, *params)`.
        xdata: a 1D float array of x-data.
        ydata: a 1D float array of y-data.
        p0: initial guess for optimization parameters.
        sigma: Optional, a 1D array of standard deviations in ydata
            in absolute units.
        bounds: Optional, lower and upper bounds for optimization
            parameters.
        kwargs: additional kwargs for ``scipy.optimize.curve_fit``.

    Returns:
        result containing ``popt`` the optimal fit parameters,
        ``popt_err`` the standard error estimates of popt,
        ``pcov`` the covariance matrix for the fit,
        ``reduced_chisq`` the reduced chi-squared parameter of the fit,
        ``dof`` the degrees of freedom of the fit,
        ``xrange`` the range of xdata values used for the fit.

    Raises:
        QiskitError: if the number of degrees of freedom of the fit is
            less than 1.

    .. note::
        ``sigma`` is assumed to be specified in the same units as ``ydata``
        (absolute units). If sigma is instead specified in relative units
        the ``absolute_sigma=False`` kwarg of scipy ``curve_fit`` must be
        used. This affects the returned covariance ``pcov`` and error
        ``popt_err`` parameters via
        ``pcov(absolute_sigma=False) = pcov * reduced_chisq``,
        ``popt_err(absolute_sigma=False) = popt_err * sqrt(reduced_chisq)``.
    """
    # Format p0 parameters if specified as dictionary
    if isinstance(p0, dict):
        param_keys = list(p0.keys())
```
> **Reviewer:** I think this will make the ordering of p0 as a vector unstable, which will probably mess someone up when they assume the covariance matrix is always ordered the same. Maybe sort alphabetically by key or something?
>
> **Author:** Currently the popt, popt_keys, popt_err, pcov ordering will be the ordering of […]
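For context on the ordering concern: since Python 3.7, `dict` preserves insertion order, so `list(p0.keys())` and `list(p0.values())` are deterministic for a given construction order, but two callers building the same parameters in different orders will get differently ordered `popt`/`pcov` vectors. A minimal illustration (parameter names here are made up for the example):

```python
# dict key/value ordering follows insertion order in Python 3.7+,
# which is what the popt/popt_keys/pcov ordering relies on.
p0 = {"amp": 1.0, "tau": 0.5, "base": 0.1}

param_keys = list(p0.keys())
param_p0 = list(p0.values())

assert param_keys == ["amp", "tau", "base"]
assert param_p0 == [1.0, 0.5, 0.1]

# The same keys inserted in a different order yield a differently
# ordered parameter vector -- the instability the reviewer raised.
p0_other = {"tau": 0.5, "base": 0.1, "amp": 1.0}
assert list(p0_other.keys()) != param_keys
```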
```python
        param_p0 = list(p0.values())

        # Convert bounds specified as a dict keyed by parameter name
        if bounds:
            lower = [bounds[key][0] for key in param_keys]
            upper = [bounds[key][1] for key in param_keys]
            param_bounds = (lower, upper)
        else:
            # scipy.optimize.curve_fit cannot accept bounds=None, so fall
            # back to its unbounded default.
            param_bounds = (-np.inf, np.inf)

        # Convert keyword-parameter fit function to positional parameters
        def fit_func(x, *params):
            return func(x, **dict(zip(param_keys, params)))

    else:
        param_keys = None
        param_p0 = p0
        param_bounds = bounds if bounds is not None else (-np.inf, np.inf)
        fit_func = func

    # Check the degrees of freedom is greater than 0
    dof = len(ydata) - len(param_p0)
    if dof < 1:
        raise QiskitError(
            "The number of degrees of freedom of the fit data and model "
            "(len(ydata) - len(p0)) is less than 1"
        )

    # Override scipy.curve_fit default of absolute_sigma=True
    # if sigma is specified.
    if sigma is not None and "absolute_sigma" not in kwargs:
        kwargs["absolute_sigma"] = True

    # Run curve fit
    # TODO: Add error handling so if fitting fails we can return an analysis
    # result containing this information
    # pylint: disable = unbalanced-tuple-unpacking
    popt, pcov = opt.curve_fit(
        fit_func, xdata, ydata, sigma=sigma, p0=param_p0, bounds=param_bounds, **kwargs
    )
    popt_err = np.sqrt(np.diag(pcov))

    # Calculate the reduced chi-squared of the fit
    yfits = fit_func(xdata, *popt)
    residues = (yfits - ydata) ** 2
    if sigma is not None:
        residues = residues / (sigma ** 2)
    reduced_chisq = np.sum(residues) / dof

    # Compute xdata range for fit
    xdata_range = [min(xdata), max(xdata)]
```
> **Collaborator:** Can we optionally truncate the fit range? I.e. add a range-truncation function to the helper library. This is useful for fitting power dependence (sometimes we see non-linearity at large x values, and want to exclude this region).
>
> **Author:** I could add something like a […] Or maybe a more general solution would be to beef up the filters so they aren't just value checks. I'm not sure how you could allow both callable and fixed-value filters though. They might just have to be all callable if we do that (a fixed-value filter would need to be done something like […]
>
> **Collaborator:** I think `**filters` is used to filter out circuits based on metadata, so the current implementation looks good.
>
> **Author:** Any of scipy […]
>
> **Collaborator:** Yeah, I just meant consistency of the function signature, or docstring. The current interface is OK in terms of functionality. Thanks for updating.
```python
    result = {
        "popt": popt,
        "popt_keys": param_keys,
        "popt_err": popt_err,
        "pcov": pcov,
        "reduced_chisq": reduced_chisq,
        "dof": dof,
        "xrange": xdata_range,
    }

    return AnalysisResult(result)
```

> **Reviewer:** Question: if the user passed in a dict, should we return a dict here?
>
> **Author:** To avoid having two different return types, I left this as an array, and if p0 was supplied with a dict the […]
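A minimal end-to-end sketch of the same call pattern `curve_fit` wraps, using `scipy.optimize.curve_fit` directly (the `model` function and synthetic data are illustrative, not from this PR):

```python
import numpy as np
import scipy.optimize as opt

def model(x, amp, tau):
    """Exponential-decay model written with keyword parameters."""
    return amp * np.exp(-x / tau)

# Synthetic noiseless data, so the fit recovers the parameters exactly.
xdata = np.linspace(0, 5, 50)
ydata = model(xdata, amp=2.0, tau=1.5)

# Same dict-to-vector conversion performed inside curve_fit() above.
p0 = {"amp": 1.0, "tau": 1.0}
param_keys = list(p0.keys())

def fit_func(x, *params):
    return model(x, **dict(zip(param_keys, params)))

popt, pcov = opt.curve_fit(fit_func, xdata, ydata, p0=list(p0.values()))
popt_err = np.sqrt(np.diag(pcov))

# dof and reduced chi-squared, computed as in curve_fit() above.
dof = len(ydata) - len(popt)
reduced_chisq = np.sum((fit_func(xdata, *popt) - ydata) ** 2) / dof
```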
```python
def multi_curve_fit(
    funcs: List[Callable],
    series: np.ndarray,
    xdata: np.ndarray,
    ydata: np.ndarray,
    p0: np.ndarray,
    sigma: Optional[np.ndarray] = None,
    weights: Optional[np.ndarray] = None,
    bounds: Optional[Union[Dict[str, Tuple[float, float]], Tuple[np.ndarray, np.ndarray]]] = None,
    **kwargs,
) -> AnalysisResult:
    r"""Perform a linearized multi-objective non-linear least squares fit.

    This solves the optimization problem

    .. math::
        \Theta_{\mbox{opt}} = \arg\min_\Theta \sum_{k} w_k
            \sum_{i} \sigma_{k, i}^{-2}
            (f_k(x_{k, i}, \Theta) - y_{k, i})^2

    for multiple series of :math:`x_k, y_k, \sigma_k` data evaluated using
    a list of objective functions :math:`[f_k]`
    using ``scipy.optimize.curve_fit``.

    Args:
        funcs: a list of objective functions :math:`[f_0, f_1, ...]` where
            each function has signature :math:`f_k(x, *params)`.
        series: a 1D int array that specifies the component objective
            function :math:`f_k` to evaluate corresponding x and y
            data with.
        xdata: a 1D float array of x-data.
        ydata: a 1D float array of y-data.
        p0: initial guess for optimization parameters.
        sigma: Optional, a 1D array of standard deviations in ydata
            in absolute units.
        weights: Optional, a 1D float list of weights :math:`w_k` for each
            component function :math:`f_k`.
        bounds: Optional, lower and upper bounds for optimization
            parameters.
        kwargs: additional kwargs for ``scipy.optimize.curve_fit``.

    Returns:
        result containing ``popt`` the optimal fit parameters,
        ``popt_err`` the standard error estimates of popt,
        ``pcov`` the covariance matrix for the fit,
        ``reduced_chisq`` the reduced chi-squared parameter of the fit,
        ``dof`` the degrees of freedom of the fit,
        ``xrange`` the range of xdata values used for the fit.

    Raises:
        QiskitError: if the number of degrees of freedom of the fit is
            less than 1.

    .. note::
        ``sigma`` is assumed to be specified in the same units as ``ydata``
        (absolute units). If sigma is instead specified in relative units
        the ``absolute_sigma=False`` kwarg of scipy ``curve_fit`` must be
        used. This affects the returned covariance ``pcov`` and error
        ``popt_err`` parameters via
        ``pcov(absolute_sigma=False) = pcov * reduced_chisq``,
        ``popt_err(absolute_sigma=False) = popt_err * sqrt(reduced_chisq)``.
    """
    num_funcs = len(funcs)

    # Get index positions for each data series
    series = np.asarray(series, dtype=int)
    idxs = [series == i for i in range(num_funcs)]

    # Combine weights and sigma for the transformation: weighting a series
    # by w_k is equivalent to dividing its sigmas by sqrt(w_k) in the
    # least-squares objective.
    if weights is None:
        wsigma = sigma
    else:
        wsigma = np.zeros(ydata.size)
        if sigma is None:
            for i in range(num_funcs):
                wsigma[idxs[i]] = 1 / np.sqrt(weights[i])
        else:
            for i in range(num_funcs):
                wsigma[idxs[i]] = sigma[idxs[i]] / np.sqrt(weights[i])

    # Define multi-objective function that dispatches each point to its
    # component function based on the series index
    def f(x, *params):
        y = np.zeros(x.size)
        for i in range(num_funcs):
            xi = x[idxs[i]]
            yi = funcs[i](xi, *params)
            y[idxs[i]] = yi
        return y

    # Run linearized curve_fit
    analysis_result = curve_fit(f, xdata, ydata, p0, sigma=wsigma, bounds=bounds, **kwargs)

    return analysis_result
```

> **Collaborator:** The code of this function is difficult to follow in the lack of internal documentation. Also, in the docstring, the meaning of the […]
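The dispatch trick that linearizes the multi-series fit can be seen in isolation: all points share one flat x array, and a boolean mask per series routes each point to its component function (the linear component models here are illustrative):

```python
import numpy as np

# Two component models sharing the parameters (a, b).
funcs = [
    lambda x, a, b: a * x + b,   # series 0
    lambda x, a, b: a * x - b,   # series 1
]

# Flat data arrays with a series label per point, as multi_curve_fit expects.
series = np.array([0, 0, 1, 1])
xdata = np.array([0.0, 1.0, 0.0, 1.0])

idxs = [series == i for i in range(len(funcs))]

def f(x, *params):
    # Evaluate each point with its series' component function,
    # mirroring the inner function in multi_curve_fit.
    y = np.zeros(x.size)
    for i, func in enumerate(funcs):
        y[idxs[i]] = func(x[idxs[i]], *params)
    return y

y = f(xdata, 2.0, 1.0)  # a=2, b=1 -> [1.0, 3.0, -1.0, 1.0]
```

The combined `f` has the signature `scipy.optimize.curve_fit` expects, so a single fit optimizes the shared parameters over all series at once.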
```python
def process_curve_data(
    data: List[Dict[str, any]], data_processor: Callable, x_key: str = "xval", **filters
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return tuple of arrays (x, y, sigma) data for curve fitting.

    Args:
        data: list of circuit data dictionaries containing counts.
        data_processor: callable for processing data to y, sigma.
        x_key: key for extracting xdata value from metadata (Default: "xval").
        filters: additional kwargs to filter metadata on.

    Returns:
        tuple: ``(x, y, sigma)`` tuple of arrays of x-values,
        y-values, and standard deviations of y-values.
    """
    filtered_data = filter_data(data, **filters)
    size = len(filtered_data)
    xdata = np.zeros(size, dtype=float)
    ydata = np.zeros(size, dtype=float)
    ydata_var = np.zeros(size, dtype=float)

    for i, datum in enumerate(filtered_data):
        metadata = datum["metadata"]
        xdata[i] = metadata[x_key]
        y_mean, y_var = data_processor(datum)
        ydata[i] = y_mean
        ydata_var[i] = y_var

    return xdata, ydata, np.sqrt(ydata_var)
```

> **Contributor:** Is this intended to be the same data processor as in #22?
>
> **Author:** I left comments on this in your PR. Ideally I would like it if it could be either a DataProcessor from #22, or just a single node, or a regular function with the correct signature.

> **Collaborator:** The data processor should be aware of whether […]
>
> **Collaborator:** This one is not a mistake. The computation of […]
>
> **Author:** The choice of setting absolute_sigma = False/True should depend on how sigma was calculated, not the other way around. Absolute sigma is a property of the curve fit function for how it calculates […] By defaulting to True we are assuming that sigma is usually specified in the same units as y, so in that sense we should make sure any data processors calculate sigma in the same units as y by default as well.
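The `(y_mean, y_var)` contract expected of the data processor can be sketched with a stand-in that converts counts to an outcome probability and its binomial variance (`probability_processor` is a hypothetical example, not the DataProcessor from #22):

```python
import numpy as np

def probability_processor(datum, outcome="1"):
    """Hypothetical processor: counts -> (mean, variance) of P(outcome)."""
    counts = datum["counts"]
    shots = sum(counts.values())
    p = counts.get(outcome, 0) / shots
    # Binomial variance of the estimated probability.
    return p, p * (1 - p) / shots

data = [
    {"counts": {"0": 75, "1": 25}, "metadata": {"xval": 0.1}},
    {"counts": {"0": 50, "1": 50}, "metadata": {"xval": 0.2}},
]

# Same accumulation loop as process_curve_data, without the metadata filter.
xdata = np.zeros(len(data))
ydata = np.zeros(len(data))
ydata_var = np.zeros(len(data))
for i, datum in enumerate(data):
    xdata[i] = datum["metadata"]["xval"]
    ydata[i], ydata_var[i] = probability_processor(datum)

sigma = np.sqrt(ydata_var)
```

Because the variance here is in absolute y units, the resulting `sigma` is compatible with the `absolute_sigma=True` default discussed above.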
```python
def process_multi_curve_data(
    data: List[Dict[str, any]],
    data_processor: Callable,
    x_key: str = "xval",
    series_key: str = "series",
    **filters,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Return tuple of arrays (series, x, y, sigma) data for multi curve fitting.

    Args:
        data: list of circuit data dictionaries.
        data_processor: callable for processing data to y, sigma.
        x_key: key for extracting xdata value from metadata (Default: "xval").
        series_key: key for extracting series value from metadata (Default: "series").
        filters: additional kwargs to filter metadata on.

    Returns:
        tuple: ``(series, x, y, sigma)`` tuple of arrays of series values,
        x-values, y-values, and standard deviations of y-values.
    """
    filtered_data = filter_data(data, **filters)
    size = len(filtered_data)
    series = np.zeros(size, dtype=int)
    xdata = np.zeros(size, dtype=float)
    ydata = np.zeros(size, dtype=float)
    ydata_var = np.zeros(size, dtype=float)

    for i, datum in enumerate(filtered_data):
        metadata = datum["metadata"]
        series[i] = metadata[series_key]
        xdata[i] = metadata[x_key]
        y_mean, y_var = data_processor(datum)
        ydata[i] = y_mean
        ydata_var[i] = y_var

    return series, xdata, ydata, np.sqrt(ydata_var)
```