Curve fitting analysis helper functions #19
Conversation
I can suggest lmfit as a really good and flexible package we've used. It has very nice models. lmfit builds on and extends many of the optimization methods of scipy.optimize, has been mature for some time, handles many nonlinear fit methods, and provides useful enhancements.
Just a quick comment: how easily can we extend this base class for analysis of multiple experiments or a series of experiments, such as IRB? I think having data in I can write MVP :)
I'm also asking myself the same question as @nkanazawa1989. I expect something more flexible that allows multiple fittings and some processing of them within one analysis result. Some of this sort of thing is already supported as a
For the calibration module we have our own base analysis class which looks very similar: https://github.com/Qiskit/qiskit-experiments/pull/20/files#diff-746d34cd1c5ff8ee88c96d2914cacbf4c98b02d3b9acb3ba9c0958da5df5d0ac I'd be happy to subclass from
After discussing offline a bit with Oliver, I want to rethink the metadata for some more complicated cases he suggested, including multiple objective functions. This is probably more along the lines of what you suggest, @nkanazawa1989. @eggerdj @nkanazawa1989 @yaelbh, before going any further with this PR let's discuss more offline and try to come up with something that also includes everything you need for the curve fitting portion of the
(force-pushed from 55e7ef3 to f3cb555)
I refactored this into some library functions for curve fitting and plotting. These functions are:

The
(force-pushed from cd23fc1 to 271995a)
eggerdj left a comment:

Here are some comments to emphasize where the relations between this PR and #22 are located in the code.
I updated this PR again to remove the actual

If sigma is provided, the mean and variance are computed using inverse-variance weighting; otherwise they are computed as the sample mean and biased sample variance.
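A minimal numpy sketch of that weighting rule (the function name and signature here are illustrative, not the PR's actual `mean_xy_data` interface):

```python
import numpy as np

def weighted_mean_var(y, sigma=None):
    """Combine repeated measurements of one x-value into a mean and variance.

    With sigma: inverse-variance weighting, mean = sum(y/s^2) / sum(1/s^2)
    and var = 1 / sum(1/s^2).  Without sigma: sample mean and biased sample
    variance (divides by len(y), not len(y) - 1).
    """
    y = np.asarray(y, dtype=float)
    if sigma is None:
        return y.mean(), y.var()
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2   # inverse-variance weights
    return np.sum(w * y) / np.sum(w), 1.0 / np.sum(w)

mean, var = weighted_mean_var([1.0, 3.0], sigma=[1.0, 1.0])
# equal sigmas reduce to the plain mean; var = 1 / (1/1 + 1/1) = 0.5
```

With equal sigmas the weighted mean collapses to the ordinary mean, which is a handy sanity check on the weighting.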
```python
# Check data set has at least 1 more value than model parameters
num_params = len(param_p0)
if len(ydata) <= num_params:
```
This should probably be num_params + 1, otherwise chisq diverges.
```python
xdata_range = [min(xdata), max(xdata)]
```

```python
result = {
    "popt": popt,
```
Question: if the user passed in a dict, should we return a dict here?
To avoid having two different return types, I left this as an array; if p0 was supplied as a dict, popt_keys will include the corresponding key labels for each param.
| """ | ||
| # Format p0 parameters if specified as dictionary | ||
| if isinstance(p0, dict): | ||
| param_keys = list(p0.keys()) |
I think this will make the ordering of p0 as a vector unstable, which will probably mess someone up when they assume the covariance matrix is always ordered the same. Maybe sort alphabetically by key or something?
Currently the popt, popt_keys, popt_err, pcov ordering will follow the ordering of dict.keys. popt_keys is intended to remove any ambiguity about which params these are. I don't think it makes much more sense to order alphabetically rather than in the order the dict was originally constructed if p0 is a dict.
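For context on the ordering concern: since Python 3.7, dicts preserve insertion order by language guarantee, so building the parameter vector from a p0 dict is deterministic for a given construction order (a quick illustration, not the PR's code):

```python
# Python >= 3.7 guarantees dicts iterate in insertion order, so the
# parameter vector built from a p0 dict has a stable, reproducible order.
p0 = {"amp": 1.0, "tau": 0.5, "base": 0.0}
param_keys = list(p0.keys())            # ["amp", "tau", "base"]
param_p0 = [p0[k] for k in param_keys]  # [1.0, 0.5, 0.0]
```

The ordering only becomes "unstable" across callers who construct their p0 dicts in different orders, which is what popt_keys is there to disambiguate.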
* Fix error in level2_probabilities multiplying by shots instead of dividing
* Add explicit method kwarg for mean_xy_data
```python
for i, datum in enumerate(filter_data):
    metadata = datum["metadata"]
    xdata[i] = metadata[x_key]
    y_mean, y_var = data_processor(datum)
```
The data processor should be aware of whether sigma is absolute.
This one is not a mistake. The computation of sigma is different for different values of absolute_sigma.
The choice of setting absolute_sigma = False/True should depend on how sigma was calculated, not the other way around. absolute_sigma is a property of the curve fit function for how it calculates pcov (and hence how we deduce popt_err). According to the scipy docs: pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt) / (M - N).

By defaulting to True we are assuming that sigma is usually specified in the same units as y, so in that sense we should make sure any data processors calculate sigma in the same units as y by default as well.
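The quoted scipy relation can be checked numerically; a small sketch on synthetic data (the line model and sample sizes are illustrative only):

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a * x + b

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
sigma = np.full_like(x, 0.1)
y = line(x, 2.0, 1.0) + rng.normal(0.0, 0.1, x.size)

# Same data and weights, two interpretations of sigma
_, pcov_abs = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
popt, pcov_rel = curve_fit(line, x, y, sigma=sigma, absolute_sigma=False)

# Reduced chi-square: M = 20 points, N = 2 parameters
red_chisq = np.sum(((line(x, *popt) - y) / sigma) ** 2) / (x.size - 2)
# pcov_rel should equal pcov_abs * red_chisq, per the scipy docs
```

In other words, absolute_sigma=False rescales the covariance so the fit's reduced chi-square is effectively one, while absolute_sigma=True trusts the provided sigmas as-is.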
```python
residues = (yfits - ydata) ** 2
if sigma is not None:
    residues = residues / (sigma ** 2)
chisq = np.sum(residues) / (len(yfits) - num_params)
```
Do we still want to name this field of the analysis result chisq, or change it to reduced_chisq? Also, maybe update the function documentation to note that the reduced chi-square is calculated.
We've had a discussion: if xdata consists of n points, and each was measured for s shots, should we fit n points, using the n means of the s measurements, or should we fit n*s points? I argued that both are equivalent, because the objective function minimized by the least-squares algorithm is the same. This is still true (assuming I was right in the first place); however, after the fitting the returned reduced chi-square is different.
For your second question: after discussing this a bit with @oliverdial, I think the answer is that they are only equivalent under some assumptions about the underlying noise models being the same, e.g. sampling noise vs. single-shot measurement noise, whether it is Gaussian or not, etc. In general the sigma values could be different in the two cases: P(0) for N shots is a binomial distribution, but sigma for each single shot depends on the measurement discriminator error, so this can affect the chi-squared and pcov.
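The first point (same minimizer, different reduced chi-square) is easy to see numerically; a sketch assuming unweighted least squares and a linear model:

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a * x + b

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 5)     # n = 5 x-values
shots = 50                        # s = 50 repeats per x-value
raw = line(x, 2.0, 1.0) + rng.normal(0.0, 0.2, (shots, x.size))

# Fit all n*s raw points vs. fit the n per-x means: the raw objective is
# s * (means objective) + a popt-independent constant, so popt agrees ...
popt_raw, _ = curve_fit(line, np.tile(x, shots), raw.ravel())
popt_mean, _ = curve_fit(line, x, raw.mean(axis=0))
# ... but the residual sums and degrees of freedom (n*s - 2 vs. n - 2)
# differ, so the reduced chi-squares of the two fits do not agree.
```

This illustrates the equivalence claim under identical (here: implicit, unweighted) noise assumptions; as noted above, different per-shot sigma models break it.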
* Change chisq -> reduced_chisq in result and docs
* Add dof (degrees of freedom) to returned results
* Add documentation about sigma by default being absolute sigma
```python
# Run linearized curve_fit
analysis_result = curve_fit(f, xdata, ydata, p0, sigma=wsigma, bounds=bounds, **kwargs)
```
The code of this function is difficult to follow in the absence of internal documentation. Also, in the docstring, the meaning of the series argument is not clear. The text at the beginning of the docstring implies that xdata, ydata, and sigma are expected to be 2D arrays, but then it turns out that they are 1D arrays, and the connection is not clear.
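On the unclear series argument: one plausible reading of the 1D layout (purely illustrative; `combined`, `f0`, `f1`, and the tuple-shaped xdata are assumptions, not the PR's actual multi_curve_fit interface) is a flat xdata/ydata pair with a parallel integer array selecting which model each point belongs to:

```python
import numpy as np
from scipy.optimize import curve_fit

def f0(x, a, b):            # model for series 0
    return a * x + b

def f1(x, a, b):            # model for series 1, sharing parameters a, b
    return a * x**2 + b

def combined(xs, a, b):
    """Dispatch on the series index so one fit covers both curves."""
    x, s = xs
    return np.where(s == 0, f0(x, a, b), f1(x, a, b))

x = np.tile(np.linspace(0.0, 1.0, 10), 2)   # flattened x-values, both series
s = np.repeat([0, 1], 10)                    # series index for each point
y = combined((x, s), 2.0, 1.0)               # noise-free synthetic data

popt, pcov = curve_fit(combined, (x, s), y, p0=[1.0, 0.0])
```

This "linearized" layout lets a single scipy.optimize.curve_fit call fit shared parameters across series, which may be what the 2D-vs-1D wording in the docstring is gesturing at.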
* Add CurveFitAnalysis class
* Get outcome label from metadata
* Fix import error message if no matplotlib
* Add option for fitting means of batched data sets
* Reorganize helper functions into separate file
* Add curve_fit, multi_curve_fit, and plotting functions for xy data
* Refactor example CurveFitAnalysis class to use library functions
* Add curve fitting data
* Remove CurveFitAnalysis class and leave as library functions
* Linting
* Clean up curve_fit_data
* Remove unneeded check for meas level now that data processor function is an arg
* Add optional kwargs for x and series metadata keys for extracting data
* Run black
* Add filters to curve fit data
* Reorganize files
* Move curve_fit_data and multi_curve_fit_data to curve_fitting.py
* Move other processing functions to `data_processing.py`
* Remove unneeded init import
* Initial review edits
* Separate x and series data for multi curve fit; rename curve_fit_data to process_curve_data and multi_curve_fit_data to process_multi_curve_data
* Allow passing p0 and bounds as dict to curve fit
* Add sigma to mean_xy_data: if sigma is provided, the mean and variance are computed using inverse-variance weighting; otherwise they are computed as the sample mean and biased sample variance
* Set default absolute_sigma = True
* Fix logical check on np array
* Use reduced chi-squared calculation
* Fix up data processing
* Fix error in level2_probabilities multiplying by shots instead of dividing
* Add explicit method kwarg for mean_xy_data
* Add basic tests for curve fitting
* More review edits
* Change chisq -> reduced_chisq in result and docs
* Add dof (degrees of freedom) to returned results
* Add documentation about sigma by default being absolute sigma

Co-authored-by: Helena Zhang <helena@astbb.com>
* RB experiment class based on Chris' draft
* Unifying sampling procedures
* Some more unification
* Style
* Style fixes
* Analysis now works directly with curve_fit functions
* Added plotting (ad-hoc solution for now)
* Linting
* Adding error bars to plots
* Fixed link in readme (#15)
* Curve fitting analysis helper functions (#19)
* Example notebook
* Linting and documentation
* Some more linting
* Small change
* Linting

Co-authored-by: Yael Ben-Haim <yaelbh@il.ibm.com>
Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
Co-authored-by: Helena Zhang <helena@astbb.com>
Summary
Adds an analysis class wrapping scipy.optimize.curve_fit for reuse in other experiments. Adds helper functions for writing curve fitting analysis classes. These include functions for extracting single- or multi-series x, y, sigma data, fitting functions for single or multi curve fits, and plotting functions.
Details and comments
As an example of this being used for RB analysis, see this notebook.