
Conversation


@chriseclectic chriseclectic commented Mar 17, 2021

Summary

Adds an analysis class wrapping scipy.optimize.curve_fit for reuse in other experiments.

Adds helper functions for writing curve fitting analysis classes. These include functions for extracting single or multi-series x, y, sigma data, fitting functions for a single or multi curve fit, and plotting functions.
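As a rough sketch of the kind of single-series fit the wrapper targets (the model, data, and initial guess below are illustrative assumptions rather than the PR's code; only scipy.optimize.curve_fit itself is the real API):

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative single-series model: exponential decay, as in T1/RB-style fits.
def model(x, a, alpha, b):
    return a * alpha**x + b

xdata = np.arange(1, 50)
ydata = model(xdata, 0.5, 0.98, 0.5)  # noiseless synthetic data

popt, pcov = curve_fit(model, xdata, ydata, p0=[0.4, 0.9, 0.6])
popt_err = np.sqrt(np.diag(pcov))  # 1-sigma errors from the covariance diagonal
```

The wrapper described in the PR additionally packages popt, pcov, and derived quantities into an AnalysisResult object.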

Details and comments

As an example of this being used for RB analysis see this notebook


zlatko-minev commented Mar 17, 2021

I can suggest lmfit as a really good and flexible package we've used; it has very nice models. The main thing is to have a function to make initial guesses. lmfit has some guesses built in, but in my experience, for custom functions and models one ideally wants to use a custom guess function.

lmfit builds on and extends many of the optimization methods of scipy.optimize, and it has been mature for some time. It handles many nonlinear fit methods and provides useful enhancements. The main things: parameter and results classes, and the minimize function interface for general models.

@nkanazawa1989

Just a quick comment: how easily can we extend this base class to analyze multiple experiments or a series of experiments, such as interleaved RB? I think having the data in a dataframe gives more flexibility. For example:
https://github.com/nkanazawa1989/qiskit-ignis/blob/3a472e13d93306e418bdeb2d0c6870a895372a82/qiskit/ignis/experiments/calibration/cal_base_analysis.py#L86-L104

I can write an MVP :)

@chriseclectic chriseclectic changed the title Add CurveFitAnalysis class [WIP] Add CurveFitAnalysis class Mar 18, 2021

yaelbh commented Mar 18, 2021

I'm also asking myself the same question as @nkanazawa1989. I expect something more flexible that allows multiple fittings and some processing of them within one analysis result. Some things of this sort are already supported via CompositeExperiment. Also, not every analysis must inherit from CurveFitAnalysis, but we can think of extracting the chisq function, so that CurveFitAnalysis as well as other analysis classes can call it.

@eggerdj eggerdj mentioned this pull request Mar 18, 2021

eggerdj commented Mar 19, 2021

For the calibration module we have our own base analysis class, which looks very similar: https://github.com/Qiskit/qiskit-experiments/pull/20/files#diff-746d34cd1c5ff8ee88c96d2914cacbf4c98b02d3b9acb3ba9c0958da5df5d0ac I'd be happy to subclass CurveFitAnalysis, but it needs to be able to handle multiple curves on the same plot (think of a Ramsey or a DRAG calibration).

@chriseclectic

After discussing offline a bit with Oliver, I want to rethink the metadata for some more complicated cases he suggested, including multiple objective functions. This is probably more along the lines of what you suggest, @nkanazawa1989.

@eggerdj @nkanazawa1989 @yaelbh before going any further with this PR, let's discuss more offline and try to come up with something that also includes everything you need for the curve fitting portion of the BaseCalibrationAnalysis class.

@chriseclectic

I refactored this into some library functions for curve fitting and plotting. These functions are:

  • curve_fit: a wrapper for scipy's curve_fit that returns an AnalysisResult object
  • multi_curve_fit: a function for fitting multiple functions to indexed data over the same optimization parameters. It handles linearizing the multi-objective function and calling curve_fit.
  • plot_curve_fit: generates a line plot with a shaded confidence interval from a curve fit result
  • plot_scatter: generates a scatter plot of (xdata, ydata) data
  • plot_errorbar: generates an errorbar plot of (xdata, ydata, sigma) data

The CurveFitAnalysis class puts them together for an example curve fit of count data, but it is mostly a placeholder, since it still needs work on model definition and data processing, which is being discussed and worked on in other PRs.
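The linearization that multi_curve_fit performs can be sketched as follows. The combined-objective trick below is an illustrative reconstruction, and all names are assumptions rather than the PR's actual implementation: per-point series indices dispatch each data point to its curve's function, so a single scipy curve_fit call optimizes the shared parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

# Two curves sharing parameters (a, b), e.g. two measurement quadratures.
funcs = [lambda x, a, b: a * np.cos(x) + b,
         lambda x, a, b: a * np.sin(x) + b]

x = np.linspace(0, 2 * np.pi, 20)
xdata = np.concatenate([x, x])
series = np.concatenate([np.zeros(len(x), dtype=int), np.ones(len(x), dtype=int)])
ydata = np.concatenate([funcs[0](x, 0.7, 0.1), funcs[1](x, 0.7, 0.1)])

def combined(xs, a, b):
    # Dispatch each point to its series' function: `series` indexes the curve.
    out = np.empty_like(xs)
    for i, f in enumerate(funcs):
        mask = series == i
        out[mask] = f(xs[mask], a, b)
    return out

popt, pcov = curve_fit(combined, xdata, ydata, p0=[1.0, 0.0])
```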

@eggerdj left a comment

Here are some comments to emphasize where the relations between this PR and #22 are located in the code.

@yaelbh yaelbh mentioned this pull request Mar 29, 2021
@chriseclectic chriseclectic changed the title [WIP] Add CurveFitAnalysis class Curve fitting analysis helper functions Apr 8, 2021
@chriseclectic

I updated this PR again to remove the actual CurveFitAnalysis class and include just the library of helper functions. I think this can be merged first for use in other classes (such as T1, RB), and writing an actual curve fit analysis base class can come later, after more discussion.

If sigma is provided, the mean and variance are computed using inverse-variance weighting; otherwise they are computed as the sample mean and biased sample variance.
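The inverse-variance weighting described above can be sketched in a few lines (a numpy-only illustration; weighted_mean is a hypothetical helper, not the PR's code):

```python
import numpy as np

def weighted_mean(ys, sigmas):
    """Inverse-variance weighted mean and the variance of that mean."""
    weights = 1.0 / np.asarray(sigmas) ** 2
    mean = np.sum(weights * np.asarray(ys)) / np.sum(weights)
    var = 1.0 / np.sum(weights)  # variance of the weighted mean
    return mean, var

# Equal sigmas reduce to the ordinary sample mean.
mean, var = weighted_mean([1.0, 2.0], [0.1, 0.1])
```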

# Check data set has at least one more value than model parameters
num_params = len(param_p0)
if len(ydata) <= num_params:


this should probably be num_params + 1; otherwise the reduced chi-square denominator len(ydata) - num_params can be zero and chisq diverges

xdata_range = [min(xdata), max(xdata)]

result = {
"popt": popt,


Question: if the user passed in a dict, should we return a dict here?

Collaborator Author


To avoid having two different return types, I left this as an array; if p0 was supplied as a dict, popt_keys will include the corresponding key labels for each param.

"""
# Format p0 parameters if specified as dictionary
if isinstance(p0, dict):
param_keys = list(p0.keys())


I think this will make the ordering of p0 as a vector unstable, which will probably mess someone up when they assume the covariance matrix is always ordered the same. Maybe sort alphabetically by key or something?

Collaborator Author


Currently the popt, popt_keys, popt_err, pcov ordering will be the ordering of dict.keys. The popt_keys field is intended to remove any ambiguity about which params these are. I don't think it makes much more sense to order alphabetically instead of using the order in which the dict was originally constructed, if p0 is a dict.
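For reference, since Python 3.7 dict preserves insertion order, so the key/value pairing is stable per construction. A minimal sketch of the conversion (dict_to_arrays is a hypothetical name, not the PR's code):

```python
def dict_to_arrays(p0):
    """Convert a p0 dict to (keys, values) with a stable, documented order."""
    param_keys = list(p0.keys())            # insertion order (Python 3.7+)
    param_p0 = [p0[k] for k in param_keys]  # values in the same order
    return param_keys, param_p0

keys, values = dict_to_arrays({"amp": 0.5, "tau": 100.0, "base": 0.5})
```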

* Fix error in level2_probabilities multiplying by shots instead of dividing
* Add explicit method kwarg for mean_xy_data
for i, datum in enumerate(filter_data):
metadata = datum["metadata"]
xdata[i] = metadata[x_key]
y_mean, y_var = data_processor(datum)
Collaborator


The data processor should be aware of whether sigma is absolute.

Collaborator


This one is not a mistake. The computation of sigma is different for different values of absolute_sigma.

Collaborator


@eggerdj Note the previous comment because it's relevant to #22

Collaborator Author


The choice of setting absolute_sigma = False/True should depend on how sigma was calculated, not the other way around. absolute_sigma is a property of the curve fit function governing how it calculates pcov (and hence how we deduce popt_err). According to the scipy docs, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt) / (M - N).

By defaulting to True we are assuming that sigma is usually specified in the same units as y, so in that sense we should make sure any data processors calculate sigma in the same units as y by default as well.
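The quoted scipy relation can be checked numerically; the linear model and data below are illustrative assumptions, while curve_fit's sigma and absolute_sigma parameters are the real API:

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a * x + b

x = np.arange(6, dtype=float)
y = 2.0 * x + 1.0 + np.array([0.1, -0.2, 0.15, -0.05, 0.2, -0.1])
sigma = np.full_like(x, 0.5)

popt_t, pcov_t = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
popt_f, pcov_f = curve_fit(line, x, y, sigma=sigma, absolute_sigma=False)

# Reduced chi-squared at the solution: M = 6 data points, N = 2 parameters.
red_chisq = np.sum(((line(x, *popt_t) - y) / sigma) ** 2) / (len(x) - 2)

# pcov(absolute_sigma=False) == pcov(absolute_sigma=True) * red_chisq
assert np.allclose(pcov_f, pcov_t * red_chisq)
```

Note that popt itself is identical in both calls; only the covariance scaling differs.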

residues = (yfits - ydata) ** 2
if sigma is not None:
residues = residues / (sigma ** 2)
chisq = np.sum(residues) / (len(yfits) - num_params)
Collaborator


Do we still want to name this field of the analysis result chisq, or change it to reduced_chisq? Also, maybe update the function documentation to state that the reduced chi-square is calculated.

Collaborator


We've had a discussion: if xdata consists of n points, and each was measured for s shots, should we fit n points, with the n means of s measurements, or should we fit n*s points? I argued that both are equivalent, because the objective function minimized by the least-squares algorithm is the same. This is still true (assuming I was right in the first place); however, after the fitting, the returned reduced chi-square is different.

Collaborator Author


For your second question: after discussing this a bit with @oliverdial, I think the answer is that they are only equivalent under some assumptions that the underlying noise models are the same, e.g. sampling noise vs. single-shot measurement noise, whether it is Gaussian or not, etc. In general the sigma values could be different in the two cases: P(0) for N shots follows a binomial distribution, but sigma for each single shot depends on the measurement discriminator error, so this can affect the chi-squared and pcov.
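The chi-square difference shows up even for the simplest constant-mean model. The numpy-only sketch below (with hypothetical n points and s shots) recovers the same fitted constant either way, but the unweighted per-dof residual differs because the number of points, and hence the degrees of freedom, differ:

```python
import numpy as np

rng = np.random.default_rng(42)
n, s = 5, 100                        # n x-points, s shots each (illustrative)
data = rng.normal(0.5, 0.1, size=(n, s))

# Least-squares fit of a constant c: the estimate is the overall mean either way.
c_full = data.mean()                 # fit all n*s points
c_means = data.mean(axis=1).mean()   # fit the n per-point means

# But the (unweighted) reduced chi-square differs, since dof = points - 1.
red_full = np.sum((data - c_full) ** 2) / (n * s - 1)
red_means = np.sum((data.mean(axis=1) - c_means) ** 2) / (n - 1)
```

The spread of per-point means is much smaller than the shot-level spread, so red_means comes out far smaller than red_full even though the fitted value is identical.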

* Change chisq -> reduced_chisq in result and docs
* add dof (degrees of freedom) to returned results
* add documentation about sigma by default being absolute sigma.

# Run linearized curve_fit
analysis_result = curve_fit(f, xdata, ydata, p0, sigma=wsigma, bounds=bounds, **kwargs)

Collaborator


The code of this function is difficult to follow in the absence of internal documentation. Also, in the docstring, the meaning of the series argument is not clear. The text at the beginning of the docstring implies that xdata, ydata, and sigma are expected to be 2D arrays, but it then turns out that they are 1D arrays, and the connection is not clear.

@yaelbh yaelbh mentioned this pull request Apr 18, 2021
@CLAassistant

CLAassistant commented Apr 19, 2021

CLA assistant check
All committers have signed the CLA.

@coruscating coruscating merged commit c9a355e into qiskit-community:main Apr 19, 2021
gadial pushed a commit to gadial/qiskit-experiments that referenced this pull request Apr 19, 2021
* Add CurveFitAnalysis class

* Get outcome label from metadata

* Fix import error message if no matplotlib

* Add option for fitting means of batched data sets

* Reorganize helper functions into separate file

* Add curve_fit, multi_curve_fit, and plotting functions for xy data

* Refactor example CurveFitAnalysis class to use library functions

* Add curve fitting data

* Remove CurveFitAnalysis class and leave as library functions

* Linting

* Clean up curve_fit_data

* Remove unneeded check for meas level now that data processor function is an arg
* Add optional kwargs for x and series metadata keys for extracting data

* Run black

* Add filters to curve fit data

* Reorganize files

* Move curve_fit_data and multi_curve_fit_data to curve_fitting.py
* Move other processing functions to `data_processing.py`

* Remove unneeded init import

* Initial review edits

* Separate x and series data for multi curve fit

Rename curve_fit_data to process_curve_data
Rename multi_curve_fit_data to process_multi_curve_data

* Allow passing p0, and bounds as dict to curve fit

* Add sigma to mean_xy_data

If sigma is provided, the mean and variance are computed using inverse-variance weighting; otherwise they are computed as the sample mean and biased sample variance.

* Set default absolute_sigma = True

* Fix logical check on np array

* Use reduced chi-squared calculation

* fixup data processing

* Fix error in level2_probabilities multiplying by shots instead of dividing
* Add explicit method kwarg for mean_xy_data

* Add basic tests for curve fitting

* More review edits

* Change chisq -> reduced_chisq in result and docs
* add dof (degrees of freedom) to returned results
* add documentation about sigma by default being absolute sigma.

Co-authored-by: Helena Zhang <helena@astbb.com>
coruscating pushed a commit that referenced this pull request Apr 20, 2021
* RB experiment class based on Chris' draft

* Unifying sampling procedures

* Some more unification

* Style

* Style fixes

* Analysis now works directly with curve_fit functions

* Added plotting (ad-hoc solution for now)

* Linting

* Adding error bars to plots

* fixed link in readme (#15)

* Curve fitting analysis helper functions (#19)

* Example notebook

* Linting and documentation

* Some more linting

* Small change

* Linting

Co-authored-by: Yael Ben-Haim <yaelbh@il.ibm.com>
Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
Co-authored-by: Helena Zhang <helena@astbb.com>
@yaelbh yaelbh mentioned this pull request May 2, 2021
@coruscating coruscating added this to the Release 0.1 milestone Jun 16, 2021
@chriseclectic chriseclectic deleted the curve-fit-analysis branch July 21, 2021 18:45
@yaelbh yaelbh mentioned this pull request Oct 3, 2021
paco-ri pushed a commit to paco-ri/qiskit-experiments that referenced this pull request Jul 11, 2022
paco-ri pushed a commit to paco-ri/qiskit-experiments that referenced this pull request Jul 11, 2022
10 participants