Uncertainties on Spectrum.data #43

bplimley · 2017-04-04T18:36:04Z

Closes #23.

If data input is an array of ints/floats, the data uncertainties automatically generated using sqrt(data), but with a minimum uncertainty of 1. Data can also be input as an array of uncertainties.ufloats in which case the uncertainties are not touched.

New Spectrum methods: data_vals, data_uncs

Spectra can be multiplied / divided by a factor that has uncertainty.

jccurtis · 2017-04-04T19:30:30Z

becquerel/core/spectrum.py

@@ -50,7 +51,12 @@ def __init__(self, data, bin_edges_kev=None):

        if len(data) == 0:
            raise SpectrumError('Empty spectrum data')
-        self.data = np.array(data, dtype=float)
+        if isinstance(data[0], UFloat):
+            self._data = np.array(data)


Maybe do this with np.issubdtype(data.dtype, UFloat)?

@jccurtis I like the issubdtype format but unfortunately:

In [140]: np.issubdtype(float, UFloat) Out[140]: True In [141]: np.issubdtype(int, UFloat) Out[141]: True In [144]: np.issubdtype(str, UFloat) Out[144]: True In [146]: np.issubdtype(np.ndarray, UFloat) Out[146]: True In [147]: np.issubdtype(bool, UFloat) Out[147]: True

What the heck, UFloat? Why are you so inclusive of everyone?

Boooo, okay this is unfortunate. We could do:

all([isinstance(d, UFloat) for d in data])

We could also keep the simple error check for speed. Ultimately we should make a Spectra class for large quantities of spectra. This takes ~1ms on my desktop for a 8192 channel spectrum.

chunhochow · 2017-04-04T19:31:33Z

becquerel/core/spectrum.py

+            self._data = np.array(data)
+        else:
+            unc = np.sqrt(data)
+            unc[unc == 0] = 1


Sorry dislike mutable variables. use maximum()?

bplimley · 2017-04-04T19:36:58Z

I should check the docstrings and add info about any assumptions made, e.g. on Spectrum.__init__().

jccurtis · 2017-04-05T22:54:18Z

becquerel/core/spectrum.py

@@ -39,7 +42,11 @@ def __init__(self, data, bin_edges_kev=None, input_file_object=None):
        """Initialize the spectrum.


Grrrr, this is for the previous line ...

Can we add another kwarg for uncs (or something else appropriate) which defaults to None but allows the user to pass in uncertainties without explicitly using the uncertainties package?

Very good idea. Done

jccurtis

Looks good now! There is some inefficiency with the uarray type checking but it is not critical for general use.

bplimley · 2017-04-07T05:34:13Z

So I didn't merge this yet because I wanted to think about how uncertainties work for energycal. Ideally, the uncertainty in a fit is represented by an uncertainty in the channel value of the feature, and this is passed into the calibrator for weighting the calibration curve fit. But we should also allow the user to enter a calibration point without uncertainty (in which case points are weighted equally... if there is a mix of uncertainties and non-uncertainty values, then I don't know).

I'm thinking the channels property of the EnergyCal will return an array of UFloats, and we add ch_vals and ch_uncs similar to the properties in Spectrum.

bplimley

To review where uncertainties are used:

Spectrum.data always has uncertainties on it
EnergyCalBase.channels is always UFloat type but may have np.nan uncertainties representing unknown uncertainty. If all uncertainties are nan they are weighted equally in the fit. If some are nan they are weighted equal to the average of the non-nan points. Not sure this is the best approach.
The scaling factor in spectrum multiplication/division can optionally have uncertainty. (I'm not sure the use case for this, but it is pretty simple.)

bplimley · 2017-04-08T06:51:44Z

becquerel/core/energycal.py

+          ch_unclist (optional): list/tuple/array of uncertainties of the
+            channel values
+          kevlist: list/tuple/array of the corresponding energy values [keV]
+          pairlist: list/tuple/array of paired values, (ch, kev)

        Raises:
          BadInput: for bad pairlist, chlist, and/or kevlist.
        """


The code of from_points is getting a little messy due to input handling. I can break the input handling out into another (private) method if that would be preferred. I don't see how to actually simplify the code, unless we reduce the checks that are currently done:

Allow calpoints to be specified by a list of pairs, or a pair of lists

Allow inputs of np.array in addition to list/tuple

see each of the BadInput exception descriptions for other situations that are checked. There are 5 different errors and only 1 is duplicated in two places (for pairlist input and chlist/kevlist input).

The first point might not be strictly necessary. We could eliminate the pairlist input style, which would simplify the logic and eliminate a couple of the exception cases.

bplimley · 2017-04-08T06:54:53Z

becquerel/core/energycal.py

        """

-        return np.array(list(self._calpoints.values()), dtype=float)
+        return np.array(list(self._calpoints.values()))
+


I modeled these properties on Spectrum.data, Spectrum.data_vals, Spectrum.data_uncs.

bplimley · 2017-04-08T07:04:30Z

becquerel/core/energycal.py

-        self._set_coeff('b', b)
-        self._set_coeff('c', c)
+        # normally channel is the independent variable.
+        # but uncertainty is on channel, not energy. so fit the inverse


If I'm not mistaken, the weights in a fit should correspond to the y-error rather than the x-error, since least squares is based on the y-residuals. Unfortunately, in an energy calibration fit, the x-values (channel) are the ones with uncertainty.

Three ways forward:

Fit the inverse of calibration curve equations, so that the x-error becomes the y-error and can be used to weight the fit. The quadratic case may be doable, I was looking at the quadratic equation but it's late. But more complex equations (like NaI light yield) will not invert nicely.

Ignore the uncertainty in channel value. (At least for non-invertable equations.)

Use the x-error to weight the y-residuals, even though it's wrong. If the calibration curve is mostly linear, I think this is a reasonable approximation.

I've done approach 1 for the linear cal here because it's easy enough.

bplimley · 2017-04-08T07:07:32Z

becquerel/core/utils.py

+        return unumpy.uarray(x_array, default_unc_func(x_array))
+
+
+def handle_unc(x_val, x_unc, default_unc):


There is some overlapping code between handle_uncs and handle_unc (the former handling lists/arrays, and the latter handling scalars). Should I combine the two functions?

I think the two should be combined to prevent the need for external type checking.

Well, currently the Spectrum and EnergyCal methods I've written require either a single value or a list/array of values, rather than accepting either. So rather than requiring type-checking in Spectrum or EnergyCal, by calling one or the other of these handle_unc[s] methods, it does the type-checking for the external method.

They could still be combined and the same functionality could be obtained by an optional kwarg in the new handle_unc.

bplimley · 2017-04-08T07:09:03Z

becquerel/core/utils.py

+            return False
+
+
+def handle_uncs(x_array, x_uncs, default_unc_func):


This code was moved here because both Spectrum and EnergyCalBase are doing the same thing in terms of handling uncertainty inputs. Is core/utils.py a good path/name?

👍 for the path name

bplimley · 2017-04-08T07:16:05Z

becquerel/core/energycal.py

+        if nan_weights.all():
+            sig = np.ones_like(nan_weights)
+        if nan_weights.any():
+            sig[nan_weights] = np.nanmean(sig)


Actually, if we want to make this behavior standard across calibration curves, this nan weight handling should be in the base class, I just realized.

bplimley · 2017-04-08T07:18:43Z

@jccurtis you should take another look at this when you have a chance. We can go over it together if you'd like. I have a number of points for discussion, in my line comments.

Too much science. Can't handle the science...

bplimley · 2017-04-13T00:55:56Z

I'm taking the energycal-related stuff out of this pull request, so that we can merge the basic uncertainties stuff more easily. (The other code is preserved in feature-energycal-uncertainties for later work.)

The "outdated" in-line comments in energycal.py and utils.py are on commits that are no longer a part of this pull request, so disregard them.

bplimley · 2017-04-13T00:58:57Z

@jccurtis the quantity of science in this PR should be manageable now. It is what we looked at together last week, when you approved, plus just a clean merge of the EnergyCal module from master branch.

Approve again and merge please?

jccurtis

Good call moving the ecal uncs to #45

jccurtis · 2017-04-13T17:55:22Z

@bplimley ready to delete the uncertainties branch?

bplimley · 2017-04-13T18:02:11Z

Thanks @jccurtis!

bplimley added 2 commits March 24, 2017 15:01

core.spectrum: implement uncertainties. existing tests pass

3d79512

core_test.py tests uncertainties, core.spectrum bugfix

4452eec

bplimley added core new feature labels Apr 4, 2017

requirements.txt uncertainties package

ba03e33

jccurtis reviewed Apr 4, 2017

View reviewed changes

chunhochow reviewed Apr 4, 2017

View reviewed changes

bplimley added 5 commits April 4, 2017 14:41

Spectrum.__init__ address PR comments

937044d

Merge branch 'master' into uncertainties

3931a59

core.spectrum docstrings reflect uncertainties

b92640f

core.spectrum revert np.issubdtype edit

6dfe036

core_test.py test construction with UFloats

aa2f423

jccurtis reviewed Apr 5, 2017

View reviewed changes

core.spectrum.Spectrum.__init__: uncs kwarg and tests

fad0536

jccurtis previously approved these changes Apr 6, 2017

View reviewed changes

Merge branch 'master' into uncertainties

6950f74

bplimley commented Apr 8, 2017

View reviewed changes

jccurtis mentioned this pull request Apr 11, 2017

Counts, CPS, CPS/keV #38

Closed

bplimley force-pushed the uncertainties branch 2 times, most recently from 5362123 to 6950f74 Compare April 13, 2017 00:51

bplimley mentioned this pull request Apr 13, 2017

Uncertainties in energy calibration points #45

Open

jccurtis approved these changes Apr 13, 2017

View reviewed changes

jccurtis merged commit bc9eb9b into master Apr 13, 2017

bplimley deleted the uncertainties branch April 13, 2017 18:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uncertainties on Spectrum.data #43

Uncertainties on Spectrum.data #43

bplimley commented Apr 4, 2017 •

edited

Loading

jccurtis Apr 4, 2017

bplimley Apr 4, 2017

jccurtis Apr 5, 2017

chunhochow Apr 4, 2017

bplimley commented Apr 4, 2017

jccurtis Apr 5, 2017

bplimley Apr 5, 2017

jccurtis left a comment

bplimley commented Apr 7, 2017

bplimley left a comment •

edited

Loading

bplimley Apr 8, 2017

bplimley Apr 8, 2017

bplimley Apr 8, 2017

bplimley Apr 8, 2017

jccurtis Apr 10, 2017

bplimley Apr 10, 2017

bplimley Apr 8, 2017

jccurtis Apr 10, 2017

bplimley Apr 8, 2017

bplimley commented Apr 8, 2017

bplimley commented Apr 13, 2017

bplimley commented Apr 13, 2017 •

edited

Loading

jccurtis left a comment

jccurtis commented Apr 13, 2017

bplimley commented Apr 13, 2017

		@@ -39,7 +42,11 @@ def __init__(self, data, bin_edges_kev=None, input_file_object=None):
		"""Initialize the spectrum.

		return unumpy.uarray(x_array, default_unc_func(x_array))


		def handle_unc(x_val, x_unc, default_unc):

		return False


		def handle_uncs(x_array, x_uncs, default_unc_func):

Uncertainties on Spectrum.data #43

Uncertainties on Spectrum.data #43

Conversation

bplimley commented Apr 4, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bplimley commented Apr 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jccurtis left a comment

Choose a reason for hiding this comment

bplimley commented Apr 7, 2017

bplimley left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bplimley commented Apr 8, 2017

bplimley commented Apr 13, 2017

bplimley commented Apr 13, 2017 • edited Loading

jccurtis left a comment

Choose a reason for hiding this comment

jccurtis commented Apr 13, 2017

bplimley commented Apr 13, 2017

bplimley commented Apr 4, 2017 •

edited

Loading

bplimley left a comment •

edited

Loading

bplimley commented Apr 13, 2017 •

edited

Loading