Added a facet for array uncertainty propagation #1797

varchasgopalaswamy · 2023-06-06T01:06:03Z

I found pint quite useful in my research, but the one feature I've always wanted was units + uncertainty propagation + support for arrays. I made a small package that uses automatic differentiation to propagate uncertainties for all numpy functions, and I figured adding a facet for it in pint might be of interest! Let me know what you think.

MichaelTiemannOSC · 2023-06-07T10:36:35Z

Fascinating new package! Are you intending to obsolete the uncertainties package? I wrote a PR for supporting that: #1615

What I like about your approach is (1) nice modularity with facets, and (2) it uses ExtensionArray rather than wrappers. I think that since Pint and Pint-Pandas has gone all-in on ExtensionArrays, that's a great way to go.

But I wonder how you deal with parsing uncertainties as may inevitably be required when Quantities must be parsed, not only constructed. And is your package ready to do all the uncertainties tasks that would be required by Pint-Pandas?

hgrecco · 2023-06-07T12:06:31Z

Interesting PR. I think that @MichaelTiemannOSC questions are important, though.

varchasgopalaswamy · 2023-06-07T12:46:03Z

@MichaelTiemannOSC

How you deal with parsing uncertainties as may inevitably be required when Quantities must be parsed

By this I am guessing you mean converting a string "3 +- 5 um" into Uncertainty(3,5) * unit_registry('um')? If so, right now I don't think I do that. I think that's going to be a need for me as well down the line (my workflow manager serializes python objects into strings), so I'll likely implement that myself

And is your package ready to do all the uncertainties tasks that would be required by Pint-Pandas?

I don't think I got the ExtensionArray functionality working, primarily because I was having trouble making it work well with Pint normally - and then I saw the update with facets, and realized it was a lot more straightforward than I had thought! Again, having auto_uncertainties work with Pandas is something that's necessary for me so I expect I'll have that functionality in the near term.

Are you intending to obsolete the uncertainties package?

No, for a few reasons. First off, all the features above that you might want that aren't there yet. Second, it feels presumptuous without feedback from other users. Third, my package requires JAX for autodiff, and pip based binary installs of XLA only work on computers that support AVX instruction sets. Now AVX has been out for around a decade so I'm not sure if this is a big deal for the majority of users, but I have had to deal with older HPC systems where this was an issue.

If the PR is approved and users feel that they have everything they need, perhaps we can revisit this question then. Currently I view this PR as adding a bare minimum support for array based uncertainty propagation with units that we can build on going forwards.

andrewgsavage · 2023-06-07T13:08:18Z

I see you've used a single class for both scalar and arrays. I'd suggest having different classes for each will make it easier to write an ExtensionArray as pandas seems to be written with that in mind. It is possible to make it work as pint-pandas has shown, and it is improving, but there are still some pint-pandas features that don't currently work but would if pint split scalars and arrays. It would have saved many hours debugging too.

There are several bits of metadata you'll need to store with the ExtensionArray:

the numpy dtype you're storing the values and error (int, float...)
the error distribution type
the unit

This metadata is in addition to dealing with two bits of data (value and error) per scalar. I haven't yet seen an ExtensionArray that does this yet.

I'd suggest getting an ExtensionArray without units working as a first step, before dealing with storing and propgating units. I don't believe there's an ExtensionArray for uncertainties yet, so this would be valueable!

varchasgopalaswamy · 2023-06-10T18:29:36Z

But I wonder how you deal with parsing uncertainties as may inevitably be required when Quantities must be parsed, not only constructed.

@MichaelTiemannOSC the latest commits should add parsing support for scalar uncertainties. I looked through your PR #1615 to get an idea how to do it, but I thought this way seemed a bit easier - I simply added a few lines to string_preprocessor to turn '+/-', '+-' to the unicode "±" so that the existing tokenizer could handle it without needing lookaheads. I think that's fine since all the strings are treated as UTF-8, right?

I haven't tried to deal with array uncertainties, but I notice that pint can't parse array quantities in any case, so I didn't think that was vital to deal with right now. By this I mean

from pint import unit_registry 
import numpy as np 

x = np.asarray([2,3]) * unit_registry('s')
string_rep = str(x) 
# This is [2 3] s 
y = unit_registry.Quantity(string_rep)
# This is 6 s

I guess that's because the string gets tokenized to (2 * 3) * s?

MichaelTiemannOSC · 2023-08-30T08:46:52Z

I'm taking a look at your changes now--seeing how hard/easy it is for me to swap out what I did and swap in what you have done.

varchasgopalaswamy · 2023-08-30T14:17:54Z

I'm taking a look at your changes now--seeing how hard/easy it is for me to swap out what I did and swap in what you have done.

Let me know what you think. You probably need to install the version on GitHub rather than the released version on pypi; I've been adding better typing support.

We're using it in production internally and I've been pretty pleased with how smooth it's been, but I'd appreciate any feedback!

MichaelTiemannOSC · 2023-08-30T14:29:08Z

Yes, I am using a GitHub fork of your project (in case I need to suggest changes). So far it's looking very nice, but I'm still getting wires connected between how I used uncertainties and how I expect to us auto_uncertainties.

hgrecco · 2023-08-30T14:34:50Z

Awesome that this conversation is moving forward!

MichaelTiemannOSC · 2023-09-02T18:56:45Z

Here's my attempt to take stock of the various problems and solutions.

Python handles a range of numeric types (integer, float, complex), but math.isnan only works for int and float.

Numpy supports single- and multi-dimensional arrays. np.isnan operates on sclars and arrays. While one can construct numpy arrays holding arbitrary (object) types, numpy functions such as np.isnan raise errors if it cannot safely convert the data type of the scalar or array to a numeric type. (np.isnat similarly works only on datetime64 and timedelta64.)

The uncertainties package wraps both numeric and numpy types. Purely numeric types are overriden with ufunc functions that implement uncertainty math. Arrays of uncertainties are object arrays as far as Numpy is concerned, but when array operations are performed (such as adding together two arrays), the implicit map down the elements of the arrays activates the uncertainty ufunc to wrap the operation and produce an uncertainty result that can be rewrapped in an object Numpy array.

Numpy functions (such as isnan) are not overriden per se, but are implemented via unumpy versions, such as unumpy.isnan, which works on arrays of uncertainty values. And it's ugly, because any code that could touch uncertainties must check unp.isnan; np.isnan is not overriden the way that wrappers can override. And what's worse, when we consider Pandas arrays, which might use pd.NA instead of np.nan as their null value, we have to check pd.isna and unp.isnan (because unp.isnan doesn't know about pd.NA).

A unified Numpy/uncertainty interface would be nice!

Pint facets combine definitions (classes describing specific unit-related definitions), objects (that encapsulate behavior e.g. Context), and registry (implementing a subclass of PlainRegistry or class that can be mixed with it). The PlainRegistry implements the details of Quantity (which can be any numeric type or, if Numpy is available, array types as well). The NumpyRegistry brings together NumpyQuantity and NumpyUnit to specialize its definitions of Quantity and Unit.

The Measurement facet implements a scalar-only Quantity consisting of an uncertain magnitude and a unit. Quantities combined with Quantities yield Quantities, but Quantities combined with Measurements yield Measurements...except when Quantities contain array values, at which point Measurements are broadcast to the Quantity array:

>>> Q_([1, 2, 3], 'm') + M_(1, 2, 'm')
<Quantity([2.0+/-2.0 3.0+/-2.0 4.0+/-2.0], 'meter')>

Which means that like it or not, the Quantity facet does and will implement uncertainties within its facet.

In a sense the AutoUncertainties changes of this PR are what one would expect if we extended the Measurement facet to handle arrays itself, rather than punting those arrays back to the Quantity facet. And as we see, there's quite a bit of code and structure that has to be duplicated/reimplemented from the Quantity facet to give/(update) Measurements (to) the same functionality.

This gives us the potential niceness of being able to broadcast scalar Quantities to arrays of Measurements, Measurements to arrays of Quantities, and to do array operations on any combination of Quantities and Measurements (provided the shapes are compatible). Nice! But is it really so nice?

If Quantities can, by themselves, do all the work, what's the point of duplicating so much code to create a new thing that can do only half as much? One thing I've noticed is that AutoUncertainties allows uncertain values to be downcasted to nominal values whereas traditional uncertainties require users to explicitly ask for nominal values. I've seen the same sort of thing where some packages implementing complex numbers will just drop the imaginary component to automatically convert the complex number to a float, whereas other packages require the user to explicitly select the real component if that's what they want.

The big changes I think I have seen in the interfaces between Pint and Numpy/Pandas is that object arrays (Numpy) and ExtensionArrays (Pandas) are really solid when it comes to the kind of polymorphic behavior we seek to implement. But the more classes we have running around doing roughly the same things (such as Quantities and Measurements), the more we need to flesh out the combinatorial matrix. I can construct a PintArray from a sequence of Quantities, but not Measurements:

>>> PA_._from_sequence([Q_(1, 'm'), Q_(2, 'm')])
<PintArray>
[1, 2]
Length: 2, dtype: pint[meter]
>>> PA_._from_sequence([M_(1, 2, 'm'), M_(2, 3, 'm')])
*** ValueError: No dtype specified and not a sequence of quantities

Now, AutoUncertainties has a mostly-implemented Pandas EA for itself, but then we have to explain how PintArrays (which are grounded in Quantities) and UncertaintyArrays (grounded in Uncertainties) interoperate.

Finally, I made a Pull Request (hgrecco/pint-pandas#198) to start benchmarking PintArrays. The goal of that benchmarking is to both demonstrate what we consider to be the best, fastest implementation we can cleanly get using PintPandas, document that as a pattern, and ensure that we keep it running fast. New implementation ideas can compare themselves to that benchmark and see whether the new ideas really do help make things better, or instead just different (with possible performance penalties).

I acknowledge that the np.isnan vs. unp.isnan thing is ugly. But everything else about using uncertainties in Quantities looks better to me than creating a parallel university with Measurement (even if renamed Uncertainty).

codspeed-hq · 2023-09-13T14:07:56Z

CodSpeed Performance Report

Merging #1797 will not alter performance

_{Comparing varchasgopalaswamy:uncertainties (e3fc388) with master (ad02b87)}

Summary

✅ 437 untouched benchmarks

hgrecco · 2023-09-13T14:16:14Z

@MichaelTiemannOSC Thanks for the detailed explanation. I do agree with your view!

andrewgsavage · 2024-06-27T23:47:42Z

I am wondering a better approach is for pint to define an interface for uncertainty libraries, as both autouncertainties and uncertainties use almost the same variables and names. This would make it easier for potential users too as they'd be abe to use the same functiosn and methods from pint.

MichaelTiemannOSC · 2024-07-01T11:25:25Z

@andrewgsavage did you mean to say "as both autouncertainties and Measurements use almost the same variables and names."? I cannot quite parse your comment above. I did attend several Pandas community member meetings in hopes of advancing your ExtensionArray work. I encourage you to put the topic on the discussion agenda and try to attend one of those meetings.

andrewgsavage · 2024-07-01T11:35:09Z

meant to write 'as both autouncertainties and uncertainties '

ah I'd forgotten about that. was planning to make an EA for uncertainties as a seperate module without it being in pandas for the time being.

varchasgopalaswamy added 3 commits June 5, 2023 14:14

rename the first positional arg in _trapz to match numpy

4b885f5

added facet for auto uncertainties

5bad30a

fixed isinstance comparison

61c5fd0

added parsing, tests and a way to add errors directly

e49b894

varchasgopalaswamy added 6 commits June 10, 2023 14:40

amended setup to point to git for now

c374320

Merge branch 'hgrecco:master' into master

6c0505a

Merge branch 'hgrecco:master' into master

aa5d7a0

Merge branch 'hgrecco:master' into uncertainties

ed950ab

added uncertainties to typing

e4c7492

fix to type_check guard

e006eb6

MichaelTiemannOSC mentioned this pull request Aug 31, 2023

Uncertainty parsing considerations varchasgopalaswamy/AutoUncertainties#5

Open

MichaelTiemannOSC mentioned this pull request Nov 13, 2023

Add support for UFloat in PintArray (#139) hgrecco/pint-pandas#140

Open

5 tasks

varchasgopalaswamy and others added 3 commits February 1, 2024 10:37

Merge branch 'hgrecco:master' into master

bd13a5e

Merge branch 'hgrecco:master' into master

f9977ab

Merged updates from pint master

e3fc388

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a facet for array uncertainty propagation #1797

Added a facet for array uncertainty propagation #1797

varchasgopalaswamy commented Jun 6, 2023 •

edited

Loading

MichaelTiemannOSC commented Jun 7, 2023

hgrecco commented Jun 7, 2023

varchasgopalaswamy commented Jun 7, 2023 •

edited

Loading

andrewgsavage commented Jun 7, 2023

varchasgopalaswamy commented Jun 10, 2023

MichaelTiemannOSC commented Aug 30, 2023

varchasgopalaswamy commented Aug 30, 2023

MichaelTiemannOSC commented Aug 30, 2023

hgrecco commented Aug 30, 2023

MichaelTiemannOSC commented Sep 2, 2023

codspeed-hq bot commented Sep 13, 2023 •

edited

Loading

hgrecco commented Sep 13, 2023

andrewgsavage commented Jun 27, 2024 •

edited

Loading

MichaelTiemannOSC commented Jul 1, 2024

andrewgsavage commented Jul 1, 2024

Added a facet for array uncertainty propagation #1797

Are you sure you want to change the base?

Added a facet for array uncertainty propagation #1797

Conversation

varchasgopalaswamy commented Jun 6, 2023 • edited Loading

MichaelTiemannOSC commented Jun 7, 2023

hgrecco commented Jun 7, 2023

varchasgopalaswamy commented Jun 7, 2023 • edited Loading

andrewgsavage commented Jun 7, 2023

varchasgopalaswamy commented Jun 10, 2023

MichaelTiemannOSC commented Aug 30, 2023

varchasgopalaswamy commented Aug 30, 2023

MichaelTiemannOSC commented Aug 30, 2023

hgrecco commented Aug 30, 2023

MichaelTiemannOSC commented Sep 2, 2023

codspeed-hq bot commented Sep 13, 2023 • edited Loading

CodSpeed Performance Report

Merging #1797 will not alter performance

Summary

hgrecco commented Sep 13, 2023

andrewgsavage commented Jun 27, 2024 • edited Loading

MichaelTiemannOSC commented Jul 1, 2024

andrewgsavage commented Jul 1, 2024

varchasgopalaswamy commented Jun 6, 2023 •

edited

Loading

varchasgopalaswamy commented Jun 7, 2023 •

edited

Loading

codspeed-hq bot commented Sep 13, 2023 •

edited

Loading

andrewgsavage commented Jun 27, 2024 •

edited

Loading