Multi-part uncertainties for numbers #24

Closed
josephwright opened this issue May 6, 2013 · 27 comments
Labels: enhancement (New feature or request)

@josephwright
Owner

[Slightly edited from an e-mail from Robert Riemann]

In physics, the following syntax is sometimes used to indicate different types of errors: 2149 ± 46 ± 51. It would be nice to get that behaviour with something like

\num{10.55(18)(16)}
@ghost ghost assigned josephwright May 6, 2013
@josephwright
Owner Author

Looking at this again, I feel that this is really beyond the scope of siunitx: I cannot find an example of this anywhere I've looked. It can be achieved by hand:

\num[parse-numbers = false]{10.55 \pm 0.18 \pm 0.16}

and so I am closing this WONTFIX.
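
A minimal compilable sketch of that manual workaround, assuming nothing beyond a standard siunitx installation. With parsing switched off, siunitx passes the argument through to math mode as typed, so the digits are not rounded or regrouped and each uncertainty has to be written out in full:

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
% parse-numbers = false: the argument is typeset in math mode as given,
% so every \pm term must be written explicitly by the author.
\num[parse-numbers = false]{10.55 \pm 0.18 \pm 0.16}
\end{document}
```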

@josephwright
Owner Author

I now have some examples for this: page 10 of http://www.springerlink.com/content/545u2ml70u605x42/ and page 13 of http://www.springerlink.com/content/ml21044675647532/. There seem to be three cases:

  • Numbers with a statistical error only, given as $(1.23 \pm 4.5)$\,pb
  • Numbers with an asymmetric error only, given as $\left( 1.23 \substack{+4.4 \\ -5.5} \right)$\,pb
  • Numbers with both types of error, given as $\left( 1.23 \pm 4.5 (\text{stat.}) \substack{+4.4 \\ -5.5} (\text{sys.}) \right)$\,pb
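
The three cases can be collected into one small test document. This is only a sketch of the hand-typeset forms listed above; the one addition is loading amsmath, which supplies both \substack and \text:

```latex
\documentclass{article}
\usepackage{amsmath} % provides \substack and \text
\begin{document}
% Statistical error only
$(1.23 \pm 4.5)$\,pb

% Asymmetric error only
$\left( 1.23 \substack{+4.4 \\ -5.5} \right)$\,pb

% Symmetric statistical error plus asymmetric systematic error
$\left( 1.23 \pm 4.5 (\text{stat.}) \substack{+4.4 \\ -5.5} (\text{sys.}) \right)$\,pb
\end{document}
```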

@josephwright
Owner Author

I'm still struggling to come up with an interface for this. I think this is the sort of thing that would be best covered by my plans for v3, where I'd like to increase the separation of parts within the package to make 'pluggable' extension easier.

@joleroi

joleroi commented Sep 7, 2017

I came here from this rather old post on stackexchange. Has there been any development towards multi-part or asymmetric uncertainties?

@josephwright
Owner Author

I've considered it but am still not that happy: the internals needed to cope with such values are very complex and, for almost all use cases, are not needed. There's therefore a performance hit which I'm not keen on, plus a lot of work for me at the 'back end'. I'm also concerned that it ends up mixing concepts: the (...) value in siunitx has always conceptually been an uncertainty.

@alexshpilkin

@josephwright For an example, try any recent results report in hep-ex. Literally the most recent submitted paper mentions “0.67 ± 0.18 (stat) ± 0.05 (syst)”. [If there were more than one result of this kind in the paper, the (stat) and (syst) labels would most probably appear “out of band” in the surrounding text.] I recently looked at an astrophysics paper that had four uncertainties (perturbative calculation, numerical modelling, averaging, instrument).

Note that while it is perhaps correct to say (about this as well as about #273) that the people who request such advanced features are perhaps a minority, the number of such uses might not be that insignificant (the RPP alone is about two thousand pages). And they care a lot. I mean, I’d like to recommend siunitx to my experimenter friends as the typographically correct way, but I can’t, because it doesn’t satisfy their needs.

(To be honest, the 123(4) notation also looks rather specialized to me—even more specialized—, but that might just be my bias towards scientific rather than engineering literature.)

@josephwright
Owner Author

@alexshpilkin Sure, the amount of use of a particular style of output is hard to judge: my impression to date is that multiple uncertainties are common in astrophysics, but not elsewhere. (The only examples I've ever been sent are from that area.)

On the 123(4) format, it's common enough to be mentioned in the BIPM documentation (https://www.bipm.org/en/publications/si-brochure/section5-3-5.html): ultimately that's the reference for SI units, and so for siunitx.

I've not closed the issue precisely because I know it's important. At the moment, I'm imagining I'll need to look at a swap-out parser, etc. (I've already got to cover complex numbers, exponents, multi-part numbers, ...: it's a tricky mix!)

@alexshpilkin

alexshpilkin commented Jun 13, 2018

@josephwright Well, the paper I referenced is in high-energy (collider) physics (which is where I first encountered this as well). The th/model/stat/syst split is actually quite common there when the experiment is complex enough (and each of these parts may well be asymmetrical as per #273, except the statistical one).

As to the 123(4) notation, well, SI itself is essentially an engineering system—in the sense that it’s best at dealing with mostly everyday values, so e.g. chemistry also counts as engineering here. (The most frequent sources of the parenthetical notation in my experience are actually chemists, with tables of constants in the second place.) It’s not a bad thing, it’s just useful to know what informed its design (and, it seems, documentation) and understand its limits.

I hear you on parsing in TeX, it’s surprisingly painful for what’s essentially a macro language. It’s not a simple problem you solve, and if you don’t consider this issue to be unimportant, then I’m fine just pointing you at arXiv’s hep-ex as another source of examples.

@josephwright
Owner Author

@alexshpilkin Hmm, the need or at least possibility of 'open ended' lists of uncertainties is itself a bit tricky. I wonder if I can come up with some 'container' syntax, for example multi-part-uncertainty = true, which then allows a 'pluggable' parser just for that part. I could go with something like [ ... ] for such multi-part uncertainties:

\num{1.2[\pm 1.8(stat) \pm 2.1 (syst)]}

I'd then need an interface for creating 'sub parsers' and 'sub printers' for such things: that would address my concerns over ordering. It still looks a bit awkward but it might be workable.

@josephwright
Owner Author

Carrying forward some ideas from #273 (closed as a duplicate of this question), there are essentially three things which need to be done here:

  1. Ensure that the internal number format has flexibility in the nature of an uncertainty
  2. Provide one or more parsers for the various types of multi-part uncertainty
  3. Provide printing routines for the various types of multi-part uncertainty

I'll use this issue as a 'meta' one, and open specific issues for each of those ideas.

@josephwright
Owner Author

I've now implemented the necessary data storage in v3: see #342. Writing print routines will likely be easy enough, so those might also get done for v3.0. The issue will be parsers: I'm currently thinking of perhaps having uncertainty-mode and using that to determine what type of uncertainty to look for.

@maxnoe

maxnoe commented Mar 18, 2019

I think most of us will be very happy with a new macro; that should make it much easier to come up with a good interface. E.g.

\SIAsymUncert{value}{lower}{upper}{unit}
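
A macro with that signature can be approximated by hand today. The sketch below is a hypothetical user-level definition, not part of siunitx: the name \SIAsymUncert and its argument order come from the suggestion above, while the optional first argument and the use of the v3 \unit command are assumptions made for illustration. Each part is passed through \num separately, so siunitx does not know the three numbers belong together and will not align their decimal places.

```latex
\documentclass{article}
\usepackage{siunitx}
% Hypothetical helper, not provided by siunitx:
% \SIAsymUncert[<options>]{<value>}{<lower>}{<upper>}{<unit>}
\NewDocumentCommand{\SIAsymUncert}{O{} m m m m}{%
  % value with the upper bound as superscript and the lower bound as subscript
  \ensuremath{\num[#1]{#2}^{+\num[#1]{#4}}_{-\num[#1]{#3}}}%
  % thin space, then the unit
  \,\unit[#1]{#5}%
}
\begin{document}
\SIAsymUncert{1.23}{0.05}{0.04}{\metre}
\end{document}
```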

@BoostCookie

Because you've closed #273 and #342, I'm writing here regarding asymmetric uncertainties. Because the symmetric uncertainty can be parsed as \num{number+-uncert}, I think the asymmetric uncertainty should be parsed as \num{number+upper-lower}.

@Phidica

Phidica commented Nov 18, 2020

If there is still sense in making suggestions about the syntax, then to mirror the \num{123(4)} style, which produces 123 ± 4, I would like to suggest extending what can appear inside of the parentheses. I guess I'll just show some examples of what I'm imagining:

\num{123(4,5,6)}                             -->   123 ± 4 ± 5 ± 6
\num{123(+4,-5)}                             -->   123^{+4}_{-5}
\num{123(+1,-2[stat],5[syst],6[any text])}   -->   123^{+1}_{-2} (stat) ± 5 (syst) ± 6 (any text)

By using a comma separated list we can enumerate any number of uncertainty sources. One downside is with ensuring that asymmetric uncertainties are always properly defined and have exactly one component with a + and one with a -.

Anyway, I don't know whether this kind of syntax parsing is easy or incredibly difficult with the package as it stands. Just wanted to voice how this feature looks in the ideal world of my imagination :p

@josephwright
Owner Author

@Phidica An interesting idea and perhaps one to pursue, although I'm not 100% sure about trying to freely mix symmetrical and asymmetrical uncertainties (I need to have some internal structures to print things).

@josephwright
Owner Author

josephwright commented Apr 26, 2021

To update everyone, my current plan is to take small steps. I think I'm going with uncertainty-type (I need uncertainty-mode elsewhere, and in a sense it doesn't quite fit the other mode uses, which are more output-oriented).

The first 'new' type of uncertainty will, I think, be a single asymmetric one, so something like uncertainty-type = single-symmetrical for the current approach and uncertainty-type = single-asymmetrical for the 12.3+4-5 type. I don't fancy 'auto-detection' between the two. That would probably mean an input syntax 12.3(4)(5) would be hard-coded as equivalent to 12.3 +4 -5 in this case. Output can then be a straight copy of the input or 12.3^{+4}_{-5}.

I can then look at more open-ended types. @Phidica's suggestion for non-bounded lists is interesting, but I do wonder if that's common. It's also a lot easier at the internal level if I know how many components I'm handling. I wonder if the stat/sys split needs to have free text in the input, or could be covered by uncertainty-parts with then uncertainty-type = named-symmetrical or named-unsymmetrical (number of parts required then taken from uncertainty-parts). I guess that depends on whether the same names always turn up: do I need to cope with 'This value has a sys and a stat, this value only has a stat'?

For those interested, the internal format at the moment uses {S}{nnn} to represent the symmetrical value. I'm thinking of {A}{{nn}{mm}} for a single asymmetrical, then {S2}{{nnn}{nnn}} for a two-part symmetrical, etc. That way internally the code won't care about the naming: they are just 'a list in order'.

@maxfl

maxfl commented Apr 27, 2021

stat/syst is not the only possibility for the uncertainties. I've met the following cases:

  • triplets of stat/syst/theory were used;
  • asymmetric stat and syst uncertainties, here;
  • in the case of error budget estimation, the groups may be arbitrary (detector, background, etc.). It is worth noting that when the number of uncertainties is larger than 3, they are usually typeset in a table;
  • different labels are used in papers: stat, stat., syst, syst., (stat), (stat.), etc.

My personal impression is that a split into 2-3 groups is used most often. The labels vary.

Hope this helps.

@josephwright
Owner Author

I'm working on an implementation for this area. Looking again at the parser problem, I suspect @Phidica's idea slightly modified is best. I'm imagining

\num{1.23(+1:-2;5;6)}

which will result in something like

1.23 \substack{+0.01 \\ -0.02} \pm 0.05 \pm 0.06

or similar. I think 'labelled' uncertainties are best handled by having an option uncertainty-classes or similar, so if that is set then we take the label from there

\num[uncertainty-classes = sys;stat]{1.23(+1:-2;5;6)}

or

\num[uncertainty-classes = {sys,stat}]{1.23(+1:-2;5;6)}

That leaves open how to best give the uncertainty parts. One might use the approach I've suggested above or might prefer

\num{1.23(+0.01:-0.02;\pm0.05;\pm0.06)}

perhaps then allowing a 'mix'

\num{1.23(+0.01:-0.02;5;\pm0.06)}

where with no leading \pm the uncertainty is treated like the current bracketed ones (given in the last places).

I'll probably try to come up with something for beta testing over the next couple of weeks.

@maxfl

maxfl commented Mar 25, 2022

I like the proposed solution with no \pm, but the mixture is also ok.

@Phidica

Phidica commented Mar 25, 2022

Having the flexibility for either syntax seems good for different user preferences. Controlling the labels with an option is also a good, clean approach that I like.

What were you thinking should happen if only one "class" name has been set in the preamble? Would it show up on all uncertainties, even simple (ie, single-part) ones? Or should there need to be at least two class names set, by definition of the circumstance of needing multi-part uncertainties?

@josephwright
Owner Author

> Having the flexibility for either syntax seems good for different user preferences. Controlling the labels with an option is also a good, clean approach that I like.

I have the parser code to build on, so I hope I can pull this off - it's a question of tracking the data internally correctly.

> What were you thinking should happen if only one "class" name has been set in the preamble? Would it show up on all uncertainties, even simple (ie, single-part) ones? Or should there need to be at least two class names set, by definition of the circumstance of needing multi-part uncertainties?

I was thinking something like this

  • If there is a single uncertainty (either a symmetrical or an asymmetrical), ignore any classes - so 1.23 \pm 0.04 prints the same as now
  • If there are multiple uncertainties, take the 'labels' in order, so with uncertainty-classes = sys;stat and 1.23(4;5;6) you'd get 1.23 \pm 0.04 \, (sys) \pm 0.05 \, (stat) \pm 0.06, i.e. if there are more uncertainties than labels, the remaining values are anonymous

(Implied there is some setting to decide how to format the classes)

@josephwright
Owner Author

Continuing to think, I'm not keen on \num{1.23(\pm0.04;+0.05:-0.06)} as that confuses the existing 'short' and 'long' syntaxes. So I think it needs to be \num{1.23 \pm 0.04 + 0.05 - 0.06} or \num{1.23(4;5:-6)} or similar. The only question then, for the 'short' form, is whether it is better to have \num{1.23(4)(+5:-6)} or \num{1.23(4;+5:-6)} or ... I'm thinking the second form, i.e. (...) is 'the entire uncertainty part'. I think overall I do want +...:-... explicitly for asymmetric uncertainties.

@Phidica

Phidica commented Mar 26, 2022

I will say that mentally parsing the difference between the semicolons and colons when they're all in one big set of parentheses does take some focus, I think. In practice I'd probably be wanting to put whitespace around them so I can read them in my code. More fully "encapsulating" each uncertainty part in a different set of parentheses seems a lot more readable at a glance without needing to pad them out with spaces, if you're set on keeping the colon as the asymmetric separator (and I do like it for that). It also still feels fairly consistent with the existing design: one set of parentheses = one \pm uncertainty, therefore more parentheses in sequence = more uncertainty parts.

@josephwright
Owner Author

I've closed sub-issue #344 with working code. What I don't have there yet is an interface for adjusting how multi-part uncertainties are printed. I will probably do that after sorting out the parser extension, at which point users can test.

For the present, if you want to check out the new code as far as it works, try something like

\documentclass{article}
\usepackage{siunitx}
\begin{document}
\ExplSyntaxOn
% One "A" uncertainty, one "S" one
% "A" = +75:-80, "S" = 15
% Likely input syntax \num{123.456(75:80)(15)}
\tl_set:Nn \l_tmpa_tl
  { { } { } { 123 } { 456 } { {AS} { {75} {80} } {15} } { }{ 0 } }
\exp_args:Nx \siunitx_print_number:n
  { \siunitx_number_output:N \l_tmpa_tl }
\ExplSyntaxOff
\end{document}

@josephwright
Owner Author

I have

\documentclass{article}
\usepackage{siunitx}
\begin{document}
\num[uncertainty-descriptors = {sys,stat}]{1.23(4)(5)}
\end{document}

working. Next is likely the 1.23 \pm 0.04 \pm 0.05 format, then I'll look at asymmetrical values (I have the formatting all ready, it's just the parsing).
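
For anyone testing, a slightly extended sketch of the same input is given here. It assumes a siunitx release that includes the multi-part support described in this thread, and it additionally assumes that uncertainty-mode accepts the value separate to expand each bracketed part into its own \pm term; check both against the documentation of the installed version.

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
% Compact input: one bracketed uncertainty per part, descriptors taken in order
\num[uncertainty-descriptors = {sys,stat}]{1.23(4)(5)}

% Assumption: uncertainty-mode = separate prints each part as a \pm term
\num[
  uncertainty-mode        = separate,
  uncertainty-descriptors = {sys,stat}
]{1.23(4)(5)}
\end{document}
```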

@josephwright
Owner Author

The parser for 1.23 \pm 0.04 \pm 0.05 is now sorted. I'm now going to tidy up some aspects of that before even thinking about asymmetric values. In particular, I realise that one needs to worry about ambiguous number detection, which means I likely can't simply ignore uncertainty-mode.

@josephwright
Owner Author

I am pushing to v3.2 for the asymmetrical aspect: I want to get some real usage of the multi-part symmetrical system first.
