Correct stat uncertainty calculation #24
As @alexander-held has very helpfully pointed out on Gitter I want to be able to use
this can be seen working below

import numpy as np
import hist
from hist import Hist
import uproot4 as uproot

root_file = uproot.open("example.root")
values, edges = root_file["data"].to_numpy()
stat_uncert = np.sqrt(values)
_hist = Hist(
    hist.axis.Regular(len(edges) - 1, edges[0], edges[-1], name=None),
    storage=hist.storage.Weight(),
)
# Weight storage holds (value, variance) pairs per bin
_hist[...] = np.stack([values, np.square(stat_uncert)], axis=-1)
print(f"yields: {_hist.view().value}")
print(f"\nstat_uncert: {np.sqrt(_hist.view().variance)}")
@henryiii has confirmed this and mentioned that work is underway to support this in visualization in the near future. 🚀
@alexander-held also sent a nice minimal working example of stat uncertainty propagation being done properly by default. 👍

import boost_histogram as bh
import numpy as np

bins = [1, 2, 3]
yields = np.asarray([3, 4])
stdev = np.asarray([1, 2])
h1 = bh.Histogram(
    bh.axis.Variable(bins, underflow=False, overflow=False),
    storage=bh.storage.Weight(),
)
h1[...] = np.stack([yields, stdev ** 2], axis=-1)

yields = np.asarray([5, 5])
stdev = np.asarray([1, 1])
h2 = bh.Histogram(
    bh.axis.Variable(bins, underflow=False, overflow=False),
    storage=bh.storage.Weight(),
)
h2[...] = np.stack([yields, stdev ** 2], axis=-1)

# Adding Weight-storage histograms adds the variances bin by bin
for hist in [h1, h2, h1 + h2]:
    print(f"yields: {hist.view().value}, stdev: {np.sqrt(hist.view().variance)}")
This still seems to be an issue, as
and

# example.py
import hist
import numpy as np
import uproot
from hist import Hist


def convert_root_to_hist(in_hist):
    values, edges = in_hist.to_numpy()
    _hist = Hist(
        hist.axis.Regular(len(edges) - 1, edges[0], edges[-1], name=None),
        storage=hist.storage.Weight(),
    )
    # Poisson variance equals values
    variances = values
    _hist[...] = np.stack([values, variances], axis=-1)
    return _hist


if __name__ == "__main__":
    root_file = uproot.open("example.root")
    data_hist = root_file["data"].to_hist()
    print(f"data hist values: {data_hist.values()}")
    print(f"Poisson uncertainty on values: {np.sqrt(data_hist.values())}")

    print("\n# Unexpected results from .variances()...\n")
    print(f"data hist variances: {data_hist.variances()}")
    print(
        f"sqrt of data hist variances: {np.sqrt(data_hist.variances())}"
    )  # This is just the count values again, strangely

    print("\n# Building weighted from hand works...\n")
    _hist = convert_root_to_hist(root_file["data"])
    print(f"yields: {_hist.values()}")
    print(f"\nstat_uncert: {np.sqrt(_hist.variances())}")

gives
If a weight storage
Can you print a repr of the hist given by
Oh, is there no variance info for the uproot hist? So it's a Double storage but the automatic variances are not working?
>>> from hist import Hist
>>> import uproot
>>> root_file = uproot.open("example.root")
>>> data_hist = root_file["data"].to_hist()
>>> print(data_hist)
regular(50, 0, 1000, metadata={'name': 'xaxis', 'title': ''}, options=underflow | overflow)
(-1): 0 ( 0): 0 ( 1): 0 ( 2): 0 ( 3): 0 ( 4): 1
( 5): 16 ( 6): 95 ( 7): 272 ( 8): 459 ( 9): 591 (10): 674
(11): 658 (12): 623 (13): 531 (14): 456 (15): 389 (16): 321
(17): 254 (18): 194 (19): 161 (20): 120 (21): 95 (22): 74
(23): 60 (24): 53 (25): 38 (26): 27 (27): 19 (28): 20
(29): 15 (30): 11 (31): 10 (32): 14 (33): 5 (34): 6
(35): 3 (36): 7 (37): 2 (38): 6 (39): 1 (40): 3
(41): 2 (42): 3 (43): 0 (44): 0 (45): 1 (46): 0
(47): 0 (48): 0 (49): 0 (50): 0
>>> hist_2 = Hist(root_file["data"])
>>> print(hist_2)
regular(50, 0, 1000, metadata={'name': 'xaxis', 'title': ''}, options=underflow | overflow)
(-1): 0 ( 0): 0 ( 1): 0 ( 2): 0 ( 3): 0 ( 4): 1
( 5): 16 ( 6): 95 ( 7): 272 ( 8): 459 ( 9): 591 (10): 674
(11): 658 (12): 623 (13): 531 (14): 456 (15): 389 (16): 321
(17): 254 (18): 194 (19): 161 (20): 120 (21): 95 (22): 74
(23): 60 (24): 53 (25): 38 (26): 27 (27): 19 (28): 20
(29): 15 (30): 11 (31): 10 (32): 14 (33): 5 (34): 6
(35): 3 (36): 7 (37): 2 (38): 6 (39): 1 (40): 3
(41): 2 (42): 3 (43): 0 (44): 0 (45): 1 (46): 0
(47): 0 (48): 0 (49): 0 (50): 0
>>>
@henryiii Correct, heputils/tests/example_files.py Lines 248 to 263 in 9450913
but maybe that's my failing on how I'm writing the ROOT file with
Could you print a
Oh whoops. I was blanking out and forgot that I could just call

>>> from hist import Hist
>>> import uproot
>>> root_file = uproot.open("example.root")
>>> data_hist = root_file["data"].to_hist()
>>> repr(data_hist)
"Hist(Regular(50, 0, 1000, name='xaxis', label='xaxis'), storage=Weight()) # Sum: WeightedSum(value=6290, variance=2.83e+06)"
>>> hist_2 = Hist(root_file["data"])
>>> repr(hist_2)
"Hist(Regular(50, 0, 1000, name='xaxis', label='xaxis'), storage=Weight()) # Sum: WeightedSum(value=6290, variance=2.83e+06)"
>>> sanity_check = (
...     Hist.new
...     .Reg(10, 0, 1, name="x", label="x-axis")
...     .Var(range(10), name="y", label="y-axis")
...     .Int64()
... )
>>> print(sanity_check)
Hist(
  Regular(10, 0, 1, name='x', label='x-axis'),
  Variable([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
  storage=Int64())
>>> repr(sanity_check)
"Hist(\n  Regular(10, 0, 1, name='x', label='x-axis'),\n  Variable([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),\n  storage=Int64())"
Weird, why is
This is a weighted histogram, though, so I think that means something is getting squared when this is being stored. Uproot's variances are fine?
This should be directly converted from fSumw2 from uproot. I need to refactor this code to use the new (0.12+) direct view setting feature.
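A quick way to tell the two cases apart is to compare the reported variances against the counts and against the counts squared. The snippet below is only a minimal sketch: `classify_variances` is a hypothetical helper (not part of hist, boost-histogram, or uproot), and the counts are a few bin contents copied from the printout earlier in this thread.

```python
import numpy as np


def classify_variances(values, variances):
    """Hypothetical helper: guess how reported variances relate to bin counts."""
    values = np.asarray(values, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if np.allclose(variances, values):
        # variance == count: what is expected for an unweighted (Poisson) histogram
        return "poisson"
    if np.allclose(variances, values**2):
        # sqrt(variance) == count: the symptom reported in this thread
        return "squared"
    return "other"


# A few bin contents from the printed histogram (bins 5 to 8)
counts = np.array([16.0, 95.0, 272.0, 459.0])

result_ok = classify_variances(counts, counts)
result_bad = classify_variances(counts, counts**2)
print(result_ok, result_bad)
```

Bins with contents of 0 or 1 satisfy both conditions, so the check is only informative when at least one bin has a larger count.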
Given that the variances just from
it seems that there is something happening with
Discussion on the variances is moving to scikit-hep/uproot3-methods#97
This is solved in scikit-hep/uproot3-methods#98 and with the release of
In closing, this sort of thing works now (inside of

# example.py
import hist
import numpy as np
from hist import Hist

if __name__ == "__main__":
    bins = [1, 2, 3]
    yields = np.asarray([3, 4])
    stdev = np.asarray([1, 2])
    h1 = Hist(
        hist.axis.Regular(len(bins) - 1, bins[0], bins[-1]),
        storage=hist.storage.Weight(),
    )
    h1[...] = np.stack([yields, stdev ** 2], axis=-1)

    yields = np.asarray([5, 5])
    stdev = np.asarray([1, 1])
    h2 = Hist(
        hist.axis.Regular(len(bins) - 1, bins[0], bins[-1]),
        storage=hist.storage.Weight(),
    )
    h2[...] = np.stack([yields, stdev ** 2], axis=-1)

    # Loop variable is named h so it does not shadow the imported hist module
    for h in [h1, h2, h1 + h2]:
        print(f"yields: {h.values()}, stdev: {np.sqrt(h.variances())}")

giving
The stat uncertainty is currently being given as the Poisson uncertainty on the histogram
heputils/src/heputils/plot.py
Line 110 in 8822879
but for a stacked histogram this is incorrect, as it should be the sum in quadrature of the uncertainties on each of the histograms in the stack.
(Thanks to @msneubauer for the reminder on this)
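To make the difference concrete, here is a minimal NumPy-only sketch of the sum in quadrature for a stack. `stack_stat_uncertainty` is a hypothetical helper written for illustration (it is not the code in heputils/src/heputils/plot.py), and the yields and standard deviations are the toy numbers used elsewhere in this thread.

```python
import numpy as np


def stack_stat_uncertainty(variances_per_hist):
    """Hypothetical helper: per-bin stat uncertainty of a stack.

    Takes one array of per-bin variances for each histogram in the stack,
    sums the variances bin by bin, and returns the square root.
    """
    total_variance = np.sum(np.asarray(variances_per_hist, dtype=float), axis=0)
    return np.sqrt(total_variance)


# Toy inputs: two histograms with two bins each
yields_1, stdev_1 = np.asarray([3.0, 4.0]), np.asarray([1.0, 2.0])
yields_2, stdev_2 = np.asarray([5.0, 5.0]), np.asarray([1.0, 1.0])

# Sum in quadrature of the per-histogram uncertainties
stack_stdev = stack_stat_uncertainty([stdev_1**2, stdev_2**2])

# Poisson uncertainty on the summed yields, for comparison
poisson_stdev = np.sqrt(yields_1 + yields_2)

print(f"quadrature sum: {stack_stdev}")
print(f"Poisson on total: {poisson_stdev}")
```

The two agree only when every histogram in the stack has a variance equal to its yield in each bin; for weighted or scaled samples they generally differ.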