Skip to content
This repository has been archived by the owner on Oct 25, 2022. It is now read-only.

KeyError: 'sumw2' #29

Closed
LovelyBuggies opened this issue Mar 27, 2020 · 7 comments
Closed

KeyError: 'sumw2' #29

LovelyBuggies opened this issue Mar 27, 2020 · 7 comments

Comments

@LovelyBuggies
Copy link
Collaborator

LovelyBuggies commented Mar 27, 2020

@jpivarski There might be a bug: when I run the code

root_hist = aghast.to_root(ghastly_hist, "root_hist")

A KeyError is thrown. It says no key "sumw2" is found. I search <"sumw2"> in this file and find this is the only one.

sumw = obj.counts[tuple(slc)]
if isinstance(sumw, dict):
sumw, sumw2 = sumw["sumw"], sumw["sumw2"]
sumw2 = numpy.array(sumw2, dtype=numpy.float64, copy=False)
else:
sumw2 = None

P.S., I am using pip install aghast, so the package I'm using might be a little out of date. But I check the latest code and the latest PR #24 , there are no changes with respect to to_root() in this file.

@jpivarski
Copy link
Member

Here is where it's supposed to come from:

@property
def array(self):
out = {"sumw": self.sumw.array}
if self.sumw2 is not None:
out["sumw2"] = self.sumw2.array
if self.unweighted is not None:
out["unweighted"] = self.unweighted.array
return out

The bit of code you've quoted checks for the difference between a Counts, which can be represented by a single array, and a WeightedCounts, which is an array of "sumw" and an array of "sumw2".

Notice from the commit history that Aghast hasn't been touched in months. It will probably be summer or later when I can cycle back to it and do any development on it. If this method failed, then apparently it was inadequately tested and should be fixed. If you investigate this problem and fix it (probably just an object or a name out of place), then submit a PR and I'll accept it.

@LovelyBuggies
Copy link
Collaborator Author

LovelyBuggies commented Mar 28, 2020

@jpivarski I have found why it doesn't work. Actually, it's not aghast's mistakes, it has something to do with boost-histogram's storage -- boost-histogram issue #289. We don' need to worry about that because @henryiii is working on it -- boost-histogram PR #290. I will deep into that issue and this would be solved by the way.

+ h = bh.Histogram(bh.axis.Regular(20, -5, 5))
+ h.fill(np.random.normal(0, 1, int(10e3)))
+ w, data = h.to_numpy()
+ numpy_hist = (w.astype(int), data)         # this works
- numpy_hist = (w, data).                    # this is broken
+ ghastly_hist = aghast.from_numpy(numpy_hist)
+ root_hist = aghast.to_root(ghastly_hist, "bh_root_hist")

Btw, do you think aghast should support double or float storage, or we should add some tests for boost-histogram histograms (rather than just ROOT histograms) at "test_root.py"?

@jpivarski
Copy link
Member

Aghast's version of "storage" is a buffer, defined in the Flatbuffers specification here:

///////////////////////////////////////////////// buffers
enum DType: byte {
dtype_none = 0,
dtype_bool = 1,
dtype_int8 = 2,
dtype_uint8 = 3,
dtype_int16 = 4,
dtype_uint16 = 5,
dtype_int32 = 6,
dtype_uint32 = 7,
dtype_int64 = 8,
dtype_uint64 = 9,
dtype_float32 = 10,
dtype_float64 = 11
}
enum Endianness: byte {
little_endian = 0,
big_endian = 1,
}
enum DimensionOrder: byte {
c_order = 0,
fortran_order = 1
}
enum Filter: uint {
filter_none = 0,
filter_gzip = 1,
filter_lzma = 2,
filter_lz4 = 3
// not just compression; any 1-to-1 buffer transformation
}
struct Slice {
start: long;
stop: long;
step: int;
has_start: bool;
has_stop: bool;
has_step: bool;
}
enum ExternalSource: uint {
external_memory = 0,
external_samefile = 1,
external_file = 2,
external_url = 3
}
table RawInlineBuffer {
buffer: [ubyte] (required);
}
table RawExternalBuffer {
pointer: ulong;
numbytes: ulong;
external_source: ExternalSource = external_memory;
location: string;
}
table InterpretedInlineBuffer {
buffer: [ubyte] (required);
filters: [Filter];
postfilter_slice: Slice;
dtype: DType = dtype_none;
endianness: Endianness = little_endian;
dimension_order: DimensionOrder = c_order;
}
table InterpretedInlineInt64Buffer {
buffer: [ubyte] (required);
}
table InterpretedInlineFloat64Buffer {
buffer: [ubyte] (required);
}
table InterpretedExternalBuffer {
pointer: ulong;
numbytes: ulong;
external_source: ExternalSource = external_memory;
filters: [Filter];
postfilter_slice: Slice;
dtype: DType = dtype_none;
endianness: Endianness = little_endian;
dimension_order: DimensionOrder = c_order;
location: string;
}
union RawBuffer {
RawInlineBuffer,
RawExternalBuffer
}
union InterpretedBuffer {
InterpretedInlineBuffer,
InterpretedInlineInt64Buffer,
InterpretedInlineFloat64Buffer,
InterpretedExternalBuffer
}

There are simplified versions for int64 and float64 and maybe that's what you're seeing:

table InterpretedInlineInt64Buffer {
buffer: [ubyte] (required);
}
table InterpretedInlineFloat64Buffer {
buffer: [ubyte] (required);
}

But these complicating optimizations should go away when the Flatbuffers part is lowered into C++. (Right now, it's dominated by performance issues in Python.)

@LovelyBuggies
Copy link
Collaborator Author

LovelyBuggies commented Mar 28, 2020

Yep, the boost-histogram is thinking about to deprecate the 'float' storage in the PR I mentioned above, so can this be a problem?

these complicating optimizations should go away when the Flatbuffers part is lowered into C++.

I think we don't need to make some changes for aghast now, the bh.Histogram will be converted nicely once 'float' is not allowed in boost-histogram anymore.

Actually, it's not aghast's mistakes

If you really want to tolerant 'float' storage for now, maybe we only need to judge and add .asarray(int) to those broken in to_root().

@jpivarski
Copy link
Member

Aghast has to be able to express a superset of what each library is capable of—it can be able to express more. If boost-histogram discontinues float32, then we just don't need to convert that type anymore. If it's being used to convert to a ROOT TH1F (float32 storage), then there would just be an upcast or downcast.

@LovelyBuggies
Copy link
Collaborator Author

LovelyBuggies commented Mar 28, 2020

@jpivarski I just realized that THIS IS A BUG because boost_histogram should be able to support float storage (the issue and PR I mentioned are to solve float to int by default). Anyway, I have fixed it (not only for sumw2 but also unweighted). 😊

jpivarski added a commit that referenced this issue Mar 28, 2020
@LovelyBuggies
Copy link
Collaborator Author

Merged. This issue has completed its mission!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants