root file, written with uproot, number of events in tree issue #359

marinang · 2019-10-01T16:25:04Z

Hi,

This is similar to #345, but this time the problem is the number of events in the tree that I write to a file with the following code:

    with uproot.recreate(file_out) as f:
        f["DecayTree"] = uproot.newtree({b:np.float32 for b in allbranches}, flushsize="5 MB")
        
        for i, d in enumerate(datasets):
            f["DecayTree"].extend(dict(d))

When I read the file back, there are more events in the tree than I wrote. Reducing the flushsize decreases the number of events read, but it is still not the same as the number of events that I wrote ...
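One way to pin down the size of the discrepancy is to tally the entries as they are handed to `extend` and compare that total with the count read back from the file. Below is a minimal sketch that does not depend on uproot. `expected_entries` is a hypothetical helper, not part of the library, and it assumes each chunk is a dict mapping branch names to equal-length sequences:

```python
def expected_entries(chunks):
    """Sum the entries across chunks, checking that all branches in a chunk agree in length."""
    total = 0
    for chunk in chunks:
        # Every branch in one chunk should contribute the same number of entries.
        lengths = {len(values) for values in chunk.values()}
        if len(lengths) != 1:
            raise ValueError(f"branches have unequal lengths: {sorted(lengths)}")
        total += lengths.pop()
    return total


# Stand-in for `datasets`: five chunks of 1000 entries in each of two branches.
chunks = [{"a": [0.0] * 1000, "b": [0.0] * 1000} for _ in range(5)]
total = expected_entries(chunks)
print(total)
```

The value returned here is what a read-back count such as `uproot.numentries` would be expected to match.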

marinang · 2019-10-01T16:25:38Z

This simple example reproduces my problem:

In [1]: import uproot

In [2]: import numpy as np

In [3]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({"a": "float32", "b": "float32", "c": "float32", "d": "float32"})
   ...:
   ...:     for i in range(5):
   ...:         f["t"].extend({"a": np.random.normal(0, 1, 1000), "b":np.random.normal(0, 1, 1000),
   ...:                        "c": np.random.normal(0, 1, 1000), "d": np.random.normal(0, 1, 1000)})
   ...:

In [4]: uproot.numentries("example.root", "t")
Out[4]: 20000

However, changing the flush size doesn't seem to have any effect here.

jpivarski · 2019-10-01T16:34:09Z

Does this only happen with extend or does it also happen with newbasket (the low-level interface)?
Thanks!

marinang · 2019-10-01T16:35:02Z

This example does reproduce the effect of the flush size.

In [1]: import uproot

In [2]: import numpy as np

In [3]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({f"branch_{i}": np.float32 for i in range(50)}, flushsize="10 MB")
   ...:
   ...:     for j in range(5):
   ...:         f["t"].extend({f"branch_{i}": np.random.normal(0, 1, 5000) for i in range(50)})
   ...:

In [4]: uproot.numentries("example.root", "t")
Out[4]: 100000

In [5]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({f"branch_{i}": np.float32 for i in range(50)}, flushsize="2 MB")
   ...:
   ...:     for j in range(5):
   ...:         f["t"].extend({f"branch_{i}": np.random.normal(0, 1, 5000) for i in range(50)})
   ...:
   ...:

In [6]: uproot.numentries("example.root", "t")
Out[6]: 58056

@jpivarski I will give it a try.
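For reference, the reproducers so far pass 5 × 1000 = 5000 and 5 × 5000 = 25000 entries to `extend`, so the reported counts can be compared against those figures. A short sketch of that arithmetic follows. The labels are only illustrative, and "default flushsize" assumes that the first reproducer, which sets no `flushsize`, uses the library default:

```python
# Entries passed to extend() in each reproducer: number of calls x entries per call.
expected = {
    "4 branches, default flushsize": 5 * 1000,
    "50 branches, flushsize 10 MB": 5 * 5000,
    "50 branches, flushsize 2 MB": 5 * 5000,
}

# Counts reported by uproot.numentries in the sessions above.
reported = {
    "4 branches, default flushsize": 20000,
    "50 branches, flushsize 10 MB": 100000,
    "50 branches, flushsize 2 MB": 58056,
}

# Ratio of entries read back to entries written; 1.0 would mean no discrepancy.
ratios = {label: reported[label] / expected[label] for label in expected}

for label, ratio in ratios.items():
    print(f"{label}: wrote {expected[label]}, read {reported[label]}, ratio {ratio:.3f}")
```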

marinang · 2019-10-01T16:39:42Z

Thanks @jpivarski, with the low-level interface it works (I tend to forget about it 🤷‍♂️).

In [7]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({f"branch_{i}": np.float32 for i in range(50)}, flushsize="2 MB")
   ...:
   ...:     for j in range(5):
   ...:         for i in range(50):
   ...:             f["t"][f"branch_{i}"].newbasket(np.random.normal(0, 1, 5000))
   ...:
   ...:

In [8]: uproot.numentries("example.root", "t")
Out[8]: 25000

jpivarski · 2019-10-01T16:43:07Z

Ultimately, both should work, but we're trying to factorize bugs in low-level writing from bugs in the flushing logic. It looks like this one is in the flushing logic.

jpivarski added the bug and writing-improvements labels on Jun 18, 2020
