This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

root file, written with uproot, number of events in tree issue #359

Open
marinang opened this issue Oct 1, 2019 · 5 comments

Comments

@marinang
Member

marinang commented Oct 1, 2019

Hi,

This is similar to #345, but this time the problem concerns the number of events in the tree that I write to a file with the following code:

    import numpy as np
    import uproot

    # file_out, allbranches and datasets are defined elsewhere
    with uproot.recreate(file_out) as f:
        f["DecayTree"] = uproot.newtree({b: np.float32 for b in allbranches}, flushsize="5 MB")

        for d in datasets:
            f["DecayTree"].extend(dict(d))

When I read the file back, the tree contains more events than I wrote. Reducing the flushsize decreases the number of events read, but it still does not match the number of events that I wrote.
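
The mismatch is easier to judge against the count that should come back. A minimal sketch, assuming each chunk passed to `extend` is a dict mapping branch names to equal-length arrays; `expected_entries` is a hypothetical helper for illustration, not part of uproot:

```python
def expected_entries(chunks):
    """Return the number of entries a tree should hold after being
    extended with every chunk (each a dict of branch name -> array)."""
    total = 0
    for n, chunk in enumerate(chunks):
        # Every branch in one chunk must contribute the same number of entries.
        lengths = {len(values) for values in chunk.values()}
        if len(lengths) != 1:
            raise ValueError(
                f"chunk {n} has branches of unequal length: {sorted(lengths)}"
            )
        total += lengths.pop()
    return total


# Five chunks of 1000 entries across four branches -> 5000 entries in total.
chunks = [{name: [0.0] * 1000 for name in "abcd"} for _ in range(5)]
print(expected_entries(chunks))
```

Comparing this value with `uproot.numentries(file_out, "DecayTree")` after writing shows how far off the file is.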

@marinang
Member Author

marinang commented Oct 1, 2019

This simple example reproduces my problem:

In [1]: import uproot

In [2]: import numpy as np

In [3]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({"a": "float32", "b": "float32", "c": "float32", "d": "float32"})
   ...:
   ...:     for i in range(5):
   ...:         f["t"].extend({"a": np.random.normal(0, 1, 1000), "b":np.random.normal(0, 1, 1000),
   ...:                        "c": np.random.normal(0, 1, 1000), "d": np.random.normal(0, 1, 1000)})
   ...:

In [4]: uproot.numentries("example.root", "t")
Out[4]: 20000

However, changing the flush size doesn't seem to make any difference in this example.

@jpivarski
Member

Does this only happen with extend or does it also happen with newbasket (the low-level interface)?
Thanks!

@marinang
Member Author

marinang commented Oct 1, 2019

This example does reproduce the effect of the flush size.

In [1]: import uproot

In [2]: import numpy as np

In [3]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({f"branch_{i}": np.float32 for i in range(50)}, flushsize="10 MB")
   ...:
   ...:     for j in range(5):
   ...:         f["t"].extend({f"branch_{i}": np.random.normal(0, 1, 5000) for i in range(50)})
   ...:

In [4]: uproot.numentries("example.root", "t")
Out[4]: 100000

In [5]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({f"branch_{i}": np.float32 for i in range(50)}, flushsize="2 MB")
   ...:
   ...:     for j in range(5):
   ...:         f["t"].extend({f"branch_{i}": np.random.normal(0, 1, 5000) for i in range(50)})
   ...:
   ...:

In [6]: uproot.numentries("example.root", "t")
Out[6]: 58056

@jpivarski I will give it a try.

@marinang
Member Author

marinang commented Oct 1, 2019

Thanks @jpivarski, with the low-level interface it works (I tend to forget about it 🤷‍♂️).

In [7]: with uproot.recreate("example.root") as f:
   ...:     f["t"] = uproot.newtree({f"branch_{i}": np.float32 for i in range(50)}, flushsize="2 MB")
   ...:
   ...:     for j in range(5):
   ...:         for i in range(50):
   ...:             f["t"][f"branch_{i}"].newbasket(np.random.normal(0, 1, 5000))
   ...:
   ...:

In [8]: uproot.numentries("example.root", "t")
Out[8]: 25000
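
Putting the counts reported in this thread next to what was actually written makes the pattern easier to compare. A small sketch using only the numbers quoted above; the labels are descriptive text, not uproot names:

```python
# (label, chunks written, entries per chunk, entries read back)
reports = [
    ("extend, 4 branches, flushsize not set", 5, 1000, 20000),
    ("extend, 50 branches, flushsize 10 MB", 5, 5000, 100000),
    ("extend, 50 branches, flushsize 2 MB", 5, 5000, 58056),
    ("newbasket, 50 branches, flushsize 2 MB", 5, 5000, 25000),
]


def summarize(reports):
    """Return (label, expected, read_back, ratio) for each reported run."""
    rows = []
    for label, n_chunks, per_chunk, read_back in reports:
        expected = n_chunks * per_chunk  # entries that were actually written
        rows.append((label, expected, read_back, read_back / expected))
    return rows


for label, expected, read_back, ratio in summarize(reports):
    print(f"{label:40s} expected={expected:6d} read={read_back:6d} ratio={ratio:.3f}")
```

Only the `newbasket` run reads back the number of entries that was written; the `extend` runs overcount, and the overcount changes with the flush size.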

@jpivarski
Member

Ultimately, both should work, but we're trying to factorize bugs in low-level writing from bugs in the flushing logic. It looks like this one is in the flushing logic.
