uproot 4 dev-branch? #461
Comments
Work on uproot 4 hasn't started, but a good way to get ready for it would be to use uproot as-is to make awkward0 arrays, then convert those to awkward1 with a conversion function that's almost finished: scikit-hep/awkward#135. A bigger trouble is that the Awkward 1.0 deployment procedure is currently broken (completely revamped last weekend; not quite recovered yet), so you'll have to wait for the above PR, which is minutes away from being done, and the deployment procedure, which is a bit more open-ended. My intention is to get that fixed today. Or you can install from source, which doesn't rely on the deployment procedure working.
But you know, I do appreciate the offer and I'll point you to the Uproot 4 branch when I start working on it. According to the schedule, I should start working on it one month from now: April. March is for finishing Awkward and helping out with vector, the replacement for uproot-methods/TLorentzVector. On Uproot, the necessary work will be replacing Awkward0 array generation with Awkward1 array generation, and Awkward arrays will become an "extras" dependency (not strictly required, but highly recommended). Thus, a "base" Uproot installation would only be able to serve NumPy arrays, just as a "base" installation can only decompress GZIP and LZMA (in Python 3), not LZ4 or ZSTD, since those packages are considered "extras." The recommended Uproot installation procedure would become pip install uproot[all] with non- I should probably do that surgery, but the Uproot 3 → 4 transition opens the possibility for other, minor compatibility-breaking changes. One thing is that Python 3 strings, rather than bytestrings, will be presented to the user everywhere (assuming utf-8 encoding with "surrogateescape," which doesn't fail on wrong encodings). If you'd like to work on this or have other, minor interface changes, let's talk. Apart from the surgery of replacing Awkward0 with Awkward1, I'd like the main users of Uproot to contribute to the interface, since they know best what they want it to look like.
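To illustrate the "surrogateescape" point above, here is a minimal sketch using only the Python 3 standard library. The byte string `raw_name` is a made-up example (not taken from any real ROOT file), and this is not Uproot's own code; it only shows why decoding with this error handler does not fail on wrong encodings.

```python
# Hypothetical branch name as raw bytes: valid UTF-8 plus one invalid byte (0xff).
raw_name = b"muon_pt\xff"

# Strict UTF-8 decoding raises on the invalid byte.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With "surrogateescape", each undecodable byte becomes a lone surrogate
# code point (0xff -> U+DCFF) instead of raising an exception.
name = raw_name.decode("utf-8", errors="surrogateescape")

# The mapping is reversible, so the original bytes can be recovered.
round_trip = name.encode("utf-8", errors="surrogateescape")
```

Note that a string containing such surrogates cannot be written to a strict UTF-8 stream directly, so display it via `repr(name)` or re-encode it with the same error handler.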
Thanks for the roadmap summary, sounds really promising. I will have a look at the conversion function (scikit-hep/awkward#135) but today I already successfully tossed those nasty object-type arrays into
In [15]: import uproot
In [16]: f = uproot.open("tests/samples/aanet_v2.0.0.root")
In [17]: arr = f['E']['Evt']['trks']['trks.fitinf'].array()
In [18]: arr[:, 0]
Out[18]: <ChunkedArray [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]] at 0x00011a4a6390>
In [19]: awr = ak.Array(arr)
In [20]: awr[:, 0]
Out[20]: <Array [[0.00496, 0.00342, ... 1.84e+03, 54]] type='10 * var * float64'>
I am wondering though how uproot 4 will deal with lazyarrays in combination with ...another question: do you plan to drop Python 2 support in uproot 4? I guess it's about time 😉 So far I am very happy with
For lazyarrays: Awkward 1 needs ChunkedArrays and VirtualArrays to behave like Awkward 0. I've had a lot of conversations with @nsmith- about how to do that properly. For Python 2 support: actually, with Uproot 4 not strictly depending on Awkward, we can relax its Python constraint to 2.6! I won't be advertising that, though. This would make it quietly work on all sorts of ancient systems, though we will only officially support Python 2.7 and recent versions of Python 3. The latter is driven by Awkward's dependence on NumPy 1.13.1, which in turn has a minimum Python version of 2.7. We can only guarantee feature-completeness in Python 3, but for a limited set of features, with good error messages if you try to go beyond that set, it won't break in old versions.
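As a rough sketch of what "good error messages if you go beyond that set" could look like for optional dependencies, something along these lines would do. The helper name `require_extra` and the message wording are hypothetical, not Uproot's actual implementation, and the example is written for Python 3 only.

```python
import importlib


def require_extra(module_name, feature):
    """Import an optional module, or raise an ImportError that says how to get it.

    Hypothetical helper for illustration; not part of Uproot.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "{0} requires the optional package {1!r}; "
            "try: pip install {1}".format(feature, module_name)
        ) from err


# A module that is always available imports normally.
json_module = require_extra("json", "JSON parsing")

# A missing module produces the informative message instead of a bare failure.
try:
    require_extra("hypothetical_missing_pkg_xyz", "LZ4 decompression")
    message = None
except ImportError as err:
    message = str(err)
```

The base installation then only pays for what it uses: the import is attempted at the point where the feature is requested, not when the library itself is loaded.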
Btw. (sorry for the late answer) my question regarding Python 2 was more like: I'd definitely ditch Python 2 support in favour of less maintenance work. I was expecting that uproot4 will only support 3.5+. Don't you think that with this major leap it would be a good idea to get rid of legacy dependencies or is this a project requirement (probably driven by the use-cases in HEP)? In our collaboration (which is more astroparticle than HEP) we successfully managed to get rid of Python 2 dependencies, but we are only a few hundred people and use Python mostly for high-level analysis.
On the Python 3 side, we'll be picking a rather high minimum, perhaps 3.5, like you said. Early Python 3 was volatile and hard to support. But supporting Python 2.7 and even 2.6 is just a matter of not using certain idioms. Similarly, there's very little that we need from modern NumPy versions: Awkward needed NEP13, which comes in NumPy 1.13.1, but Awkward 1 is a bigger dependency and so it will be optional for Uproot (some users don't have jagged arrays). For everything else, we could go all the way back to NumPy 1.8 or so. There are circumstances where people need to work in such old versions (weird circumstances, like a DAQ machine not connected to the network or Python running on an iPad), and I'm only considering it because it's so easy, with very little maintenance burden. It's because Uproot without Awkward depends on so little that the minimum Python and NumPy versions can be pushed back so far. I would never recommend a user doing analysis to use Python 2, and old NumPy certainly could have issues like the memory leak that you mentioned. But this is the difference between application and library: in an application, use the latest versions to get the best software; in a library, depend on as little as you can to get the job done to avoid putting unnecessary constraints on applications. After all, somebody's going to want to open ROOT files on their iPad.
Yes, I fully agree with you, thanks for sharing these thoughts. At this very moment the most interesting feature, at least for us, is fancy indexing with nested data. We are trying our best to build wrapper classes around Uproot to make the user interface behave like So, it remains unclear how this will be integrated into Uproot; we can't wait to try it out 😄 Looking forward to Uproot 4! (and of course Awkward 1 😉)
Nice!
Sorry for my ignorance, maybe this is already mentioned somewhere but I couldn't find it. Is there a dev-branch for uproot 4? I am dealing with some nested awkward arrays and went through the awkward-1 resources (also https://github.com/scikit-hep/awkward-1.0/blob/master/docs/demos/2020-01-22-numba-demo-EVALUATED.ipynb which is impressive) and it seems that this will solve all my issues automatically (see scikit-hep/awkward-0.x#229).
Anyways, I would be happy to try uproot 4 (alpha) and maybe also contribute if possible.