Error while reading AOD file #16
Comments
Well, it's not quite immediate because you've asked uproot to interpret and convert all arrays in the tree ( However, it's a real bug. I've looked at this in both versions and as it scans through branches, reading them all out, it encounters this one:
which somehow is failing to decompress. The branch has the same compression parameters as the file (in principle, they can be different, and I haven't handled that yet), and it's just zlib-7. Seeking to this point in the file, it has the right kind of header (starting with "XZ"), which is supposed to be 9 bytes long, followed by zlib data. Python's zlib then complains about the format of the data.
It looks wrong to me, too: after the 9-byte header, there's only 3 bytes of stuff before another "XZ". That could be part of the compressed data, but it would be a weird coincidence for the compressed data to also have an "XZ". It looks like another block of compressed data. Could the compressed data really be just 3 bytes long? Maybe Python's library has a problem with that (maybe the Python library would rather the data be padded...). According to the basket header, that compressed data is supposed to be 561 bytes compressed and 7168 uncompressed, which makes me even more suspicious. I'll have to come back to this. |
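This is one of those wake-you-up-in-the-middle-of-the-night things. It was staring me in the face. The file declared the compression to be zlib-7, but the two-character header of this compressed block is "XZ", which means LZMA. Apparently, when `fCompress` and that two-character header disagree, the two-character header has precedence.
So instead of using `fCompress` to determine which compression algorithm to use, we should use the first two bytes of the 9-byte compressed block header. Here's where it is in C++ ROOT:
[https://github.com/root-project/root/blob/5b7d9393c1c0c242be452510ac8ddf08bd492d40/core/zip/src/RZip.cxx#L348](https://github.com/root-project/root/blob/5b7d9393c1c0c242be452510ac8ddf08bd492d40/core/zip/src/RZip.cxx#L348)
They check `is_valid_header_zlib`, `is_valid_header_lzma`, `is_valid_header_lz4` to determine the compression algorithm on the spot, rather than the file's or branch's own `fCompress`. I would have thought that `fCompress` _ought_ to agree, but okay.
Now we can read in all those arrays, including the LZMA ones. Check out version 1.6.2 from GitHub.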
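A minimal sketch of choosing the decompressor from the block's own two-character header, rather than from the file-level compression setting, might look like this. It is not uproot's actual code: `parse_block_header` and `decompress_block` are hypothetical names, and the assumed header layout (2-byte magic, 1 method byte, then a 3-byte little-endian compressed size and a 3-byte little-endian uncompressed size) should be checked against ROOT's sources. Only "ZL" (zlib) and "XZ" (LZMA) are handled here.

```python
import lzma
import zlib

HEADER_SIZE = 9  # bytes in each compressed-block header


def parse_block_header(header: bytes):
    """Split a 9-byte block header into (magic, compressed_size, uncompressed_size)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("block header is truncated")
    magic = bytes(header[:2])
    # Assumed layout: byte 2 is a method byte; bytes 3-5 and 6-8 are
    # little-endian sizes, with the compressed size excluding the header.
    compressed_size = int.from_bytes(header[3:6], "little")
    uncompressed_size = int.from_bytes(header[6:9], "little")
    return magic, compressed_size, uncompressed_size


def decompress_block(block: bytes) -> bytes:
    """Decompress one block, trusting its magic bytes over any file-level setting."""
    magic, csize, usize = parse_block_header(block[:HEADER_SIZE])
    payload = block[HEADER_SIZE:HEADER_SIZE + csize]
    if magic == b"ZL":
        data = zlib.decompress(payload)
    elif magic == b"XZ":
        data = lzma.decompress(payload)
    else:
        # Other magics (for example LZ4) are deliberately left out of this sketch.
        raise ValueError(f"unhandled block magic {magic!r}")
    if len(data) != usize:
        raise ValueError("uncompressed size does not match the block header")
    return data
```

Checking the decompressed length against the header is a cheap way to catch the kind of mismatch discussed earlier, where a block looks far too short for the sizes it claims.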
|
Jim,
thanks for looking into this and providing the details. I now better understand
the logic behind uproot.
Could you please commit the 1.6.2 tag? I don't see it; maybe you tagged the code
but didn't commit.
Thanks,
Valentin.
…On 0, Jim Pivarski ***@***.***> wrote:
This is one of those wake-you-up-in-the-middle-of-the-night things. It was staring me in the face. The file declared the compression to be zlib-7, but the two-character header of this compressed block is "XZ", which means LZMA. Apparently, when `fCompress` and that two-character header disagree, the two-character header has precedence.
So instead of using `fCompress` to determine which compression algorithm to use, we should use the first two bytes of the 9-byte compressed block header. Here's where it is in C++ ROOT:
[https://github.com/root-project/root/blob/5b7d9393c1c0c242be452510ac8ddf08bd492d40/core/zip/src/RZip.cxx#L348](https://github.com/root-project/root/blob/5b7d9393c1c0c242be452510ac8ddf08bd492d40/core/zip/src/RZip.cxx#L348)
They check `is_valid_header_zlib`, `is_valid_header_lzma`, `is_valid_header_lz4` to determine the compression algorithm on the spot, rather than the file's or branch's own `fCompress`. I would have thought that `fCompress` _ought_ to agree, but okay.
Now we can read in all those arrays, including the LZMA ones. Check out version 1.6.2 from GitHub.
```
python -i -c 'import uproot; t = uproot.open("/afs/cern.ch/user/v/valya/public/C84930B2-7C55-E711-B915-02163E014722.root")["Events"]; print(t.arrays())'
```
|
I had committed it, but it wasn't a release. I just uploaded it to PyPI and made a formal GitHub release. (I always do these two things together to ensure that the version numbers are in sync: it's easy to botch a version number in PyPI.) As for explaining the problem, I was just thinking out loud. This detail (whether to trust |
Jim, now I'm trying to read AOD file
/afs/cern.ch/user/v/valya/public/C84930B2-7C55-E711-B915-02163E014722.root
and it fails right away:
gives