
Error while reading AOD file #16

Closed
vkuznet opened this issue Nov 4, 2017 · 4 comments

Comments

@vkuznet

vkuznet commented Nov 4, 2017

Jim, now I'm trying to read the AOD file
/afs/cern.ch/user/v/valya/public/C84930B2-7C55-E711-B915-02163E014722.root
and it fails right away:

    for branchname in tree.arrays().keys():
        print(branchname)

gives

Traceback (most recent call last):
  File "vk_test.py", line 38, in <module>
    branchNames(eTree)
  File "vk_test.py", line 24, in branchNames
    for branchname in tree.arrays().keys():
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 458, in arrays
    outi, res = branch.array(dtype, executor, False)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 1427, in array
    return TBranch.array(self, dtype, executor, block)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 1182, in array
    out[start:end] = self._basket(i, parallel=False)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 857, in _basket
    self._basketwalkers[i]._evaluate(parallel)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/_walker/lazyarraywalker.py", line 54, in _evaluate
    string = self._original_function(walker.readbytes(length))
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 84, in <lambda>
    return lambda x: zlib_decompress(x[9:])
error: Error -3 while decompressing data: incorrect header check
@jpivarski
Member

Well, it's not quite immediate because you've asked uproot to interpret and convert all arrays in the tree (tree.arrays()), then only return their names (.keys()), which could be accomplished without all the heavy calculations by just tree.allbranchnames.
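
For reference, here is a minimal sketch of that lighter-weight approach (assuming the uproot 1.x API discussed in this thread, where the tree object exposes allbranchnames; the path and tree name are the ones from this report):

    import uproot

    # Open the file and grab the Events tree.
    tree = uproot.open("/afs/cern.ch/user/v/valya/public/C84930B2-7C55-E711-B915-02163E014722.root")["Events"]

    # Listing names doesn't read or decompress any baskets, unlike
    # tree.arrays().keys(), which interprets every branch first.
    for branchname in tree.allbranchnames:
        print(branchname)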

However, it's a real bug. I've looked at this in both versions and as it scans through branches, reading them all out, it encounters this one:

GlobalObjectMapRecord_hltGtStage2ObjectMap__HLT.obj.m_gtObjectMap.m_algoBitNumber

which somehow is failing to decompress. The branch has the same compression parameters as the file (in principle, they can be different, and I haven't handled that yet), and it's just zlib-7.
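
(As a reminder of how that "zlib-7" is encoded: ROOT packs the compression setting as algorithm * 100 + level, so zlib level 7 is fCompress = 107. A tiny sketch, with an illustrative helper name:)

    # Hypothetical helper: split a ROOT fCompress value into (algorithm, level).
    # ROOT encodes the setting as algorithm * 100 + level (1 = zlib, 2 = LZMA, 4 = LZ4).
    def decode_fcompress(fcompress):
        algorithm, level = divmod(fcompress, 100)
        return {1: "zlib", 2: "lzma", 4: "lz4"}.get(algorithm, "unknown"), level

    print(decode_fcompress(107))  # ('zlib', 7)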

Seeking to this point in the file, I find what looks like the right kind of header (starting with "XZ"), which is supposed to be 9 bytes long, followed by zlib data. Python's zlib then complains about the format of that data.

01613360  58 5a 00 28 02 00 00 1c  00 fd 37 7a 58 5a 00 00  |XZ.(......7zXZ..|
01613400  01 69 22 de 36 02 00 21  01 00 00 00 00 37 27 97  |.i".6..!.....7'.|
01613420  d6 e0 1b ff 01 ec 5d 00  00 60 04 08 ca 06 5b f5  |......]..`....[.|
01613440  fa 66 ca fc 2c 3c 41 57  c9 09 f4 5f f9 55 48 b6  |.f..,<AW..._.UH.|
01613460  73 22 9b fe 54 36 56 93  d7 91 c5 94 58 f5 b0 d7  |s"..T6V.....X...|
01613500  c9 03 c0 fd dc f0 9e 3d  2a 61 2e 81 2f 2e 1c 2d  |.......=*a../..-|
01613520  42 88 81 7b 45 66 a2 bb  69 f2 06 b8 f7 bb bc 1d  |B..{Ef..i.......|
01613540  41 24 7a ec 6d fc c1 08  0f 48 8b 88 11 b2 0c 76  |A$z.m....H.....v|
01613560  c0 c1 87 6b bb b5 25 16  29 da 87 3d 32 e7 24 25  |...k..%.)..=2.$%|
01613600  69 7f 08 81 a4 cd a3 f3  7f c5 be 3c 2f 6a 49 13  |i..........</jI.|

It looks wrong to me, too: after the 9-byte header, there are only 3 bytes of data before another "XZ". That could be part of the compressed data, but it would be a strange coincidence for the compressed data to also contain "XZ"; it looks more like another block of compressed data. Could the compressed payload really be just 3 bytes long? Maybe Python's zlib has a problem with that (maybe it would rather the data be padded...).

According to the basket header, that compressed data is supposed to be 561 bytes compressed and 7168 uncompressed, which makes me even more suspicious. I'll have to come back to this.
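
As a sanity check, the 9-byte header at the start of the dump can be unpacked by hand. This is a sketch that assumes ROOT's usual compressed-record layout (a 2-byte algorithm tag, a 1-byte method, then the compressed and uncompressed sizes as 3-byte little-endian integers):

    # First 9 bytes from the dump above.
    header = bytes([0x58, 0x5A, 0x00, 0x28, 0x02, 0x00, 0x00, 0x1C, 0x00])

    tag = header[0:2]                                                # b"XZ"
    method = header[2]                                               # 0
    compressed = header[3] | (header[4] << 8) | (header[5] << 16)    # 552
    uncompressed = header[6] | (header[7] << 8) | (header[8] << 16)  # 7168

    print(tag, method, compressed, uncompressed)

If that layout is right, the sizes come out to 552 and 7168: 552 bytes of payload plus the 9-byte header is the 561 compressed bytes the basket reports, and 7168 matches the expected uncompressed size.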

@jpivarski
Copy link
Member

This is one of those wake-you-up-in-the-middle-of-the-night things. It was staring me in the face. The file declared the compression to be zlib-7, but the two-character header of this compressed block is "XZ", which means LZMA. Apparently, when fCompress and that two-character header disagree, the two-character header has precedence.

So instead of using fCompress to determine which compression algorithm to use, we should use the first two bytes of the 9-byte compressed block header. Here's where it is in C++ ROOT:

https://github.com/root-project/root/blob/5b7d9393c1c0c242be452510ac8ddf08bd492d40/core/zip/src/RZip.cxx#L348

They check is_valid_header_zlib, is_valid_header_lzma, and is_valid_header_lz4 to determine the compression algorithm on the spot, rather than relying on the file's or branch's own fCompress. I would have thought that fCompress ought to agree, but okay.
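
A minimal sketch of that kind of dispatch in Python (the helper name and error handling are illustrative, not uproot's actual code; ROOT's tags are "ZL" for zlib, "XZ" for LZMA, and "L4" for LZ4):

    import zlib
    import lzma  # built into Python 3

    # Hypothetical dispatcher: choose the decompressor from the block's own
    # 2-byte tag, mirroring what RZip.cxx does, instead of trusting fCompress.
    def decompress_block(block):
        tag = block[:2]
        payload = block[9:]  # compressed data follows the 9-byte header
        if tag == b"ZL":
            return zlib.decompress(payload)
        elif tag == b"XZ":
            return lzma.decompress(payload)
        elif tag == b"L4":
            raise NotImplementedError("LZ4 needs a third-party package")
        else:
            raise ValueError("unrecognized compression tag: %r" % tag)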

Now we can read in all those arrays, including the LZMA ones. Check out version 1.6.2 from GitHub.

python -i -c 'import uproot; t = uproot.open("/afs/cern.ch/user/v/valya/public/C84930B2-7C55-E711-B915-02163E014722.root")["Events"]; print(t.arrays())'

@vkuznet
Author

vkuznet commented Nov 4, 2017 via email

@jpivarski
Member

I had committed it, but it wasn't a release. I just uploaded it to PyPI and made a formal GitHub release. (I always do these two things together to keep the version numbers in sync: it's easy to botch a version number on PyPI.)

As for explaining the problem, I was just thinking out loud. This detail (whether to trust fCompress or the compressed buffer header when the two are in conflict) is exactly the sort of reason we need multiple implementations of ROOT I/O, to spread the knowledge.
