This repository was archived by the owner on Jan 9, 2023. It is now read-only.

IndexError when read_root used with chunksize returns an empty iterator  #63

@fdesse


Dear all,

I have been facing the following issue. Suppose you have a number of large ROOT files, so you want to use chunksize. Sometimes you also want to apply a tight cut, such that for some files no entries pass the selection. In this case,

for df in read_root(myfile, key=myTree, where=tight_selection, chunksize=100000):
    # Do something

raises "IndexError: index 0 is out of bounds for axis 0 with size 0", because the iterator returned by read_root has length zero.

I'm not sure what the best change would be. I guess this is the part of read_root that has to be changed:

    if chunksize:
        tchain = ROOT.TChain(key)
        for path in paths:
            tchain.Add(path)
        n_entries = tchain.GetEntries()
        # XXX could explicitly clean up the opened TFiles with TChain::Reset

        def genchunks():
            current_index = 0
            for chunk in range(int(ceil(float(n_entries) / chunksize))):
                arr = root2array(paths, key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *args, **kwargs)
                if flatten:
                    arr = do_flatten(arr, flatten)
                yield convert_to_dataframe(arr, start_index=current_index)
                current_index += len(arr)
        return genchunks()

I guess that if n_entries == 0, one should do something special, but I'm not sure what would be best. Maybe return None? In that case the user can do:

df_list = read_root(myfile, key=myTree, where=tight_selection, chunksize=100000)

if df_list is not None:
    for df in df_list:
        # Do something

?
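
An alternative to returning None might be to keep returning a generator and simply skip chunks in which no entries pass the selection, so that the user's for loop runs zero times without any extra check. Below is a minimal sketch of that idea, not the actual read_root code: read_chunk is a hypothetical stand-in for the root2array call, and plain Python lists stand in for the arrays. Whether this avoids the IndexError depends on where it is actually raised, which I have not verified.

```python
from math import ceil


def genchunks(n_entries, chunksize, read_chunk):
    """Yield only non-empty chunks; yield nothing if no entries survive.

    read_chunk(start, stop) is a hypothetical stand-in for the
    root2array(..., start=..., stop=..., selection=where) call.
    """
    n_chunks = int(ceil(float(n_entries) / chunksize))
    for chunk in range(n_chunks):
        arr = read_chunk(chunk * chunksize, (chunk + 1) * chunksize)
        if len(arr) == 0:
            # The selection removed every entry in this chunk: skip it
            # instead of converting an empty array.
            continue
        yield arr


# Toy data standing in for the entries of a tree.
entries = list(range(10))


def read_even(start, stop):
    # Loose selection: keeps the even entries in the requested range.
    return [x for x in entries[start:stop] if x % 2 == 0]


def read_none(start, stop):
    # Tight selection: nothing passes.
    return []


chunks_even = list(genchunks(len(entries), 4, read_even))
chunks_none = list(genchunks(len(entries), 4, read_none))
chunks_empty_tree = list(genchunks(0, 4, read_even))

# Iterating over the empty generator just runs zero times.
for chunk in genchunks(len(entries), 4, read_none):
    raise AssertionError("not reached")
```

With this approach the calling code stays the same as for a non-empty file, at the cost that the number of chunks yielded no longer matches the number of chunks read.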
