Dear all,
I have been facing the following issue. Suppose you have a bunch of large ROOT files, so you want to use chunksize. But sometimes you want to apply a tight cut, such that for some files you end up with no entries. In this case,
for df in read_root(myfile, key=myTree, where=tight_selection, chunksize=100000):
    # Do something
raises an IndexError: index 0 is out of bounds for axis 0 with size 0 because the iterator returned by read_root has length zero.
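The error class itself is easy to reproduce in isolation: indexing element 0 of any empty NumPy array produces exactly the message quoted above (a minimal illustration, not the actual code path inside read_root):

```python
import numpy as np

# A tight selection can leave zero entries; indexing into the empty
# result then fails exactly as in the traceback above.
arr = np.empty(0)
try:
    arr[0]
except IndexError as e:
    print(e)  # index 0 is out of bounds for axis 0 with size 0
```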
I'm not sure what the best change is. I guess this is the part of read_root that has to be modified:
if chunksize:
    tchain = ROOT.TChain(key)
    for path in paths:
        tchain.Add(path)
    n_entries = tchain.GetEntries()
    # XXX could explicitly clean up the opened TFiles with TChain::Reset

    def genchunks():
        current_index = 0
        for chunk in range(int(ceil(float(n_entries) / chunksize))):
            arr = root2array(paths, key, all_vars, start=chunk * chunksize,
                             stop=(chunk + 1) * chunksize, selection=where,
                             *args, **kwargs)
            if flatten:
                arr = do_flatten(arr, flatten)
            yield convert_to_dataframe(arr, start_index=current_index)
            current_index += len(arr)

    return genchunks()
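For reference, the chunk count computed by the loop above is just a ceiling division of n_entries by chunksize, so with zero entries the generator yields no chunks at all (a standalone sketch of the same arithmetic):

```python
from math import ceil

def n_chunks(n_entries, chunksize):
    # Same arithmetic as the range(...) bound in genchunks above.
    return int(ceil(float(n_entries) / chunksize))

print(n_chunks(0, 100000))       # 0 -> genchunks yields nothing
print(n_chunks(1, 100000))       # 1
print(n_chunks(250000, 100000))  # 3
```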
I guess if n_entries == 0, one should do something special, but I'm not sure what the best option is. Maybe return None? In that case the user can do:
df_list = read_root(myfile, key=myTree, where=tight_selection, chunksize=100000)
if df_list is not None:
    for df in df_list:
        # Do something
?
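An alternative that avoids the None check entirely would be to always return an iterator, possibly an empty one, so the caller's for loop simply runs zero times. A minimal sketch, using a hypothetical read_chunks stand-in for read_root's chunked path:

```python
def read_chunks(n_entries, chunksize):
    """Hypothetical simplification of read_root's chunked path:
    always return an iterator, never None."""
    if n_entries == 0:
        return iter([])  # empty iterator: caller's loop is just skipped

    def genchunks():
        # -(-a // b) is ceiling division, as in the real genchunks.
        for chunk in range(-(-n_entries // chunksize)):
            yield chunk  # stand-in for the DataFrame chunk

    return genchunks()

# No special-casing needed at the call site:
for df in read_chunks(0, 100000):
    pass  # never entered when there are no entries
```

This keeps the user-facing API uniform: the same loop works whether or not the selection left any entries.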