# Root and Parquet from the uproot backend

This example originally comes from a message Alex posted to the IRIS-HEP Slack.

Notes:

* This requires version 3.0 or later of the `servicex` client. See `requirements-new.txt` for the packages installed to run this notebook.

In [1]:
from servicex.dataset_identifier import FileListDataset
from servicex.models import ResultFormat
from servicex.servicex_client import ServiceXClient

In [2]:
sx = ServiceXClient(backend="uproot")
dataset_id = FileListDataset("root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root")

The first problem: the code in the Slack message triggers an `uproot` [processing bug](https://github.com/iris-hep/func_adl_uproot/issues/112). Until the bug fix goes into production, if you don't specify a tree name and your file path contains a ":", `uproot` treats whatever follows the colon as the tree name. The file path above has no colon after the filename. Behind the scenes, however, `ServiceX` inserts an XCache URL that contains a port number, and that extra colon is enough to trigger the bug.
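The ambiguity can be illustrated with a small, self-contained sketch. This is not `uproot`'s actual parser: `split_file_and_tree` and `KNOWN_SCHEMES` are hypothetical names, and the rule used (split at the last colon unless only a scheme name precedes it) is a simplifying assumption. The XCache-prefixed URL is likewise an assumed shape, inferred from the cached file names shown in the outputs of this notebook. The sketch makes no claim about which colon trips the real parser; it only shows how extra colons can turn part of the path into a bogus tree name.

```python
# Hypothetical, simplified "file:tree" splitter, for illustration only.
KNOWN_SCHEMES = {"root", "http", "https", "file"}  # illustrative subset


def split_file_and_tree(path):
    """Split at the last ':' unless only a URL scheme precedes it."""
    head, sep, tail = path.rpartition(":")
    if not sep or head.lower() in KNOWN_SCHEMES:
        # No colon, or the only colon belongs to the scheme: no tree name.
        return path, None
    return head, tail


original = (
    "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/"
    "2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
)
# Assumed form of the URL after an XCache prefix (with a port) is added.
xcache_url = "root://xcache.af.uchicago.edu:1094//" + original

plain_file, plain_tree = split_file_and_tree(original)
cached_file, cached_tree = split_file_and_tree(xcache_url)

print(plain_tree)   # no tree name is inferred for the plain URL
print(cached_tree)  # part of the path is mistaken for a tree name
```

Passing the tree name explicitly removes the need for any such guessing.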

To work around this, we need to explicitly specify the tree name by using the `set_tree` method, as can be seen in the next cell.

* I figured out what the tree name was by giving it a bogus name and then expanding the `HARD FAILURE` message on the logging dashboard.

In [7]:
ds = (sx
      .func_adl_dataset(dataset_id, codegen="uproot", title="Root", result_format="parquet")
      .set_tree("mini")
)

First, let's try the query as `parquet` files:

In [6]:
files_parquet = ds.Select(lambda e: {'lep_pt': e['lep_pt']}).Where(lambda e: e['lep_pt'] > 1000).as_files()

In [8]:
files_parquet.file_list

['C:/Users/gordo/AppData/Local/Temp/281642cf-1566-40c1-81c4-9e75b546a4ec/root___xcache.af.uchicago.edu_1094__root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_MC_mc_345060.ggH125_ZZ4lep.4lep.root.parquet']

And then as `root-file` output:

In [9]:
ds = (sx
      .func_adl_dataset(dataset_id, codegen="uproot", title="Root", result_format="root-file")
      .set_tree("mini")
)
files_root = ds.Select(lambda e: {'lep_pt': e['lep_pt']}).Where(lambda e: e['lep_pt'] > 1000).as_files()

In [11]:
files_root.file_list

['C:/Users/gordo/AppData/Local/Temp/33c00389-7859-4520-a501-f164c05f9900/root___xcache.af.uchicago.edu_1094__root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_MC_mc_345060.ggH125_ZZ4lep.4lep.root']
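Since the two requests differ only in `result_format`, a quick sanity check is to confirm that each returned path has the expected extension before handing it to a reader. The following is a minimal sketch using only the standard library; `check_result_files` is a hypothetical helper, and the example lists are shortened stand-ins for `files_parquet.file_list` and `files_root.file_list`.

```python
from pathlib import Path


def check_result_files(file_list, expected_suffix):
    """Return the paths as Path objects; raise if any has another suffix."""
    paths = [Path(p) for p in file_list]
    bad = [p.name for p in paths if p.suffix != expected_suffix]
    if bad:
        raise ValueError(f"unexpected file type(s): {bad}")
    return paths


# Stand-in lists; in the notebook, pass files_parquet.file_list and
# files_root.file_list instead.
example_parquet = ["/tmp/a/mc_345060.ggH125_ZZ4lep.4lep.root.parquet"]
example_root = ["/tmp/b/mc_345060.ggH125_ZZ4lep.4lep.root"]

parquet_paths = check_result_files(example_parquet, ".parquet")
root_paths = check_result_files(example_root, ".root")
```

Note that the parquet output keeps the original `.root` name and appends `.parquet`, so only the final suffix is compared.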

## Package list

In [None]:
!pip list