# Root and Parquet from the uproot backend

This notebook is based on a message Alex posted to the IRIS-HEP Slack.

Notes:

* This requires the newest version of the `servicex` client (3.0 or later). See `requirements-new.txt` for the packages installed to run this notebook.

In [1]:
from servicex.dataset_identifier import FileListDataset
from servicex.models import ResultFormat
from servicex.servicex_client import ServiceXClient

In [2]:
sx = ServiceXClient(backend="uproot")
dataset_id = FileListDataset("root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root")

First problem: the code in the Slack message triggers an `uproot` [processing bug](https://github.com/iris-hep/func_adl_uproot/issues/112). Until the fix reaches production, if you don't specify a tree name and the file path contains a `:`, `uproot` treats whatever follows the colon as the tree name. The file path in the cell above contains no colon! But behind your back, `ServiceX` substitutes an XCache URL that includes a port number, and that colon breaks the parsing.
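To see why a port number is enough to cause trouble, here is a deliberately naive "file:object" splitter. It is a hypothetical illustration only, not `uproot`'s actual parsing code. The XCache URL is an assumption inferred from the downloaded file names later in this notebook (`root___xcache.af.uchicago.edu_1094__root___eospublic...`).

```python
from typing import Optional, Tuple


def naive_split(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'file:object' at the last colon after any 'scheme://' prefix.

    Hypothetical illustration of the ambiguity; this is NOT uproot's parser.
    """
    scheme, sep, rest = spec.partition("://")
    if not sep:  # no URL scheme, e.g. a local path
        scheme, rest = "", spec
    head, colon, tail = rest.rpartition(":")
    if not colon:
        return spec, None  # no colon: the whole string is the file
    prefix = scheme + "://" if sep else ""
    return prefix + head, tail


# The path as written in the notebook: no colon after the scheme.
PLAIN_URL = (
    "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/"
    "2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
)
# Assumed XCache-wrapped form (inferred from the output file names).
XCACHE_URL = "root://xcache.af.uchicago.edu:1094//" + PLAIN_URL

print(naive_split(PLAIN_URL))   # whole path is the file, no object
print(naive_split(XCACHE_URL))  # split in the wrong place: bogus file and "tree"
```

With the plain path there is nothing to split, but the wrapped path contains extra colons, so any "text after the colon is the tree" rule picks up a fragment of the URL instead of a tree name. Passing the tree name explicitly sidesteps the guess.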

To work around this, we need to explicitly specify the tree name by using the `set_tree` method, as can be seen in the next cell.

* I found the tree name by giving `set_tree` a bogus name and then expanding the `HARD FAILURE` message on the logging dashboard.

In [3]:
ds = (sx
      .func_adl_dataset(dataset_id, codegen="uproot", title="Root", result_format="parquet")
      .set_tree("mini")
)

Let's first try the query with `parquet` output:

In [4]:
files_parquet = ds.Select(lambda e: {'lep_pt': e['lep_pt']}).Where(lambda e: e['lep_pt'] > 1000).as_files()


In [5]:
files_parquet.file_list

['C:/Users/gordo/AppData/Local/Temp/281642cf-1566-40c1-81c4-9e75b546a4ec/root___xcache.af.uchicago.edu_1094__root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_MC_mc_345060.ggH125_ZZ4lep.4lep.root.parquet']
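The downloaded file name is worth a closer look. Judging only from the two outputs in this notebook (not from the ServiceX source), the local name is the URL the transformer actually read with every `:` and `/` replaced by `_`, plus a `.parquet` suffix for parquet output; the `root-file` run further down also suggests the parent directory is named after the transform request ID. The hypothetical helper `servicex_local_name` below just encodes that observed pattern. Applied to the XCache form of the input path, it reproduces the file name above exactly, which is how we can tell the file was read through XCache on port 1094.

```python
def servicex_local_name(remote_path: str, result_format: str) -> str:
    """Hypothetical reconstruction of the downloaded file name.

    Pattern inferred from this notebook's outputs only: every ':' and '/'
    becomes '_', and parquet output gets an extra '.parquet' suffix.
    """
    name = remote_path.replace(":", "_").replace("/", "_")
    if result_format == "parquet":
        name += ".parquet"
    return name


# Assumed: the URL ServiceX handed to the transformer (XCache in front of EOS).
XCACHE_URL = (
    "root://xcache.af.uchicago.edu:1094//"
    "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/"
    "2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
)

print(servicex_local_name(XCACHE_URL, "parquet"))
print(servicex_local_name(XCACHE_URL, "root-file"))
```

Note that the mapping cannot be reversed reliably, since the original path already contains underscores (`mc_345060`), so it is only useful for checking a guess.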

And then as ROOT files (`root-file`):

In [6]:
import logging
logging.basicConfig(level=logging.INFO)
# BUG: Setting the logging level to INFO or DEBUG does not show the qastle or the full transform request sent to the backend. Logging them would be a useful debugging feature.

In [7]:
ds = (sx
      .func_adl_dataset(dataset_id, codegen="uproot", title="Root", result_format="root-file", ignore_cache=True)
      .set_tree("mini")
)
files_root = ds.Select(lambda e: {'lep_pt': e['lep_pt']}).Where(lambda e: e['lep_pt'] > 1000).as_files()


INFO:httpx:HTTP Request: POST https://servicex.af.uchicago.edu//token/refresh "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://servicex.af.uchicago.edu//servicex/transformation "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://servicex.af.uchicago.edu//token/refresh "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://servicex.af.uchicago.edu//servicex/transformation/2d92bd71-4a4d-45ed-985c-36cbfaf24c8f "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://servicex.af.uchicago.edu//token/refresh "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://servicex.af.uchicago.edu//servicex/transformation/2d92bd71-4a4d-45ed-985c-36cbfaf24c8f "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://servicex.af.uchicago.edu//token/refresh "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://servicex.af.uchicago.edu//servicex/transformation/2d92bd71-4a4d-45ed-985c-36cbfaf24c8f "HTTP/1.1 200 OK"
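Nearly all of the INFO output above comes from the `httpx` logger (the name between the two colons on each line): one line per token refresh or status poll. If you want INFO messages from other libraries without that chatter, one option using only the standard `logging` module is to raise the level of just that logger. This is a sketch of a general logging technique, not a ServiceX setting:

```python
import logging

# Show INFO messages from every logger by default.
# (basicConfig does nothing if the root logger already has handlers.)
logging.basicConfig(level=logging.INFO)

# Hide the per-request INFO lines from httpx; its warnings and errors still show.
logging.getLogger("httpx").setLevel(logging.WARNING)
```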


In [8]:
files_root.file_list

['C:/Users/gordo/AppData/Local/Temp/2d92bd71-4a4d-45ed-985c-36cbfaf24c8f/root___xcache.af.uchicago.edu_1094__root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_MC_mc_345060.ggH125_ZZ4lep.4lep.root']

Let's see what the `qastle` looks like. We should be able to get it with the `as_qastle` method:

In [9]:
files_query = ds.Select(lambda e: {'lep_pt': e['lep_pt']}).Where(lambda e: e['lep_pt'] > 1000)
print(f"qastle: {files_query.as_qastle()}")

qastle: None


This is a bug: `as_qastle` should return the qastle string, not `None`.

But we can take a slightly different route to get the `qastle`:

In [10]:
files_query.generate_selection_string()

"(call Where (call Select (call EventDataset 'bogus.root' 'mini') (lambda (list e) (dict (list 'lep_pt') (list (subscript e 'lep_pt'))))) (lambda (list e) (> (subscript e 'lep_pt') 1000)))"
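qastle is a Lisp-like s-expression, so its structure is easy to inspect. The sketch below is a small hypothetical parser written for this note (it is not the API of the `qastle` package) that turns the string above into nested Python lists. It shows that the `"mini"` passed to `set_tree` ends up as the second argument of `EventDataset`, next to the placeholder `'bogus.root'`.

```python
import re
from typing import Iterator, List, Union

# A parsed node is either an atom (str) or a list of nodes.
Node = Union[str, List["Node"]]

# Tokens: parentheses, single-quoted strings, or bare atoms such as call, e, >, 1000.
_TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()']+")


def parse_qastle(text: str) -> Node:
    """Parse a qastle-style s-expression into nested Python lists."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def parse() -> Node:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # an atom
        items: List[Node] = []
        while tokens[pos] != ")":
            items.append(parse())
        pos += 1  # consume the closing ')'
        return items

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the top-level expression")
    return tree


def find_calls(node: Node, name: str) -> Iterator[List[Node]]:
    """Yield every (call <name> ...) sub-expression, depth first."""
    if isinstance(node, list):
        if len(node) >= 2 and node[0] == "call" and node[1] == name:
            yield node
        for child in node:
            yield from find_calls(child, name)


QASTLE = ("(call Where (call Select (call EventDataset 'bogus.root' 'mini') "
          "(lambda (list e) (dict (list 'lep_pt') (list (subscript e 'lep_pt'))))) "
          "(lambda (list e) (> (subscript e 'lep_pt') 1000)))")

tree = parse_qastle(QASTLE)
dataset = next(find_calls(tree, "EventDataset"))
print(tree[:2], dataset)
```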

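This contains none of the "interesting" information, such as the code generator or the result format (ROOT vs. parquet). That lives in the `TransformRequest`, which is sent unaltered to the server. Note that its `tree_name` field is `None`: the tree name travels inside the qastle instead.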

In [11]:
files_query.transform_request

TransformRequest(title='Root', did=None, file_list=['root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root'], selection="(call Where (call Select (call EventDataset 'bogus.root' 'mini') (lambda (list e) (dict (list 'lep_pt') (list (subscript e 'lep_pt'))))) (lambda (list e) (> (subscript e 'lep_pt') 1000)))", image=None, codegen='uproot', tree_name=None, result_destination=<ResultDestination.object_store: 'object-store'>, result_format=<ResultFormat.root_file: 'root-file'>)

## Package list

In [12]:
!pip list

Package                   Version
------------------------- ---------
aiofile                   3.8.8
aiohttp                   3.8.5
aiosignal                 1.3.1
anyio                     4.0.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.0
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.1.0
Babel                     2.13.0
backcall                  0.2.0
beautifulsoup4            4.12.2
bleach                    6.1.0
cachetools                5.3.1
caio                      0.9.13
certifi                   2023.7.22
cffi                      1.16.0
charset-normalizer        3.3.0
click                     8.1.7
colorama                  0.4.6
comm                      0.1.4
contourpy                 1.1.1
cycler                    0.12.0
debugpy                   1.8.0
decorator                 5.1.1
defusedxml                0.7.1
executing             

