Output format revamped #123

Closed
alecandido opened this issue Oct 28, 2021 · 0 comments · Fixed by #133
Labels
enhancement New feature or request

Given the success of the new output format implementation in eko (NNPDF/eko#76), I propose to follow the same path here and split the operator from the metadata.
The main reason is that we'll save a lot of loading and dumping time.

Output structure

Here the typical structure of the output is a bit different from the one in eko, and a bit more complicated (but still perfectly suitable).

An Output object is still basically a dictionary, e.g. with the following keys:

[ins] In [19]: out.keys()
Out[19]: dict_keys(['XSHERACC', 'interpolation_is_log', 'interpolation_polynomial_degree', 'interpolation_xgrid', 'pids', 'projectilePID'])

At the moment we need an input runcard in order to load the observables, because otherwise we don't know the names of the observables (e.g. XSHERACC) in the output object.

There are two possible solutions:

  • we scope observables within a single entry observables in the object instance
  • we keep it as it is, but we add an observables entry with a list of names of observables
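
As a rough illustration of the two options, here is a minimal sketch using plain dictionaries. The placeholder values and the helper names `observables_scoped` and `observables_flat` are hypothetical and not part of the current code; only the key names come from the session above.

```python
# Hypothetical stand-in for the real (deeply nested) observable data.
xsheracc = ["esf0", "esf1"]

# Option 1: observables scoped under a single "observables" entry.
scoped = {
    "observables": {"XSHERACC": xsheracc},
    "interpolation_is_log": True,
    "projectilePID": 11,
}

# Option 2: flat layout kept as it is, plus a list of observable names.
flat = {
    "XSHERACC": xsheracc,
    "observables": ["XSHERACC"],
    "interpolation_is_log": True,
    "projectilePID": 11,
}


def observables_scoped(out):
    """Return {name: data} when observables live under one key."""
    return dict(out["observables"])


def observables_flat(out):
    """Return {name: data} when the names are listed in an extra entry."""
    return {name: out[name] for name in out["observables"]}


# Either layout lets us find the observables without the input runcard.
print(observables_scoped(scoped) == observables_flat(flat))
```

With the first option the remaining keys are all plain metadata; with the second, existing code that indexes `out["XSHERACC"]` directly keeps working.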

While every other value is just a scalar (at most a list/array of scalars), each observable is a deeply nested object.

Observables -> ESF structure

Each observable is a list of ESFs, and different observables may have lists of different lengths. In turn, each ESF has the same structure.

[ins] In [14]: e = out["XSHERACC"][0]

[ins] In [15]: len(out["XSHERACC"])
Out[15]: 42

[ins] In [16]: np.array(e.get_raw()["orders"][0]["values"]).shape
Out[16]: (14, 50)

[ins] In [17]: e.get_raw()["orders"][0].keys()
Out[17]: dict_keys(['order', 'values', 'errors'])

[ins] In [18]: e.get_raw().keys()
Out[18]: dict_keys(['x', 'Q2', 'nf', 'orders'])

Thus we can make it an array. What we need is:

  • one array per observable
  • first dimension runs over ESFs (so we need to store x, Q2, and nf somewhere else, i.e. in metadata)
  • second dimension runs over orders; they are the same for each observable -> we need to store the list of orders
  • two other dimensions run over flavor and interpolation grid (both already stored, respectively as pids and interpolation_xgrid)
  • optionally one more dimension can run over [value, error]
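
The layout described in the list above could be sketched as follows. This is only an illustration under stated assumptions: the function name `observable_to_array` is hypothetical, the input is assumed to be a list of raw ESF dictionaries shaped like the `get_raw()` output shown earlier, and the fake data (including the form of the `order` labels) is invented for the example.

```python
import numpy as np


def observable_to_array(esfs, with_errors=False):
    """Stack raw ESF dicts into one array (hypothetical helper).

    Returns (array, kinematics, orders), where the array has shape
    (n_esf, n_orders, n_pids, n_x), plus a trailing axis of length 2
    for [value, error] if with_errors is True.
    """
    # x, Q2 and nf leave the array and go to the metadata.
    kinematics = [(e["x"], e["Q2"], e["nf"]) for e in esfs]
    # Orders are assumed identical for every ESF, so read them once.
    orders = [o["order"] for o in esfs[0]["orders"]]
    values = np.array(
        [[o["values"] for o in e["orders"]] for e in esfs], dtype=float
    )
    if not with_errors:
        return values, kinematics, orders
    errors = np.array(
        [[o["errors"] for o in e["orders"]] for e in esfs], dtype=float
    )
    return np.stack([values, errors], axis=-1), kinematics, orders


# Fake data: 3 ESFs, 2 orders, 14 flavors, 50 grid points.
rng = np.random.default_rng(0)
fake_esfs = [
    {
        "x": 0.1 * (i + 1),
        "Q2": 10.0,
        "nf": 4,
        "orders": [
            {
                "order": order,
                "values": rng.random((14, 50)).tolist(),
                "errors": np.zeros((14, 50)).tolist(),
            }
            for order in [(0, 0), (1, 0)]
        ],
    }
    for i in range(3)
]

arr, kinematics, orders = observable_to_array(fake_esfs, with_errors=True)
print(arr.shape)  # (3, 2, 14, 50, 2)
```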

Recap

We need:

  • a tar archive containing
  • metadata, i.e. all key-value pairs except the observables, to which we should add
    • the list of orders
    • x, Q2, and nf for each ESF in each observable (so one list per observable, made of 3-tuples)
  • one array per observable; dumping them separately is mandatory because they might not be uniform
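
A minimal sketch of such an archive, using only the standard `tarfile`, `json` and `io` modules together with `numpy`. The member names (`metadata.json`, `<observable>.npy`), the choice of JSON for the metadata, and the function names `dump_tar` and `load_tar` are assumptions made for illustration, not the format actually adopted.

```python
import io
import json
import os
import tarfile
import tempfile

import numpy as np


def _add_bytes(tar, name, payload):
    """Add an in-memory bytes payload to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def dump_tar(path, metadata, arrays):
    """Write the metadata plus one separate array member per observable."""
    with tarfile.open(path, "w") as tar:
        _add_bytes(tar, "metadata.json", json.dumps(metadata).encode())
        for name, arr in arrays.items():
            buf = io.BytesIO()
            np.save(buf, arr)
            _add_bytes(tar, f"{name}.npy", buf.getvalue())


def load_tar(path):
    """Read back the metadata and the arrays it lists."""
    with tarfile.open(path, "r") as tar:
        metadata = json.load(tar.extractfile("metadata.json"))
        arrays = {}
        # The observable names come from the metadata, not from a runcard.
        for name in metadata["observables"]:
            raw = tar.extractfile(f"{name}.npy").read()
            arrays[name] = np.load(io.BytesIO(raw))
    return metadata, arrays


# Invented example content: 2 ESFs, 2 orders, 14 flavors, 50 grid points.
metadata = {
    "interpolation_is_log": True,
    "orders": [[0, 0], [1, 0]],
    # One list of [x, Q2, nf] triples per observable.
    "observables": {"XSHERACC": [[0.1, 10.0, 4], [0.2, 10.0, 4]]},
}
arrays = {"XSHERACC": np.zeros((2, 2, 14, 50))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.tar")
    dump_tar(path, metadata, arrays)
    loaded_metadata, loaded_arrays = load_tar(path)

print(loaded_arrays["XSHERACC"].shape)  # (2, 2, 14, 50)
```

Keeping each observable in its own member means arrays with a different number of ESFs never have to be padded into a common shape.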
@alecandido alecandido added the enhancement New feature or request label Dec 10, 2021
@alecandido alecandido mentioned this issue Feb 18, 2022
5 tasks