Improve reporting features #149

Closed
5 tasks done
gidden opened this issue Jan 16, 2019 · 8 comments

@gidden
Member

gidden commented Jan 16, 2019

It should be possible to “report” or “post-process” a message_ix.Scenario (given a sufficient amount of configuration metadata) to generate output (IAMC-compliant pd.DataFrame or file) that can be directly submitted to IIASA databases for either single- or multi-model assessments.

Tasks

@gidden gidden added this to the Reporting Revamp milestone Jan 16, 2019
@OFR-IIASA
Contributor

The first document I would like to add was originally intended as part of the message_ix documentation. This document explains the mathematical operations carried out for currently reported variables.
https://iiasahub.sharepoint.com/:u:/s/ene/MESSAGEix/Ea0QmEl4TB5BqxeIyzERqFIBkq54eBAeFDwq92xR9nTe7A?e=eRCodt

@OFR-IIASA
Contributor

OFR-IIASA commented Jan 16, 2019

The second document, which may not reflect the most current version of the reporting, shows for every variable calculated in the model the exact calculation process and therefore provides a detailed overview of required operations.
https://iiasahub.sharepoint.com/:x:/s/ene/MESSAGEix/ERxvb_nkeTZGkD6iXpgznfoBTBxiqui1L50Gs4WqqyjFxQ?e=Maf3HA

@OFR-IIASA
Contributor

There are several important features required for the current reporting. Please feel free to add features.

  • specify single technologies with filters on modes, commodities, etc.
  • special treatment of global variables: don't report; calculate the mean, max, or min across regions; or take a weighted average based on another variable.
  • perform additional operations on results: multiplication with factors (see for example investments or prices)
  • unit conversion
  • use data stored as timeseries: e.g. factors for f-gases; GLOBIOM reporting can be moved to such an application.
  • calculation of temporary variables which are not reported
  • account for the fact that not all technologies defined in the reporting are also part of the model, i.e. reporting for the global model would be the same for all three SSPs, but not all the technologies in SSP1 are also included in SSP3 (resources, for example).
  • special aggregates: non-hierarchy variables (sums across variables from different hierarchy levels)
  • reporting of historical data: pre-firstmodelyear
  • cumulating data over time: e.g. for variable cumulative resource extraction
  • the variable tree for reporting powerplant parameters is always the same (e.g. variable O&M costs, capital costs, etc.), but it would be very tiresome to define these hierarchies separately for each of the variables.
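
To illustrate the last point, here is a minimal sketch of defining the technology tree once and reusing it for every powerplant parameter. The names `TREE`, `PARAMETERS` and `expand` are hypothetical, and the technology names and reported variable names are only illustrative; none of this is existing message_ix code.

```python
# Hypothetical tree, defined once: technology -> position in the variable hierarchy.
TREE = {
    "coal_ppl": "Electricity|Coal",
    "gas_ppl": "Electricity|Gas",
}

# Hypothetical mapping: model parameter -> prefix of the reported variable.
PARAMETERS = {
    "inv_cost": "Capital Cost",
    "var_cost": "OM Cost|Variable",
}


def expand(tree, parameters):
    """Apply the same tree to every parameter.

    Returns a dict mapping (parameter, technology) to a reported variable name.
    """
    return {
        (par, tec): f"{prefix}|{suffix}"
        for par, prefix in parameters.items()
        for tec, suffix in tree.items()
    }


names = expand(TREE, PARAMETERS)
for key in sorted(names):
    print(key, "->", names[key])
```

Adding a technology to `TREE` would then extend the hierarchy of every parameter at once.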

@khaeru
Member

khaeru commented Jan 17, 2019

[Note for future readers that there is a separate, non-public Google Doc containing requirements discussion.]

@khaeru
Member

khaeru commented Jan 17, 2019

I left a comment on #150, but this comment also responds to the discussion in #151. Hopefully this is the right place for it 🤷‍♂️

Other software efforts (dask (detailed example), TensorFlow, many others) use the pattern of a graph in which:

  • nodes represent tasks or atomic operations.
  • edges represent data.

#150 and #151 invert these, so that nodes are data and edges are (sort of) tasks. I don't see that it's necessary to invent a new pattern, and in the process cut ourselves off from libraries that would simplify the codebase/slow the accumulation of technical debt. Everything discussed so far can be expressed in the common pattern as tasks:

  • Perform basic arithmetic: addition, multiplication, division, of two or more values.
    • Note that "disaggregation" is just array multiplication, yielding a result of higher dimension.
  • Take a simple sum over one or more dimension(s) of an array (input: which dimension(s)).
    • This covers "cumulative" anything.
  • Take a weighted sum (input: the weights).
  • Take a sum across disparate items ("special aggregates" per @OFR-IIASA) (input: which items to include).
  • Other low-level/numpy-like operations, e.g. min, max, mean, median.
  • Convert units (can introspect units from inputs, or take separate inputs describing the source/target units).
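
A rough sketch of how a few of these tasks could look as graph nodes follows. The dictionary layout imitates the dask graph convention, where a tuple starting with a callable is a task and its remaining items are arguments or keys of other nodes. The `evaluate` helper, the task functions and all data values are assumptions made up for illustration; a real implementation would hand the graph to a library scheduler instead.

```python
def evaluate(graph, key):
    """Compute the node `key`, first computing any nodes it depends on."""
    node = graph[key]
    if isinstance(node, tuple) and node and callable(node[0]):
        func, *args = node
        return func(*[_resolve(graph, arg) for arg in args])
    return node  # a plain value: nothing to compute


def _resolve(graph, arg):
    """Replace an argument that names another node by that node's value."""
    if isinstance(arg, str) and arg in graph:
        return evaluate(graph, arg)
    return arg


def scale(values, factor):
    """Basic arithmetic: multiply every element by a factor."""
    return {k: v * factor for k, v in values.items()}


def total(values):
    """Simple sum over the (only) dimension, here the region."""
    return sum(values.values())


def weighted_mean(values, weights):
    """Weighted average across regions."""
    numerator = sum(values[k] * weights[k] for k in values)
    return numerator / sum(weights[k] for k in values)


graph = {
    # Nodes that only yield data (made-up numbers)
    "activity": {"R1": 2.0, "R2": 6.0},
    "emission_factor": 0.5,
    "population": {"R1": 1.0, "R2": 3.0},
    # Nodes that are tasks
    "emissions": (scale, "activity", "emission_factor"),
    "emissions:global": (total, "emissions"),
    "activity:weighted": (weighted_mean, "activity", "population"),
}

print(evaluate(graph, "emissions:global"))   # 4.0
print(evaluate(graph, "activity:weighted"))  # 5.0
```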

In both the dask and tf semantics, even the basic action of yielding a fixed value (of 0 or more dimensions) is a task/operation/node. In the present discussion, that covers:

  • Yield values of specific GAMS objects (more generally, retrieving any value from the ixmp API).
  • Yield auxiliary/non-model data, e.g. weights, conversion factors, intensities not present in the model, historical data, etc.
  • Yield configuration values (i.e. "list of items to sum into a certain aggregate").

Non-mathematical manipulations of data are also tasks, e.g.:

  • Rename variables, e.g. from MESSAGEix internal names to IAMC names or others.

Using the common pattern, almost all of the requirements can be met by defining an exhaustive collection of tasks, and then by composing and manipulating graphs. We would provide both low- and high-level shorthands for such manipulation, e.g.:

  • duplicating structures (per the comment above about "powerplant parameters").
  • reading structures from file.
  • using user-provided tasks (= computations).

Note in particular the print_and_return task in the dask example linked above. We would define reports as tasks that each take specific other data as input, then format them to an expected return value (e.g. pyam.IamDataFrame or something else). By requesting the report, the computation of the data that it depends on is triggered. The user can then write the return value to file formats of choice.
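
A small sketch of that idea, assuming the same dict-of-tuples graph layout as dask: the report is just another node, and asking for it computes only what it depends on. The `evaluate` helper, the `iamc_rows` formatter, the `computed` log and the data are all hypothetical; the rows are plain dicts standing in for something like a pyam.IamDataFrame.

```python
computed = []  # records the order in which nodes are computed


def evaluate(graph, key):
    """Compute `key` and, recursively, only the nodes it depends on."""
    node = graph[key]
    if isinstance(node, tuple) and node and callable(node[0]):
        func, *args = node
        resolved = [
            evaluate(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        result = node
    computed.append(key)
    return result


def iamc_rows(values, variable, unit):
    """Report task: format per-region values as IAMC-like rows."""
    return [
        {"region": region, "variable": variable, "unit": unit, "value": value}
        for region, value in sorted(values.items())
    ]


graph = {
    "capacity": {"R1": 10.0, "R2": 20.0},  # made-up data
    "costs": {"R1": 1.0, "R2": 2.0},       # not needed by the report below
    "report:capacity": (iamc_rows, "capacity", "Capacity|Electricity", "GW"),
}

rows = evaluate(graph, "report:capacity")
print(rows)
print(computed)  # "costs" does not appear: it was never requested
```

Writing `rows` to CSV, Excel or another format would then be a separate step chosen by the user.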

@khaeru khaeru changed the title from "Reporting Revamp Overview" to "Improve reporting features" Mar 1, 2019
@khaeru
Member

khaeru commented Mar 1, 2019

I updated the description here to match the results of today's (2019-03-01) discussion. Further details are on the MESSAGEix OneNote folder for this date.

@khaeru khaeru modified the milestones: from Reporting Revamp to 1.2.0 Mar 1, 2019
@khaeru
Member

khaeru commented Mar 1, 2019

I've set the milestone for this issue to 1.2.0. Once #142 is merged, the milestone for this issue can be switched to 1.3.0 or 2.0 (whichever we'll target for the dask-based reporting).

@khaeru
Member

khaeru commented Jun 25, 2019

Closing this as resolved by the experimental reporting modules in the ixmp 0.2 (just now including iiasa/ixmp#150) and message_ix 1.2 (later today, including #206) releases.

We can use separate, smaller issues to iterate on these features as needed.

@khaeru khaeru closed this as completed Jun 25, 2019