
Metadata regarding type of ensemble #299

Open
anders-kiaer opened this issue Jan 23, 2023 · 13 comments

Comments

@anders-kiaer
Collaborator

E.g. is it a

  • history matching iteration ensemble (iter-0, ... iter-3...)
  • prediction ensemble
  • sensitivity ensemble

The "ensemble type" (better terminology probably exists) will be used by clients to filter out ensembles that are not relevant for a given analysis dashboard/pattern.
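A minimal sketch of how a client could filter on such a field. The `EnsembleType` enum, its values and the `ensemble_type` attribute are hypothetical names for illustration; no such field exists in the metadata as described in this issue.

```python
from dataclasses import dataclass
from enum import Enum


class EnsembleType(Enum):
    """Hypothetical ensemble types, mirroring the list above."""
    HISTORY_MATCHING = "history_matching"
    PREDICTION = "prediction"
    SENSITIVITY = "sensitivity"


@dataclass(frozen=True)
class Ensemble:
    name: str
    ensemble_type: EnsembleType  # hypothetical metadata field


def filter_by_type(ensembles, wanted):
    """Keep only ensembles whose type is in the set `wanted`."""
    return [e for e in ensembles if e.ensemble_type in wanted]


ensembles = [
    Ensemble("iter-0", EnsembleType.HISTORY_MATCHING),
    Ensemble("iter-3", EnsembleType.HISTORY_MATCHING),
    Ensemble("pred_ref", EnsembleType.PREDICTION),
    Ensemble("sens_run", EnsembleType.SENSITIVITY),
]

# e.g. a prediction dashboard only lists prediction ensembles
print([e.name for e in filter_by_type(ensembles, {EnsembleType.PREDICTION})])
```

With an explicit type, the dashboard does not need to guess from the ensemble name.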

@anders-kiaer
Collaborator Author

A bit more complex perhaps, but it would also be very useful to have metadata answering e.g. which history matching ensemble is a given prediction ensemble based on.

@anders-kiaer
Collaborator Author

anders-kiaer commented Jan 24, 2023

Some user stories on parent-child-linking of ensembles:

  • You have a prediction run (which is a RESTART from an AHM run), and want to know the quality of the AHM-ensemble it was based on. With parent-child-linking metadata clients downstream can use this information to give the user this insight.
  • You have a prediction run (which is a RESTART from an AHM run), and as a user you want the time series and/or 3D grid to extend over the whole time axis (history + prediction), not just the prediction. Or you want to calculate the recovery factor (produced volume vs. initial volumes). With parent-child-linking metadata, clients downstream can fetch data from both ensembles (history and prediction) and combine them in the presentation.
  • You could have a "base prediction run" (e.g. going to 2030), and from there multiple prediction scenarios reflecting different decisions to be made by the asset. If clients could query e.g. "which ensembles use this ensemble as parent", that could be used to present a list of ensembles (future scenarios) based on the same base ensemble.
  • A bit related to the above point: Parent-child linking could intuitively be shown as a tree, where the leaves typically would be different scenarios. This would make it easier for the user to navigate / understand the relationship between different simulation ensembles (giving more insight to the user compared with e.g. a flat dropdown list of ensembles).
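The queries in these user stories can be sketched as follows, assuming a hypothetical `parent` field holding the name of the parent ensemble (the field name, the `EnsembleNode` type and the example ensemble names are all illustrative assumptions). It shows the "children of" query, walking back to the history ensemble, and rendering the links as a tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EnsembleNode:
    name: str
    parent: Optional[str] = None  # hypothetical "parent ensemble" metadata field


def children_of(ensembles: List[EnsembleNode], parent_name: str) -> List[str]:
    """Answer "which ensembles use this ensemble as parent?"."""
    return [e.name for e in ensembles if e.parent == parent_name]


def lineage(ensembles: List[EnsembleNode], name: str) -> List[str]:
    """Follow parent links back to the root and return them root-first.

    A client could use this to fetch history + prediction data together.
    """
    by_name = {e.name: e for e in ensembles}
    chain: List[str] = []
    current = by_name.get(name)
    while current is not None and current.name not in chain:  # stop on cycles
        chain.append(current.name)
        current = by_name.get(current.parent) if current.parent else None
    return chain[::-1]


def render_tree(ensembles: List[EnsembleNode], root: str, depth: int = 0) -> List[str]:
    """Render parent-child links as indented lines (assumes links form a tree)."""
    lines = ["  " * depth + root]
    for child in children_of(ensembles, root):
        lines.extend(render_tree(ensembles, child, depth + 1))
    return lines


ensembles = [
    EnsembleNode("iter-3"),
    EnsembleNode("pred_base_2030", parent="iter-3"),
    EnsembleNode("scenario_a", parent="pred_base_2030"),
    EnsembleNode("scenario_b", parent="pred_base_2030"),
]
print("\n".join(render_tree(ensembles, "iter-3")))
```

Storing only the parent on each ensemble is enough: the reverse query ("children of") can be answered by scanning, so no list of children has to be kept in sync.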

@perolavsvendsen
Member

This links to #291 perhaps

@alifbe
Contributor

alifbe commented Feb 6, 2023

I tried running drogon_pred_ref.ert and found that the results for the pred_ref ensemble are uploaded to iter-0. From a discussion with @perolavsvendsen, it appears that dataio assumes iter-0 as the default ensemble name and uses it to generate the ID.

See case 01_drogon_ahm_sumo in Sumo prod.

@perolavsvendsen
Member

Yes, we currently don't get any information on the type of ensembles within a case, and as far as I know, no such definition exists either.

This obviously ties closely to ERT. If ERT cannot provide it, we would have to do something rule-based using the iteration names.

Is there a standard (convention) for iteration names?

@anders-kiaer
Collaborator Author

To my knowledge there are no strict/official rules for iteration names (AHM runs typically go iter-0, iter-1, ..., iter-x, and prediction cases often have pred in the name, but there is no guarantee that iter and pred are used as substrings).
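If a rule-based fallback were used, it could look like the sketch below. The function name `guess_ensemble_type` and the returned labels are hypothetical. Because the conventions described above are not guaranteed, the sketch returns `None` when no rule matches instead of falling back to a default such as iter-0.

```python
import re
from typing import Optional

# Hypothetical fallback rules based on the informal conventions above.
_ITER_PATTERN = re.compile(r"^iter-(\d+)$")


def guess_ensemble_type(name: str) -> Optional[str]:
    """Guess the ensemble type from its name; return None when no rule matches."""
    if _ITER_PATTERN.match(name):
        return "history_matching"
    if "pred" in name.lower():
        return "prediction"
    return None  # unknown: better to say so than to guess


for name in ["iter-2", "pred_ref", "base_case_2030"]:
    print(name, guess_ensemble_type(name))
```

Such a heuristic would misclassify prediction-only models that also use the iter-0 folder name, which is one reason metadata coming from ERT is preferable.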

I agree this information/metadata should come from ERT. Knowing which workflow method ERT used to generate the ensemble would probably be a good start (https://ert.readthedocs.io/en/latest/reference/running_ert.html - ensemble_smoother, es_mda, iterative_ensemble_smoother, ensemble_experiment). Is this information exposed in some way, or could it be, @sondreso?

Within an assisted history matching run, it would also be useful for clients of the data sets to know which ensembles are part of the same run (i.e. iter-0 = prior, iter-{max} = posterior, and the ensembles in between, in order, representing gradual updates from prior to posterior).
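Until such metadata exists, a client could only infer the prior/posterior ordering from names. A minimal sketch, assuming all iter-N ensembles in the list belong to the same AHM run (the function names are hypothetical). Note that the sort must be numeric; a plain string sort would place iter-10 before iter-2.

```python
import re

_ITER = re.compile(r"^iter-(\d+)$")


def order_ahm_iterations(names):
    """Return iter-N names sorted numerically; other names are ignored."""
    numbered = []
    for name in names:
        match = _ITER.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def prior_and_posterior(names):
    """Return (prior, posterior): the first and last iteration of one AHM run."""
    ordered = order_ahm_iterations(names)
    if not ordered:
        raise ValueError("no iter-N ensembles found")
    return ordered[0], ordered[-1]


names = ["iter-10", "iter-2", "iter-0", "iter-1", "pred_ref"]
print(order_ahm_iterations(names))
print(prior_and_posterior(names))
```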

Pinging @asnyv in case there are details / use cases I haven't mentioned.

@perolavsvendsen
Member

#305

@perolavsvendsen
Member

equinor/ert#2359

@asnyv
Collaborator

asnyv commented Feb 7, 2023

I think @anders-kiaer has covered most of it, but for predictions I think we (at least I) often skip the term "pred" and use a more or less descriptive name depending on whatever we are simulating. From a technical perspective, the naming is completely arbitrary.

Also: for a while I think it was fairly common to have a structure like:

History matching:
some_ahm_case/realization-x/iter-y

Predictions:

some_prediction_scenario/realization-x
some_other_prediction_scenario/realization-x

Now it seems that more users are moving towards a structure where the prediction case is placed at the "iteration level", as you mention, so:

some_ahm_case/realization-x/iter-y
some_ahm_case/realization-x/some_prediction_scenario
some_ahm_case/realization-x/some_other_prediction_scenario

The advantage of the latter is that it is clearer what the prediction is based on, whilst the advantage of the former is that it is easier to see all cases in a folder structure, and it is a more natural structure for models without any history. But many of the models without history have now ended up with the structure some_prediction_scenario/realization-x/iter-0 as a more or less de facto standard, and I can't really say why 😅 My guess is that someone found it convenient because they could then reuse something they had hard-coded for AHM.
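Both layouts can be told apart from the path alone. A minimal sketch, assuming the path given is the runpath itself (ending at the ensemble folder or, in the older layout, at the realization folder); `RunLocation` and `parse_run_path` are hypothetical names, and this is not how fmu-dataio actually parses paths. The point is that the actual ensemble name is reported, and `None` is returned when there is no ensemble level, rather than defaulting to iter-0.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

_REAL = re.compile(r"^realization-(\d+)$")


class RunLocation(NamedTuple):
    case: str
    realization: int
    ensemble: Optional[str]  # None for the older layout without an ensemble level


def parse_run_path(path: str) -> RunLocation:
    """Split case/realization-N[/ensemble] into its parts."""
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        match = _REAL.match(part)
        if match and i > 0:
            ensemble = parts[i + 1] if i + 1 < len(parts) else None
            return RunLocation(parts[i - 1], int(match.group(1)), ensemble)
    raise ValueError(f"no realization-N folder in {path!r}")


for p in [
    "some_ahm_case/realization-3/iter-2",
    "some_ahm_case/realization-3/some_prediction_scenario",
    "some_prediction_scenario/realization-3",
]:
    print(parse_run_path(p))
```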

@perolavsvendsen
Member

I suggest we first try to accurately reflect the actual name in the outgoing metadata, rather than defaulting to iter-0 when the name looks unfamiliar. The more challenging part is probably the iteration id, which in turn maps to the iteration uuid.

I assume that ERT internally has an iteration ID, which we cannot know if the iteration is named something other than e.g. iter-0. Every single instance of fmu-dataio will (currently) look at the file structure to determine which iteration we are in. The code has been structured so that it should be possible to get this from ERT when it becomes available, but so far it is not.

An alternative is to take away all logic placed on the iteration id and only use the iteration name.

@perolavsvendsen
Member

Drafting a possible PR, quick and dirty 👆. This needs discussion, as I am a bit unsure of the consequences. But possibly the iteration name is a better option (outside the ERT context) than the iteration id, i.e. the name is used as the identifier, not the ID. (This is similar to the current practice for case.name, where no ID exists.)

For the third similar object, realization, we are more dependent on the ID I guess.
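One way to keep a UUID while treating the name as the identifier is to derive it deterministically from the case UUID and the ensemble name. This is a hypothetical scheme for illustration, not what fmu-dataio currently does; it relies only on the standard library's `uuid.uuid5`, which returns the same UUID for the same namespace and name, so every process sees the same ID without consulting ERT.

```python
import uuid


def ensemble_uuid(case_uuid: uuid.UUID, ensemble_name: str) -> uuid.UUID:
    """Derive a stable ensemble ID from the case UUID and the ensemble *name*.

    Hypothetical scheme: uuid5 is a deterministic hash of (namespace, name).
    """
    return uuid.uuid5(case_uuid, ensemble_name)


case = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(ensemble_uuid(case, "pred_ref"))
print(ensemble_uuid(case, "iter-0"))
```

The trade-off is that renaming an ensemble would also change its derived ID.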

@perolavsvendsen
Member

This is also a feature request from SSDL, ref discussions with @bous251

@perolavsvendsen
Member

#368 could possibly be relevant for this


4 participants