These are basic examples of how the `fmu-sumo` package can be used for consuming FMU results via the Python API.

To install fmu-sumo, use `pip install fmu-sumo`.

We are always grateful for feedback and any questions you might have. See the <a href="https://doc-sumo-doc-prod.radix.equinor.com/">Sumo Documentation</a> for how to get in touch.

In [None]:
from fmu.sumo.explorer import Explorer

In [None]:
%%html
<style>table {margin-left: 0 !important;}</style>

### Terminology

Hierarchy of FMU results:
| term | description | disk equivalent | examples of data in this context |
| -- | -- | -- | -- |
| model | The workflow that, when realized, created these results | /project/.../model/**revision**/ | |
| | | | |
| case | A collection of _ensembles_ | /scratch/field/user/**case** | Observations |
| ensemble | A collection of _realizations_ | /scratch/field/user/case/realization-*/**ensemble**/ | Aggregated results |
| realization | A collection of data objects produced by the same realization | /scratch/field/user/case/**realization**/ensemble/ | Exported results |
| | | | |
| entity | The array of objects that are representations of the same result | iter-\*/realization-\*/**share/results/type/myfile.xyz** | |

For more details, refer to https://doc-sumo-doc-prod.radix.equinor.com/fmu_results

### Principles
The basic entry point for consuming FMU results is the _case_. On disk, we are used to the case concept as a folder with a name. A name, however, is not unique. Therefore, cases in terms of FMU results are assigned unique IDs allowing us to refer explicitly to a specific case regardless of its name.

When working with FMU results, we mostly operate within the bounds of a specific _ensemble_. Note, however, that FMU results exist in the case context, the ensemble context, and the realization context.
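
To illustrate why a name alone is not a safe identifier, here is a minimal, self-contained sketch that groups case UUIDs by case name and reports the names shared by more than one case. `CaseRef`, `uuids_by_name` and `ambiguous_names` are hypothetical helpers written for this illustration (they are not part of `fmu-sumo`), and the names and UUIDs are made-up placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRef:
    """Hypothetical stand-in for a case: only the two fields used here."""

    name: str
    uuid: str


def uuids_by_name(cases):
    """Group case UUIDs by case name."""
    grouped = defaultdict(list)
    for case in cases:
        grouped[case.name].append(case.uuid)
    return dict(grouped)


def ambiguous_names(cases):
    """Return, sorted, the names that are shared by more than one case."""
    return sorted(
        name for name, ids in uuids_by_name(cases).items() if len(ids) > 1
    )


# Placeholder data: two different cases happen to share the same name.
example_cases = [
    CaseRef("drogon_ahm", "uuid-a"),
    CaseRef("drogon_ahm", "uuid-b"),
    CaseRef("drogon_design", "uuid-c"),
]

print(ambiguous_names(example_cases))
```

The helpers only rely on the `name` and `uuid` attributes, which is why referring to a case by its UUID stays unambiguous even when names collide.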

### Initialize Explorer
Initializing the explorer will establish a connection to Sumo. If you don't have a valid access token stored, you will be taken through the authentication process.

In [None]:
sumo = Explorer()

In [None]:
# Get Drogon cases
myassetname = "Drogon"  # Must be a valid asset on Sumo
cases = sumo.cases.filter(asset=myassetname)

# Filter on user
cases = cases.filter(user="peesv")

# Iterate over results
print(f"\nFound {len(cases)} cases:")

for case in cases:
    print(f"Case: {case.name} | {case.uuid}")
    for ensemble in case.ensembles:
        print(f"    Ensemble: {ensemble.name} (n={len(ensemble.realizations)})")


# Option 1 (recommended): Get case by uuid
mycaseuuid = cases[0].uuid  # for sake of example
unique_case = sumo.cases.filter(uuid=mycaseuuid)[0]  # a single case object

# Option 2: Get case by name (name is not guaranteed to be unique)
mycasename = cases[0].name  # for sake of example
named_cases = sumo.cases.filter(name=mycasename)

if len(named_cases) > 1:
    raise ValueError(f"More than one case exists with name {mycasename}, please use UUID.")
elif len(named_cases) == 0:
    raise ValueError(f"No case with name {mycasename} exists.")
else:
    mycase = named_cases[0]

In [None]:
# Select case
mycase = cases[-1] # for sake of the example we pick the last - you might want to select your case differently
print("Selected case: ")
print(f"  -> {mycase.name} ({mycase.uuid}) [{mycase.status}]")

# Select ensemble
myensemble = mycase.ensembles[-1] # for sake of the example - you might want to be more explicit on which ensemble you select
print("Selected ensemble:")
print(f"  -> {myensemble.name}")

At this point, we have identified our case and the ensemble we want to work with. Now we will consume data:

<div class="alert alert-block alert-info">
Unlike on disk, FMU results in Sumo are stored in a flat structure, and we use metadata to identify individual data objects and arrays of data across an ensemble. The Explorer is fairly versatile and backed by a powerful search engine. This means that there are usually multiple ways of getting to the same data, depending on your use case. In these examples, we show a few of these patterns.
</div>
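
As a conceptual illustration of "flat structure plus metadata", the sketch below keeps all objects in one flat list and selects them purely by metadata fields. It is not how Sumo or the Explorer is implemented; the `select` helper, the field names and the records are assumptions made up for this example.

```python
# Made-up metadata records: one flat list, no folder hierarchy.
objects = [
    {"kind": "surface", "name": "Therys Fm. Top", "tagname": "DS_extract_geogrid", "realization": 0},
    {"kind": "surface", "name": "Therys Fm. Top", "tagname": "DS_extract_geogrid", "realization": 1},
    {"kind": "surface", "name": "Volon Fm. Top", "tagname": "DS_extract_geogrid", "realization": 0},
    {"kind": "table", "name": "geogrid", "tagname": "vol", "realization": 0},
]


def select(records, **criteria):
    """Keep the records whose metadata matches every given key/value pair."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Two routes to the same object: one combined filter, or successive filters.
direct = select(objects, kind="surface", name="Therys Fm. Top", realization=0)
stepwise = select(select(objects, realization=0), kind="surface", name="Therys Fm. Top")

print(len(direct), direct == stepwise)
```

Because selection depends only on metadata, the order in which the criteria are applied does not change the result, which is why several filtering routes can lead to the same data.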

### Search context
An important concept in the Explorer is the _SearchContext_. This is a collection of data, filtered according to your needs. Here are some examples of SearchContexts:


In [None]:
# all _surfaces_ in our ensemble
myensemble.surfaces

# all _tables_ in our ensemble
myensemble.tables

## Filtering
The Search Context allows for further filtering:

In [None]:
# list all names available for these surfaces
myensemble.surfaces.names

# Filter on a specific name
myensemble.surfaces.filter(name="Therys Fm. Top")

# list all tagnames available for surfaces with this name
myensemble.surfaces.filter(name="Therys Fm. Top").tagnames

# Filter on a specific name and tagname
myensemble.surfaces.filter(name="Therys Fm. Top", tagname="DS_extract_geogrid")

# Filter on a specific name, tagname and realization
myensemble.surfaces.filter(name="Therys Fm. Top", tagname="DS_extract_geogrid", realization=0)

# In the Drogon example, this has now filtered down to exactly 1 surface object.
# This may not be the situation for your data, and more filters might be required.

Now we go from "finding data" to "using data". For the Surface example, we recommend using XTGeo:

In [None]:
mysurfs = myensemble.surfaces.filter(name="Therys Fm. Top", tagname="DS_extract_geogrid", realization=0)

if len(mysurfs) != 1:
    print(f"Warning! The collection has {len(mysurfs)} surfaces, which is not exactly 1 surface object.")

mysurf = mysurfs[0].to_regular_surface() # `mysurf` is now a RegularSurface object

mysurf.quickplot(title="A surface!")
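
The "exactly one object" check recurs whenever a filter is expected to narrow a collection down to a single result. A minimal sketch of a reusable guard follows; `exactly_one` is a hypothetical helper, not part of `fmu-sumo`, and it only assumes that the collection supports `len()` and indexing, as the collections in this notebook do.

```python
def exactly_one(collection, what="object"):
    """Return the only element of `collection`, or raise if the count is not 1."""
    count = len(collection)
    if count != 1:
        raise ValueError(
            f"Expected exactly 1 {what}, got {count}. Consider adding more filters."
        )
    return collection[0]


# Demonstrated on plain lists so that the sketch runs without a Sumo connection.
print(exactly_one(["only-surface"], what="surface"))

try:
    exactly_one(["a", "b"], what="surface")
except ValueError as error:
    print(error)
```

Under the same assumption, the cell above could be written as `exactly_one(mysurfs, "surface").to_regular_surface()`, which fails loudly instead of silently using the first of several matches.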

### Statistical aggregations
A key feature of FMU is its ability to represent uncertainty by realizing the same result many times with different parameters. Unlike classical model workflows, which create a single, atomic instance of each result, FMU workflows produce results in the form of distributions.

To analyze such distributions, it is frequently useful to create statistical aggregations:

In [None]:
# Perform statistical aggregation on SurfaceCollection

surfs = myensemble.surfaces.filter(name="Therys Fm. Top", tagname="DS_extract_geogrid", realization=True)

print(f"There are {len(myensemble.realizations)} realizations, and the Search Context has {len(surfs)} individual surface objects.")

mean = surfs.aggregation(operation="mean")  # operations: max, mean, std, p10, p90, p50
mean.to_regular_surface().quickplot(title="Mean surface!")
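
Conceptually, an aggregation such as `mean` or `max` reduces the stack of realizations cell by cell. The sketch below shows that idea on three tiny 2x2 grids using only the standard library. It is an illustration under simplifying assumptions (identical grid geometry, no undefined cells) and not Sumo's implementation; `aggregate_cellwise` and the depth values are made up for the example.

```python
from statistics import fmean


def aggregate_cellwise(grids, reducer):
    """Apply `reducer` to the values found at each cell position across all grids."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [reducer([grid[i][j] for grid in grids]) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Three made-up realizations of the same 2x2 surface (depth values).
realizations = [
    [[1700.0, 1710.0], [1720.0, 1730.0]],
    [[1702.0, 1712.0], [1718.0, 1734.0]],
    [[1704.0, 1708.0], [1722.0, 1732.0]],
]

mean_grid = aggregate_cellwise(realizations, fmean)
max_grid = aggregate_cellwise(realizations, max)

print(mean_grid)  # [[1702.0, 1710.0], [1720.0, 1732.0]]
print(max_grid)   # [[1704.0, 1712.0], [1722.0, 1734.0]]
```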

Through the <a href="https://fmu-sumo.app.radix.equinor.com/">Sumo web interface</a>, you can also call bulk aggregation on all surfaces in your ensemble. Once aggregated surfaces have been created, you can also access them directly with a filter, as shown below:

In [None]:
mymean = myensemble.surfaces.filter(name="Therys Fm. Top", tagname="DS_extract_geogrid", aggregation="mean")
mymean[0].to_regular_surface().quickplot(title="Still the mean")

### Time filtration
The `TimeFilter` class can be used to construct a time filter which can be passed to the `SurfaceCollection.filter` method.

In [None]:
from fmu.sumo.explorer import TimeFilter, TimeType

# get surfaces with timestamps
time = TimeFilter(time_type=TimeType.TIMESTAMP)
surfs = myensemble.surfaces.filter(time=time)
print("Timestamp:", len(surfs))

# get surfaces with time intervals
time = TimeFilter(time_type=TimeType.INTERVAL)
surfs = myensemble.surfaces.filter(time=time)
print("Interval:", len(surfs))


# get surfaces with time data (timestamp or interval)
time = TimeFilter(time_type=TimeType.ALL)
surfs = myensemble.surfaces.filter(time=time)
print("Time data:", len(surfs))


# get surfaces without time data
time = TimeFilter(time_type=TimeType.NONE)
surfs = myensemble.surfaces.filter(time=time)
print("No time data:", len(surfs))


# get available timestamps
timestamps = myensemble.surfaces.timestamps
print("\nTimestamps:", timestamps)

# get available intervals
intervals = myensemble.surfaces.intervals
print("Intervals:", intervals)


# get surfaces with timestamp in range
time = TimeFilter(
    time_type=TimeType.TIMESTAMP, start="2018-01-01", end="2022-01-01"
)
surfs = myensemble.surfaces.filter(time=time)

# get surfaces with time intervals in range
time = TimeFilter(
    time_type=TimeType.INTERVAL, start="2018-01-01", end="2022-01-01"
)
surfs = myensemble.surfaces.filter(time=time)

# get surfaces where intervals overlap with range
time = TimeFilter(
    time_type=TimeType.INTERVAL,
    start="2018-01-01",
    end="2022-01-01",
    overlap=True,
)
surfs = myensemble.surfaces.filter(time=time)

# get surfaces with exact timestamp matching (t0 == start)
time = TimeFilter(time_type=TimeType.TIMESTAMP, start="2018-01-01", exact=True)
surfs = myensemble.surfaces.filter(time=time)

# get surfaces with exact interval matching (t0 == start AND t1 == end)
time = TimeFilter(
    time_type=TimeType.INTERVAL,
    start="2018-01-01",
    end="2022-01-01",
    exact=True,
)
surfs = myensemble.surfaces.filter(time=time)
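
The difference between "intervals in range" and "intervals that overlap with range" can be sketched in plain Python. This assumes that "in range" means the interval lies fully inside the range, that "overlap" means the two share at least one point in time, and that bounds are inclusive; the actual boundary handling in `TimeFilter` may differ. `contained` and `overlaps` are hypothetical helpers for illustration only.

```python
from datetime import date


def contained(t0, t1, start, end):
    """True if the interval [t0, t1] lies entirely within [start, end]."""
    return start <= t0 and t1 <= end


def overlaps(t0, t1, start, end):
    """True if the interval [t0, t1] shares at least one point with [start, end]."""
    return t0 <= end and start <= t1


start, end = date(2018, 1, 1), date(2022, 1, 1)

# Made-up intervals: inside the range, straddling its start, and entirely before it.
inside = (date(2019, 1, 1), date(2020, 1, 1))
straddling = (date(2017, 6, 1), date(2019, 1, 1))
before = (date(2015, 1, 1), date(2016, 1, 1))

for label, (t0, t1) in [("inside", inside), ("straddling", straddling), ("before", before)]:
    print(label, contained(t0, t1, start, end), overlaps(t0, t1, start, end))
```

Under these assumptions, every contained interval also overlaps the range, so the overlap variant returns a superset of the contained variant.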

## Tables
FMU produces results across almost all types of data, including a significant amount of tables. While most tables are relatively small, some tables, such as the `UNSMRY`, can be very large. To deal with these large tables, Sumo performs some data transformation. In short, we flip the data from being realization-oriented to being vector-oriented.

<div class="alert alert-block alert-info">
    Refer to the <a href="https://doc-sumo-doc-prod.radix.equinor.com/">Sumo documentation</a> for more information about data transformation of tables.</div>

In this example, we will use the same ensemble as before. We will first find tables for a specific realization and convert them to pandas DataFrames.

### Getting a single realization of a table
Example: Inplace volumes

In [None]:
mysinglevolumes = myensemble.tables.filter(tagname="vol", name="geogrid", realization=0)
if len(mysinglevolumes) != 1:
    raise ValueError(f"Got {len(mysinglevolumes)} which is not exactly 1.")
print("Inplace volumes (geogrid) for realization 0:")
df = mysinglevolumes[0].to_pandas()
df

Example: UNSMRY

In [None]:
mysinglesummary = myensemble.tables.filter(tagname="summary", realization=0)
if len(mysinglesummary) != 1:
    raise ValueError(f"Got {len(mysinglesummary)} which is not exactly 1.")
print("Summary for realization 0:")
df = mysinglesummary[0].to_pandas()
df

Commonly, however, we don't want the Summary for a single realization but rather a specific set of columns for the ensemble. It might be tempting to start looping through all realizations, but this is not recommended! Rather, you can call for some data transformation (we call this "aggregation" as well) to provide easy access to single columns across an ensemble of large tables:

### Getting a column across many realizations

In [None]:
df = myensemble.tables.filter(tagname="summary").aggregation(operation="collection", column="FOPT").to_pandas()
print("FOPT across all realizations")
df

Note that in addition to the "FOPT" column, you also get the **DATE** and **REAL** columns as categorical columns.
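
To make the shape of such a result concrete, the sketch below flips a few realization-oriented tables into one column-oriented table with `DATE`, `REAL` and the requested column. It uses plain Python and made-up numbers; `collect_column` is a hypothetical helper that only mimics the layout, not how Sumo performs the transformation.

```python
def collect_column(tables_by_real, column):
    """Stack one column from per-realization tables into rows of DATE, REAL, value."""
    rows = []
    for real in sorted(tables_by_real):
        table = tables_by_real[real]
        for day, value in zip(table["DATE"], table[column]):
            rows.append({"DATE": day, "REAL": real, column: value})
    return rows


# Made-up, realization-oriented input: one small table per realization.
tables_by_real = {
    0: {"DATE": ["2018-01-01", "2018-02-01"], "FOPT": [0.0, 1200.0], "FOPR": [0.0, 40.0]},
    1: {"DATE": ["2018-01-01", "2018-02-01"], "FOPT": [0.0, 1350.0], "FOPR": [0.0, 45.0]},
}

fopt_rows = collect_column(tables_by_real, "FOPT")
for row in fopt_rows:
    print(row)
```

Each row carries its realization number, so a single table holds the requested column for the whole ensemble without looping over realizations on the consumer side.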

Data transformations of large tables can be time-consuming. However, once data has been transformed, the result is stored in Sumo and you can access it more quickly:

In [None]:
myensemble.tables.filter(tagname="summary", aggregation="collection", column="FOPT")[0].to_pandas()