# Goal: Extract facility metric data using the Contxt Python SDK

Contxt organizes its data in a Customer to Facility to Metric hierarchy. To retrieve metric data for all facilities, the general operation flow is:
1. Instantiate the customer client.
1. Locate the customer's facilities.
1. Locate the customer's metric definitions.
1. Retrieve metrics for each customer facility.

This document demonstrates this flow using the [Contxt Python SDK](https://github.com/ndustrialio/contxt-sdk-python).

Note: this example is functional but not optimized. At a minimum, multiprocessing could be used to speed up the data request for a large number of facilities.

In [1]:
from contxt.cli.clients import Clients

from random import randint
from datetime import datetime, timezone

# How long ago should we look for data?

The metric data retrieval service allows bounding a time window for returned data with the `effective_start_date` and `effective_end_date` parameters (only `effective_start_date` is used in this notebook). These values must be supplied as ISO 8601 strings matching the `"%Y-%m-%dT%H:%M:%SZ"` format.

In this document, we're going to retrieve all data between now and the date in the `earliest_date` variable we declare below.

In [2]:
earliest_date = '2022-01-01T00:00:00Z'
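Rather than typing the timestamp by hand, the string could also be generated from a `datetime` object. The following is a minimal sketch using only the standard library; the helper name `to_iso_utc` and the 90-day window are illustrative assumptions, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone

# Format string quoted in the text above
CONTXT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_iso_utc(moment: datetime) -> str:
    """Render a timezone-aware datetime as a UTC ISO 8601 string."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first so the trailing "Z" is accurate
    return moment.astimezone(timezone.utc).strftime(CONTXT_TIME_FORMAT)


# Fixed date, equivalent to the literal used in this notebook
earliest_date_example = to_iso_utc(datetime(2022, 1, 1, tzinfo=timezone.utc))

# Rolling window: 90 days before the current moment (90 is an arbitrary choice)
rolling_start = to_iso_utc(datetime.now(timezone.utc) - timedelta(days=90))
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC.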

# Initialize SDK client

If you are in a newly installed environment, you will need to configure your environment's Contxt authentication secrets with the CLI command: `contxt auth login` before running this notebook any further.

In [3]:
clients = Clients(env="production", org_slug="lineage")

# Get organization facilities

The next cell retrieves all of the organization's facilities.

In [4]:
facilities = clients.facilities.get_facilities()
len(facilities)

581

We'll use a randomly selected facility for the rest of the notebook.

In [5]:
facilities = [facilities[randint(0, len(facilities) - 1)]]

# Retrieve Customer Metrics

## Locate the facility type definition

Contxt's dynamic nature allows for many asset types, but it also means each organization has its own definitions for those asset types. The code cell below scans the organization's asset type definitions and locates the one for the `Facility` asset type. Most critically, later code depends on the unique ID of the `Facility` asset type definition.

In [6]:
asset_types = clients.assets.get_asset_types(clients.org_id)
facility_type = [atype for atype in asset_types if atype.label == 'Facility'][0]
facility_type

AssetType(label='Facility', description='Physical Facility Locations', organization_id='02efa741-a96f-4124-a463-ae13a704b8fc', id='616eee50-5dd3-4009-b8a3-dedfd8a7d56d', is_global=True, global_asset_type_parent_id='5f310899-d8f9-4dac-ae82-cedb2048a8ef', parent_id=None, hierarchy_level=1, created_at=datetime.datetime(2018, 11, 15, 16, 34, 17, 41000, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2018, 11, 15, 16, 34, 17, 41000, tzinfo=datetime.timezone.utc))
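The `[...][0]` lookup above raises a bare `IndexError` if no asset type carries the expected label. A slightly more defensive variant might look like the sketch below. The helper `find_asset_type` is hypothetical, and the `SimpleNamespace` objects are stand-ins for the real `AssetType` objects returned by the SDK.

```python
from types import SimpleNamespace


def find_asset_type(asset_types, label):
    """Return the first asset type whose label matches, or raise LookupError."""
    match = next((atype for atype in asset_types if atype.label == label), None)
    if match is None:
        available = sorted(atype.label for atype in asset_types)
        raise LookupError(f"No asset type labeled {label!r}; found {available}")
    return match


# Stand-in objects for illustration only; real ones come from get_asset_types
sample_types = [
    SimpleNamespace(label="Facility", id="type-1"),
    SimpleNamespace(label="Room", id="type-2"),
]
facility_type_example = find_asset_type(sample_types, "Facility")
```

Listing the available labels in the error message makes it easier to spot an organization that names its facility type differently.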

## Retrieve Metric Definitions

A call to the SDK client's `get_metrics` function with the `facility_type.id` value retrieves the full list of metric definitions. The code cell below demonstrates this, with a filter applied to drop an unwanted definition (`Blended Rate`).

In [7]:
metric_definitions = [mdef for mdef in clients.assets.get_metrics(facility_type.id) if mdef.label != "Blended Rate"]
print(f"Number of definitions: {len(metric_definitions)}")

Number of definitions: 69


## Retrieving Metric Values

Below, we build a Python dataclass from the selected metric definitions. Compared with plain dictionaries, this gives a degree of validation as we build the full dataset: every entry has the same fields, and an unexpected metric name raises an error.

In [8]:
from dataclasses import make_dataclass, field
from typing import Any


FacilityEntry = make_dataclass(
    'FacilityEntry',
    ["facility", "date"] + [(mdef.label, Any, field(default=None)) for mdef in metric_definitions])
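To make the behavior of such a generated class concrete, here is a small self-contained sketch. The two metric labels and the facility name are made-up stand-ins for the organization's real definitions.

```python
from dataclasses import asdict, field, make_dataclass
from typing import Any

# Hypothetical metric labels standing in for the organization's definitions
sample_labels = ["facility_daily_co2_tons", "facility_daily_electricity_usage"]

SampleEntry = make_dataclass(
    "SampleEntry",
    ["facility", "date"] + [(label, Any, field(default=None)) for label in sample_labels],
)

# Metrics that were not supplied fall back to None
entry = SampleEntry("Example Facility", "2022-01-01T00:00:00Z", facility_daily_co2_tons=1.3)
entry_as_dict = asdict(entry)

# An unknown metric name is rejected instead of being silently stored
try:
    SampleEntry("Example Facility", "2022-01-01T00:00:00Z", not_a_metric=1.0)
    rejected = False
except TypeError:
    rejected = True
```

Note that this only validates field names. Because every metric field is typed `Any`, the values themselves are not checked. The approach also requires each metric label to be a valid Python identifier.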

Next, we iterate over each metric for each facility to build out the full data set. Much of the work in this next code cell is in reorganizing the data returned by Contxt into a structure that allows building a single unified table. In this case, that structure is a list of items where each item is all of a Facility's metrics for a given date.

As written, this code takes about 10 seconds per facility.

In [9]:
all_facility_data = []

# For each facility
for facility in facilities:
    facility_data = {}
    
    # For each metric of each facility
    for metric_def in metric_definitions:
        
        # Try to retrieve metric data, and skip this metric if it fails
        try:
            metric_values = clients.ems.get_metric_values(
                facility.asset_id, metric_def.label, params={"effective_start_date": earliest_date})
        except Exception:
            # print(f"\t{metric_def.label} not found for {facility.name}")
            continue
            
        # Reformat metric data into table structure
        for metric_value in metric_values:
            # Group values by date so that each date becomes one table row
            facility_data.setdefault(metric_value.effective_start_date, {})[metric_def.label] = metric_value.value
    all_facility_data.extend([FacilityEntry(facility.name, date, **fe) for date, fe in facility_data.items()])
    # print(f"Retrieve data for {facility.name}")
# Upper-bound estimate: assumes every entry has a value for every metric
print(f"Retrieved {len(all_facility_data) * len(metric_definitions)} data points")

Retrieved 1380 data points
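The loop above handles one facility at a time. Since most of that time is presumably spent waiting on network responses, one option for many facilities is a thread pool (shown here instead of multiprocessing). This is only a sketch: `fetch_facility_rows` is a hypothetical stand-in for the per-facility body of the loop, and whether the SDK clients can safely be shared across threads is an assumption that should be verified first.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_facility_rows(facility_name):
    """Hypothetical stand-in for the per-facility body of the loop above."""
    # A real version would call the SDK for each metric and build FacilityEntry rows
    return [(facility_name, "2022-01-01T00:00:00Z")]


# Made-up names; the real input would be the list of facility objects
sample_facility_names = ["Facility A", "Facility B", "Facility C"]

all_rows = []
# max_workers=4 is an arbitrary choice; tune it to what the service tolerates
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in input order, regardless of completion order
    for rows in pool.map(fetch_facility_rows, sample_facility_names):
        all_rows.extend(rows)
```

Collecting results in the main thread keeps the shared list free of concurrent writes.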


# Now you have your data!

At this point, the facility data has been pulled into the Python environment and can be manipulated or handed off to other tools. For example, this list of `FacilityEntry` objects can be converted directly to a pandas `DataFrame` like so:

In [10]:
import pandas
df = pandas.DataFrame(all_facility_data)
df.describe()

| | facility_daily_co2_factor | facility_daily_co2_per_unit | facility_daily_co2_tons | facility_daily_cubic_footage | facility_daily_electricity_spend | facility_daily_electricity_usage | facility_daily_energy_spend_per_lbs | facility_daily_energy_spend_per_unit | facility_daily_inbound_volume | facility_daily_iot_electricity_usage | ... | facility_monthly_kwh_per_cuft | facility_monthly_kwh_per_lbs | facility_monthly_kwh_per_revenue | facility_monthly_kwh_per_unit | facility_monthly_max_cuft | facility_monthly_outbound_volume | facility_monthly_production_units | facility_monthly_revenue | facility_monthly_rolling_year_cubic_footage | facility_monthly_rolling_year_elec_kbtu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 20.0 | 15.0 | 20.0 | 20.0 | 20.0 | 20.0 | 15.0 | 15.0 | 16.0 | 20.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| mean | 0.376722 | 5.809146 | 1.322089 | 3229200.0 | 236.69139 | 7018.91 | 0.519932 | 0.519932 | 567868.8 | 7018.91 | ... | 0.043472 | 7.204373 | 2.380411 | 7.204373 | 3229200.0 | 10399236.0 | 19485.137 | 58972.25 | 3229200.0 | 17007964.93 |
| std | 0.0 | 6.40272 | 0.254377 | 0.0 | 45.648693 | 1350.477521 | 0.572562 | 0.572562 | 396609.5 | 1350.477521 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| min | 0.376722 | 1.437777 | 0.341498 | 3229200.0 | 60.857987 | 1813.0 | 0.128534 | 0.128534 | 0.0 | 1813.0 | ... | 0.043472 | 7.204373 | 2.380411 | 7.204373 | 3229200.0 | 10399236.0 | 19485.137 | 58972.25 | 3229200.0 | 17007964.93 |
| 25% | 0.376722 | 1.556934 | 1.271328 | 3229200.0 | 228.122772 | 6749.425 | 0.139564 | 0.139564 | 122305.3 | 6749.425 | ... | 0.043472 | 7.204373 | 2.380411 | 7.204373 | 3229200.0 | 10399236.0 | 19485.137 | 58972.25 | 3229200.0 | 17007964.93 |
| 50% | 0.376722 | 1.65671 | 1.366022 | 3229200.0 | 244.451401 | 7252.15 | 0.148573 | 0.148573 | 719029.5 | 7252.15 | ... | 0.043472 | 7.204373 | 2.380411 | 7.204373 | 3229200.0 | 10399236.0 | 19485.137 | 58972.25 | 3229200.0 | 17007964.93 |
| 75% | 0.376722 | 11.675963 | 1.470219 | 3229200.0 | 263.167979 | 7805.325 | 1.047325 | 1.047325 | 912056.8 | 7805.325 | ... | 0.043472 | 7.204373 | 2.380411 | 7.204373 | 3229200.0 | 10399236.0 | 19485.137 | 58972.25 | 3229200.0 | 17007964.93 |
| max | 0.376722 | 18.451985 | 1.509638 | 3229200.0 | 270.886302 | 8014.6 | 1.64804 | 1.64804 | 1051317.0 | 8014.6 | ... | 0.043472 | 7.204373 | 2.380411 | 7.204373 | 3229200.0 | 10399236.0 | 19485.137 | 58972.25 | 3229200.0 | 17007964.93 |
