# Goal: extract facility data using the Python Contxt SDK

## Desired Metrics
Per Facility, per day:
* 'Utility Meter IDs',
* 'AMI Usage',
* 'CO2 Factor',
* 'CO2 tons per unit',
* 'CO2 tons',
* 'Cubic Ft',
* 'Final Usage',
* 'Has Complete Utility Bills',
* 'Complete bill data',
* 'Inbound Volume',
* 'IOT Usage',
* 'Outbound Volume',
* 'Revenue',
* 'UT vs RT Rolling Avg % Change',
* 'Statement Meter IDs',
* 'UT vs RT % Change',
* 'Utility Bill Usage',
* 'Spend'

In [1]:
from contxt.cli.clients import Clients
from contxt.utils.serializer import Serializer

import argparse
import os, sys
from pathlib import Path
from datetime import datetime, timedelta
import pytz

# What data are we looking for?

In [2]:
earliest_date = datetime(2019, 1, 1, tzinfo=pytz.UTC).strftime("%Y-%m-%dT%H:%M:%SZ")
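The timestamp string above can also be produced with the standard library alone. The following is a minimal sketch, not part of the original notebook: the helper name `to_iso_utc` and the 30-day rolling window are hypothetical and only illustrate how a different start date could be derived.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_iso_utc(moment):
    """Format a timezone-aware datetime as a UTC timestamp string (hypothetical helper)."""
    return moment.astimezone(timezone.utc).strftime(ISO_FORMAT)


# Same fixed start date as the cell above, without pytz.
fixed_start = to_iso_utc(datetime(2019, 1, 1, tzinfo=timezone.utc))

# Hypothetical alternative: a rolling window that starts 30 days before a reference moment.
reference = datetime(2022, 1, 7, tzinfo=timezone.utc)
rolling_start = to_iso_utc(reference - timedelta(days=30))

print(fixed_start)
print(rolling_start)
```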

# Initialize SDK client

If you are in a newly installed environment, you will need to configure your environment's Contxt authentication secrets with the CLI command: `contxt auth login` before running this notebook any further.

In [3]:
clients = Clients(env="production", org_slug="lineage")

# Get organization facilities

The next cell retrieves all of the organization's facilities.

In [4]:
facilities = clients.facilities.get_facilities()
len(facilities)

576

# Retrieving Facility Metrics over a time window

## Locate the facility type definition

Contxt's dynamic nature allows for many asset types, but it also means that each organization has its own definitions for those types. The code cell below scans the organization's asset type definitions and locates the one for the `Facility` asset type. Later code depends on the unique ID of that definition.

In [5]:
asset_types = clients.assets.get_asset_types()
facility_type = [atype for atype in asset_types if atype.label == 'Facility'][0]
facility_type

AssetType(label='Facility', description='Physical Facility Locations', organization_id='02efa741-a96f-4124-a463-ae13a704b8fc', id='616eee50-5dd3-4009-b8a3-dedfd8a7d56d', is_global=True, global_asset_type_parent_id='5f310899-d8f9-4dac-ae82-cedb2048a8ef', parent_id=None, hierarchy_level=1, created_at=datetime.datetime(2018, 11, 15, 16, 34, 17, 41000, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2018, 11, 15, 16, 34, 17, 41000, tzinfo=datetime.timezone.utc))
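Indexing with `[0]` raises a bare `IndexError` if no asset type carries the expected label. A slightly more defensive lookup might look like the sketch below. It runs against a stand-in `StubAssetType` class rather than the SDK's `AssetType`, and `find_asset_type` is a hypothetical helper, not a Contxt SDK function.

```python
from dataclasses import dataclass


@dataclass
class StubAssetType:
    """Stand-in for the SDK's AssetType; only the fields used here."""
    label: str
    id: str


def find_asset_type(asset_types, label):
    """Return the first asset type with the given label, or raise a descriptive error."""
    match = next((atype for atype in asset_types if atype.label == label), None)
    if match is None:
        raise LookupError(f"No asset type labeled {label!r} found")
    return match


stub_types = [StubAssetType("Meter", "id-1"), StubAssetType("Facility", "id-2")]
facility_stub = find_asset_type(stub_types, "Facility")
print(facility_stub.id)
```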

## Retrieve Metric Definitions

In [6]:
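# Alternative: keep only an explicit allow-list of labels (`only_metrics` is not defined in this notebook).
# metric_definitions = [mdef for mdef in clients.assets.get_metrics(facility_type.id) if mdef.label in only_metrics]
# Keep every Facility metric definition except "Blended Rate".
metric_definitions = [mdef for mdef in clients.assets.get_metrics(facility_type.id) if mdef.label != "Blended Rate"]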
print(f"Number of definitions: {len(metric_definitions)}")

Number of definitions: 69


## Retrieving Metric Values

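Below, we build a Python dataclass from the selected metrics, with one optional field per metric label. Compared with plain dictionaries, this gives some validation as the dataset is assembled: constructing an entry with an unknown metric name raises an error instead of silently adding a new key.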

In [7]:
from dataclasses import make_dataclass, field
from typing import Any


FacilityEntry = make_dataclass(
    'FacilityEntry',
    ["facility", "date"] + [(mdef.label, Any, field(default=None)) for mdef in metric_definitions])

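Next, we iterate over each metric for each facility to build the full dataset, grouping the returned values by their effective start date so that each facility and date pair becomes one `FacilityEntry`.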

In [None]:
start = datetime.utcnow()
print(start)
all_facility_data = []
for idx, facility in enumerate(facilities):
    # Maps each effective start date to a {metric label: value} dict for this facility.
    facility_data = {}
    for metric_def in metric_definitions:
        try:
            metric_values = clients.ems.get_metric_values(
                facility.asset_id, metric_def.label, params={"effective_start_date": earliest_date})
        except Exception:
            # Any failure is reported as a missing metric and the loop moves on.
            print(f"\t{metric_def.label} not found for {facility.name}")
            continue
        for mv in metric_values:
            facility_data.setdefault(mv.effective_start_date, {})[metric_def.label] = mv.value
    all_facility_data.extend(FacilityEntry(facility.name, date, **fe) for date, fe in facility_data.items())
    # Elapsed time is reported in minutes.
    print(f"Run time as of {idx + 1}/{len(facilities)}: {(datetime.utcnow() - start).total_seconds() / 60}")
print(len(all_facility_data))

2022-01-07 21:23:21.586002
	facility_daily_has_complete_utility_bills not found for Friona (CC: 100189)
	facility_daily_statement_meters not found for Friona (CC: 100189)
Run time as of 1/576: 0.21036103333333334
	facility_daily_active_meters not found for Everett (CC: 100304)
	facility_daily_has_complete_utility_bills not found for Everett (CC: 100304)
	facility_daily_statement_meters not found for Everett (CC: 100304)
Run time as of 2/576: 0.38815866666666665
	facility_daily_active_meters not found for Dayton (CC: 100311)
	facility_daily_has_complete_utility_bills not found for Dayton (CC: 100311)
	facility_daily_statement_meters not found for Dayton (CC: 100311)
Run time as of 3/576: 0.6030101833333333
	facility_daily_active_meters not found for Cofer (CC: 100208)
	facility_daily_has_complete_utility_bills not found for Cofer (CC: 100208)
	facility_daily_statement_meters not found for Cofer (CC: 100208)
Run time as of 4/576: 0.8276538333333333
	facility_daily_active_meters not found
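The core of the loop above is a pivot from per-metric time series to per-date rows. The sketch below isolates that step with made-up values. `StubMetricValue` is a stand-in for the objects returned by the SDK, and the labels and numbers are purely illustrative.

```python
from collections import defaultdict, namedtuple

# Stand-in for the SDK's metric value objects; only the two attributes used above.
StubMetricValue = namedtuple("StubMetricValue", ["effective_start_date", "value"])

# Illustrative data: one series per metric label, not real facility data.
series_by_label = {
    "facility_daily_spend": [
        StubMetricValue("2019-01-01", 10.0),
        StubMetricValue("2019-01-02", 11.0),
    ],
    "facility_daily_co2_tons": [StubMetricValue("2019-01-02", 0.4)],
}

# Pivot: date -> {metric label: value}. Dates missing a metric simply lack that key.
rows_by_date = defaultdict(dict)
for label, values in series_by_label.items():
    for mv in values:
        rows_by_date[mv.effective_start_date][label] = mv.value

for date, row in sorted(rows_by_date.items()):
    print(date, row)
```

A metric that is absent for a given date is left out of that date's row, which is why the dataclass fields default to `None`.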

At this point, the facility data has been pulled into the Python environment and can be manipulated or handed off to other tools. For example, the list of `FacilityEntry` objects can be converted directly to a pandas `DataFrame` and written to CSV:

In [None]:
import pandas
df = pandas.DataFrame(all_facility_data)
df.to_csv("metric_data.csv", header=True, index=False)
df
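If pandas is not available, a list of dataclass instances can also be written out with the standard library. This is a minimal sketch under that assumption. `StubEntry` is a hypothetical stand-in for `FacilityEntry`, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Any


@dataclass
class StubEntry:
    """Stand-in for FacilityEntry with a single hypothetical metric."""
    facility: str
    date: str
    facility_daily_spend: Any = None


entries = [
    StubEntry("Example Facility", "2019-01-01", 10.0),
    StubEntry("Example Facility", "2019-01-02"),  # metric missing for this date
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(StubEntry)])
writer.writeheader()
writer.writerows(asdict(entry) for entry in entries)

csv_text = buffer.getvalue()
print(csv_text)
```

Missing metrics (`None`) are written as empty cells.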