# ATB Notebook

In this notebook, we demonstrate how to access and visualize the 2021 ATB (Annual Technology Baseline) database, which is stored on AWS, using the OEDI Open Data Access Tools.

# 0 Prerequisites

Running this example requires a deployed OEDI data lake, through which all queries run. To learn how to deploy the OEDI data lake, please refer to the documentation at https://openedi.github.io/open-data-access-tools/. We will be using the `oedi_atb` database. You must deploy the data lake with your own S3 bucket and update the `staging_location` and `region_name` variables below.

In [1]:
database_name = "oedi_atb"
table_name = "atb_electricity_parquet_2021"
region_name = "us-west-2"
staging_location = "s3://user-owned-staging-bucket"

Alternatively, if you don't have an AWS account, you can download the ATB data manually from our S3 bucket viewer at https://data.openei.org/s3_viewer?bucket=oedi-data-lake. Uncomment and run the following code to import the file into pandas dataframes, and then skip ahead to sections 2.2 and 3.2.

In [None]:
# import pandas as pd
# filepath = 'ATBe.csv' # Put the ATBe.csv file in the same directory as this notebook, or provide the full path
# raw_data = pd.read_csv(filepath)

# df = raw_data[[
#     'core_metric_key',
#     'core_metric_parameter',
#     'core_metric_case',
#     'crpyears',
#     'technology',
#     'scenario',
#     'core_metric_variable',
#     'value']]

# df = df[df.core_metric_parameter.isin([
#     'Calculated Rate of Return on Equity Real',
#     'Calculated Interest Rate Real',
#     'Debt Fraction',
#     'FCR',
#     'Interest Rate Nominal',
#     'Rate of Return on Equity Nominal',
#     'WACC Nominal',
#     'WACC Real'])]

# df = df[
#     (df.technology != 'AEO') &
#     (df.crpyears == 30) &
#     (df.scenario == 'Moderate')]

# df_CM = raw_data[[
#     'core_metric_parameter',
#     'core_metric_case',
#     'crpyears',
#     'technology',
#     'techdetail',
#     'scenario',
#     'units',
#     'core_metric_variable',
#     'value'
# ]]

# df_CM = df_CM[df_CM.core_metric_parameter.isin([
#     'LCOE',
#     'CAPEX',
#     'CF',
#     'Fixed O&M',
#     'Variable O&M'
# ])]

# df_CM = df_CM[df_CM.technology != 'AEO']
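
If the downloaded file is large, it can help to load only the columns that are needed. The sketch below shows this with the `usecols` argument of `pd.read_csv`. It reads from a tiny in-memory CSV rather than the real `ATBe.csv`; the rows, the values, and the `extra_column` field are invented for illustration.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for ATBe.csv; rows and values are invented.
csv_text = """core_metric_parameter,technology,scenario,crpyears,core_metric_variable,value,extra_column
WACC Real,Biopower,Moderate,30,2019,0.05,x
WACC Real,AEO,Moderate,30,2019,0.04,y
LCOE,CSP,Advanced,20,2019,80.0,z
"""

wanted = [
    "core_metric_parameter", "technology", "scenario",
    "crpyears", "core_metric_variable", "value",
]

# usecols skips every column that is not listed, which reduces memory use.
raw = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Same style of filter as the cell above: drop AEO rows and keep
# 30-year, Moderate-scenario rows.
subset = raw[
    (raw.technology != "AEO") &
    (raw.crpyears == 30) &
    (raw.scenario == "Moderate")
]
print(subset)
```

With the real file, replace `io.StringIO(csv_text)` with the path to `ATBe.csv` and list the columns used in the cell above.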

# 1 Metadata

In `oedi`, the `OEDIGlue` class provides utility methods to retrieve the metadata from the database, where the metadata includes `Columns`, `Partition Keys`, and `Partition Values`. Let's create an `OEDIGlue` object and use it to see what columns are in this table.

In [2]:
from oedi.AWS.glue import OEDIGlue
glue = OEDIGlue()
glue.get_table_columns(database_name, table_name)

| | Name | Type |
|---|---|---|
| 0 | revision | bigint |
| 1 | atb_year | bigint |
| 2 | core_metric_key | string |
| 3 | core_metric_parameter | string |
| 4 | core_metric_case | string |
| 5 | crpyears | bigint |
| 6 | technology | string |
| 7 | techdetail | string |
| 8 | scenario | string |
| 9 | core_metric_variable | string |


# 2 Financial Data

## 2.1 Run Query

The `OEDIAthena` class is used to run a SQL query on the database and store the results in a pandas dataframe. Let's capture all the financial data. Note that 'core_metric_variable' is the year and 'value' is the value of the parameter given as a decimal rate.

In [26]:
from oedi.AWS.athena import OEDIAthena
athena = OEDIAthena(staging_location=staging_location, region_name=region_name)

query_string = f"""
    SELECT
        core_metric_key,
        core_metric_parameter,
        core_metric_case,
        crpyears,
        technology,
        scenario,
        core_metric_variable,
        value
    FROM {database_name}.{table_name}
    WHERE core_metric_parameter IN (
        'Calculated Rate of Return on Equity Real',
        'Calculated Interest Rate Real',
        'Debt Fraction',
        'FCR',
        'Interest Rate Nominal',
        'Rate of Return on Equity Nominal',
        'WACC Nominal',
        'WACC Real')
    AND technology <> 'AEO'
    AND crpyears = 30
    AND scenario = 'Moderate'
"""
df = athena.run_query(query_string)
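
The query result is in long format, with one row per parameter, technology, and year. One way to inspect such data is to pivot it so that years become the index and technologies become columns. The sketch below does this on a small hand-made dataframe; the name `toy` and all of its values are invented for illustration and are not real ATB numbers.

```python
import pandas as pd

# Toy stand-in for the query result; the values are invented.
toy = pd.DataFrame({
    "core_metric_parameter": ["WACC Real"] * 4,
    "technology": ["Biopower", "Biopower", "CSP", "CSP"],
    "core_metric_variable": ["2019", "2020", "2019", "2020"],  # the year
    "value": [0.05, 0.04, 0.06, 0.05],                         # decimal rate
})

# One row per year, one column per technology.
wide = toy.pivot(index="core_metric_variable", columns="technology", values="value")

# Convert the decimal rates to percentages for display.
wide_pct = (wide * 100).round(2)
print(wide_pct)
```

Note that `pivot` raises an error if an index/column pair occurs more than once, so on real data it only works after filtering down to a single parameter and case with no repeated rows.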

In [35]:
# Warning: If this cell evaluates to True, then there is currently a duplication bug in the ATBe database.
# The code in the rest of this notebook is designed to work around this bug. If the bug is ever fixed (and
# this cell evaluates to False), then a minor edit will be required, as explained in additional comments
# below.

len(df[(df.core_metric_parameter == 'WACC Real') & (df.technology == 'Biopower') & (df.core_metric_variable == '2019')]) == 4

True
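
The check above looks at one hard-coded slice of the data. A more general way to look for repeated rows is to count rows per identifying combination of columns. The sketch below does this on an invented dataframe and then applies a prefix-based filter equivalent to the workaround used in the plotting cell of section 2.2. The key strings (`M_a`, `R_a`, and so on), the choice of identifying columns, and the values are assumptions made for illustration, not actual ATB records.

```python
import pandas as pd

# Invented rows: each case appears twice, once under a key starting
# with "M" and once under a key starting with "R".
toy = pd.DataFrame({
    "core_metric_key": ["M_a", "R_a", "M_b", "R_b"],  # hypothetical keys
    "core_metric_case": ["Market", "Market", "R&D", "R&D"],
    "technology": ["Biopower"] * 4,
    "core_metric_variable": ["2019"] * 4,
    "value": [0.05, 0.05, 0.04, 0.04],
})

# Columns assumed to identify a row uniquely when there is no duplication.
id_cols = ["core_metric_case", "technology", "core_metric_variable"]

# Count rows per combination; any count above 1 signals duplication.
counts = toy.groupby(id_cols).size()
has_duplicates = bool((counts > 1).any())
print(has_duplicates)

# Keep only rows whose key prefix matches the first letter of the case
# ("M" for Market, "R" for R&D).
deduped = toy[toy["core_metric_key"].str[0] == toy["core_metric_case"].str[0]]
print(len(deduped))
```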

## 2.2 Visualization

Now that we have a dataframe of the financials, we can build interactive plots using ipywidgets. We also add functionality for the user to export their data selection as a .csv.

In [4]:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from ipywidgets import interact, Dropdown, widgets, Button, Layout, SelectMultiple, Output, Text
from IPython.display import display
from math import ceil

In [24]:
# Note: If you run the code in section 3 and then come back to section 2, you must rerun this cell for it to work properly.

print('Select multiple technologies by using ctrl-click')

# Define style for widgets
style = {'description_width': 'initial'}

# Get unique values for user controls
core_metric_parameter = df.core_metric_parameter.unique()
technology_options = df.technology.unique()

# Find a reasonable maximum value for the y-axis for each core_metric_parameter
ymax = {}
for metric in core_metric_parameter:
    ymax[metric] = ceil((df[df['core_metric_parameter'] == metric].value.max() + .01) * 10)/10
    
# Make widgets
core_metric_parameter_W = Dropdown(options = core_metric_parameter, style = style)
technology_W = SelectMultiple(options = technology_options, style = style, value = [technology_options[0]])

# Text widget for export filename
filename_W = Text(
    value='ATB_filtered_financials.csv',
    description='File Name:',
    disabled=False
)

# The interact decorator calls the function below to continuously listen for inputs from
# the widgets and then filter and plot the corresponding data. 
@interact
def atb_filter_options(
    # set up the variables to "listen" to the inputs from the widgets
    core_metric_parameter = core_metric_parameter_W, 
    technology = technology_W,
    filename = filename_W
    ):

    # filter the df based on the user's choices in the dropdowns
    df_f = df[
    (df['core_metric_parameter'] == core_metric_parameter) &
    (df.technology.isin(technology))
    ]
    # Separate market and R&D data
    # Note: This code handles the duplication bug
    df_market = df_f[(df_f['core_metric_case'] == 'Market') & (df_f['core_metric_key'].str.startswith('M'))]
    df_RandD = df_f[(df_f['core_metric_case'] == 'R&D') & (df_f['core_metric_key'].str.startswith('R'))]
    
    # Use the following two lines instead if the duplication bug is fixed
    #df_market = df_f[df_f['core_metric_case'] == 'Market']
    #df_RandD = df_f[df_f['core_metric_case'] == 'R&D']

    # Plot the data
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (8,5), sharey = True)
    fig.suptitle(core_metric_parameter)
    for tech in technology:
        df_market[df_market['technology'] == tech].plot(x = 'core_metric_variable', y = 'value', ax = ax1, xlabel = 'year', title = 'Market')
        df_RandD[df_RandD['technology'] == tech].plot(x = 'core_metric_variable', y = 'value', ax = ax2, xlabel = 'year', title = 'R & D')
    ax1.get_legend().remove()
    ax1.set_ylim(ymin = 0, ymax = ymax[core_metric_parameter])
    ax1.yaxis.set_major_formatter(mtick.PercentFormatter(1.0)) # Converts the decimal rate value to be displayed as a percentage
    ax2.legend(technology, loc='center left', bbox_to_anchor=(1, 0.5))

    # Now we set up widgets to facilitate exporting the data to a file

    # Make and style a Button widget to trigger data export
    button = Button(description="Export Selection to .csv", layout = Layout(width = '200px', height = '50px'))
    button.style.button_color = 'green'

    # Make a widget for output
    output = Output()
    
    # Set button action that will export the ATB dataset, filtered as per user's dropdown choices, once they click the button
    def on_button_clicked(b):
        with output:
            df_f.to_csv(filename)
            print("CSV Export Successful!")
    
    # Display button and link to action
    display(button, output)
    button.on_click(on_button_clicked)

Select multiple technologies by using ctrl-click


interactive(children=(Dropdown(description='core_metric_parameter', options=('Calculated Rate of Return on Equ…
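
The y-axis limit in the cell above is found by adding a small margin to the largest value and rounding up to the next tenth. A standalone sketch of that arithmetic follows. The function name `round_up_to_tenth` is a hypothetical helper introduced here for illustration; it is not part of the notebook or of any library.

```python
from math import ceil

def round_up_to_tenth(max_value, margin=0.01):
    """Add a margin to max_value, then round up to the next multiple of 0.1."""
    return ceil((max_value + margin) * 10) / 10

# A largest rate of 7.3% gets an axis limit of 10%.
print(round_up_to_tenth(0.073))

# A largest rate of 19.5% gets an axis limit of 30%, because the margin
# pushes it past 20%.
print(round_up_to_tenth(0.195))
```

Rounding to a tenth keeps the percentage axis on clean tick values, and the margin keeps the highest line from touching the top of the plot.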

# 3 Core Metrics
## 3.1 Run Query
Next, let's make a new dataframe by querying the core metrics.

In [6]:
query_string = f"""
    SELECT
        core_metric_parameter,
        core_metric_case,
        crpyears,
        technology,
        techdetail,
        scenario,
        units,
        core_metric_variable,
        value
    FROM {database_name}.{table_name}
    WHERE core_metric_parameter IN (
        'LCOE',
        'CAPEX',
        'CF',
        'Fixed O&M',
        'Variable O&M'
        )
    AND technology <> 'AEO'
"""
df_CM = athena.run_query(query_string)

## 3.2 Visualization

Just like before, once we have a dataframe, we can use ipywidgets to build visualizations and export functionality.

In [20]:
import numpy as np

# Define style for widgets
style = {'description_width': 'initial'}

# Get unique values for user controls
core_metric_parameters = df_CM.core_metric_parameter.unique()
technology_options = df_CM.technology.unique()
case_options = df_CM.core_metric_case.unique()

# List of scenarios for filtering and legend
scenarios = ['Conservative', 'Moderate', 'Advanced']

# Create dictionary of max values for the y-scales in the plots based on technology and core_metric_parameter
ymax = {}
for tech in technology_options:
    ymax[tech] = {}
    for metric in core_metric_parameters:
        x = df_CM[(df_CM['core_metric_parameter'] == metric) & (df_CM['technology'] == tech)].value.max()
        if np.isnan(x) or x == 0:
            ymax[tech][metric] = 1
        else:
            ymax[tech][metric] = x * 1.1

# Create dictionary of units to label the y-axes
ylabels = {}
for metric in core_metric_parameters:
    units = df_CM[df_CM['core_metric_parameter'] == metric].iloc[0]['units']
    if isinstance(units, str):
        ylabels[metric] = units
    else:
        ylabels[metric] = ''

# Make widgets
technology_W = Dropdown(options = technology_options, style = style)
case_W = Dropdown(options = case_options, style = style)
crpyears_W = Dropdown(style = style)
tech_detail_W = Dropdown(style = style)

# The options for crpyears and tech_detail depend on which technology is selected. We need to make these
# widgets update based on the technology_W widget.
crpyears_dict = {}
tech_detail_dict = {}
for item in technology_options: 
    crpyears_dict[item] = list(df_CM[df_CM['technology'] == item].crpyears.unique())
    tech_detail_dict[item] = list(df_CM[df_CM['technology'] == item].techdetail.unique())

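def update_W_options(*args):
    crpyears_W.options = crpyears_dict[technology_W.value]
    tech_detail_W.options = tech_detail_dict[technology_W.value]

# Populate the dependent dropdowns for the initial technology, then update them
# whenever the selected technology changes
update_W_options()
technology_W.observe(update_W_options, names='value')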

# Text widget for export filename
filename_W = Text(
    value='ATB_filtered_core_metrics.csv',
    description='File Name:',
    disabled=False
)

# The interact decorator calls the function below to continuously listen for inputs from
# the widgets and then filter and plot the corresponding data. 
@interact
def atb_filter_options(
    # Set up the variables to "listen" to the inputs from the widgets 
    technology = technology_W,
    case = case_W,
    crpyears = crpyears_W, 
    tech_detail = tech_detail_W,
    filename = filename_W,
    ):

    # Filter the df based on the user's choices in the dropdowns
    df_CM_f = df_CM[
    (df_CM['technology'] == technology) &
    (df_CM['core_metric_case'] == case) &
    (df_CM['crpyears'] == crpyears) &
    (df_CM['techdetail'] == tech_detail)
    ]

    # Plot the data in a grid
    fig, ((ax1, ax2, ax3),(ax4, ax5, ax6)) = plt.subplots(2, 3, figsize = (10,6))
    axes = iter([ax1, ax2, ax3, ax4, ax5])
    ax6.set_axis_off()
    for cmp in core_metric_parameters:
        ax = next(axes)
        ax.set_ylim(ymin = 0, ymax = ymax[technology][cmp])
        for scenario in scenarios:
            df_CM_f[(df_CM_f['core_metric_parameter'] == cmp) & (df_CM_f['scenario'] == scenario)].plot(
                x = 'core_metric_variable',
                y = 'value', ax = ax,
                xlabel = 'Year',
                ylabel = ylabels[cmp],
                title = cmp)
        ax.get_legend().remove()
    
    ax5.legend(scenarios, loc='center left', bbox_to_anchor=(1, 0.5))
    plt.tight_layout()
    
    # Now we set up widgets to facilitate exporting the data to a file

    # Make and style a Button widget to trigger data export
    button = Button(description="Export Selection to .csv", layout = Layout(width = '200px', height = '50px'))
    button.style.button_color = 'green'

    # Make a widget for output
    output = Output()
    
    # Set button action that will export the ATB dataset, filtered as per user's dropdown choices, once they click the button
    def on_button_clicked(b):
        with output:
            df_CM_f.to_csv(filename)
            print("CSV Export Successful!")
    
    # Display button and link to action
    display(button, output)
    button.on_click(on_button_clicked)
    

interactive(children=(Dropdown(description='technology', options=('Biopower', 'CSP', 'Commercial Battery Stora…
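
The dependent dropdowns rely on dictionaries that map each technology to its available `crpyears` and `techdetail` values. As a sketch, the same kind of dictionary can be built with one `groupby` per column instead of a loop over technologies. The toy dataframe below, including the `techdetail` labels, is invented for illustration.

```python
import pandas as pd

# Invented rows standing in for df_CM.
toy = pd.DataFrame({
    "technology": ["Biopower", "Biopower", "CSP", "CSP", "CSP"],
    "crpyears": [20, 30, 20, 30, 30],
    "techdetail": ["Dedicated", "Dedicated", "Class2", "Class2", "Class3"],
})

def options_by_technology(frame, column):
    """Map each technology to the list of distinct values in `column`."""
    grouped = frame.groupby("technology")[column].unique()
    return {tech: list(values) for tech, values in grouped.items()}

crpyears_options = options_by_technology(toy, "crpyears")
techdetail_options = options_by_technology(toy, "techdetail")

print(crpyears_options["CSP"])
print(techdetail_options["CSP"])
```

Each dictionary can then be indexed by the selected technology to set the `options` of the dependent dropdown, as `update_W_options` does in the cell above.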