# FRESCO Anvil Database Access and Analytics Notebook

## Overview

The FRESCO Analytics Notebook supports interactive analysis of the Anvil dataset. It lets you:

- Extract filtered data from the Anvil database.
- Conduct statistical analyses on the filtered data.
- Visualize the results of the analyses.

The notebook is structured into three main sections:

### 1. Data Filtering
Select a datetime window (at most 31 days per query for host data and 180 days for job data), then narrow the result with optional column filters.
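
As an illustration of how a time window and a list of filters might be turned into a parameterized SQL query, here is a minimal sketch. It is not the notebook's actual query builder: the function `build_job_query`, the table name `job_data`, the allow-lists, and the `%s` placeholder style are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical allow-lists; the real table and column names may differ.
ALLOWED_COLUMNS = {"jid", "queue", "username", "exitcode", "nhosts"}
ALLOWED_OPERATORS = {"=", "!=", "<", ">", "<=", ">=", "LIKE"}


def build_job_query(start, end, conditions, table="job_data", time_column="start_time"):
    """Return (sql, params) for a time window plus optional filter conditions."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    clauses = [f"{time_column} BETWEEN %s AND %s"]
    params = [start, end]
    for column, operator, value in conditions:
        # Identifiers cannot be bound as parameters, so check them against allow-lists.
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {column} {operator}")
        clauses.append(f"{column} {operator} %s")
        params.append(value)
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses), params


sql, params = build_job_query(
    datetime(2023, 1, 1),
    datetime(2023, 1, 8),
    [("queue", "=", "gpu"), ("nhosts", ">=", 2)],
)
print(sql)
print(params)
```

Keeping the values in a separate parameter list lets the database driver handle quoting, so user-typed filter values never end up inside the SQL string itself.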

### 2. Data Analysis Options
Choose the statistics to compute (Mean, Median, Standard Deviation, PDF, CDF, or Ratio of Data Outside Threshold) and, if needed, the interval over which they are calculated.

### 3. Data Analysis and Visualizations
Execute the chosen analysis on the filtered dataset and visualize the outcomes.
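
One of the available analyses is the "Ratio of Data Outside Threshold". The sketch below shows one plausible reading of that statistic: the fraction of samples strictly greater than the threshold. The helper `ratio_outside_threshold` and the sample values are hypothetical, and the notebook's own implementation may define "outside" differently.

```python
import pandas as pd


def ratio_outside_threshold(values: pd.Series, threshold: float) -> float:
    """Fraction of non-missing values strictly greater than the threshold."""
    values = values.dropna()
    if values.empty:
        return float("nan")
    # A boolean mask averages to the share of True entries.
    return float((values > threshold).mean())


# Made-up CPU utilisation samples, including one missing reading.
cpu = pd.Series([10.0, 55.0, 92.0, 97.0, None, 40.0])
ratio = ratio_outside_threshold(cpu, threshold=90)
print(f"{ratio:.0%} of samples exceed the threshold")
```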

## Step-by-Step Instructions

- **Cell 1:** Set the start and end times of the query and click "Validate Dates". The validated window determines which host time series and job accounting data are extracted. Add conditions to filter the data further, then click "Execute Query".

- **Cell 2:** Configure your statistical analyses by:
  - Choosing statistics (e.g., Mean, Median).
  - Setting a threshold for "Ratio of Data Outside Threshold".
  - Selecting an interval type (Count or Time). If "Time" is chosen, define the time unit and interval count.

- **Cell 3:** Analyze the time series data based on Cell 2 configurations and generate visualizations.

- **Cell 4:** Select two metrics and plot their Pearson correlation.
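
The difference between the "Count" and "Time" interval types can be sketched with pandas rolling windows. This is a minimal example on made-up data, assuming the statistics are computed with `rolling`; the series name and values are illustrative only.

```python
import pandas as pd

# Five made-up samples with an uneven gap between the third and fourth.
times = pd.to_datetime([
    "2023-01-01 00:00", "2023-01-01 00:10", "2023-01-01 00:20",
    "2023-01-01 01:30", "2023-01-01 01:40",
])
cpu = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=times, name="cpuuser")

# "Count" interval: each window covers a fixed number of rows (here 3).
count_mean = cpu.rolling(window=3).mean()

# "Time" interval: each window covers a fixed time span (here 30 minutes)
# ending at the current row, however many rows fall inside it.
time_mean = cpu.rolling(window="30min").mean()

print(pd.DataFrame({"count_3_rows": count_mean, "time_30min": time_mean}))
```

With a count-based window the first two rows have no result because fewer than three rows are available. With a time-based window every row has a result, but the number of samples contributing to it varies with the sampling gaps.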

## Database Table Information

### Host Table
- **jid**: Unique job identifier
- **host**: Origin node of the data point
- **event**: Resource usage metric type
- **value**: Numeric value of the metric
- **unit**: Measurement unit of the metric
- **time**: Timestamp of the data point


**Event Column Metrics:**
- **cpuuser**: Average percentage of CPU time spent in user mode.
- **block**: Data transfer rate to and from block devices.
- **memused**: Total physical memory used by the operating system.
- **memused_minus_diskcache**: Physical memory usage excluding caches.
- **gpu_usage**: Average percentage of time the GPU was active (GPU jobs only).
- **nfs**: Data transfer rate over NFS mounts.
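
Because every metric shares the `event` and `value` columns, the Host table is in long format. The sketch below, on made-up rows, shows how one metric could be selected and how the table could be reshaped so that each metric becomes its own column. The sample values and unit strings are assumptions, not real Anvil data.

```python
import pandas as pd

# Made-up rows in the long format described above.
host_df = pd.DataFrame({
    "jid": ["JOB1"] * 4,
    "host": ["NODE1"] * 4,
    "event": ["cpuuser", "memused", "cpuuser", "memused"],
    "value": [85.0, 12.5, 90.0, 13.0],
    "unit": ["CPU %", "GB", "CPU %", "GB"],
    "time": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 00:00",
        "2023-01-01 00:10", "2023-01-01 00:10",
    ]),
})

# Select a single metric by its event name.
cpu_rows = host_df[host_df["event"] == "cpuuser"]

# Reshape to wide format: one row per timestamp, one column per metric.
wide = host_df.pivot_table(index="time", columns="event", values="value", aggfunc="mean")
print(wide)
```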

### Job Table
- **account**: Account or project name
- **jid**: Unique job identifier
- **ncores**: Total cores assigned to the job
- **ngpus**: Total GPUs assigned to the job
- **nhosts**: Number of nodes assigned to the job
- **timelimit**: Requested job duration (in seconds)
- **queue**: Job submission queue name
- **end_time**: Job end time
- **start_time**: Job start time
- **submit_time**: Job submission time
- **username**: Job owner's name
- **exitcode**: Job's exit status
- **host_list**: List of nodes the job ran on
- **jobname**: Job's name
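
Since both tables carry `jid`, job accounting data can be combined with host metrics. The following sketch uses made-up rows to derive a queue wait time from `submit_time` and `start_time` and to attach each job's mean `cpuuser` value. The derived column names `wait` and `mean_cpuuser` are hypothetical.

```python
import pandas as pd

# Made-up job accounting rows.
job_df = pd.DataFrame({
    "jid": ["JOB1", "JOB2"],
    "queue": ["gpu", "shared"],
    "submit_time": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00"]),
    "start_time": pd.to_datetime(["2023-01-01 00:05", "2023-01-01 01:30"]),
})
# Time each job spent waiting in the queue.
job_df["wait"] = job_df["start_time"] - job_df["submit_time"]

# Made-up host metric rows for the same jobs.
host_df = pd.DataFrame({
    "jid": ["JOB1", "JOB1", "JOB2"],
    "event": ["cpuuser", "cpuuser", "cpuuser"],
    "value": [80.0, 90.0, 20.0],
})
mean_cpu = (
    host_df.groupby("jid", as_index=False)["value"]
    .mean()
    .rename(columns={"value": "mean_cpuuser"})
)

# A left join keeps jobs that have no host samples in the selected window.
summary = job_df.merge(mean_cpu, on="jid", how="left")
print(summary[["jid", "queue", "wait", "mean_cpuuser"]])
```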

In [2]:
import notebook_functions as nbf
import matplotlib.pyplot as plt
import pandas as pd
import ipywidgets as widgets
from datetime import datetime
from IPython.display import display, clear_output, HTML

where_conditions_jobs = []
time_window_valid_jobs = False
MAX_DAYS_HOSTS = 31
MAX_DAYS_JOBS = 180
account_log_df = pd.DataFrame()
host_data_sql_query = ""

# ********************************* JOB DATA **********************************

# Display SQL query
def display_query_jobs():
    query, params = nbf.construct_job_data_query(where_conditions_jobs, job_data_columns_dropdown.value, validate_button_jobs.description, start_time_jobs.value, end_time_jobs.value)
    with query_output_jobs:
        clear_output(wait=True)
        print(f"Current SQL query:\n{query}\nParameters: {params}")


def update_value_input_jobs(change):
    global value_input_container_jobs
    if '_time' in change['new']:
        value_input = widgets.NaiveDatetimePicker(value=datetime.now().replace(microsecond=0), description='Value:')
    elif change['new'] == 'queue':
        value_input = widgets.Dropdown(
            options=['standard', 'wholenode', 'shared', 'highmem', 'gpu', 'benchmarking', 'wide', 'debug', 'gpu-debug'],
            description='Value:')
    elif change['new'] == 'exitcode':
        value_input = widgets.Dropdown(options=['TIMEOUT', 'COMPLETED', 'CANCELLED', 'FAILED', 'NODE_FAIL'],
                                       description='Value:')
    else:
        value_input = widgets.Text(description='Value:')
    value_input_container_jobs.children = [value_input]


# Add condition
def add_condition_jobs(b):
    global error_output_jobs, where_conditions_jobs
    if not time_window_valid_jobs:
        with error_output_jobs:
            clear_output(wait=True)
            print("Please enter a valid time window before adding conditions.")
        return
    with error_output_jobs:
        clear_output(wait=True)
        column = columns_dropdown_jobs.value
        value_widget = value_input_container_jobs.children[0]
        if isinstance(value_widget, widgets.Dropdown):
            value = value_widget.value
        elif isinstance(value_widget.value, str):
            value = value_widget.value.upper()
        else:
            value = value_widget.value
        error_message = nbf.validate_condition_jobs(column, value)
        if error_message:
            print(error_message)
        else:
            condition = (column, operators_dropdown_jobs.value, value)
            where_conditions_jobs.append(condition)
            condition_list_jobs.options = [f"{col} {op} '{val}'" for col, op, val in where_conditions_jobs]
            display_query_jobs()


# Remove condition
def remove_condition_jobs(b):
    global error_output_jobs
    with error_output_jobs:
        clear_output(wait=True)
        # Pop from the highest index down so earlier removals do not shift later ones
        for index in sorted(condition_list_jobs.index, reverse=True):
            where_conditions_jobs.pop(index)
        condition_list_jobs.options = [f"{col} {op} '{val}'" for col, op, val in where_conditions_jobs]
        display_query_jobs()


# Validate dates
def on_button_clicked_jobs(b):
    global time_window_valid_jobs
    start, end = start_time_jobs.value, end_time_jobs.value
    # Reject missing values and empty or reversed windows before computing the span
    if start is None or end is None or start >= end:
        b.description = "Invalid Times"
        b.button_style = 'danger'
        time_window_valid_jobs = False
    elif (end - start).days > MAX_DAYS_JOBS:  # Check if the time window exceeds MAX_DAYS_JOBS days
        b.description = "Time Window Too Large"
        b.button_style = 'danger'
        time_window_valid_jobs = False
    else:
        b.description = "Times Valid"
        b.button_style = 'success'
        time_window_valid_jobs = True
        with error_output_jobs:  # Clear the error message if the time window is valid
            clear_output(wait=True)
    display_query_jobs()  # Display the current SQL query for jobs



# Execute query button handler for jobs
def on_execute_button_clicked_jobs(b):
    global account_log_df
    with output_jobs:
        clear_output(wait=True)  # Clear the previous output
        if not time_window_valid_jobs:
            print("Please enter a valid time window before executing the query.")
            return
        try:
            query, params = nbf.construct_job_data_query(where_conditions_jobs, job_data_columns_dropdown.value, validate_button_jobs.description, start_time_jobs.value, end_time_jobs.value)
            account_log_df = nbf.execute_sql_query_chunked(query, account_log_df, params=params)
            print(f"\nResults for query: \n{query}\nParameters: {params}")
            display(account_log_df)

            # Code to give user the option to download the filtered data
            print("\nDownload the Job table data?")
            csv_acc_download_button = widgets.Button(description="Download as CSV")
            excel_acc_download_button = widgets.Button(description="Download as Excel")

            start_jobs = start_time_jobs.value.strftime('%Y-%m-%d-%H-%M-%S')
            end_jobs = end_time_jobs.value.strftime('%Y-%m-%d-%H-%M-%S')

            def on_acc_csv_button_clicked(b):
                print(f"\nGenerating job-data-csv-{start_jobs}-to-{end_jobs}.csv\n\nPlease check the file explorer on the left. The 'Refresh the File Browser' button might need to be clicked.")
                nbf.create_csv_download_file(account_log_df, filename=f"job-data-csv-{start_jobs}-to-{end_jobs}.csv")

            def on_acc_excel_button_clicked(b):
                print(f"\nGenerating job-data-excel-{start_jobs}-to-{end_jobs}.xlsx\n\nPlease check the file explorer on the left. The 'Refresh the File Browser' button might need to be clicked.")
                nbf.create_excel_download_file(account_log_df, filename=f"job-data-excel-{start_jobs}-to-{end_jobs}.xlsx")

            csv_acc_download_button.on_click(on_acc_csv_button_clicked)
            excel_acc_download_button.on_click(on_acc_excel_button_clicked)

            # Put the buttons in a horizontal box
            button_box2 = widgets.HBox([csv_acc_download_button, excel_acc_download_button])
            display(button_box2)

        except Exception as e:
            print(f"An error occurred: {e}")



# ********************************* HOST DATA **********************************
where_conditions_hosts = []
time_window_valid_hosts = False
time_series_df = pd.DataFrame()


# Display SQL query
def display_query_hosts():
    global host_data_sql_query
    # construct_query_hosts is unpacked as (query, params) in the execute handler, so do the same here
    query, params = nbf.construct_query_hosts(where_conditions_hosts, host_data_columns_dropdown.value, validate_button_hosts.description, start_time_hosts.value, end_time_hosts.value)
    with query_output_hosts:
        clear_output(wait=True)
        print(f"Current SQL query:\n{query}\nParameters: {params}")
        host_data_sql_query = query


# Add condition
def add_condition_hosts(b):
    global error_output_hosts, where_conditions_hosts
    if not time_window_valid_hosts:
        with error_output_hosts:
            clear_output(wait=True)
            print("Please enter a valid time window before adding conditions.")
        return
    with error_output_hosts:
        clear_output(wait=True)
        column = columns_dropdown_hosts.value
        value = value_input_hosts.value
        if 'job' in value.casefold() or 'node' in value.casefold():
            value = value.upper()
        error_message = nbf.validate_condition_hosts(column, value)
        if error_message:
            print(error_message)
        else:
            condition = (column, operators_dropdown_hosts.value, value)
            where_conditions_hosts.append(condition)
            condition_list_hosts.options = [f"{col} {op} '{val}'" for col, op, val in where_conditions_hosts]
            display_query_hosts()


# Remove condition
def remove_condition_hosts(b):
    global error_output_hosts
    with error_output_hosts:
        clear_output(wait=True)
        # Pop from the highest index down so earlier removals do not shift later ones
        for index in sorted(condition_list_hosts.index, reverse=True):
            where_conditions_hosts.pop(index)
        condition_list_hosts.options = [f"{col} {op} '{val}'" for col, op, val in where_conditions_hosts]
        display_query_hosts()


# Validate dates for hosts
def on_button_clicked_hosts(b):
    global time_window_valid_hosts
    start, end = start_time_hosts.value, end_time_hosts.value
    # Reject missing values and empty or reversed windows before computing the span
    if start is None or end is None or start >= end:
        b.description = "Invalid Times"
        b.button_style = 'danger'
        time_window_valid_hosts = False
    elif (end - start).days > MAX_DAYS_HOSTS:  # Check if the time window exceeds MAX_DAYS_HOSTS days
        b.description = "Time Window Too Large"
        b.button_style = 'danger'
        time_window_valid_hosts = False
    else:
        b.description = "Times Valid"
        b.button_style = 'success'
        time_window_valid_hosts = True
        with error_output_hosts:  # Clear the error message if the time window is valid
            clear_output(wait=True)
    display_query_hosts()  # Display the current SQL query for hosts


# Execute query button handler for hosts
def on_execute_button_clicked_hosts(b):
    global time_series_df, host_data_sql_query
    with output_hosts:
        clear_output(wait=True)  # Clear the previous output
        if not time_window_valid_hosts:
            print("Please enter a valid time window before executing the query.")
            return
        try:
            query, params = nbf.construct_query_hosts(where_conditions_hosts, host_data_columns_dropdown.value, validate_button_hosts.description, start_time_hosts.value, end_time_hosts.value)
            time_series_df = nbf.execute_sql_query_chunked(query, time_series_df, params=params)
            print(f"\nResults for query: \n{query}\nParameters: {params}")
            display(time_series_df)

            # Code to give user the option to download the filtered data
            print("\nDownload the filtered Host table data?")
            csv_download_button = widgets.Button(description="Download as CSV")
            excel_download_button = widgets.Button(description="Download as Excel")

            start = start_time_hosts.value.strftime('%Y-%m-%d-%H-%M-%S')
            end = end_time_hosts.value.strftime('%Y-%m-%d-%H-%M-%S')

            def on_csv_button_clicked(b):
                print(f"\nGenerating host-data-csv-{start}-to-{end}.csv\n\nPlease check the file explorer on the left. The 'Refresh the File Browser' button might need to be clicked.")
                nbf.create_csv_download_file(time_series_df, filename=f"host-data-csv-{start}-to-{end}.csv")

            def on_excel_button_clicked(b):
                print(f"\nGenerating host-data-excel-{start}-to-{end}.xlsx\n\nPlease check the file explorer on the left. The 'Refresh the File Browser' button might need to be clicked.")
                nbf.create_excel_download_file(time_series_df, filename=f"host-data-excel-{start}-to-{end}.xlsx")

            csv_download_button.on_click(on_csv_button_clicked)
            excel_download_button.on_click(on_excel_button_clicked)

            # Put the buttons in a horizontal box
            button_box = widgets.HBox([csv_download_button, excel_download_button])
            display(button_box)

        except Exception as e:
            print(f"An error occurred: {e}")



# Function to update the value input widget based on the selected column
def update_value_input_hosts(change):
    global value_input_hosts
    if change['new'] == 'unit':
        value_input_hosts = widgets.Dropdown(
            options=['CPU %', 'GPU %', 'GB:memused', 'GB:memused_minus_diskcache', 'GB/s', 'MB/s'],
            description='Value:')
    elif change['new'] == 'event':
        value_input_hosts = widgets.Dropdown(
            options=['cpuuser', 'block', 'memused', 'memused_minus_diskcache', 'gpu_usage', 'nfs'],
            description='Value:')
    else:
        value_input_hosts = widgets.Text(description='Value:')
    value_input_container_hosts.children = [value_input_hosts]

# ****************************** HOST DATA WIDGETS **********************************

# Widgets
banner_hosts_message = widgets.HTML("<h1>Query the Host Data Table</h1>")
query_time_message_hosts = widgets.HTML(f"<h5>Please select the start and end times for your query. Max of <b>{MAX_DAYS_HOSTS}</b> days per query.</h5>")
query_cols_message = widgets.HTML("<h5>Please select columns you want to query:</h5>")
request_filters_message = widgets.HTML("<h5>Please add conditions to filter the data:</h5>")
current_filters_message = widgets.HTML("<h5>Current filtering conditions:</h5>")
host_data_columns_dropdown = widgets.SelectMultiple(
    options=['*', 'host', 'jid', 'type', 'event', 'unit', 'value', 'diff', 'arc'], value=['*'], description='Columns:')
columns_dropdown_hosts = widgets.Dropdown(options=['host', 'jid', 'type', 'event', 'unit', 'value', 'diff', 'arc'],
                                          description='Column:')
operators_dropdown_hosts = widgets.Dropdown(options=['=', '!=', '<', '>', '<=', '>=', 'LIKE'], description='Operator:')
value_input_hosts = widgets.Text(description='Value:')

start_time_hosts = widgets.NaiveDatetimePicker(value=datetime.now().replace(microsecond=0), description='Start Time:')
end_time_hosts = widgets.NaiveDatetimePicker(value=datetime.now().replace(microsecond=0), description='End Time:')

# start_time_hosts = widgets.DatePicker()
# end_time_hosts = widgets.DatePicker()

validate_button_hosts = widgets.Button(description="Validate Dates")
execute_button_hosts = widgets.Button(description="Execute Query")
add_condition_button_hosts = widgets.Button(description="Add Condition")
remove_condition_button_hosts = widgets.Button(description="Remove Condition")
condition_list_hosts = widgets.SelectMultiple(options=[], description='Conditions:')
output_hosts = widgets.Output()
query_output_hosts = widgets.Output()
error_output_hosts = widgets.Output()

# Attach the update function to the 'columns_dropdown' widget
columns_dropdown_hosts.observe(update_value_input_hosts, names='value')

# Container to hold the value input widget
value_input_container_hosts = widgets.HBox([value_input_hosts])

# Button events.
validate_button_hosts.on_click(on_button_clicked_hosts)
execute_button_hosts.on_click(on_execute_button_clicked_hosts)
add_condition_button_hosts.on_click(add_condition_hosts)
remove_condition_button_hosts.on_click(remove_condition_hosts)

condition_buttons = widgets.HBox([add_condition_button_hosts, remove_condition_button_hosts])


# Group the widgets for "hosts" into a VBox
hosts_group = widgets.VBox([
    banner_hosts_message,
    query_time_message_hosts,
    start_time_hosts,
    end_time_hosts,
    validate_button_hosts,
    query_cols_message,
    host_data_columns_dropdown,
    request_filters_message,
    columns_dropdown_hosts,
    operators_dropdown_hosts,
    value_input_container_hosts,
    condition_buttons,
    current_filters_message,
    condition_list_hosts,
    error_output_hosts,
    execute_button_hosts,
    query_output_hosts,
    output_hosts
])

# ****************************** JOB DATA WIDGETS **********************************

# Widgets for job_data
banner_jobs = widgets.HTML("<h1>Query the Job Data Table</h1>")
query_time_message_jobs = widgets.HTML(f"<h5>Please select the start and end times for your query. Max of <b>{MAX_DAYS_JOBS}</b> days per query.</h5>")
job_data_columns_dropdown = widgets.SelectMultiple(
    options=['*', 'jid', 'submit_time', 'start_time', 'end_time', 'runtime', 'timelimit', 'node_hrs', 'nhosts',
             'ncores', 'ngpus', 'username', 'account', 'queue', 'state', 'jobname', 'exitcode', 'host_list'],
    value=['*'], description='Columns:')
columns_dropdown_jobs = widgets.Dropdown(
    options=['jid', 'submit_time', 'start_time', 'end_time', 'runtime', 'timelimit', 'node_hrs', 'nhosts', 'ncores',
             'ngpus', 'username', 'account', 'queue', 'state', 'jobname', 'exitcode', 'host_list'],
    description='Column:')
operators_dropdown_jobs = widgets.Dropdown(options=['=', '!=', '<', '>', '<=', '>=', 'LIKE'], description='Operator:')
value_input_jobs = widgets.Text(description='Value:')
start_time_jobs = widgets.NaiveDatetimePicker(value=datetime.now().replace(microsecond=0), description='Start Time:')
end_time_jobs = widgets.NaiveDatetimePicker(value=datetime.now().replace(microsecond=0), description='End Time:')
validate_button_jobs = widgets.Button(description="Validate Dates")
execute_button_jobs = widgets.Button(description="Execute Query")
output_jobs = widgets.Output()
query_output_jobs = widgets.Output()
error_output_jobs = widgets.Output()
add_condition_button_jobs = widgets.Button(description="Add Condition")
remove_condition_button_jobs = widgets.Button(description="Remove Condition")
condition_list_jobs = widgets.SelectMultiple(options=[], description='Conditions:')

# Attach the update function to the 'columns_dropdown' widget
columns_dropdown_jobs.observe(update_value_input_jobs, names='value')

# Container to hold the value input widget
value_input_container_jobs = widgets.HBox([value_input_jobs])

# Button events
validate_button_jobs.on_click(on_button_clicked_jobs)
execute_button_jobs.on_click(on_execute_button_clicked_jobs)
add_condition_button_jobs.on_click(add_condition_jobs)
remove_condition_button_jobs.on_click(remove_condition_jobs)
condition_buttons_jobs = widgets.HBox([add_condition_button_jobs, remove_condition_button_jobs])  # HBox for the buttons


# Group the widgets for "jobs" into another VBox
jobs_group = widgets.VBox([
    banner_jobs,
    query_time_message_jobs,
    start_time_jobs,
    end_time_jobs,
    validate_button_jobs,
    query_cols_message,
    job_data_columns_dropdown,
    request_filters_message,
    columns_dropdown_jobs,
    operators_dropdown_jobs,
    value_input_container_jobs,
    condition_buttons_jobs,
    current_filters_message,
    condition_list_jobs,
    error_output_jobs,
    execute_button_jobs,
    query_output_jobs,
    output_jobs
])

# Use GridBox to place the two VBox widgets side by side
grid = widgets.GridBox(children=[hosts_group, jobs_group],
                       layout=widgets.Layout(
                           width='100%',
                           grid_template_columns='50% 50%',  # Two columns, each taking up 50% of the width
                           grid_template_rows='auto',        # One row, height determined by content
                       ))

display(grid)



In [2]:
try:
    stats = widgets.SelectMultiple(
        options=['None', 'Mean', 'Median', 'Standard Deviation', 'PDF', 'CDF', 'Ratio of Data Outside Threshold'],
        value=['None'],
        description='Statistics',
        disabled=False
    )

    ratio_threshold = widgets.IntText(
        value=0,
        description='Value:',
        disabled=True  # disabled by default
    )

    interval_type = widgets.Dropdown(
        options=['None', 'Count', 'Time'],
        value='None',
        description='Interval Type',
        disabled=True  # disabled by default
    )

    time_units = widgets.Dropdown(
        options=['None', 'Days', 'Hours', 'Minutes', 'Seconds'],
        value='None',
        description='Interval Unit',
        disabled=True  # disabled by default
    )

    time_value = widgets.IntText(
        value=0,
        description='Value:',
        disabled=True  # disabled by default
    )

    # Define a function to be called when stats value changes
    def on_stats_change(change):
        if change['type'] == 'change' and change['name'] == 'value':
            if "Ratio of Data Outside Threshold" in change['new']:
                # enable ratio_threshold if 'Ratio of Data Outside Threshold' is selected
                ratio_threshold.disabled = False
            else:
                # disable ratio_threshold if 'Ratio of Data Outside Threshold' is not selected
                ratio_threshold.disabled = True

            if change['new'] and change['new'][0] != "None":  # guard against an empty selection
                # enable interval_type if stats is not None
                interval_type.disabled = False
            else:
                # disable interval_type if stats is None
                interval_type.disabled = True
                interval_type.value = 'None'  # reset interval_type to 'None'

    stats.observe(on_stats_change)

    # Define a function to be called when interval_type value changes
    def on_interval_type_change(change):
        if change['type'] == 'change' and change['name'] == 'value':
            if change['new'] == "None":
                time_units.disabled = True
                time_value.disabled = True
                time_units.value = 'None'  # reset time_units to 'None'
                time_value.value = 0  # reset time_value to 0
            elif change['new'] == "Time":
                time_units.disabled = False
                time_value.disabled = False
            elif change['new'] == "Count":
                time_units.disabled = True
                time_value.disabled = False
            else:
                time_units.disabled = False
                time_value.disabled = False

    interval_type.observe(on_interval_type_change)

    # Display the widgets
    print("Please select a statistic to calculate.")
    display(stats)
    print("Please provide the threshold if 'Ratio of Data Outside Threshold' was selected.")
    display(ratio_threshold)
    print("Please select an interval type to use in the statistic calculation. If count is selected, the interval will correspond to a count of rows. If time is selected, the interval will be a time window.")
    display(interval_type)
    print("If time was selected, please select the unit of time.")
    display(time_units)
    print("Please provide the interval count.")
    display(time_value)

    time_series_df = nbf.remove_columns(time_series_df)
except NameError:
    print("ERROR: Please make sure to run the previous notebook cell before executing this one.")

Please select a statistic to calculate.

Please provide the threshold if 'Ratio of Data Outside Threshold' was selected.

Please select an interval type to use in the statistic calculation. If count is selected, the interval will correspond to a count of rows. If time is selected, the interval will be a time window.

If time was selected, please select the unit of time.

Please provide the interval count.

In [4]:
try:
    %matplotlib inline
    # Convert the 'time' columns to datetime
    try:
        time_series_df['time'] = pd.to_datetime(time_series_df['time'])
        time_series_df = time_series_df.set_index('time')
        time_series_df = time_series_df.sort_index()
    except Exception as e:
        print(f"Encountered the following error: {e}")

    metric_func_map = {
        "Mean": nbf.get_mean if "Mean" in stats.value else "",
        "Median": nbf.get_median if "Median" in stats.value else "",
        "Standard Deviation": nbf.get_standard_deviation if "Standard Deviation" in stats.value else "",
        "PDF": nbf.plot_pdf if "PDF" in stats.value else "",
        "CDF": nbf.plot_cdf if "CDF" in stats.value else "",
        "Ratio of Data Outside Threshold": nbf.plot_data_points_outside_threshold if 'Ratio of Data Outside Threshold' in stats.value else ""
    }

    unit_map = {
        "CPU %": "cpuuser",
        "GPU %": "gpu_usage",
        "GB:memused": "memused",
        "GB:memused_minus_diskcache": "memused_minus_diskcache",
        "GB/s": "block",
        "MB/s": "nfs"
    }

    units = nbf.parse_host_data_query(host_data_sql_query, unit_map)  # get units requested in SQL query

    # set up outputs and tabbed layout
    tab = widgets.Tab()
    outputs = {}
    stat_values = []
    basic_stats = ['Mean', 'Median', 'Standard Deviation']

    # Populate the outputs dictionary
    for unit in units:
        outputs[unit] = {}
        if any(stat in stats.value for stat in basic_stats):
            for stat in stats.value + ('Box and Whisker',):
                outputs[unit][stat] = widgets.Output()
        else:
            for stat in stats.value:
                outputs[unit][stat] = widgets.Output()

    # set the tab children
    if any(stat in stats.value for stat in basic_stats):
        tab.children = [widgets.Accordion([widgets.Box([widgets.Label(stat), outputs[unit][stat]]) for stat in stats.value + ('Box and Whisker',)], titles=stats.value + ('Box and Whisker',)) for unit in units]
    else:
        tab.children = [widgets.Accordion([widgets.Box([widgets.Label(stat), outputs[unit][stat]]) for stat in stats.value], titles=stats.value) for unit in units]

    tab.titles = units

    with plt.style.context('fivethirtyeight'):
        unit_stat_dfs = {}
        time_map = {'Days': 'D', 'Hours': 'H', 'Minutes': 'T', 'Seconds': 'S'}
        for unit in units:
            unit_stat_dfs[unit] = {}
            for metric in stats.value:
                metric_df = time_series_df.query(f"`event` == '{unit_map[unit]}'")
                rolling = False

                # Calculate stats
                if interval_type.value == "Time":
                    rolling = True
                    window = f"{time_value.value}{time_map[time_units.value]}"
                elif interval_type.value == "Count":
                    rolling = True
                    window = time_value.value

                # Handle special cases outside the rolling condition
                if metric == "PDF":
                    with outputs[unit][metric]:
                        unit_stat_dfs[unit][metric] = metric_func_map[metric](metric_df)
                    continue
                elif metric == "CDF":
                    with outputs[unit][metric]:
                        unit_stat_dfs[unit][metric] = metric_func_map[metric](metric_df)
                    continue
                elif metric == "Ratio of Data Outside Threshold":
                    with outputs[unit][metric]:
                        unit_stat_dfs[unit][metric] = metric_func_map[metric](ratio_threshold.value, metric_df)
                    continue

                # Only calculate and plot basic stats if rolling is True
                if rolling:
                    unit_stat_dfs[unit][metric] = metric_func_map[metric](metric_df, rolling=True, window=window)

                    # Plot stats
                    with outputs[unit][metric]:
                        unit_stat_dfs[unit][metric].plot()
                        x_axis_label = ""
                        if interval_type.value == "Count":
                            x_axis_label += f"Count - Rolling Window: {time_value.value} Rows"
                        elif interval_type.value == "Time":
                            x_axis_label += f"Timestamp - Rolling Window: {time_value.value}{time_map[time_units.value]}"
                        y_axis_label = unit
                        plt.gcf().autofmt_xdate()  # auto formats datetimes
                        plt.style.use('fivethirtyeight')
                        plt.title(f"{unit} {metric}")
                        plt.legend(loc='upper left', fontsize="10")
                        plt.xlabel(x_axis_label)
                        plt.ylabel(y_axis_label)
                        plt.show()

            # Get the stats dataframes
            df_mean = unit_stat_dfs[unit].get('Mean')
            df_std = unit_stat_dfs[unit].get('Standard Deviation')
            df_median = unit_stat_dfs[unit].get('Median')

            # Plot box and whisker
            if any(df is not None for df in [df_mean, df_std, df_median]):
                with outputs[unit]['Box and Whisker']:
                    nbf.plot_box_and_whisker(df_mean, df_std, df_median)

        display(tab)
except NameError:
    print("ERROR: Please make sure to run the previous notebook cells before executing this one.")

Encountered the following error: 'time'



In [None]:
try:
    def on_selection_change(change):
        if len(change.new) > 2:
            correlations.value = change.new[:2]

    def on_button_click(button):
        graph_output.clear_output()
        with graph_output:
            with plt.style.context('fivethirtyeight'):
                display(nbf.calculate_and_plot_correlation(time_series_df, correlations.value))

    correlations = widgets.SelectMultiple(
        options=['None', 'cpuuser', 'gpu_usage', 'nfs', 'block', 'memused', 'memused_minus_diskcache'],
        value=['None'],
        description='Metrics',
        disabled=False
    )

    plot_button = widgets.Button(
        description = "Plot correlation",
        disabled = False,
        icon= "chart-line"
    )
    plot_button.on_click(on_button_click)

    graph_output = widgets.Output()

    container = widgets.VBox(
        [widgets.HBox([correlations, plot_button], layout = widgets.Layout(
            width = "50%",
            justify_content="space-between",
            align_items="center"),),
        graph_output])
    correlations.observe(on_selection_change, names='value')

    # Give the user the option to calculate correlations
    print("Please select two metrics below to find their Pearson correlation:")
    display(container)

except NameError:
    print("ERROR: Please make sure to run the previous notebook cells before executing this one.")
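
For reference, the Pearson correlation between two metrics can be sketched directly in pandas. This is a minimal example on made-up data, not the implementation behind `nbf.calculate_and_plot_correlation`; it assumes the long-format host data is first pivoted so that each metric is a column aligned on the timestamp.

```python
import pandas as pd

# Made-up long-format data: two metrics sampled at the same five timestamps.
times = pd.date_range("2023-01-01", periods=5, freq="10min")
long_df = pd.DataFrame({
    "time": list(times) * 2,
    "event": ["cpuuser"] * 5 + ["memused"] * 5,
    "value": [10.0, 20.0, 30.0, 40.0, 50.0, 1.0, 2.0, 3.0, 4.0, 5.0],
})

# Align the two metrics on the timestamp, one column per metric.
wide = long_df.pivot_table(index="time", columns="event", values="value", aggfunc="mean")

# Pearson correlation coefficient between the two aligned columns.
r = wide["cpuuser"].corr(wide["memused"], method="pearson")
print(f"Pearson r = {r:.3f}")
```

The two sample series are exactly proportional, so the coefficient comes out at (or within rounding of) 1.0. Real metrics sampled at different times would need resampling or interpolation before they can be aligned this way.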
