# FRESCO Anvil Database Access and Analytics Notebook

## Overview

The FRESCO Analytics Notebook supports analysis of the Anvil dataset. It lets you:

- Extract filtered data from the Anvil database.
- Conduct statistical analyses on the filtered data.
- Visualize the results of the analyses.

The notebook is structured into three main sections:

### 1. Data Filtering
Define the scope of the analysis by selecting a datetime window, then narrow the dataset further with additional filter conditions.

### 2. Data Analysis Options
Choose which statistics to compute, set the threshold, and select how the data is divided into intervals.

### 3. Data Analysis and Visualizations
Execute the chosen analysis on the filtered dataset and visualize the outcomes.

## Step-by-Step Instructions

- **Cell 1:** Define the dataset's temporal boundaries. The selected window determines which host time series and job accounting data are extracted. Additional conditions can be added to filter the data further.

- **Cell 2:** Configure your statistical analyses by:
  - Choosing statistics (e.g., Mean, Median).
  - Setting a threshold for "Ratio of Data Outside Threshold".
  - Selecting an interval type (Count or Time). If "Time" is chosen, define the time unit and interval count.

- **Cell 3:** Run the analyses configured in Cell 2 on the time series data and generate visualizations.
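
The Cell 2 options can be illustrated with a short, self-contained sketch. This is not the notebook's implementation: the function names are hypothetical, the sample values are invented, and "Ratio of Data Outside Threshold" is assumed here to mean the fraction of values strictly above a single upper threshold. A Count interval groups a fixed number of points, while a Time interval groups points into fixed-width time buckets.

```python
from datetime import datetime, timedelta
from statistics import mean, median


def ratio_outside_threshold(values, threshold):
    """Fraction of values strictly above `threshold` (assumed definition)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


def chunk_by_count(samples, size):
    """Split (time, value) samples into consecutive groups of `size` points."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def chunk_by_time(samples, width):
    """Group time-sorted (time, value) samples into buckets of duration `width`."""
    if not samples:
        return []
    start = samples[0][0]
    buckets = {}
    for t, v in samples:
        index = (t - start) // width  # timedelta // timedelta gives an int
        buckets.setdefault(index, []).append((t, v))
    return [buckets[k] for k in sorted(buckets)]


def summarize(chunk, threshold):
    """Compute the selected statistics for one interval."""
    values = [v for _, v in chunk]
    return {
        "mean": mean(values),
        "median": median(values),
        "ratio_outside": ratio_outside_threshold(values, threshold),
    }


# Invented sample: one reading per minute.
t0 = datetime(2023, 1, 1)
samples = [(t0 + timedelta(minutes=i), v)
           for i, v in enumerate([10, 20, 95, 30, 99, 40])]

count_chunks = chunk_by_count(samples, 3)                    # interval type: Count
time_chunks = chunk_by_time(samples, timedelta(minutes=2))   # interval type: Time

for chunk in count_chunks:
    print(summarize(chunk, threshold=90))
```

With a Time interval, the "time unit and interval count" from Cell 2 would map onto the `width` argument, for example `timedelta(minutes=2)` for a unit of minutes and a count of 2.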

## Database Table Information

### Host Table
- **jid**: Unique job identifier
- **host**: Origin node of the data point
- **event**: Resource usage metric type
- **value**: Numeric value of the metric
- **unit**: Measurement unit of the metric
- **time**: Timestamp of the data point


**Event Column Metrics:**
- **cpuuser**: Average percentage of CPU time spent in user mode.
- **block**: Data transfer rate to and from block devices.
- **memused**: Total physical memory used by the operating system.
- **memused_minus_diskcache**: Physical memory used, excluding caches.
- **gpu_usage**: Average percentage of time the GPU was active (GPU jobs only).
- **nfs**: Data transfer rate over NFS mounts.
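
To make the host table layout concrete, the following sketch loads a few invented rows into an in-memory SQLite table and filters them by time window and event. SQLite, the table name `host_data`, the helper `query_host_data`, and the sample values and units are all assumptions for illustration; the actual Anvil database backend and the notebook's query code may differ.

```python
import sqlite3

# In-memory stand-in for the host table, using the columns listed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host_data ("
    "jid TEXT, host TEXT, event TEXT, value REAL, unit TEXT, time TEXT)"
)

# Invented sample rows; timestamps are ISO-style strings so they sort correctly.
rows = [
    ("job1", "nodeA", "cpuuser", 85.0, "CPU %", "2023-01-01 00:00:00"),
    ("job1", "nodeA", "memused", 12.5, "GB", "2023-01-01 00:00:00"),
    ("job1", "nodeA", "cpuuser", 90.0, "CPU %", "2023-01-01 00:10:00"),
    ("job2", "nodeB", "cpuuser", 40.0, "CPU %", "2023-01-02 00:00:00"),
]
conn.executemany("INSERT INTO host_data VALUES (?, ?, ?, ?, ?, ?)", rows)


def query_host_data(conn, start, end, event=None):
    """Return rows with start <= time < end, optionally limited to one event."""
    sql = (
        "SELECT jid, host, event, value, unit, time "
        "FROM host_data WHERE time >= ? AND time < ?"
    )
    params = [start, end]
    if event is not None:
        sql += " AND event = ?"
        params.append(event)
    sql += " ORDER BY time"
    return conn.execute(sql, params).fetchall()


# All cpuuser readings taken on 2023-01-01.
cpu_rows = query_host_data(
    conn, "2023-01-01 00:00:00", "2023-01-02 00:00:00", event="cpuuser"
)
for row in cpu_rows:
    print(row)
```

Passing values as query parameters rather than formatting them into the SQL string keeps additional filter conditions safe to compose.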

### Job Table
- **account**: Account or project name
- **jid**: Unique job identifier
- **ncores**: Total cores assigned to the job
- **ngpus**: Total GPUs assigned to the job
- **nhosts**: Number of nodes assigned to the job
- **timelimit**: Requested job duration (in seconds)
- **queue**: Job submission queue name
- **end_time**: Job end time
- **start_time**: Job start time
- **submit_time**: Job submission time
- **username**: Job owner's name
- **exitcode**: Job's exit status
- **host_list**: List of nodes the job ran on
- **jobname**: Job's name
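
The job table columns combine naturally into derived quantities such as queue wait and runtime. The sketch below shows one way to compute them. `JobRecord` and `derived_metrics` are hypothetical names, only a subset of the columns is modeled, and the sample job is invented. It relies on `timelimit` being expressed in seconds, as stated above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRecord:
    """Subset of the job table columns needed for the derived metrics."""
    jid: str
    ncores: int
    timelimit: int          # requested duration, in seconds
    submit_time: datetime
    start_time: datetime
    end_time: datetime


def derived_metrics(job):
    """Compute wait time, runtime, core-hours, and share of the time limit used."""
    wait = (job.start_time - job.submit_time).total_seconds()
    runtime = (job.end_time - job.start_time).total_seconds()
    return {
        "wait_seconds": wait,
        "runtime_seconds": runtime,
        "core_hours": job.ncores * runtime / 3600,
        "timelimit_used": runtime / job.timelimit,
    }


# Invented job: submitted at 08:00, started at 08:30, finished at 09:30.
job = JobRecord(
    jid="job1",
    ncores=128,
    timelimit=7200,
    submit_time=datetime(2023, 1, 1, 8, 0),
    start_time=datetime(2023, 1, 1, 8, 30),
    end_time=datetime(2023, 1, 1, 9, 30),
)
metrics = derived_metrics(job)
print(metrics)
```

Because both tables carry `jid`, metrics like these can be matched to the host time series for the same job.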

In [1]:
from notebook_utilities import NotebookUtilities

nu = NotebookUtilities()

[Output: interactive widget, "Query the Host Data Table"]

In [None]:
# Cell 2: display the widgets for configuring the statistical analyses
nu.display_statistics_widgets()

In [None]:
%matplotlib inline
# Cell 3: run the configured analyses and display the plots inline
nu.display_plots()

In [None]:
nu.pearson_correlation()
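
The internals of `nu.pearson_correlation()` are not shown in this notebook. As a rough illustration of the statistic itself, the sketch below computes the Pearson correlation coefficient between two equally long series, for example two metrics sampled at the same timestamps. `pearson_r` is a hypothetical helper and the sample values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented readings of two metrics at the same timestamps.
cpu = [10.0, 20.0, 30.0, 40.0]
mem = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(cpu, mem)
print(round(r, 6))
```

A value near 1 indicates that the two metrics rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.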