# Throughput
This notebook creates a throughput report for test results. It ties into the **Test Monitor Service** for retrieving filtered test results, the **Notebook Execution Service** for running outside of Jupyterhub, and the **Test Monitor Reports page** at #testmonitor/reports for displaying results.

The parameters and output use a schema recognized by the Test Monitor Reports page, which can be implemented by various report types. The Throughput notebook produces data that is best shown in a bar graph.

### Imports
Import Python modules for executing the notebook. Pandas is used for building and handling dataframes. Scrapbook is used for recording data for the Notebook Execution Service. The SystemLink Test Monitor Client provides access to test result data for processing.

In [1]:
import copy
import datetime
import pandas as pd
import scrapbook as sb
from dateutil import tz

import systemlink.clients.nitestmonitor as testmon

### Parameters
- `results_filter`: Dynamic Linq query filter for test results from the Test Monitor Service  
  Options: Any valid Test Monitor Results Dynamic Linq filter  
  Default: `'startedWithin <= "30.0:0:0"'`
- `group_by`: The dimension along which to reduce; what each bar in the output graph represents  
  Options: Day, System, Test Program, Operator, Part Number  
  Default: Day

Parameters are also listed in the metadata for the parameters cell, along with their default values. The Notebook Execution services uses that metadata to pass parameters from the Test Monitor Reports page to this notebook. Available `group_by` options are listed in the metadata as well; the Test Monitor Reports page uses these to validate inputs sent to the notebook.

To see the metadata, select the code cell and click the wrench icon in the far left panel.

In [2]:
results_filter = 'startedWithin <= "30.0:0:0"'
products_filter = ''
group_by = 'Day'

### Mapping from grouping options to Test Monitor terminology
Translate the grouping options shown in the Test Monitor Reports page to keywords recognized by the Test Monitor API.

In [3]:
groups_map = {
    'Day': 'started_at',
    'System': 'host_name',
    'Test Program': 'program_name',
    'Operator': 'operator',
    'Part Number': 'part_number',
    'Workspace': 'workspace',
}
grouping = groups_map[group_by]

### Create Test Monitor client
Establish a connection to SystemLink over HTTP.

In [4]:
results_api = testmon.ResultsApi()

### Query for results
Query the Test Monitor Service for results matching the `results_filter` parameter.

In [5]:
results_query = testmon.ResultsAdvancedQuery(
    results_filter, product_filter=products_filter, order_by=testmon.ResultField.STARTED_AT
)

results = []

response = await results_api.query_results_v2(post_body=results_query)
while response.continuation_token:
    results = results + response.results
    results_query.continuation_token = response.continuation_token
    response = await results_api.query_results_v2(post_body=results_query)

results_list = [result.to_dict() for result in results]

### Get group names
Collect the group name for each result based on the `group_by` parameter.

In [6]:
group_names = []
for result in results_list:
    if grouping != "host_name":
        if grouping in result:
            group_names.append(result[grouping])
    else:
        if 'properties' in result:
            group_names.append(result['properties']['Location'])       

### Create pandas dataframe
Put the data into a dataframe whose columns are test result id, start time, and group name.

In [7]:
formatted_results = {
    'id': [result['id'] for result in results_list],
    'started_at': [result['started_at'] for result in results_list],
    grouping: group_names
}

df_results = pd.DataFrame.from_dict(formatted_results)

#convert UTC timezone to local timezone
to_zone = tz.tzlocal()
utc = df_results['started_at']
def astimezone(x):
    return x.astimezone(to_zone)
# Convert time zone
central = utc.apply(astimezone)
df_results = pd.concat([df_results.drop(['started_at'],axis=1), central],axis=1)


### Handle grouping by day
The start time of each test is composed of both the date and the time when the test started in UTC. To group all test results from a single day together, convert to server time and remove time information from the group name.

In [8]:
df_results_copy = copy.copy(df_results)
df_results_copy.fillna(value='', inplace=True)

truncated_times = []
for val in df_results_copy['started_at']:
    local_time = val.astimezone(tz.tzlocal())
    truncated_times.append(str(datetime.datetime(local_time.year, local_time.month, local_time.day,local_time.hour)))
df_results_copy['started_at'] = truncated_times

display(df_results_copy)

Unnamed: 0,id,started_at
0,6088b393ca744a435c3bc221,2021-04-28 10:00:00
1,6088b393ca744a435c3bc227,2021-04-28 10:00:00
2,6088b393ca744a435c3bc22d,2021-04-28 10:00:00
3,6088b394ca744a435c3bc233,2021-04-28 10:00:00
4,6088b394ca744a435c3bc239,2021-04-28 10:00:00
...,...,...
4095,608b5846ca744a435c3da479,2021-04-30 11:00:00
4096,608b5846ca744a435c3da47f,2021-04-30 11:00:00
4097,608b5849ca744a435c3da491,2021-04-30 11:00:00
4098,608b5849ca744a435c3da493,2021-04-30 11:00:00


### Throughput calculation
Get the number of unique tests for each group.

In [9]:
df_throughput = df_results_copy.groupby(grouping).agg({'id': 'count'})
df_throughput = df_throughput.reset_index().set_axis([grouping, 'throughput'], axis=1)

if grouping == 'started_at':
    df_throughput['started_at'] = pd.to_datetime(df_throughput['started_at'])
else:
    df_throughput.sort_values(by=['throughput'], ascending=True, inplace=True)

### Convert the dataframe to the SystemLink reports output format
The result format for a SystemLink report consists of a list of output objects as defined below:
- `type`: The type of the output. Accepted values are 'data_frame' and 'scalar'.
- `id`: Corresponds to the id specified in the 'output' metadata. Used for returning multiple outputs with the 'V2' report format.
- `data`: A dict representing the 'data_frame' type output data.
    - `columns`: A list of dicts containing the names and data type for each column in the dataframe.
    - `values`: A list of lists containing the dataframe values. The sublists are ordered according to the 'columns' configuration.
- `value`: The value returned for the 'scalar' output type.
- `config`: The configurations for the given output.
    - `title`: The output title.
    - `graph`: The graph configurations.
        - `axis_labels`: The x-axis label and y-axis label.
        - `plots`: A list of plots to display mapped from the dataframe's columns, along with configuration options.
            - `x`: The dataframe column corresponding to the x-axis values.
            - `y`: The dataframe column corresponding to the y-axis values.
            - `style`: The plot's style. Accepted values are ['LINE', 'BAR', 'SCATTER'].
            - `color`: The plot's color. Accepted formats are ['blue', '#0000ff', 'rbg(0,0,255)'].
            - `label`: The plot's name, to be shown in a plot legend. 
            - `secondary_y`: Whether or not to display this plot on a second y-axis.
            - `group_by`: A list of columns in the dataframe on which to group data, e.g. to color individual points.
        - `orientation`: 'HORIZONTAL' or 'VERTICAL'.
        - `stacked`: Whether or not to display the plots stacked on top of each other.

Here is an example of a notebook result with two outputs, one of which is a dataframe with two columns, and the other is a scalar value:
```
[{
    'type': 'data_frame',
    'id': 'output_id_1',
    'data': {
        'columns': [
            {'name': 'time', 'type': 'datetime'},
            {'name': 'value', 'type': 'number'}
         ],
        'values': [
            ['2020-09-29T00:00:00.000Z', 46.1538461538],
            ['2020-09-30T00:00:00.000Z', 63.1578947368],
            ...
         ]
    },
    'config': {
        'title': 'My Title',
        'graph': {
            'axis_labels': ['X Axis', 'Y Axis'],
            'orientation': 'VERTICAL',
            'plots': [
                {'x': 'time', 'y': 'value', 'style': 'BAR', 'color': '#0000ff', 'label': 'Plot 1'}
            ]
        }
    }
}, {
    'type': 'scalar',
    'id': 'output_id_2',
    'config': {
        'title': 'My Title'
    },
    'value': 5
}]
```

For this report, there is one output, which is a dataframe with two columns. For a grouping of 'Day', the first column contains ISO-8601 date strings. For any other grouping option, the first column contains categorical string values. The second column contains numerical values representing the throughput.

| started_at                 | throughput    |
|----------------------------|---------------|
| '2020-09-29'               | 23            |
| '2020-09-30'               | 45            |
| '2020-10-01'               | 30            |

The graph configuration specifies a single plot, where the x-axis is the group values and the y-axis is the throughput. We use Pandas to convert the dataframe built in the previous cells into a tabular format and then return that with the result object.

In [10]:
df_throughput[grouping].replace(r'^$', 'No ' + group_by, regex=True, inplace=True)

df_dict = {
    'columns': pd.io.json.build_table_schema(df_throughput, index=False)['fields'],
    'values': df_throughput.values.tolist(),
}

throughput_graph = {
    'type': 'data_frame',
    'id': 'throughput_graph',
    'data': df_dict,
    'config': {
        'title': 'Throughput by {}'.format(group_by),
        'graph': {
            'axis_labels': [group_by, 'Throughput'],
            'plots': [
                {'x': grouping, 'y': 'throughput', 'style': 'BAR'}
            ]
        }
    }
}

if grouping == 'started_at':
    throughput_graph['config']['graph']['orientation'] = 'VERTICAL'
else:
    throughput_graph['config']['graph']['orientation'] = 'HORIZONTAL'
    throughput_graph['config']['graph']['plots'][0]['group_by'] = [grouping]

result = [throughput_graph]
display(result)

[{'type': 'data_frame',
  'id': 'throughput_graph',
  'data': {'columns': [{'name': 'started_at', 'type': 'datetime'},
    {'name': 'throughput', 'type': 'integer'}],
   'values': [[Timestamp('2021-04-28 10:00:00'), 8],
    [Timestamp('2021-04-28 11:00:00'), 160],
    [Timestamp('2021-04-28 12:00:00'), 537],
    [Timestamp('2021-04-28 13:00:00'), 248],
    [Timestamp('2021-04-28 14:00:00'), 360],
    [Timestamp('2021-04-28 15:00:00'), 82],
    [Timestamp('2021-04-28 16:00:00'), 445],
    [Timestamp('2021-04-28 17:00:00'), 350],
    [Timestamp('2021-04-28 18:00:00'), 479],
    [Timestamp('2021-04-28 19:00:00'), 204],
    [Timestamp('2021-04-28 20:00:00'), 511],
    [Timestamp('2021-04-28 21:00:00'), 486],
    [Timestamp('2021-04-28 22:00:00'), 80],
    [Timestamp('2021-04-29 14:00:00'), 1],
    [Timestamp('2021-04-30 09:00:00'), 126],
    [Timestamp('2021-04-30 10:00:00'), 16],
    [Timestamp('2021-04-30 11:00:00'), 6],
    [Timestamp('2021-05-05 12:00:00'), 1]]},
  'config': {'title': 

### Record results with Scrapbook

In [11]:
sb.glue('result', result)