# Failure Pareto for Test Results
This notebook creates a failure Pareto report for test results. It ties into the **Test Monitor Service** to retrieve filtered test results, the **Notebook Execution Service** to run outside of JupyterHub, and the **File Service** to store the analysis result.

The parameters and output use a schema recognized by the Test Monitor Reports page, which can be implemented by various report types. The Failure Pareto notebook produces data that is best shown in a pareto chart.

### Imports
Import Python modules for executing the notebook. Pandas is used for building and handling dataframes. Matplotlib is used for rendering the Pareto chart as an image. Scrapbook is used for recording data for the Notebook Execution Service. The SystemLink Test Monitor and File clients provide access to test result data and file storage.

In [None]:
import copy
import datetime
import os
import pandas as pd
import scrapbook as sb
from dateutil import tz

import matplotlib.pyplot as plt
import systemlink.clients.nitestmonitor as testmon
import systemlink.clients.nifile as nifile

### Parameters
- `result_ids`: IDs of the test results.

Parameters are also listed in the metadata for the parameters cell, along with their default values. The Notebook Execution Service uses that metadata to pass parameters from the Test Monitor Reports page to this notebook. To see the metadata, select the code cell and click the wrench icon in the far left panel.

Sample metadata:

```json
{
  "papermill": {
    "parameters": {
      "result_ids": []
    }
  },
  "systemlink": {
    "namespaces": [],
    "parameters": [
      {
        "display_name": "result_ids",
        "id": "result_ids",
        "type": "string[]"
      }
    ],
    "version": 2
  },
  "tags": ["parameters"]
}
```

For more information on how parameterization works, review the [papermill documentation](https://papermill.readthedocs.io/en/latest/usage-parameterize.html#how-parameters-work).
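
As a rough illustration of how the two metadata sections line up, the sketch below parses the sample metadata above and pairs each SystemLink parameter with its papermill default. The helper name `parameter_defaults` is hypothetical and is not part of papermill or SystemLink.

```python
import json

# Sample cell metadata, copied from the JSON shown above.
SAMPLE_METADATA = """
{
  "papermill": {"parameters": {"result_ids": []}},
  "systemlink": {
    "namespaces": [],
    "parameters": [
      {"display_name": "result_ids", "id": "result_ids", "type": "string[]"}
    ],
    "version": 2
  },
  "tags": ["parameters"]
}
"""


def parameter_defaults(metadata):
    """Map each declared parameter id to its (type, default) pair.

    Hypothetical helper for illustration only.
    """
    defaults = metadata["papermill"]["parameters"]
    return {
        param["id"]: (param["type"], defaults.get(param["id"]))
        for param in metadata["systemlink"]["parameters"]
    }


metadata = json.loads(SAMPLE_METADATA)
declared = parameter_defaults(metadata)
print(declared)
```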

In [None]:
result_ids = []

#### Constants

In [None]:
api_key = os.getenv("SYSTEMLINK_API_KEY")
systemlink_uri = os.getenv("SYSTEMLINK_HTTP_URI")

class ApiUrls:
    QUERY_PRODUCTS_URL = f"{systemlink_uri}/nitestmonitor/v2/query-products"
    UPDATE_PRODUCT_URL = f"{systemlink_uri}/nitestmonitor/v2/update-products"
    UPLOAD_FILE_URL = f"{systemlink_uri}/nifile/v1/service-groups/Default/upload-files"


GROUP_BY = 'Part Number'
PLOT_FILE_NAME = "pareto_graph.png"

### Mapping from grouping options to Test Monitor terminology
Translate the grouping options shown in the Test Monitor Reports page to keywords recognized by the Test Monitor API.

In [None]:
groups_map = {
    'Day': 'started_at',
    'System': 'system_id',
    'Test Program': 'program_name',
    'Operator': 'operator',
    'Part Number': 'part_number',
    'Workspace': 'workspace'
}
grouping = groups_map[GROUP_BY]

### Create SystemLink clients
Create the Test Monitor results and products clients and the File Service client. Each client connects to SystemLink over HTTP.

In [None]:
results_api = testmon.ResultsApi()
products_api = testmon.ProductsApi()
files_api = nifile.FilesApi()

### Query for results
Query the Test Monitor Service for failed results whose IDs are listed in the `result_ids` parameter.
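
A filter that mixes `or` and `and` depends on grouping. Assuming `and` binds more tightly than `or` in the filter language, as it does in most expression languages, an unparenthesized list of ID clauses would apply the status check to the last ID only. A minimal sketch of a helper that groups the ID clauses explicitly follows; `build_failed_results_filter` is a hypothetical name, and the field names are taken from the query cell in this notebook.

```python
def build_failed_results_filter(result_ids):
    """Return a filter string selecting failed results among the given IDs.

    Hypothetical helper for illustration only.
    """
    if not result_ids:
        raise ValueError("result_ids must contain at least one ID")
    id_clauses = " or ".join(f'Id == "{result_id}"' for result_id in result_ids)
    # Parentheses keep the status check applied to every ID, not just the last.
    return f'({id_clauses}) and (status.statusType == "FAILED")'


example_filter = build_failed_results_filter(["abc", "def"])
print(example_filter)
```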

In [None]:

# Match any of the requested result IDs, then keep only failed results.
# The ID clauses are parenthesized so the status check applies to every ID,
# not just the last one. Assumes result_ids is non-empty.
id_filter = " or ".join(f'Id == "{result_id}"' for result_id in result_ids)
final_results_filter = f'({id_filter}) and (status.statusType == "FAILED")'

results_query = testmon.ResultsAdvancedQuery(
    final_results_filter, order_by=testmon.ResultField.STARTED_AT
)

results = []

# Collect every page, including the final response, which has no continuation token.
response = await results_api.query_results_v2(post_body=results_query)
results.extend(response.results or [])
while response.continuation_token:
    results_query.continuation_token = response.continuation_token
    response = await results_api.query_results_v2(post_body=results_query)
    results.extend(response.results or [])

results_list = [result.to_dict() for result in results]
workspace_id = results_list[0]['workspace']
part_numbers = [result['part_number'] for result in results_list]
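
A paged query returns one page per call along with a continuation token for the next page. The final response has no token but may still carry results, so a loop that only collects inside `while token:` can drop them. A minimal sketch with a stubbed client follows; `FakeResultsApi`, `Page`, and `fetch_all` are hypothetical stand-ins, not part of the SystemLink client, and `asyncio.run` is used only so the sketch runs outside a notebook.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    results: List[str]
    continuation_token: Optional[str]


class FakeResultsApi:
    """Hypothetical stand-in that serves three fixed pages."""

    _pages = {
        None: Page(["r1", "r2"], "t1"),
        "t1": Page(["r3"], "t2"),
        "t2": Page(["r4"], None),  # last page: results but no token
    }

    async def query(self, continuation_token=None):
        return self._pages[continuation_token]


async def fetch_all(api):
    """Collect results from every page, including the final one."""
    collected = []
    token = None
    while True:
        page = await api.query(continuation_token=token)
        collected.extend(page.results)
        token = page.continuation_token
        if not token:
            return collected


all_results = asyncio.run(fetch_all(FakeResultsApi()))
print(all_results)
```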

### Get group names
Collect the group name for each result based on the `GROUP_BY` constant.

In [None]:
group_names = []
for result in results_list:
    if grouping in result:
        group_names.append(result[grouping])

### Create pandas dataframe
Put the data into a dataframe whose columns are test result ID, status, and group name.

In [None]:
formatted_results = {
    'id': [result['id'] for result in results_list],
    'status': [result['status']['status_type'] if result['status'] else None for result in results_list],
    grouping: group_names
}

df_results = pd.DataFrame.from_dict(formatted_results)

### Handle grouping by day
If the grouping is by day, the group name is the date and time when the test started in UTC. To group all test results from a single day together, convert to server time and remove time information from the group name.
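
A small worked example of the truncation, using only the standard library. The fixed UTC-5 offset stands in for the server's local time zone and is an assumption for illustration; the notebook itself uses `tz.tzlocal()`. The helper name `to_local_day` is hypothetical.

```python
import datetime

# Assumed server offset, for illustration only (UTC-5).
SERVER_TZ = datetime.timezone(datetime.timedelta(hours=-5))


def to_local_day(started_at_utc, local_tz=SERVER_TZ):
    """Convert an aware UTC timestamp to a local calendar-day string."""
    return str(started_at_utc.astimezone(local_tz).date())


late_evening = datetime.datetime(2024, 3, 2, 3, 30, tzinfo=datetime.timezone.utc)
midday = datetime.datetime(2024, 3, 2, 17, 0, tzinfo=datetime.timezone.utc)

# 03:30 UTC on March 2 is 22:30 on March 1 at UTC-5, so the two results
# land in different day groups even though they share a UTC date.
print(to_local_day(late_evening), to_local_day(midday))
```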

In [None]:
df_results_copy = copy.copy(df_results)
df_results_copy.fillna(value='', inplace=True)

if grouping == 'started_at':
    truncated_times = []
    for val in df_results_copy[grouping]:
        local_time = val.astimezone(tz.tzlocal())
        truncated_times.append(str(datetime.date(local_time.year, local_time.month, local_time.day)))
    df_results_copy[grouping] = truncated_times

### Aggregate results into groups
Aggregate the data for each unique group and status.

*See documentation for [size](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.size.html) and [unstack](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) here.*
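
To make the shape of the aggregated table concrete, here is a minimal sketch on made-up data (the part numbers and statuses are invented for illustration). `size()` counts rows per (group, status) pair, and `unstack(fill_value=0)` pivots the statuses into columns, filling missing combinations with zero.

```python
import pandas as pd

# Invented sample data: three results for part A, one for part B.
sample = pd.DataFrame({
    "part_number": ["A", "A", "B", "A"],
    "status": ["FAILED", "FAILED", "ERRORED", "PASSED"],
})

# One row per part number, one column per status, zero where no rows exist.
counts = sample.groupby(["part_number", "status"]).size().unstack(fill_value=0)
print(counts)
```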

In [None]:
df_grouped = df_results_copy.groupby([grouping, 'status']).size().unstack(fill_value=0)
if 'PASSED' not in df_grouped:
    df_grouped['PASSED'] = 0
if 'FAILED' not in df_grouped:
    df_grouped['FAILED'] = 0
if 'ERRORED' not in df_grouped:
    df_grouped['ERRORED'] = 0

### Failure Pareto calculation
Count the number of test failures and calculate cumulative values for the pareto.
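
As a worked example with made-up counts: groups with 5, 3, and 2 failures, sorted in descending order, give running totals of 5, 8, and 10 out of 10, so the cumulative curve reads 50%, 80%, and 100%. A minimal sketch of that arithmetic in plain Python follows; `cumulative_percentages` is a hypothetical helper name.

```python
from itertools import accumulate


def cumulative_percentages(fail_counts):
    """Return the running share of total failures, in percent."""
    total = sum(fail_counts)
    if total == 0:
        # No failures at all: avoid dividing by zero.
        return [0.0 for _ in fail_counts]
    return [100 * running / total for running in accumulate(fail_counts)]


percentages = cumulative_percentages([5, 3, 2])
print(percentages)
```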

In [None]:
df_fail_count = pd.DataFrame(df_grouped['FAILED'] + df_grouped['ERRORED'])
if grouping != 'started_at':
    df_fail_count.sort_values(by=[0], ascending=False, inplace=True)
total = df_fail_count[0].sum()
pareto_values = []
cumulative = 0
for data_member in df_fail_count[0]:
    cumulative += data_member
    pareto_values.append(100 * (cumulative / total))

df_pareto = df_fail_count.reset_index().set_axis([grouping, 'fail_count'], axis=1)
df_pareto['cumulative'] = pareto_values

if grouping == 'started_at':
    df_pareto['started_at'] = pd.to_datetime(df_pareto['started_at'])

In [None]:
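# Label empty group names. Assign the result back instead of calling replace
# with inplace=True on a column selection, which may not update df_pareto.
df_pareto[grouping] = df_pareto[grouping].replace(r'^$', 'No ' + GROUP_BY, regex=True)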

df_dict = {
    'columns': pd.io.json.build_table_schema(df_pareto, index=False)['fields'],
    'values': df_pareto.values,
}

pareto_graph = {
    "type": 'data_frame',
    'id': 'failure_pareto_results_graph',
    'data': df_dict,
    'config': {
        'title': 'Failure Pareto - Results by {}'.format(GROUP_BY),
        'graph': {
            'axis_labels': [GROUP_BY, 'Failure Count', 'Cumulative %'],
            'plots': [
                {'x': grouping, 'y': 'fail_count', 'style': 'BAR', 'GROUP_BY': [grouping]},
                {'x': grouping, 'y': 'cumulative', 'secondary_y': True, 'style': 'LINE'}
            ],
            'orientation': 'VERTICAL'
        }
    }
}

### Generate plot PNG

In [None]:
# Extract data from the pareto_graph dictionary
title = pareto_graph['config']['title']
axis_labels = pareto_graph['config']['graph']['axis_labels']
plots = pareto_graph['config']['graph']['plots']

# Plot data
fig, ax1 = plt.subplots()

for plot in plots:
    if plot['style'] == 'BAR':
        ax1.bar(
            df_pareto[plot['x']],
            df_pareto[plot['y']],
            label=plot['x'],
        )
        ax1.set_ylabel(axis_labels[1])
        ax1.set_xlabel(axis_labels[0])
    elif plot['style'] == 'LINE':
        ax2 = ax1.twinx()
        ax2.plot(df_pareto[plot['x']], df_pareto[plot['y']], label=plot['x'], color='r')
        ax2.set_ylabel(axis_labels[2])

# Title and legend
plt.title(title)
plt.legend()
ax1.tick_params(axis='x', labelrotation = 90)

# Save as PNG file
plt.savefig(PLOT_FILE_NAME, bbox_inches='tight')

# Show the plot (optional)
plt.show()



### Upload plot to the File Service and link it to products

In [None]:
async def upload_file(file_name):
    response = await files_api.upload(file_name)

    return response

async def get_products(part_numbers):
    # Query each distinct part number once; dict.fromkeys preserves order.
    unique_part_numbers = list(dict.fromkeys(part_numbers))
    query_filter = " or ".join(
        f'partNumber == "{part_number}"' for part_number in unique_part_numbers
    )
    query_body = {"filter": query_filter}
    response = await products_api.query_products_v2(post_body=query_body)

    return response.products

async def add_file_id_to_products(part_numbers, file_id):
    products = await get_products(part_numbers)
    for product in products:
        product.file_ids.append(file_id)
    body = {"products": products, "replace": False}
    response = await products_api.update_products_v2(request_body=body)

    return response

In [None]:
upload_file_response = await upload_file(file_name=PLOT_FILE_NAME)
uploaded_file_id = upload_file_response.uri.split("/")[-1]
product_response = await add_file_id_to_products(
    part_numbers, file_id=uploaded_file_id
)
product_id = product_response.products[0].id

### Record results with Scrapbook

In [None]:
sb.glue(
    "The resultant failure pareto analysis is uploaded as an image",
    f'<a href="../../testinsights/products/product/{product_id}/files">Link to image</a>',
)

### Next Steps

1. Publish this notebook to SystemLink by right-clicking it in the JupyterLab File Browser, and set the interface to Test Data Analysis.
1. Manually analyze the results in the results grid by clicking the Analyze button.