# DataSpace Analysis Example

This notebook analyzes the parametric data in a DataSpace and calculates statistics such as the minimum, maximum, mean, and process capability (Cp and Cpk).

### Imports

Import the Python modules the notebook needs. `ni_data_space_analyzer` performs the analysis, pandas builds and handles DataFrames, and Scrapbook records data for the SystemLink Notebook Execution Service.

In [1]:
import pandas as pd
import scrapbook as sb

from ni_data_space_analyzer import DataSpaceAnalyzer
from ni_data_space_analyzer.exception import DataSpaceAnalyzerError

### Parameters

#### Channels

The `channels` parameter contains the `artifact_id` in the following format:
```json
{
  "artifact_id": "<your artifact id>"
}
```

Example:
```json
{
    "artifact_id": "ec25561d-6509-49e5-9a78-30e9752733fe"
}
```
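The default value `<artifact_id>` is only a placeholder, so it can help to fail early when SystemLink has not supplied a real ID. The following is a minimal sketch that assumes artifact IDs are UUID strings like the example above. That assumption is drawn from the example, not from documented SystemLink behavior, and `get_artifact_id` is a hypothetical helper, not part of `ni_data_space_analyzer`.

```python
import uuid


def get_artifact_id(channels: dict) -> str:
    """Return the artifact ID from a `channels` parameter, rejecting placeholders.

    Hypothetical helper: assumes artifact IDs are UUID strings, based only on
    the example artifact ID shown in this notebook.
    """
    artifact_id = channels.get("artifact_id", "").strip()
    try:
        # uuid.UUID raises ValueError for placeholders such as "<artifact_id>".
        uuid.UUID(artifact_id)
    except ValueError:
        raise ValueError(
            f"channels['artifact_id'] is not a valid ID: {artifact_id!r}"
        ) from None
    return artifact_id


print(get_artifact_id({"artifact_id": "ec25561d-6509-49e5-9a78-30e9752733fe"}))
```

`uuid.UUID` also accepts IDs without hyphens or wrapped in braces, so this check is permissive about formatting and only rejects values that cannot be parsed as a UUID at all.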

#### Analysis Options

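`analysis_options` is a list of the IDs of the analyses the notebook should perform, for example `["min", "mean", "cpk"]`. The supported IDs are listed in the supported analysis options section below.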

#### Workspace ID

The `workspace_id` is the ID of the workspace where the artifact data is located.

#### Metadata

These are the parameters that the notebook expects SystemLink to pass in. Notebooks designed to perform analysis inside a DataSpace must tag this cell with `parameters` and, at minimum, specify the following in the cell metadata using the JupyterLab Property Inspector (double gear icon):

```json
{
  "papermill": {
    "parameters": {
      "analysis_options": [],
      "channels": {"artifact_id": "<artifact_id>"},
      "workspace_id": ""
    }
  },
  "systemlink": {
    "interfaces": [],
    "outputs": [
      {
        "display_name": "Min",
        "id": "min",
        "type": "scalar"
      },
      {
        "display_name": "Max",
        "id": "max",
        "type": "scalar"
      },
      {
        "display_name": "Mean",
        "id": "mean",
        "type": "scalar"
      },
      {
        "display_name": "2 STD",
        "id": "2std",
        "type": "scalar"
      },
      {
        "display_name": "-2 STD",
        "id": "-2std",
        "type": "scalar"
      },
      {
        "display_name": "Moving Mean",
        "id": "moving_mean",
        "type": "vector"
      },
      {
        "display_name": "CP",
        "id": "cp",
        "type": "vector"
      },
      {
        "display_name": "CPK",
        "id": "cpk",
        "type": "vector"
      }
    ],
    "parameters": [
      {
        "display_name": "Channels",
        "id": "channels",
        "type": "dict[string, string]"
      },
      {
        "display_name": "Analysis Options",
        "id": "analysis_options",
        "type": "string[]"
      },
      {
        "display_name": "Workspace ID",
        "id": "workspace_id",
        "type": "string"
      }
    ],
    "version": 2
  },
  "tags": ["parameters"]
}
```

For more information on how parameterization works, review the [papermill documentation](https://papermill.readthedocs.io/en/latest/usage-parameterize.html#how-parameters-work).
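Before publishing, you may want to confirm that the `parameters` tag and the `systemlink` outputs are stored in the cell metadata as shown above. The following is a minimal sketch that reads them from a notebook represented as a Python dict, which is the structure an nbformat 4 `.ipynb` file has after `json.load`. The helper names `find_parameters_cell` and `declared_outputs` are hypothetical, and the notebook dict is trimmed to the fields the sketch reads.

```python
def find_parameters_cell(notebook: dict) -> dict:
    """Return the first cell tagged 'parameters' in an nbformat 4 notebook dict."""
    for cell in notebook.get("cells", []):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return cell
    raise LookupError("No cell is tagged 'parameters'.")


def declared_outputs(notebook: dict) -> dict:
    """Map each output ID in the 'systemlink' cell metadata to its declared type."""
    metadata = find_parameters_cell(notebook)["metadata"]
    return {output["id"]: output["type"] for output in metadata["systemlink"]["outputs"]}


# Trimmed-down notebook; in practice, json.load() the .ipynb file instead.
notebook = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# DataSpace Analysis Example"},
        {
            "cell_type": "code",
            "metadata": {
                "tags": ["parameters"],
                "systemlink": {
                    "outputs": [
                        {"display_name": "Min", "id": "min", "type": "scalar"},
                        {"display_name": "CP", "id": "cp", "type": "vector"},
                    ],
                    "version": 2,
                },
            },
            "source": 'channels = {"artifact_id": "<artifact_id>"}\n',
        },
    ],
}

print(declared_outputs(notebook))  # {'min': 'scalar', 'cp': 'vector'}
```

Comparing the resulting mapping with the `supported_analysis` list defined later in the notebook is one way to keep the declared outputs and the implemented analyses in sync.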


In [2]:
channels = {"artifact_id": "<artifact_id>"}
analysis_options = []
workspace_id = ""

### Supported analysis options and their output types

1. Mean (`mean`, scalar): The arithmetic mean of the data set.
2. 2 STD (`2std`, scalar): The value two standard deviations above the mean.
3. -2 STD (`-2std`, scalar): The value two standard deviations below the mean.
4. Min (`min`, scalar): The minimum value in the data set.
5. Max (`max`, scalar): The maximum value in the data set.
6. Moving Mean (`moving_mean`, vector): The mean of the most recent X data points, recomputed as the window moves through the data.
7. Cpk (`cpk`, vector): The process capability index. It describes how consistently a process produces output within the specification limits, taking into account how well the process is centered between them.
8. Cp (`cp`, vector): The process capability. It measures the potential of a process to produce output within the upper and lower specification limits, without regard to where the process mean sits between them.
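To make the definitions above concrete, here is a minimal pure-Python sketch of how each statistic can be computed for one channel. It is not the `DataSpaceAnalyzer` implementation, which this notebook does not show. The specification limits `lsl` and `usl`, the moving-mean `window` of 3, the use of the sample standard deviation, and the reading of 2 STD as mean + 2σ are all assumptions. The capability indices use the standard definitions $C_p = \frac{USL - LSL}{6\sigma}$ and $C_{pk} = \frac{\min(USL - \mu,\ \mu - LSL)}{3\sigma}$.

```python
import statistics


def summarize(values, lsl, usl, window=3):
    """Compute the listed statistics for one channel (illustrative sketch).

    `lsl`/`usl` (lower/upper specification limits) and `window` are assumed
    inputs; the notebook does not show how DataSpaceAnalyzer chooses them.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Trailing moving mean: one value per full window of `window` points.
    moving_mean = [
        statistics.mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "2std": mean + 2 * std,
        "-2std": mean - 2 * std,
        "moving_mean": moving_mean,
        "cp": (usl - lsl) / (6 * std),
        "cpk": min(usl - mean, mean - lsl) / (3 * std),
    }


result = summarize([9.8, 10.1, 10.0, 9.9, 10.2], lsl=9.0, usl=11.0)
print(result["moving_mean"])
```

Because the sample data is centered between the limits, Cp and Cpk come out equal here; shifting the data toward either limit lowers Cpk but leaves Cp unchanged. The notebook declares `cp` and `cpk` as vector outputs, so the analyzer presumably computes them over subsets of the data rather than as the single values this sketch returns.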

In [3]:
supported_analysis = [
    {"id": "min", "type": "scalar"},
    {"id": "max", "type": "scalar"},
    {"id": "mean", "type": "scalar"},
    {"id": "2std", "type": "scalar"},
    {"id": "-2std", "type": "scalar"},
    {"id": "moving_mean", "type": "vector"},
    {"id": "cp", "type": "vector"},
    {"id": "cpk", "type": "vector"},
]

supported_analysis_options = [analysis["id"] for analysis in supported_analysis]

### Utility Functions

#### Validating Analysis options

In [4]:
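def validate_analysis_options(analysis_options: list) -> None:
    """Raise DataSpaceAnalyzerError if any requested option is not supported."""
    # Ignore surrounding whitespace in user-supplied option IDs.
    analysis_options = [option.strip() for option in analysis_options]

    # Sort so the error message lists the unsupported options in a stable order.
    invalid_options = sorted(set(analysis_options) - set(supported_analysis_options))

    if invalid_options:
        raise DataSpaceAnalyzerError(
            "The analysis failed because the following options are not supported: {0}.".format(
                ", ".join(invalid_options)
            )
        )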
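The same validation can be exercised outside SystemLink. The sketch below is a hypothetical standalone version: `UnsupportedOptionError` stands in for `DataSpaceAnalyzerError`, and the option set is copied from `supported_analysis`. It strips and lower-cases the IDs before checking them, so the IDs it returns are exactly the ones the analysis step should compare against. Dropping duplicates while keeping the input order is a design choice of this sketch, not something the notebook requires.

```python
SUPPORTED_OPTIONS = {"min", "max", "mean", "2std", "-2std", "moving_mean", "cp", "cpk"}


class UnsupportedOptionError(ValueError):
    """Hypothetical stand-in for DataSpaceAnalyzerError."""


def normalize_and_validate(options):
    """Return option IDs stripped, lower-cased, and de-duplicated in input order."""
    # dict.fromkeys keeps the first occurrence of each ID and preserves order.
    normalized = list(dict.fromkeys(option.strip().lower() for option in options))
    invalid = [option for option in normalized if option not in SUPPORTED_OPTIONS]
    if invalid:
        raise UnsupportedOptionError(
            "The analysis failed because the following options are not supported: "
            + ", ".join(invalid)
            + "."
        )
    return normalized


print(normalize_and_validate([" Min", "MEAN", "min", "cpk "]))  # ['min', 'mean', 'cpk']
```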

#### Analyzing channel data

In [6]:
def analyze_channel_data(channel_data: pd.DataFrame) -> pd.DataFrame:
    """Run each requested analysis on one channel's data and return the combined output."""
    data_space_analyzer = DataSpaceAnalyzer(dataframe=channel_data)

    # Map each supported option ID to the analyzer method that computes it.
    compute_methods = {
        "min": data_space_analyzer.compute_min,
        "max": data_space_analyzer.compute_max,
        "mean": data_space_analyzer.compute_mean,
        "2std": data_space_analyzer.compute_2std,
        "-2std": data_space_analyzer.compute_negative_2std,
        "moving_mean": data_space_analyzer.compute_moving_mean,
        "cp": data_space_analyzer.compute_cp,
        "cpk": data_space_analyzer.compute_cpk,
    }

    # `analysis_options` is the notebook-level parameter, validated before this
    # function is called. Unknown IDs are skipped.
    for option in analysis_options:
        compute = compute_methods.get(option)
        if compute is not None:
            compute()

    return data_space_analyzer.generate_analysis_output(
        analysis_options=analysis_options, supported_analysis=supported_analysis
    )

### Validating and Analyzing Channels

In [7]:
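# Normalize option IDs the same way for validation and for analysis.
analysis_options = [option.strip().lower() for option in analysis_options]
final_result = []

try:
    validate_analysis_options(analysis_options)

    # load_dataset turns the `channels` parameter into a list of channels,
    # each with a "name" and its "data".
    data_space_analyzer = DataSpaceAnalyzer(pd.DataFrame())
    channels = data_space_analyzer.load_dataset(channels)

    for channel in channels:
        channel_name = channel["name"]
        channel_data = channel["data"]

        analysis_results = analyze_channel_data(channel_data)

        final_result.append({"plot_label": channel_name, "data": analysis_results})

    # Save the combined results and keep the ID of the artifact that stores them.
    artifact_id = data_space_analyzer.save_analysis(workspace_id, final_result)

except DataSpaceAnalyzerError as e:
    # Re-raise as a plain Exception, suppressing the chained traceback.
    raise Exception(e) from None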

### Store the result information so that SystemLink can access it

SystemLink uses scrapbook to store result information from each notebook execution to display to the user in the Execution Details slide-out.
   

In [None]:
sb.glue("result", artifact_id)

#### Sample Output format

```json
{
    "artifact_id": "ec25561d-6509-49e5-9a78-30e9752733fe"
}
```

`artifact_id` - The ID of the artifact file where the output data is compressed and stored.

### Next Steps

1. Publish this notebook to SystemLink by right-clicking it in the JupyterLab File Browser, and set its interface to DataSpace Analysis.
1. Analyze the parametric data in the DataSpace manually by clicking the Analyze button.