# Test edge configuration package locally

This notebook tests, on the local machine, the edge configuration package we created in notebook [30-CreatePipelinePackage](30-CreatePipelinePackage.ipynb).
For this purpose we use the `LocalPipelineRunner` class from the `simaticai.testing.pipeline_runner` module.

To execute this notebook, we need:
- `State-Identifier-edge_1.zip`, created in notebook [30-CreatePipelinePackage](30-CreatePipelinePackage.ipynb)
- the training data set used in notebook [10-CreateClusteringModel](10-CreateClusteringModel.ipynb)

The `LocalPipelineRunner` object takes the edge configuration package and extracts its components.
Once the components are extracted, you can run them individually by calling `run_component` with the component name and structured input data.
The method builds a `venv` Python virtual environment for the component and installs the dependencies listed in the component's `requirements.txt`.
Once the virtual environment is ready, the method runs the component and feeds it your test data.
The result is the list of outputs of your component. If your component does not produce an output for every input, as is the case with a preprocessor that aggregates a full window of data before emitting anything, the output list is shorter than the input list.
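
The relation between input and output length can be illustrated without the package itself. The following is a minimal sketch in plain Python, not the `simaticai` implementation: `WindowAggregator` and `run_all` are hypothetical names invented for this example, and the window size of 4 is arbitrary.

```python
from typing import Optional


class WindowAggregator:
    """Hypothetical stand-in for a windowing preprocessor (not part of simaticai)."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.buffer: list[dict] = []

    def run(self, record: dict) -> Optional[list[dict]]:
        # Collect records; emit a full window and start over, otherwise emit nothing.
        self.buffer.append(record)
        if len(self.buffer) < self.window_size:
            return None
        window, self.buffer = self.buffer, []
        return window


def run_all(component: WindowAggregator, inputs: list[dict]) -> list[list[dict]]:
    # Keep only the calls that actually produced an output.
    results = (component.run(record) for record in inputs)
    return [result for result in results if result is not None]


inputs = [{"ph1": float(i), "ph2": 0.0, "ph3": 0.0} for i in range(10)]
outputs = run_all(WindowAggregator(window_size=4), inputs)
print(len(inputs), len(outputs))  # 10 inputs yield 2 complete windows
```

The last two records never fill a window, so they produce no output at all.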

### Imports  

In [None]:
import glob
import pandas
from pathlib import Path

from simaticai.testing.pipeline_runner import LocalPipelineRunner

### Define package to test

In [None]:
package_path = Path('../packages/State-Identifier-edge_1.zip')

## Define a dataset to test the package
The goal here is to build the list of inputs with which the `run(..)` method will be called.  
To do this, we read the training data from the original CSV file and build a list of dictionaries holding the 'ph1', 'ph2' and 'ph3' values.

In [None]:
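data_path = "../data"
# without recursive=True, '**' behaves like '*', so this matches CSV files exactly one directory level below data_path
csv_files = glob.glob(f"{data_path}/**/*.csv")
pandas.DataFrame(data=csv_files)  # display the matching files as a table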

In [None]:
input_series = pandas.read_csv(csv_files[0])  # read test data from the same CSV file we used to train the model
input_list = input_series[['ph1', 'ph2', 'ph3']].to_dict(orient='records')  # build a list of dictionaries, the form in which the `run(..)` method receives its input
input_list[:10]
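
Before feeding such a list to the runner, it can be worth checking that every record has the expected shape. The following is an optional sanity check sketched under assumptions: `find_invalid_records` is a hypothetical helper written for this example, and treating non-finite values as invalid is our choice, not a documented requirement of the package.

```python
import math
import numbers

EXPECTED_KEYS = ("ph1", "ph2", "ph3")  # column names used when building input_list


def find_invalid_records(records: list[dict]) -> list[int]:
    """Return the positions of records that lack a key or hold a non-finite value."""
    invalid = []
    for position, record in enumerate(records):
        values = [record.get(key) for key in EXPECTED_KEYS]
        is_valid = set(record) == set(EXPECTED_KEYS) and all(
            isinstance(value, numbers.Real) and math.isfinite(value) for value in values
        )
        if not is_valid:
            invalid.append(position)
    return invalid


sample = [
    {"ph1": 1.0, "ph2": 2.0, "ph3": 3.0},
    {"ph1": 1.0, "ph2": float("nan"), "ph3": 3.0},  # NaN, e.g. from an empty CSV cell
    {"ph1": 1.0, "ph2": 2.0},  # missing 'ph3'
]
print(find_invalid_records(sample))
```

Calling `find_invalid_records(input_list)` on the real data would return an empty list if every record is complete.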

## Test the pipeline configuration package
To test the package, we instantiate a `LocalPipelineRunner` object with the path of our configuration package and a working directory in which we can inspect the results.
This directory will contain both the extracted component and the created Python virtual environment.
If no directory is specified, a temporary directory is created and deleted after testing.

In [None]:
test_dir = Path('../test')

with LocalPipelineRunner(package_path, test_dir) as pipeline_runner:
    # uncomment the following line to try the pipeline with non-default parameters
    # pipeline_runner.update_parameters({"step_size": 100})
    outputs = pipeline_runner.run_pipeline(input_list[:1200])  # test with the first 1200 records, which form 4 windows of data

## Check the results
`run_pipeline` returns only the results that are not `None`. Since the 1200 input records form 4 windows of data, we get a list with 4 results.

In [None]:
outputs
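
The count of 4 results can be reproduced with a little arithmetic. The sketch below assumes a sliding window that starts every `step_size` records and emits only complete windows. The window size of 300 is inferred from 1200 records forming 4 windows and is not read from the package, and `count_windows` is a hypothetical helper. The actual windowing logic of the pipeline may differ.

```python
def count_windows(n_records: int, window_size: int, step_size: int) -> int:
    """Number of complete windows when a new window starts every `step_size` records."""
    if n_records < window_size:
        return 0
    return (n_records - window_size) // step_size + 1


print(count_windows(1200, window_size=300, step_size=300))  # 4
print(count_windows(1200, window_size=300, step_size=100))  # 10
```

Under these assumptions, setting `step_size` to 100 as in the commented-out `update_parameters` call would yield 10 results instead of 4.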

Because we instantiated the runner with an explicitly specified, existing working directory, the full result list is also available in the file `output.joblib`. It can be loaded with `joblib` and, if needed, read into a pandas `DataFrame`.

In [None]:
import joblib
results = joblib.load("../test/State-Identifier-edge_1/inference/output.joblib")

results[297:302]