# Experimenting with Python

This notebook provides a lightweight example of running an experiment for a natural-language-to-Python scenario. The experiment passes a dataset through a set of sample components, each of which handles a different part of the process and updates the data model.

In [None]:
import os
import sys
import json
from pathlib import Path
import traceback
from ffmodel.core import orchestrator

## Configs

This section captures the experimentation configs.

- `experiment_name`: the name of this experiment
- `solution_config_path`: the path to a solution configuration yaml file describing the solution that we'd like to experiment with
- `environment_config_path`: the path to an environment configuration yaml. Follow the instructions in the [environment configs guide](../../docs/guides/environment_configs.md)
- `experiment_output_path`: the path to store the output from the experiment

In [None]:
experiment_name = "nl2python"
solution_config_path = "./nl2python_solution.yaml"
environment_config_path = "~/.ffmodel"
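Before running anything, it can help to confirm that the configured paths point at real files. The helper below is a minimal sketch and not part of FFModel: `resolve_config_path` and `missing_paths` are hypothetical names. Whether FFModel expands `~` on its own is not verified here, so the sketch expands it explicitly for the check only.

```python
from pathlib import Path
from typing import List


def resolve_config_path(raw_path: str) -> Path:
    """Expand a leading '~' and return an absolute path (hypothetical helper)."""
    return Path(raw_path).expanduser().resolve()


def missing_paths(*raw_paths: str) -> List[str]:
    """Return the subset of raw_paths that do not exist on disk."""
    return [p for p in raw_paths if not resolve_config_path(p).exists()]
```

For example, `missing_paths(solution_config_path, environment_config_path)` returns an empty list when both files are present.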

## Set up the few-shot bank

This experiment makes use of the `few_shot_embedding` pre-processor.
This component looks at a set of example input-output pairs and picks the ones that are likely to be most relevant to the new input.
The selection is done based on the similarity between embeddings of the inputs.

The input to this component is a few shot file, which is a pickle file that contains the input-output pairs, as well as a pre-computed embedding for each input.
This way, at inference time only the embedding for the new input needs to be calculated.
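The selection idea described above can be illustrated with a small, self-contained sketch. It is not the `few_shot_embedding` implementation: the record layout `(input, output, embedding)`, the function names, and the use of cosine similarity as the similarity measure are all assumptions, and the toy 2-dimensional vectors stand in for real model embeddings.

```python
import math
import pickle
from typing import List, Sequence, Tuple

# Hypothetical record layout: (input_text, output_text, embedding).
FewShot = Tuple[str, str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_few_shots(
    query_embedding: Sequence[float], bank: List[FewShot], k: int = 2
) -> List[FewShot]:
    """Return the k bank entries whose embeddings are most similar to the query."""
    ranked = sorted(
        bank,
        key=lambda example: cosine_similarity(query_embedding, example[2]),
        reverse=True,
    )
    return ranked[:k]


# Toy bank with made-up embeddings (assumption: real ones come from an embedding model).
toy_bank = [
    ("reverse a list", "xs[::-1]", [1.0, 0.0]),
    ("sort a list", "sorted(xs)", [0.8, 0.6]),
    ("read a file", "open(path).read()", [0.0, 1.0]),
]

# Round-trip through pickle to mimic loading a pre-computed bank from disk.
toy_bank = pickle.loads(pickle.dumps(toy_bank))

# At inference time only the new input needs an embedding.
selected = select_few_shots([0.9, 0.1], toy_bank, k=2)
print([example_input for example_input, _, _ in selected])
```

Because the bank embeddings are computed once and stored, each new request costs one embedding call plus a cheap ranking pass over the bank.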

The `few_shot_embedding` component provides a static method that helps create the few shot file.
In this example, the source input file can be found in `sample_datasets/nl2python_fewshot_dataset.jsonl`.

In [None]:
# Add the path for the top level root to enable components import
sys.path.insert(0, "../..")

if not os.path.exists("sample_datasets/nl2python_fewshot_dataset_text-embedding-ada-002.pkl"):
    from ffmodel.core.environment_config import EnvironmentConfigs
    from components.pre_processors.few_shot_embedding import Component

    EnvironmentConfigs.initialize(environment_config_path)
    Component.create_few_shot_file("sample_datasets/nl2python_fewshot_dataset.jsonl", "text-embedding-ada-002")

## Execute Local Experiment

With the solution config, environment config, and evaluation dataset defined, we can now run an experiment. In this notebook, we're running an experiment on your local machine. FFModel will refer to Azure Machine Learning to fetch the evaluation dataset we prepared previously, but all remaining steps (besides the inference on Azure OpenAI in the model caller step) will run in the context of your local machine.

In [None]:
from ffmodel.core import orchestrator

data_models = orchestrator.execute_experiment_on_local(
    solution_config_path, environment_config_path
)

## Analyze Experiment Results

With the experiment complete, we can now analyze the results. FFModel experiments run on data models, which hold the state of any given experiment request as it runs through the solution defined by the solution config. We can analyze the result in our data models below:

In [None]:
print(f"Number of data models returned: {len(data_models)}")

In [None]:
print(data_models[0])
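To make the notion of a data model carrying state through a chain of components more concrete, here is a toy sketch. It does not use FFModel's classes: `ToyDataModel`, `ToyComponent`, `build_prompt`, `fake_model_call`, and `run_pipeline` are hypothetical names chosen for illustration, and the "model call" is a hard-coded placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyDataModel:
    """Hypothetical stand-in for a data model: one request plus accumulated state."""
    user_input: str
    state: Dict[str, str] = field(default_factory=dict)


# A component takes a data model, updates it, and hands it on.
ToyComponent = Callable[[ToyDataModel], ToyDataModel]


def build_prompt(model: ToyDataModel) -> ToyDataModel:
    """Pre-processing step: turn the user input into a prompt."""
    model.state["prompt"] = f"# Task: {model.user_input}\n"
    return model


def fake_model_call(model: ToyDataModel) -> ToyDataModel:
    """Placeholder for a real completion request; returns a fixed string."""
    model.state["completion"] = "print('hello')"
    return model


def run_pipeline(inputs: List[str], components: List[ToyComponent]) -> List[ToyDataModel]:
    """Create one data model per input and pass it through every component in order."""
    results = []
    for text in inputs:
        model = ToyDataModel(user_input=text)
        for component in components:
            model = component(model)
        results.append(model)
    return results


toy_models = run_pipeline(["print hello"], [build_prompt, fake_model_call])
print(len(toy_models), toy_models[0].state)
```

Each returned object records what every step contributed, which is why inspecting a single data model is a convenient way to debug a solution.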

Since our solution configuration included a writer component, we can also retrieve the aggregated experiment results across all the data models. For local experiments, outputs are written locally to the path designated in the solution config, with a date-time stamp appended to distinguish runs. The cell below uses an example output path and displays the results as a data frame (ignore any `AttributeError` that might be thrown):

In [None]:
import os
import pandas as pd

# Set the writer output path to the most recently written results file.
# Note: Update the file name with the name of your output file
experiment_results_output_path = (
    "outputs/nl2python_experiment_results-20230613-180911.jsonl"
)

# Display experiment results
experiment_results = pd.read_json(experiment_results_output_path, lines=True)
experiment_results
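Rather than editing the file name by hand after every run, the newest results file can be located programmatically. The sketch below is not part of FFModel. `latest_results_file` is a hypothetical helper, and it assumes the stamp always follows the fixed-width `YYYYMMDD-HHMMSS` pattern seen in the example path, so that sorting file names alphabetically also sorts them chronologically.

```python
from pathlib import Path
from typing import Optional


def latest_results_file(output_dir: str, prefix: str) -> Optional[Path]:
    """Return the newest '<prefix>-<stamp>.jsonl' file in output_dir, or None.

    Assumes fixed-width date-time stamps, so lexicographic order is chronological.
    """
    candidates = sorted(Path(output_dir).glob(f"{prefix}-*.jsonl"))
    return candidates[-1] if candidates else None
```

In this notebook, `latest_results_file("outputs", "nl2python_experiment_results")` would return the path to pass to `pd.read_json(..., lines=True)`, or `None` if no run has written output yet.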

## Deployment

If you are happy with the performance of the solution, continue to [`ExampleDeploymentAML.ipynb`](./ExampleDeploymentAML.ipynb) to create a managed endpoint hosting the solution for external consumption.