# Example PrivacyFingerprint Experiment Pipeline

This notebook walks through an example experimental workflow to show how the ExperimentalConfigHandler can be used to set up and run several experiment configurations.

This pipeline will look at the difference in extraction outputs when using:
- GLiNER
- UniversalNER hosted locally
- UniversalNER hosted via Ollama.

In [None]:
import os
import sys

path_root = os.path.dirname(os.getcwd())

if path_root not in sys.path:
    sys.path.append(path_root)

### Importing necessary functions from ./src folder

In [None]:
# Functions needed for the experimental config handler.
from src.config.experimental_config_handler import ExperimentalConfigHandler
from src.config.global_config import load_global_config

# Functions needed to create prompt templates and save them for the experiments.
from src.config.prompt_template_handler import (
    save_extraction_template_to_json,
    save_generate_template_to_json,
    load_and_validate_extraction_prompt_template,
    load_and_validate_generate_prompt_template,
)

### Importing and Loading Global and Experimental Config

**global_config_path** is the location of the global config file. After loading it, the output folder name is overridden so that the example experiments are written to a separate, easy-to-find folder. (For your own experiments, you would normally keep the default output folder.)

**default_config_path** points to the default experimental config values. Currently the pipeline copies the original experimental config into the experiment folder. If a copy already exists there, the pipeline uses only the experimental config defined in that folder.

In [None]:
# Defines the location of the global config file, loads it, and redirects the output folder.
global_config_path = "../config/global_config.yaml"
global_config = load_global_config(global_config_path)
global_config.output_paths.output_folder = "../example_output"

# Defines the location of the default experimental config file that supplies the default values.
default_config_path = "../config/experimental_config.yaml"
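
The copy-once behaviour described above can be pictured with a small sketch. This is not the pipeline's own code: `resolve_experiment_config` is a hypothetical helper, and the file and folder names are placeholders chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_experiment_config(default_config: Path, experiment_folder: Path) -> Path:
    """Hypothetical helper: copy the default config once, then always reuse the copy."""
    experiment_folder.mkdir(parents=True, exist_ok=True)
    local_copy = experiment_folder / default_config.name
    if not local_copy.exists():
        # First run: take a snapshot of the default config.
        shutil.copy(default_config, local_copy)
    # Later runs: the snapshot wins, even if the default has changed.
    return local_copy


with tempfile.TemporaryDirectory() as tmp:
    default = Path(tmp) / "experimental_config.yaml"
    default.write_text("seed: 1\n")
    folder = Path(tmp) / "example_experiment"

    first = resolve_experiment_config(default, folder)
    default.write_text("seed: 2\n")  # editing the default afterwards...
    second = resolve_experiment_config(default, folder)
    reused_text = second.read_text()  # ...does not change the experiment's copy
```

Under this assumption, changing the default config after an experiment folder exists has no effect on that experiment. To pick up new defaults you would need a new experiment name or to edit the copy in the experiment folder.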

Next, define your overrides. Both override types are optional: if you do not define any, the pipeline runs with the defaults provided in the default experimental config.


* **iter_overrides**: These are parameters where you want every value in each list to be combined with every value in the other lists.

    * For example: if "var1" = ["a", "b"] and "var2" = ["c", "d"], it would produce combinations = [["a", "c"], ["a", "d"], ["b", "c"], ["b", "d"]]



* **combine_overrides**: These are parameters whose lists you want paired element by element rather than fully crossed. They only affect extraction.gliner_features, extraction.ollama_features, and extraction.local_features.

    * For example: if "model_name" = ["model1", "model2"] and "prompt_template" = ["prompt_template1", "prompt_template2"], it would produce combinations = [["model1", "prompt_template1"], ["model2", "prompt_template2"]]
    * The lists must all be the same length, otherwise you will get an error.
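
The two override styles correspond to a Cartesian product and an element-wise pairing. The sketch below illustrates that idea with the standard library only. The function names `expand_iter_overrides` and `expand_combine_overrides` are made up for this example and are not part of the ExperimentalConfigHandler API, whose internals may differ.

```python
from itertools import product


def expand_iter_overrides(overrides):
    """Illustrative only: every value of each key crossed with every other key."""
    keys = list(overrides)
    value_lists = [overrides[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


def expand_combine_overrides(overrides):
    """Illustrative only: the i-th value of each key is paired with the i-th of the others."""
    lengths = {len(values) for values in overrides.values()}
    if len(lengths) > 1:
        raise ValueError("All combine_overrides lists must have the same length.")
    keys = list(overrides)
    value_lists = [overrides[key] for key in keys]
    return [dict(zip(keys, values)) for values in zip(*value_lists)]


iter_combos = expand_iter_overrides({"var1": ["a", "b"], "var2": ["c", "d"]})
combine_combos = expand_combine_overrides(
    {
        "model_name": ["model1", "model2"],
        "prompt_template": ["prompt_template1", "prompt_template2"],
    }
)
```

Here `iter_combos` holds four dictionaries (2 x 2 values) and `combine_combos` holds two (one per list position), which matches the examples in the bullets above.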



You **must** define an experiment name, because it determines where your experiment folder is created.



In [None]:
# Define your iter overrides.
iter_overrides = {
    "outputs.experiment_name": "example_experiment_17_06_24",
    "extraction.server_model_type": ["gliner", "ollama", "local"],
}

# Define your combine overrides.
combine_overrides = {
    "generate.llm_model_features.llm_model_name": ["llama2", "llama3"],
    "generate.llm_model_features.prompt_template_path": [
        "llama2_template.json",
        "llama3_template.json",
    ],
}

# This initialises the experimental config handler.
config_handler = ExperimentalConfigHandler(
    default_config_path=default_config_path,
    iter_overrides=iter_overrides,
    combine_overrides=combine_overrides,
    global_config=global_config,
)

# This prints the resulting config structures for each component so the user can inspect them.
print("---- SyntheaConfig ----")
for config in config_handler.load_component_experimental_config("synthea"):
    print(config)

print("\n---- GenerateConfig ----")
for config in config_handler.load_component_experimental_config("generate"):
    print(config)

print("\n---- ExtractionConfig ----")
for config in config_handler.load_component_experimental_config("extraction"):
    print(config)

## 1. Create Prompt Templates used in the Experimental Pipeline.

This pipeline requires three prompt templates:
* A prompt template for **Llama2**, used to generate the medical notes.
* A prompt template for **Llama3**, used to generate the medical notes.
* A prompt template for **UniversalNER**, used to extract the entities via both the local and Ollama-hosted models.

The three cells below define these templates and save them to a templates folder.
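
Each template is an ordinary string with named placeholders in curly braces, such as `{data}` in the generate templates. As a rough sketch of what a placeholder check could look like (the helper `template_fields` is hypothetical, and the checks actually performed by the `load_and_validate_*` functions are not shown in this notebook), the standard library can list the fields a template expects:

```python
from string import Formatter


def template_fields(template: str) -> set:
    """Hypothetical helper: return the placeholder names found in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# A shortened stand-in for a generate template. The patient details are made up.
example_template = "Use the following information:\n\n{data}\n"

fields = template_fields(example_template)
filled = example_template.format(data="NAME: Jane Doe, CONDITION: asthma")
```

Checking `fields` before running an experiment is a cheap way to catch a misspelled placeholder, which would otherwise only surface as a `KeyError` when the template is filled.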

### Llama2

In [None]:
# Defines the path of where Llama2 template lives in the generate folder.
generate_template_path = (
    f"{global_config.output_paths.generate_template}llama2_template.json"
)

# This defines the template used by Llama2.
generate_template = """[INST]
<<SYS>>
You are a medical student answering an exam question about writing clinical notes for patients.
<</SYS>>

Keep in mind that your answer will be assessed based on incorporating all the provided information and the quality of prose.

1. Use prose to write an example clinical note for this patient's doctor.
2. Use less than three sentences.
3. Do not provide recommendations.
4. Use the following information:

{data}
[/INST]
"""

# Saves the template to the path defined.
save_generate_template_to_json(
    template_str=generate_template, file_path=generate_template_path
)

# Loads the template so the user can inspect the template saved.
loaded_generate_template = load_and_validate_generate_prompt_template(
    filename=generate_template_path
)
print(loaded_generate_template)

### Llama3

In [None]:
# Defines the path of where Llama3 template lives in the generate folder.
generate_template_path = (
    f"{global_config.output_paths.generate_template}llama3_template.json"
)

# This defines the template used by Llama3.
generate_template = """<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
You are a medical student answering an exam question about writing clinical notes for patients.
<|eot_id|>

<|start_header_id|>user<|end_header_id|>
Keep in mind that your answer will be assessed based on incorporating all the provided information and the quality of prose.

1. Use prose to write an example clinical note for this patient's doctor.
2. Use less than three sentences.
3. Do not provide recommendations.
4. Use the following information:

{data}
<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
"""

# Saves the template to the path defined.
save_generate_template_to_json(
    template_str=generate_template, file_path=generate_template_path
)

# Loads the template so the user can inspect the template saved.
loaded_generate_template = load_and_validate_generate_prompt_template(
    filename=generate_template_path
)
print(loaded_generate_template)

### UniversalNER

In [None]:
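# Defines the path where the UniversalNER template lives in the extraction folder.
extraction_template_path = (
    f"{global_config.output_paths.extraction_template}universal_ner_template.json"
)

# This defines the conversation-style template used by UniversalNER.
universalner_prompt_template = """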
    USER: Text: {input_text}
    ASSISTANT: I’ve read this text.
    USER: What describes {entity_name} in the text?
    ASSISTANT: (model's predictions in JSON format)
    """

# Saves the template to the path defined.
save_extraction_template_to_json(
    template_str=universalner_prompt_template,
    file_path=extraction_template_path,
)

# Loads the template so the user can inspect the template saved.
loaded_extraction_template = load_and_validate_extraction_prompt_template(
    filename=extraction_template_path
)
print(loaded_extraction_template)
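
Because the template has a single `{entity_name}` slot, one prompt is needed per entity type for each note. The sketch below shows that fan-out with plain string formatting. `build_prompts`, the note text, and the entity names are all made up for illustration, and the pipeline's own extraction code may construct its prompts differently.

```python
# Same conversation layout as the template above, written without indentation.
ner_template = (
    "USER: Text: {input_text}\n"
    "ASSISTANT: I’ve read this text.\n"
    "USER: What describes {entity_name} in the text?\n"
    "ASSISTANT: (model's predictions in JSON format)"
)


def build_prompts(note: str, entity_names: list) -> dict:
    """Hypothetical helper: one filled prompt per entity type for a single note."""
    return {
        name: ner_template.format(input_text=note, entity_name=name)
        for name in entity_names
    }


# Made-up note and entity types, for illustration only.
prompts = build_prompts(
    note="Jane Doe was seen today for asthma.",
    entity_names=["person", "diagnosis"],
)
```

With two entity types, `prompts` contains two separate prompts for the same note, so the number of model calls grows with both the number of notes and the number of entity types.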

## 2. GenerateSynthea: Generating Synthetic Patient Data using Synthea 

This loads every Synthea configuration defined for the experiment, runs each one through the pipeline, and saves the data to the ./example_output/experiment_name folder.

In [None]:
config_handler.run_component_experiment_config(component_type="synthea")

## 3. GenerateLLM: Generating Synthetic Patient Medical Notes 

This loads every generate configuration defined for the experiment, runs each one through the pipeline, and saves the data to the ./example_output/experiment_name folder.

In [None]:
config_handler.run_component_experiment_config(component_type="generate")

## 4. Extraction: Re-extracting Entities from the Patient Medical Notes

This loads every extraction configuration defined for the experiment, runs each one through the pipeline, and saves the data to the ./example_output/experiment_name folder.

In [None]:
config_handler.run_component_experiment_config(component_type="extraction")

## 5. Visualising the Experiment Workflow

This method on the config handler lets you inspect the experiment's workflow, showing which configuration feeds into which output.

In [None]:
config_handler.load_pipeline_visualisation()

## Reload the Data

Using the workflow above, you can specify by name which output file you would like to reload into the notebook.

In [None]:
config_handler.load_specified_data_file(filename="extraction_0")