# Parameter Experimentation with LastMile AI


In this notebook, we'll demonstrate how to run and evaluate experiments with the parameters of your RAG application (e.g., model, temperature, chunk size, k). We will use the RAG Debugger UI to visualize and analyze the results.

## Notebook Outline
* [Introduction](#intro)
* [Step 1: Install and Setup](#step1)
* [Step 2: Rewrite a news article subject line with an LLM](#step2)
* [Step 3: Run experiment #1](#step3)
* [Step 4: Run experiment #2](#step4)
* [Step 5: View Evaluation Results](#step5)

<a name="intro"></a>

## Introduction
**Parameters**, such as model, temperature, chunk size, or k, define the behavior of your RAG system. These parameters serve as the adjustable knobs that you can tune and test to optimize your RAG system's performance. In RAG Workbench, the parameter set used for each evaluation run is clearly displayed. This allows for easy comparison of different evaluation runs with varying parameter sets, enabling you to identify the optimal set of parameters for your RAG system.
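Conceptually, comparing two evaluation runs comes down to spotting which parameters differ between their parameter sets. The sketch below is a hypothetical helper, `diff_param_sets`, written only for illustration; it is not part of the LastMile SDK. The sample values mirror the `paramSet` fields reported for the two experiments later in this notebook.

```python
# Hypothetical helper (not part of the LastMile SDK): report which
# parameters differ between the parameter sets of two evaluation runs.
from typing import Any, Dict, Tuple


def diff_param_sets(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {param: (value_in_a, value_in_b)} for every parameter that differs.

    A parameter missing from one set is reported with None on that side.
    """
    changed = {}
    for key in sorted(a.keys() | b.keys()):
        if a.get(key) != b.get(key):
            changed[key] = (a.get(key), b.get(key))
    return changed


# Parameter sets matching the two experiments run later in this notebook
run_1 = {"model_name": "gpt-3.5-turbo", "query_temperature": 0.75}
run_2 = {"model_name": "gemini-pro", "query_temperature": 1.0}

print(diff_param_sets(run_1, run_2))
# → {'model_name': ('gpt-3.5-turbo', 'gemini-pro'), 'query_temperature': (0.75, 1.0)}
```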

In this example, we demonstrate how to define a parameter (e.g. model) for a simple non-RAG app and explore the impact of different parameter values through experimentation and evaluation.


<a name="step1"></a>

## Step 1: Install and Setup

Before we begin, we need to install the following packages:

```python
!pip install openai
!pip install -q -U google-generativeai
!pip install lastmile-eval --upgrade
!pip install -q python-dotenv  # needed for the .env fallback below
```

Import the necessary libraries.

```python
import google.generativeai as genai
from openai import OpenAI
from lastmile_eval.rag.debugger.api.evaluation import run_and_evaluate
```

We also need the following API tokens/keys:

* **LastMile AI API Token:** Go to the [LastMile Settings page](https://lastmileai.dev/settings?page=tokens). You will first need to create a LastMile AI account.
* **OpenAI API Key:** Go to the [OpenAI API Keys page](https://platform.openai.com/account/api-keys) to create and access your OpenAI API Key.
* **Google Gemini API Key:** Go to the [Google AI Studio API Keys page](https://aistudio.google.com/app/apikey) to create and access your Google API Key.

Set the keys either in **Google Colab Secrets** or in a `.env` file in your working directory, then run the code cell below. Avoid hard-coding keys in the notebook.

```python
import os

try:
    # Read secrets from Google Colab
    from google.colab import userdata
    os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
    os.environ['LASTMILE_API_TOKEN'] = userdata.get('LASTMILE_API_TOKEN')
    os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')
except ModuleNotFoundError:
    # Otherwise, load environment variables from a local .env file
    import dotenv
    dotenv.load_dotenv()

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
```
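If a key is missing, the cell above or a later one fails with a bare `KeyError` on the first lookup (for example at `os.environ['GOOGLE_API_KEY']`). A minimal sketch of an up-front check that lists every missing key at once is shown below; `REQUIRED_KEYS`, `missing_keys`, and `require_keys` are hypothetical names written for this example, not part of any library.

```python
import os
from typing import Iterable, List, Mapping

# Keys this notebook reads from the environment
REQUIRED_KEYS = ("OPENAI_API_KEY", "LASTMILE_API_TOKEN", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str], names: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the names that are absent from `env` or set to an empty string."""
    return [name for name in names if not env.get(name)]


def require_keys(env: Mapping[str, str] = os.environ) -> None:
    """Raise a single, readable error listing every missing key."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")


# Example with a hand-made mapping instead of the real environment
print(missing_keys({"OPENAI_API_KEY": "placeholder", "GOOGLE_API_KEY": ""}))
# → ['LASTMILE_API_TOKEN', 'GOOGLE_API_KEY']
```

In the notebook, you would call `require_keys(os.environ)` right after loading the secrets.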

<a name="step2"></a>
## Step 2: Rewrite a News Article Subject Line with an LLM

First, define a function `alter_messaging` that takes a `model_name`, a temperature `temp`, and the `original_text`, and generates a new subject line.

We will use the LastMile AI Tracing SDK to set up tracing and register the model name and temperature as the parameters we want to track.

```python
from lastmile_eval.rag.debugger.tracing import get_lastmile_tracer
from lastmile_eval.rag.debugger.api import LastMileTracer


# The project name is used to organize your traces, evaluation sets, test sets, and metrics
projectName = "Change Subject Lines"

# Instantiate the LastMile tracer
tracer: LastMileTracer = get_lastmile_tracer(
    tracer_name="messaging-generator",
    project_name=projectName,
    lastmile_api_token=os.environ['LASTMILE_API_TOKEN'],
)

# Wrapper for Gemini Pro that takes a prompt and a temperature setting
def wrapper_gemini_pro(prompt, temp):
    model = genai.GenerativeModel('gemini-pro')

    # Register the temperature as a tracked parameter
    tracer.register_query_temperature(temp)
    response = model.generate_content(prompt, generation_config={"temperature": temp})
    return response.text

# Wrapper for OpenAI GPT-3.5 Turbo that takes a prompt and a temperature setting
def wrapper_gpt(prompt, temp):
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

    # Register the temperature as a tracked parameter
    tracer.register_query_temperature(temp)
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        model="gpt-3.5-turbo",
    )
    return response.choices[0].message.content

# Trace each call to alter_messaging as a span
@tracer.start_as_current_span("alter_messaging")
def alter_messaging(model_name, temp, original_text):
    # Prompt asking the model to rewrite the subject line
    prompt = f"Take the following text and change the text, while maintaining the same narrative: {original_text}"

    # Register model_name as a tracked parameter
    tracer.register_param("model_name", model_name)

    if model_name == "gemini-pro":
        return wrapper_gemini_pro(prompt, temp)
    elif model_name == "gpt-3.5-turbo":
        return wrapper_gpt(prompt, temp)
    else:
        raise ValueError(f"Unsupported model: {model_name}")
```
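The `if`/`elif` chain in `alter_messaging` can also be written as a lookup table, so supporting another model means adding one dictionary entry. The sketch below shows that variant with hypothetical stub wrappers (`fake_gpt`, `fake_gemini`) in place of the real API calls, so the dispatch logic can be exercised offline. It deliberately omits the tracer decorator and the `register_param` call, which you would keep in the real notebook.

```python
from typing import Callable, Dict


# Hypothetical stubs standing in for wrapper_gpt / wrapper_gemini_pro,
# so the dispatch logic runs without API keys or network calls
def fake_gpt(prompt: str, temp: float) -> str:
    return f"[gpt-3.5-turbo @ {temp}] {prompt}"


def fake_gemini(prompt: str, temp: float) -> str:
    return f"[gemini-pro @ {temp}] {prompt}"


# Map each supported model name to the function that calls it
MODEL_WRAPPERS: Dict[str, Callable[[str, float], str]] = {
    "gpt-3.5-turbo": fake_gpt,
    "gemini-pro": fake_gemini,
}


def alter_messaging_offline(model_name: str, temp: float, original_text: str) -> str:
    # Same prompt text as alter_messaging above
    prompt = (
        "Take the following text and change the text, "
        f"while maintaining the same narrative: {original_text}"
    )
    try:
        wrapper = MODEL_WRAPPERS[model_name]
    except KeyError:
        raise ValueError(f"Unsupported model: {model_name}") from None
    return wrapper(prompt, temp)


print(alter_messaging_offline("gemini-pro", 1.0, "Celtics win title"))
```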


<a name="step3"></a>

## Step 3: Run Experiment #1

Our first experiment generates new subject lines with `gpt-3.5-turbo` at a temperature of 0.75 and computes a semantic similarity score for each one (i.e., how similar the rewritten text is to the original text). We will use the LastMile similarity evaluator.
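To build intuition for what a similarity score between 0 and 1 measures, here is a deliberately simple stand-in: cosine similarity between word-count vectors. This measure is lexical, not semantic, and it is **not** how LastMile's similarity evaluator computes its score; it only illustrates the idea of scoring a rewrite against its original. The function name `bow_cosine` is hypothetical, and the rewritten headline in the example is made up for illustration, not model output.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Toy lexical similarity: cosine between word-count vectors (0.0 to 1.0)."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    # Empty input on either side has no words to compare
    return dot / norm if norm else 0.0


original = "Heat Indexes Approach 100 Degrees as Temperatures Rise Across the U.S."
rewrite = "Temperatures climb across the U.S. as heat indexes near 100 degrees"  # made-up rewrite

print(round(bow_cosine(original, rewrite), 2))
# → 0.83
```

A lexical score like this penalizes synonyms ("climb" vs. "rise") that a semantic evaluator would treat as equivalent, which is one reason to prefer a semantic evaluator for paraphrasing tasks.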

```python
# Original subject lines (news headlines) to rewrite
original_texts = [
    "Heat Indexes Approach 100 Degrees as Temperatures Rise Across the U.S.",
    "Boston Celtics capture historic 18th NBA title with 106-88 Game 5 victory over Dallas Mavericks",
    "New high-speed sleeper train service connects Hong Kong with Beijing and Shanghai",
]

# Rewrite each subject line with gpt-3.5-turbo (temperature 0.75) and score
# each rewrite against its original with the similarity evaluator
run_and_evaluate(
    run_query_fn=lambda x: alter_messaging("gpt-3.5-turbo", 0.75, x),
    project_name=projectName,
    inputs=original_texts,
    ground_truths=original_texts,
    evaluators={"similarity"},
)
```

```text
CreateEvaluationResponse(evaluation_result_id='clxl124cl00bopbs9k74uoqbr', example_set_id='clxl11whs00b8pbs9beeuvzfx', success=True, message='{"id":"clxl124cl00bopbs9k74uoqbr","createdAt":"2024-06-18T23:20:31.365Z","updatedAt":"2024-06-18T23:20:31.365Z","name":"pseudoastringent-caltraps-474","paramSet":{"model_name":"gpt-3.5-turbo","query_temperature":0.75},"testSetId":"clxl11whs00b8pbs9beeuvzfx","creatorId":"clg70hw9q0004pk7kl9fo1u93","projectId":"clxl0z6gv00h0qjr8tsuizfdx","organizationId":null,"visibility":"MEMBER","metadata":null,"active":true}', df_metrics_example_level=
                exampleSetId                  exampleId  metricName  value
0  clxl11whs00b8pbs9beeuvzfx  clxl11wib00bapbs9i09k0esl  similarity    0.8
1  clxl11whs00b8pbs9beeuvzfx  clxl11wib00bbpbs95mtjc33i  similarity    0.8
2  clxl11whs00b8pbs9beeuvzfx  clxl11wib00bcpbs9fvhsjg0p  similarity    0.8, df_metrics_aggregated=
                exampleSetId        metricName     value
0  clxl11whs00b8pbs9beeuvzfx   similar
```

This creates an **Evaluation Run** with the inputs, outputs, and evaluation metrics for Experiment #1, which we can view in the RAG Debugger UI.

<a name="step4"></a>

## Step 4: Run Experiment #2

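Our second experiment generates new subject lines with `gemini-pro` at a temperature of 1.0 (Experiment #1 used 0.75), so both the model and the temperature change between the two runs.

As before, we evaluate the outputs with the LastMile similarity evaluator.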

```python
# Rewrite each subject line with gemini-pro (temperature 1.0) and score
# each rewrite against its original with the similarity evaluator
run_and_evaluate(
    run_query_fn=lambda x: alter_messaging("gemini-pro", 1.0, x),
    project_name=projectName,
    inputs=original_texts,
    ground_truths=original_texts,
    evaluators={"similarity"},
)
```

```text
CreateEvaluationResponse(evaluation_result_id='clxl12w0y00brpedxuwpgcty1', example_set_id='clxl12og600czqu1kh4zsi4xo', success=True, message='{"id":"clxl12w0y00brpedxuwpgcty1","createdAt":"2024-06-18T23:21:07.233Z","updatedAt":"2024-06-18T23:21:07.233Z","name":"alloplasmic-ratooners-988","paramSet":{"model_name":"gemini-pro","query_temperature":1},"testSetId":"clxl12og600czqu1kh4zsi4xo","creatorId":"clg70hw9q0004pk7kl9fo1u93","projectId":"clxl0z6gv00h0qjr8tsuizfdx","organizationId":null,"visibility":"MEMBER","metadata":null,"active":true}', df_metrics_example_level=
                exampleSetId                  exampleId  metricName  value
0  clxl12og600czqu1kh4zsi4xo  clxl12ogp00d0qu1kotohvz1s  similarity   0.85
1  clxl12og600czqu1kh4zsi4xo  clxl12ogp00d1qu1ke9f2hd0t  similarity   0.90
2  clxl12og600czqu1kh4zsi4xo  clxl12ogp00d2qu1kxv2w1upb  similarity   0.80, df_metrics_aggregated=
                exampleSetId        metricName     value
0  clxl12og600czqu1kh4zsi4xo   similarity_mean
```
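The aggregated row is cut off in both evaluation outputs. Assuming the aggregate is the arithmetic mean of the example-level scores (the `similarity_mean` metric name suggests this, but it is an assumption), the per-run means can be recomputed from the example-level values shown in the two outputs:

```python
from statistics import mean

# Example-level similarity scores copied from the two evaluation outputs
scores = {
    "gpt-3.5-turbo @ 0.75": [0.8, 0.8, 0.8],
    "gemini-pro @ 1.0": [0.85, 0.90, 0.80],
}

for run, values in scores.items():
    print(f"{run}: mean similarity = {mean(values):.3f}")
# → gpt-3.5-turbo @ 0.75: mean similarity = 0.800
# → gemini-pro @ 1.0: mean similarity = 0.850
```

With only three examples per run, and with both the model and the temperature changed between runs, a 0.05 difference in means is weak evidence on its own; changing one parameter at a time makes it easier to attribute a difference to a specific parameter.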

<a name="step5"></a>

## Step 5: View Evaluation Results

We can view the results in the RAG Workbench UI. From your terminal, export your `LASTMILE_API_TOKEN`:

```bash
export LASTMILE_API_TOKEN="<your-api-token>"
```

Next, run the following command in your terminal to launch the UI:

```bash
rag-debug launch
```

The 'Evaluation Console' is the landing page of RAG Workbench. Here you can see all your Evaluation Sets (including the one we just made).

1. In the top-right corner, switch to your project (the `projectName` we set in Step 2).

2. Select the two Evaluation Runs.

3. Click **Visualize Correlation**.

<img width="1800" alt="Visualize-Evaluation-Runs" src="https://github.com/lastmile-ai/eval-cookbook/assets/129882602/2f9a0e6c-edb5-4177-9002-2af72274fb3a">

We can now see the results of both experiments side by side. Because we logged the parameters `model_name` and temperature (shown as `query_temperature`), it is easy to see what changed between the two runs. You can log as many parameters as you need (e.g., model name, temperature, chunk size, k).
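To sweep more than two configurations, you could generate the query functions in a loop instead of copying the experiment cell. One pitfall: a `lambda` created inside a loop or comprehension looks up the loop variables when it is called, not when it is created (Python's late binding), so every lambda would end up using the last configuration. The sketch below avoids this with a small factory function. `grid`, `make_query_fn`, and `alter_messaging_stub` are hypothetical names; the stub stands in for `alter_messaging` so the code runs offline.

```python
import itertools
from typing import Callable, Dict, List, Tuple

# Hypothetical search space over the two parameters used in this notebook
grid = {
    "model_name": ["gpt-3.5-turbo", "gemini-pro"],
    "temp": [0.0, 0.75, 1.0],
}


def alter_messaging_stub(model_name: str, temp: float, original_text: str) -> str:
    # Offline stand-in for alter_messaging: echoes the parameters it received
    return f"{model_name}|{temp}|{original_text}"


def make_query_fn(model_name: str, temp: float) -> Callable[[str], str]:
    # Each call creates a new closure over its own arguments, so the
    # configuration is fixed at creation time rather than looked up later
    def query_fn(original_text: str) -> str:
        return alter_messaging_stub(model_name, temp, original_text)

    return query_fn


# One (params, query_fn) pair per combination in the grid
configs: List[Tuple[Dict[str, object], Callable[[str], str]]] = [
    ({"model_name": m, "temp": t}, make_query_fn(m, t))
    for m, t in itertools.product(grid["model_name"], grid["temp"])
]

for params, query_fn in configs:
    print(params, "->", query_fn("Boston Celtics capture historic 18th NBA title"))
```

In the notebook, you would replace the stub with `alter_messaging` and pass each `query_fn` to `run_and_evaluate(run_query_fn=query_fn, project_name=projectName, inputs=original_texts, ground_truths=original_texts, evaluators={"similarity"})`, producing one evaluation run per configuration. Parameters such as chunk size or k would need matching arguments in your own query function.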