# Getting started

## Before you start

### Custom Dataset
If you want to run prompt optimization on your own dataset, follow these steps:

1. Create a folder.
1. In that folder, create a file named "prompts.txt" containing 8-12 initial prompts from which the optimization can start. Separate the prompts with line breaks.
1. In another folder, create two .txt files holding your data points: the dev set "dev.txt" and the test set "test.txt". Encode the classes in these files as integers.
    Make sure to separate each input from its expected output with a tab!
1. Create a "description.json" file that contains a dictionary specifying:
    - "seed": the folder that contains the dev and test files
    - "init_prompts": the name of the .txt file containing the prompts
    - "description": a short description of your task, which is fed to the meta-LLM to optimize the prompts.
      (TIP: Include "The class mentioned first in the response of the LLM will be the prediction." in the description if this is how you evaluate the model's responses.)
    - "classes": a list of the names of the classes you are trying to predict

You can find examples of this setup in our repo at `data_sets/`.
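The steps above can be sketched as a small helper that writes the files for you. This is a minimal sketch under assumptions that the text does not pin down, so check them against the examples in `data_sets/`: the seed folder (here the hypothetical name `seed0`) is a subfolder of the task folder, each integer label is the index of the class in `"classes"`, and the data files have no header row. `write_custom_dataset` is a hypothetical helper, not part of promptolution.

```python
import json
import tempfile
from pathlib import Path


def _one_line(text):
    # Collapse tabs, newlines and repeated spaces so each entry stays on one
    # line and the tab remains the only input/label separator.
    return " ".join(str(text).split())


def write_custom_dataset(root, prompts, dev_rows, test_rows, description, classes, seed="seed0"):
    """Create prompts.txt, <seed>/dev.txt, <seed>/test.txt and description.json under `root`.

    Assumptions of this sketch: the seed folder is a subfolder of `root`,
    "seed" and "init_prompts" are names relative to `root`, labels are stored
    as their index in `classes`, and the data files have no header row.
    """
    root = Path(root)
    seed_dir = root / seed
    seed_dir.mkdir(parents=True, exist_ok=True)

    # One prompt per line.
    prompt_lines = [_one_line(p) for p in prompts]
    (root / "prompts.txt").write_text("\n".join(prompt_lines) + "\n", encoding="utf-8")

    # Each data line: <input> TAB <integer label>.
    for filename, rows in (("dev.txt", dev_rows), ("test.txt", test_rows)):
        lines = [f"{_one_line(text)}\t{classes.index(label)}" for text, label in rows]
        (seed_dir / filename).write_text("\n".join(lines) + "\n", encoding="utf-8")

    meta = {
        "seed": seed,
        "init_prompts": "prompts.txt",
        "description": description,
        "classes": list(classes),
    }
    (root / "description.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return root


# Usage with hypothetical example data, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = write_custom_dataset(
        Path(tmp) / "my_task",
        prompts=[
            "Classify the sentence as objective or subjective.",
            "Is the following sentence objective or subjective?",
        ],
        dev_rows=[("The meeting starts at noon.", "objective")],
        test_rows=[("This film is a masterpiece.", "subjective")],
        description=(
            "Classify sentences as objective or subjective. "
            "The class mentioned first in the response of the LLM will be the prediction."
        ),
        classes=["objective", "subjective"],
    )
    print(sorted(str(p.relative_to(task_dir)) for p in task_dir.rglob("*")))
```

Storing labels as list indices keeps the integer encoding and the `"classes"` list in sync automatically; an unknown label raises a `ValueError` instead of silently producing a wrong integer.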

## Installs

In [1]:
# ! pip install promptolution

## Imports

In [2]:
from promptolution.helpers import run_experiment
from promptolution.utils.prompt_creation import create_prompts_from_samples
from promptolution.config import Config
from promptolution.tasks import ClassificationTask
from promptolution.llms.api_llm import APILLM
import pandas as pd


## Set up LLMs, predictor, tasks and optimizer

In [3]:
with open("../deepinfratoken.txt", "r") as f:
    token = f.read().strip()  # strip the trailing newline so it is not sent as part of the token
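If you prefer not to keep the token in a file next to the notebook, a small loader can check an environment variable first and fall back to the file. This is a sketch: `load_token` is a hypothetical helper and `DEEPINFRA_API_KEY` is a variable name chosen for illustration, not one promptolution is known to read on its own.

```python
import os
from pathlib import Path


def load_token(path="../deepinfratoken.txt", env_var="DEEPINFRA_API_KEY"):
    """Return the API token from `env_var` if set, otherwise from the file at `path`.

    DEEPINFRA_API_KEY is a hypothetical name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    token_file = Path(path)
    if token_file.is_file():
        # strip() drops the trailing newline that editors usually add.
        token = token_file.read_text(encoding="utf-8").strip()
        if token:
            return token
    raise RuntimeError(f"No API token found in ${env_var} or in {token_file}")
```

With this in place, `token = load_token()` can stand in for reading the file directly; a missing token then fails early with a clear message instead of as an authentication error during the experiment.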

In [4]:
llm = APILLM("meta-llama/Meta-Llama-3-8B-Instruct", token=token)

In [5]:
dataset = pd.read_csv("hf://datasets/tasksource/subjectivity/train.csv")
dataset = dataset.rename(columns={"Sentence": "x", "Label": "y"})
dataset["y"] = dataset["y"].replace({"OBJ": "objective", "SUBJ": "subjective"})  # only touch the label column

dataset_description = (
    "The dataset contains sentences labeled as either subjective or objective. "
    "The task is to classify each sentence as either subjective or objective. "
    "The class mentioned first in the response of the LLM will be the prediction."
)
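To see what the cell above does to the data without downloading it, the same renaming and label mapping can be tried on a couple of made-up rows. The two sentences are invented for illustration; the target columns `x` (input text) and `y` (label) are taken from the cell above and are presumably what `ClassificationTask.from_dataframe` reads. This sketch uses `.map` on the `y` column, which turns any label missing from the dictionary into NaN, whereas `replace` would pass it through unchanged.

```python
import pandas as pd

# Hypothetical rows shaped like the subjectivity CSV: a "Sentence" and an OBJ/SUBJ "Label".
raw = pd.DataFrame(
    {
        "Sentence": ["The bill passed with 52 votes.", "What a disastrous decision this was."],
        "Label": ["OBJ", "SUBJ"],
    }
)

# Rename to x (input text) and y (label), then spell the labels out in words.
toy = raw.rename(columns={"Sentence": "x", "Label": "y"})
toy["y"] = toy["y"].map({"OBJ": "objective", "SUBJ": "subjective"})

# map() yields NaN for any label missing from the dictionary, so this catches surprises.
assert toy["y"].notna().all(), "unexpected label in the data"
print(toy)
```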

In [6]:
task = ClassificationTask.from_dataframe(dataset, dataset_description)

In [7]:
init_prompts = []
for _ in range(12):
    # Generate 12 initial prompts, each from 3 samples of the task data.
    prompt = create_prompts_from_samples(task, llm, n_samples=3)
    init_prompts.append(prompt)
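Generating prompts costs LLM calls, so it can be handy to store `init_prompts` in the one-prompt-per-line `prompts.txt` format from "Custom Dataset" and reload them later. `save_prompts` and `load_prompts` are hypothetical helpers for this sketch, not promptolution functions. Generated prompts can contain line breaks, so whitespace is collapsed before writing; otherwise one prompt would be read back as several.

```python
from pathlib import Path


def save_prompts(prompts, path="prompts.txt"):
    """Write one prompt per line, collapsing any internal whitespace (incl. newlines)."""
    lines = [" ".join(p.split()) for p in prompts]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_prompts(path="prompts.txt"):
    """Read prompts back, skipping blank lines (e.g. the trailing newline)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `save_prompts(init_prompts)` followed later by `init_prompts = load_prompts()` skips the generation step on a rerun.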

In [8]:
init_prompts

['Classify the given text as either an objective or subjective statement based on the tone and language used: e.g. the tone and language used should indicate whether the statement is a neutral, factual summary (objective) or an expression of opinion or emotional tone (subjective). Include the output classes "objective" or "subjective" in the prompt.',
 'What kind of statement is the following text: [Insert text here]? Is it <objective_statement> or <subjective_statement>?',
 'Identify whether a sentence is objective or subjective by analyzing the tone, language, and underlying perspective. Consider the emotion, opinion, and bias present in the sentence. Are the authors presenting objective facts or expressing a personal point of view? The output will be either "objective" (output class: objective) or "subjective" (output class: subjective).',
 'Classify the following sentences as either objective or subjective, indicating the name of the output classes: [input sentence]. Output classes

In [9]:
config = Config(
    task_name="subj",
    dataset=dataset,
    dataset_description=dataset_description,
    init_prompts=init_prompts,
    n_steps=8,
    optimizer="evopromptga",
    meta_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    evaluation_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    downstream_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    api_token=token,
    prepend_exemplars=True,
    exemplar_selector="random",
    n_exemplars=3,
)

In [11]:
df = run_experiment(config)

EX Cannot connect to host stage.api.deepinfra.com:443 ssl:default [getaddrinfo failed]

ClientConnectorDNSError: Cannot connect to host stage.api.deepinfra.com:443 ssl:default [getaddrinfo failed]
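The error above comes from `getaddrinfo`, i.e. the host name of the API could not be resolved, which points to a network or DNS problem rather than a problem with the config. A quick pre-flight check can surface this before an experiment starts. `can_resolve` is a hypothetical helper built only on the standard library's `socket.getaddrinfo`.

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` can be resolved, i.e. the getaddrinfo lookup succeeds."""
    try:
        socket.getaddrinfo(host, port)
    except (OSError, UnicodeError):
        # socket.gaierror (a subclass of OSError) means DNS resolution failed;
        # UnicodeError is raised for malformed host names.
        return False
    return True
```

Calling `can_resolve("stage.api.deepinfra.com")` before `run_experiment(config)` tells you whether the host from the log is reachable by name from your machine. A `True` result only means the name resolves; it does not show that the API accepts your token.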

In [21]:
df

| index | prompt | score |
|---|---|---|
| 0 | Identify the objective or subjective nature of... | 0.75 |
| 8 | To generate a better prompt, I will follow the... | 0.75 |
| 1 | Identify the text's underlying tone and stance... | 0.7 |
| 3 | Delineate the tone and perspective in the text... | 0.7 |
| 5 | Analyze the author's intention and emotional u... | 0.7 |
| 6 | `<prompt name="outputs":["objective", "objectiv...` | 0.7 |
| 9 | Let's follow the instructions step-by-step to ... | 0.7 |
| 10 | Investigate and categorize the text's tone and... | 0.7 |
| 2 | Determine the underlying tone and intention of... | 0.65 |
| 4 | Analyze the written statement to determine if ... | 0.6 |
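Once `df` holds the scored prompts, picking one to use is a small pandas step. The sketch below uses a made-up three-row frame instead of the real results; only the column names `prompt` and `score` are taken from the output above. Since several prompts in the table share the top score, it also collects all tied prompts.

```python
import pandas as pd

# Hypothetical results with the same "prompt" and "score" columns as `df` above.
results = pd.DataFrame(
    {
        "prompt": ["prompt A", "prompt B", "prompt C"],
        "score": [0.70, 0.75, 0.75],
    }
)

# idxmax() returns the label of the FIRST row with the maximum score,
# so ties are broken by row order.
best_prompt = results.loc[results["score"].idxmax(), "prompt"]

# All prompts that reach the top score, in case you want to inspect the ties.
top_prompts = results.loc[results["score"] == results["score"].max(), "prompt"].tolist()

print(best_prompt)
print(top_prompts)
```

With ties this common on a small dev set, comparing the tied prompts on held-out data is usually more informative than relying on row order.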


In [22]:
df["prompt"].tolist()

['Identify the objective or subjective nature of each sentence in the provided text, taking into account tone, language, and intended purpose, and then simplify the text while maintaining its essence without altering its original meaning.',
 "To generate a better prompt, I will follow the instructions step-by-step.\n\n**Step 1: Crossover the prompts**\n\nThe crossover of the two prompts is:\n\n* Determine whether the text is an objective or subjective statement, and identify if it mentions a specific financial demand or threatened action, without altering its meaning.\n* Analyze the intended stance of the text (factual declaration or personal reflection) and identify if it's objective (neutral and factual) or subjective (expresses personal opinion or bias)\n\nThis crossover combines the relevant information from both prompts.",
 "Identify the text's underlying tone and stance, distinguishing between objective, subjective, or a combination, to create a clear understanding of the author'