# Getting started

## Before you start

### Custom Dataset
If you want to run prompt optimization on your own dataset, follow these steps:

1. Create a folder.
1. Create a .txt file named "prompts.txt" in the folder. It should contain 8-12 initial prompts that serve as the starting point for the optimization. Separate the prompts with line breaks.
1. Create two .txt files in another folder: "dev.txt" for the dev set and "test.txt" for the test set of your data points. Convert the classes in your files to integers.
    Make sure to separate the input from the expected output with a tab!
1. Create a description.json file that contains a dictionary specifying:
    - "seed": the folder that contains the dev and test files
    - "init_prompts": the name of the .txt file that contains the prompts
    - "description": a short description of your task, which is passed to the meta-LLM so it can optimize the prompts.
    (TIP: Include "The class mentioned first in the response of the LLM will be the prediction." in the description if this is how you evaluate the model's responses)
    - "classes": a list of the names of the classes you are trying to predict
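
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of promptolution: the task, the class names, the prompts, the examples, and the folder names `my_task` and `seed42` are placeholders, and it assumes that "seed" holds a folder name relative to the folder containing description.json and that prompts.txt holds one prompt per line.

```python
import json
import tempfile
from pathlib import Path

# Placeholder task: class names, prompts, and examples are made up for illustration.
classes = ["negative", "positive"]
class_to_int = {name: index for index, name in enumerate(classes)}

initial_prompts = [
    "Classify the sentiment of the following review as negative or positive.",
    "Is the review below negative or positive?",
    # ... extend this list to 8-12 prompts
]
dev_examples = [
    ("Great film, would watch again.", "positive"),
    ("A dull, lifeless plot.", "negative"),
]
test_examples = [
    ("Two hours I will never get back.", "negative"),
    ("Charming from start to finish.", "positive"),
]


def write_split(path: Path, examples: list) -> None:
    """Write one 'input<TAB>integer label' pair per line."""
    lines = [f"{text}\t{class_to_int[label]}" for text, label in examples]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# A temporary directory keeps the sketch free of side effects; use a real path in practice.
task_dir = Path(tempfile.mkdtemp()) / "my_task"
seed_dir = task_dir / "seed42"  # assumed name for the folder with the dev and test files
seed_dir.mkdir(parents=True)

(task_dir / "prompts.txt").write_text("\n".join(initial_prompts) + "\n", encoding="utf-8")
write_split(seed_dir / "dev.txt", dev_examples)
write_split(seed_dir / "test.txt", test_examples)

description = {
    "seed": seed_dir.name,  # assumption: folder name relative to description.json
    "init_prompts": "prompts.txt",
    "description": (
        "Sentiment classification of reviews. The class mentioned first in "
        "the response of the LLM will be the prediction."
    ),
    "classes": classes,
}
(task_dir / "description.json").write_text(json.dumps(description, indent=2), encoding="utf-8")
```

Inputs that themselves contain tabs or line breaks would break this one-pair-per-line format, so they should be cleaned before writing.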

You can find examples of how this needs to be set up in our repo at `data_sets/`.
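
The evaluation rule quoted in the tip ("The class mentioned first in the response of the LLM will be the prediction.") can be illustrated with a small helper. This is a hedged sketch, not promptolution's actual implementation: the function name `first_mentioned_class` is hypothetical, and it uses plain case-insensitive substring matching, which may differ from what the library does.

```python
from typing import Optional, Sequence


def first_mentioned_class(response: str, classes: Sequence[str]) -> Optional[int]:
    """Return the index of the class name that appears earliest in the response.

    Returns None if no class name occurs in the response at all.
    """
    text = response.lower()
    best_index, best_pos = None, len(text)
    for index, name in enumerate(classes):
        pos = text.find(name.lower())  # -1 if the class name is absent
        if pos != -1 and pos < best_pos:
            best_index, best_pos = index, pos
    return best_index


classes = ["World", "Sports", "Tech", "Business"]
response = 'I would classify the topic of this news as "Sports".'
print(first_mentioned_class(response, classes))  # prints 1
```

Because only the earliest mention counts, a response that names one class and then discusses others in its explanation still maps to the class named first.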

## Installs

In [1]:
! pip install promptolution

## Imports

In [1]:
from promptolution.helpers import run_experiment
from promptolution.config import Config


## Set up LLMs, predictor, tasks and optimizer

In [2]:
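# Read the API token from a local file; strip any trailing newline and close the file afterwards
with open("../deepinfratoken.txt", "r") as f:
    token = f.read().strip()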

In [12]:
config = Config(
    task_name="agnews",
    ds_path="../data_sets/cls/agnews/",
    n_steps=8,
    optimizer="evopromptde",
    meta_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    evaluation_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    downstream_llm="meta-llama/Meta-Llama-3-8B-Instruct",
    api_token=token,
    prepend_examplars=True,
    exemplar_selector="random_search",
)

In [13]:
df = run_experiment(config)

In [14]:
df

| Unnamed: 0 | prompt | score |
| --- | --- | --- |
| 3 | You will be required to classify a news articl... | 0.7 |
| 2 | Categorize the news article into one of four c... | 0.58 |
| 4 | Your task is to identify the primary topic of ... | 0.5 |
| 1 | Based on the main theme of given the news arti... | 0.36 |
| 0 | Classify the topic of the following news as "W... | 0.3 |


In [15]:
# Print each prompt in full together with its score
for i in range(len(df)):
    print(df.loc[i, "score"])
    print(df.loc[i, "prompt"])
    print("======")

0.3
Classify the topic of the following news as "World", "Sports", "Tech" or "Business".
Inter Conquer Brussels Inter emerged as the winners over Anderlecht in Brussels to claim the first spot in the Champions League Group G. The nerazzurri took the lead nearly immediately when Adriano from the left served a low powerful 
I would classify the topic of this news as "Sports".
Cellphone That Detects Bad Breath Siemens Mobile, the German telecommunications company, has announced that it is working on a mobile phone that makes users aware when they have bad breath.
I would classify the topic of this news as "Tech". The article is about a new mobile phone feature being developed by a telecommunications company, which falls under the category of technology news.
Moody #39;s raises Dell #39;s senior unsecured debt rating NEW YORK, Aug 18 - Moody #39;s Investors Service said on Wednesday it raised the senior unsecured debt rating of Dell Inc. (DELL.O: Quote, Profile, Research) with a stable out

In [11]:
df

| Unnamed: 0 | prompt | score |
| --- | --- | --- |
| 0 | Classify the news story into one of the follow... | 0.95 |
| 7 | You will be required to classify a news articl... | 0.9 |
| 11 | Classify the topic of the following news as "W... | 0.9 |
| 12 | Classify news articles into categories (World,... | 0.9 |
| 4 | Classify the given news article into one of th... | 0.85 |
| 6 | Your job is to determine whether a news articl... | 0.85 |
| 13 | Categorize the provided news article according... | 0.85 |
| 1 | Categorize the news article into one of four c... | 0.8 |
| 2 | Your responsibility is to accurately categoriz... | 0.8 |
| 3 | Identify the primary theme of a news article a... | 0.8 |
