In [1]:
import pickle
from tqdm.notebook import tqdm
from conversation import create_coder, create_reviewer, create_refiner, start_conversation

# RL Project - What Is the Best Prompt for Iterating on Code Generation?
Our RL project investigates which prompts work best for iterating on code generated
by LLMs.

## How it works
We create a conversation with 3 participants: Coder, Reviewer, and Refiner.
Each participant is responsible for sending a prompt to the LLM and adding the LLM's response to the
environment (here called the 'conversation').
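
As a minimal sketch, the shared conversation can be pictured as a list of role/content messages (the save loop later in this notebook reads `message['role']` and `message['content']` from `environment.messages`). The `Conversation` class and the `fake_llm` function below are hypothetical stand-ins, not the project's `Environment` or a real LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical stand-in for the shared environment."""
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned reply."""
    return f"<LLM reply to: {prompt[:30]}>"


conversation = Conversation()
for role, prompt in [
    ("Coder", "Write code to clean the CSV."),
    ("Reviewer", "Review the code above."),
    ("Refiner", "Apply the review to the code."),
]:
    # Each participant sends its prompt and stores the LLM's reply.
    conversation.add(role, fake_llm(prompt))

print([m["role"] for m in conversation.messages])
```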

### Coder
The Coder is responsible for writing the code.  
It sends the initial prompt, describing the problem to be solved. All of our problems are
cleaning tasks on a CSV dataset.  
The Coder takes part in the conversation only once (at the start), so we do not define it as an RL agent.
Instead, it is programmed to iterate over all of its prompts n times, and we evaluate the results
of the conversations for each initial prompt afterwards.
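
The iteration over initial prompts might look like the sketch below. Everything here is illustrative: `coder_prompts`, `N_REPEATS`, and `run_conversation` are hypothetical, and the grades are made-up placeholder values rather than results.

```python
from collections import defaultdict
from statistics import mean

coder_prompts = ["short prompt", "detailed prompt"]  # hypothetical prompts
N_REPEATS = 3  # the "n" in the text; value chosen for illustration


def run_conversation(prompt: str, repeat: int) -> float:
    """Placeholder for a full conversation; returns a made-up final grade."""
    return 50.0 + 10.0 * len(prompt.split()) + repeat


grades_by_prompt = defaultdict(list)
for prompt in coder_prompts:
    for repeat in range(N_REPEATS):
        grades_by_prompt[prompt].append(run_conversation(prompt, repeat))

# Compare the initial prompts afterwards by their mean final grade.
mean_grades = {p: mean(g) for p, g in grades_by_prompt.items()}
print(mean_grades)
```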

### Reviewer
The Reviewer is responsible for evaluating the code generated by the LLM.  
It sends a prompt asking for a review of the code generated by the LLM. It can request this review
whenever newly generated code does not score above the terminal grade.  
The Reviewer is an RL agent, and its goal is to maximize the grade of the code generated by the LLM.

### Refiner
The Refiner is responsible for refining the code generated by the LLM.  
It sends a prompt asking for an improvement of the code generated by the LLM. It can request this improvement
after every review from the Reviewer.  
The Refiner is an RL agent, and its goal is to maximize the grade of the code generated by the LLM.
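
Both the Reviewer and the Refiner are described as RL agents that pick prompts to maximize the grade. The commented-out imports later in this notebook mention an `EpsilonGreedyPolicy`; the sketch below shows one way such an agent could choose among its prompts, treating each prompt as a bandit arm. It is not the project's implementation: the class name, the default `epsilon`, and the use of the grade change as reward are assumptions.

```python
import random


class EpsilonGreedyPromptAgent:
    """Hypothetical agent: each prompt is one arm of a multi-armed bandit."""

    def __init__(self, prompts, epsilon=0.1, seed=0):
        self.prompts = list(prompts)
        self.epsilon = epsilon
        self.counts = [0] * len(self.prompts)
        self.values = [0.0] * len(self.prompts)  # running mean reward per prompt
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.prompts))
        return max(range(len(self.prompts)), key=lambda i: self.values[i])

    def reward(self, action: int, value: float) -> None:
        # Incremental update of the mean reward for the chosen prompt.
        self.counts[action] += 1
        self.values[action] += (value - self.values[action]) / self.counts[action]


agent = EpsilonGreedyPromptAgent(["thorough review", "concise review"])
agent.reward(0, 5.0)   # e.g. the grade rose by 5 points after prompt 0
agent.reward(1, 12.0)  # and by 12 points after prompt 1
agent.reward(1, 8.0)
print(agent.values)
```

With `epsilon` set to 0 the agent always picks the prompt with the highest estimated reward; a larger `epsilon` trades some of that for exploration.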

### Prompt
For each participant, we generated prompts ranging from 1 to $n$ on the **scales** of the following **properties**:
- Clarity;
- Length;
- Specificity; and
- Complexity.

This yielded up to 20 different prompts per participant. To reduce the action space,
we opted for a simpler strategy:

- For each prompt (**scale length** x **number of properties**) of the **Coder**;
    - For each **property** of the **Reviewer**;
        - For each **property** of the **Refiner**;
            - We generate $m$ conversations in which:
                1. The Coder sends the prompt and adds the initial code;
                2. The code is evaluated (if the grade is not terminal, the conversation proceeds);
                3. The Reviewer chooses one of the $n$ prompts of the **property** and adds the review;
                4. The Refiner chooses one of the $n$ prompts of the **property** and adds the improvement;
                5. If the conversation length is not terminal, go back to step 2.
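
The five steps of a single conversation can be sketched as follows. This is an illustration under stated assumptions: `evaluate` is a placeholder grader that simply rewards each refinement, a random choice stands in for the RL agents' prompt selection, and `MAX_TURNS = 4` is an assumed conversation length.

```python
import random

TERMINAL_GRADE = 95  # threshold for ending the conversation early
MAX_TURNS = 4        # assumed maximum conversation length


def evaluate(code: str) -> float:
    """Placeholder grader; a real run would ask an LLM for the grade."""
    return min(100.0, 60.0 + 15.0 * code.count("refined"))


def run_conversation(coder_prompt, reviewer_prompts, refiner_prompts, rng):
    messages = [("Coder", coder_prompt)]
    code = "initial code"                       # step 1: initial code
    grade = None
    for _ in range(MAX_TURNS):
        grade = evaluate(code)                  # step 2: grade the code
        if grade > TERMINAL_GRADE:
            break
        # Steps 3 and 4: random choice stands in for the agents' policies.
        messages.append(("Reviewer", rng.choice(reviewer_prompts)))
        messages.append(("Refiner", rng.choice(refiner_prompts)))
        code += " refined"
        # Step 5: the loop returns to step 2 until MAX_TURNS is reached.
    return messages, grade


messages, grade = run_conversation(
    "Clean the dataset.",
    ["thorough review", "concise review"],
    ["apply all changes", "apply changes carefully"],
    random.Random(0),
)
print(len(messages), grade)
```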

### Code Evaluation
The code is evaluated by an LLM using the `instructor` library. We ask for the code to receive a grade
from 0 to 100 for its correctness and readability, along with a short explanation of the grade
(this comment is later added to the conversation).  
If the average grade is above 95, the conversation ends, since we consider the code good
enough.
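
The termination rule can be illustrated with a small sketch. The `Evaluation` dataclass and its field names are assumptions made for this example; the validation errors in the output further down suggest the project's actual `CodeEvaluation` model uses boolean fields such as `is_code_functional`, so this only illustrates the averaging rule described here.

```python
from dataclasses import dataclass

TERMINAL_GRADE = 95


@dataclass
class Evaluation:
    """Hypothetical shape of one evaluation; field names are assumptions."""
    correctness: int   # 0 to 100
    readability: int   # 0 to 100
    explanation: str

    def __post_init__(self):
        for name in ("correctness", "readability"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")

    @property
    def mean_grade(self) -> float:
        return (self.correctness + self.readability) / 2


def is_terminal(evaluation: Evaluation) -> bool:
    # The conversation ends only when the mean grade is strictly above 95.
    return evaluation.mean_grade > TERMINAL_GRADE


good = Evaluation(98, 94, "Works and reads well.")
borderline = Evaluation(95, 95, "Solid, minor naming issues.")
print(is_terminal(good), is_terminal(borderline))
```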

In [2]:
prompts_coder = [
    {   "index": 0,
        "prompt": "I have a dirty dataset. Develop a code to clean it."
    },
    {   "index": 1,
        "prompt": "I will provide a dataset with some issues that need to be addressed. You should import the CSV file located at 'csv_data/imdb_sample_10.csv' and develop a well structured code to perform the necessary corrections.",
    },
    {   "index": 2,
        "prompt": "I have a dataset that requires cleaning. The file path is 'csv_data/imdb_sample_10.csv'. Please ensure the following tasks are carried out: 1. Import the CSV file; 2. check for and address the following issues: missing data, duplicate record, formatting inconsistencies, typographical errors, data encoding issues, etc; 3. Save the cleaned dataset in a file called 'processed_imdb_sample_10.csv' and return it. Ensure that the code is well structured, efficient, and utilizes popular Python libraries such as pandas and numpy. The code needs to work perfectly when I run it with my database in the specified path.",
    },
    {   "index": 3,
        "prompt": "You will be provided with a dataset contained in a CSV file located at 'csv_data/imdb_sample_10.csv'. Your task is to import this CSV file and develop a well-structured Python script to address the issues within the dataset. To achieve this, follow these steps: 1. Import the necessary libraries (e.g., pandas, numpy); 2. Load the CSV file into a DataFrame; 3. Identify and document the issues present in the dataset (e.g., missing values, incorrect data types, duplicate entries); 4. Develop code to correct each identified issue systematically; 5. Ensure that your script is modular, separating the correction steps into different functions where appropriate; 6. Validate the corrections made to ensure data integrity; 7. Please complete the task by following the steps outlined above and provide comments within your code to explain each correction made."
    }
]

reviewer_questions = {
    "Question 1": [
        "You will assume the role of a code reviewer who will receive a script designed to clean a dataset and will need to evaluate the quality of this code. Focus on answering the following questions: Is the code functional? Are there any structural or syntax errors that would prevent its execution? Is the code well-documented? Is the code applicable to the dataset it is intended to clean? Be thorough in your review, imagining that you are responsible for a project and are guiding a programmer on your team.",
        "You will assume the role of a code reviewer who will receive a script designed to clean a dataset and will need to evaluate the quality of this code. Be concise in your evaluation, keeping it brief and condensed as much as possible."
    ],
    "Question 2": [
        "You will assume the role of a code reviewer who will receive a script designed to clean a dataset and will need to evaluate the quality of this code. Analyze the code and verify if it is consistent, functional, applicable to the dataset, and well-structured. Condense the possible areas for improvement and provide your feedback at the end.",
        "You will assume the role of a code reviewer who will receive a script designed to clean a dataset and will need to evaluate the quality of this code. Evaluate the following criteria and suggest possible improvements: 1. Is the code functional? 2. Is the code concise? 3. Is the code easy to interpret? 4. Is the code well-documented? 5. Is the path to the CSV file correct? (The correct path should be 'csv_data/imdb_sample_10.csv') 6. Is the returned code continuous, meaning it is not separated into multiple cells? 7. Is the code saving the modified CSV? 8. Do the columns referenced in the CSV actually exist?"
    ]
}

refiner_questions = {
    "Question 1": [
        "You will assume the role of a code refiner, receiving a script and a list of recommended corrections to be made in it. Ensure that all suggested changes are applied in a coherent way, preserving the functionality of the original code.",
        "You will assume the role of a code refiner, receiving a script and a list of recommended corrections to be made. The core of the script is the cleaning of a dirty dataset, and implementing the modifications is crucial to ensure that all steps, from exporting the CSV to returning the well modified CSV, are carried out correctly. Therefore, make sure that all suggested changes are applied coherently, preserving the functionality of the original code."
    ],
    "Question 2": [
        "You will assume the role of a code refiner, receiving a script and a list of recommended corrections to be made in it. Ensure that all suggested changes are applied in a coherent way, preserving the functionality of the original code.",
        ""
    ]
}           

In [3]:
# from rl.code_evaluator import CodeEvaluator
# coder = create_coder(prompts_coder)
# coder_selected_prompt = prompts_coder[0]
# reviewer = create_reviewer(reviewer_questions["Question 1"])
# refiner = create_refiner(refiner_questions["Question 1"])
# evaluator = CodeEvaluator(environment=None, prompt="Evaluate the code quality", name="Code Evaluator")

# MAX_TURNS = 3
# TOT_CONVERSATIONS = 1


In [4]:
# from tqdm.notebook import trange, tqdm
# from coder import Coder
# from rl.llm_agent import LLMAgent
# from rl.environment import Environment
# from rl.code_evaluator import CodeEvaluator, CSV_PATH
# from rl.policies import EpsilonGreedyPolicy
# from rl.utils import compute_delta_grade, is_terminate_grade

# with open(CSV_PATH, "r") as f:
#     csv_data = f.read()

In [5]:
# global evaluator, csv_data
# environment = Environment()

# # Set the environment for all agents
# for agent in [coder, reviewer, refiner, evaluator]:
#     agent.environment = environment

# # Adding csv in the first coder prompt
# coder_prompt_dict = coder_selected_prompt.copy()
# coder_prompt_dict["prompt"] += f"\n\nFile content:\n\n{csv_data}"
# coder.add_message(coder_prompt_dict)

# # Start the conversation
# last_grade = None
# for turn in tqdm(range(MAX_TURNS), desc="Conv. turns", position=0, leave=False):
#     print(f"Turn {turn + 1}")
#     # Evaluates the code and status of the CSV
#     grade = evaluator.evaluate_code()
    
#     # Check if the score is enough to close
#     if is_terminate_grade(grade):
#         break
#     # If it is the first turn, reward the coder
#     elif last_grade is None:
#         coder.first_reward(grade)
#     # If it is not the first turn, reward refiner and reviewer    
#     else:
#         delta_grade = compute_delta_grade(last_grade, grade)
#         refiner.reward(delta_grade)
#         reviewer.reward(delta_grade)
    
#     # Add reviewer and refiner messages to the conversation
#     reviewer.add_message()
#     refiner.add_message()

#     last_grade = grade

# # Reward the Coder with the latest review
# coder.final_reward(last_grade)


In [None]:
from rl.code_evaluator import CodeEvaluator


MAX_TURNS = 4
TOT_CONVERSATIONS = 1
coder = create_coder(prompts_coder)

comb = 0
total_combs = len(prompts_coder) * len(reviewer_questions) * len(refiner_questions)
for i, coder_prompt_dict in enumerate(prompts_coder):
    for j, rev_prompts in enumerate(reviewer_questions.values()):
        for k, ref_prompts in enumerate(refiner_questions.values()):
            reviewer = create_reviewer(rev_prompts)
            refiner = create_refiner(ref_prompts)
            evaluator = CodeEvaluator(environment=None, prompt="Evaluate the code quality", name="Code Evaluator")
            
            comb += 1
            print(f"Combination {comb}/{total_combs}", end="\r")
            for l in range(TOT_CONVERSATIONS):
                try:
                    environment, rev_hist, ref_hist, last_grade = start_conversation(
                        coder, 
                        coder_prompt_dict, 
                        reviewer, 
                        refiner, 
                        MAX_TURNS
                    )
                    # Save the conversation
                    with open(f"conversations/conv_{i}_{j}_{k}_{l}.md", "w", encoding="utf-8") as file:
                        for message in environment.messages:
                            file.write(f"**{message['role']}**: {message['content']}\n")
                    with open("conversations/conversation_final_grades.txt", "a") as file:
                        file.write(f"Conversation {i}_{j}_{k}_{l}: {last_grade}\n")
                    
                    # Save the histories
                    with open(f"history/reviewer_{i}_{j}_{k}_{l}.txt", "w") as file:
                        file.write(str(rev_hist))
                    with open(f"history/refiner_{i}_{j}_{k}_{l}.txt", "w") as file:
                        file.write(str(ref_hist))
                except Exception as e:
                    print(f"Conversation Skipped: {e}")

with open(f"results/coder.txt", "w") as file:
    file.write("start rewards: "+str(coder.start_rewards))
    file.write("\n")
    file.write("end rewards: "+str(coder.end_rewards))


Combination 1/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 2/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 3/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 10 validation errors for CodeEvaluation
is_code_functional
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_consise
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_easily_readable
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_documented
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_csv_path_correct
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_all_grouped
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.p

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 5/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 6/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 7/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 8/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 9/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 10 validation errors for CodeEvaluation
is_code_functional
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_consise
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_easily_readable
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_documented
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_csv_path_correct
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_all_grouped
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.p

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Combination 11/16

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 10 validation errors for CodeEvaluation
is_code_functional
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_consise
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_easily_readable
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_documented
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_csv_path_correct
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_all_grouped
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.p

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 10 validation errors for CodeEvaluation
is_code_functional
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_consise
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_easily_readable
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_documented
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_csv_path_correct
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_all_grouped
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.p

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 10 validation errors for CodeEvaluation
is_code_functional
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_consise
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_easily_readable
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_documented
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_csv_path_correct
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_all_grouped
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.p

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 8 validation errors for CodeEvaluation
is_code_functional
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_type
is_code_consise
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_type
is_code_easily_readable
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_type
is_code_documented
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_type
is_code_all_grouped
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_type
i

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

Conversation Skipped: 10 validation errors for CodeEvaluation
is_code_functional
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_consise
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_easily_readable
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_documented
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_csv_path_correct
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
is_code_all_grouped
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.p

Conv. turns:   0%|          | 0/4 [00:00<?, ?it/s]

FileNotFoundError: [Errno 2] No such file or directory: 'models/coder.txt'

In [7]:
with open(f"results/coder.txt", "w") as file:
    file.write("start rewards: "+str(coder.start_rewards))
    file.write("\n")
    file.write("end rewards: "+str(coder.end_rewards))
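
The `FileNotFoundError` in the output above is what `open(..., "w")` raises when the parent directory does not exist. A minimal sketch of creating the directory before writing is shown below; `save_rewards` is a hypothetical helper, the reward lists are placeholder values, and a temporary directory stands in for the project folder.

```python
import tempfile
from pathlib import Path


def save_rewards(path, start_rewards, end_rewards):
    """Write the rewards to `path`, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as file:
        file.write(f"start rewards: {start_rewards}\n")
        file.write(f"end rewards: {end_rewards}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # "results" does not exist yet inside the temporary directory.
    out = save_rewards(Path(tmp) / "results" / "coder.txt", [70, 80], [90, 96])
    saved_text = out.read_text(encoding="utf-8")

print(saved_text)
```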