# Task 4: Prompt Testing and Ranking
# Goals
- Comprehensive Evaluation: Provide a robust system that uses various methodologies for a thorough assessment of prompts.
- Customizable and User-Centric: Allow users to choose or customize their preferred evaluation methods.
- Dynamic and Adaptive: Ensure the system remains flexible and adaptive, capable of incorporating new ranking methodologies as they emerge.

# Primary Methods

- Monte Carlo Matchmaking: This method selects which prompt candidates are matched against each other. Monte Carlo methods rely on repeated random sampling; by simulating many matchups, the system tests each prompt against a range of opponents and aims to maximize the information gained from each prompt battle.
- Elo Rating System: This system, commonly used in chess and other competitive games, rates the prompts based on their performance in the battles. Each prompt candidate is assigned a rating that reflects its success in previous matchups. The system takes into account not just the number of wins but also the ratings of the opponents faced, so beating a higher-rated prompt earns more points than beating a lower-rated one.
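
The two methods above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the implementation behind `elo_ratings_func`, whose internals are not shown in this notebook. The helper names (`expected_score`, `update_elo`, `play_round`, `toy_judge`), the K-factor of 32, and the random-pairing strategy are all assumptions made for the example. In particular, the matchmaking here is plain random pairing and does not try to optimize information gain.

```python
import random


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return the updated (rating_a, rating_b).

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k=32 is an assumed K-factor, not a value taken from the notebook.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


def play_round(prompts, ratings, judge, rng, k=32.0):
    """One round of random (Monte Carlo) matchmaking.

    Prompts are shuffled and paired off; with an odd count, one prompt sits out.
    `judge(a, b)` is a hypothetical callable returning A's score for the battle.
    """
    order = list(prompts)
    rng.shuffle(order)
    updated = dict(ratings)
    for a, b in zip(order[::2], order[1::2]):
        updated[a], updated[b] = update_elo(updated[a], updated[b], judge(a, b), k)
    return updated


rng = random.Random(0)  # seeded so repeated runs pair prompts the same way


def toy_judge(a, b):
    """Placeholder judge that picks an outcome at random; a real one would compare outputs."""
    return rng.choice([1.0, 0.5, 0.0])


prompts = ["prompt A", "prompt B", "prompt C", "prompt D"]
ratings = {p: 1500.0 for p in prompts}
for _ in range(10):
    ratings = play_round(prompts, ratings, toy_judge, rng)

for prompt in sorted(prompts, key=ratings.get, reverse=True):
    print(f"{prompt}: {ratings[prompt]:.1f}")
```

Because each update moves the same number of points from the loser to the winner, the ratings always sum to the starting total. That makes the final numbers comparable across prompts regardless of how many battles each one played.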

In [1]:
import os
import sys

# Add the parent directory to sys.path so the local `utility` package can be imported.
rpath = os.path.abspath('..')
if rpath not in sys.path:
    sys.path.insert(0, rpath)

from utility.testing_and_ranking import evaluate_prompt, elo_ratings_func


- Conduct multiple rounds of evaluation, updating the Elo ratings after each round.
- Sort the prompts by their final Elo ratings, then print the ranked prompts.




In [14]:
# Candidate prompts to rank against each other.
questions = [
    'What are the key performance indicators for understanding the challenge?',
    'What are some techniques to improve RAG in RAG?',
    'What are some key areas of knowledge acquisition for prompt engineering?',
    'Which companies are doing something similar to this project?',
    'What are the tasks involved in developing the prompt generation system?',
]

# Every prompt starts from the same baseline rating.
elo_ratings = {prompt: 1500 for prompt in questions}

# Run 10 rounds, carrying the updated ratings into the next round.
for _ in range(10):
    elo_ratings = elo_ratings_func(questions, elo_ratings)

# Rank the prompts from highest to lowest final rating.
sorted_prompts = sorted(questions, key=lambda x: elo_ratings[x], reverse=True)

for prompt in sorted_prompts:
    print(f"{prompt}: {elo_ratings[prompt]}")

What are some key areas of knowledge acquisition for prompt engineering?: 1548.7888289685131
Which companies are doing something similar to this project?: 1524.186472437636
What are some techniques to improve RAG in RAG?: 1523.9456928737486
What are the key performance indicators for understanding the challenge?: 1522.5243266611412
What are the tasks involved in developing the prompt generation system?: 1456.5759865967193


In [15]:
# Evaluate one main prompt against a set of test cases.
# Note: the main prompt also appears as the first test case.
main_prompt = 'What are the key performance indicators for understanding the challenge?'
test_cases = [
    'What are the key performance indicators for understanding the challenge?',
    'What are some techniques to improve RAG in RAG?',
    'What are some key areas of knowledge acquisition for prompt engineering?',
    'Which companies are doing something similar to this project?',
    'What are the tasks involved in developing the prompt generation system?',
]
result = evaluate_prompt(main_prompt, test_cases)
print(result)

{'main_prompt': {'Monte Carlo Evaluation': 1.97, 'Elo Rating Evaluation': 1504.2019499940866}, 'test_case_1': {'Monte Carlo Evaluation': 2.04, 'Elo Rating Evaluation': 1489.2019499940866}, 'test_case_2': {'Monte Carlo Evaluation': 2.0, 'Elo Rating Evaluation': 1519.2019499940866}, 'test_case_3': {'Monte Carlo Evaluation': 2.04, 'Elo Rating Evaluation': 1519.2019499940866}, 'test_case_4': {'Monte Carlo Evaluation': 1.87, 'Elo Rating Evaluation': 1504.2019499940866}, 'test_case_5': {'Monte Carlo Evaluation': 2.25, 'Elo Rating Evaluation': 1504.2019499940866}}
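
The printed dictionary is hard to scan, so a small helper can turn it into a ranked list. The sketch below copies the values from the output above; `rank_by` is a hypothetical helper, not part of `utility.testing_and_ranking`. It assumes that a higher score is better for both metrics, which the notebook does not state for the Monte Carlo evaluation.

```python
# Values copied verbatim from the printed `result` above.
result = {
    'main_prompt': {'Monte Carlo Evaluation': 1.97, 'Elo Rating Evaluation': 1504.2019499940866},
    'test_case_1': {'Monte Carlo Evaluation': 2.04, 'Elo Rating Evaluation': 1489.2019499940866},
    'test_case_2': {'Monte Carlo Evaluation': 2.0, 'Elo Rating Evaluation': 1519.2019499940866},
    'test_case_3': {'Monte Carlo Evaluation': 2.04, 'Elo Rating Evaluation': 1519.2019499940866},
    'test_case_4': {'Monte Carlo Evaluation': 1.87, 'Elo Rating Evaluation': 1504.2019499940866},
    'test_case_5': {'Monte Carlo Evaluation': 2.25, 'Elo Rating Evaluation': 1504.2019499940866},
}


def rank_by(result, metric):
    """Return (name, score) pairs sorted from highest to lowest score.

    Assumes higher is better; ties keep their original order because
    Python's sort is stable, including with reverse=True.
    """
    pairs = [(name, scores[metric]) for name, scores in result.items()]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


for metric in ('Elo Rating Evaluation', 'Monte Carlo Evaluation'):
    print(metric)
    for name, score in rank_by(result, metric):
        print(f"  {name:<12} {score:.2f}")
```

Several entries share the same Elo value, so their relative order in this listing reflects only their position in the dictionary, not a real difference in quality.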
