This library provides a collection of classes and functions for evaluating and comparing large language models (LLMs). Its main purpose is to build chatbots on top of different LLMs and to evaluate their responses against a stated objective. The main classes are:
- LanguageModelWrapper: a base class for wrapping different language models.
- Prompt: a class for managing prompt templates.
- BinaryPreference: a class for managing binary preferences between two responses.
- BinaryEvaluator: a base class for evaluating binary preferences between two responses.
- GPT35Evaluator: a class for evaluating binary preferences using GPT-3.5.
- OpenAIModel: an enumeration of the available OpenAI models.
- OpenAIGPTWrapper: a class for wrapping OpenAI's GPT models.
- ClaudeWrapper: a class for wrapping Anthropic's Claude model.
- CohereWrapper: a class for wrapping Cohere's models.
- ChatBot: a class for creating chatbot instances backed by a provided LLM wrapper (see the sketch after this list).
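ChatBot is described above but not exercised in the evaluation example below. Here is a minimal sketch of how it might be used, assuming it is constructed from one of the LLM wrappers and that the method for sending a user message is named chat() (that method name is an assumption; check the class definition for the actual interface):

import os
from dotenv import load_dotenv
import llm_eval

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

# Wrap an LLM and hand it to ChatBot. The chat() call below is assumed to
# send a single user message and return the assistant's reply.
gpt35_model = llm_eval.OpenAIGPTWrapper(openai_api_key)
bot = llm_eval.ChatBot(gpt35_model)
print(bot.chat("What should I pack for a spring trip to Tulsa?"))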
- Install python-dotenv (used to load API keys from a .env file):
  pip install python-dotenv
- Install the openai and cohere client libraries:
  pip install openai cohere
- Or install all dependencies at once from requirements.txt:
  pip install -r requirements.txt
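The examples below read their API keys from environment variables. One way to provide them (assuming a .env file in the directory you run the script from; the values shown are placeholders) is:

# .env
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
COHERE_API_KEY=your-cohere-key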
The example below sends the same travel-advice conversation to three different models and uses GPT35Evaluator to pick the preferred response from each pair:

import os
from dotenv import load_dotenv
import llm_eval

# Load API keys from the .env file.
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
cohere_api_key = os.getenv("COHERE_API_KEY")

# GPT-3.5 acts as the judge that picks the preferred response.
e = llm_eval.GPT35Evaluator(openai_api_key)

objective = "We're building a chatbot to discuss a user's travel preferences and provide advice."

# Example chat-opening messages from users.
travel_chat_starts = [
    "I'm planning to visit Tulsa in spring.",
    "I'm looking for the cheapest flight to Spain today."
]

# The models being compared.
cohere_model = llm_eval.CohereWrapper(cohere_api_key)
davinci3_model = llm_eval.OpenAIGPTWrapper(openai_api_key, model=llm_eval.OpenAIModel.DAVINCI3.value)
chatgpt35_model = llm_eval.OpenAIGPTWrapper(openai_api_key)

for tcs in travel_chat_starts:
    messages = [{"role": "system", "content": objective},
                {"role": "user", "content": tcs}]

    # Get a completion from each model for the same conversation.
    response_cohere = cohere_model.complete_chat(messages, "assistant")
    response_gpt35 = chatgpt35_model.complete_chat(messages, "assistant")
    response_davinci3 = davinci3_model.complete_chat(messages, "assistant")

    # Ask the evaluator which of the two responses better serves the objective.
    pref = e.choose(objective, tcs, response_cohere, response_gpt35)
    print(f"1: {response_cohere}")
    print(f"2: {response_gpt35}")
    print(f"Preferred Choice: {pref}")

    pref2 = e.choose(objective, tcs, response_gpt35, response_davinci3)
    print(f"1: {response_gpt35}")
    print(f"2: {response_davinci3}")
    print(f"Preferred Choice: {pref2}")