
Llama Index Evaluation Tool

This tool is designed to evaluate large language models (LLMs) and their embeddings, focusing on aspects such as faithfulness, relevancy, and correctness. It uses various components from the llama_index library, integrates OpenAI and Cohere APIs, and supports asynchronous operations for efficient processing.

Table of Contents

  • Installation
  • Usage
  • Features
  • Dependencies
  • Configuration
  • Troubleshooting

Installation

  1. Clone the repository:
git clone https://github.com/felipearosr/RAG-Evaluation
  2. Navigate to the project directory:
cd RAG-Evaluation
  3. Create a conda environment:
conda create -n rag-eval python==3.11 -y && source activate rag-eval

This creates a new Conda environment named rag-eval and activates it. Using a virtual environment is recommended to avoid conflicts with other Python projects or your system's Python.

  4. Install the required packages:
pip install -r requirements.txt

Usage

Running the Evaluation Script:

  1. Generate New Questions (Optional):

    If you need to generate new questions for evaluation, use the --generate flag. This step is optional if you have already generated questions stored in questions.json:

python eval.py --generate

This command generates new questions based on the documents provided and saves them to questions.json. If you skip this step, the script will use existing questions from the file.

  2. Run the Evaluation:

Once you have your questions ready (either newly generated or previously stored), run the script without the --generate flag to start the evaluation:

python eval.py

This command evaluates the prepared questions using the configured LLMs and embeddings, then outputs the results. A rough sketch of the underlying flow is shown below.
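For orientation only, here is a minimal sketch of how question generation and evaluation are typically wired together with llama_index. This is not the repository's eval.py: the data directory name is an assumption, exact import paths depend on the llama_index version pinned in requirements.txt, and it assumes OPENAI_API_KEY is already set so the default LLM and embedding model can be used.

import json

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import (
    DatasetGenerator,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)

# Load the source documents (the "data" folder name is an assumption).
documents = SimpleDirectoryReader("data").load_data()

# --generate path: create evaluation questions and persist them to questions.json.
generator = DatasetGenerator.from_documents(documents)
questions = generator.generate_questions_from_nodes()
with open("questions.json", "w") as f:
    json.dump(questions, f)

# Evaluation path: build an index, answer each question, and score the responses.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
faithfulness = FaithfulnessEvaluator()
relevancy = RelevancyEvaluator()

for question in questions:
    response = query_engine.query(question)
    f_result = faithfulness.evaluate_response(query=question, response=response)
    r_result = relevancy.evaluate_response(query=question, response=response)
    print(question, f_result.passing, r_result.passing)

CorrectnessEvaluator follows the same evaluate_response pattern but additionally expects a reference answer for each question.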

Features

  • Evaluation of LLMs based on faithfulness, relevancy, and correctness.
  • Integration with OpenAI and Cohere for embeddings and reranking.
  • Asynchronous question generation and evaluation.
  • Results output to CSV for analysis.
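The asynchronous evaluation and CSV export listed above could be combined roughly as follows. This is again a hedged sketch rather than the repository's code: it reuses the questions, query_engine, and evaluator objects from the previous example, and results.csv is an assumed output filename.

import asyncio

import pandas as pd


async def evaluate_all(questions, query_engine, faithfulness, relevancy):
    # Score every question concurrently using the evaluators' async API.
    async def evaluate_one(question):
        response = await query_engine.aquery(question)
        f_result = await faithfulness.aevaluate_response(query=question, response=response)
        r_result = await relevancy.aevaluate_response(query=question, response=response)
        return {
            "question": question,
            "faithfulness": f_result.passing,
            "relevancy": r_result.passing,
        }

    return await asyncio.gather(*(evaluate_one(q) for q in questions))


rows = asyncio.run(evaluate_all(questions, query_engine, faithfulness, relevancy))
pd.DataFrame(rows).to_csv("results.csv", index=False)  # assumed output filename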

Dependencies

  • Python 3.11 (the version used in the conda environment above)
  • openai
  • cohere (the Cohere Python SDK)
  • pandas
  • python-dotenv
  • asyncio and argparse (Python standard library)
  • llama_index

Configuration

Configure the tool by setting the following environment variables in your .env file; you can use .env.example as a template:

  • OPENAI_API_KEY: Your OpenAI API key.
  • COHERE_API_KEY: Your Cohere API key.
  • MODEL: The LLM model name (default: "gpt-4-0125-preview").
  • EMBEDDING: The embedding model name (default: "text-embedding-3-large").
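A minimal sketch of how these variables are typically loaded with python-dotenv; the names and defaults come from the list above, and the actual loading code in eval.py may differ:

import os

from dotenv import load_dotenv

# Read variables from .env into the process environment.
load_dotenv()

openai_api_key = os.environ["OPENAI_API_KEY"]      # required
cohere_api_key = os.environ["COHERE_API_KEY"]      # required for reranking
model = os.getenv("MODEL", "gpt-4-0125-preview")   # defaults from the list above
embedding = os.getenv("EMBEDDING", "text-embedding-3-large")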

Troubleshooting

If you encounter issues, ensure your API keys are set correctly and that you have the correct permissions for accessing the APIs.
