@andreacerasani (Contributor)

No description provided.

# RAG Evaluation

## Introduction
> **Reviewer comment:** You could also remove the small heading.


RAG (Retrieval-Augmented Generation) is a way of building AI models that enhances their ability to generate accurate and contextually relevant responses by combining two main steps: **retrieval** and **generation**.

> **Reviewer comment** (on the Retrieval step): I would add: "from a specific knowledge base defined by the system designer".

The RAG evaluation module analyzes the three main components of a RAG framework:
> **Reviewer comment:** It would be nice to add an example block or an additional column; your choice.

1. **Retrieval**: The model first searches through a large set of documents or pieces of information to "retrieve" the most relevant ones based on the user query.
2. **Generation**: It then uses these retrieved documents as context to generate a response, which is typically more accurate and aligned with the question than if it had generated text from scratch without specific guidance.

Evaluating RAG involves assessing how well the model does in both retrieval and generation.
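The two steps above can be sketched as a minimal pipeline. Everything here is illustrative: the tiny corpus, the keyword-overlap retriever, and the template "generator" are stand-ins for a real vector store and LLM, not the actual system.

```python
# Minimal sketch of the retrieval + generation steps of a RAG pipeline.
# Illustrative only: a real system would use embeddings and an LLM.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM call that answers using the retrieved context."""
    return f"Based on {len(context)} retrieved document(s): {context[0]}"

corpus = [
    "RAG combines retrieval and generation.",
    "Evaluation measures retrieval and generation quality.",
    "Bananas are yellow.",
]
context = retrieve("How is RAG evaluation performed?", corpus)
answer = generate("How is RAG evaluation performed?", context)
```

The retrieval step narrows the corpus down to the most relevant documents, and the generation step conditions its answer on them instead of answering from scratch.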
> **Reviewer comment:** I would add a short motivation for why it is important to perform the evaluation, and close the introduction after the table.

| Component | Description |
| --------- | ----------- |
| Context | The retrieved documents or information that the model uses to generate a response. |
| Response | The generated answer or output provided by the model. |
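As an illustration of these components, a single evaluation record might look like the following (the field names and values are hypothetical, not the Platform's actual schema):

```python
# Hypothetical example of the components evaluated for one interaction.
record = {
    "user_input": "What is the refund policy?",
    # Retrieved chunks, in retrieval order.
    "context": [
        "Refunds are accepted within 30 days of purchase.",
        "Shipping takes 3-5 business days.",
    ],
    # The model's generated answer.
    "response": "You can request a refund within 30 days of purchase.",
}
```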

In particular, the analysis is performed on the relationships between these components:
> **Reviewer comment:** From here I would start a new section that talks explicitly about our module.
>
> I would begin by saying that it is available only for our Tasks (with a link) of type RAG, and that it is structured as an evaluation report of a set of data.
>
> I would add an info block explaining that it can be done both from the web app and from the SDK.
>
> Finally, I would pick up again with the sentence from this comment ("In particular, the analysis...").
>
> Also something about the fact that the report can be viewed in the web app but can also be exported in Excel format.

*Figure: ML cube Platform RAG Evaluation*

The evaluation is performed through an LLM-as-a-Judge approach, where a Large Language Model (LLM) acts as a judge to evaluate the quality of a RAG model.
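A minimal sketch of the LLM-as-a-Judge idea: build an evaluation prompt and parse a numeric score out of the judge's reply. The prompt wording, the expected `Score: <n>` reply format, and the sample reply are all assumptions for illustration, not the Platform's actual implementation.

```python
import re

def build_judge_prompt(query: str, context: str, response: str) -> str:
    """Assemble an evaluation prompt for the judge model (illustrative wording)."""
    return (
        "You are a strict evaluator. Rate from 1 (worst) to 5 (best) how well "
        "the response answers the query given the context.\n"
        f"Query: {query}\nContext: {context}\nResponse: {response}\n"
        "Reply with 'Score: <n>'."
    )

def parse_score(judge_reply: str) -> int:
    """Extract the 1-5 score from the judge model's reply."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError(f"Unparsable judge reply: {judge_reply!r}")
    return int(match.group(1))

prompt = build_judge_prompt(
    "What is RAG?",
    "RAG combines retrieval and generation.",
    "RAG retrieves documents and uses them to generate answers.",
)
score = parse_score("Score: 4")  # stand-in for a real judge model's reply
```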
> **Reviewer comment:** We could omit this, so that we don't reveal too much about the approach we use.

| Metric | Description | Score Range (Lowest-Highest) |
| ------ | ----------- | ---------------------------- |
| Utilization | The percentage of the retrieved context that contains information for the response. A higher utilization score indicates that more of the retrieved context is useful for generating the response. | 0-100 |
| Attribution | Which of the chunks of the retrieved context can be used to generate the response. | List of indices of the used chunks; the first chunk has index 1 |
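The two metrics are related: given the 1-based attribution indices, a utilization percentage can be derived as the share of retrieved chunks that were actually used. This is a sketch of that relationship, not the Platform's exact formula.

```python
def utilization_from_attribution(attributed: list[int], n_chunks: int) -> float:
    """Percentage of retrieved chunks that contributed to the response.

    `attributed` holds 1-based indices of the used chunks, matching the
    Attribution metric's convention above.
    """
    if n_chunks == 0:
        return 0.0
    return 100.0 * len(set(attributed)) / n_chunks

# Example: chunks 1 and 3 out of 4 retrieved chunks were used.
util = utilization_from_attribution([1, 3], n_chunks=4)  # 50.0
```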

!!! example
> **Reviewer comment:** Rather than an example, this is a note block.

| Metric | Description | Score Range (Lowest-Highest) |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- |
| Satisfaction | How satisfied the user would be with the generated response. A low score indicates a response that does not address the user query, a high score indicates a response that fully addresses and answers the user query. | 1-5 |

> **Reviewer comment:** At the end of the section, I would include an example of a query.

@alelavml3 alelavml3 merged commit 22deb5c into dev Nov 20, 2024
@alelavml3 alelavml3 deleted the dev-rag-evaluation branch November 20, 2024 16:08
