Rag Evaluation #24
Conversation
Commits:

- …add rag evaluation to nav
- …eflect chosen palette
- …d table, add SDK example
# RAG Evaluation

## Introduction
> **Review comment:** You could also drop this heading.
RAG (Retrieval-Augmented Generation) is a way of building AI models that enhances their ability to generate accurate and contextually relevant responses by combining two main steps: **retrieval** and **generation**.
1. **Retrieval**: The model first searches through a large set of documents or pieces of information to "retrieve" the most relevant ones based on the user query.
> **Review comment:** I would add: "from a specific knowledge base defined by the system designer".
2. **Generation**: It then uses these retrieved documents as context to generate a response, which is typically more accurate and aligned with the question than if it had generated text from scratch without specific guidance.
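To make the two steps concrete, here is a minimal sketch of a retrieve-then-generate loop. It is an illustration only: `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM call, not part of any specific SDK.

```python
# A minimal sketch of the retrieve-then-generate loop described above.
# `embed` and `generate` are hypothetical stand-ins for an embedding
# model and an LLM call; they are not part of any specific SDK.
from typing import Callable, List, Sequence


def retrieve(
    query: str,
    documents: Sequence[str],
    embed: Callable[[str], List[float]],
    top_k: int = 3,
) -> List[str]:
    """Step 1 (Retrieval): rank documents by similarity to the query."""

    def dot(a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: dot(embed(d), query_vec), reverse=True)
    return ranked[:top_k]


def answer(
    query: str,
    documents: Sequence[str],
    embed: Callable[[str], List[float]],
    generate: Callable[[str], str],
) -> str:
    """Step 2 (Generation): condition the LLM on the retrieved context."""
    context = retrieve(query, documents, embed)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```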
Evaluating RAG involves assessing how well the model does in both retrieval and generation.

> **Review comment:** I would add a short motivation for why evaluation is important, and after the table I would close the introduction.

The RAG evaluation module analyzes the three main components of a RAG framework:

> **Review comment:** It would be nice to add an example block or an additional column here. Your choice.
| Component | Description |
| --------- | ----------- |
| Context   | The retrieved documents or information that the model uses to generate a response. |
| Response  | The generated answer or output provided by the model. |
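For illustration, a single evaluated interaction could be represented like this; the field names and values are hypothetical, not a prescribed schema.

```python
# Hypothetical example of the components for a single interaction;
# field names and values are illustrative, not a prescribed schema.
rag_sample = {
    "user_query": "What is the warranty period for model X?",
    "context": [  # retrieved chunks, in retrieval order
        "Model X ships with a 24-month limited warranty.",
        "Warranty claims require a proof of purchase.",
    ],
    "response": "Model X is covered by a 24-month limited warranty.",
}
```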
In particular, the analysis is performed on the relationships between these components:
> **Review comment:** From here I would start a new section that talks explicitly about our module. I would open by saying that it is available only for our Tasks (with a link) of RAG type, and that it is structured as an evaluation report over a set of data. I would add an info block explaining that it can be run both from the web app and from the SDK. Then I would pick the thread back up with the sentence from this comment ("In particular, the analysis…"). I would also add something about the report being viewable in the web app and exportable to Excel.
<figure>
  <figcaption>ML cube Platform RAG Evaluation</figcaption>
</figure>
The evaluation is performed through an LLM-as-a-Judge approach, where a Large Language Model (LLM) acts as a judge to evaluate the quality of a RAG model.
> **Review comment:** We could omit this, so as not to reveal too much about the approach we use.
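As a rough illustration of the LLM-as-a-Judge idea (not the platform's actual implementation), a judge model can be prompted to score a response on a fixed scale. In this sketch, `llm` is a hypothetical stand-in for any chat-completion call that returns a string.

```python
import json
from typing import Callable


def judge_satisfaction(llm: Callable[[str], str], query: str, response: str) -> int:
    """Ask a judge LLM for a 1-5 satisfaction rating of a generated response."""
    prompt = (
        "You are an impartial judge. Rate how satisfied a user would be with "
        "the response, from 1 (does not address the query) to 5 (fully "
        "answers the query).\n"
        f"Query: {query}\n"
        f"Response: {response}\n"
        'Reply with JSON only: {"satisfaction": <1-5>}'
    )
    # `llm` is a hypothetical stand-in for any chat-completion call
    # that returns the model's reply as a string.
    return int(json.loads(llm(prompt))["satisfaction"])
```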
| Metric      | Description | Score Range (Lowest-Highest) |
| ----------- | ----------- | ---------------------------- |
| Utilization | The percentage of the retrieved context that contains information for the response. A higher utilization score indicates that more of the retrieved context is useful for generating the response. | 0-100 |
| Attribution | Which of the chunks of the retrieved context can be used to generate the response. | List of indices of the used chunks; the first chunk has index 1 |
| !!! example |
> **Review comment:** This is a note block rather than an example.
| Metric       | Description | Score Range (Lowest-Highest) |
| ------------ | ----------- | ---------------------------- |
| Satisfaction | How satisfied the user would be with the generated response. A low score indicates a response that does not address the user query; a high score indicates a response that fully addresses and answers the user query. | 1-5 |
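Putting the metrics together, one evaluated sample might yield a record like the following; the values and the record layout are purely illustrative.

```python
# Illustrative result for one evaluated sample, combining the metrics above;
# the values and the record layout are hypothetical.
evaluation_result = {
    "satisfaction": 4,      # 1-5: the response largely answers the query
    "utilization": 62.5,    # 0-100: share of the retrieved context that was useful
    "attribution": [1, 3],  # 1-based indices of the chunks supporting the response
}
```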
> **Review comment:** At the end of the section, I would add an example of a query.