# Plan 🧭
##### Choose the most critical metric for RAG for my use case (banking chatbot)
1. relevancy
2. groundedness
...

## Relevancy evaluation

#### Questions
- What is relevancy in an LLM context?
- How to measure it?
- What are the common approaches?
- What are the common pitfalls?
- What are the best practices?
- How to implement it?
- How to validate it?
- How to benchmark it?
- How to improve it?

#### 1. Embedding-based relevancy evaluator

- Relevancy can be approximated by the cosine similarity between the embedding of the question and the embedding of the answer.
- Both embeddings should come from the same embedding model, for example one of OpenAI's embedding models.
- If the cosine similarity is above a chosen threshold, the answer is treated as relevant to the question. The threshold is not universal and needs to be tuned on labeled examples.
- Implementation steps:
    - Generate embeddings for the question and the answer using OpenAI's embedding model
    - Calculate the cosine similarity between the two embeddings
    - Compare the similarity against the threshold
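
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `embed` parameter is an assumed callable that wraps whichever embedding model is used, `toy_embed` is a hypothetical keyword-count stand-in so the example runs without an API call, and the default threshold of 0.5 is an arbitrary placeholder.

```python
import math
from typing import Callable, Sequence, Tuple

# Any function mapping text to a fixed-length vector. In practice this would
# wrap a call to an embedding model (assumption: not part of these notes).
Embedder = Callable[[str], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevancy(
    question: str,
    answer: str,
    embed: Embedder,
    threshold: float = 0.5,  # placeholder value, to be tuned on labeled data
) -> Tuple[float, bool]:
    """Return the similarity score and whether it reaches the threshold."""
    score = cosine_similarity(embed(question), embed(answer))
    return score, score >= threshold


def toy_embed(text: str) -> list:
    """Stand-in embedder for demonstration only: keyword counts, not a model."""
    vocabulary = ("card", "loan", "weather")
    lowered = text.lower()
    return [float(lowered.count(word)) for word in vocabulary]
```

Calling `relevancy(question, answer, toy_embed)` returns a `(score, is_relevant)` pair. Passing the embedder in as a parameter keeps the similarity logic testable without network access.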

#### 2. LLM-as-a-judge evaluator

- We can use a language model to evaluate the relevancy of the answer by prompting it with the question and the answer and asking it to rate the relevancy on a scale of 1 to 10.
- We can use the following steps to implement this:
    - Create a prompt that includes the question and the answer
    - Use the language model to generate a response to the prompt
    - Parse the response to extract the relevancy rating
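
A minimal sketch of these three steps might look like the following. The prompt wording, the function names, and the `llm` callable (prompt in, text out) are all assumptions made for illustration; any chat or completion client could be wrapped to fit that shape.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; adjust to the judge model in use.
RELEVANCY_PROMPT = (
    "Rate how relevant the answer is to the question on a scale of 1 to 10, "
    "where 1 is completely irrelevant and 10 is fully relevant.\n"
    "Reply with the integer only.\n\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rating:"
)


def build_relevancy_prompt(question: str, answer: str) -> str:
    """Step 1: create a prompt that includes the question and the answer."""
    return RELEVANCY_PROMPT.format(question=question, answer=answer)


def parse_rating(response: str) -> Optional[int]:
    """Step 3: extract the first standalone integer from 1 to 10, if any."""
    match = re.search(r"\b(10|[1-9])\b", response)
    return int(match.group(1)) if match else None


def judge_relevancy(
    question: str, answer: str, llm: Callable[[str], str]
) -> Optional[int]:
    """Run all three steps; returns None when no rating can be parsed."""
    prompt = build_relevancy_prompt(question, answer)
    response = llm(prompt)  # Step 2: the model generates a response
    return parse_rating(response)
```

Returning `None` for an unparseable response, rather than a default score, makes judge failures visible so they can be counted or retried separately.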

## Sources 📚
### Microsoft
- https://github.com/Azure/azure-sdk-for-python/tree/azure-ai-evaluation_1.0.0b5/sdk/evaluation/azure-ai-evaluation
- https://github.com/Azure/azure-sdk-for-python/blob/azure-ai-evaluation_1.0.0b5/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_relevance/relevance.prompty
- Azure AI search tailored relevance evaluator - https://github.com/Azure-Samples/azureai-samples/blob/main/scenarios/evaluate/Supported_Evaluation_Metrics/RAG_Evaluation/Optimize_RAG_with_Document_Retrieval_Evaluator.ipynb

### Ragas
- https://docs.ragas.io/en/stable/getstarted/rag_eval/#load-documents

## Groundedness

#### 1. Entity-based groundedness evaluator
- Groundedness can be approximated by checking whether the entities mentioned in the answer also appear in the context.
- A named entity recognition (NER) model extracts the entities from the answer.
- If the share of extracted entities found in the context is above a chosen threshold, the answer is treated as grounded. The threshold needs to be tuned on labeled examples.
- Implementation steps:
    - Use a NER model to extract entities from the answer
    - Check if the extracted entities are present in the context
    - Compare the share of matched entities against the threshold
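
The entity check can be sketched as below, under some loud assumptions: `extract_entities` is an assumed callable standing in for a real NER model, `naive_entities` is a hypothetical regex stand-in (acronyms and numbers only) so the example runs on its own, and matching is a case-insensitive whole-token search, which will miss paraphrases and reformatted values.

```python
import re
from typing import Callable, Iterable, Optional

# Any function returning entity strings for a text. A real NER model would be
# wrapped to fit this shape (assumption: not specified in these notes).
EntityExtractor = Callable[[str], Iterable[str]]


def naive_entities(text: str) -> list:
    """Stand-in extractor for demonstration only: acronyms and numbers."""
    return re.findall(r"\b(?:[A-Z]{2,}|\d+(?:\.\d+)?)\b", text)


def entity_in_context(entity: str, context: str) -> bool:
    """Case-insensitive match that does not fire inside a longer token."""
    pattern = r"(?<!\w)" + re.escape(entity) + r"(?!\w)"
    return re.search(pattern, context, flags=re.IGNORECASE) is not None


def entity_groundedness(
    answer: str, context: str, extract_entities: EntityExtractor
) -> Optional[float]:
    """Share of distinct answer entities found in the context.

    Returns None when the answer contains no entities, since there is
    nothing to check in that case.
    """
    entities = list(dict.fromkeys(e.lower() for e in extract_entities(answer)))
    if not entities:
        return None
    found = sum(1 for e in entities if entity_in_context(e, context))
    return found / len(entities)
```

The returned share can then be compared against the chosen threshold. The whole-token match avoids counting `25` as present when the context only contains `250`.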

#### 2. LLM-as-a-judge evaluator
- We can use a language model to evaluate the groundedness of the answer by prompting it with the context and the answer and asking it to rate the groundedness on a scale of 1 to 10.
- We can use the following steps to implement this:
    - Create a prompt that includes the context and the answer
    - Use the language model to generate a response to the prompt
    - Parse the response to extract the groundedness rating
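
One possible shape for this evaluator is sketched below. The prompt wording, the `Rating: <n>` output convention, and the `llm` callable (prompt in, text out) are assumptions chosen for illustration, not taken from any particular library.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt; it asks for a labeled final line so that free-text
# reasoning before the rating does not confuse the parser.
GROUNDEDNESS_PROMPT = (
    "You are given a context and an answer. Rate how well the answer is "
    "supported by the context on a scale of 1 to 10, where 1 means nothing "
    "in the answer is supported and 10 means every claim is supported.\n"
    "Finish your reply with a line of the form 'Rating: <n>'.\n\n"
    "Context:\n{context}\n\n"
    "Answer:\n{answer}\n"
)


def judge_groundedness(
    context: str, answer: str, llm: Callable[[str], str]
) -> Optional[int]:
    """Prompt the judge and parse its rating; None if no valid rating found."""
    prompt = GROUNDEDNESS_PROMPT.format(context=context, answer=answer)
    response = llm(prompt)
    # Accept only integers from 1 to 10 that follow the 'Rating:' label.
    match = re.search(r"Rating:\s*(10|[1-9])\b", response, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None
```

Requiring the `Rating:` label means numbers quoted from the context, such as amounts or dates, are not mistaken for the score.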