# Evaluation of Mistral Instruct using HuggingFace

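In this notebook, we evaluate the Mistral Instruct model inside a retrieval-augmented generation (RAG) pipeline, using the HuggingFace Hub to access the model and BeyondLLM to build and score the pipeline. The evaluation shows how well the retrieved context matches the question, and whether the generated answer is relevant to the question and grounded in that context.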

- **Objective:** Assess the Mistral Instruct model's performance using context relevancy, answer relevancy, and groundedness metrics.
- **Setup:**
  - Configure the retrieval pipeline for data processing.
  - Use HuggingFace and BeyondLLM for model integration.
- **Steps:**
  - Load and preprocess data.
  - Initialize the Mistral Instruct model and retrieval system.
  - Generate responses to sample queries.
  - Evaluate the responses using defined metrics.
- **Outcome:** A score for each metric, showing where the pipeline retrieves and answers well on the given data and where it needs improvement.


## Setup and Installation

First, let's install the necessary packages:

In [None]:
!pip install beyondllm==0.2.2
!pip install huggingface_hub==0.20.1
!pip install llama-index-embeddings-fastembed==0.1.3
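
Because the three packages are pinned to exact versions, it can help to confirm what is actually installed before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper name `check_versions` and the `PINNED` dictionary are illustrative and not part of BeyondLLM.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install cell above.
PINNED = {
    "beyondllm": "0.2.2",
    "huggingface_hub": "0.20.1",
    "llama-index-embeddings-fastembed": "0.1.3",
}


def check_versions(pinned):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == expected)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: installed={installed} pinned={PINNED[name]} [{status}]")
```

A mismatch does not necessarily break the notebook, but it is the first thing to check if an import or a call signature differs from what is shown here.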

Now, let's import the required libraries and set up our environment:

In [None]:
from beyondllm.source import fit
from beyondllm.embeddings import FastEmbedEmbeddings
from beyondllm.retrieve import auto_retriever
from beyondllm.llms import HuggingFaceHubModel
from beyondllm.generator import Generate
import os
from getpass import getpass

# Set up HuggingFace API token
os.environ['HUGGINGFACE_ACCESS_TOKEN'] = getpass("Enter your HF API token:")



Enter your HF API token:··········
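
When the notebook is re-run often, prompting for the token every time becomes tedious. A possible variation, sketched below, reuses a token that is already in the environment and prompts only when it is missing. The helper `ensure_hf_token` and its `prompt_fn` parameter are hypothetical names introduced for this example; only the `HUGGINGFACE_ACCESS_TOKEN` variable name comes from the cell above.

```python
import os
from getpass import getpass


def ensure_hf_token(env_var="HUGGINGFACE_ACCESS_TOKEN", prompt_fn=getpass):
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Nothing usable in the environment, so ask the user once.
        token = prompt_fn("Enter your HF API token:").strip()
        if not token:
            raise ValueError("An empty HuggingFace token was provided.")
        os.environ[env_var] = token
    return token


# In the notebook, call it once before creating the model:
# ensure_hf_token()
```

Passing the prompt function as a parameter keeps the helper easy to exercise without an interactive session.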


## Data Loading and Model Configuration

Let's load our data and configure the embedding model:

In [None]:
# Load the PDF data
data = fit("/content/Shivaya Resume.pdf", dtype="pdf")

# Initialize the embedding model
embed_model = FastEmbedEmbeddings()

# Set up the retriever
retriever = auto_retriever(data=data, embed_model=embed_model, type="normal", top_k=3)

# Initialize the Mistral Instruct model
llm = HuggingFaceHubModel(model="mistralai/Mistral-7B-Instruct-v0.2")


100%|██████████| 76.7M/76.7M [00:01<00:00, 68.0MiB/s]
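
To make the `top_k=3` setting more concrete, the sketch below shows the general idea behind similarity-based retrieval: score every chunk against the query embedding and keep the best `k`. This is a generic illustration with toy two-dimensional vectors, not BeyondLLM's implementation. How `auto_retriever` ranks chunks internally is an assumption here, and the function names are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk)
              for vec, chunk in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: four labelled chunks with made-up 2-D "embeddings".
chunks = ["projects", "education", "skills", "contact"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]

top = top_k_chunks([1.0, 0.1], chunk_vecs, chunks, k=3)
for score, chunk in top:
    print(f"{chunk}: {score:.3f}")
```

A larger `k` gives the model more context to draw on but also makes it more likely that weakly related chunks are included, which can lower a context relevancy score.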


## Generate Response and Evaluate

Now, let's generate a response to a question and evaluate the model's performance:


In [None]:
# Set up the generation pipeline
pipeline = Generate(question="What are her hobbies?", llm=llm, retriever=retriever)

# Generate the response
response = pipeline.call()
print("AI Response:")
print(response)

# Get evaluation metrics
eval_metrics = pipeline.get_rag_triad_evals()
print("\nEvaluation Metrics:")
print(eval_metrics)

AI Response:

        ANSWER: Based on the provided context, Shivaya Pandey has shown interest and expertise in various areas related to Computer Science, Data Science, and Artificial Intelligence. Some of the projects mentioned in her portfolio include a Heart Disease Prediction System, Sign Language Detector, Video Text Overlay Tool, and Gold Price Prediction Model. These projects suggest that she enjoys working with Python, machine learning, and developing innovative solutions. Additionally, she has experience in coding in Java, C, and C++. Furthermore, she has published tech articles on Medium and is certified in Python programming. Therefore, her hobbies can be inferred to be related to computer science, data science, artificial intelligence, and writing.
Executing RAG Triad Evaluations...

Evaluation Metrics:
Context relevancy Score: 3.5
This response does not meet the evaluation threshold. Consider refining the structure and content for better clarity and effectiveness.
Answer r
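
The metrics are printed as text, which makes runs hard to compare. The sketch below extracts numeric scores from such a report. It assumes every metric appears on its own line in the form `<name> Score: <number>`, as the context relevancy line above does; the format of the other metric lines is not visible in this output, so treat the pattern as an assumption. `parse_scores` is a helper invented for this example.

```python
import re

# Matches lines such as "Context relevancy Score: 3.5" (case-insensitive).
SCORE_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s+score\s*:\s*(?P<value>\d+(?:\.\d+)?)\s*$",
    re.IGNORECASE,
)


def parse_scores(report):
    """Return {metric name in lowercase: score} for every score line found."""
    scores = {}
    for line in report.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            name = match.group("name").strip().lower()
            scores[name] = float(match.group("value"))
    return scores


# The visible part of the output above, reproduced as a string.
sample = (
    "Context relevancy Score: 3.5\n"
    "This response does not meet the evaluation threshold. "
    "Consider refining the structure and content for better clarity and effectiveness.\n"
    "Answer r"
)
print(parse_scores(sample))
```

In the notebook, `parse_scores(eval_metrics)` could be applied to the value returned by `pipeline.get_rag_triad_evals()`, provided that value is a string in this format.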

## Conclusion

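In this notebook, we demonstrated how to evaluate the Mistral Instruct model using HuggingFace and BeyondLLM. We set up a retrieval pipeline, generated a response to a specific question, and requested the RAG triad metrics: context relevancy, answer relevancy, and groundedness. In the run shown, the context relevancy score was 3.5, below the evaluation threshold, and the model inferred the hobbies from the projects and skills in the retrieved context rather than from an explicit list.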

This approach allows us to assess the model's performance in understanding context, providing relevant answers, and ensuring the responses are grounded in the provided information. Such evaluation is crucial for improving and fine-tuning language models for specific tasks or domains.

Feel free to experiment with different questions, datasets, or model parameters to further explore the capabilities of the Mistral Instruct model.
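
One way to experiment with several questions is to wrap the generate-and-evaluate step in a small loop. The sketch below assumes only what the notebook already uses: a pipeline object with `call()` and `get_rag_triad_evals()` methods. The helper `evaluate_questions` and its `build_pipeline` argument are illustrative names, and the second question in the usage comment is made up.

```python
def evaluate_questions(questions, build_pipeline):
    """Run each question through a fresh pipeline and collect answer and metrics.

    build_pipeline is any callable that takes a question string and returns an
    object exposing call() and get_rag_triad_evals().
    """
    results = []
    for question in questions:
        pipeline = build_pipeline(question)
        results.append({
            "question": question,
            "response": pipeline.call(),
            "evals": pipeline.get_rag_triad_evals(),
        })
    return results


# In the notebook, reusing the llm and retriever created earlier:
# results = evaluate_questions(
#     ["What are her hobbies?", "Which projects are listed?"],
#     lambda q: Generate(question=q, llm=llm, retriever=retriever),
# )
# for row in results:
#     print(row["question"], row["evals"], sep="\n")
```

Each evaluation triggers additional model calls, so a short question list keeps the run time and API usage manageable.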