# Using Ragas to Evaluate a RAG Model

## _Why evaluate a RAG system?_

Evaluation tells you what actually works and what fails. For a RAG system you are not just testing an LLM; you are testing a whole pipeline:

- **Retriever quality:** are the relevant chunks returned?
- **Generator quality given the context:** is the answer correct and faithful to that context?
- **End-to-end user experience:** latency, cost, robustness, safety.

Without evaluation, you can end up fixing the wrong thing: tuning prompts when retrieval is the real problem, or vice versa.
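To make "retriever quality" and "faithful" concrete as numbers, here is a toy sketch in plain Python. It assumes you already know which chunk IDs are relevant to a question and, for faithfulness, that each claim in the answer has already been judged as supported or not by the retrieved context (Ragas delegates that judgement to an LLM). `recall_at_k`, `precision_at_k` and `faithfulness_ratio` are hypothetical helpers, not Ragas APIs, and the chunk IDs are made up.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical helpers for illustration only (not Ragas APIs).


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k retrieved chunks."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def precision_at_k(retrieved_ids: Sequence[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    top_k = list(retrieved_ids[:k])
    if not top_k:
        return 0.0
    return sum(1 for cid in top_k if cid in relevant_ids) / len(top_k)


def faithfulness_ratio(claim_supported: Sequence[bool]) -> float:
    """Share of the answer's claims that the retrieved context supports (0.0 if no claims)."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


# Toy example: 2 of the 3 relevant chunks land in the top 3 results.
retrieved = ["c7", "c2", "c9", "c4"]
relevant = {"c2", "c4", "c9"}
print(recall_at_k(retrieved, relevant, k=3))          # 0.6666666666666666
print(precision_at_k(retrieved, relevant, k=3))       # 0.6666666666666666
print(faithfulness_ratio([True, True, False, True]))  # 0.75
```

A low recall with a high faithfulness points at the retriever; the reverse points at the generator, which is exactly the distinction the paragraph above is about.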

- [x] Create a test set
- [ ] What does the test set need to contain?
- [ ] Why do we need a test set?
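On the two open questions above: a RAG test set is a list of samples, each pairing a question with what a correct system should produce. The sketch below assumes field names that mirror the columns of a Ragas-generated test set (`user_input`, `reference`, `reference_contexts`); the `EvalSample` dataclass itself is hypothetical, and the strings are placeholders. The references are what make the set useful: with a ground-truth answer and the context it came from, you can tell whether a failure started in retrieval or in generation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalSample:
    """One evaluation sample (hypothetical structure; field names mirror Ragas' test set columns)."""

    user_input: str  # the question a user would ask
    reference: str  # ground-truth answer
    reference_contexts: list[str] = field(default_factory=list)  # passages the answer is based on

    def is_complete(self) -> bool:
        # Usable only with a non-empty question, a non-empty answer and at least one context.
        return bool(self.user_input.strip() and self.reference.strip() and self.reference_contexts)


sample = EvalSample(
    user_input="<question about the video>",
    reference="<answer written from the transcript>",
    reference_contexts=["<transcript passage that supports the answer>"],
)
print(sample.is_complete())  # True
```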

### Getting the Video Transcript

```python
from youtube_transcript_api import YouTubeTranscriptApi
from langchain_text_splitters import RecursiveCharacterTextSplitter
```

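```python
ytt_api = YouTubeTranscriptApi()
video_id = "P14cRV-m6ZY"
# Languages are tried in order: English first, then Hindi
fetched_transcript = ytt_api.fetch(video_id, languages=["en", "hi"])

# Join the text of every caption snippet into one string
transcript = " ".join(snippet.text for snippet in fetched_transcript)
```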
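Caption snippets can contain line breaks and extra spaces, so a plain `" ".join(...)` may leave ragged whitespace in the transcript. A small helper, assuming only that each snippet exposes a `.text` attribute (as the objects yielded by `fetch` do); `clean_join` is a hypothetical name, and `SimpleNamespace` objects stand in for real snippets so the example runs offline.

```python
import re
from types import SimpleNamespace
from typing import Iterable, Protocol


class HasText(Protocol):
    text: str


def clean_join(snippets: Iterable[HasText]) -> str:
    """Join snippet texts, then collapse every run of whitespace (including newlines) to one space."""
    joined = " ".join(s.text for s in snippets)
    return re.sub(r"\s+", " ", joined).strip()


# Fake snippets standing in for the real FetchedTranscript contents
fake_snippets = [SimpleNamespace(text="hello\nworld "), SimpleNamespace(text="  this is\n a test")]
print(clean_join(fake_snippets))  # hello world this is a test
```

In the notebook this would replace the join line with `transcript = clean_join(fetched_transcript)`.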

```python
transcript
```

```python
def get_transcript(video_id):
    """Fetch a YouTube video's transcript (English, else Hindi) as a single string."""
    ytt_api = YouTubeTranscriptApi()
    fetched_transcript = ytt_api.fetch(video_id, languages=["en", "hi"])

    transcript = " ".join(snippet.text for snippet in fetched_transcript)
    return transcript
```
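The first cell imports `RecursiveCharacterTextSplitter`, but nothing uses it yet. A RAG retriever usually indexes the transcript as overlapping chunks rather than one long string. The sketch below is a deliberately simpler stand-in (fixed-size character windows with overlap), not LangChain's algorithm; `chunk_text` is a hypothetical helper.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters, overlapping by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The LangChain route is `RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_text(transcript)`, which prefers to break on separators (paragraphs, lines, spaces) instead of at fixed character offsets, so chunks are less likely to cut words in half.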

```python
from langchain_community.document_loaders import TextLoader

# Ragas works on LangChain `Document` objects: either build them from the transcript
# text directly or load them from a file. This notebook uses the file route, so the
# transcript is saved to disk first.
transcript_file = "./transcript.txt"
with open(transcript_file, "w", encoding="utf-8") as f:
    f.write(transcript)

loader = TextLoader(transcript_file, encoding="utf-8")
documents = loader.load()
```

```python
documents
```

```python
config = {
    "model": "llama-3.3-70b-versatile",  # or another Groq model ID
    "temperature": 0.7,
    "max_tokens": None,
    "top_p": 0.8,
}
# Note: only "model" and "temperature" are passed to ChatGroq below.
```
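The `temperature` and `top_p` entries control how the model samples each next token. A sketch of the standard technique (temperature-scaled softmax followed by nucleus, or top-p, filtering) on made-up logits; this is the textbook procedure, not Groq's internal implementation, and `sampling_distribution` is a hypothetical helper.

```python
from __future__ import annotations

import math


def sampling_distribution(logits: dict[str, float], temperature: float, top_p: float) -> dict[str, float]:
    """Apply temperature scaling, then nucleus (top-p) filtering, and renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: val / temperature for tok, val in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Nucleus filtering: keep the most likely tokens until their cumulative mass reaches top_p.
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(kept.values())
    return {tok: p / mass for tok, p in kept.items()}


# Made-up logits for four candidate tokens
logits = {"the": 2.0, "a": 1.0, "an": 0.5, "this": 0.1}
dist = sampling_distribution(logits, temperature=0.7, top_p=0.8)
print(dist)
```

Lowering the temperature sharpens the distribution toward the top token; lowering `top_p` cuts off more of the long tail before sampling.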

```python
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

from langchain_google_genai import GoogleGenerativeAIEmbeddings
```

```python
from langchain_groq import ChatGroq

# LLM: Groq chat model (ChatGroq reads the GROQ_API_KEY environment variable)
generator_llm = LangchainLLMWrapper(ChatGroq(
    model=config["model"],
    temperature=config["temperature"],
))

# Embeddings: Google embedding model (reads the GOOGLE_API_KEY environment variable)
generator_embeddings = LangchainEmbeddingsWrapper(
    GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
)
```
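The embedding model maps each piece of text to a vector so that texts can be compared numerically, and cosine similarity is the usual comparison. A stdlib sketch with tiny made-up vectors; real `gemini-embedding-001` vectors are far longer, and this does not show how Ragas uses similarity internally. `cosine_similarity` here is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings"
question = [0.9, 0.1, 0.0]
chunk_on_topic = [0.8, 0.2, 0.1]
chunk_off_topic = [0.0, 0.1, 0.9]
print(cosine_similarity(question, chunk_on_topic) > cosine_similarity(question, chunk_off_topic))  # True
```

To try it on real vectors, `GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001").embed_query(text)` returns a list of floats for a single string.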

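```python
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)

# Generate 10 synthetic question/answer samples from the transcript
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)

# Inspect the generated samples as a pandas DataFrame
dataset.to_pandas()
```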
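The generated test set holds questions and references only. To evaluate your own pipeline, each question still has to be run through it so you can record what it retrieved and what it answered. A sketch of that loop, assuming rows shaped like `dataset.to_pandas().to_dict("records")` with `user_input` and `reference` keys; `build_eval_records`, `retrieve` and `generate` are hypothetical stand-ins for your own retriever and LLM call. The output keys follow the Ragas single-turn sample fields (`user_input`, `retrieved_contexts`, `response`, `reference`).

```python
from typing import Callable, Iterable


def build_eval_records(
    testset_rows: Iterable[dict],
    retrieve: Callable[[str], list],
    generate: Callable[[str, list], str],
) -> list:
    """Run every test question through the RAG pipeline and keep what it produced."""
    records = []
    for row in testset_rows:
        question = row["user_input"]
        contexts = retrieve(question)          # what the retriever returned
        answer = generate(question, contexts)  # what the generator said, given those contexts
        records.append(
            {
                "user_input": question,
                "retrieved_contexts": contexts,
                "response": answer,
                "reference": row["reference"],  # ground truth from the test set
            }
        )
    return records


# Toy stand-ins (hypothetical) so the sketch runs without any API keys.
rows = [{"user_input": "What is RAG?", "reference": "Retrieval-augmented generation."}]
records = build_eval_records(
    rows,
    retrieve=lambda q: ["RAG combines a retriever with a generator."],
    generate=lambda q, ctx: "RAG combines retrieval with generation.",
)
print(records[0]["response"])  # RAG combines retrieval with generation.
```

Keeping `retrieved_contexts` and `response` separate is what lets retrieval metrics and generation metrics be scored independently later.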