## LLM
- The idea is to run the same methods on multiple LLMs to check that the results are, ideally, LLM-agnostic. In practice, measure the standard deviation of the scores between different models.
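As a minimal sketch of the "measure the std between models" step, the helper below takes one aggregate score per model and returns the mean and sample standard deviation. The function name `score_spread` and the model keys are hypothetical, and the numbers are placeholders for illustration only, not measured results.

```python
from statistics import mean, stdev


def score_spread(scores_by_model: dict[str, float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric across models."""
    values = list(scores_by_model.values())
    if len(values) < 2:
        # A standard deviation needs at least two data points.
        raise ValueError("need scores from at least two models")
    return mean(values), stdev(values)


# Placeholder scores for illustration only; these are NOT measured results.
example_scores = {"model_a": 0.2, "model_b": 0.3, "model_c": 0.4}
avg, spread = score_spread(example_scores)
print(f"mean={avg:.3f} std={spread:.3f}")
```

A small spread relative to the mean would suggest the method behaves similarly across models; a large one would suggest the result depends on the judge LLM.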

In [26]:
!pwd

/Users/shahules/ragas/alingment-exp


In [1]:
from langchain_openai.chat_models import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

llm_4o = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
llm_4o_mini = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))



In [2]:
from langchain_aws import ChatBedrockConverse
from langchain_aws import BedrockEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

config = {
    "credentials_profile_name": "default",  # E.g "default"
    "region_name": "us-east-1",  # E.g. "us-east-1"
    "llm": "anthropic.claude-3-haiku-20240307-v1:0",  # E.g "anthropic.claude-3-5-sonnet-20240620-v1:0"
}

bedrock_llm = ChatBedrockConverse(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    base_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model=config["llm"],
)

bedrock_llm = LangchainLLMWrapper(bedrock_llm)

## Tracing

In [22]:
import os
os.environ["LANGCHAIN_PROJECT"] = "Alignment"
os.environ["LANGCHAIN_TRACING_V2"] = "true"

## Evaluation

In [10]:
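from ragas.metrics._aspect_critic import AspectCriticWithReference
from ragas import EvaluationDataset, evaluate  # evaluate is called in the Aspect Critic section
from datasets import load_dataset, Dataset

# Load the evaluation samples (user_input, response, reference) from a local JSON file.
dataset = Dataset.from_json("datasets/dataset_v3.json")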

## Aspect Critic

In [37]:
critic = AspectCriticWithReference(
    name="answer correctness",
    definition="Given the user_input, reference and response. Is the response factually accurate compared to reference",
)


In [38]:
critic.llm = llm_4o_mini


In [39]:
result = evaluate(dataset=dataset, metrics=[critic])

Evaluating: 100%|███████████████████████████████████████| 50/50 [00:10<00:00,  4.55it/s]


In [40]:
result

{'answer correctness': 0.2800}

In [41]:
df = result.to_pandas()

In [42]:
df

Unnamed: 0,user_input,response,reference,answer correctness
0,How did the invention of the wheel impact anci...,# The Revolutionary Impact of the Wheel on Anc...,The invention of the wheel was a pivotal momen...,0
1,How did the discovery of fire impact early hum...,# The Transformative Power of Fire in Early Hu...,The discovery of fire was a pivotal moment in ...,1
2,What were the major impacts of the Agricultura...,# The Transformative Impacts of the Agricultur...,"The Agricultural Revolution, which began aroun...",1
3,What were the key events and consequences of t...,# The Fall of Constantinople: A Turning Point ...,"The Fall of Constantinople occurred on May 29,...",0
4,How did the birth of democracy in Athens shape...,# The Birth of Democracy in Athens: A Revoluti...,"The birth of democracy in Athens, around the 5...",0
5,What were the significant impacts of the signi...,# The Magna Carta: A Cornerstone of Constituti...,The signing of the Magna Carta in 1215 had pro...,0
6,What were the causes and impacts of The Black ...,# The Black Death: Causes and Impacts on Medie...,"The Black Death, which swept through Europe in...",1
7,What were the significant impacts of Columbus'...,# The Transformative Impact of Columbus' 1492 ...,Columbus' voyage to the Americas in 1492 had p...,1
8,What was the Protestant Reformation?,# The Protestant Reformation: A Revolution in ...,The Protestant Reformation was a 16th-century ...,0
9,What were the major impacts of the Industrial ...,# The Transformative Impacts of the Industrial...,"The Industrial Revolution, which began in the ...",0
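To relate the per-row verdicts to the aggregate score, here is a small sketch that assumes the aggregate is the plain mean of the binary verdicts. Under that assumption, 0.2800 over 50 samples would correspond to 14 responses judged correct. The list below contains only the ten verdicts displayed above, so its pass rate differs from the full-dataset value; `pass_rate` and `failing_rows` are hypothetical helper names.

```python
# Verdicts copied from the "answer correctness" column of the ten rows shown above.
shown_verdicts = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]


def pass_rate(verdicts: list[int]) -> float:
    """Fraction of samples the critic judged correct (verdict == 1)."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return sum(verdicts) / len(verdicts)


def failing_rows(verdicts: list[int]) -> list[int]:
    """Row positions the critic judged incorrect, for manual inspection."""
    return [i for i, v in enumerate(verdicts) if v == 0]


print(pass_rate(shown_verdicts))
print(failing_rows(shown_verdicts))
```

Listing the failing rows is useful for alignment work: those are the samples to compare by hand against the reference to see whether the critic or the response is at fault.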
