# Using Amazon Bedrock

This tutorial shows how to run Ragas evaluations using Amazon Bedrock endpoints and LangChain.

In [1]:
%pip install git+https://github.com/austinmw/ragas@bedrock

### Load sample dataset

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
# Sample dataset

from datasets import load_dataset
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
ds = fiqa_eval['baseline']
ds

Found cached dataset fiqa (/home/ec2-user/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)


  0%|          | 0/1 [00:00<?, ?it/s]

Dataset({
    features: ['question', 'ground_truths', 'answer', 'contexts'],
    num_rows: 30
})

Or use your own Parquet dataset. Required columns are:
- `question`: `str` — original question
- `ground_truths`: `List[str]` — ground truth answer(s) (accepts a list in case you'd like to provide multiple answer variations)
- `answer`: `str` — generated answer
- `contexts`: `List[str]` — retrieved document chunks

In [4]:
# from datasets import Dataset

# ds = Dataset.from_parquet('/path/to/data/dataset.parquet')
# ds
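Before loading your own data, it can help to check that each record carries the four required columns with the expected types. The following is a minimal sketch using only the standard library; `validate_record` and the sample records are hypothetical names introduced here for illustration and are not part of Ragas or `datasets`.

```python
from typing import Any, Dict, List

# Column names and types as listed in the requirements above.
REQUIRED_COLUMNS = {
    "question": str,
    "ground_truths": list,
    "answer": str,
    "contexts": list,
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for column, expected_type in REQUIRED_COLUMNS.items():
        if column not in record:
            problems.append(f"missing column: {column}")
            continue
        value = record[column]
        if not isinstance(value, expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")
        elif expected_type is list and not all(isinstance(v, str) for v in value):
            problems.append(f"{column}: every item must be a str")
    return problems


# Hypothetical records, for illustration only.
good = {
    "question": "example question",
    "ground_truths": ["example reference answer"],
    "answer": "example generated answer",
    "contexts": ["example retrieved chunk"],
}
bad = {"question": "example question", "answer": 42}

print(validate_record(good))
print(validate_record(bad))
```

Running a check like this over every row before calling `Dataset.from_parquet` can catch schema problems before any model calls are made.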

In [5]:
# Inspect the dataset
df = ds.to_pandas()
df.head()

| | question | ground_truths | answer | contexts |
|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | \nThe best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... |
| 3 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | \nApplying for and receiving business credit c... | [Set up a meeting with the bank that handles y... |
| 4 | 401k Transfer After Business Closure | [You should probably consult an attorney. Howe... | \nIf your employer has closed and you need to ... | [The time horizon for your 401K/IRA is essenti... |


Let's import the metrics that we are going to use.

In [6]:
from ragas.metrics import (
    context_recall,
    context_precision,
    answer_relevancy,
    faithfulness,
    answer_similarity,
    answer_correctness,
)
from ragas.metrics.critique import (
    coherence,
    conciseness,
)

# list of metrics we're going to use
metrics = [
    context_recall,
    context_precision,
    answer_relevancy,
    faithfulness,
    answer_similarity,
    answer_correctness,
    coherence,
    conciseness,
]

Now let's swap out the default `ChatOpenAI` for `BedrockChat`. Create a new instance of `BedrockChat` with the `model_id` of the model you want to use. You will also have to switch to `BedrockEmbeddings` in the metrics that use embeddings, which in our case is `answer_relevancy`.

To use the new `BedrockChat` LLM instance with Ragas metrics, you have to create a new instance of `RagasLLM` using the `ragas.llms.LangchainLLM` wrapper. It's a simple wrapper around LangChain that makes LangChain LLM/Chat instances compatible with how Ragas metrics will use them.

In [7]:
from ragas.llms import LangchainLLM
from langchain.chat_models import BedrockChat
from langchain.embeddings import BedrockEmbeddings

config = {
    "credentials_profile_name": "your-profile-name", # E.g "default"
    "region_name": "your-region-name", # E.g. "us-east-1"

    # NOTE: This has only been tested with Claude models!
    "model_id": "anthropic.claude-instant-v1", # E.g "anthropic.claude-v2"

    # NOTE: No need to set Temperature: it is set by the individual metrics (to 0.0 or 0.2 depending on the metric)
    "model_kwargs": {
        "max_tokens_to_sample": 1000,
    }
}

# Initialize BedrockChat
bedrock_model = BedrockChat(
    #credentials_profile_name=config["credentials_profile_name"],
    #region_name=config["region_name"],
    #endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)

# wrapper around BedrockChat
ragas_bedrock_model = LangchainLLM(bedrock_model)

# Initialize BedrockEmbeddings
bedrock_embeddings = BedrockEmbeddings(
    #credentials_profile_name=config["credentials_profile_name"], 
    #region_name=config["region_name"]
)

# Set the LLM, embeddings, and batch size on every metric
for m in metrics:
    m.llm = ragas_bedrock_model
    m.embeddings = bedrock_embeddings
    m.batch_size = 15
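The cell above leaves the `credentials_profile_name` and `region_name` arguments commented out. One way to pass them only when they have been filled in is sketched below. `bedrock_kwargs` is a hypothetical helper introduced here, not part of LangChain or Ragas, and it assumes the placeholder strings from the config are left unchanged when a setting is unused.

```python
from typing import Any, Dict

# Placeholder strings from the config above; treated here as "not set".
PLACEHOLDERS = {"your-profile-name", "your-region-name"}


def bedrock_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the settings that were actually filled in."""
    kwargs = {}
    for key in ("credentials_profile_name", "region_name"):
        value = config.get(key)
        if value and value not in PLACEHOLDERS:
            kwargs[key] = value
    return kwargs


# Example config: the profile is still a placeholder, the region is set.
example_config = {
    "credentials_profile_name": "your-profile-name",
    "region_name": "us-east-1",
}
print(bedrock_kwargs(example_config))  # {'region_name': 'us-east-1'}
```

The result could then be unpacked into the constructors, for example `BedrockChat(model_id=config["model_id"], **bedrock_kwargs(config))`, so that unset values fall back to the defaults of your AWS environment.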

In [8]:
from ragas.prompts.langchain import anthropic_prompts

context_recall.human_template = anthropic_prompts.CONTEXT_RECALL_HUMAN
context_recall.ai_template = anthropic_prompts.CONTEXT_RECALL_AI

context_precision.human_template = anthropic_prompts.CONTEXT_PRECISION_HUMAN
context_precision.ai_template = anthropic_prompts.CONTEXT_PRECISION_AI

answer_relevancy.human_template = anthropic_prompts.ANSWER_RELEVANCY_HUMAN
answer_relevancy.ai_template = anthropic_prompts.ANSWER_RELEVANCY_AI

faithfulness.statements_human_template = anthropic_prompts.FAITHFULNESS_STATEMENTS_HUMAN
faithfulness.statements_ai_template = anthropic_prompts.FAITHFULNESS_STATEMENTS_AI
faithfulness.verdict_human_template = anthropic_prompts.FAITHFULNESS_VERDICTS_HUMAN
faithfulness.verdict_ai_template = anthropic_prompts.FAITHFULNESS_VERDICTS_AI

answer_similarity.threshold = None

answer_correctness.answer_similarity = answer_similarity
answer_correctness.faithfulness = faithfulness

coherence.human_template = anthropic_prompts.ASPECT_CRITIQUE_HUMAN
coherence.ai_template = anthropic_prompts.ASPECT_CRITIQUE_AI
coherence.definition = (
    "Does the submission present ideas, information, or arguments in a "
    "logically sequential manner, clearly distinguishing main points from "
    "supporting details? Evaluate the structure rigorously, ensuring each "
    "part contributes directly to the overall message or argument. "
    "Disregard submissions with any tangential or poorly connected content. "
    "Be very strict!"
)
coherence.strictness = 3

conciseness.human_template = anthropic_prompts.ASPECT_CRITIQUE_HUMAN
conciseness.ai_template = anthropic_prompts.ASPECT_CRITIQUE_AI
conciseness.definition = (
    "Evaluate if the submission communicates its ideas or information using "
    "the fewest possible words, without loss of clarity. Reject submissions with "
    "any redundant, repetitive, cut-off, or extraneous details, regardless of their "
    "relevance to the main topic. The focus should be on brevity and directness. "
    "Be very strict!"
)

conciseness.strictness = 3

### Evaluation

Running the evaluation is as simple as calling `evaluate` on the `Dataset` with the metrics of your choice.

In [9]:
%%time

import warnings
from ragas import evaluate
import nest_asyncio

warnings.filterwarnings('ignore', message=".*promote has been superseded by mode='default'.*")

# NOTE: Only needed when running in a Jupyter notebook; otherwise comment out or remove this call.
nest_asyncio.apply()

result = evaluate(ds, metrics=metrics)
result

evaluating with [context_recall]


100%|██████████| 2/2 [00:44<00:00, 22.21s/it]


evaluating with [context_precision]


100%|██████████| 2/2 [00:03<00:00,  1.83s/it]


evaluating with [answer_relevancy]


100%|██████████| 2/2 [00:19<00:00,  9.53s/it]


evaluating with [faithfulness]


100%|██████████| 2/2 [00:28<00:00, 14.23s/it]


evaluating with [answer_similarity]


100%|██████████| 2/2 [00:05<00:00,  2.75s/it]


evaluating with [answer_correctness]


100%|██████████| 2/2 [00:36<00:00, 18.05s/it]


evaluating with [coherence]


100%|██████████| 2/2 [00:36<00:00, 18.22s/it]


evaluating with [conciseness]


100%|██████████| 2/2 [00:30<00:00, 15.27s/it]

CPU times: user 2.26 s, sys: 183 ms, total: 2.45 s
Wall time: 3min 24s





{'context_recall': 0.7011, 'context_precision': 0.8000, 'answer_relevancy': 0.8441, 'faithfulness': 0.9178, 'answer_similarity': 0.6880, 'answer_correctness': 0.6784, 'coherence': 1.0000, 'conciseness': 0.3333}

In [10]:
average_results = result.copy()
average_results

{'context_recall': 0.701111111111111,
 'context_precision': 0.8,
 'answer_relevancy': 0.8441328053920018,
 'faithfulness': 0.9177777777777778,
 'answer_similarity': 0.6879575691920328,
 'answer_correctness': 0.6783835465007783,
 'coherence': 1.0,
 'conciseness': 0.3333333333333333}

And there you have it: all the scores you need. Where a `ragas_score` is reported, it gives you a single summary metric, while the other ones measure the different parts of your pipeline.

If you want to dig into the results and find examples where your pipeline performed poorly or really well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools.
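As a small illustration of reading these aggregate scores, the sketch below ranks them from lowest to highest to show which part of the pipeline looks weakest. The values are copied from the output above; the ranking is just one simple way to read them, not something Ragas does for you.

```python
# Aggregate scores copied from the evaluation output above.
scores = {
    "context_recall": 0.7011,
    "context_precision": 0.8000,
    "answer_relevancy": 0.8441,
    "faithfulness": 0.9178,
    "answer_similarity": 0.6880,
    "answer_correctness": 0.6784,
    "coherence": 1.0000,
    "conciseness": 0.3333,
}

# Sort metric names by score, lowest first.
ranked = sorted(scores.items(), key=lambda item: item[1])
for name, value in ranked:
    print(f"{name:<20}{value:.4f}")

weakest = ranked[0][0]
print("weakest metric:", weakest)
```

With these numbers, `conciseness` comes out lowest, which suggests the generated answers are the first place to look.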

In [11]:
df = result.to_pandas()
df

| | question | contexts | answer | ground_truths | context_recall | context_precision | answer_relevancy | faithfulness | answer_similarity | answer_correctness | coherence | conciseness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | \nThe best way to deposit a cheque issued to a... | [Have the check reissued to the proper payee.J... | 0.5 | 1.0 | 0.941635 | 0.666667 | 0.740307 | 0.53682 | 1 | 0 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 1.0 | 1.0 | 0.95033 | 1.0 | 0.871439 | 0.935719 | 1 | 0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 0.8 | 1.0 | 0.917308 | 1.0 | 0.610433 | 0.63855 | 1 | 0 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | \nApplying for and receiving business credit c... | ["I'm afraid the great myth of limited liabili... | 1.0 | 1.0 | 0.633024 | 1.0 | 0.662891 | 0.831446 | 1 | 1 |
| 4 | 401k Transfer After Business Closure | [The time horizon for your 401K/IRA is essenti... | \nIf your employer has closed and you need to ... | [You should probably consult an attorney. Howe... | 0.0 | 1.0 | 0.814123 | 1.0 | 0.259844 | 0.254922 | 1 | 0 |
| 5 | What are the ins/outs of writing equipment pur... | [You would report it as business income on Sch... | \nWriting equipment purchases off as business ... | [Most items used in business have to be deprec... | 0.333333 | 1.0 | 0.893774 | 1.0 | 0.720885 | 0.717585 | 1 | 0 |
| 6 | Can a entrepreneur hire a self-employed busine... | [Yes. I can by all means start my own company ... | \nYes, an entrepreneur can hire a self-employe... | [Yes. I can by all means start my own company ... | 1.0 | 1.0 | 0.939841 | 1.0 | 0.645683 | 0.822841 | 1 | 0 |
| 7 | Intentions of Deductible Amount for Small Busi... | ["Short answer, yes. But this is not done thro... | \nThe intention of deductible amounts for smal... | ["If your sole proprietorship losses exceed al... | 0.0 | 0.0 | 0.76645 | 1.0 | 0.425558 | 0.212779 | 1 | 0 |
| 8 | How can I deposit a check made out to my busin... | ["I have checked with Bank of America, and the... | \nYou can deposit a check made out to your bus... | [You should have a separate business account. ... | 0.6 | 0.0 | 0.934145 | 1.0 | 0.759963 | 0.504982 | 1 | 0 |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | \nFiling personal taxes with 1099s versus fili... | [Depends whom the 1099 was issued to. If it wa... | 1.0 | 1.0 | 0.895516 | 1.0 | 0.739187 | 0.869593 | 1 | 0 |
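To find the examples where the pipeline performed worst, one option is to filter the per-row scores against a cut-off and sort what remains. The sketch below uses plain dictionaries holding a few rows copied from the table above so that it runs without extra dependencies; the `THRESHOLD` value is a hypothetical choice made for illustration.

```python
# A few rows copied from the results table (row index and two score columns).
rows = [
    {"index": 0, "context_recall": 0.5, "answer_correctness": 0.53682},
    {"index": 4, "context_recall": 0.0, "answer_correctness": 0.254922},
    {"index": 7, "context_recall": 0.0, "answer_correctness": 0.212779},
    {"index": 9, "context_recall": 1.0, "answer_correctness": 0.869593},
]

THRESHOLD = 0.5  # hypothetical cut-off, chosen for illustration

# Keep rows scoring below the cut-off, worst first.
low = [r for r in rows if r["answer_correctness"] < THRESHOLD]
low.sort(key=lambda r: r["answer_correctness"])
low_indices = [r["index"] for r in low]
print(low_indices)  # [7, 4]

# With the pandas DataFrame from the cell above, the equivalent would be:
# df[df["answer_correctness"] < THRESHOLD].sort_values("answer_correctness")
```

Both flagged rows also have a `context_recall` of 0.0, which hints that retrieval, rather than generation, may be the weak point for those questions.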


### Logs

You can access the logs for each metric from the metric objects themselves.

In [12]:
# For example,
context_recall.logs.keys()

dict_keys(['prompts', 'responses', 'sentences', 'scores'])