# Using Amazon Bedrock

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model best suited to your use case.

This tutorial shows you how to run Ragas evaluations against Amazon Bedrock endpoints using LangChain.

:::{Note}
This guide is for folks who are using Amazon Bedrock endpoints. If you're using OpenAI endpoints, check the [evaluation guide](../../getstarted/evaluation.md) instead.
:::

### Load sample dataset

```python
# data
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval
```

Let's import the metrics we are going to use.

```python
from bisheng_ragas.metrics import (
    context_precision,
    answer_relevancy,  # AnswerRelevancy
    faithfulness,
    context_recall,
)
from bisheng_ragas.metrics.critique import harmfulness

# list of metrics we're going to use
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    harmfulness,
]
```

Now let's swap out the default `ChatOpenAI` for `BedrockChat`. Initialize a new instance of `BedrockChat` with the `model_id` of the model you want to use. You will also have to swap in `BedrockEmbeddings` for the metrics that use embeddings, which in our case is only `answer_relevancy`.
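If you'd rather not hard-code the placeholder values in the configuration cell, one option is to read them from environment variables. This is only a sketch: the variable names `BEDROCK_PROFILE`, `BEDROCK_REGION`, and `BEDROCK_MODEL_ID` and the helper `bedrock_config_from_env` are hypothetical (neither Bedrock nor LangChain reads them on their own), and the fallbacks are simply the example values from the notebook's comments. It builds the same keyword arguments the notebook passes to `BedrockChat`, including the regional `bedrock-runtime` endpoint URL.

```python
import os
from typing import Any, Dict


def bedrock_config_from_env() -> Dict[str, Any]:
    """Collect BedrockChat keyword arguments from hypothetical environment variables."""
    region = os.environ.get("BEDROCK_REGION", "us-east-1")
    return {
        "credentials_profile_name": os.environ.get("BEDROCK_PROFILE", "default"),
        "region_name": region,
        # Same regional runtime endpoint pattern the notebook builds by hand
        "endpoint_url": f"https://bedrock-runtime.{region}.amazonaws.com",
        "model_id": os.environ.get("BEDROCK_MODEL_ID", "anthropic.claude-v2"),
        "model_kwargs": {"temperature": 0.4},
    }


env_config = bedrock_config_from_env()
print(env_config["endpoint_url"])
# bedrock_model = BedrockChat(**env_config)  # needs langchain and AWS credentials
```

Keeping credentials and region out of the notebook makes it easier to share the notebook without editing it first.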

To use the new `BedrockChat` instance with Ragas metrics, create a new `RagasLLM` instance with the `bisheng_ragas.llms.LangchainLLM` wrapper. It's a simple wrapper that makes LangChain LLM/Chat instances compatible with the way Ragas metrics call them.
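To make the idea of a wrapper concrete, here is a minimal pure-Python sketch of the adapter pattern such a wrapper follows. It is *not* the actual `LangchainLLM` implementation: `EchoChatModel`, `SimpleLLMWrapper`, and its `generate` method are hypothetical names, and the real wrapper's interface may differ. The point is only that the metric talks to one uniform interface while the wrapper forwards each call to whatever chat model sits underneath.

```python
from typing import List, Protocol


class ChatModel(Protocol):
    """Hypothetical interface: anything with invoke(prompt) -> str."""

    def invoke(self, prompt: str) -> str: ...


class EchoChatModel:
    """Stand-in for BedrockChat so the sketch runs without AWS credentials."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class SimpleLLMWrapper:
    """Hypothetical adapter exposing one batch-style method a metric could call."""

    def __init__(self, llm: ChatModel) -> None:
        self.llm = llm

    def generate(self, prompts: List[str]) -> List[str]:
        # Forward each prompt to the wrapped model and collect the replies in order
        return [self.llm.invoke(p) for p in prompts]


wrapper = SimpleLLMWrapper(EchoChatModel())
print(wrapper.generate(["Is the answer grounded?", "Is it relevant?"]))
# ['echo: Is the answer grounded?', 'echo: Is it relevant?']
```

Swapping `EchoChatModel` for a different model leaves the metric-side code unchanged, which is the benefit of wrapping the chat model once and handing the wrapper to the metrics.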

```python
from bisheng_ragas.llms import LangchainLLM
from langchain.chat_models import BedrockChat
from langchain.embeddings import BedrockEmbeddings

config = {
    "credentials_profile_name": "your-profile-name",  # E.g "default"
    "region_name": "your-region-name",  # E.g. "us-east-1"
    "model_id": "your-model-id",  # E.g "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)
# wrapper around bedrock_model
ragas_bedrock_model = LangchainLLM(bedrock_model)
# patch the new RagasLLM instance
answer_relevancy.llm = ragas_bedrock_model

# init and change the embeddings
# only for answer_relevancy
bedrock_embeddings = BedrockEmbeddings(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
)
# embeddings can be used as it is
answer_relevancy.embeddings = bedrock_embeddings
```

This replaces the default LLM of `answer_relevancy` with the Amazon Bedrock endpoint. Now, with a little `__setattr__` magic, let's change it for all the other metrics.
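The loop in the next cell works because each metric exposes an `llm` attribute that can be reassigned. The self-contained sketch below uses a hypothetical `DummyMetric` dataclass in place of the real metric classes, so it runs without Bedrock. It also shows that the built-in `setattr(m, "llm", x)` does the same thing as `m.__setattr__("llm", x)`, and both are equivalent to writing `m.llm = x`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DummyMetric:
    """Hypothetical stand-in for a Ragas metric; models only the llm/embeddings slots."""

    name: str
    llm: Optional[Any] = None
    embeddings: Optional[Any] = None


dummy_metrics = [DummyMetric("faithfulness"), DummyMetric("answer_relevancy")]
placeholder_llm = "bedrock-llm-placeholder"  # stands in for ragas_bedrock_model

for m in dummy_metrics:
    # Same effect as m.__setattr__("llm", placeholder_llm) or m.llm = placeholder_llm
    setattr(m, "llm", placeholder_llm)

print([m.llm for m in dummy_metrics])
# ['bedrock-llm-placeholder', 'bedrock-llm-placeholder']
```

Since names like `faithfulness` appear to be shared metric instances rather than classes, patching them presumably affects every later use of those objects in the same Python process.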

```python
for m in metrics:
    m.__setattr__("llm", ragas_bedrock_model)
```

### Evaluation

Running the evaluation is as simple as calling `evaluate` on the `Dataset` with the metrics of your choice.
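The evaluation cell calls `nest_asyncio.apply()` unconditionally, and its note says to remove the call outside Jupyter. Jupyter already runs an asyncio event loop, and `nest_asyncio` patches asyncio so that code which expects to drive its own loop can still run inside it. If you prefer one script that works in both places, a sketch like the following applies the patch only when a loop is already running. `event_loop_is_running` is a hypothetical helper name; `asyncio.get_running_loop` is the standard-library call that raises `RuntimeError` when no loop is running in the current thread.

```python
import asyncio


def event_loop_is_running() -> bool:
    """Return True if called while an asyncio event loop is running (e.g. in Jupyter)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


if event_loop_is_running():
    # Only patch inside an already-running loop; plain scripts skip this branch.
    import nest_asyncio

    nest_asyncio.apply()
```

In a plain Python script the branch is skipped, so `nest_asyncio` does not even need to be installed there.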

```python
from bisheng_ragas import evaluate
import nest_asyncio  # CHECK NOTES

# NOTES: Only needed when running in a Jupyter notebook; otherwise comment out or remove this call.
nest_asyncio.apply()

result = evaluate(
    fiqa_eval["baseline"],
    metrics=metrics,
)

result
```

```text
evaluating with [faithfulness]
100%|█████████████████████████████████████████████████████████████| 2/2 [01:22<00:00, 41.24s/it]
evaluating with [answer_relevancy]
100%|█████████████████████████████████████████████████████████████| 2/2 [01:21<00:00, 40.59s/it]
evaluating with [context_recall]
100%|█████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.22s/it]
evaluating with [context_precision]
100%|█████████████████████████████████████████████████████████████| 2/2 [00:59<00:00, 29.85s/it]
evaluating with [harmfulness]
100%|█████████████████████████████████████████████████████████████| 2/2 [00:33<00:00, 16.96s/it]

{'faithfulness': 0.9428, 'answer_relevancy': 0.7860, 'context_recall': 0.2296, 'context_precision': 0.0000, 'harmfulness': 0.0000}
```

And there you have it: all the scores you need, each measuring a different part of your pipeline.

Now, if you want to dig into the results and find examples where your pipeline performed poorly or particularly well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools too!

```python
df = result.to_pandas()
df.head()
```

| | question | contexts | answer | ground_truths | faithfulness | answer_relevancy | context_recall | context_precision | harmfulness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | \nThe best way to deposit a cheque issued to a... | [Have the check reissued to the proper payee.J... | 1.0 | 0.930311 | 0.263158 | 0.0 | 0 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 1.0 | 0.984122 | 0.363636 | 0.0 | 0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 1.0 | 0.883872 | 0.363636 | 0.0 | 0 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | \nApplying for and receiving business credit c... | ["I'm afraid the great myth of limited liabili... | 1.0 | 0.518287 | 0.363636 | 0.0 | 0 |
| 4 | 401k Transfer After Business Closure | [The time horizon for your 401K/IRA is essenti... | \nIf your employer has closed and you need to ... | [You should probably consult an attorney. Howe... | 1.0 | 0.779471 | 0.0 | 0.0 | 0 |
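Once you have the DataFrame, ordinary pandas operations are enough to find the weak spots. The sketch below rebuilds a small frame from the five rows shown above (questions truncated as displayed) so it runs on its own; in the notebook you would apply the same calls to the `df` returned by `result.to_pandas()`. `sample_df` is just an illustrative name, while `nsmallest` and boolean indexing are standard pandas.

```python
import pandas as pd

# Scores copied from the five rows displayed above; questions are truncated as shown there.
sample_df = pd.DataFrame(
    {
        "question": [
            "How to deposit a cheque issued to an associate...",
            "Can I send a money order from USPS as a business?",
            "1 EIN doing business under multiple business n...",
            "Applying for and receiving business credit",
            "401k Transfer After Business Closure",
        ],
        "answer_relevancy": [0.930311, 0.984122, 0.883872, 0.518287, 0.779471],
        "context_recall": [0.263158, 0.363636, 0.363636, 0.363636, 0.0],
    }
)

# The two rows with the lowest answer relevancy, sorted ascending: good candidates to inspect first.
worst = sample_df.nsmallest(2, "answer_relevancy")
print(worst[["question", "answer_relevancy"]])

# Questions whose context_recall score is exactly zero.
zero_recall = sample_df.loc[sample_df["context_recall"] == 0.0, "question"].tolist()
print(zero_recall)
# ['401k Transfer After Business Closure']
```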


And that's it!

If you have any suggestions, feedback, or things you're not happy about, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁