# Using Azure OpenAI

This tutorial will show you how to use Azure OpenAI endpoints instead of OpenAI endpoints.

:::{Note}
This guide is for folks who are using the Azure OpenAI endpoints. Check the [evaluation guide](../../getstarted/evaluation.md) if you're using OpenAI endpoints.
:::

In [None]:
import os

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["AZURE_OPENAI_ENDPOINT"] = ""  # your Azure OpenAI endpoint URL
os.environ["OPENAI_API_KEY"] = ""  # your Azure OpenAI API key
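
Empty values are easy to overlook in a cell like this. The following is a minimal sketch of an early check, assuming the four variable names set above; `missing_env_vars` is a hypothetical helper, not part of Ragas or LangChain.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_KEY",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or blank (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` before building the model returns an empty list when everything is filled in, so a non-empty result is a signal to stop and fix the configuration first.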

### Load sample dataset

In [2]:
# data
from datasets import load_dataset

eval_dataset = load_dataset("explodinggradients/amnesty_qa", "english")
eval_dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 20
    })
})
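
If you plan to swap in your own data, it can help to check that each row has the same shape. This is a minimal sketch, assuming `question` and `answer` are strings and `contexts` and `ground_truths` are lists of strings (an assumption inferred from the feature names above); `row_problems` is a hypothetical helper, not a Ragas function.

```python
# Expected field types; an assumption inferred from the feature names above.
EXPECTED_FIELDS = {
    "question": str,
    "answer": str,
    "contexts": list,
    "ground_truths": list,
}


def row_problems(row):
    """Return a list of human-readable problems found in one dataset row."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
        elif expected_type is list and not all(
            isinstance(item, str) for item in row[field]
        ):
            problems.append(f"{field} should contain only strings")
    return problems
```

For example, `row_problems(eval_dataset["train"][0])` should return an empty list if the row matches the assumed layout.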

### Configuring the metrics for Azure OpenAI endpoints

Ragas uses Azure OpenAI to compute some of the metrics, so make sure your Azure OpenAI key, base URL, and other settings are available in your environment. You can check the [langchain docs](https://python.langchain.com/docs/integrations/llms/azure_openai) or the [Azure docs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/switching-endpoints) for more information.

Let's import the metrics that we are going to use.

In [4]:
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
    answer_correctness,
)
from ragas.metrics.critique import harmfulness

# list of metrics we're going to use
metrics = [
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
    harmfulness,
]



Make sure you have installed `langchain-openai`. Now let's swap out the default `ChatOpenAI` with `AzureChatOpenAI`. Initialize a new instance of `AzureChatOpenAI` with the `deployment_name` of the model you want to use. You will also have to change the `OpenAIEmbeddings` in the metrics that use them, which in our case is `answer_relevancy`.

To use the new `AzureChatOpenAI` LLM instance with Ragas metrics, you have to create a new instance of `BaseRagasLLM` using the `ragas.llms.LangchainLLMWrapper` wrapper. It's a simple wrapper around LangChain that makes LangChain LLM/Chat instances compatible with how Ragas metrics use them.

In [10]:


from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper

# Import evaluate before patching the RagasLLM instance
from ragas import evaluate

azure_model = AzureChatOpenAI(
    deployment_name="gpt-35-16k",  # name of your Azure deployment
    model="gpt-35-turbo-16k",
    openai_api_type="azure",
)
# wrapper around azure_model
ragas_azure_model = LangchainLLMWrapper(azure_model)
# patch the new RagasLLM instance
answer_relevancy.llm = ragas_azure_model

# init the Azure embeddings
# needed by the metrics that compute similarity with embeddings
azure_embeddings = AzureOpenAIEmbeddings(
    deployment="ada2",
    model="text-embedding-ada-002",
    openai_api_type="azure",
)

This replaces the default LLM of `answer_relevancy` with the Azure OpenAI endpoint. Now, with some `__setattr__` magic, let's change it for all the other metrics.

In [11]:
for m in metrics:
    m.__setattr__("llm", ragas_azure_model)
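
A quick sanity check can confirm that no metric was left on the default LLM. This is a minimal sketch, assuming every metric exposes an `llm` attribute as in the loop above; `metrics_not_using` is a hypothetical helper, and the objects below are stand-ins so that the example runs without Ragas installed.

```python
from types import SimpleNamespace


def metrics_not_using(metrics, llm):
    """Return the metrics whose `llm` attribute is not the given object."""
    return [m for m in metrics if getattr(m, "llm", None) is not llm]


# Stand-in objects (not real Ragas metrics) to demonstrate the check.
fake_llm = object()
patched = SimpleNamespace(name="patched", llm=fake_llm)
unpatched = SimpleNamespace(name="unpatched", llm=None)

leftover = metrics_not_using([patched, unpatched], fake_llm)
print([m.name for m in leftover])  # prints ['unpatched']
```

In the notebook, the equivalent call would be `metrics_not_using(metrics, ragas_azure_model)`, which should return an empty list after the loop has run.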

Some of the metrics need embeddings to calculate the score, so we will also change them to use `azure_embeddings`.

In [12]:
# embeddings can be used as is
answer_relevancy.embeddings = azure_embeddings
answer_correctness.answer_similarity.embeddings = azure_embeddings

### Evaluation

Running the evaluation is as simple as calling `evaluate` on the `Dataset` with the metrics of your choice.

In [13]:
result = evaluate(
    eval_dataset["train"].select(range(0,2)),
    metrics=metrics,
)

result


And there you have it, all the scores you need.

If you want to dig into the results and find examples where your pipeline performed especially badly or especially well, you can convert the result into a pandas DataFrame and use your standard analytics tools.

In [None]:
df = result.to_pandas()
df.head()
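
One way to surface the weakest examples is to sort by a single metric. This is a minimal sketch, assuming `result.to_pandas()` yields one row per sample with one column per metric name (such as `faithfulness`); `worst_rows` is a hypothetical helper, and the scores below are made-up placeholders, not output from this notebook.

```python
import pandas as pd

# Placeholder frame standing in for `result.to_pandas()`; the scores are
# invented for illustration and are not results from this notebook.
df = pd.DataFrame(
    {
        "question": ["q1", "q2", "q3"],
        "faithfulness": [0.9, 0.4, 0.7],
        "answer_relevancy": [0.8, 0.95, 0.6],
    }
)


def worst_rows(frame, metric, n=2):
    """Return the n rows with the lowest score for `metric`."""
    return frame.nsmallest(n, metric)


# Show the two lowest-scoring rows for one metric.
print(worst_rows(df, "faithfulness")[["question", "faithfulness"]])
```

The same call with a different column name, for example `worst_rows(df, "answer_relevancy")`, ranks the rows by that metric instead.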

And that's it!

If you have any suggestions, feedback, or things you're not happy about, please share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁