fix: clean up embeddings for ragas and add docs for azure embeddings #477

Merged
merged 9 commits into from
Jan 19, 2024
218 changes: 106 additions & 112 deletions docs/howtos/customisations/azure-openai.ipynb
@@ -30,7 +30,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 1,
"id": "b658e02f",
"metadata": {},
"outputs": [
@@ -44,7 +44,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5584ed49477f4e788fc4e9b8ab3dc50d",
"model_id": "8cfcf43797d746c6a35d2c9eb9512abc",
"version_major": 2,
"version_minor": 0
},
@@ -66,7 +66,7 @@
"})"
]
},
"execution_count": 11,
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
@@ -79,37 +79,12 @@
"fiqa_eval"
]
},
{
"cell_type": "markdown",
"id": "c77789bb",
"metadata": {},
"source": [
"### Configuring them for Azure OpenAI endpoints\n",
"\n",
"Ragas also uses AzureOpenAI for running some metrics so make sure you have your Azure OpenAI key, base URL and other information available in your environment. You can check the [langchain docs](https://python.langchain.com/docs/integrations/llms/azure_openai) or the [Azure docs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/switching-endpoints) for more information."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0b7179f7",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_TYPE\"] = \"azure\"\n",
"os.environ[\"OPENAI_API_VERSION\"] = \"2023-05-15\"\n",
"os.environ[\"OPENAI_API_BASE\"] = \"...\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your-openai-key\""
]
},
{
"cell_type": "markdown",
"id": "d4b8a69c",
"metadata": {},
"source": [
"Lets import metrics that we are going to use"
"Let's import the metrics we are going to use. To learn more about what each metric does, check out this [doc](https://docs.ragas.io/en/latest/concepts/metrics/index.html)"
]
},
{
@@ -139,68 +114,88 @@
},
{
"cell_type": "markdown",
"id": "f1201199",
"id": "c77789bb",
"metadata": {},
"source": [
"Now lets swap out the default `ChatOpenAI` with `AzureChatOpenAI`. Init a new instance of `AzureChatOpenAI` with the `deployment_name` of the model you want to use. You will also have to change the `OpenAIEmbeddings` in the metrics that use them, which in our case is `answer_relevance`.\n",
"### Configuring them for Azure OpenAI endpoints\n",
"\n",
"Ragas also uses AzureOpenAI for running some metrics, so make sure you have your Azure OpenAI key, base URL and other information available in your environment. You can check the [langchain docs](https://python.langchain.com/docs/integrations/llms/azure_openai) or the [Azure docs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/switching-endpoints) for more information.\n",
"\n",
"Now in order to use the new `AzureChatOpenAI` llm instance with Ragas metrics, you have to create a new instance of `RagasLLM` using the `ragas.llms.LangchainLLM` wrapper. Its a simple wrapper around langchain that make Langchain LLM/Chat instances compatible with how Ragas metrics will use them."
"\n",
"Basically, you need the following information."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "68c6f3be-7935-401a-abdf-5eab73d7fe41",
"metadata": {},
"outputs": [],
"source": [
"azure_configs = {\n",
" \"base_url\": \"https://<your-endpoint>.openai.azure.com/\",\n",
" \"model_deployment\": \"your-deployment-name\",\n",
" \"model_name\": \"your-model-name\",\n",
" \"embedding_deployment\": \"your-deployment-name\",\n",
" \"embedding_name\": \"text-embedding-ada-002\", # most likely\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "40406a26",
"id": "be0540bb-98c5-4bc9-89dc-0ee10a330e2c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import AzureChatOpenAI\n",
"from langchain.embeddings import AzureOpenAIEmbeddings\n",
"from ragas.llms import LangchainLLM\n",
"\n",
"# Import evaluate before patching the RagasLLM instance\n",
"from ragas import evaluate\n",
"\n",
"azure_model = AzureChatOpenAI(\n",
" deployment_name=\"your-deployment-name\",\n",
" model=\"your-model-name\",\n",
" openai_api_base=\"https://your-endpoint.openai.azure.com/\",\n",
" openai_api_type=\"azure\",\n",
")\n",
"# wrapper around azure_model\n",
"ragas_azure_model = LangchainLLM(azure_model)\n",
"# patch the new RagasLLM instance\n",
"answer_relevancy.llm = ragas_azure_model\n",
"import os\n",
"\n",
"# init and change the embeddings\n",
"# only for answer_relevancy\n",
"azure_embeddings = AzureOpenAIEmbeddings(\n",
" deployment=\"your-embeddings-deployment-name\",\n",
" model=\"your-embeddings-model-name\",\n",
" openai_api_base=\"https://your-endpoint.openai.azure.com/\",\n",
" openai_api_type=\"azure\",\n",
")\n",
"# embeddings can be used as it is\n",
"answer_relevancy.embeddings = azure_embeddings"
"# assuming you already have your key available via your environment variable. If not, use this\n",
"# os.environ[\"AZURE_OPENAI_API_KEY\"] = \"...\""
]
},
{
"cell_type": "markdown",
"id": "44641e41",
"id": "36bb5f1e-14bd-4648-a6e2-72ff980550e0",
"metadata": {},
"source": [
"This replaces the default llm of `answer_relevency` with the Azure OpenAI endpoint. Now with some `__setattr__` magic lets change it for all other metrics."
"Now let's create the chat model and embedding model instances so that ragas can use them for evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "52d9f5f3",
"execution_count": 6,
"id": "50110d32-8ac7-47ae-a75f-53f4dee694e3",
"metadata": {},
"outputs": [],
"source": [
"for m in metrics:\n",
" m.__setattr__(\"llm\", ragas_azure_model)"
"from langchain_openai.chat_models import AzureChatOpenAI\n",
"from langchain_openai.embeddings import AzureOpenAIEmbeddings\n",
"from ragas import evaluate\n",
"\n",
"azure_model = AzureChatOpenAI(\n",
" openai_api_version=\"2023-05-15\",\n",
" azure_endpoint=azure_configs[\"base_url\"],\n",
" azure_deployment=azure_configs[\"model_deployment\"],\n",
" model=azure_configs[\"model_name\"],\n",
" validate_base_url=False,\n",
")\n",
"\n",
"# init the embeddings for answer_relevancy, answer_correctness and answer_similarity\n",
"azure_embeddings = AzureOpenAIEmbeddings(\n",
" openai_api_version=\"2023-05-15\",\n",
" azure_endpoint=azure_configs[\"base_url\"],\n",
" azure_deployment=azure_configs[\"embedding_deployment\"],\n",
" model=azure_configs[\"embedding_name\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "44641e41",
"metadata": {},
"source": [
"If you have any doubts about how to configure the Azure endpoint through langchain, refer to the [AzureChatOpenAI](https://python.langchain.com/docs/integrations/chat/azure_chat_openai) and [AzureOpenAIEmbeddings](https://python.langchain.com/docs/integrations/text_embedding/azureopenai) documentation in the langchain docs."
]
},
{
@@ -215,39 +210,38 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 8,
"id": "22eb6f97",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"evaluating with [context_recall]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 1/1 [00:51<00:00, 51.87s/it]\n"
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e87bf94ac59448c48faf5bcc03667522",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Evaluating: 0%| | 0/150 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{'context_recall': 0.7750}"
"{'faithfulness': 0.7083, 'answer_relevancy': 0.9416, 'context_recall': 0.7762, 'context_precision': 0.8000, 'harmfulness': 0.0000}"
]
},
"execution_count": 20,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = evaluate(\n",
" fiqa_eval[\"baseline\"],\n",
" metrics=metrics,\n",
" fiqa_eval[\"baseline\"], metrics=metrics, llm=azure_model, embeddings=azure_embeddings\n",
")\n",
"\n",
"result"
@@ -294,10 +288,10 @@
" <th>ground_truths</th>\n",
" <th>answer</th>\n",
" <th>contexts</th>\n",
" <th>context_relevancy</th>\n",
" <th>faithfulness</th>\n",
" <th>answer_relevancy</th>\n",
" <th>context_recall</th>\n",
" <th>context_precision</th>\n",
" <th>harmfulness</th>\n",
" </tr>\n",
" </thead>\n",
@@ -308,10 +302,10 @@
" <td>[Have the check reissued to the proper payee.J...</td>\n",
" <td>\\nThe best way to deposit a cheque issued to a...</td>\n",
" <td>[Just have the associate sign the back and the...</td>\n",
" <td>0.088301</td>\n",
" <td>0.666667</td>\n",
" <td>0.976247</td>\n",
" <td>0.111111</td>\n",
" <td>1.0</td>\n",
" <td>0.982491</td>\n",
" <td>0.888889</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
@@ -320,10 +314,10 @@
" <td>[Sure you can. You can fill in whatever you w...</td>\n",
" <td>\\nYes, you can send a money order from USPS as...</td>\n",
" <td>[Sure you can. You can fill in whatever you w...</td>\n",
" <td>0.191611</td>\n",
" <td>1.0</td>\n",
" <td>0.995249</td>\n",
" <td>1.000000</td>\n",
" <td>0.883586</td>\n",
" <td>0.800000</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
@@ -332,10 +326,10 @@
" <td>[You're confusing a lot of things here. Compan...</td>\n",
" <td>\\nYes, it is possible to have one EIN doing bu...</td>\n",
" <td>[You're confusing a lot of things here. Compan...</td>\n",
" <td>0.069420</td>\n",
" <td>1.000000</td>\n",
" <td>0.928548</td>\n",
" <td>1.0</td>\n",
" <td>0.948876</td>\n",
" <td>1.000000</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
@@ -344,10 +338,10 @@
" <td>[\"I'm afraid the great myth of limited liabili...</td>\n",
" <td>\\nApplying for and receiving business credit c...</td>\n",
" <td>[Set up a meeting with the bank that handles y...</td>\n",
" <td>0.408924</td>\n",
" <td>1.0</td>\n",
" <td>0.813285</td>\n",
" <td>1.000000</td>\n",
" <td>0.906223</td>\n",
" <td>0.187500</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
@@ -356,10 +350,10 @@
" <td>[You should probably consult an attorney. Howe...</td>\n",
" <td>\\nIf your employer has closed and you need to ...</td>\n",
" <td>[The time horizon for your 401K/IRA is essenti...</td>\n",
" <td>0.064802</td>\n",
" <td>0.666667</td>\n",
" <td>0.889312</td>\n",
" <td>0.0</td>\n",
" <td>0.894836</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
@@ -388,19 +382,19 @@
"3 \\nApplying for and receiving business credit c... \n",
"4 \\nIf your employer has closed and you need to ... \n",
"\n",
" contexts context_relevancy \\\n",
"0 [Just have the associate sign the back and the... 0.088301 \n",
"1 [Sure you can. You can fill in whatever you w... 0.191611 \n",
"2 [You're confusing a lot of things here. Compan... 0.069420 \n",
"3 [Set up a meeting with the bank that handles y... 0.408924 \n",
"4 [The time horizon for your 401K/IRA is essenti... 0.064802 \n",
" contexts faithfulness \\\n",
"0 [Just have the associate sign the back and the... 1.0 \n",
"1 [Sure you can. You can fill in whatever you w... 1.0 \n",
"2 [You're confusing a lot of things here. Compan... 1.0 \n",
"3 [Set up a meeting with the bank that handles y... 1.0 \n",
"4 [The time horizon for your 401K/IRA is essenti... 0.0 \n",
"\n",
" faithfulness answer_relevancy context_recall harmfulness \n",
"0 0.666667 0.976247 0.111111 0 \n",
"1 1.000000 0.883586 0.800000 0 \n",
"2 1.000000 0.928548 1.000000 0 \n",
"3 1.000000 0.906223 0.187500 0 \n",
"4 0.666667 0.889312 0.000000 0 "
" answer_relevancy context_recall context_precision harmfulness \n",
"0 0.982491 0.888889 1.0 0 \n",
"1 0.995249 1.000000 1.0 0 \n",
"2 0.948876 1.000000 1.0 0 \n",
"3 0.813285 1.000000 1.0 0 \n",
"4 0.894836 0.000000 0.0 0 "
]
},
"execution_count": 9,
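The notebook cells in this diff collect the Azure settings into a single `azure_configs` dict and then fan its entries out into the `AzureChatOpenAI` and `AzureOpenAIEmbeddings` constructors. A minimal sketch of that mapping is below; the two helper functions are hypothetical (not part of the PR), the placeholder endpoint and deployment names come straight from the notebook, and the kwarg names mirror the constructor calls shown in the diff above.

```python
# Config dict as documented in the updated notebook (placeholder values).
azure_configs = {
    "base_url": "https://<your-endpoint>.openai.azure.com/",
    "model_deployment": "your-deployment-name",
    "model_name": "your-model-name",
    "embedding_deployment": "your-deployment-name",
    "embedding_name": "text-embedding-ada-002",  # most likely
}


def chat_model_kwargs(cfg: dict, api_version: str = "2023-05-15") -> dict:
    """Hypothetical helper: kwargs fanned out to AzureChatOpenAI in the notebook."""
    return {
        "openai_api_version": api_version,
        "azure_endpoint": cfg["base_url"],
        "azure_deployment": cfg["model_deployment"],
        "model": cfg["model_name"],
        "validate_base_url": False,
    }


def embedding_kwargs(cfg: dict, api_version: str = "2023-05-15") -> dict:
    """Hypothetical helper: kwargs fanned out to AzureOpenAIEmbeddings."""
    return {
        "openai_api_version": api_version,
        "azure_endpoint": cfg["base_url"],
        "azure_deployment": cfg["embedding_deployment"],
        "model": cfg["embedding_name"],
    }
```

Keeping one dict as the single source of truth means the endpoint and deployment names are edited in exactly one place, which is presumably why the notebook was restructured this way.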
1 change: 1 addition & 0 deletions pyproject.toml
@@ -6,6 +6,7 @@ dependencies = [
"tiktoken",
"langchain",
"langchain-core",
"langchain_openai",
"openai>1",
"pysbd>=0.3.4",
"nest-asyncio",
11 changes: 1 addition & 10 deletions src/ragas/embeddings/__init__.py
@@ -1,15 +1,6 @@
from ragas.embeddings.base import (
AzureOpenAIEmbeddings,
BaseRagasEmbeddings,
FastEmbedEmbeddings,
HuggingfaceEmbeddings,
OpenAIEmbeddings,
)
from ragas.embeddings.base import BaseRagasEmbeddings, HuggingfaceEmbeddings

__all__ = [
"HuggingfaceEmbeddings",
"OpenAIEmbeddings",
"AzureOpenAIEmbeddings",
"BaseRagasEmbeddings",
"FastEmbedEmbeddings",
]
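The diff above removes ragas' own `OpenAIEmbeddings`, `AzureOpenAIEmbeddings`, and `FastEmbedEmbeddings` wrappers, leaving only `BaseRagasEmbeddings` and `HuggingfaceEmbeddings`; Azure (and other) embeddings are now passed in as plain langchain embedding objects via `evaluate(..., embeddings=...)`. A toy sketch of the interface contract this cleanup relies on is below — the `Embeddings` base class stands in for langchain's interface and `ToyEmbeddings` is a hypothetical drop-in, not ragas or langchain code.

```python
from abc import ABC, abstractmethod


class Embeddings(ABC):
    """Stand-in for the langchain embeddings interface (assumed shape)."""

    @abstractmethod
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def embed_query(self, text: str) -> list[float]: ...


class ToyEmbeddings(Embeddings):
    """Hypothetical implementation; AzureOpenAIEmbeddings satisfies the same contract."""

    def embed_query(self, text: str) -> list[float]:
        # Deterministic toy vector: string length plus a small character checksum.
        return [float(len(text)), float(sum(map(ord, text)) % 97)]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]
```

Because any object exposing `embed_documents`/`embed_query` satisfies the contract, ragas no longer needs per-provider wrapper classes — which is what makes the nine-line `__init__.py` above sufficient.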