explodinggradients · shahules786 · Dec 17, 2023 · Dec 15, 2023 · Dec 16, 2023 · Dec 16, 2023
diff --git a/docs/howtos/customisations/embeddings.ipynb b/docs/howtos/customisations/embeddings.ipynb
@@ -0,0 +1,381 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0174eb96",
+   "metadata": {},
+   "source": [
+    "# Using different Embedding Models\n",
+    "\n",
+    "Ragas allows users to change the default embedding model used in the evaluation task.\n",
+    "\n",
+    "This guide will show you how to use different embedding models for evaluation in Ragas."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "55f0f9b9",
+   "metadata": {},
+   "source": [
+    "## Evaluating with Azure Open AI Embeddings\n",
+    "\n",
+    "Ragas uses OpenAI embeddings by default. In this example we use Azure OpenAI embeddings from langchain with the embedding model text-embedding-ada-002. We use gpt-35-turbo-16k from Azure OpenAI as the LLM for evaluation and `AnswerSimilarity` as the metric.\n",
+    "\n",
+    "To start, we initialise gpt-35-turbo-16k from Azure and create a chat model using langchain."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "25c72521-3372-4663-81e4-c159b0edde40",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from langchain.chat_models import AzureChatOpenAI\n",
+    "from ragas.llms import LangchainLLM\n",
+    "\n",
+    "os.environ[\"OPENAI_API_VERSION\"] = \"2023-05-15\"\n",
+    "os.environ[\"OPENAI_API_KEY\"] = \"your-openai-key\"\n",
+    "\n",
+    "azure_model = AzureChatOpenAI(\n",
+    "    deployment_name=\"your-deployment-name\",\n",
+    "    model=\"gpt-35-turbo-16k\",\n",
+    "    azure_endpoint=\"https://your-endpoint.openai.azure.com/\",\n",
+    "    openai_api_type=\"azure\",\n",
+    ")\n",
+    "# wrapper around azure_model\n",
+    "ragas_azure_model = LangchainLLM(azure_model)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f1fdb48b",
+   "metadata": {},
+   "source": [
+    "To use Azure OpenAI embeddings, we instantiate an object of the `AzureOpenAIEmbeddings` class from langchain."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.embeddings import AzureOpenAIEmbeddings\n",
+    "\n",
+    "azure_embeddings = AzureOpenAIEmbeddings(\n",
+    "    deployment=\"your-deployment-name\",\n",
+    "    model=\"text-embedding-ada-002\",\n",
+    "    azure_endpoint=\"https://your-endpoint.openai.azure.com/\",\n",
+    "    openai_api_type=\"azure\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "62645da8-6a52-4cbb-bec7-59f7e153cd38",
+   "metadata": {},
+   "source": [
+    "To use `AzureOpenAIEmbeddings` with the `AnswerSimilarity` metric, create an `AnswerSimilarity` object, passing `azure_embeddings` as `embeddings` and the wrapped model as `llm`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "307321ed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ragas.metrics import AnswerSimilarity\n",
+    "\n",
+    "answer_similarity = AnswerSimilarity(llm=ragas_azure_model, embeddings=azure_embeddings)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1930dd49",
+   "metadata": {},
+   "source": [
+    "That's it! `answer_similarity` will now use `AzureOpenAIEmbeddings` under the hood for evaluations.\n",
+    "\n",
+    "Now let's run the evaluation using the example from the [quickstart](../../getstarted/evaluation.md)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "62c0eadb",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "DatasetDict({\n",
+       "    baseline: Dataset({\n",
+       "        features: ['question', 'ground_truths', 'answer', 'contexts'],\n",
+       "        num_rows: 30\n",
+       "    })\n",
+       "})"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# data\n",
+    "from datasets import load_dataset\n",
+    "\n",
+    "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
+    "fiqa_eval"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "c4396f6e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "evaluating with [answer_similarity]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "100%|██████████| 1/1 [00:01<00:00,  1.04s/it]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'answer_similarity': 0.8878}"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# evaluate\n",
+    "from ragas import evaluate\n",
+    "\n",
+    "result = evaluate(\n",
+    "    fiqa_eval[\"baseline\"].select(range(5)),  # showing only 5 for demonstration\n",
+    "    metrics=[answer_similarity]\n",
+    ")\n",
+    "\n",
+    "result"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f490031e-fb73-4170-8762-61cadb4031e6",
+   "metadata": {},
+   "source": [
+    "## Evaluating with FastEmbed Embeddings\n",
+    "\n",
+    "`FastEmbed` is a Python library built for embedding generation, with support for popular text models. Ragas integrates with FastEmbed, which can be used by instantiating an object of the `FastEmbedEmbeddings` class. More information on FastEmbed and its supported models can be found [here](https://github.com/qdrant/fastembed)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "85e313f2-e45c-4551-ab20-4e526e098740",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "100%|██████████| 252M/252M [00:10<00:00, 25.0MiB/s] \n"
+     ]
+    }
+   ],
+   "source": [
+    "from ragas.embeddings import FastEmbedEmbeddings\n",
+    "\n",
+    "fast_embeddings = FastEmbedEmbeddings(model_name=\"BAAI/bge-base-en\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c9ddf74a-9830-4e1a-a4dd-7e5ec17a71e4",
+   "metadata": {},
+   "source": [
+    "Now let's create the `AnswerSimilarity` metric object, passing the llm and the `FastEmbedEmbeddings` object we created as the embeddings."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "2fd4adf3-db15-4c95-bf7c-407266517214",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "answer_similarity2 = AnswerSimilarity(llm=ragas_azure_model, embeddings=fast_embeddings)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "58a610f2-19e5-40ec-bb7d-760c1d608a85",
+   "metadata": {},
+   "source": [
+    "Now you can run the evaluation and analyse the results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "20882d05-1b54-4d17-88a0-f7ada2d6a576",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "evaluating with [answer_similarity]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "100%|██████████| 1/1 [00:03<00:00,  3.26s/it]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'answer_similarity': 0.8938}"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result2 = evaluate(\n",
+    "    fiqa_eval[\"baseline\"].select(range(5)),  # showing only 5 for demonstration\n",
+    "    metrics=[answer_similarity2],\n",
+    ")\n",
+    "\n",
+    "result2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Evaluating with HuggingFace Embeddings\n",
+    "\n",
+    "Ragas supports embedding models from HuggingFace. Using the `HuggingfaceEmbeddings` class in Ragas, embedding models available on HuggingFace can be used directly for the evaluation task."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# To use embedding models from HuggingFace, you need to install the following\n",
+    "%pip install sentence-transformers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ragas.embeddings import HuggingfaceEmbeddings\n",
+    "\n",
+    "hf_embeddings = HuggingfaceEmbeddings(model_name=\"BAAI/bge-small-en\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we follow the same steps as above to use the HuggingFace Embeddings in the ragas metrics and evaluate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "answer_similarity3 = AnswerSimilarity(llm=ragas_azure_model, embeddings=hf_embeddings)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "evaluating with [answer_similarity]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "100%|██████████| 1/1 [00:01<00:00,  1.35s/it]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'answer_similarity': 0.9156}"
+      ]
+     },
+     "execution_count": 21,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result3 = evaluate(\n",
+    "    fiqa_eval[\"baseline\"].select(range(5)),  # showing only 5 for demonstration\n",
+    "    metrics=[answer_similarity3],\n",
+    ")\n",
+    "\n",
+    "result3"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
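
A note on what the notebook above is swapping out. The cells pass different embedding objects to `AnswerSimilarity`, so it may help to see the general shape of an embedding-based similarity score. The sketch below is not Ragas's implementation: it assumes the score is a mean cosine similarity between answer and ground-truth embeddings, and `toy_embed`, `cosine_similarity` and `mean_similarity` are hypothetical names introduced only for illustration. A real run would use one of the embedding models from the notebook instead of the letter-count stand-in.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: counts of letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def mean_similarity(
    answers: Sequence[str],
    ground_truths: Sequence[str],
    embed: Callable[[str], List[float]],
) -> float:
    """Average cosine similarity over (answer, ground truth) pairs."""
    scores = [
        cosine_similarity(embed(a), embed(g))
        for a, g in zip(answers, ground_truths)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    answers = ["Index funds are low cost.", "Pay off debt first."]
    truths = ["Index funds have low fees.", "Clear your debts first."]
    print(round(mean_similarity(answers, truths, toy_embed), 4))
```

Because the embedding function is just a parameter here, changing the model changes the vectors and therefore the score, which is consistent with the three runs in the notebook reporting slightly different `answer_similarity` values on the same five rows.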
diff --git a/docs/howtos/customisations/index.md b/docs/howtos/customisations/index.md
@@ -4,6 +4,7 @@ How to customize Ragas for your needs
 
 :::{toctree}
 llms.ipynb
+embeddings.ipynb
 azure-openai.ipynb
 aws-bedrock.ipynb
 gcp-vertexai.ipynb