explodinggradients · jjmachan · Aug 1, 2024 · Jul 27, 2024 · Jul 27, 2024 · Jul 30, 2024
diff --git a/docs/howtos/applications/cost.ipynb b/docs/howtos/applications/cost.ipynb
@@ -0,0 +1,213 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Understand Cost and Usage of Operations\n",
+    "\n",
+    "When using LLMs for evaluation and test set generation, cost will be an important factor. Ragas provides you some tools to help you with that."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Understanding `TokenUsageParser`\n",
+    "\n",
+    "By default Ragas does not calculate the usage of tokens for `evaluate()`. This is because langchain's LLMs do not always return information about token usage in a uniform way. So in order to get the usage data, we have to implement a `TokenUsageParser`. \n",
+    "\n",
+    "A `TokenUsageParser` is function that parses the `LLMResult` or `ChatResult` from langchain models `generate_prompt()` function and outputs `TokenUsage` which Ragas expects.\n",
+    "\n",
+    "For an example here is one that will parse OpenAI by using a parser we have defined."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "TokenUsage(input_tokens=9, output_tokens=9, model='')"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from langchain_openai.chat_models import ChatOpenAI\n",
+    "from langchain_core.prompt_values import StringPromptValue\n",
+    "\n",
+    "gpt4o = ChatOpenAI(model=\"gpt-4o\")\n",
+    "p = StringPromptValue(text=\"hai there\")\n",
+    "llm_result = gpt4o.generate_prompt([p])\n",
+    "\n",
+    "# lets import a parser for OpenAI\n",
+    "from ragas.cost import get_token_usage_for_openai\n",
+    "\n",
+    "get_token_usage_for_openai(llm_result)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can define your own or import parsers if they are defined. If you would like to suggest parser for LLM providers or contribute your own ones please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.\n",
+    "\n",
+    "You can use it for evaluations as so. Using example from [get started](get-started-evaluation) here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Repo card metadata block was not found. Setting CardData to empty.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "DatasetDict({\n",
+       "    eval: Dataset({\n",
+       "        features: ['question', 'ground_truth', 'answer', 'contexts'],\n",
+       "        num_rows: 20\n",
+       "    })\n",
+       "})"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from datasets import load_dataset\n",
+    "from ragas.metrics import (\n",
+    "    answer_relevancy,\n",
+    "    faithfulness,\n",
+    "    context_recall,\n",
+    "    context_precision,\n",
+    ")\n",
+    "\n",
+    "amnesty_qa = load_dataset(\"explodinggradients/amnesty_qa\", \"english_v2\")\n",
+    "amnesty_qa"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "086c27f6a34d4843be5fa367e913b87a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Evaluating:   0%|          | 0/80 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from ragas import evaluate\n",
+    "from ragas.cost import get_token_usage_for_openai\n",
+    "\n",
+    "result = evaluate(\n",
+    "    amnesty_qa[\"eval\"],\n",
+    "    metrics=[\n",
+    "        context_precision,\n",
+    "        faithfulness,\n",
+    "        answer_relevancy,\n",
+    "        context_recall,\n",
+    "    ],\n",
+    "    llm=gpt4o,\n",
+    "    token_usage_parser=get_token_usage_for_openai,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "TokenUsage(input_tokens=112736, output_tokens=30306, model='')"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result.total_tokens()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can compute the cost for each run by passing in the cost per token to `Result.total_cost()` function.\n",
+    "\n",
+    "In this case GPT-4o costs $5 for 1M input tokens and $15 for 1M output tokens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "1.01827"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "ragas",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/howtos/applications/index.md b/docs/howtos/applications/index.md
@@ -7,6 +7,7 @@ usecases to solve problems you might encounter when your building.
 :caption: Applications
 
 data_preparation
+cost
 compare_embeddings
 compare_llms
 custom_prompts

diff --git a/src/experimental/ragas_experimental/testset/extractors/regex_based.py b/src/experimental/ragas_experimental/testset/extractors/regex_based.py
@@ -77,7 +77,7 @@ def merge_extractors(self, *extractors) -> t.List[Extractor]:
                     if pattern.startswith("(?"):
                         flag_end = pattern.index(")")
                         flags = pattern[2:flag_end]
-                        pattern = pattern[flag_end + 1:]
+                        pattern = pattern[flag_end + 1 :]
                     # Wrap the pattern in a non-capturing group with flags
                     processed_patterns.append(f"(?{flags}:{pattern})")
 
@@ -97,6 +97,7 @@ def merge_extractors(self, *extractors) -> t.List[Extractor]:
             )
         return extractors_to_return
 
+
 links_extractor_pattern = r"(?i)\b(?:https?://|www\.)\S+\b"
 emails_extractor_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
 markdown_headings_pattern = r"^(#{1,6})\s+(.*)"
@@ -109,5 +110,5 @@ def merge_extractors(self, *extractors) -> t.List[Extractor]:
 )
 markdown_headings = RulebasedExtractor(
     regex=Regex(name="markdown_headings", pattern=markdown_headings_pattern),
-    is_multiline=True
+    is_multiline=True,
 )
diff --git a/src/experimental/ragas_experimental/testset/loaders/__init__.py b/src/experimental/ragas_experimental/testset/loaders/__init__.py
@@ -1,3 +1,3 @@
 from ragas_experimental.testset.loaders.ragas_loader import RAGASLoader
 
-__all__ = ["RAGASLoader"]
+__all__ = ["RAGASLoader"]