Arcee.ai LLM & Retriever integration (langchain-ai#11579)

- **Description:** This PR introduces a new LLM and Retriever API to https://arcee.ai for the python client
- **Issue:** implements the integrations as requested in langchain-ai#11578
- **Dependencies:** no dependencies are required
- **Tag maintainer:** @hwchase17
- **Twitter handle:** shwooobham

**✅ `make format`, `make lint` and `make test` runs locally.**

```shell
=========== 1245 passed, 277 skipped, 20 warnings in 16.26s ===========

./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done! ✨ 🍰 ✨
1818 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1815 source files
[ "." = "" ] || poetry run black .
All done! ✨ 🍰 ✨
1818 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```

**Contributions**

1. Arcee (langchain/llms), ArceeRetriever (langchain/retrievers), ArceeWrapper (langchain/utilities)
2. docs for Arcee (llms/arcee.py) and ArceeRetriever(retrievers/arcee.py)
3. cc: @Jacobsolawetz @Ben-Epstein

---------

Co-authored-by: Shubham <shubham@sORo.local>
hoanq1811 · Feb 2, 2024 · 18e3b0b
1 parent cc4cfb8
commit 18e3b0b
Showing 8 changed files with 773 additions and 0 deletions.
diff --git a/docs/docs_skeleton/docs/integrations/llms/arcee.ipynb b/docs/docs_skeleton/docs/integrations/llms/arcee.ipynb
@@ -0,0 +1,146 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Arcee\n",
+    "This notebook demonstrates how to use the `Arcee` class for generating text using Arcee's Domain Adapted Language Models (DALMs)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setup\n",
+    "\n",
+    "Before using Arcee, make sure the Arcee API key is set as the `ARCEE_API_KEY` environment variable. You can also pass the API key as a named parameter."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.llms import Arcee\n",
+    "\n",
+    "# Create an instance of the Arcee class\n",
+    "arcee = Arcee(\n",
+    "    model=\"DALM-PubMed\",\n",
+    "    # arcee_api_key=\"ARCEE-API-KEY\" # if not already set in the environment\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Additional Configuration\n",
+    "\n",
+    "You can also configure Arcee's parameters such as `arcee_api_url`, `arcee_app_url`, and `model_kwargs` as needed.\n",
+    "Setting `model_kwargs` at object initialization makes those parameters the defaults for all subsequent calls that generate a response."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "arcee = Arcee(\n",
+    "    model=\"DALM-Patent\",\n",
+    "    # arcee_api_key=\"ARCEE-API-KEY\", # if not already set in the environment\n",
+    "    arcee_api_url=\"https://custom-api.arcee.ai\", # default is https://api.arcee.ai\n",
+    "    arcee_app_url=\"https://custom-app.arcee.ai\", # default is https://app.arcee.ai\n",
+    "    model_kwargs={\n",
+    "        \"size\": 5,\n",
+    "        \"filters\": [\n",
+    "            {\n",
+    "                \"field_name\": \"document\",\n",
+    "                \"filter_type\": \"fuzzy_search\",\n",
+    "                \"value\": \"Einstein\"\n",
+    "            }\n",
+    "        ]\n",
+    "    }\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Generating Text\n",
+    "\n",
+    "You can generate text from Arcee by providing a prompt. Here's an example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate text\n",
+    "prompt = \"Can AI-driven music therapy contribute to the rehabilitation of patients with disorders of consciousness?\"\n",
+    "response = arcee(prompt)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Additional parameters\n",
+    "\n",
+    "Arcee allows you to apply `filters` and set the `size` (the number of retrieved documents) to aid text generation. Filters help narrow down the results. Here's how to use these parameters:\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define filters\n",
+    "filters = [\n",
+    "    {\n",
+    "        \"field_name\": \"document\",\n",
+    "        \"filter_type\": \"fuzzy_search\",\n",
+    "        \"value\": \"Einstein\"\n",
+    "    },\n",
+    "    {\n",
+    "        \"field_name\": \"year\",\n",
+    "        \"filter_type\": \"strict_search\",\n",
+    "        \"value\": \"1905\"\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "# Generate text with filters and size params\n",
+    "response = arcee(prompt, size=5, filters=filters)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
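The notebook above states that `model_kwargs` set at initialization act as defaults for later calls, and that `size` and `filters` can also be passed per call. The following is a minimal sketch of one way such defaults could be combined with per-call arguments. It assumes per-call values take precedence, which the diff shown here does not confirm, and `merge_generation_params` is a hypothetical helper, not part of LangChain or the Arcee client.

```python
from typing import Any, Dict, Optional


def merge_generation_params(
    defaults: Optional[Dict[str, Any]] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Combine init-time defaults with per-call keyword arguments.

    Assumption: per-call values win over defaults, and overrides set to
    None are ignored so the default is kept.
    """
    merged = dict(defaults or {})  # copy so the caller's dict is not mutated
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


# Defaults mirroring the `model_kwargs` used in the notebook.
model_kwargs = {
    "size": 5,
    "filters": [
        {
            "field_name": "document",
            "filter_type": "fuzzy_search",
            "value": "Einstein",
        }
    ],
}

# A per-call `size` replaces the default; `filters` falls back to the default.
params = merge_generation_params(model_kwargs, size=3)
print(params["size"])
print(len(params["filters"]))
```

Copying the defaults before updating keeps the instance-level `model_kwargs` unchanged between calls, which is what "default for all subsequent calls" implies.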
diff --git a/docs/docs_skeleton/docs/integrations/retrievers/arcee.ipynb b/docs/docs_skeleton/docs/integrations/retrievers/arcee.ipynb
@@ -0,0 +1,141 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Arcee Retriever\n",
+    "This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's Domain Adapted Language Models (DALMs)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setup\n",
+    "\n",
+    "Before using `ArceeRetriever`, make sure the Arcee API key is set as the `ARCEE_API_KEY` environment variable. You can also pass the API key as a named parameter."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.retrievers import ArceeRetriever\n",
+    "\n",
+    "retriever = ArceeRetriever(\n",
+    "    model=\"DALM-PubMed\",\n",
+    "    # arcee_api_key=\"ARCEE-API-KEY\" # if not already set in the environment\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Additional Configuration\n",
+    "\n",
+    "You can also configure `ArceeRetriever`'s parameters such as `arcee_api_url`, `arcee_app_url`, and `model_kwargs` as needed.\n",
+    "Setting `model_kwargs` at object initialization makes its filters and size the defaults for all subsequent retrievals."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "retriever = ArceeRetriever(\n",
+    "    model=\"DALM-PubMed\",\n",
+    "    # arcee_api_key=\"ARCEE-API-KEY\", # if not already set in the environment\n",
+    "    arcee_api_url=\"https://custom-api.arcee.ai\", # default is https://api.arcee.ai\n",
+    "    arcee_app_url=\"https://custom-app.arcee.ai\", # default is https://app.arcee.ai\n",
+    "    model_kwargs={\n",
+    "        \"size\": 5,\n",
+    "        \"filters\": [\n",
+    "            {\n",
+    "                \"field_name\": \"document\",\n",
+    "                \"filter_type\": \"fuzzy_search\",\n",
+    "                \"value\": \"Einstein\"\n",
+    "            }\n",
+    "        ]\n",
+    "    }\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Retrieving documents\n",
+    "You can retrieve relevant documents from uploaded contexts by providing a query. Here's an example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "query = \"Can AI-driven music therapy contribute to the rehabilitation of patients with disorders of consciousness?\"\n",
+    "documents = retriever.get_relevant_documents(query=query)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Additional parameters\n",
+    "\n",
+    "Arcee allows you to apply `filters` and set the `size` (the number of retrieved documents). Filters help narrow down the results. Here's how to use these parameters:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define filters\n",
+    "filters = [\n",
+    "    {\n",
+    "        \"field_name\": \"document\",\n",
+    "        \"filter_type\": \"fuzzy_search\",\n",
+    "        \"value\": \"Music\"\n",
+    "    },\n",
+    "    {\n",
+    "        \"field_name\": \"year\",\n",
+    "        \"filter_type\": \"strict_search\",\n",
+    "        \"value\": \"1905\"\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "# Retrieve documents with filters and size params\n",
+    "documents = retriever.get_relevant_documents(query=query, size=5, filters=filters)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
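Both notebooks build filters as dictionaries with the keys `field_name`, `filter_type`, and `value`. A small sketch of a builder that catches typos in `filter_type` before a request is sent might look like the following. `make_filter` and `KNOWN_FILTER_TYPES` are hypothetical names introduced for illustration, and the set contains only the two filter types that appear in the notebooks; the service may accept others.

```python
from typing import Dict, List

# Assumption: only the filter types seen in the notebooks are listed here.
KNOWN_FILTER_TYPES = {"fuzzy_search", "strict_search"}


def make_filter(field_name: str, filter_type: str, value: str) -> Dict[str, str]:
    """Build one filter dict in the shape used by the notebooks."""
    if filter_type not in KNOWN_FILTER_TYPES:
        raise ValueError(
            f"Unknown filter_type {filter_type!r}; "
            f"expected one of {sorted(KNOWN_FILTER_TYPES)}"
        )
    return {"field_name": field_name, "filter_type": filter_type, "value": value}


# The same filters as in the retriever notebook, built through the helper.
filters: List[Dict[str, str]] = [
    make_filter("document", "fuzzy_search", "Music"),
    make_filter("year", "strict_search", "1905"),
]
print(filters)
```

The resulting list has the same shape as the `filters` argument passed to `get_relevant_documents` in the notebook.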
diff --git a/libs/langchain/langchain/llms/__init__.py b/libs/langchain/langchain/llms/__init__.py
@@ -52,6 +52,12 @@ def _import_anyscale() -> Any:
     return Anyscale
 
 
+def _import_arcee() -> Any:
+    from langchain.llms.arcee import Arcee
+
+    return Arcee
+
+
 def _import_aviary() -> Any:
     from langchain.llms.aviary import Aviary
 
@@ -479,6 +485,8 @@ def __getattr__(name: str) -> Any:
         return _import_anthropic()
     elif name == "Anyscale":
         return _import_anyscale()
+    elif name == "Arcee":
+        return _import_arcee()
     elif name == "Aviary":
         return _import_aviary()
     elif name == "AzureMLOnlineEndpoint":
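The two hunks above register `Arcee` through a deferred import function and a module-level `__getattr__`, so `langchain.llms.arcee` is only imported when `Arcee` is first accessed. The sketch below shows the same mechanism in a single runnable file, under some assumptions: it uses the standard library's `decimal.Decimal` as a stand-in for the integration class, a dictionary lookup in place of the `elif` chain, and a throwaway module object named `lazy_numbers`. All of these names are hypothetical and chosen only for illustration.

```python
import types
from typing import Any, Callable, Dict


def _import_decimal() -> Any:
    # The import runs only when this function is called, not at module load.
    from decimal import Decimal

    return Decimal


# Hypothetical registry of lazily importable names (stands in for the elif chain).
_LAZY_IMPORTS: Dict[str, Callable[[], Any]] = {"Decimal": _import_decimal}


def _module_getattr(name: str) -> Any:
    # Invoked only when normal attribute lookup on the module fails (PEP 562).
    try:
        return _LAZY_IMPORTS[name]()
    except KeyError:
        raise AttributeError(f"Could not find: {name}") from None


# Build a throwaway module object so the sketch runs in a single file.
lazy_numbers = types.ModuleType("lazy_numbers")
lazy_numbers.__getattr__ = _module_getattr

decimal_cls = lazy_numbers.Decimal
print(decimal_cls("1.5") + decimal_cls("2.5"))
```

Accessing an unregistered name raises `AttributeError`, so `hasattr` and `from module import name` behave as they would for an ordinary missing attribute.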
@@ -633,6 +641,7 @@ def __getattr__(name: str) -> Any:
     "AmazonAPIGateway",
     "Anthropic",
     "Anyscale",
+    "Arcee",
     "Aviary",
     "AzureMLOnlineEndpoint",
     "AzureOpenAI",
@@ -713,6 +722,7 @@ def get_type_to_cls_dict() -> Dict[str, Callable[[], Type[BaseLLM]]]:
         "amazon_bedrock": _import_bedrock,
         "anthropic": _import_anthropic,
         "anyscale": _import_anyscale,
+        "arcee": _import_arcee,
         "aviary": _import_aviary,
         "azure": _import_azure_openai,
         "azureml_endpoint": _import_azureml_endpoint,