From 7ad4cfeeb9b319a27a7180954f84646f4ce21bdc Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Thu, 3 Aug 2023 08:14:26 -0700 Subject: [PATCH 1/4] updated getting started documentation --- .../integrations/vectorstores/vectara.ipynb | 74 ++++++++++++------- 1 file changed, 49 insertions(+), 25 deletions(-) diff --git a/docs/extras/integrations/vectorstores/vectara.ipynb b/docs/extras/integrations/vectorstores/vectara.ipynb index 5a6aa0f1746440..6b757bc8be82a1 100644 --- a/docs/extras/integrations/vectorstores/vectara.ipynb +++ b/docs/extras/integrations/vectorstores/vectara.ipynb @@ -8,32 +8,43 @@ "source": [ "# Vectara\n", "\n", - ">[Vectara](https://vectara.com/) is a API platform for building LLM-powered applications. It provides a simple to use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n", + ">[Vectara](https://vectara.com/) is an API platform for building GenAI applications. It provides an easy-to-use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n", + "See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n", "\n", + "This notebook shows how to use functionality related to `Vectara`'s integration with LangChain.\n", + "Note that unlike many other integrations in this category, Vectara provides an end-to-end managed service for [Grounded Generation](https://vectara.com/grounded-generation/) (aka retrieval augmented generation), which includes:\n", + "1. A way to extract text from document files and chunk them into sentences.\n", + "2. Its own embeddings model and vector store - each text segment is encoded into a vector embedding and stored in the Vectara internal vector store.\n", + "3. A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching))\n", "\n", - "This notebook shows how to use functionality related to the `Vectara` vector database or the `Vectara` retriever. \n", - "\n", - "See the [Vectara API documentation ](https://docs.vectara.com/docs/) for more information on how to use the API." + "All of these are supported in this LangChain integration." ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "aac9563e", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:22.282884Z", - "start_time": "2023-04-04T10:51:21.408077Z" - }, - "tags": [] - }, - "outputs": [], + "cell_type": "markdown", + "id": "dc0f4344", + "metadata": {}, "source": [ - "import os\n", - "from langchain.embeddings import FakeEmbeddings\n", - "from langchain.text_splitter import CharacterTextSplitter\n", - "from langchain.vectorstores import Vectara\n", - "from langchain.document_loaders import TextLoader" + "# Setup\n", + "\n", + "You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps:\n", + "1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.

\n", + "2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.

\n", + "3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n", + "\n", + "To use LangChain with Vectara, you'll need to have these three values: customer ID, corpus ID and api_key.\n", + "You can provide those to LangChain in two ways:\n", + "1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`

\n", + "2. Add them to the Vectara vectorstore object constructor as follows:\n", + "\n", + "\n", + "```\n", + "vectorstore = Vectara(\n", + " vectara_customer_id=vectara_customer_id,\n", + " vectara_corpus_id=vectara_corpus_id,\n", + " vectara_api_key=vectara_api_key\n", + " )\n", + "```" ] }, { @@ -44,8 +55,21 @@ "source": [ "## Connecting to Vectara from LangChain\n", "\n", - "The Vectara API provides simple API endpoints for indexing and querying, which is encapsulated in the Vectara integration.\n", - "First let's ingest the documents using the from_documents() method:" + "To get started, let's ingest the documents using the from_documents() method.
\n", + "We assume here that you've added your VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and query+indexing VECTARA_API_KEY as environment variables." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04a1f1a0", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings import FakeEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Vectara\n", + "from langchain.document_loaders import TextLoader" ] }, { @@ -87,8 +111,8 @@ "id": "90dbf3e7", "metadata": {}, "source": [ - "Vectara's indexing API provides a file upload API where the file is handled directly by Vectara - pre-processed, chunked optimally and added to the Vectara vector store.\n", - "To use this, we added the add_files() method (and from_files()). \n", + "Vectara's indexing API provides a file upload API where the file is handled directly by Vectara - pre-processed, chunked optimally and added to the Vectara vector store.
\n", + "To use this, we added the add_files() method (as well as from_files()). \n", "\n", "Let's see this in action. We pick two PDF documents to upload: \n", "1. The \"I have a dream\" speech by Dr. King\n", @@ -296,7 +320,7 @@ "source": [ "## Vectara as a Retriever\n", "\n", - "Vectara, as all the other vector stores, can be used also as a LangChain Retriever:" + "Vectara, like all the other LangChain vectorstores, is most often used as a LangChain Retriever:" ] }, { From 3cc9dea7fadba699be5441780b80d7272ae23a21 Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Thu, 3 Aug 2023 08:51:00 -0700 Subject: [PATCH 2/4] updates to doc --- .../integrations/vectorstores/vectara.ipynb | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/docs/extras/integrations/vectorstores/vectara.ipynb b/docs/extras/integrations/vectorstores/vectara.ipynb index 6b757bc8be82a1..008fe706bd01a6 100644 --- a/docs/extras/integrations/vectorstores/vectara.ipynb +++ b/docs/extras/integrations/vectorstores/vectara.ipynb @@ -34,11 +34,23 @@ "\n", "To use LangChain with Vectara, you'll need to have these three values: customer ID, corpus ID and api_key.\n", "You can provide those to LangChain in two ways:\n", - "1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`

\n", - "2. Add them to the Vectara vectorstore object constructor as follows:\n", "\n", + "1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n", "\n", + "> For example, you can set these variables using os.environ and getpass as follows:\n", + "\n", + "```python\n", + "import os\n", + "import getpass\n", + "\n", + "os.environ[\"VECTARA_CUSTOMER_ID\"] = getpass.getpass(\"Vectara Customer ID:\")\n", + "os.environ[\"VECTARA_CORPUS_ID\"] = getpass.getpass(\"Vectara Corpus ID:\")\n", + "os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n", "```\n", + "\n", + "2. Add them to the Vectara vectorstore constructor:\n", + "\n", + "```python\n", "vectorstore = Vectara(\n", " vectara_customer_id=vectara_customer_id,\n", " vectara_corpus_id=vectara_corpus_id,\n", From fc8a235bc041f99e7ba19642fa3095e3e8d4e083 Mon Sep 17 00:00:00 2001 From: Bagatur Date: Thu, 3 Aug 2023 10:33:54 -0700 Subject: [PATCH 3/4] nit --- docs/extras/integrations/vectorstores/vectara.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/extras/integrations/vectorstores/vectara.ipynb b/docs/extras/integrations/vectorstores/vectara.ipynb index 008fe706bd01a6..15918419a02934 100644 --- a/docs/extras/integrations/vectorstores/vectara.ipynb +++ b/docs/extras/integrations/vectorstores/vectara.ipynb @@ -28,8 +28,8 @@ "# Setup\n", "\n", "You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps:\n", - "1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.

\n", - "2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.

\n", + "1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n", + "2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n", "3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. 
\n", "\n", "To use LangChain with Vectara, you'll need to have these three values: customer ID, corpus ID and api_key.\n", @@ -414,7 +414,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.9.1" } }, "nbformat": 4, From c1b56c471eb059ee757526771e8def9495cd4546 Mon Sep 17 00:00:00 2001 From: Bagatur Date: Thu, 3 Aug 2023 14:20:24 -0700 Subject: [PATCH 4/4] nit --- docs/extras/integrations/vectorstores/vectara.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/extras/integrations/vectorstores/vectara.ipynb b/docs/extras/integrations/vectorstores/vectara.ipynb index 15918419a02934..05382091b99c8a 100644 --- a/docs/extras/integrations/vectorstores/vectara.ipynb +++ b/docs/extras/integrations/vectorstores/vectara.ipynb @@ -67,7 +67,7 @@ "source": [ "## Connecting to Vectara from LangChain\n", "\n", - "To get started, let's ingest the documents using the from_documents() method.
\n", + "To get started, let's ingest the documents using the from_documents() method.\n", "We assume here that you've added your VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and query+indexing VECTARA_API_KEY as environment variables." ] }, @@ -123,7 +123,7 @@ "id": "90dbf3e7", "metadata": {}, "source": [ - "Vectara's indexing API provides a file upload API where the file is handled directly by Vectara - pre-processed, chunked optimally and added to the Vectara vector store.
\n", + "Vectara's indexing API provides a file upload API where the file is handled directly by Vectara - pre-processed, chunked optimally and added to the Vectara vector store.\n", "To use this, we added the add_files() method (as well as from_files()). \n", "\n", "Let's see this in action. We pick two PDF documents to upload: \n",