Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add baidu cloud vectorsearch document #12928

Merged
merged 2 commits into from
Nov 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/api_reference/guide_imports.json

Large diffs are not rendered by default.

160 changes: 160 additions & 0 deletions docs/docs/integrations/vectorstores/baiducloud_vector_search.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Biadu Cloud ElasticSearch VectorSearch\n",
"\n",
">[Baidu Cloud VectorSearch](https://cloud.baidu.com/doc/BES/index.html?from=productToDoc) is a fully managed, enterprise-level distributed search and analysis service which is 100% compatible to open source. Baidu Cloud VectorSearch provides low-cost, high-performance, and reliable retrieval and analysis platform level product services for structured/unstructured data. As a vector database , it supports multiple index types and similarity distance methods. \n",
"\n",
">`Baidu Cloud ElasticSearch` provides a privilege management mechanism, for you to configure the cluster privileges freely, so as to further ensure data security.\n",
"\n",
"This notebook shows how to use functionality related to the `Baidu Cloud ElasticSearch VectorStore`.\n",
"To run, you should have an [Baidu Cloud ElasticSearch](https://cloud.baidu.com/product/bes.html) instance up and running:\n",
"\n",
"Read the [help document](https://cloud.baidu.com/doc/BES/s/8llyn0hh4 ) to quickly familiarize and configure Baidu Cloud ElasticSearch instance."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"After the instance is up and running, follow these steps to split documents, get embeddings, connect to the baidu cloud elasticsearch instance, index documents, and perform vector retrieval."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to install the following Python packages first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install elasticsearch == 7.11.0"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we want to use `QianfanEmbeddings` so we have to get the Qianfan AK and SK. Details for QianFan is related to [Baidu Qianfan Workshop](https://cloud.baidu.com/product/wenxinworkshop)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['QIANFAN_AK'] = getpass.getpass(\"Your Qianfan AK:\")\n",
"os.environ['QIANFAN_SK'] = getpass.getpass(\"Your Qianfan SK:\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Secondly, split documents and get embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"from langchain.embeddings import QianfanEmbeddingsEndpoint\n",
"embeddings = QianfanEmbeddingsEndpoint()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, create a Baidu ElasticeSearch accessable instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a bes instance and index docs.\n",
"from langchain.vectorstores import BESVectorStore\n",
"bes = BESVectorStore.from_documents(\n",
" documents=docs, embedding=embeddings, bes_url=\"your bes cluster url\", index_name=\"your vector index\"\n",
")\n",
"bes.client.indices.refresh(index=\"your vector index\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, Query and retrive data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = bes.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Please feel free to contact <liuboyao@baidu.com> if you encounter any problems during use, and we will do our best to support you."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.17"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading