# OpenSearch

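> [OpenSearch](https://opensearch.org/) is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications, licensed under Apache 2.0. `OpenSearch` is a distributed search and analytics engine based on `Apache Lucene`.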


This notebook shows how to use functionality related to the `OpenSearch` database.

To run it, you need an OpenSearch instance up and running: [see here for an easy Docker installation](https://hub.docker.com/r/opensearchproject/opensearch).

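By default, `similarity_search` performs Approximate k-NN search, which uses one of several algorithms (lucene, nmslib, or faiss) and is recommended for large datasets. For brute-force (exact) search, two other search methods are available: Script Scoring and Painless Scripting. See [this page](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more details.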
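
To make the distinction concrete, here is a minimal pure-Python sketch of what a brute-force (exact) k-NN search does conceptually: it scores every stored vector against the query and keeps the best `k`. The helper names `cosine_similarity` and `exact_knn` are hypothetical and exist only for this illustration; this is not how OpenSearch implements either search method.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def exact_knn(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every vector against the query and return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embeddings have far more dimensions.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
top = exact_knn([1.0, 0.1], vectors, k=2)
print(top)
```

The cost of this loop grows linearly with the number of stored vectors, which is why approximate methods, which avoid visiting every vector, are usually preferred for large datasets.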


## Installation
Install the Python client.

In [None]:
!pip install opensearch-py

We want to use `OpenAIEmbeddings`, so we need an OpenAI API key.

In [None]:
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import OpenSearchVectorSearch
from langchain.document_loaders import TextLoader

In [None]:
loader = TextLoader('../../../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
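
The cell above splits the loaded text into chunks of at most roughly 1000 characters with no overlap. As a rough illustration of that idea, the sketch below greedily packs separator-delimited pieces into chunks. It is a simplified stand-in, not LangChain's `CharacterTextSplitter` implementation; the function name `split_by_separator` is hypothetical and the `"\n\n"` separator is an assumption.

```python
from typing import List


def split_by_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
chunks = split_by_separator(sample, chunk_size=10)
print(chunks)
```

With `chunk_size=10`, the first two pieces fit together in one chunk and the third starts a new one. No text is repeated between neighbouring chunks, which mirrors `chunk_overlap=0`.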

### similarity_search using Approximate k-NN

`similarity_search` using `Approximate k-NN` search, first with the default settings and then with custom parameters.

In [None]:
docsearch = OpenSearchVectorSearch.from_documents(
    docs, 
    embeddings, 
    opensearch_url="http://localhost:9200"
)

# If using the default Docker installation, use this instantiation instead:
# docsearch = OpenSearchVectorSearch.from_documents(
#     docs, 
#     embeddings, 
#     opensearch_url="https://localhost:9200", 
#     http_auth=("admin", "admin"),     
#     use_ssl = False,
#     verify_certs = False,
#     ssl_assert_hostname = False,
#     ssl_show_warn = False,
# )

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
# Store the hits in a separate variable so that `docs` (the split documents) stays intact.
results = docsearch.similarity_search(query, k=10)

In [None]:
print(results[0].page_content)

In [None]:
# Build the index with custom parameters: faiss engine, inner-product space, and HNSW settings.
docsearch = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings,
    opensearch_url="http://localhost:9200",
    engine="faiss",
    space_type="innerproduct",
    ef_construction=256,
    m=48,
)

query = "What did the president say about Ketanji Brown Jackson"
results = docsearch.similarity_search(query)

In [None]:
print(results[0].page_content)
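
For orientation, the custom parameters above (`engine`, `space_type`, `ef_construction`, `m`) correspond to the `method` definition of a `knn_vector` field in an OpenSearch index. The sketch below builds such an index body by hand. The helper `build_knn_index_body` is hypothetical, and the field name `vector_field` and the dimension `1536` are assumptions; the body that `OpenSearchVectorSearch` actually sends may differ.

```python
import json
from typing import Any, Dict


def build_knn_index_body(
    dimension: int,
    engine: str,
    space_type: str,
    ef_construction: int,
    m: int,
    vector_field: str = "vector_field",  # assumed field name
) -> Dict[str, Any]:
    """Build an index-creation body for an HNSW-backed knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": engine,
                        "space_type": space_type,
                        "parameters": {"ef_construction": ef_construction, "m": m},
                    },
                }
            }
        },
    }


body = build_knn_index_body(
    dimension=1536,  # assumption: must match the size of the embedding vectors in use
    engine="faiss",
    space_type="innerproduct",
    ef_construction=256,
    m=48,
)
print(json.dumps(body, indent=2))
```

In HNSW, larger values of `m` and `ef_construction` generally improve recall at the cost of more memory and longer indexing time.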

### similarity_search using Script Scoring

`similarity_search` using `Script Scoring` with custom parameters.

In [None]:
# is_appx_search=False creates an index intended for exact (brute-force) search.
docsearch = OpenSearchVectorSearch.from_documents(
    docs, embeddings, opensearch_url="http://localhost:9200", is_appx_search=False
)

query = "What did the president say about Ketanji Brown Jackson"
results = docsearch.similarity_search(query, k=1, search_type="script_scoring")

In [None]:
print(results[0].page_content)
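
As a rough picture of what a Script Scoring search looks like at the OpenSearch level, the sketch below assembles an exact k-NN query that uses the k-NN plugin's `knn_score` script. The helper `build_script_score_query` is hypothetical, and the field name `vector_field` and the `l2` space type are assumptions; the request generated by `OpenSearchVectorSearch` may differ in detail.

```python
import json
from typing import Any, Dict, Sequence


def build_script_score_query(
    query_vector: Sequence[float],
    k: int,
    space_type: str = "l2",  # assumed space type for this sketch
    vector_field: str = "vector_field",  # assumed field name
) -> Dict[str, Any]:
    """Build an exact k-NN request body that scores every matching document."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # match_all means every document in the index is scored.
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": vector_field,
                        "query_value": list(query_vector),
                        "space_type": space_type,
                    },
                },
            }
        },
    }


request_body = build_script_score_query([0.1, 0.2, 0.3], k=1)
print(json.dumps(request_body, indent=2))
```

Because the inner query selects which documents get scored, replacing `match_all` with a narrower query is how a pre-filter reduces the work of an exact search.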

### similarity_search using Painless Scripting

`similarity_search` using `Painless Scripting` with custom parameters.

In [None]:
docsearch = OpenSearchVectorSearch.from_documents(
    docs, embeddings, opensearch_url="http://localhost:9200", is_appx_search=False
)
# Named pre_filter rather than filter to avoid shadowing the Python built-in.
pre_filter = {"bool": {"filter": {"term": {"text": "smuggling"}}}}
query = "What did the president say about Ketanji Brown Jackson"
results = docsearch.similarity_search(
    query,
    search_type="painless_scripting",
    space_type="cosineSimilarity",
    pre_filter=pre_filter,
)

In [None]:
print(results[0].page_content)
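
The sketch below shows, under stated assumptions, how a pre-filter and a Painless scoring expression can be combined in one `script_score` query. `cosineSimilarity` is one of the k-NN plugin's Painless extension functions; the helper `build_painless_query` and the field name `vector_field` are hypothetical, and the exact request produced by `OpenSearchVectorSearch` may differ.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_painless_query(
    query_vector: Sequence[float],
    pre_filter: Optional[Dict[str, Any]] = None,
    vector_field: str = "vector_field",  # assumed field name
) -> Dict[str, Any]:
    """Build a script_score query that ranks pre-filtered documents by cosine similarity."""
    return {
        "query": {
            "script_score": {
                # Only documents matching this inner query are scored.
                "query": pre_filter if pre_filter is not None else {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps the score non-negative, since cosine
                    # similarity ranges from -1 to 1.
                    "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
                    "params": {
                        "field": vector_field,
                        "query_value": list(query_vector),
                    },
                },
            }
        }
    }


smuggling_filter = {"bool": {"filter": {"term": {"text": "smuggling"}}}}
painless_body = build_painless_query([0.1, 0.2, 0.3], pre_filter=smuggling_filter)
print(json.dumps(painless_body, indent=2))
```

Here the `term` filter restricts scoring to documents whose `text` field contains the token `smuggling`, so the similarity script runs on far fewer documents than a full scan.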

### Using a preexisting OpenSearch instance

It is also possible to use a preexisting OpenSearch instance whose documents already have vectors present.

In [None]:
# This is just an example; change these values to point to your own OpenSearch instance.
docsearch = OpenSearchVectorSearch(
    index_name="index-*",
    embedding_function=embeddings,
    opensearch_url="http://localhost:9200",
)

# You can specify custom field names to match the fields you use to store
# your embedding, document text value, and metadata.
docs = docsearch.similarity_search(
    "Who was asking about getting lunch today?",
    search_type="script_scoring",
    space_type="cosinesimil",
    vector_field="message_embedding",
    text_field="message",
    metadata_field="message_metadata",
)
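
To illustrate why the field names matter, the sketch below reads the text and metadata out of a search hit using configurable field names. The sample hit is invented for illustration and the helper `extract_text_and_metadata` is hypothetical; only the `_source` key is standard in OpenSearch search responses.

```python
from typing import Any, Dict, Tuple


def extract_text_and_metadata(
    hit: Dict[str, Any],
    text_field: str = "text",
    metadata_field: str = "metadata",
) -> Tuple[str, Dict[str, Any]]:
    """Return the document text and metadata stored under the given field names."""
    source = hit["_source"]
    return source[text_field], source.get(metadata_field, {})


# Invented example of a hit from an index that uses custom field names.
sample_hit = {
    "_id": "1",
    "_score": 1.8,
    "_source": {
        "message_embedding": [0.1, 0.2, 0.3],
        "message": "Anyone up for lunch today?",
        "message_metadata": {"channel": "general"},
    },
}

text, metadata = extract_text_and_metadata(
    sample_hit, text_field="message", metadata_field="message_metadata"
)
print(text, metadata)
```

If the field names passed to the search do not match the names used when the documents were indexed, a lookup like the one above has nothing to read, so they need to agree with the existing index.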