
vectorstores error: "search_phase_execution_exception" after using elastic search #2386

Closed
longgui0318 opened this issue Apr 4, 2023 · 21 comments · Fixed by #2402

Comments

@longgui0318
Contributor

Hi

I'm using Elasticsearch as a vector store with just a simple call, but it raises an error. I called add_documents beforehand and that worked, but calling similarity_search fails. Thanks for checking.

Related Environment

  • docker >> image elasticsearch:7.17.0
  • python >> elasticsearch==7.17.0

Test code

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import ElasticVectorSearch

if __name__ == "__main__":
    embeddings = OpenAIEmbeddings()
    elastic_vector_search = ElasticVectorSearch(
        elasticsearch_url="http://192.168.1.2:9200",
        index_name="test20222",
        embedding=embeddings
    )
    searchResult = elastic_vector_search.similarity_search("What are the characteristics of sharks")

Error

(.venv) apple@xMacBook-Pro ai-chain % python test.py
Traceback (most recent call last):
  File "/Users/apple/work/x/ai-chain/test.py", line 14, in <module>
    result = elastic_vector_search.client.search(index="test20222",query={
  File "/Users/apple/work/x/ai-chain/.venv/lib/python3.9/site-packages/elasticsearch/_sync/client/utils.py", line 414, in wrapped
    return api(*args, **kwargs)
  File "/Users/apple/work/x/ai-chain/.venv/lib/python3.9/site-packages/elasticsearch/_sync/client/__init__.py", line 3798, in search
    return self.perform_request(  # type: ignore[return-value]
  File "/Users/apple/work/x/ai-chain/.venv/lib/python3.9/site-packages/elasticsearch/_sync/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.BadRequestError: BadRequestError(400, 'search_phase_execution_exception', 'runtime error')
@sergerdn
Contributor

sergerdn commented Apr 4, 2023

@longgui0318

Can you confirm that the index has that name? Have you checked Kibana for information on it?
I am asking you because we have some buggy code. When you inserted docs into Elastic, your index name was ignored and a new index was created instead of the one you provided.
Also, we lack a test for it.

I believe the fix should be easy, and the test coverage should also be improved. In my opinion, this slipped through because the GitHub workflow currently runs only unit tests, not functional tests.

    elastic_search.from_documents(
        documents=get_documents(),
        embedding=embedding,
        index_name="my_cool_name",  # the index name did not work as expected, so a new random name was created.
    )

https://github.com/hwchase17/langchain/blob/fe1eb8ca5f57fcd7c566adfc01fa1266349b72f3/langchain/vectorstores/elastic_vector_search.py#L244

[screenshot attached]
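A minimal, self-contained sketch of the suspected bug (the function names here are mine for illustration, not langchain's actual API):

```python
import uuid

def choose_index_name_buggy(**kwargs) -> str:
    # Suspected buggy behavior: a random index name is always generated,
    # silently ignoring any index_name the caller passed.
    return uuid.uuid4().hex

def choose_index_name_fixed(**kwargs) -> str:
    # Expected behavior: honor the caller's index_name and fall back to
    # a generated name only when none was provided.
    return kwargs.get("index_name") or uuid.uuid4().hex

print(choose_index_name_fixed(index_name="my_cool_name"))  # my_cool_name
print(len(choose_index_name_buggy(index_name="my_cool_name")))  # 32
```

With the buggy variant, documents land in a randomly named index while later searches go to the name you asked for.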

@longgui0318
Contributor Author

This is how I saved the documents, so yes, I made sure the index_name was the same and checked it in Kibana:

    elastic_vector_search = ElasticVectorSearch(
        elasticsearch_url="http://192.168.1.2:9200",
        index_name="test20222",
        embedding=embeddings
    )
    elastic_vector_search.add_documents(docs)

@sergerdn
Contributor

sergerdn commented Apr 4, 2023

@longgui0318

Please remove all indexes from ElasticSearch and then run your script to recreate the index with the newly created documents. Once the process is completed, kindly confirm that the index has been created with the required name.

Additionally, I request that you provide a screenshot from Kibana, if possible, that shows all of your indexes.

P.S. Right now, I am working on improving some tests with ElasticVectorSearch to make sure that everything is going as expected.

@longgui0318
Contributor Author

Thank you for your attention. Here is my Kibana information:

[screenshot: Kibana index information]

@longgui0318
Contributor Author

@sergerdn all of my indexes
[screenshot: list of all indexes]

@longgui0318
Contributor Author

I've narrowed down the problem. When the index is built with from_documents, the resulting object is accessible as normal. But if we construct ElasticVectorSearch directly and then add content via add_documents, similarity_search raises an error.

@longgui0318
Contributor Author

The difference is in this line of code:

client.indices.create(index=index_name, mappings=mapping)
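A possible workaround while the fix lands: create the index with an explicit mapping yourself before calling add_documents. This is a sketch under the assumption that the mapping below matches what the from_texts path creates; the helper name is mine, not langchain's.

```python
def dense_vector_mapping(dims: int = 1536) -> dict:
    # Mapping with the same shape that the from_texts path passes to
    # client.indices.create(index=index_name, mappings=mapping);
    # 1536 is the dimensionality of OpenAI embeddings.
    return {
        "properties": {
            "text": {"type": "text"},
            "vector": {"type": "dense_vector", "dims": dims},
        }
    }

# Untested sketch of the workaround itself:
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch("http://192.168.1.2:9200")
#   client.indices.create(index="test20222", mappings=dense_vector_mapping())
#   ...then construct ElasticVectorSearch and call add_documents as before.
```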

@sergerdn
Contributor

sergerdn commented Apr 4, 2023

@longgui0318

To ensure we fully understand the problem, could you please provide code snippets that reproduce the issue? From the description provided, it seems like a familiar bug to me.

Additionally, please share a code snippet that demonstrates that everything is functioning as expected. This will help confirm that you are only seeing the expected index and not an arbitrary one.

@longgui0318
Contributor Author

@sergerdn

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import UnstructuredWordDocumentLoader
from langchain.vectorstores import ElasticVectorSearch

if __name__ == "__main__":

    loader = UnstructuredWordDocumentLoader("test.docx", mode="elements")
    data = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(data)
    embeddings = OpenAIEmbeddings()
    ## Case 1 start: this code works
    elastic_vector_search = ElasticVectorSearch.from_documents(
        docs,
        embedding=embeddings,
        elasticsearch_url="http://192.168.1.110:9200"        
    )
    ## Case 1 END
    searchResult = elastic_vector_search.similarity_search("What are the characteristics of sharks")
    ## Case 2 start: this code fails; the data is saved successfully, but the query raises an exception
    elastic_vector_search = ElasticVectorSearch(
        elasticsearch_url="http://192.168.1.110:9200",
        index_name="test20222",
        embedding=embeddings
    )
    elastic_vector_search.add_documents(docs)
    searchResult = elastic_vector_search.similarity_search("What are the characteristics of sharks")
    ## Case 2 END

@sergerdn
Contributor

sergerdn commented Apr 4, 2023

Okay, it seems to be the bug I described above, at #2386 (comment).

Updating the tests properly is proving harder for me than fixing the bug itself. Please be patient; I will work on a fix.

@longgui0318
Contributor Author

Thanks!

hwchase17 pushed a commit that referenced this issue Apr 5, 2023
- Create a new docker-compose file to start an Elasticsearch instance
for integration tests.
- Add new tests to `test_elasticsearch.py` to verify Elasticsearch
functionality.
- Include an optional group `test_integration` in the `pyproject.toml`
file. This group should contain dependencies for integration tests and
can be installed using the command `poetry install --with
test_integration`. Any new dependencies should be added by running
`poetry add some_new_deps --group "test_integration" `

Note:
The new tests run in live mode, which involves end-to-end testing against the OpenAI API. In the future, adding `pytest-vcr` to record and replay all API requests would be a nice addition to the testing process. More info:
https://pytest-vcr.readthedocs.io/en/latest/

Fixes #2386
@sergerdn
Contributor

sergerdn commented Apr 6, 2023

I made a mistake on the test and fixed another bug, but not the one we originally talked about.

@longgui0318
Contributor Author

longgui0318 commented Apr 6, 2023

@sergerdn I think the information I gave may not have been accurate enough.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import UnstructuredWordDocumentLoader
from langchain.vectorstores import ElasticVectorSearch

if __name__ == "__main__":

    loader = UnstructuredWordDocumentLoader("test.docx", mode="elements")
    data = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(data)
    embeddings = OpenAIEmbeddings()
    ## Please note that the index test20222 was not created before this point
    ## this code fails later; the data itself is saved successfully
    elastic_vector_search = ElasticVectorSearch(
        elasticsearch_url="http://192.168.1.110:9200",
        index_name="test20222",
        embedding=embeddings
    )
    ## Only now are the documents added
    elastic_vector_search.add_documents(docs)
    ## the query raises 'search_phase_execution_exception'. With this approach,
    ## client.indices.create(index=index_name, mappings=mapping) was never executed before add_documents
    searchResult = elastic_vector_search.similarity_search("What are the characteristics of sharks")

@sergerdn
Contributor

sergerdn commented Apr 6, 2023

It appears that it was executed before adding:

   raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.BadRequestError: BadRequestError(400, 'resource_already_exists_exception', 'index [custom_index_68ae159b3ddc4c02b12cf6660e2e0499/6glhH4tQRlOzC6WKZQrwdg] already exists')

@longgui0318
Contributor Author

No, it was executed after adding the data and after confirming in Kibana that the data was there. My guess is that some key index initialization was missing, which caused the inconsistency between the two structures.

@sergerdn
Contributor

sergerdn commented Apr 6, 2023

No, it was executed after adding the data and after confirming in Kibana that the data was there. My guess is that some key index initialization was missing, which caused the inconsistency between the two structures.

I believe you are correct. An index was created, but with incorrect mappings.

Correct mappings:

{
  "mappings": {
    "properties": {
      "metadata": {
        "properties": {
          "source": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "text": {
        "type": "text"
      },
      "vector": {
        "type": "dense_vector",
        "dims": 1536
      }
    }
  }
}

Wrong mappings:

{
  "mappings": {
    "properties": {
      "metadata": {
        "properties": {
          "source": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "vector": {
        "type": "float"
      }
    }
  }
}
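Why the wrong mapping breaks similarity_search: the search is a `script_score` query built around a `cosineSimilarity` script, and `cosineSimilarity` only works on fields mapped as `dense_vector`. Against the dynamically inferred `float` mapping above, the script fails during the search phase, which surfaces as `search_phase_execution_exception`. Roughly the query shape (a sketch, not the exact client code):

```python
def script_score_query(query_vector: list) -> dict:
    # Approximate shape of the query ElasticVectorSearch issues for
    # similarity_search. cosineSimilarity requires the 'vector' field
    # to be mapped as dense_vector; against the wrongly inferred
    # "float" mapping it raises a runtime error during the search phase.
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
```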

@sergerdn
Contributor

sergerdn commented Apr 6, 2023

@longgui0318

Thank you very much for your help. I have fixed it!

Once it is merged, please test it on your end to ensure the new changes work properly.
#2445

hwchase17 pushed a commit that referenced this issue Apr 7, 2023
…ests (#2445)

Using `pytest-vcr` in integration tests has several benefits. Firstly,
it removes the need to mock external services, as VCR records and
replays HTTP interactions on the fly. Secondly, it simplifies the
integration test setup by eliminating the need to set up and tear down
external services in some cases. Finally, it allows for more reliable
and deterministic integration tests by ensuring that HTTP interactions
are always replayed with the same response.
Overall, `pytest-vcr` is a valuable tool for simplifying integration
test setup and improving test reliability.

This commit adds the `pytest-vcr` package as a dependency for
integration tests in the `pyproject.toml` file. It also introduces two
new fixtures in `tests/integration_tests/conftest.py` files for managing
cassette directories and VCR configurations.

In addition, the
`tests/integration_tests/vectorstores/test_elasticsearch.py` file has
been updated to use the `@pytest.mark.vcr` decorator for recording and
replaying HTTP interactions.

Finally, this commit removes the `documents` fixture from the
`test_elasticsearch.py` file and replaces it with a new fixture defined
in `tests/integration_tests/vectorstores/conftest.py` that yields a list
of documents to use in any other tests.

This also includes my second attempt to fix issue #2386.

Maybe related #2484
@carcaussa

Hi, I'm using version 0.0.186 and hitting this error; apparently I'm experiencing the same mapping issue. How can I find out when this is merged and included in a langchain release?
Thank you very much
Thank you very much

@Baro1502

Baro1502 commented Aug 2, 2023

Hi, I am currently using version 0.0.248 and I still encounter this issue. Is there any way I can address this? Thank you

@luccafabro

luccafabro commented Aug 2, 2023

@Baro1502 try creating your index with this structure before inserting:

    PUT /your_index
    {
      "mappings": {
        "properties": {
          "metadata": {
            "properties": {
              "source": {
                "type": "text",
                "fields": {
                  "keyword": { "type": "keyword", "ignore_above": 256 }
                }
              }
            }
          },
          "text": { "type": "text" },
          "vector": { "type": "dense_vector", "dims": 1536 }
        }
      }
    }

@rizwanalvi1

In my case, it happened when I used different embedding models for storing and retrieving/searching: I was using OpenAIEmbeddings() for storing and, by mistake, instructor-large for searching.

hope this feedback would be of some use as well.
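A cheap guard against this embedding mix-up is to compare the query vector's length with the index's dense_vector dims before searching. This is a sketch assuming a 1536-dim index (OpenAI embeddings); the helper is hypothetical, not part of langchain.

```python
def check_query_dims(query_vector, index_dims: int = 1536) -> None:
    # A mismatch between the query vector's length and the index's
    # dense_vector "dims" makes Elasticsearch reject the script_score
    # query at search time, producing the same opaque error.
    if len(query_vector) != index_dims:
        raise ValueError(
            f"query vector has {len(query_vector)} dims, "
            f"index expects {index_dims}; are you using the same "
            "embedding model for indexing and searching?"
        )

check_query_dims([0.0] * 1536)  # OK: lengths match
```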
