GoogleCloudEnterpriseSearchRetriever consistently returns no results #8219

Closed
2 of 14 tasks
mattsday opened this issue Jul 25, 2023 · 10 comments
Labels
  • 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
  • Ɑ: vector store (Related to vector store module)

Comments

@mattsday

System Info

Versions:

langchain==0.0.240
google-cloud-discoveryengine==0.9.1
google-cloud-aiplatform==1.28.1

GoogleCloudEnterpriseSearchRetriever consistently returns zero results without raising an error.

Workarounds / Validations attempted:

  • If I put invalid values (e.g. an invalid engine) it causes an error
  • Searching with these terms from the console works as expected
  • Using discoveryengine_v1beta.SearchServiceClient() directly works as expected and provides results
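
For comparison, a direct call through the client might look like the sketch below. It assumes the google-cloud-discoveryengine package is installed and application default credentials are configured. The function name search_directly, the "global" location, and the "default_config" serving config are illustrative assumptions, not values taken from this report.

```python
from typing import Any, List


def search_directly(
    project_id: str,
    search_engine_id: str,
    query: str,
    location: str = "global",  # assumption: the engine lives in the global location
    serving_config_id: str = "default_config",  # assumption: default serving config
    page_size: int = 3,
) -> List[Any]:
    """Run one search against Enterprise Search without going through LangChain."""
    # Imported lazily so this module can be loaded without the package installed.
    from google.cloud import discoveryengine_v1beta

    client = discoveryengine_v1beta.SearchServiceClient()
    serving_config = client.serving_config_path(
        project=project_id,
        location=location,
        data_store=search_engine_id,
        serving_config=serving_config_id,
    )
    request = discoveryengine_v1beta.SearchRequest(
        serving_config=serving_config,
        query=query,
        page_size=page_size,
    )
    # client.search returns a pager; .results holds the first page of results.
    response = client.search(request)
    return list(response.results)
```

If this returns results while the retriever returns [], the difference most likely lies in how the retriever converts the response, not in the request itself.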

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Given the following code:

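from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever

retriever = GoogleCloudEnterpriseSearchRetriever(
    project_id="my-project-id",  # placeholder for the real project ID
    search_engine_id="my-engine",  # placeholder for the real engine ID
    max_documents=3,
)
query = "Where I can fly to Spain?"
result = retriever.get_relevant_documents(query)
print(result)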

This will always print [] no matter how I tweak the search query etc.

Expected behavior

I would expect at least one search result, so the returned list should not be empty.

@dosubot dosubot bot added the Ɑ: vector store and 🤖:bug labels Jul 25, 2023
@dosubot

dosubot bot commented Jul 25, 2023

Answer generated by a 🤖

Answer

I understand that you're experiencing an issue where the GoogleCloudEnterpriseSearchRetriever is consistently returning no results, despite your validation attempts. You've confirmed that using invalid values triggers an error, and that using the same search terms from the console or directly with discoveryengine_v1beta.SearchServiceClient() works as expected.

One potential cause of this issue could be the filter parameter in the GoogleCloudEnterpriseSearchRetriever class. This parameter is an optional string that represents a filter expression. If it's not provided or if it's set to a value that doesn't match any documents, the retriever might return zero results.

In your provided code, I noticed that the filter parameter is not set when initializing the GoogleCloudEnterpriseSearchRetriever. You might want to try setting the filter parameter to a value that matches the documents you want to retrieve. Here's an example:

retriever = GoogleCloudEnterpriseSearchRetriever(
    project_id="my-project-id",  # placeholder for the real project ID
    search_engine_id="my-engine",  # placeholder for the real engine ID
    filter="your_filter_expression",
    max_documents=3,
)

Please replace "your_filter_expression" with a valid filter expression that matches your documents.

If this doesn't resolve the issue, it could be related to other parameters in the GoogleCloudEnterpriseSearchRetriever class, such as get_extractive_answers, max_extractive_answer_count, max_extractive_segment_count, and query_expansion_condition. You might need to adjust these parameters according to your needs.

Please note that this is just one potential solution. If you find that this solution is correct and it's a bug that could impact other users, we encourage you to make a pull request. If this solution doesn't align with your experience, you might want to consider upgrading to the latest version of the software.

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.


@pepecura

Adding to Matt's question: no results are returned even when the filter is empty. Could it be that GoogleCloudEnterpriseSearchRetriever in langchain.retrievers does not yet return document snippets for engines built on a "website", and only supports engines built on "unstructured data"?
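
One way to test this hypothesis is to look at which chunk fields each raw result actually carries. The sketch below works on a plain dict such as the one produced by MessageToDict. The helper name chunk_fields_present and the sample payloads are hypothetical, and the field names (snippets, extractive_answers, extractive_segments) are taken from code posted elsewhere in this thread, not from an official schema.

```python
from typing import Any, Dict, List

# Field names seen in this thread; treat them as assumptions about the response shape.
CHUNK_FIELDS = ("snippets", "extractive_answers", "extractive_segments")


def chunk_fields_present(document_dict: Dict[str, Any]) -> List[str]:
    """Return the chunk-bearing fields found in a result's derived struct data."""
    # MessageToDict emits camelCase by default and snake_case when
    # preserving_proto_field_name=True, so check both spellings.
    derived = (
        document_dict.get("derived_struct_data")
        or document_dict.get("derivedStructData")
        or {}
    )
    return [field for field in CHUNK_FIELDS if derived.get(field)]


# Hypothetical payloads for illustration only.
website_result = {
    "id": "doc-1",
    "derivedStructData": {
        "link": "https://example.com/flights",
        "snippets": [{"snippet": "Flights to Spain depart daily."}],
    },
}
unstructured_result = {
    "id": "doc-2",
    "derived_struct_data": {
        "link": "gs://example-bucket/guide.pdf",
        "extractive_answers": [{"content": "Spain is served by...", "pageNumber": "4"}],
    },
}

print(chunk_fields_present(website_result))       # ['snippets']
print(chunk_fields_present(unstructured_result))  # ['extractive_answers']
```

If a website engine only ever reports snippets, a converter that reads only the extractive fields would yield an empty list without raising an error.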

@kamikasky

@mattsday @pepecura I also ran into this issue. It's definitely returning document results. If you're looking to return web search results, you can extend the original class and override _convert_search_response like this:

from typing import List

from google.protobuf.json_format import MessageToDict
from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever
from langchain.schema import Document


class GoogleSearchWebRetriever(GoogleCloudEnterpriseSearchRetriever):
    """Retriever that reads web search snippets from derivedStructData."""

    def _convert_search_response(self, results) -> List[Document]:
        documents = []
        for result in results:
            # MessageToDict emits camelCase keys by default.
            document_dict = MessageToDict(result.document._pb)
            derived_struct_data = document_dict.get("derivedStructData")
            if not derived_struct_data:
                continue

            doc_metadata = document_dict.get("structData", {})
            doc_metadata.update(derived_struct_data)
            doc_metadata["source"] = derived_struct_data.get("link", "")
            doc_metadata["id"] = document_dict["id"]

            # One Document per snippet; copy the metadata so that
            # documents do not share a single mutable dict.
            for chunk in derived_struct_data.get("snippets", []):
                documents.append(
                    Document(
                        page_content=chunk.get("snippet", ""),
                        metadata=dict(doc_metadata),
                    )
                )

        return documents
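
The conversion logic can be exercised offline on a plain dict, without credentials or the LangChain classes. This is a minimal sketch: snippets_to_records and the sample payload are hypothetical, and tuples stand in for Document objects.

```python
from typing import Any, Dict, List, Tuple


def snippets_to_records(
    document_dict: Dict[str, Any],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Mirror the override above: one (page_content, metadata) pair per snippet."""
    derived = document_dict.get("derivedStructData")
    if not derived:
        return []

    # Start from the document's own struct data, then layer derived fields on top.
    metadata = dict(document_dict.get("structData", {}))
    metadata.update(derived)
    metadata["source"] = derived.get("link", "")
    metadata["id"] = document_dict["id"]

    # Copy the metadata per snippet so records never share one mutable dict.
    return [
        (chunk.get("snippet", ""), dict(metadata))
        for chunk in derived.get("snippets", [])
    ]


# Hypothetical payload shaped like a website-engine result.
sample = {
    "id": "doc-1",
    "derivedStructData": {
        "link": "https://example.com/flights",
        "snippets": [
            {"snippet": "Flights to Spain depart daily."},
            {"snippet": "Madrid and Barcelona are served."},
        ],
    },
}

records = snippets_to_records(sample)
print(len(records))             # 2
print(records[0][0])            # Flights to Spain depart daily.
print(records[0][1]["source"])  # https://example.com/flights
```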

@mattsday
Author

mattsday commented Aug 3, 2023

Thank you @kamikasky - this worked well

@tomaspiaggio
Contributor

I'm having the same problem with unstructured data. I just copied the example from the documentation and it didn't work.

@tomaspiaggio
Contributor

I just found the problem. In the function _convert_unstructured_search_response, on line 153, the getattr call is not working. Here's the fix:

def _convert_unstructured_search_response(
    self, results: Sequence[SearchResponse]
) -> List[Document]:
    """Converts a sequence of search results to a list of LangChain documents."""
    from google.protobuf.json_format import MessageToDict

    documents: List[Document] = []

    for result in results:
        # Keep snake_case proto field names so the keys below match.
        document_dict = MessageToDict(
            result.document._pb, preserving_proto_field_name=True
        )

        derived_struct_data = document_dict.get("derived_struct_data", None)
        if not derived_struct_data:
            continue

        doc_metadata = document_dict.get("struct_data", {})
        doc_metadata["id"] = document_dict["id"]

        chunk_type = (
            "extractive_answers"
            if self.get_extractive_answers
            else "extractive_segments"
        )

        # derived_struct_data is a plain dict here, so use key lookup, not getattr.
        if chunk_type not in derived_struct_data:
            continue

        for chunk in derived_struct_data[chunk_type]:
            # Copy per chunk so documents do not share one metadata dict.
            chunk_metadata = dict(doc_metadata)
            chunk_metadata["source"] = derived_struct_data.get("link", "")

            if chunk_type == "extractive_answers":
                chunk_metadata["source"] += f":{chunk.get('pageNumber', '')}"

            documents.append(
                Document(
                    page_content=chunk.get("content", ""), metadata=chunk_metadata
                )
            )

    return documents

I can do a PR with the fix.
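
The failure mode can be reproduced with plain Python. The sketch below assumes, as the fix implies, that derived_struct_data is an ordinary dict after MessageToDict; the sample data is hypothetical. getattr looks up attributes, not keys, so it silently falls back to its default.

```python
# Hypothetical stand-in for the dict produced by MessageToDict.
derived_struct_data = {
    "link": "gs://example-bucket/guide.pdf",
    "extractive_answers": [{"content": "Spain is served by...", "pageNumber": "4"}],
}
chunk_type = "extractive_answers"

# Attribute lookup: a dict has no attribute named after its keys,
# so the default is returned and a loop over it never runs.
via_getattr = getattr(derived_struct_data, chunk_type, [])

# Key lookup: finds the list stored under the key.
via_key = derived_struct_data.get(chunk_type, [])

print(via_getattr)   # []
print(len(via_key))  # 1
```

Because the default is an empty list, no exception is raised, which matches the "zero results without error" symptom reported in this issue.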

@junyakitady

I'm having the same issue with unstructured data.
langchain==0.0.273

@tomaspiaggio
Contributor

Just made a PR here for those interested!

baskaryan pushed a commit that referenced this issue Sep 13, 2023
- Description: fixed Google Enterprise Search Retriever where it was consistently returning empty results,
- Issue: related to [issue 8219](#8219),
- Dependencies: no dependencies,
- Tag maintainer: @hwchase17,
- Twitter handle: [Tomas Piaggio](https://twitter.com/TomasPiaggio)!

@holtskinner
Contributor

Should this issue be marked as fixed after #10546?

MarkEdmondson1234 pushed a commit to MarkEdmondson1234/edmonbrain that referenced this issue Sep 16, 2023
enterprise search was returning no results langchain-ai/langchain#8219

dosubot bot commented Dec 14, 2023

Hi, @mattsday,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you reported regarding the GoogleCloudEnterpriseSearchRetriever consistently returning zero results without error has received attention from other users. One user, tomaspiaggio, has identified a specific problem in the code and has submitted a pull request to address it. Another user, holtskinner, has suggested marking the issue as fixed after the pull request is merged.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

@dosubot dosubot bot added the stale label Dec 14, 2023
@dosubot dosubot bot closed this as not planned Dec 21, 2023
@dosubot dosubot bot removed the stale label Dec 21, 2023