similarity Search Issue #2225

Closed
mohitraj opened this issue Mar 31, 2023 · 4 comments

Comments

@mohitraj

We are using Chroma to store records as vectors. When we run a query, the returned documents are not accurate.
c1 = Chroma('langchain', embedding, persist_directory)
qa = ChatVectorDBChain(vectorstore=c1, combine_docs_chain=doc_chain, question_generator=question_generator, top_k_docs_for_context=12, return_source_documents=True)

What is the solution to get accurate results?

@ghost

ghost commented Mar 31, 2023

You can tune the chunk size and overlap parameters when splitting the text and see whether that improves accuracy. In my case it actually worked.
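To make the two knobs concrete, here is a minimal sketch of fixed-size character chunking with overlap. `split_with_overlap` is a hypothetical helper written for illustration only, not LangChain's own splitter, and the sample text and sizes are made up. It only shows how chunk size and overlap change what ends up in each chunk that gets embedded.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character chunks of at most `chunk_size`,
    where consecutive chunks share `chunk_overlap` characters.

    Hypothetical helper for illustration; not a LangChain API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "abcdefghijklmnopqrstuvwxyz"  # made-up sample text
    # Compare how the same text is cut with and without overlap.
    for size, overlap in [(10, 0), (10, 4)]:
        print(size, overlap, split_with_overlap(sample, size, overlap))
```

Smaller chunks tend to make each embedding more specific, while overlap keeps sentences that straddle a boundary from being cut in half. Which combination works best depends on the documents, so it is worth trying a few values.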

@mohitraj
Author

What is your chunk size and overlapping parameter?

@khimaros
Contributor

For me, when using LlamaCppEmbedding, changing the chunk size and overlap was not helpful. The results returned are almost in reverse order of what they should be, with the best results almost dead last.
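One way to check a suspicious ordering independently of the vector store is to rank a handful of documents by cosine similarity by hand and compare that with what the store returns. The sketch below assumes you already have the query and document embeddings as plain lists of floats; the toy vectors and the helper names `cosine_similarity` and `rank_by_similarity` are made up for illustration. It is not a diagnosis of this issue. It is just a reference ranking to compare against, keeping in mind that a store reporting a distance (lower is better) will look reversed if its scores are read as similarities (higher is better).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings (assumption for illustration).
    query = [1.0, 0.0]
    docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(rank_by_similarity(query, docs))
```

If the hand-computed ranking looks sensible but the store's order is roughly its mirror image, the score direction is the first thing to check. If the hand-computed ranking is also poor, the embeddings themselves are the more likely cause.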

@dosubot

dosubot bot commented Sep 10, 2023

Hi, @mohitraj! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you raised is about the return documents from a similarity search using Chroma not giving accurate results. In the comments, there were suggestions to try different chunk sizes and overlapping parameters, but it seems that these parameters did not help in improving the accuracy of the search. Unfortunately, there doesn't appear to be a resolution to this issue at the moment.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project. If you have any further questions or concerns, please don't hesitate to reach out.

dosubot bot added the stale label (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) on Sep 10, 2023
dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 18, 2023
dosubot bot removed the stale label on Sep 18, 2023