Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectara #5069

Merged
merged 26 commits into from May 24, 2023
Merged

Vectara #5069

merged 26 commits into from May 24, 2023

Conversation

ofermend
Copy link
Contributor

Vectara Integration

This PR provides integration with Vectara. Implemented here are:

  • langchain/vectorstore/vectara.py
  • tests/integration_tests/vectorstores/test_vectara.py
  • langchain/retrievers/vectara_retriever.py
    And two IPYNB notebooks to do more testing:
  • docs/modules/chains/index_examples/vectara_text_generation.ipynb
  • docs/modules/indexes/vectorstores/examples/vectara.ipynb

@dev2049 dev2049 added the 03 enhancement Enhancement of existing functionality label May 22, 2023
Copy link
Contributor

@dev2049 dev2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two notebooks in chains/index_examples are a bit out place - we can't really have that for all integrations. if you want to keep around, i would move into the integrations directory and link to from the integrations page

from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore, VectorStoreRetriever

"""Implementation of Vector Store using Vectara (https://vectara.com)"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring a bit out of place (if here should be comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two notebooks in chains/index_examples are a bit out place - we can't really have that for all integrations. if you want to keep around, i would move into the integrations directory and link to from the integrations page

Moved the notebooks to integrations. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring a bit out of place (if here should be comment)

Yep. fixed!

from langchain.schema import BaseRetriever, Document
from langchain.vectorstores.vectara import Vectara

class VectaraRetriever(BaseRetriever):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to make Vectara.as_retriever() return this - so i would move it into that file (similar to how we do with redis)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so just move VectaraRetriever to vectara.py and get rid of vectara_retriever.py altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Done!

ofermend and others added 8 commits May 22, 2023 13:26
Moved vectara_retriever code inside vectara.py per suggestion
Moved IPYNB files to docs/integration (and changed name of one of them)
Updated formatting to be compatible with "black"
Copy link
Contributor

@dev2049 dev2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two quick comments

"""Initialize with Vectara API."""
self._vectara_customer_id = vectara_customer_id
self._vectara_corpus_id = vectara_corpus_id
self._vectara_api_key = vectara_api_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually make it possible to set credentials (like api key and other id's) via environment variables. if we wanted to support that here could d something like

def __init__(vectara_api_key: Optional[str]=None, ...):
  self.vectara_api_key = vectara_api_key or get_from_env("vectara_api_key", "VECTARA_API_KEY")
  ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion - updated!

new_texts = ["foobar", "foobaz"]
docsearch.add_documents([Document(page_content=content) for content in new_texts])
output = docsearch.similarity_search("foobar", k=1)
assert output == [Document(page_content="foobar")] or output == [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not deterministic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated the test to not only be deterministic but also more comprehensive and test metadatas.

Copy link
Contributor

@dev2049 dev2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two quick comments

dev2049 and others added 5 commits May 22, 2023 15:22
updated Vectara class to get api_keys, corpus_id and customer_id optionally from env
updated example notebooks
@dev2049 dev2049 added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label May 23, 2023
@dev2049 dev2049 merged commit c81fb88 into langchain-ai:master May 24, 2023
13 checks passed
@danielchalef danielchalef mentioned this pull request Jun 5, 2023
This was referenced Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
03 enhancement Enhancement of existing functionality lgtm PR looks good. Use to confirm that a PR is ready for merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants