Add VectorStore integration for Vespa #13213
Conversation
Thanks a tonne @thomasht86! The code looks pretty good. I left a few minor comments.
Looks like we just need to get the checks to pass. Can you run `make lint` and `make format`, then commit and push again?
Thanks!
I'll have a look at the comments and checks tomorrow. 🙂
On Thu, 2 May 2024 at 16:49, Andrei Fajardo ***@***.***> wrote:
***@***.**** approved this pull request.
------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/pyproject.toml <#13213 (comment)>:
> @@ -0,0 +1,64 @@
+[build-system]
+build-backend = "poetry.core.masonry.api"
+requires = ["poetry-core"]
+
+[tool.codespell]
+check-filenames = true
+check-hidden = true
+skip = "*.csv,*.html,*.json,*.jsonl,*.pdf,*.txt,*.ipynb"
+
+[tool.llamahub]
+contains_example = false
+import_path = "llama_index.vector_stores.vespa"
+
+[tool.llamahub.class_authors]
+VespaVectorStore = "llama-index"
feel free to use your own github username rather than llama-index
(default placeholder).
------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/poetry.lock <#13213 (comment)>:
can we please remove this lock file -- we typically don't keep these
around except for the core library
------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py <#13213 (comment)>:
> + app = self._deploy_app_cloud()
+ elif self.deployment_target == "local":
+ app = self._deploy_app_local()
+ else:
+ raise ValueError(
+ f"Deployment target {self.deployment_target} not supported. Please choose either `local` or `cloud`."
+ )
+ return app
+
+ def _deploy_app_local(self) -> Vespa:
+ return VespaDocker(port=8080).deploy(self.application_package)
+
+ def _deploy_app_cloud(self) -> Vespa:
+ return VespaCloud(
+ tenant=self.tenant,
+ application="hybridsearch",
ooc: is this not typically a param that a user might want to customize?
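One way to address the comment above is to lift the application name into a constructor parameter with the current value as its default. This is only a sketch: `VespaDeployConfig`, `application_name`, and `cloud_kwargs` are illustrative names, not the PR's actual API.

```python
# Hypothetical sketch: expose the Vespa Cloud application name as a
# parameter instead of hard-coding "hybridsearch" at the deploy call.
class VespaDeployConfig:
    def __init__(self, tenant: str, application_name: str = "hybridsearch") -> None:
        self.tenant = tenant
        self.application_name = application_name

    def cloud_kwargs(self) -> dict:
        # These kwargs would be forwarded to VespaCloud(...) on deploy.
        return {"tenant": self.tenant, "application": self.application_name}
```

A caller who does not care keeps the existing behavior, while anyone with a differently named Vespa Cloud application can override it.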
------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py <#13213 (comment)>:
> + data_id=doc["id"],
+ fields=doc["fields"],
+ schema=schema or self.default_schema_name,
+ namespace=self.namespace,
+ timeout=10,
+ )
+ )
+ tasks.append(task)
+
+ results = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
+ for result in results:
+ if result.exception():
+ raise result.exception
+ return ids
+
+ def delete(
can we also add to a namespace?
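Threading a per-call namespace through the feed path could look like the sketch below, falling back to the store-level default when none is given. `build_feed_kwargs` is a hypothetical helper, not part of the PR.

```python
from typing import Any, Optional

def build_feed_kwargs(
    doc: dict,
    default_schema: str,
    default_namespace: str,
    schema: Optional[str] = None,
    namespace: Optional[str] = None,
) -> dict:
    # Per-call overrides win; otherwise use the store-level defaults,
    # mirroring how the schema fallback already works in the add path.
    return {
        "data_id": doc["id"],
        "fields": doc["fields"],
        "schema": schema or default_schema,
        "namespace": namespace or default_namespace,
        "timeout": 10,
    }
```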
------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py <#13213 (comment)>:
> + """
+ Delete nodes using ref_doc_id.
+ """
+ response: VespaResponse = self.app.delete_data(
+ schema=self.default_schema_name,
+ namespace=namespace or self.namespace,
+ data_id=ref_doc_id,
+ kwargs=delete_kwargs,
+ )
+ if not response.is_successful():
+ raise ValueError(
+ f"Delete request failed: {response.status_code}, response payload: {response.json}"
+ )
+ logger.info(f"Deleted node with id {ref_doc_id}")
+
+ async def adelete(self, ref_doc_id: str, **delete_kwargs: Any) -> None:
should also have namespace here?
------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py <#13213 (comment)>:
> + similarities: List[float] = []
+ for hit in response.hits:
+ response_fields: dict = hit.get("fields", {})
+ metadata = response_fields.get("metadata", {})
+ metadata = json.loads(metadata)
+ logger.debug(f"Metadata: {metadata}")
+ node = metadata_dict_to_node(metadata)
+ text = response_fields.get("body", "")
+ node.set_content(text)
+ nodes.append(node)
+ ids.append(response_fields.get("id"))
+ similarities.append(hit["relevance"])
+ return VectorStoreQueryResult(nodes=nodes, ids=ids, similarities=similarities)
+
+ async def aquery(
+ self, query: VectorStoreQuery, **kwargs: Any
should we match the signature of query?
Thanks for the great review @nerdai. Fixes should have been made now. Let's see if checks pass :)
raise result.exception
return ids

def delete(
One more comment on delete -- it's meant to be a filtered delete, deleting any node that has ref_doc_id == ref_doc_id in the metadata -- is this possible with vespa?
Yeah, so if you look at the `add` method, you'll see that the `node_id` (which is the same as the `ref_doc_id`) is added as the Vespa internal doc id, so this will delete only those.
Metadata filtering is also possible, but will require a more complex schema definition.
Adding more templates and examples for this is on the list for the next iterations.
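If `ref_doc_id` were indexed as its own schema field, a filtered delete could first select matching documents with a YQL query along these lines. The field and schema names are assumptions about a future, richer schema, not the current template.

```python
def build_ref_doc_filter_yql(schema: str, ref_doc_id: str) -> str:
    """Build a YQL selection for documents whose ref_doc_id matches."""
    # Single quotes around the value follow YQL string-literal syntax.
    return (
        f"select documentid from {schema} "
        f"where ref_doc_id contains '{ref_doc_id}'"
    )
```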
I could use another pair of eyes 👀 on why the unit tests fail.

llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:18: in <module>
    from llama_index.vector_stores.vespa.templates import hybrid_template
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/templates.py:1: in <module>
    from vespa.package import (
E   ModuleNotFoundError: No module named 'vespa'

This module should have been installed through the pyvespa dependency.
@logan-markewich might this have anything to do with
@thomasht86 looks like we need to rerun `make lint` and `make format`
@thomasht86 it could. Let me take a peek
Ok, I ran the linting and formatting only in the integration directory, forgetting about the example notebook. Sorry about that. Should be fixed now.
Ok, only unit tests left.

==================================== ERRORS ====================================
_ ERROR collecting llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py _
ImportError while importing test module 'llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py:9: in <module>
    from vespa.application import ApplicationPackage
E   ModuleNotFoundError: No module named 'vespa'

As I mentioned, this module should be available through the pyvespa dependency.
Do you have a suggestion for debugging actions?
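A small helper for checking whether a dependency is importable in the active environment can help narrow down whether the CI env actually has the package. This is a generic debugging sketch; `module_available` is an illustrative name, not anything from the repo.

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if the top-level module can be imported here."""
    # find_spec returns None for a missing top-level module without
    # actually importing anything.
    return importlib.util.find_spec(name) is not None
```

Running something like `poetry run python -c "import vespa"` in the failing job (or calling this helper from a conftest) would confirm whether pyvespa made it into the test environment at all.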
I tried to reinstall the environment, ran
Description
This PR adds an integration for Vespa as a llama-index vector store.
The new dependency introduced is pyvespa.
Fixes #8099
New Package?
Yes.
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?

Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)

Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
- I ran `make format; make lint` to appease the lint gods