Add VectorStore integration for Vespa #13213

thomasht86 · 2024-05-02T07:15:02Z

Description

This PR adds a Vespa integration as a llama-index vector store.
The only new dependency introduced is pyvespa.

New Package?

Yes.

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes

Type of Change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Added new unit/integration tests
Added new notebook (that tests end-to-end)
I stared at the code and made sure it makes sense

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

review-notebook-app · 2024-05-02T07:15:07Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

nerdai

Thanks a tonne @thomasht86! The code looks pretty good. I left a few minor comments.

Looks like we just need to get the checks to pass. Can you run make lint and make format, and then commit and push again?

llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/pyproject.toml

llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/poetry.lock

...ations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py

thomasht86 · 2024-05-02T15:13:52Z

Thanks! I'll have a look at the comments and checks tomorrow. 🙂

On Thu, May 2, 2024 at 16:49, Andrei Fajardo ***@***.***> wrote:

***@***.*** approved this pull request.

Thanks a tonne @thomasht86 <https://github.com/thomasht86>! The code looks pretty good. I left a few minor comments.

Looks like we just need to get the checks to pass. Can you run make lint and make format, and then commit and push again?

------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/pyproject.toml:

> @@ -0,0 +1,64 @@
> +[build-system]
> +build-backend = "poetry.core.masonry.api"
> +requires = ["poetry-core"]
> +
> +[tool.codespell]
> +check-filenames = true
> +check-hidden = true
> +skip = "*.csv,*.html,*.json,*.jsonl,*.pdf,*.txt,*.ipynb"
> +
> +[tool.llamahub]
> +contains_example = false
> +import_path = "llama_index.vector_stores.vespa"
> +
> +[tool.llamahub.class_authors]
> +VespaVectorStore = "llama-index"

feel free to use your own github username rather than llama-index (default placeholder).

------------------------------
On llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/poetry.lock:

can we please remove this lock file -- we typically don't keep these around except for the core library

------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:

> +            app = self._deploy_app_cloud()
> +        elif self.deployment_target == "local":
> +            app = self._deploy_app_local()
> +        else:
> +            raise ValueError(
> +                f"Deployment target {self.deployment_target} not supported. Please choose either `local` or `cloud`."
> +            )
> +        return app
> +
> +    def _deploy_app_local(self) -> Vespa:
> +        return VespaDocker(port=8080).deploy(self.application_package)
> +
> +    def _deploy_app_cloud(self) -> Vespa:
> +        return VespaCloud(
> +            tenant=self.tenant,
> +            application="hybridsearch",

ooc: is this not typically a param that a user might want to customize?

------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:

> +                        data_id=doc["id"],
> +                        fields=doc["fields"],
> +                        schema=schema or self.default_schema_name,
> +                        namespace=self.namespace,
> +                        timeout=10,
> +                    )
> +                )
> +                tasks.append(task)
> +
> +            results = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
> +            for result in results:
> +                if result.exception():
> +                    raise result.exception
> +        return ids
> +
> +    def delete(

can we also add to a namespace?

------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:

> +        """
> +        Delete nodes using with ref_doc_id.
> +        """
> +        response: VespaResponse = self.app.delete_data(
> +            schema=self.default_schema_name,
> +            namespace=namespace or self.namespace,
> +            data_id=ref_doc_id,
> +            kwargs=delete_kwargs,
> +        )
> +        if not response.is_successful():
> +            raise ValueError(
> +                f"Delete request failed: {response.status_code}, response payload: {response.json}"
> +            )
> +        logger.info(f"Deleted node with id {ref_doc_id}")
> +
> +    async def adelete(self, ref_doc_id: str, **delete_kwargs: Any) -> None:

should also have namespace here?

------------------------------
In llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:

> +        similarities: List[float] = []
> +        for hit in response.hits:
> +            response_fields: dict = hit.get("fields", {})
> +            metadata = response_fields.get("metadata", {})
> +            metadata = json.loads(metadata)
> +            logger.debug(f"Metadata: {metadata}")
> +            node = metadata_dict_to_node(metadata)
> +            text = response_fields.get("body", "")
> +            node.set_content(text)
> +            nodes.append(node)
> +            ids.append(response_fields.get("id"))
> +            similarities.append(hit["relevance"])
> +        return VectorStoreQueryResult(nodes=nodes, ids=ids, similarities=similarities)
> +
> +    async def aquery(
> +        self, query: VectorStoreQuery, **kwargs: Any

should we match the signature of query?
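
A side note on the feed loop quoted in the review above: asyncio.wait returns a (done, pending) pair of task sets rather than a flat collection of results, and Task.exception is a method that has to be called before its result can be raised. The following is a minimal, standalone sketch of surfacing failures from concurrent tasks. feed_one and feed_all are hypothetical stand-ins, not the integration's actual methods, and no Vespa calls are made.

```python
import asyncio
from typing import List


async def feed_one(doc_id: str) -> str:
    """Hypothetical stand-in for feeding a single document."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if doc_id == "bad":
        raise ValueError(f"feed failed for {doc_id}")
    return doc_id


async def feed_all(doc_ids: List[str]) -> List[str]:
    """Feed all documents concurrently and re-raise the first failure found."""
    tasks = [asyncio.create_task(feed_one(doc_id)) for doc_id in doc_ids]
    # asyncio.wait returns two sets of tasks: (done, pending).
    done, _pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
    for task in done:
        exc = task.exception()  # call the method; None means the task succeeded
        if exc is not None:
            raise exc
    return list(doc_ids)


if __name__ == "__main__":
    print(asyncio.run(feed_all(["a", "b"])))
```

An alternative with the same effect is asyncio.gather(*tasks), which propagates the first exception directly.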

thomasht86 · 2024-05-03T04:48:28Z

Thanks for the great review, @nerdai. The fixes should be in now. Let's see if the checks pass :)

logan-markewich · 2024-05-03T15:37:51Z

...ations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py

+                    raise result.exception
+        return ids
+
+    def delete(


One more comment on delete -- it's meant to be a filtered delete, deleting any node that has ref_doc_id == ref_doc_id in the metadata -- is this possible with Vespa?

thomasht86 · 2024-05-03

Yeah, if you look at the add method, you'll see that the node_id (which is the same as the ref_doc_id) is used as the Vespa-internal document id, so this will delete only those documents.
Metadata filtering is also possible, but requires a more complex schema definition.

Adding more templates and examples for this is on the list for upcoming iterations.
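
To make the distinction in this exchange concrete, here is a toy in-memory sketch contrasting a delete keyed on the stored document id with a filtered delete on ref_doc_id. It assumes a source document can be split into several nodes that share one ref_doc_id. StoredNode and InMemoryStore are hypothetical names used only for illustration; nothing here calls Vespa or pyvespa.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredNode:
    node_id: str
    ref_doc_id: str
    text: str


@dataclass
class InMemoryStore:
    """Toy store keyed by node_id (hypothetical, for illustration only)."""

    nodes: Dict[str, StoredNode] = field(default_factory=dict)

    def add(self, nodes: List[StoredNode]) -> None:
        for node in nodes:
            self.nodes[node.node_id] = node

    def delete_by_id(self, doc_id: str) -> int:
        """Remove the single entry whose key equals doc_id."""
        return 1 if self.nodes.pop(doc_id, None) is not None else 0

    def delete_by_ref_doc_id(self, ref_doc_id: str) -> int:
        """Filtered delete: remove every node whose ref_doc_id matches."""
        matching = [k for k, n in self.nodes.items() if n.ref_doc_id == ref_doc_id]
        for key in matching:
            del self.nodes[key]
        return len(matching)


if __name__ == "__main__":
    store = InMemoryStore()
    store.add(
        [
            StoredNode("n1", "doc-a", "first chunk"),
            StoredNode("n2", "doc-a", "second chunk"),
            StoredNode("n3", "doc-b", "other document"),
        ]
    )
    print(store.delete_by_id("doc-a"))          # no key is named "doc-a"
    print(store.delete_by_ref_doc_id("doc-a"))  # removes both chunks of doc-a
```

When every node's id equals its ref_doc_id (one node per document), the two deletes behave the same; they differ only once a document produces several nodes.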

thomasht86 · 2024-05-03T16:28:19Z

I could use another pair of eyes 👀 on why the unit tests fail.
The error is an import failure during test collection:

llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:18: in <module>
    from llama_index.vector_stores.vespa.templates import hybrid_template
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/templates.py:1: in <module>
    from vespa.package import (
E   ModuleNotFoundError: No module named 'vespa'

This module should have been installed by the pyvespa package, which is listed under both [tool.poetry.dependencies] and [tool.poetry.group.dev.dependencies]. I thought that would be sufficient?

nerdai · 2024-05-03T19:08:38Z

@logan-markewich might this have anything to do with pants?

nerdai · 2024-05-03T19:09:16Z

@thomasht86 looks like we need to rerun make lint and make format as well 🙂

logan-markewich · 2024-05-03T22:03:50Z

@thomasht86 it could. Let me take a peek

thomasht86 · 2024-05-04T04:22:15Z

Ok, I ran the linting and formatting only in the integration directory and forgot about the example notebook. Sorry about that. It should be fixed now.
I also tried making the imports more explicit and wrapped them in try/except to find out why the tests fail.
Could we try to run the checks again, @nerdai / @logan-markewich? 🙏

thomasht86 · 2024-05-04T07:03:31Z

Ok, only unit tests left.
Full error:

==================================== ERRORS ====================================
_ ERROR collecting llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py _
ImportError while importing test module 'llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py:9: in <module>
    from vespa.application import ApplicationPackage
E   ModuleNotFoundError: No module named 'vespa'

As I mentioned, this module should be available through the pyvespa installation, so one of two causes seems possible to me:

1. The module is not installed properly.
2. Something strange is going on with the paths for the test runner, possibly related to one of the directories named vespa (although, as far as I can see, none of them is in a place that should interfere).

Do you have any suggestions for how to debug this?
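
One way to tell the two hypotheses apart is to ask the interpreter where it would load the module from, and to look for same-named entries on sys.path. This is a minimal diagnostic sketch using only the standard library; describe_module and shadowing_candidates are hypothetical helper names, and the sketch does not assume anything about how the CI runner is configured.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Dict, List, Optional


def describe_module(name: str) -> Dict[str, Optional[str]]:
    """Report whether `name` is importable and where it would load from."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return {"name": name, "found": "no", "origin": None}
    # origin is None for namespace packages, e.g. a bare directory on sys.path
    return {"name": name, "found": "yes", "origin": spec.origin}


def shadowing_candidates(name: str) -> List[str]:
    """List sys.path entries that contain a directory or file called `name`."""
    hits = []
    for entry in sys.path:
        base = Path(entry or ".")
        if (base / name).is_dir() or (base / f"{name}.py").is_file():
            hits.append(str(base))
    return hits


if __name__ == "__main__":
    print(describe_module("vespa"))
    print(shadowing_candidates("vespa"))
```

If the module is reported as not found, the package is missing from the environment the tests run in. If it is found but its origin points into the repository, or is None, a local directory named vespa is likely shadowing the installed package.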

thomasht86 · 2024-05-04T07:19:08Z

I tried reinstalling the environment and ran poetry run make -s test from the llama-index-vector-stores-vespa directory, and it runs fine.
Docker probably isn't available in the runner anyway, so a crude solution would be to move the import after the Docker check, but I guess that's not what we want.
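
For reference, the "crude" option mentioned here can be sketched as a guard that checks for the package before anything imports it, so that a missing dependency shows up as a skipped test rather than a collection error. This is only an illustration using the standard library's unittest, not the fix that was eventually applied in this PR. VespaImportTest and run_suite are hypothetical names.

```python
import importlib
import importlib.util
import unittest

# Check availability without importing, so loading this file never fails.
HAS_VESPA = importlib.util.find_spec("vespa") is not None


@unittest.skipUnless(HAS_VESPA, "pyvespa is not installed")
class VespaImportTest(unittest.TestCase):
    def test_package_module_importable(self) -> None:
        # Import inside the test body: a missing package becomes a skip,
        # not an ImportError at module import time.
        module = importlib.import_module("vespa.package")
        self.assertIsNotNone(module)


def run_suite() -> unittest.TestResult:
    """Run the guarded test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VespaImportTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("skipped:", len(result.skipped), "failures:", len(result.failures))
```

The trade-off is that a skip can hide a genuinely broken environment, which may be why it is described above as not the desired solution.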

thomasht86 added 27 commits May 1, 2024 08:45

working

b624bcc

fixes to base

491c08e

add notebook draft

7f5fd2f

more text

7c46e5e

better doc and fix targetHits

8dd6985

fix

723d754

improve docstring

c6aa2d9

improve docs

ed1f213

update notebook

19fd485

add guard against empty embeddings

38ab426

rename templates

66a6898

templates

aadee3f

base update

c59dc59

bug in emb

5752c55

textnodes

ffe3cdd

fix hybrid

e02d321

make tests pass

90de643

more tests

9a6b011

more tests

830aa70

Merge branch 'thomasht86/add-vespa-vectorstore-integration' of https://github.com/thomasht86/llama_index into thomasht86/add-vespa-vectorstore-integration

aca6e6b

update README

285cdc2

clean pyproj

1d18504

working tests

6bfc62d

add pytest-asyncio

a7f328f

add skips if not docker available

67b6a72

log

b84d2fd

Better demo

8f9c0de

thomasht86 marked this pull request as ready for review May 2, 2024 09:34

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 2, 2024

thomasht86 added 2 commits May 2, 2024 12:52

make format

087a9b1

make lint

56e51ea

nerdai approved these changes May 2, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label May 2, 2024

thomasht86 added 6 commits May 2, 2024 17:26

fix lint

b9830d6

add pyvespa as dev-dep

f840b2c

remove poetry lockfile

0e745a7

aquery match query

5640298

Make use of namespace more clear

c214df1

arg to adelete

c6e0905

logan-markewich reviewed May 3, 2024

strip output from demonotebook

f74f96e

thomasht86 added 3 commits May 4, 2024 05:58

lint

ccde612

black reformat

be0206d

more explicit imports

2904772

logan-markewich added 3 commits May 4, 2024 14:59

fix tests

0711775

fix tests

4873c4b

fix tests

61ca433

logan-markewich approved these changes May 4, 2024

logan-markewich merged commit a613231 into run-llama:main May 4, 2024
8 checks passed

thomasht86 mentioned this pull request May 6, 2024

Vespa updates superlinked/VectorHub#375

Merged
