Add VectorStore integration for Vespa #13213

@thomasht86 thomasht86 commented May 2, 2024

Description

This PR adds a Vespa integration as a llama-index vector store.
The only new dependency introduced is pyvespa.

Fixes #8099

New Package?

Yes.

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@thomasht86 thomasht86 marked this pull request as ready for review May 2, 2024 09:34
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 2, 2024
Contributor

@nerdai nerdai left a comment

Thanks a tonne @thomasht86! The code looks pretty good. I left a few minor comments.

Looks like we just need to get the checks to pass. Can you run make lint and make format, then commit and push again?

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label May 2, 2024
@thomasht86
Contributor Author

thomasht86 commented May 2, 2024 via email

@thomasht86
Contributor Author

Thanks for the great review, @nerdai. The fixes should be in now. Let's see if the checks pass :)

raise result.exception
return ids

def delete(
Collaborator

One more comment on delete: it's meant to be a filtered delete, removing any node that has ref_doc_id == ref_doc_id in its metadata. Is this possible with Vespa?

Contributor Author

@thomasht86 thomasht86 May 3, 2024

Yeah, so if you look at the add method, you'll see that the node_id (which is the same as the ref_doc_id) is used as the Vespa internal document id, so this will delete only those.
Metadata filtering is also possible, but it requires a more complex schema definition.

Adding more templates and examples for this is on the list for the next iterations.
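
To make the distinction in this exchange concrete, here is a minimal sketch contrasting a delete by document id with a filtered delete on ref_doc_id. ToyStore and its methods are hypothetical and purely in-memory; nothing here is the pyvespa API or the code in this PR. It assumes a source document that was split into several nodes whose ids differ from ref_doc_id, which may not match how this integration assigns ids.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyStore:
    """In-memory stand-in for a document store (hypothetical, not Vespa)."""

    docs: Dict[str, dict] = field(default_factory=dict)

    def add(self, doc_id: str, ref_doc_id: str, text: str) -> None:
        # The node id is the storage key; ref_doc_id lives in the metadata.
        self.docs[doc_id] = {"ref_doc_id": ref_doc_id, "text": text}

    def delete_by_id(self, doc_id: str) -> int:
        """Remove the single document stored under doc_id, if any."""
        return 1 if self.docs.pop(doc_id, None) is not None else 0

    def delete_by_ref_doc_id(self, ref_doc_id: str) -> int:
        """Filtered delete: remove every node whose metadata matches."""
        matches = [k for k, v in self.docs.items() if v["ref_doc_id"] == ref_doc_id]
        for key in matches:
            del self.docs[key]
        return len(matches)


store = ToyStore()
store.add("node-1", ref_doc_id="doc-A", text="first chunk")
store.add("node-2", ref_doc_id="doc-A", text="second chunk")
store.add("node-3", ref_doc_id="doc-B", text="other document")

# No node is stored under the key "doc-A", so the id-based delete finds nothing.
removed_by_id = store.delete_by_id("doc-A")
# The filtered delete removes both chunks that reference doc-A.
removed_by_filter = store.delete_by_ref_doc_id("doc-A")
```

If node ids always equal ref_doc_id, as described above, the two operations coincide and the id-based delete is enough.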

@thomasht86
Contributor Author

I could use another pair of eyes 👀 on why the unit tests fail.
The error is an import failure during test collection:

llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:18: in <module>
    from llama_index.vector_stores.vespa.templates import hybrid_template
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/templates.py:1: in <module>
    from vespa.package import (
E   ModuleNotFoundError: No module named 'vespa'

This module should have been installed through the pyvespa package, which is listed under both [tool.poetry.dependencies] and [tool.poetry.group.dev.dependencies]. I thought that would be sufficient?

@nerdai
Contributor

nerdai commented May 3, 2024

I could use another pair of eyes 👀 on why the unit tests fail. The error is an import failure during test collection:

llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/base.py:18: in <module>
    from llama_index.vector_stores.vespa.templates import hybrid_template
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/llama_index/vector_stores/vespa/templates.py:1: in <module>
    from vespa.package import (
E   ModuleNotFoundError: No module named 'vespa'

This module should have been installed through the pyvespa package, which is listed under both [tool.poetry.dependencies] and [tool.poetry.group.dev.dependencies]. I thought that would be sufficient?

@logan-markewich might this have anything to do with pants?

@nerdai
Contributor

nerdai commented May 3, 2024

@thomasht86 looks like we need to rerun make lint and make format as well 🙂

@logan-markewich
Collaborator

@thomasht86 it could. Let me take a peek

@thomasht86
Contributor Author

thomasht86 commented May 4, 2024

Ok, I ran the linting and formatting only in the integration directory and forgot about the example notebook. Sorry about that. It should be fixed now.
I also tried making the imports more explicit and wrapping them in try/except to find out why the tests fail.
Could we try running the checks again, @nerdai / @logan-markewich? 🙏

@thomasht86
Contributor Author

Ok, only unit tests left.
Full error:

==================================== ERRORS ====================================
_ ERROR collecting llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py _
ImportError while importing test module 'llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
llama-index-integrations/vector_stores/llama-index-vector-stores-vespa/tests/test_vespavectorstore.py:9: in <module>
    from vespa.application import ApplicationPackage
E   ModuleNotFoundError: No module named 'vespa'

As I mentioned, this module should be available through the pyvespa installation, so I see two possible causes:

  1. The module is not installed properly.
  2. Something odd is going on with the test runner's import paths, possibly related to one of the directories named vespa (although, as far as I can see, none of them is in a place that should interfere).

Do you have a suggestion for debugging actions?
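
One quick check for both hypotheses is to ask the interpreter that runs the tests what it can actually see. The following is a minimal sketch using only the standard library. describe_import is a hypothetical helper, and the names "vespa" and "pyvespa" are taken from the traceback and the dependency discussion above. It reports whether the distribution is installed and where the module name resolves, which should reveal a shadowing directory if there is one.

```python
import importlib.metadata
import importlib.util
import sys


def describe_import(module_name: str, dist_name: str) -> dict:
    """Report whether a distribution is installed and where its module resolves."""
    try:
        dist_version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        dist_version = None  # the distribution is not installed for this interpreter

    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None  # treat a broken or unresolvable module as "not found"

    return {
        "interpreter": sys.executable,
        "distribution": dist_name,
        "distribution_version": dist_version,
        "module": module_name,
        # For a namespace package (a bare directory), origin is None and only
        # the search locations are set, which is a hint that a directory named
        # like the module is being picked up instead of the installed package.
        "module_origin": getattr(spec, "origin", None),
        "module_search_locations": list(
            getattr(spec, "submodule_search_locations", None) or []
        ),
    }


if __name__ == "__main__":
    for key, value in describe_import("vespa", "pyvespa").items():
        print(f"{key}: {value}")
```

Running this once locally and once inside the CI job, with the same working directory as the test runner, would show whether the two environments differ in installation (hypothesis 1) or in path resolution (hypothesis 2).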

@thomasht86
Contributor Author

I reinstalled the environment and ran poetry run make -s test from the llama-index-vector-stores-vespa directory, and it runs fine.
Docker probably isn't available in the runner anyway, so a crude solution would be to move the import after the Docker check, but I guess that's not what we want.
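
For reference, one way the crude workaround could look is to guard the tests on the availability of the module and defer the import into the test body, so that test collection cannot fail on a missing package. This is a sketch under assumptions, not the code in this PR: VespaStoreTests and its test method are hypothetical names, and it uses the standard-library unittest skip decorator rather than the Docker check mentioned above.

```python
import importlib
import importlib.util
import unittest

# Assumption: "vespa" is the import name provided by pyvespa, as in the traceback.
VESPA_AVAILABLE = importlib.util.find_spec("vespa") is not None


@unittest.skipUnless(VESPA_AVAILABLE, "pyvespa is not installed in this environment")
class VespaStoreTests(unittest.TestCase):
    def test_package_module_imports(self):
        # Imported lazily so that collecting this file never raises
        # ModuleNotFoundError when the dependency is missing.
        module = importlib.import_module("vespa.package")
        self.assertIsNotNone(module)


if __name__ == "__main__":
    unittest.main()
```

The trade-off is that a skipped test hides a genuinely missing dependency in CI, which is likely why this is not the preferred fix.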

@logan-markewich logan-markewich merged commit a613231 into run-llama:main May 4, 2024
8 checks passed
Labels
lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Vespa VectorDB Connection
3 participants