New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrating AWS DocumentDB as a vector storage method #12217
Conversation
…es-awsdocdb/llama_index/vector_stores/awsdocdb.py
Our unit tests require one to input their own AWS credentials to run. In order to avoid exposing AWS secrets, we didn't include ours as part of the PR. Is there some way we can get approved without having these tests pass? |
...tor_stores/llama-index-vector-stores-awsdocdb/llama_index/vector_stores/awsdocdb/awsdocdb.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be good to add an example notebook in docs/docs/examples/vector_stores
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but still would be nice to have an example so people know how to use this :)
The tests should mock out API requests, or should be skipped properly if creds are missing |
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@logan-markewich is there anything else we need to add so that this can be merged? thanks for guiding us through the process |
@prajwalsaokar the tests were a little wonky -- we need to mark them as skip (which is what I did), if we aren't going to mock the API calls. Also cleaned up a few other things, should be good to go now |
@prajwalsaokar ngl the tests are pretty borked -- do you mind taking a look and ensuring either |
We just tried the tests on two machines, they should work it's just a bit of hassle to set them up because you need a DocumentDB instance and an EC2 instance thats connected to the DB instance with the correct networking config for the tests to run properly. Could you send over the errors you're getting? |
We also think we implemented skipping correctly, but we didn't come up with a way to skip the tests if the EC2 configuration isn't correct since that's not visible to LlamaIndex. Can you clarify what changes you think we should make on the skip/mock front? |
It looks like we should follow something like Cosmos or Postgres. They simply try to connect to DB and if there is a failure, they set a "*_not_available" variable which is used to skip the test. From a quick glance at tests written for a few vector stores, it doesn't look like mocking the API calls is typically done. I'll leave it up to Logan to provide requirements for the merge, but just wanted to leave this comment first. |
Correct, usually you can do some sort of check and based on the condition, skip or not. Mocking would allow the tests to actually run, in cicd, but it's usually more work (hence why other vectordbs don't have it) I'll leave it up to you since this is your integration 👍🏻 either is fine by me Once cicd passes here I will merge |
Seems there is an incorrect import, maybe take a double check For example, should be |
Tests are still failing with the same import error 😓 Does these actually run locally for you? I'd be very surprised if they did. I'll push a fix |
Thanks Logan for the help. Thank you Prajwal and Vidur for your contribution. I will try this out soon. |
Thank you so much for all the help Logan, this was my first open source contribution and I learned a lot throughout this process. I'm very grateful for the time you spent ensuring that this integration could go through. |
@prajwalsaokar thank you for the integration, seriously saved me a ton of work. Quick question though: I noticed you added the pre-filter in your code, but I couldn't find anything about pre-filtering in DocumentDB docs for vector search. Did you check if it is working on DocDB? Thanks again! |
Description
This change adds a vector store integration for AWS Document DB. @vidur2 and I saw an issue opened to ask for this integration in the LangChain repository and believed that LLamaIndex users would find it useful as well given that we were using it for a project. There are no new dependencies, this integration just requires the pymongo library.
New Package?
Did I fill in the
tool.llamahub
section in thepyproject.toml
and provide a detailed README.md for my new integration or package?Version Bump?
Did I bump the version in the
pyproject.toml
file of the package I am updating? (Except for thellama-index-core
package)Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
make format; make lint
to appease the lint gods