
Integrating AWS DocumentDB as a vector storage method #12217

Merged
merged 28 commits into run-llama:main on Apr 22, 2024

Conversation

prajwalsaokar
Contributor

@prajwalsaokar prajwalsaokar commented Mar 24, 2024

Description

This change adds a vector store integration for AWS DocumentDB. @vidur2 and I saw an issue opened in the LangChain repository asking for this integration and believed that LlamaIndex users would find it useful as well, given that we were using it for a project. There are no new dependencies; this integration only requires the pymongo library.

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Mar 24, 2024
@vidur2

vidur2 commented Mar 26, 2024

Our unit tests require one to input their own AWS credentials to run. In order to avoid exposing AWS secrets, we didn't include ours as part of the PR. Is there some way we can get approved without having these tests pass?

Collaborator

It might also be good to add an example notebook in docs/docs/examples/vector_stores ?

Collaborator

@logan-markewich logan-markewich left a comment

lgtm, but still would be nice to have an example so people know how to use this :)

@logan-markewich
Collaborator

The tests should mock out API requests, or should be skipped properly if creds are missing
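The mocking option Logan mentions can be sketched roughly as follows. This is illustrative only: the vector store and collection names here are placeholders, not the PR's actual code, and in practice the mock would stand in for the pymongo collection the store talks to.

```python
# Sketch of the mocking option: replace the pymongo collection with a
# MagicMock so tests need no AWS credentials or DocumentDB instance.
# The names below are illustrative, not taken from the PR's code.
from unittest.mock import MagicMock

mock_collection = MagicMock()
# Pretend the aggregation pipeline (the vector search) returned one hit.
mock_collection.aggregate.return_value = [{"_id": "node-1", "score": 0.92}]

# In a real test, the mock would be injected into the vector store
# (e.g. store._collection = mock_collection) before calling query().
results = list(mock_collection.aggregate([{"$match": {}}]))
```

Because `MagicMock` records calls, a test can also assert that the expected pipeline was sent, not just check the canned result.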

@vidur2

vidur2 commented Apr 1, 2024

> The tests should mock out API requests, or should be skipped properly if creds are missing

Should be fixed.


@prajwalsaokar
Contributor Author

@logan-markewich is there anything else we need to add so that this can be merged? thanks for guiding us through the process

@logan-markewich
Collaborator

@prajwalsaokar the tests were a little wonky -- we need to mark them as skip (which is what I did), if we aren't going to mock the API calls. Also cleaned up a few other things, should be good to go now

@logan-markewich
Collaborator

@prajwalsaokar ngl the tests are pretty borked -- do you mind taking a look and ensuring either
a) things are properly mocked (api calls, etc.)
b) things are properly skipped when not mocked and things are missing
c) that tests actually work (I was hitting all kinds of errors actually haha)

@prajwalsaokar
Contributor Author

> @prajwalsaokar ngl the tests are pretty borked -- do you mind taking a look and ensuring either a) things are properly mocked (api calls, etc.) b) things are properly skipped when not mocked and things are missing c) that tests actually work (I was hitting all kinds of errors actually haha)

We just tried the tests on two machines, and they should work. It's just a bit of a hassle to set them up, because you need a DocumentDB instance and an EC2 instance that's connected to the DB instance with the correct networking config for the tests to run properly. Could you send over the errors you're getting?

@prajwalsaokar
Contributor Author

We also think we implemented skipping correctly, but we didn't come up with a way to skip the tests if the EC2 configuration isn't correct since that's not visible to LlamaIndex. Can you clarify what changes you think we should make on the skip/mock front?

@samkhano1

It looks like we should follow something like Cosmos or Postgres. They simply try to connect to DB and if there is a failure, they set a "*_not_available" variable which is used to skip the test. From a quick glance at tests written for a few vector stores, it doesn't look like mocking the API calls is typically done. I'll leave it up to Logan to provide requirements for the merge, but just wanted to leave this comment first.
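The connect-or-skip pattern described above can be sketched roughly like this. The real suites use pytest's `skipif`; this stdlib-only version shows the same idea with `unittest.skipIf`, and the localhost URI is a placeholder, not the actual test configuration.

```python
# Connect-or-skip pattern, as in the Cosmos/Postgres test suites:
# try to reach the database once at import time; on any failure, set a
# "*_not_available" flag that skips every DB-dependent test.
# (Placeholder URI; real tests would read connection details from env.)
import unittest

docdb_not_available = True
try:
    from pymongo import MongoClient  # the integration's only dependency
    client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=500)
    client.admin.command("ping")  # raises if the server is unreachable
    docdb_not_available = False
except Exception:
    pass


class TestDocDbVectorStore(unittest.TestCase):
    @unittest.skipIf(docdb_not_available, "DocumentDB not available")
    def test_add_and_query(self):
        pass  # real assertions against the store would go here
```

Catching a broad `Exception` around both the import and the ping means the suite degrades to "skipped" whether pymongo is missing, credentials are absent, or the server is simply unreachable.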

@logan-markewich
Collaborator

Correct, usually you can do some sort of check and based on the condition, skip or not.

Mocking would allow the tests to actually run in CI/CD, but it's usually more work (hence why other vector DBs don't have it)

I'll leave it up to you since this is your integration 👍🏻 either is fine by me

Once CI/CD passes here I will merge

@logan-markewich
Collaborator

Seems there is an incorrect import, maybe give it a double check

For example, it should be from llama_index.core.schema import...

@logan-markewich
Collaborator

Tests are still failing with the same import error 😓 Do these actually run locally for you? I'd be very surprised if they did. I'll push a fix

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 22, 2024
@logan-markewich logan-markewich merged commit 3232a1f into run-llama:main Apr 22, 2024
8 checks passed
@ip9inderpreet

Thanks Logan for the help. Thank you Prajwal and Vidur for your contribution. I will try this out soon.

chrisalexiuk-nvidia pushed a commit to chrisalexiuk-nvidia/llama_index that referenced this pull request Apr 25, 2024
mattf pushed a commit to mattf/llama_index that referenced this pull request Apr 25, 2024
@prajwalsaokar
Contributor Author

Thank you so much for all the help Logan, this was my first open source contribution and I learned a lot throughout this process. I'm very grateful for the time you spent ensuring that this integration could go through.

@jpcoutinho

@prajwalsaokar thank you for the integration, seriously saved me a ton of work. Quick question though: I noticed you added the pre-filter in your code, but I couldn't find anything about pre-filtering in DocumentDB docs for vector search. Did you check if it is working on DocDB? Thanks again!
