Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LanceDB vectorstore #1597

Closed

Conversation

gsilvestrin
Copy link
Contributor

@gsilvestrin gsilvestrin commented Jun 9, 2023

Adds support to LanceDB, an embedded vector database.

It should behave as the Python Integration

I believe the Vercel failure is unrelated to my change, is caused by langchainplus:

[INFO] [en] Creating an optimized production build...
../langchain/src/client/langchainplus.ts:1:46 - error TS2307: Cannot find module 'langchainplus-sdk' or its corresponding type declarations.
1 import { Example, LangChainPlusClient } from "langchainplus-sdk";
                                               ~~~~~~~~~~~~~~~~~~~
../langchain/src/client/langchainplus.ts:175:25 - error TS7006: Parameter 'example' implicitly has an 'any' type.
175     examples.map(async (example) => {
                            ~~~~~~~
../langchain/src/client/tests/langchainplus.int.test.ts:3:37 - error TS2307: Cannot find module 'langchainplus-sdk' or its corresponding type declarations.
3 import { LangChainPlusClient } from "langchainplus-sdk";
                                      ~~~~~~~~~~~~~~~~~~~
../langchain/src/client/tests/langchainplus.int.test.ts:17:22 - error TS7006: Parameter 'd' implicitly has an 'any' type.
17   if (!datasets.map((d) => d.name).includes(datasetName)) {
                        ~
../langchain/src/client/tests/langchainplus.int.test.ts:51:22 - error TS7006: Parameter 'd' implicitly has an 'any' type.
51   if (!datasets.map((d) => d.name).includes(datasetName)) {
                        ~
../langchain/src/client/tests/langchainplus.int.test.ts:77:24 - error TS7006: Parameter 's' implicitly has an 'any' type.
77   expect(sessions.map((s) => s.name)).toContain(sessionName);
                          ~
../langchain/src/client/tests/langchainplus.int.test.ts:102:22 - error TS7006: Parameter 'd' implicitly has an 'any' type.
102   if (!datasets.map((d) => d.name).includes(fileName)) {
                         ~
../langchain/src/client/tests/langchainplus.int.test.ts:143:22 - error TS7006: Parameter 'd' implicitly has an 'any' type.
143   if (!datasets.map((d) => d.name).includes(datasetName)) {

@vercel
Copy link

vercel bot commented Jun 9, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
langchainjs-docs ❌ Failed (Inspect) Jun 14, 2023 4:22pm

Copy link
Collaborator

@dqbd dqbd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution and sorry for the delay! Just minor nitpicks here and there, but lgtm otherwise :)

"ts-jest": "^29.1.0",
"typeorm": "^0.3.12",
"typescript": "^5.0.0",
"vectordb": "^0.1.4",
"typesense": "^1.5.3",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like Yarn reorders the dependencies and creates a new lockfile, would it be possible to run yarn once again?

@@ -57,6 +57,7 @@ const entrypoints = {
"vectorstores/hnswlib": "vectorstores/hnswlib",
"vectorstores/faiss": "vectorstores/faiss",
"vectorstores/weaviate": "vectorstores/weaviate",
"vectorstores/lancedb": "vectorstores/lancedb",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like node langchain/scripts/create-entrypoints.js has not been executed yet. Would it be possible to run these commands?

const data: Array<Record<string, unknown>> = [];
for (let i = 0; i < documents.length; i += 1) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const record: Record<string, any> = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: preferably use Record<string, unknown> instead.

const docsAndScore: [Document, number][] = [];
results.forEach((item) => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const metadata: Record<string, any> = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: preferably, use unknown instead of any whenever possible.

test("constructor works", async () => {
const lancedb = new LanceDB(new FakeEmbeddings(), {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
table: new Table(null as any, "vectors"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: afaict first argument of Table is already any?


test("should add vectors to the table", async () => {
const embeddings = new FakeEmbeddings();
const dir = await track().mkdir("lancedb");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to import track as an another dev-dependency? Could we just remove the temp directory after the test completes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @dqbd! I can refactor the code and remove the track dependency. I saw that the unit test was not included in #1787, were there any problems other the two comments raised here? I can clean up and submit it in another PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did not pass our CI, build seemed to fail on old Macs and Windows. It's all good to have it as an integration test IMO.

@jacoblee93
Copy link
Collaborator

Thanks @gsilvestrin and @dqbd - I've made a few fixes and taken the comments into account. Will aim to have the extended PR #1787 merged as part of tomorrow's release!

@jacoblee93
Copy link
Collaborator

Merged in #1787

@jacoblee93 jacoblee93 closed this Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants