Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add filter support for Prisma #1234

Merged
merged 13 commits into from
Jun 3, 2023
Merged

feat: Add filter support for Prisma #1234

merged 13 commits into from
Jun 3, 2023

Conversation

erictt
Copy link
Contributor

@erictt erictt commented May 12, 2023

Closes #1426

@vercel
Copy link

vercel bot commented May 12, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
langchainjs-docs ✅ Ready (Inspect) Visit Preview May 30, 2023 9:17am

@erictt erictt marked this pull request as ready for review May 12, 2023 04:23
langchain/src/vectorstores/prisma.ts Outdated Show resolved Hide resolved
langchain/src/vectorstores/prisma.ts Outdated Show resolved Hide resolved
langchain/src/vectorstores/prisma.ts Outdated Show resolved Hide resolved
@dqbd dqbd added question Further information is requested in progress PRs that are in progress but not ready to merge labels May 25, 2023
@dqbd dqbd self-assigned this May 25, 2023
@dqbd
Copy link
Collaborator

dqbd commented May 29, 2023

Hi! I've done some small adjustments and polish, addressing some of the pain points mentioned in the review above.

Notably:

  • Using Prisma.join and Prisma.sql, we can send values separately without string interpolation, reducing the risk of SQL injection.
  • Filters have been extended to support multiple operators (=, <>, <, <=, >=, >), modelling the same filter structure similar to how where works in Prisma
  • Migration and schema has been fixed

@dqbd dqbd removed question Further information is requested in progress PRs that are in progress but not ready to merge labels May 29, 2023
this.Prisma.join(
[
this.Prisma.sql`
SELECT ${this.selectSql}, ${this.vectorColumnSql} <=> ${vectorQuery}::vector as "_distance"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ok because everything here is generated in from a config file right or a vector array right?

I still think it would be nice to parameterize things for best practices.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does this.Prisma.sql magically sanitize somehow?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually I think if someone wasn't using TypeScript (or ignored warnings) there's an attack where someone could pass in some argument to this method like:

["* FROM tables; DROP TABLES; SELECT"]; 

Can we sanitize this?

Copy link
Contributor Author

@erictt erictt May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base on the doc:

$executeRaw can only run one query at a time. You cannot append a second query - for example, adding DROP bobby_tables to the end of an ALTER.

Although it didn't mention queryRaw, i found a related issue says:

queryRaw(sql) and executeRaw(sql) will only be executing a single statement, which means some injections won't work (the one with multiple queries trying to drop tables for example) but some will (unveiling additional data using UNION for example)

So I don't think this will be a problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Parameters are by default sanitised unless they are wrapped in a Prisma.raw call.

There are some dynamic fields, which cannot be parametrised (such as table names and column names). Moved these Prisma.raw calls closer to the executeRaw and queryRaw invocations alongside a note in docs to warn users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Using a single statement helps but I'm sure there's still sneaky issues and damage some bad metadata on a loaded document could do even in a single query, so please escape here:

https://github.com/hwchase17/langchainjs/pull/1234/files#diff-9bc13719b32d4ab3d65a0d4f561ae4427f1c746ad7d140e30c0005b4e0d91018R271

Raw SQL from user provided fields is also really not ideal but at least it'll be the user's own fault if something happens there.

Copy link
Collaborator

@jacoblee93 jacoblee93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions but looks good

@jacoblee93 jacoblee93 added question Further information is requested lgtm PRs that are ready to be merged as-is close PRs that need one or two touch-ups to be ready and removed lgtm PRs that are ready to be merged as-is labels May 29, 2023
WHERE ${idSql} = ${documents[idx].metadata[this.idColumn]}
UPDATE ${tableNameRaw}
SET ${vectorColumnRaw} = ${`[${vector.join(",")}]`}::vector
WHERE ${idColumnRaw} = ${documents[idx].metadata[this.idColumn]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if documents[idx].metadata[this.idColumn] can't be trusted? E.g. if you're loading documents in from a different source?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right didn't see the other comment about executeRaw only being able to execute one query. I still don't love this and it should be escaped.

Copy link
Collaborator

@dqbd dqbd Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict documents[idx].metadata[this.idColumn] and [${vector.join(",")}] is still escaped and parametrised, as Prisma uses https://github.com/blakeembrey/sql-template-tag behind the scenes :)

@dqbd dqbd requested a review from jacoblee93 June 2, 2023 16:20
@jacoblee93
Copy link
Collaborator

Thanks for the patience here, will merge with the next release!

@jacoblee93 jacoblee93 added lgtm PRs that are ready to be merged as-is and removed question Further information is requested close PRs that need one or two touch-ups to be ready labels Jun 2, 2023
@jacoblee93 jacoblee93 merged commit 28fc4bc into langchain-ai:main Jun 3, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm PRs that are ready to be merged as-is
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants