feat: Added filtering ability to supabase #905

Merged (7 commits) on May 5, 2023


@mishkinf (Contributor) commented Apr 20, 2023

Added filtering ability to the Supabase vector store. First, define the following function in your Supabase Postgres database:

CREATE FUNCTION match_documents_with_filters (
  query_embedding vector(1536),
  match_count int,
  filter jsonb DEFAULT '{}'
) RETURNS TABLE (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
#variable_conflict use_column
BEGIN
  RETURN QUERY
  SELECT
    id,
    content,
    metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  FROM
    documents
  WHERE
    -- Check for user_id filter
    jsonb_exists(metadata, 'user_id') AND metadata->>'user_id' = filter->>'user_id'
    -- Add additional filters here using the same pattern
  ORDER BY
    documents.embedding <=> query_embedding
  LIMIT
    match_count;
END;
$$;

You can then filter documents by fields in their metadata:

// `client` (a Supabase client) and OPEN_AI_API_KEY are assumed to be defined elsewhere.
export const query = async (question) => {
  const chat = new ChatOpenAI({
    modelName: "gpt-3.5-turbo",
    openAIApiKey: OPEN_AI_API_KEY,
  });

  const vectorStore = await SupabaseVectorStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    {
      client,
      tableName: "documents",
      // Name of the Postgres function defined above
      queryName: "match_documents_with_filters",
    }
  );

  const chain = ConversationalRetrievalQAChain.fromLLM(
    chat,
    // Second argument is the metadata filter passed to the Postgres function
    vectorStore.asRetriever(null, { user_id: "2" }),
    { returnSourceDocuments: true }
  );

  const res = await chain.call({
    question,
    chat_history: [],
  });

  console.log(res.text);
  console.log({ docs: res.sourceDocuments });

  return res;
};
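
To make the WHERE clause of the function above easier to reason about, here is a minimal sketch that mimics it in plain JavaScript. The helper names (jsonbGetText, matchesUserId) and the sample rows are hypothetical, and only scalar metadata values are modeled. It illustrates two consequences of comparing with ->>: a numeric user_id still matches the string "2", and an empty filter (the default '{}') matches no rows, because a comparison with NULL is never true.

```javascript
// Mimics Postgres `jsonb ->> key` for scalar values only: returns text, or
// null when the key is absent or holds JSON null.
function jsonbGetText(obj, key) {
  if (!Object.prototype.hasOwnProperty.call(obj, key)) return null;
  const value = obj[key];
  if (value === null) return null;
  return typeof value === "string" ? value : String(value);
}

// Mimics: jsonb_exists(metadata, 'user_id')
//         AND metadata->>'user_id' = filter->>'user_id'
// A comparison involving SQL NULL is not true, so such rows are dropped.
function matchesUserId(metadata, filter = {}) {
  const left = jsonbGetText(metadata, "user_id");
  const right = jsonbGetText(filter, "user_id");
  if (left === null || right === null) return false;
  return left === right;
}

// Hypothetical sample rows
const rows = [
  { id: 1, metadata: { user_id: "2" } },
  { id: 2, metadata: { user_id: 2 } }, // number: ->> still yields the text '2'
  { id: 3, metadata: { user_id: "7" } },
  { id: 4, metadata: {} }, // no user_id key
];

const idsFor = (filter) =>
  rows.filter((row) => matchesUserId(row.metadata, filter)).map((row) => row.id);

console.log(idsFor({ user_id: "2" })); // [ 1, 2 ]
console.log(idsFor({})); // []
```

If an empty filter should return all documents instead, the SQL condition would need an extra branch for that case.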

My Twitter handle: https://twitter.com/mishkinf

@mishkinf mishkinf changed the title Added filtering ability to supabase feat: Added filtering ability to supabase Apr 20, 2023
@ShantanuNair (Contributor) commented Apr 20, 2023

@mishkinf @nfcampos Here's an alternative implementation; let me know what you think.

In addVectors, we take in additional fields to write to the table, so we get to take advantage of Supabase DB schema constraints.
In similaritySearchVectorWithScore, these extra fields can be passed in to the rpc call. The rpc function has to be defined by the user with the additional fields in its arguments, and it uses them accordingly in the pgsql function.

If they want completely different querying methods, they can define several matcher functions with different names, each taking its fields as arguments. This way they get the benefits of Supabase's structured schemas and the flexibility to implement an efficient lookup per match function.
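
The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not the langchainjs API: buildRows and buildMatchArgs are hypothetical helpers, and user_id stands in for whatever typed column the table defines. The first helper writes the extra fields as real columns, and the second passes them as explicit arguments to a user-defined matcher function.

```javascript
// Hypothetical helper (not langchainjs API): one table row per document,
// with extra typed columns such as user_id stored next to the embedding.
function buildRows(vectors, documents, extraFields = {}) {
  return vectors.map((embedding, i) => ({
    content: documents[i].pageContent,
    embedding,
    metadata: documents[i].metadata,
    ...extraFields,
  }));
}

// Hypothetical helper: arguments for a user-defined matcher function whose
// signature lists the extra fields explicitly instead of one jsonb filter.
function buildMatchArgs(queryEmbedding, k, extraArgs = {}) {
  return { query_embedding: queryEmbedding, match_count: k, ...extraArgs };
}

const docs = [{ pageContent: "hello", metadata: { source: "a.md" } }];
const rowsToInsert = buildRows([[0.1, 0.2]], docs, { user_id: 2 });
const matchArgs = buildMatchArgs([0.1, 0.2], 4, { user_id: 2 });

console.log(rowsToInsert[0].user_id); // 2
console.log(Object.keys(matchArgs)); // [ 'query_embedding', 'match_count', 'user_id' ]
```

The trade-off is that each matcher function needs its own argument list, in exchange for typed columns that Postgres can constrain and index.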

@mishkinf
Contributor Author

@ShantanuNair No strong opinions on my end; I just need the functionality. One thing to note: you could start with this implementation, which essentially creates an RPC-friendly way to pass filters to a Supabase function, and later add support for creating new Supabase database columns (with whatever schema constraints and functionality you need). The filter arguments to the pgsql function are agnostic of the underlying data you filter on. In this case I am filtering on data stored in metadata, but you could use the same interface to filter documents by whatever database fields you have.
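
As a rough sketch of what "RPC-friendly" means here: the filter travels as one extra named argument of the RPC call, alongside query_embedding and match_count from the SQL function in the PR description. The function name searchWithFilter and the stub client are hypothetical; the only assumption about the real client is an rpc(name, args) method that resolves to { data, error }, as supabase-js does.

```javascript
// Sketch: forward a metadata filter to a Supabase RPC function.
// `client` is anything with an rpc(name, args) method that resolves to
// { data, error }.
async function searchWithFilter(client, queryName, queryEmbedding, k, filter = {}) {
  const { data, error } = await client.rpc(queryName, {
    query_embedding: queryEmbedding,
    match_count: k,
    filter,
  });
  if (error) {
    throw new Error(`Error searching for documents: ${error.message}`);
  }
  return data;
}

// Stub client so the sketch runs without a database; it records each call.
const rpcCalls = [];
const stubClient = {
  rpc: async (name, args) => {
    rpcCalls.push({ name, args });
    return { data: [], error: null };
  },
};

const pending = searchWithFilter(
  stubClient,
  "match_documents_with_filters",
  [0.1, 0.2],
  4,
  { user_id: "2" }
);
pending.then((rows) => console.log(rows.length)); // 0
```

Because the filter is an opaque JSON object on the client side, the Postgres function alone decides whether it is applied to metadata or to dedicated columns.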

@ShantanuNair
Contributor

@mishkinf Makes sense. I will soon create a PR with the functionality I suggested as well; please take a look if you have the time.

@mishkinf
Contributor Author

I think that would be great! @ShantanuNair

@tedspare

tedspare commented Apr 28, 2023

Looking forward to this!

The filter param could also be generalized to accept any metadata field with WHERE metadata @> filter as follows:

CREATE FUNCTION match_documents_with_filters (
  query_embedding vector(1536),
  match_count int,
  filter jsonb DEFAULT '{}'
) RETURNS TABLE (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
#variable_conflict use_column
BEGIN
  RETURN QUERY
  SELECT
    id,
    content,
    metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  FROM
    documents
  WHERE
    metadata @> filter
  ORDER BY
    documents.embedding <=> query_embedding
  LIMIT
    match_count;
END;
$$;

with usage:

...
const chain = ConversationalRetrievalQAChain.fromLLM(
  chat,
  vectorStore.asRetriever(null, {
    user_id: "2",
    repo: "langchain", // or any metadata field
  }),
  { returnSourceDocuments: true }
);
...

etc.

Hope this helps!
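
For intuition about what metadata @> filter matches, here is a simplified JavaScript model of jsonb containment. It is a sketch only: jsonbContains and the sample rows are hypothetical, and Postgres has extra rules (for example, a top-level array containing a bare scalar) that are not modeled. Two differences from the ->> comparison are worth noting: values are compared with their JSON types, so the number 2 does not match the string "2", and the default empty filter matches every row.

```javascript
// Simplified model of Postgres jsonb containment (left @> right).
function jsonbContains(left, right) {
  if (Array.isArray(right)) {
    // Every element on the right must be contained in some element on the left.
    return (
      Array.isArray(left) &&
      right.every((r) => left.some((l) => jsonbContains(l, r)))
    );
  }
  if (right !== null && typeof right === "object") {
    // Every key/value pair on the right must be contained on the left.
    return (
      left !== null &&
      typeof left === "object" &&
      !Array.isArray(left) &&
      Object.keys(right).every(
        (key) =>
          Object.prototype.hasOwnProperty.call(left, key) &&
          jsonbContains(left[key], right[key])
      )
    );
  }
  // Scalars must be equal, including their JSON type.
  return left === right;
}

// Hypothetical sample rows
const sampleRows = [
  { id: 1, metadata: { user_id: "2", repo: "langchain" } },
  { id: 2, metadata: { user_id: "2", repo: "other" } },
  { id: 3, metadata: { user_id: 2, repo: "langchain" } },
];

const matchIds = (filter) =>
  sampleRows.filter((row) => jsonbContains(row.metadata, filter)).map((row) => row.id);

console.log(matchIds({ user_id: "2", repo: "langchain" })); // [ 1 ]
console.log(matchIds({ user_id: "2" })); // [ 1, 2 ]
console.log(matchIds({})); // [ 1, 2, 3 ]
```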

@liamcharmer

Honestly this is so fricken cool! I hope this happens!

@pnutmath

pnutmath commented May 1, 2023

Looking forward to this filtering feature in Supabase!

@mishkinf The PR looks good to me!

@mz1979

mz1979 commented May 1, 2023

Let's do this! Thanks for the PR @mishkinf

@JimmyLv

JimmyLv commented May 4, 2023

yes! need this feature!

@jacoblee93 jacoblee93 self-assigned this May 4, 2023
@jacoblee93 jacoblee93 added the lgtm PRs that are ready to be merged as-is label May 5, 2023
9 participants