Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGVector as vector db #718

Closed
Filo01 opened this issue Nov 14, 2023 · 2 comments
Closed

PGVector as vector db #718

Filo01 opened this issue Nov 14, 2023 · 2 comments

Comments

@Filo01
Copy link

Filo01 commented Nov 14, 2023

I'm trying to use PGVector to store the embeddings produced by this crawler

I've added this in the crawler.yaml:

assets:
  - name: "documents-table"
    asset-type: "jdbc-table"
    creation-mode: create-if-not-exists
    config:
      table-name: "documents"
      datasource: "JdbcDatasource"
      create-statements:
        - |
          CREATE EXTENSION IF NOT EXISTS vector;
          CREATE TABLE documents (
          filename TEXT,
          chunk_id int,
          num_tokens int,
          lang TEXT,
          text TEXT,
          embeddings_vector vector,
          PRIMARY KEY (filename, chunk_id));

and modified this

- name: "Write"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "JdbcDatasource"
      table-name: "documents"
      fields:
        - name: "filename"
          expression: "value.filename"
          primary-key: true
        - name: "chunk_id"
          expression: "fn:toInt(value.chunk_id)"
          primary-key: true
        - name: "embeddings_vector"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "lang"
          expression: "value.language"
        - name: "text"
          expression: "value.text"
        - name: "num_tokens"
          expression: "value.chunk_num_tokens"

but I'm having this error when doing the INSERT
ai.langstream.runtime.agent.AgentRunner$PermanentFailureException: org.postgresql.util.PSQLException: Can't infer the SQL type to use for an instance of java.util.ArrayList. Use setObject() with an explicit Types value to specify the type to use.

I also tried adding this in configuration.yaml:

  dependencies:
   ...
    - name: "PGVector"
      url: "https://repo1.maven.org/maven2/com/pgvector/pgvector/0.1.3/pgvector-0.1.3.jar"
      sha512sum: "84c37b191e20730a143963108aa11f5c4c28831b8c2637c1d639d2aaaa44f4803461e5b68c353ea51ba8abddde837796ebad65572aa2ac95c0fcbfd76cbf73d1"
      type: "java-library"

And creating a PGVector via EL like this

        - name: "embeddings_vector"
          expression: "new com.pgvector.PGvector(fn:toListOfFloat(value.embeddings_vector))"```

but it's probably invalid as EL and I'm not sure how to proceed from there

@Filo01
Copy link
Author

Filo01 commented Nov 14, 2023

I solved it by avoiding the expression language completely by doing it this way instead:

  - name: "Write"
    type: "query"
    input: chunks-topic
    configuration:
      datasource: "JdbcDatasource"
      mode: "execute"
      query: |
        INSERT INTO documents(filename,chunk_id,num_tokens,lang,text,embeddings_vector) values ( ? , ? , ? , ? , ? , CAST( ? as vector ) );
      output-field: "value.command_result"
      fields:
        - "value.filename"
        - "fn:toInt(value.chunk_id)"
        - "fn:toInt(value.chunk_num_tokens)"
        - "value.language"
        - "value.text"
        - "fn:toString(fn:toListOfFloat(value.embeddings_vector))"

@Filo01 Filo01 closed this as completed Nov 14, 2023
@eolivelli
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants