@kessler/embedding (WIP)

This module is built to allow progressive advancement from simple json based embedding database to more advanced solutions like chroma or redis.

quick start

import { loadProviders, Collection } from '@kessler/embedding'

async function main() {
  const { storage, embedders } = await loadProviders({ 
    fs: { directory: '/some/directory' },
    openai: { apiKey: 'openai key here' } 
  })

  const { fs } = storage
  const { openai } = embedders

  await fs.init()
  await openai.init()

  const collection = new Collection('test', openai, fs)
  
  await collection.add('hello world', { created: Date.now() })
  console.log(await collection.query('hello'))

  await fs.shutdown()
  await openai.shutdown()
}

main()

Collections

Collections are the highest abstraction layer. They group together documents, their embedding data and some optional metadata.

class Collection {
  constructor(name, embeddingService, storage) {}
  async query(text, { maxResults = Infinity, threshold = 0.8 }) {}
  async add(text, metadata) {}
  async delete(id) {}
  async get(id) {}
}

Providers

There are two categories for providers: embedding and storage. Embedding providers expose embedding services through a unified interface and storage providers do the same, just for storing and querying documents.

Providers can be loaded and created manually by importing their classes and instantiating them or they can be loaded through loadProviders (see below)

Once a provider is loaded you should call it's init method, regardless of wether you loaded it manually or through load providers. (TODO: i might want to change this behavior)

embedding provider

class Embedder {
  constructor(underlyingProvider, config) {}
  async exec(text, metadata) {}
  async init() {}
  async shutdown() {}
}

TODO: once a document is embedded with one service and stored, the embedding provider cannot be changed, if the embedding scheme is different in the new provider. This must be addressed some how in the design.

storage provider

class MyStorage {
  constructor(underlyingProvider, config) {}
  async query(collectionName, embedding, { maxResults, threshold }) {}
  async add(collectionName, content, embedding, metadata) {}
  async delete(collectionName, id) {}
  async get(collectionName, id) {}
  async init() {}
  async shutdown() {}
  async collections() {}
}

loading automatically

the intent of loadProviders is to load and instatiate any provider that can be loaded, meaning that their peer dependencies exist.

import { loadProviders } from '@kessler/embedding'

async function main() {
  const { storage, embedders } = await loadProviders({ /* ...providers config */ })
  const { pg } = storage
  const { openai } = embedders

  await pg.init()
  await openai.init()
}

main()

loading manually

TBD

embedding providers

openai embedder

Currently the only supported embedding service.

run npm install openai

import { loadProviders } from '@kessler/embedding'

async function main() {
  const { embedders, storage } = await loadProviders({ 
    openai: { apiKey: 'your-api-key' } 
  })

  const { openai } = embedders
  await openai.init()

  // do stuff
  await openai.shutdown()
}

main()

storage providers

File System storage

The simplest non optimized solution, collections are saved on the file system in json files.

Embedding is matched by going through all the existing documents, so not very scalable.

I have plans to implement a better algorithm in the future.

import { loadProviders } from '@kessler/embedding'

async function main() {
  const { embedders, storage } = await loadProviders({ 
    fs: { directory: '/some/path/to/embedding-db' },
  })

  const { fs } = storage
  await fs.init()

  // do stuff
  await fs.shutdown()
}

main()

Postgresql storage

Uses postgresql database with pgvector extension installed.

run npm install pg pgvector (mind the peer dependency versions)

import { loadProviders } from './index.mjs'

async function main() {

  const { embedders, storage } = await loadProviders({ 
    // there are defaults though, database "embedding", localhost, root and no password
    pg: {
      databaseConfig: {
        database: 'embedding',
        user: 'root',
        password: 'shhhhhhhhhhh'
      }
    }
  })
  
  const { pg } = storage
  await pg.init()
  
  // do stuff
  await pg.shutdown()
}

main()

Redis storage

TBD

Chroma storage

TBD

resources

https://supabase.com/blog/openai-embeddings-postgres-vector

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
lib		lib
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
index.mjs		index.mjs
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

@kessler/embedding (WIP)

quick start

Collections

Providers

embedding provider

storage provider

loading automatically

loading manually

embedding providers

openai embedder

storage providers

File System storage

Postgresql storage

Redis storage

Chroma storage

resources

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

kessler/node-embedding

Folders and files

Latest commit

History

Repository files navigation

@kessler/embedding (WIP)

quick start

Collections

Providers

embedding provider

storage provider

loading automatically

loading manually

embedding providers

openai embedder

storage providers

File System storage

Postgresql storage

Redis storage

Chroma storage

resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages