ianmacartney/embeddings-in-convex

Embeddings Playground with Pinecone, OpenAI, and Convex

An example of working with embeddings and vector databases in Convex.

Embeddings enable all sorts of use cases, but it's hard to know how they'll perform on comparisons and queries without playing around with them.

This project allows you to add source data, generate embeddings via OpenAI, compare them to each other, and compare semantic and word searches over them.
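To make the comparison concrete, here is a minimal sketch of scoring two embeddings with cosine similarity, next to a naive word-match score for contrast. The function names are hypothetical and not taken from this repo, and cosine similarity is only an assumed metric: the metric configured on the Pinecone index is not stated here.

```typescript
// Illustrative sketch; function names are hypothetical, not from this repo.

// Cosine similarity of two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Naive word search: how many query words appear in the text (case-insensitive).
function wordMatchScore(query: string, text: string): number {
  const words = text.toLowerCase().split(/\W+/);
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0 && words.indexOf(w) !== -1).length;
}
```

A word search only rewards exact token overlap, while an embedding comparison can score two texts as close even when they share no words, which is the difference this playground lets you explore.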

You can then include the queried source data in a ChatGPT prompt (WIP).

UI:

  • React
  • Tailwind CSS
  • Rewind-UI
  • Vite

Backend:

  • Pinecone for storing and querying vector embeddings.
  • OpenAI API for creating vector embeddings.
  • Convex for storing application data and running server-side functions.
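As a rough illustration of the OpenAI step, the sketch below builds a request for OpenAI's embeddings endpoint and extracts the vectors from a response. It is not this repo's code: the helper names are hypothetical, and the model name is an assumption, chosen because `text-embedding-ada-002` returns 1536-dimensional vectors, which matches the vector length required in the setup section.

```typescript
// Assumption: the model is not named in this README.
const EMBEDDING_MODEL = "text-embedding-ada-002";

interface EmbeddingRequest {
  url: string;
  method: string;
  headers: { [name: string]: string };
  body: string;
}

// Builds the HTTP request for a batch of input strings (hypothetical helper).
function buildEmbeddingRequest(apiKey: string, input: string[]): EmbeddingRequest {
  return {
    url: "https://api.openai.com/v1/embeddings",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: EMBEDDING_MODEL, input }),
  };
}

// Extracts vectors from a response shaped like { data: [{ embedding, index }] },
// ordered by index so they line up with the inputs.
function extractEmbeddings(response: {
  data: { embedding: number[]; index: number }[];
}): number[][] {
  return response.data
    .slice()
    .sort((x, y) => x.index - y.index)
    .map((d) => d.embedding);
}
```

Keeping the request construction separate from the network call makes it possible to inspect or test the payload without spending API credits; the request itself can be sent with any HTTP client.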

Work planned:

  • Add a python script that scrapes URLs and imports the data.
  • Add a node script that imports local files (.pdf, .md, .txt).
  • Allow picking which sources to use in a ChatGPT prompt, and what template to use, to iterate on templates.
  • Configuration to fetch the top 20, 40, or 80 documents when searching (currently hard-coded to 10).
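The configurable result limit in the last item could look something like the sketch below. This is purely hypothetical: the names, the set of allowed values, and the choice to reject other values are all assumptions, not the project's design.

```typescript
// Hypothetical sketch of a configurable result limit; not the repo's code.
const ALLOWED_LIMITS = [10, 20, 40, 80];
const DEFAULT_LIMIT = 10; // the value that is hard-coded today

interface ScoredMatch {
  id: string;
  score: number;
}

// Returns the limit to use, falling back to the default when none is given.
function resolveLimit(requested?: number): number {
  if (requested === undefined) return DEFAULT_LIMIT;
  if (ALLOWED_LIMITS.indexOf(requested) === -1) {
    throw new Error(`Limit must be one of ${ALLOWED_LIMITS.join(", ")}`);
  }
  return requested;
}

// Keeps the highest-scoring matches, up to the resolved limit.
function topMatches(matches: ScoredMatch[], requested?: number): ScoredMatch[] {
  const limit = resolveLimit(requested);
  return matches
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```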

Setup

Prerequisites:

  1. A Convex backend, which is configured automatically when you run npm run dev. Run this first so you can enter the environment variables for (2) and (3) on the dashboard.

  2. A Pinecone API Key and Index. Free to start. The only important configuration is to set the vector length to 1536. Environment variables:

    • PINECONE_INDEX_NAME (for me, embeddings-playground)
    • PINECONE_ENVIRONMENT (for me, asia-southeast1-gcp-free)
    • PINECONE_API_KEY (a UUID; don't share this publicly)
  3. An OpenAI API key. Environment variable: OPEN_API_KEY (should start with sk-).
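The prerequisites above can be sanity-checked with a small helper like the sketch below. It is not part of this repo, and the function names are hypothetical. The variable names and the 1536 vector length come from the list above; the environment is passed in as a plain object so the check can run anywhere.

```typescript
// Hypothetical configuration check; variable names are taken from the list above.
const REQUIRED_ENV_VARS = [
  "PINECONE_INDEX_NAME",
  "PINECONE_ENVIRONMENT",
  "PINECONE_API_KEY",
  "OPEN_API_KEY",
];
const EXPECTED_DIMENSION = 1536;

type Env = { [name: string]: string | undefined };

// Returns a human-readable list of problems; an empty list means all checks passed.
function findConfigProblems(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_ENV_VARS) {
    if (!env[name]) problems.push(`${name} is not set`);
  }
  const openAiKey = env["OPEN_API_KEY"];
  if (openAiKey && openAiKey.indexOf("sk-") !== 0) {
    problems.push("OPEN_API_KEY should start with sk-");
  }
  return problems;
}

// Throws if a vector does not match the length the Pinecone index expects.
function assertDimension(vector: number[]): void {
  if (vector.length !== EXPECTED_DIMENSION) {
    throw new Error(
      `Expected ${EXPECTED_DIMENSION} dimensions, got ${vector.length}`
    );
  }
}
```

Convex exposes variables set on the dashboard to server-side functions through `process.env`, so a check like this could be called there with `process.env` as the argument.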

Run:

npm install
npm run dev

Upload sources from a URL

You can add a source from a URL using the scripts/addURL.py python script:

pip install python-dotenv convex langchain
python scripts/addURL.py https://example.com

Upload sources from a folder

You can add .txt, .md, and .pdf files as sources to your project via:

export CONVEX_URL= # your backend url - see .env.local (dev) or .env (prod)
npx ts-node-esm scripts/addFiles.ts ./path/to/folder

By default it checks the documents folder at the root of the repo. Files are uploaded in chunks.
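How scripts/addFiles.ts splits its uploads is not described here. As one possible reading, the sketch below splits a document's text into fixed-size character chunks with a small overlap, so that each piece can be embedded separately. The function name, chunk size, and overlap are all hypothetical.

```typescript
// Hypothetical chunking helper; not the logic used by scripts/addFiles.ts.
// Splits text into chunks of at most `size` characters, where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be >= 0 and smaller than size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk reaches the end, to avoid a redundant trailing piece.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. The overlap keeps a sentence that straddles a boundary at least partly visible in both neighboring chunks.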
