steve-piece/vector-codebase

Vector-Codebase: A Semantic Database for Code

🚀 Generate vector embeddings for your codebase and store them in Supabase for AI-powered code search and understanding.


🎯 Quick Setup

Prerequisites: Node.js, npm/pnpm/yarn, Supabase project, OpenAI API key

📦 Step 1: Install Workflow

Run from your project root:

One-liner installation:

curl -sSL https://raw.githubusercontent.com/steve-piece/vector-codebase/main/install-embeddings-workflow.sh | bash

Manual installation:

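# Run these from the directory that contains your-project
# Clone the installer into a temporary sibling directory
git clone https://github.com/steve-piece/vector-codebase.git temp-vector-codebase
cd temp-vector-codebase
bash install-embeddings-workflow.sh --target ../your-project
cd ../your-project
# The temporary clone sits next to your project, so remove it via the parent path
rm -rf ../temp-vector-codebase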

🗄️ Step 2: Setup Database

Copy and paste the entire vector-search-functions.sql file into your Supabase SQL Editor and run it.
💡 This includes: pgvector extension, table creation, RLS setup, and AI analysis functions.

⚙️ Step 3: Configure Environment

Edit .env with your credentials. If you need help finding them, refer to env.txt.

OPENAI_API_KEY="your-openai-api-key"
SUPABASE_URL="your-supabase-project-url"
SUPABASE_SECRET_KEY="your-supabase-service-role-key"
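
Before running the script, it can help to confirm that all three variables are actually loaded. The following is a minimal sketch, not part of the repository: the file name `check-env.mjs` and the helper `findMissingVars` are hypothetical, and only the three variable names come from the configuration above.

```javascript
// check-env.mjs (hypothetical helper, not shipped with vector-codebase)
// Variable names taken from the .env example above.
const REQUIRED_VARS = ["OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_SECRET_KEY"];

// Returns the names of required variables that are unset or blank.
function findMissingVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

const missingVars = findMissingVars(process.env);
if (missingVars.length > 0) {
  console.warn(`Missing environment variables: ${missingVars.join(", ")}`);
} else {
  console.log("All required environment variables are set.");
}
```

Assuming the file is saved in your project root, it can be run the same way as the ingest script: `node --env-file=.env check-env.mjs`.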

▶️ Step 4: Run Script

Execute the embedding generation:
node --env-file=.env embedding_workflow/ingest-embeddings.mjs

🔄 Auto-sync with GitHub Actions (Optional)

The embedding_workflow/github-actions/ folder contains 5 pre-configured workflow variants covering different package managers and triggers.

📋 Available Workflow Types:

Trigger-based:

  • npm-workflow.yml - Runs on every push to main (npm)
  • pnpm-workflow.yml - Runs on every push to main (pnpm)
  • yarn-workflow.yml - Runs on every push to main (yarn)
  • manual-workflow.yml - Manual trigger only (workflow_dispatch)
  • scheduled-workflow.yml - Daily at 2 AM UTC + manual trigger

🚀 Setup Your Workflow:

# Create workflows directory
mkdir -p .github/workflows

# Choose ONE workflow that matches your setup:

# For npm users:
mv embedding_workflow/github-actions/npm-workflow.yml .github/workflows/sync-embeddings.yml

# For pnpm users:
mv embedding_workflow/github-actions/pnpm-workflow.yml .github/workflows/sync-embeddings.yml

# For yarn users:
mv embedding_workflow/github-actions/yarn-workflow.yml .github/workflows/sync-embeddings.yml

# For manual-only runs:
mv embedding_workflow/github-actions/manual-workflow.yml .github/workflows/sync-embeddings.yml

# For scheduled daily runs:
mv embedding_workflow/github-actions/scheduled-workflow.yml .github/workflows/sync-embeddings.yml

⚡ Repository Secrets Required:

Add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):
  • OPENAI_API_KEY
  • SUPABASE_URL
  • SUPABASE_SECRET_KEY

🤖 AI Coding Assistant Integration (Optional)

Transform any AI coding assistant into a context-aware developer that understands your codebase architecture, finds existing implementations, and maintains consistency across your project.

🎯 What this enables:

  • Smart code placement - AI knows where files belong based on your project structure
  • Duplicate prevention - AI finds existing similar functions before creating new ones
  • Pattern consistency - AI matches your existing code style and architecture
  • Context-aware suggestions - AI understands your tech stack and conventions

Setup Complete!

If you ran vector-search-functions.sql in Step 2, the AI analysis functions are already installed: that script creates the table and all 4 RPC functions.
💡 Performance Note: For small codebases (< 1000 files), no vector index is needed. At that size a sequential scan in PostgreSQL is usually fast enough, and it returns exact results. An index mainly pays off for large projects.
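
To illustrate why a full scan is cheap at this scale, here is a rough sketch of what an exact (unindexed) similarity search does conceptually: score every stored embedding against the query and keep the best matches. This is illustrative JavaScript, not the repository's SQL. The row shape (`path`, `embedding`) and the function names are assumptions, and real embeddings have far more dimensions than the toy vectors here.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact search: score every row, sort by score, keep the top `limit`.
// The cost grows linearly with the number of rows.
function topMatches(queryEmbedding, rows, limit = 3) {
  return rows
    .map((row) => ({ path: row.path, score: cosineSimilarity(queryEmbedding, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy data with hypothetical file paths.
const sampleRows = [
  { path: "src/auth.js", embedding: [1, 0] },
  { path: "src/db.js", embedding: [0, 1] },
  { path: "src/session.js", embedding: [1, 1] },
];
console.log(topMatches([1, 0], sampleRows, 2));
```

With only a few hundred rows, this kind of linear pass involves very little work, which is the intuition behind skipping the index for small projects.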

Add AI Agent Guidelines (Recommended)

Copy the agents.md file to your project root to provide AI assistants with complete codebase analysis instructions and SQL examples.
# Copy AI agent guidelines to your project (includes complete SQL examples)
cp embedding_workflow/agents.md agents.md

Available AI Functions

  • get_codebase_overview() - Understand project scope and technologies
  • find_existing_implementations() - Avoid duplicate code
  • find_architecture_patterns() - Match existing code patterns
  • analyze_directory_patterns() - Follow project organization
💡 How it works: AI assistants call these functions before coding to understand your codebase architecture, find existing implementations, and maintain consistency with your project patterns.
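
As a rough sketch of how a script or agent could call one of these functions, the example below uses Supabase's REST interface, which exposes database functions at `/rest/v1/rpc/<function_name>`. The helper names `buildRpcRequest` and `callRpc` are hypothetical. The arguments each function expects are defined in vector-search-functions.sql and are not shown here, so the example assumes `get_codebase_overview` can be called with no arguments.

```javascript
// Builds the URL and fetch options for a Supabase RPC call.
// Hypothetical helper; only the function names listed above come from the project.
function buildRpcRequest(supabaseUrl, secretKey, functionName, args = {}) {
  const url = new URL(`/rest/v1/rpc/${functionName}`, supabaseUrl).toString();
  return {
    url,
    init: {
      method: "POST",
      headers: {
        apikey: secretKey,
        Authorization: `Bearer ${secretKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(args),
    },
  };
}

// Sends the request using credentials from the environment (see Step 3).
async function callRpc(functionName, args = {}) {
  const { url, init } = buildRpcRequest(
    process.env.SUPABASE_URL,
    process.env.SUPABASE_SECRET_KEY,
    functionName,
    args
  );
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`RPC ${functionName} failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

// Example (needs network access and valid credentials; argument list is an assumption):
// callRpc("get_codebase_overview").then(console.log).catch(console.error);
```

The request is built separately from the network call so the URL and headers can be inspected or logged without contacting the database. Because the secret key bypasses row-level security, this kind of call belongs in trusted server-side tooling only.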

About

Shadows your repo and evolves the embeddings alongside it. Use with the Supabase MCP to give agents in AI coding workflows the context necessary to operate efficiently without the bloat.
