Semantic search for static sites using libSQL/Turso with multi-provider embeddings.
Add AI-powered vector search to your Astro, Next.js, or any static site with minimal configuration. Index markdown content, generate embeddings locally or via API, and provide lightning-fast semantic search to your users.
- 🔍 Semantic Search - Find content by meaning, not just keywords
- 🌐 Multi-Provider Embeddings - Choose local (Xenova), Gemini, or OpenAI
- ⚡ Edge-Ready - Works with Turso's global edge database
- 📝 Markdown Support - Built-in gray-matter parsing
- 🎯 Type-Safe - Full TypeScript support
- 🆓 Free Tier Friendly - Local embeddings require no API keys
npm:

```bash
npm install libsql-search @libsql/client
```

pnpm:

```bash
pnpm add libsql-search @libsql/client
```

JSR:

```bash
deno add @logan/libsql-search
```
```ts
import { createClient } from '@libsql/client';
import { createTable } from 'libsql-search';

const client = createClient({
  url: 'libsql://your-db.turso.io',
  authToken: 'your-auth-token'
});

// Create the articles table with a vector index
await createTable(client, 'articles', 768);
```
```ts
import { indexContent } from 'libsql-search';

const result = await indexContent({
  client,
  contentPath: './content',
  embeddingOptions: {
    provider: 'local', // or 'gemini', 'openai'
    dimensions: 768
  },
  onProgress: (current, total, file) => {
    console.log(`[${current}/${total}] Indexing: ${file}`);
  }
});

console.log(`Indexed ${result.success}/${result.total} documents`);
```
```ts
import { search } from 'libsql-search';

const results = await search({
  client,
  query: 'how to deploy astro',
  limit: 5,
  embeddingOptions: {
    provider: 'local'
  }
});

results.forEach(result => {
  console.log(`${result.title} (${result.distance})`);
});
```
Free, no API key required. Runs all-MiniLM-L6-v2 in Node.js using ONNX.

```ts
embeddingOptions: {
  provider: 'local',
  dimensions: 768 // 384 native, padded to 768
}
```
Pros:
- ✅ No API costs
- ✅ No rate limits
- ✅ Works offline
- ✅ Privacy-friendly
Cons:
- ⚠️ First run downloads the model (~50MB)
- ⚠️ Slower than API-based options
- ⚠️ Lower quality than large models
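The zero-padding from 384 to 768 dimensions mentioned above can be sketched as follows (a minimal illustration; the library's internal helper may differ):

```typescript
// Pad an embedding vector with zeros up to a target dimension.
// Shorter vectors are extended; vectors already at or above the target
// are returned unchanged.
function padEmbedding(vector: number[], targetDim: number): number[] {
  if (vector.length >= targetDim) return vector;
  return [...vector, ...new Array(targetDim - vector.length).fill(0)];
}

const native = new Array(384).fill(0.1); // all-MiniLM-L6-v2 output size
const padded = padEmbedding(native, 768);
```

Zero-padding leaves dot products and vector norms unchanged, so cosine distances between vectors padded this way match the distances between the originals.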
Free tier: 1,500 requests/day. Uses the text-embedding-004 model.

```ts
embeddingOptions: {
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  dimensions: 768 // native
}
```
Pros:
- ✅ Generous free tier
- ✅ High quality embeddings
- ✅ Fast
Cons:
- ⚠️ Requires API key
- ⚠️ Rate limited
Paid only. Uses text-embedding-3-small or text-embedding-3-large.

```ts
embeddingOptions: {
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  dimensions: 1536 // or 3072 for large
}
```
Pros:
- ✅ Highest quality
- ✅ Very fast
- ✅ Configurable dimensions
Cons:
- ⚠️ Costs money ($0.02 per 1M tokens)
- ⚠️ Requires API key
Index markdown files from a directory.
```ts
interface IndexerOptions {
  client: Client;             // libSQL client
  contentPath: string;        // Path to content directory
  embeddingOptions?: EmbeddingOptions;
  fileExtensions?: string[];  // Default: ['.md', '.markdown']
  exclude?: string[];         // Default: ['node_modules', '.git']
  tableName?: string;         // Default: 'articles'
  onProgress?: (current: number, total: number, file: string) => void;
}
```
Create the articles table with vector index.
Perform semantic search.
```ts
interface SearchOptions {
  client: Client;
  query: string;
  limit?: number;      // Default: 10
  tableName?: string;  // Default: 'articles'
  embeddingOptions?: EmbeddingOptions;
}
```
Returns `SearchResult[]`:
```ts
interface SearchResult {
  id: number;
  slug: string;
  title: string;
  content: string;
  folder: string;
  tags: string[];
  distance: number; // Lower is better
  created_at: string;
}
```
Get all articles (useful for building static pages).
Get a single article by slug.
Get all articles in a folder.
Get all unique folders.
Generate embeddings for arbitrary text.
```ts
interface EmbeddingOptions {
  provider?: 'local' | 'gemini' | 'openai';
  apiKey?: string;
  dimensions?: number;
  maxLength?: number; // Default: 8000
}
```
Combine multiple fields into embedding text.
```ts
const text = prepareTextForEmbedding({
  title: 'My Article',
  description: 'A description',
  content: '# Content here',
  tags: ['astro', 'turso']
});
```
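A minimal combiner along these lines might look like the following (a hypothetical sketch, not the library's actual implementation):

```typescript
interface EmbeddingFields {
  title?: string;
  description?: string;
  content?: string;
  tags?: string[];
}

// Join the fields into one newline-separated string, skipping
// missing or empty values, so the embedding sees title and tags
// alongside the body text.
function combineFields(fields: EmbeddingFields): string {
  return [
    fields.title,
    fields.description,
    fields.tags?.join(', '),
    fields.content,
  ]
    .filter((part): part is string => Boolean(part && part.trim()))
    .join('\n');
}
```

Putting title and tags before the content gives short, high-signal fields a place in the embedding even when the body is long.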
Search API endpoint (`src/pages/api/search.json.ts`):
```ts
import type { APIRoute } from 'astro';
import { createClient } from '@libsql/client';
import { search } from 'libsql-search';

export const prerender = false;

const client = createClient({
  url: import.meta.env.TURSO_DB_URL,
  authToken: import.meta.env.TURSO_AUTH_TOKEN
});

export const POST: APIRoute = async ({ request }) => {
  const { query, limit = 10 } = await request.json();

  const results = await search({
    client,
    query,
    limit,
    embeddingOptions: { provider: 'local' }
  });

  return new Response(JSON.stringify({ results }), {
    headers: { 'Content-Type': 'application/json' }
  });
};
```
Static page generation (`src/pages/[...slug].astro`):
```astro
---
import { createClient } from '@libsql/client';
import { getAllArticles, getArticleBySlug } from 'libsql-search';

export const prerender = true;

const client = createClient({
  url: import.meta.env.TURSO_DB_URL,
  authToken: import.meta.env.TURSO_AUTH_TOKEN
});

export async function getStaticPaths() {
  const articles = await getAllArticles(client);
  return articles.map(article => ({
    params: { slug: article.slug }
  }));
}

const { slug } = Astro.params;
const article = await getArticleBySlug(client, slug);
---

<article>
  <h1>{article.title}</h1>
  <div set:html={article.content} />
</article>
```
API route (`app/api/search/route.ts`):
```ts
import { createClient } from '@libsql/client';
import { search } from 'libsql-search';
import { NextRequest } from 'next/server';

const client = createClient({
  url: process.env.TURSO_DB_URL!,
  authToken: process.env.TURSO_AUTH_TOKEN!
});

export async function POST(request: NextRequest) {
  const { query, limit = 10 } = await request.json();

  const results = await search({
    client,
    query,
    limit,
    embeddingOptions: { provider: 'local' }
  });

  return Response.json({ results });
}
```
Static generation (`app/[slug]/page.tsx`):
```tsx
import { createClient } from '@libsql/client';
import { getAllArticles, getArticleBySlug } from 'libsql-search';

const client = createClient({
  url: process.env.TURSO_DB_URL!,
  authToken: process.env.TURSO_AUTH_TOKEN!
});

export async function generateStaticParams() {
  const articles = await getAllArticles(client);
  return articles.map(article => ({
    slug: article.slug
  }));
}

export default async function Page({ params }: { params: { slug: string } }) {
  const article = await getArticleBySlug(client, params.slug);

  return (
    <article>
      <h1>{article.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: article.content }} />
    </article>
  );
}
```
- Use 768 dimensions for best compatibility
- Local model outputs 384, automatically padded to 768
- Gemini outputs 768 natively
- OpenAI supports custom dimensions
Create a script to re-index content:
```json
{
  "scripts": {
    "index": "node scripts/index.js",
    "build": "npm run index && astro build"
  }
}
```
Improve search results:
- Include relevant fields in embedding text (title, description, tags)
- Truncate long content to avoid noise
- Use the same provider for indexing and search
- Experiment with distance thresholds (lower is better)
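Experimenting with distance thresholds can be as simple as a post-search filter (the cutoff of 0.6 below is an arbitrary illustration; tune it against your own content):

```typescript
interface Scored {
  title: string;
  distance: number; // lower is better
}

// Keep only results whose distance is at or below the cutoff,
// so weakly related matches never reach the UI.
function filterByDistance<T extends Scored>(results: T[], maxDistance: number): T[] {
  return results.filter((r) => r.distance <= maxDistance);
}

const shown = filterByDistance(
  [
    { title: 'Deploying Astro', distance: 0.2 },
    { title: 'Unrelated post', distance: 0.9 },
  ],
  0.6
);
```

Inspect the raw distances your corpus produces for a few known-good queries first, then pick a cutoff just above them.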
- Cache the embedding model (done automatically)
- Use edge databases (Turso) for low latency
- Implement search debouncing in the UI
- Limit result count to 5-10 for best UX
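The debouncing tip above can be sketched as a small helper (a generic sketch; any debounce utility works):

```typescript
// Delay invoking fn until `waitMs` ms have passed without another call,
// so a search request only fires once the user pauses typing.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

Wire this to the search input's `input` event so each keystroke resets the timer and only the final query hits the search endpoint.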
See the `/examples` directory for complete implementations.
For a standalone indexing script:
```js
// scripts/index.js
import { createClient } from '@libsql/client';
import { createTable, indexContent } from 'libsql-search';

const client = createClient({
  url: process.env.TURSO_DB_URL,
  authToken: process.env.TURSO_AUTH_TOKEN
});

await createTable(client);

const result = await indexContent({
  client,
  contentPath: './content',
  embeddingOptions: {
    provider: process.env.EMBEDDING_PROVIDER || 'local'
  },
  onProgress: (current, total, file) => {
    console.log(`[${current}/${total}] ${file}`);
  }
});

console.log(`✅ Indexed ${result.success} documents`);
```
console.log(`✅ Indexed ${result.success} documents`);
Run with:

```bash
node --env-file=.env scripts/index.js
```
MIT
Contributions welcome! Please open an issue or PR on GitHub.
- Turso - Edge SQLite database
- libSQL - Open source SQLite fork
- Astro - Static site framework
- Transformers.js - ML models in JavaScript