semanticsql

A TypeScript library that builds semantic layers through conversation and provides safe database access via Claude's tool_use API.

Instead of letting AI write raw SQL (error-prone, insecure), semanticsql creates a security boundary where AI operates through structured JSON operations that get translated to parameterized SQL. For complex queries, a validated raw SQL escape hatch is available.

The key differentiator: conversational schema building. The AI asks hypothesis-driven questions about your database ("Does amt represent the total in USD including tax?") to build a semantic layer that dramatically improves query accuracy.

Why

Problem	How semanticsql solves it
LLMs write incorrect SQL (wrong tables/columns)	Semantic layer with business definitions, validated against real schema
SQL injection via prompt injection	JSON operations = allowlist security boundary. No DROP, ALTER, or arbitrary SQL
LLMs don't understand business context	Conversational schema builder captures enum meanings, business rules, relationships
Schema too large for LLM context	Schema linking selects only relevant tables
Multi-dialect SQL differences	AST + dialect compiler pattern (Postgres, MySQL, SQLite)

Research basis

This design is informed by Text-to-SQL research (Spider/BIRD benchmarks, DIN-SQL, DAIL-SQL), LLM tool-use papers (Toolformer, Gorilla), and analysis of existing products (Vanna.ai, WrenAI, Dataherald). Key finding: semantic layers improve LLM-to-SQL accuracy from ~50% to ~90%.

Quick start

npm install semanticsql

import { DataLayer } from 'semanticsql'
import Anthropic from '@anthropic-ai/sdk'

// 1. Connect to your database
const layer = new DataLayer({
  dialect: 'sqlite',       // or 'postgres' | 'mysql'
  connectionString: './myapp.db',
})

// 2. Build semantic layer (one-time, interactive)
await layer.buildSchema({
  onQuestion: async (question) => {
    // Route to your UI, CLI prompt, or Claude itself
    return await askUser(question)
  },
})
layer.saveSchema('./semantic-layer.json')

// 3. Use with Claude
const client = new Anthropic()
const tools = layer.getTools()
const systemPrompt = layer.getSystemContext()

const response = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 16000,
  system: systemPrompt,
  tools,
  messages: [{ role: 'user', content: 'Show me top customers by revenue' }],
})

// 4. Execute Claude's tool calls
for (const block of response.content) {
  if (block.type === 'tool_use') {
    const result = await layer.execute(block.name, block.input)
    console.log(result.rows)    // query results
    console.log(result.sql)     // the SQL that ran (transparency)
  }
}

How it works

1. Schema builder (conversational)

Point it at a database and it introspects tables, columns, foreign keys, and sample rows. Then it asks hypothesis-driven questions:

Q: The 'status' column in 'orders' has values: 'P', 'S', 'D', 'C'.
   Do these stand for Pending, Shipped, Delivered, Cancelled?
A: Yes

Q: The 'orders' table has a nullable 'deleted_at' column.
   Should queries exclude rows where this is not null?
A: Yes, soft delete - always exclude unless analyzing deleted records

This produces a semantic-layer.json with business definitions, enum mappings, relationships, business rules, and example queries.

2. Two tool modes

db_query - Structured JSON operations (80% of queries). This is the security allowlist.

{
  "operation": "select",
  "table": "orders",
  "fields": ["total"],
  "joins": [{ "table": "customers", "on": { "orders.customer_id": "customers.id" }, "fields": ["name"] }],
  "where": { "status": { "$ne": "C" } },
  "groupBy": ["customers.name"],
  "aggregate": { "revenue": { "$sum": "total" } },
  "orderBy": [{ "field": "revenue", "direction": "desc" }],
  "limit": 10
}

Only tables/columns in your semantic layer are accepted. Values are always parameterized. No DDL possible.

db_sql - Raw SQL for complex analytics (CTEs, window functions, subqueries). Validated before execution: no DDL, no multi-statement injection, row limits enforced.

3. What Claude sees

The library generates a system prompt with the optimal format for LLM accuracy (based on Text-to-SQL research):

-- CREATE TABLE DDL with inline business context
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL,   -- The buyer (FK to customers.id)
  total DECIMAL(10,2),            -- Order total in USD including tax
  status VARCHAR(1),              -- P=Pending, S=Shipped, D=Delivered, C=Cancelled
  FOREIGN KEY (customer_id) REFERENCES customers(id)
);
-- Sample rows: (1, 42, 259.99, 'S')
-- Business rule: Exclude rows where deleted_at IS NOT NULL

-- Q: "Top 10 customers by revenue"
-- SQL: SELECT c.name, SUM(o.total) ...

Security model

User Input → LLM → JSON Query (constrained grammar)
                        ↓
                 Schema Validation (only known tables/columns)
                        ↓
                 Parameterized SQL Generation (never string-interpolated)
                        ↓
                 Read-only DB Connection (row limits, timeouts)

The JSON layer is an allowlist, not a blocklist. There is no syntax for DROP TABLE because the grammar doesn't include it. Each layer is independently sufficient to prevent data destruction.

API

class DataLayer {
  constructor(config: {
    dialect: 'postgres' | 'mysql' | 'sqlite'
    connectionString: string
    readOnly?: boolean         // default: true
    rowLimit?: number          // default: 1000
    queryTimeout?: number      // default: 30000ms
    allowMutations?: boolean   // default: false (SELECT only)
  })

  // Schema
  introspect(): RawSchema
  buildSchema(opts: { onQuestion: (q: string) => Promise<string> }): Promise<SemanticLayer>
  loadSchema(path: string): void
  saveSchema(path: string): void

  // Claude integration
  getTools(): Tool[]              // Claude tool_use definitions
  getSystemContext(): string      // DDL-based system prompt

  // Execution
  execute(toolName: string, input: unknown): QueryResult

  close(): void
}

Filter operators

MongoDB-style operators in the where clause:

Operator	SQL	Example
`$eq`	`= ?`	`{ "status": "active" }`
`$ne`	`!= ?`	`{ "status": { "$ne": "cancelled" } }`
`$gt` / `$gte`	`> ?` / `>= ?`	`{ "age": { "$gt": 25 } }`
`$lt` / `$lte`	`< ?` / `<= ?`	`{ "price": { "$lt": 100 } }`
`$in` / `$nin`	`IN (?)` / `NOT IN (?)`	`{ "id": { "$in": [1, 2, 3] } }`
`$like` / `$ilike`	`LIKE ?` / `ILIKE ?`	`{ "name": { "$ilike": "%alice%" } }`
`$is_null`	`IS NULL`	`{ "deleted_at": null }`
`$or` / `$and`	`OR` / `AND`	`{ "$or": [{ "a": 1 }, { "b": 2 }] }`

Multi-dialect support

SQL generation uses an AST + dialect compiler pattern:

Feature	PostgreSQL	MySQL	SQLite
Params	`$1, $2`	`?`	`?`
Quoting	`"name"`	`name`	`"name"`
ILIKE	Native	`LOWER() LIKE LOWER()`	`LIKE`
UPSERT	`ON CONFLICT`	`ON DUPLICATE KEY`	`ON CONFLICT`

Demo

npx tsx demo.ts

Creates a sample e-commerce database and runs through the full pipeline: introspection, conversational schema building, tool generation, structured queries, CTE queries, and security boundary testing.

Testing

npm test          # 268 tests
npm run test:watch  # watch mode

Contributing

Contributions require a pull request. Direct pushes to main are not allowed.

Fork the repo
Create a feature branch: git checkout -b feat/my-feature
Make your changes and add tests
Ensure all tests pass: npm test
Commit with a clear message
Open a PR against main

All PRs require passing tests before merge.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
src		src
test		test
.gitignore		.gitignore
README.md		README.md
demo.ts		demo.ts
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

semanticsql

Why

Research basis

Quick start

How it works

1. Schema builder (conversational)

2. Two tool modes

3. What Claude sees

Security model

API

Filter operators

Multi-dialect support

Demo

Testing

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

semanticsql

Why

Research basis

Quick start

How it works

1. Schema builder (conversational)

2. Two tool modes

3. What Claude sees

Security model

API

Filter operators

Multi-dialect support

Demo

Testing

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages