feat: add SeekDB MCP server with vector search capabilities (issue #34) #41
YixinZ-NUS wants to merge 4 commits into second-state:main
Implements Issue second-state#34: SeekDB Rust MCP Server with:
- rmcp 0.14.0 from crates.io for MCP server
- fastembed for client-side embeddings (all-MiniLM-L6-v2, 384 dims)
- mysql_async for SeekDB connectivity
- Three tools: search_collection, list_collections, collection_info
- Local .gitignore for build artifacts

- README.md: concise usage guide, test instructions (--test-threads=1)
- config.toml: example EchoKit MCP integration
- .env.template: environment variable reference
- Cargo.toml: add repository URL, rust-version, updated description
- Documents alternative AI_EMBED() for DB-side embeddings

- Add Docker lifecycle commands (start, status, logs, stop)
- Document MCP session management with notifications/initialized
- Add step-by-step curl examples with actual responses
- Clarify EchoKit config.toml location
- Explain --test-threads=1 requirement for unit tests
- Update all example outputs from live testing

Environment variables are already documented in README.md. The .env.template file is redundant since this implementation uses client-side embeddings without requiring API keys.
Pull request overview
Implements a Rust-based MCP server for SeekDB that provides vector-search tools backed by client-side embeddings via fastembed, and wires it into the existing EchoKit MCP integration. The PR adds the SeekDB MCP binary/library, embedding service, DB/config helpers, example EchoKit config, and documentation plus lockfile.
Changes:
- Add `seekdb-mcp-server` crate with MCP tool implementations for `create_collection`, `add_documents`, `search_collection`, `list_collections`, and `collection_info`, served over rmcp’s streamable HTTP transport.
- Introduce a client-side `EmbeddingService` using `fastembed` (all-MiniLM-L6-v2, 384-dim) with unit tests validating embedding shape, normalization, and semantic similarity.
- Provide SeekDB MCP-specific configuration (`config.toml`), README usage guide (including curl-based MCP interaction walkthrough), and standard Rust project scaffolding (`Cargo.toml`, `Cargo.lock`, `.gitignore`).
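
The normalization and semantic-similarity checks mentioned above can be illustrated on plain `f32` slices. This is a minimal sketch with no `fastembed` dependency. The helper names `l2_norm`, `is_normalized`, and `cosine_similarity` are hypothetical and are not taken from the PR's `embeddings.rs`.

```rust
/// Euclidean (L2) norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// True when the vector has unit length within `tol`.
fn is_normalized(v: &[f32], tol: f32) -> bool {
    (l2_norm(v) - 1.0).abs() <= tol
}

/// Cosine similarity in [-1, 1]; `None` when the lengths differ
/// or either vector has zero length (similarity is undefined there).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let (na, nb) = (l2_norm(a), l2_norm(b));
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    Some(dot / (na * nb))
}
```

A test in this style would embed two related sentences and one unrelated sentence, then assert that the related pair scores higher. For unit-length embeddings the cosine similarity reduces to the dot product.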
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| examples/mcp/seekdb/src/main.rs | Defines the SeekDbServer MCP handler, all five tools (search, add, create, list, info), error mapping, and the axum-based HTTP server bootstrapping with rmcp’s StreamableHttpService. |
| examples/mcp/seekdb/src/lib.rs | Exposes the config, db, and embeddings modules for reuse by the binary and tests. |
| examples/mcp/seekdb/src/embeddings.rs | Implements EmbeddingService around fastembed for single/batch embedding, formatting for SQL, and includes unit tests confirming embedding dimension, normalization, semantic similarity, and SQL formatting. |
| examples/mcp/seekdb/src/db.rs | Adds a small helper to construct a mysql_async::Pool from ServerConfig for connecting to SeekDB via the MySQL protocol. |
| examples/mcp/seekdb/src/config.rs | Introduces ServerConfig loaded from environment variables (SEEKDB_*), including sensible defaults and required SEEKDB_DATABASE validation. |
| examples/mcp/seekdb/config.toml | Example EchoKit server configuration showing how to register the SeekDB MCP server and wire it into TTS/ASR/LLM, plus a system prompt for using the search tools (currently references some non-existent tool names). |
| examples/mcp/seekdb/README.md | Documents SeekDB MCP server setup (Dockerized SeekDB, building & running the server), MCP usage via curl (session initialization, tool calls), testing strategy, environment variables, architecture, and notes on client-side vs DB-side embeddings. |
| examples/mcp/seekdb/Cargo.toml | Declares the new seekdb-mcp-server crate (edition 2024), its binary target, and dependencies on rmcp, fastembed, mysql_async, axum, tokio, tracing, dotenvy, etc. |
| examples/mcp/seekdb/Cargo.lock | Lockfile capturing the full dependency graph for the new crate (including rmcp, fastembed/ORT, mysql_async, reqwest/ureq, rustls, etc.). |
| examples/mcp/seekdb/.gitignore | Ignores standard Rust build artifacts and the local .fastembed_cache/ directory used for model downloads. |

    let sql = format!(
        r#"SELECT id, document, metadata,
           COSINE_DISTANCE(embedding, '{}') as distance
           FROM {}
           ORDER BY distance ASC

`collection_name` from tool input is interpolated directly into the SQL string in the `FROM {}` clause (and into the log), which allows a malicious or malformed collection name to break the query or perform SQL injection. Please either validate collection names against a strict identifier pattern and/or quote+escape them as identifiers instead of concatenating raw user input into the query string.
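
One way to apply the "strict identifier pattern" suggestion is a small allow-list check that runs before any `format!` call. The sketch below is illustrative only. `validate_collection_name` is a hypothetical helper, and the 64-byte cap mirrors MySQL's identifier limit on the assumption that SeekDB follows it.

```rust
/// Hypothetical helper: accepts only names shaped like
/// `[A-Za-z_][A-Za-z0-9_]*`, at most 64 bytes long.
/// The length cap is an assumption borrowed from MySQL.
fn validate_collection_name(name: &str) -> Result<&str, String> {
    let mut chars = name.chars();
    // The first character must be an ASCII letter or underscore.
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    // Every remaining character must be an ASCII letter, digit, or underscore.
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    if first_ok && rest_ok && name.len() <= 64 {
        Ok(name)
    } else {
        Err(format!("invalid collection name: {name:?}"))
    }
}
```

Each tool handler could call this first and map the `Err` to an invalid-parameters MCP error, so that only vetted names ever reach the SQL templates or the log line.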

    let sql = format!(
        r#"INSERT INTO {} (id, document, embedding, metadata)
           VALUES (?, ?, '{}', ?)
           ON DUPLICATE KEY UPDATE
           document = VALUES(document),

`collection_name` is concatenated directly into the `INSERT INTO {}` statement, so a crafted collection name could inject additional SQL or reference unintended tables. This should be hardened by validating the collection name and/or properly quoting and escaping it as an identifier rather than embedding raw user input in the SQL template.

    let sql = format!(
        r#"CREATE TABLE IF NOT EXISTS {} (
           id VARCHAR(255) PRIMARY KEY,
           document TEXT NOT NULL,
           embedding VECTOR({}) NOT NULL,

`collection_name` is directly interpolated into the `CREATE TABLE IF NOT EXISTS {}` DDL, which makes it possible to inject arbitrary SQL via the tool argument or to fail on unusual table names. Consider enforcing an allowed-character whitelist for collection names and/or quoting+escaping them as identifiers instead of concatenating the raw string.
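
The "quoting+escaping" alternative can be sketched as follows, assuming SeekDB accepts MySQL-style backtick-quoted identifiers with embedded backticks doubled. That is an assumption based on its MySQL-protocol compatibility and is not confirmed in this PR. `quote_identifier` is a hypothetical helper.

```rust
/// Hypothetical helper: wraps `name` in backticks and doubles any embedded
/// backtick, following MySQL's identifier-quoting convention.
/// Returns `None` for names that cannot be valid identifiers at all
/// (empty, or containing a NUL character).
fn quote_identifier(name: &str) -> Option<String> {
    if name.is_empty() || name.contains('\0') {
        return None;
    }
    Some(format!("`{}`", name.replace('`', "``")))
}
```

The quoted value would then take the place of the raw `collection_name` argument in the `format!` calls. Quoting and allow-list validation are complementary, and applying both is the more conservative choice.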

    // Get row count
    let count_sql = format!("SELECT COUNT(*) FROM {}", collection_name);
    let rows: Vec<Row> = conn.query(&count_sql).await.map_err(|e| {
        error!("Failed to get row count: {}", e);
        McpError::internal_error(format!("Query failed: {}", e), None)
    })?;
    let count: i64 = rows.first().and_then(|r| r.get(0)).unwrap_or(0);

    // Get column info
    let schema_sql = format!(
        "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{}' AND TABLE_SCHEMA = DATABASE()",

In `collection_info`, `collection_name` is interpolated directly into both `SELECT COUNT(*) FROM {}` and the `TABLE_NAME = '{}'` filter, which allows SQL injection or malformed queries if a caller passes a crafted name. As with the other tools, this should instead validate and safely quote/escape the identifier, or otherwise ensure only valid collection names can reach these format strings.
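
The two statements need different treatment. The table name in `FROM` is an identifier and cannot be a bound parameter, while the `TABLE_NAME` filter compares against a value and can use a `?` placeholder. A minimal sketch of that split follows. `CollectionInfoQueries` and `collection_info_queries` are hypothetical names, and the allow-list rule is an assumption.

```rust
/// Hypothetical container for the two `collection_info` statements.
#[derive(Debug)]
struct CollectionInfoQueries {
    /// Row-count query with the identifier validated and backtick-quoted.
    count_sql: String,
    /// Schema query using a `?` placeholder instead of string interpolation.
    schema_sql: &'static str,
    /// Values to bind to the placeholders in `schema_sql`, in order.
    schema_params: Vec<String>,
}

fn collection_info_queries(collection_name: &str) -> Result<CollectionInfoQueries, String> {
    // Assumed rule: non-empty, ASCII letters, digits, and underscores only.
    let valid = !collection_name.is_empty()
        && collection_name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !valid {
        return Err(format!("invalid collection name: {collection_name:?}"));
    }
    Ok(CollectionInfoQueries {
        count_sql: format!("SELECT COUNT(*) FROM `{collection_name}`"),
        schema_sql: "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS \
                     WHERE TABLE_NAME = ? AND TABLE_SCHEMA = DATABASE()",
        schema_params: vec![collection_name.to_string()],
    })
}
```

With `mysql_async`, the schema statement would then go through a prepared-statement call that takes parameters (such as `exec` on the connection) rather than the text-protocol `query`.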

    You are a helpful voice assistant with access to a knowledge base through the SeekDB search tools.

    When users ask questions that might be answered by the knowledge base:
    1. Use the `query_collection` tool to search for relevant information
    2. Use the `hybrid_search` tool for complex queries that need both keyword and semantic matching

The system prompt mentions `query_collection` and `hybrid_search` tools, but this MCP server only exposes `create_collection`, `add_documents`, `search_collection`, `list_collections`, and `collection_info`. This mismatch can confuse clients and LLM behavior; please update the prompt text to reference the actual tool names and behaviors provided by this server.
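
A drift like this could be caught mechanically. The sketch below treats every backtick-quoted token in the prompt as a tool mention and reports those that are not in the exposed set. `EXPOSED_TOOLS` and `unknown_tool_mentions` are hypothetical names, and the backtick convention is an assumption about how the prompt is written.

```rust
/// Tool names this server exposes, as listed in the review comment above.
const EXPOSED_TOOLS: [&str; 5] = [
    "create_collection",
    "add_documents",
    "search_collection",
    "list_collections",
    "collection_info",
];

/// Returns the backtick-quoted names in `prompt` that are not exposed tools.
/// Splitting on '`' puts quoted spans at the odd positions; an unbalanced
/// trailing backtick makes the remainder count as a mention.
fn unknown_tool_mentions(prompt: &str) -> Vec<&str> {
    prompt
        .split('`')
        .skip(1)
        .step_by(2)
        .filter(|name| EXPOSED_TOOLS.iter().all(|tool| tool != name))
        .collect()
}
```

A unit test could load the `config.toml` prompt and assert that this function returns an empty list.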
Implements a Rust-based Model Context Protocol (MCP) server for the SeekDB vector database, located at `examples/mcp/seekdb/`. Built with `rmcp 0.14.0`, it uses client-side embeddings via `fastembed` (`all-MiniLM-L6-v2` model) to reduce the need for API keys and ensure consistency with `pyseekdb`.

MCP Tools Provided:
- `create_collection`: Create tables with 384-dim HNSW vector index
- `add_documents`: Insert documents with auto-generated embeddings
- `search_collection`: Perform vector similarity search
- `list_collections`: List all tables with vector indexes
- `collection_info`: Get schema, row count, and embedding info

Includes a usage guide under `examples/mcp/seekdb/README.md` and unit tests for validation.
Closes #34