This folder contains a small Model Context Protocol (MCP) server that connects an LLM agent to Trino over the official Python client. The server is implemented in trino_mcp.py using FastMCP and exposes tools the model can call to discover catalogs, schemas, tables, column metadata, and to run SQL with bounded row limits.
Trino is a query engine: it federates many data sources behind one SQL dialect and one JDBC/HTTP API. The MCP layer does not replace Trino; it wraps Trino so the agent gets a stable, tool-based interface (list_catalogs, describe_table, query, …) instead of raw connection strings or ad-hoc scripts.
Typical agent loop:
- Discover what exists (
list_catalogs→list_schemas→list_tables→describe_table). - Plan a read-only SQL query using names and types from
DESCRIBE. - Execute with
queryand interpret the returned rows/columns as context for reasoning or summarization.
Row payloads are always returned as tabular JSON-friendly structures (columns + rows), which maps cleanly to how models consume tool results.
Trino’s strength is that one SQL surface can address very different physical formats. From the agent’s perspective, almost everything still arrives as named columns and rows (sometimes with nested or string-typed cells).
| Kind of data | What it usually means in Trino | How the agent sees it via this MCP |
|---|---|---|
| Structured | Relational tables (Iceberg/Hive/Delta, RDBMS connectors, etc.) with scalar columns | describe_table returns a clear schema; query returns typed rows. Best case for reliable joins and aggregations. |
| Semi-structured | ROW, MAP, ARRAY, JSON types; nested columns in Iceberg; key/value or document-like fields |
Still columns in the result set. Complex values may serialize as nested structures or strings depending on the client and types; the agent should read DESCRIBE output and sample with SELECT limits. |
| Unstructured or file-like | Blobs, file paths, object storage references, log lines in VARCHAR, or specialized table functions (e.g. reading files through a connector) |
Often exposed as metadata + text/path columns or parsed fragments (e.g. a line, a JSON envelope, OCR text). The model reasons over those representations; Trino does not stream raw binary to the MCP—use SQL to extract the slice the agent needs (substr, JSON functions, regexp, etc.). |
In short: Trino normalizes access; MCP transports the query outcome to the model as structured tool output. For semi-structured and unstructured sources, good SQL (projection, filtering, sampling) keeps responses within the row cap and improves answer quality.
Create a .env next to trino_mcp.py (the server calls load_dotenv() on startup). Required and optional variables:
| Variable | Required | Default | Purpose |
|---|---|---|---|
TRINO_HOST |
Yes | — | Coordinator hostname |
TRINO_USER |
Yes | — | Username sent to Trino |
TRINO_PASSWORD |
No | empty | If set, enables BasicAuthentication |
TRINO_PORT |
No | 8080 |
Coordinator port |
TRINO_HTTP_SCHEME |
No | https if password set, else http |
HTTP scheme |
TRINO_VERIFY_SSL |
No | true |
TLS verification (1/true/yes/on to verify) |
TRINO_CATALOG |
No | empty | When list_tables is called with a schema that has no dot, the server prefixes TRINO_CATALOG. |
Connection setup is in _get_connection() in trino_mcp.py: it uses trino.dbapi.connect with optional basic auth and configurable TLS verification.
From this directory, with dependencies installed (see pyproject.toml in this folder for the broader course stack, including trino and mcp[cli]):
python trino_mcp.pyThe process writes Starting Trino MCP... to stderr and runs the MCP transport (stdio) via FastMCP’s mcp.run().
Add a server entry that launches this script with the same Python environment where mcp and trino are installed, and ensure TRINO_* variables are available (inline env or a loaded .env—depending on how you start the process).
Conceptually:
{
"mcpServers": {
"trino": {
"command": "python",
"args": ["C:/Users/you/project/mcp/trino_mcp.py"],
"env": {
"TRINO_HOST": "trino.example.com",
"TRINO_USER": "agent_reader",
"TRINO_PORT": "443",
"TRINO_HTTP_SCHEME": "https",
"TRINO_PASSWORD": "use-secret-manager-in-practice"
}
}
}
}Adjust paths and secrets to your environment; prefer read-only Trino users and secret storage over plaintext passwords in config files.
All tools that run SQL share _fetch_query, which returns:
columns: list of column namesrows: up tomax_rowsrows (hard cap 1000 at the API level)row_count,max_rows
max_rows must be between 1 and 1000 for query; other listing tools use up to 1000 rows internally.
| Tool | Role |
|---|---|
query(sql, max_rows=100) |
Arbitrary Trino SQL read path; primary interface for structured/semi-structured extraction |
list_catalogs() |
SHOW CATALOGS |
list_schemas(catalog) |
SHOW SCHEMAS FROM "<catalog>" with safe identifier quoting |
list_tables(schema) |
SHOW TABLES FROM "<catalog.schema>" (optional TRINO_CATALOG prefix when schema has no dot) |
describe_table(table) |
DESCRIBE "<catalog.schema.table>" for column names/types |
Identifiers are passed through _quote_identifier_path, which splits on . and emits double-quoted segments—reducing accidental injection when composing SHOW/DESCRIBE statements.
- Prefer read-only credentials at Trino (system, catalog, or session-level) so
querycannot mutate data unless you explicitly allow it. - Row caps limit accidental huge responses; agents should use
LIMITand selective projections. - PII and compliance: the model receives whatever columns your SQL returns. Narrow selects and masking views are usually better than wide
SELECT *. - Performance: heavy scans still hit Trino workers; the MCP server only bounds returned rows, not cluster work—use partitions, predicates, and appropriate connectors.
trino_mcp.py— MCP server and Trino integrationpyproject.toml— Python dependencies for this course/module layoutmcp_client.py— example MCP client code (currently pointed at a different reference server path); adapt the pattern if you want automated tests againsttrino_mcp.py