Skip to content

loloopes/MCP_server

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Trino MCP server

This folder contains a small Model Context Protocol (MCP) server that connects an LLM agent to Trino over the official Python client. The server is implemented in trino_mcp.py using FastMCP and exposes tools the model can call to discover catalogs, schemas, tables, column metadata, and to run SQL with bounded row limits.

What this gives an LLM agent

Trino is a query engine: it federates many data sources behind one SQL dialect and one JDBC/HTTP API. The MCP layer does not replace Trino; it wraps Trino so the agent gets a stable, tool-based interface (list_catalogs, describe_table, query, …) instead of raw connection strings or ad-hoc scripts.

Typical agent loop:

  1. Discover what exists (list_catalogslist_schemaslist_tablesdescribe_table).
  2. Plan a read-only SQL query using names and types from DESCRIBE.
  3. Execute with query and interpret the returned rows/columns as context for reasoning or summarization.

Row payloads are always returned as tabular JSON-friendly structures (columns + rows), which maps cleanly to how models consume tool results.

How Trino represents structured, semi-structured, and unstructured data

Trino’s strength is that one SQL surface can address very different physical formats. From the agent’s perspective, almost everything still arrives as named columns and rows (sometimes with nested or string-typed cells).

Kind of data What it usually means in Trino How the agent sees it via this MCP
Structured Relational tables (Iceberg/Hive/Delta, RDBMS connectors, etc.) with scalar columns describe_table returns a clear schema; query returns typed rows. Best case for reliable joins and aggregations.
Semi-structured ROW, MAP, ARRAY, JSON types; nested columns in Iceberg; key/value or document-like fields Still columns in the result set. Complex values may serialize as nested structures or strings depending on the client and types; the agent should read DESCRIBE output and sample with SELECT limits.
Unstructured or file-like Blobs, file paths, object storage references, log lines in VARCHAR, or specialized table functions (e.g. reading files through a connector) Often exposed as metadata + text/path columns or parsed fragments (e.g. a line, a JSON envelope, OCR text). The model reasons over those representations; Trino does not stream raw binary to the MCP—use SQL to extract the slice the agent needs (substr, JSON functions, regexp, etc.).

In short: Trino normalizes access; MCP transports the query outcome to the model as structured tool output. For semi-structured and unstructured sources, good SQL (projection, filtering, sampling) keeps responses within the row cap and improves answer quality.

Configuration

Create a .env next to trino_mcp.py (the server calls load_dotenv() on startup). Required and optional variables:

Variable Required Default Purpose
TRINO_HOST Yes Coordinator hostname
TRINO_USER Yes Username sent to Trino
TRINO_PASSWORD No empty If set, enables BasicAuthentication
TRINO_PORT No 8080 Coordinator port
TRINO_HTTP_SCHEME No https if password set, else http HTTP scheme
TRINO_VERIFY_SSL No true TLS verification (1/true/yes/on to verify)
TRINO_CATALOG No empty When list_tables is called with a schema that has no dot, the server prefixes TRINO_CATALOG.

Connection setup is in _get_connection() in trino_mcp.py: it uses trino.dbapi.connect with optional basic auth and configurable TLS verification.

Running the server

From this directory, with dependencies installed (see pyproject.toml in this folder for the broader course stack, including trino and mcp[cli]):

python trino_mcp.py

The process writes Starting Trino MCP... to stderr and runs the MCP transport (stdio) via FastMCP’s mcp.run().

Wiring into Cursor (example)

Add a server entry that launches this script with the same Python environment where mcp and trino are installed, and ensure TRINO_* variables are available (inline env or a loaded .env—depending on how you start the process).

Conceptually:

{
  "mcpServers": {
    "trino": {
      "command": "python",
      "args": ["C:/Users/you/project/mcp/trino_mcp.py"],
      "env": {
        "TRINO_HOST": "trino.example.com",
        "TRINO_USER": "agent_reader",
        "TRINO_PORT": "443",
        "TRINO_HTTP_SCHEME": "https",
        "TRINO_PASSWORD": "use-secret-manager-in-practice"
      }
    }
  }
}

Adjust paths and secrets to your environment; prefer read-only Trino users and secret storage over plaintext passwords in config files.

Tools exposed to the agent

All tools that run SQL share _fetch_query, which returns:

  • columns: list of column names
  • rows: up to max_rows rows (hard cap 1000 at the API level)
  • row_count, max_rows

max_rows must be between 1 and 1000 for query; other listing tools use up to 1000 rows internally.

Tool Role
query(sql, max_rows=100) Arbitrary Trino SQL read path; primary interface for structured/semi-structured extraction
list_catalogs() SHOW CATALOGS
list_schemas(catalog) SHOW SCHEMAS FROM "<catalog>" with safe identifier quoting
list_tables(schema) SHOW TABLES FROM "<catalog.schema>" (optional TRINO_CATALOG prefix when schema has no dot)
describe_table(table) DESCRIBE "<catalog.schema.table>" for column names/types

Identifiers are passed through _quote_identifier_path, which splits on . and emits double-quoted segments—reducing accidental injection when composing SHOW/DESCRIBE statements.

Operational and security notes

  • Prefer read-only credentials at Trino (system, catalog, or session-level) so query cannot mutate data unless you explicitly allow it.
  • Row caps limit accidental huge responses; agents should use LIMIT and selective projections.
  • PII and compliance: the model receives whatever columns your SQL returns. Narrow selects and masking views are usually better than wide SELECT *.
  • Performance: heavy scans still hit Trino workers; the MCP server only bounds returned rows, not cluster work—use partitions, predicates, and appropriate connectors.

Related files

  • trino_mcp.py — MCP server and Trino integration
  • pyproject.toml — Python dependencies for this course/module layout
  • mcp_client.py — example MCP client code (currently pointed at a different reference server path); adapt the pattern if you want automated tests against trino_mcp.py

About

this is a mcp server that exposes trino for LLM analytics agent to query upon

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors