37 changes: 35 additions & 2 deletions README.md
@@ -35,6 +35,7 @@ The MCP servers in this demo highlight how each tool can light up widgets by com
- `pizzaz_server_node/` – MCP server implemented with the official TypeScript SDK.
- `pizzaz_server_python/` – Python MCP server that returns the Pizzaz widgets.
- `solar-system_server_python/` – Python MCP server for the 3D solar system widget.
- `data_explorer_server_python/` – Python MCP server that powers the Data Explorer widget (CSV uploads, filters, charts).
- `build-all.mts` – Vite build orchestrator that produces hashed bundles for every widget entrypoint.

## Prerequisites
@@ -73,7 +74,7 @@ pnpm run dev

## Serve the static assets

All of the MCP servers (except the Data Explorer server) expect the bundled HTML, JS, and CSS to be served from the local static file server. After every build, start the server before launching any MCP processes:

```bash
pnpm run serve
@@ -83,12 +84,15 @@ The assets are exposed at [`http://localhost:4444`](http://localhost:4444) with

> **Note:** The Python Pizzaz server caches widget HTML with `functools.lru_cache`. If you rebuild or manually edit files in `assets/`, restart the MCP server so it picks up the updated markup.

> **Note:** The Data Explorer server reads the built widget assets directly from the `assets/` directory, so you still need to run `pnpm run build` whenever you change the frontend, but you don't have to start `pnpm run serve` while that server is running.

## Run the MCP servers

The repository ships several demo MCP servers that highlight different widget bundles:

- **Pizzaz (Node & Python)** – pizza-inspired collection of tools and components
- **Solar system (Python)** – 3D solar system viewer
- **Data Explorer (Python)** – interactive CSV upload, profiling, preview, and charting

### Pizzaz Node server

@@ -117,6 +121,35 @@ uvicorn solar-system_server_python.main:app --port 8000

You can reuse the same virtual environment for all Python servers—install the dependencies once and run whichever entry point you need.

### Data Explorer Python server

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r data_explorer_server_python/requirements.txt
pnpm run build
uvicorn data_explorer_server_python.main:app --port 8001 --reload
```

This server accepts CSV uploads, profiles dataset metadata, exposes filtered preview tables, and generates chart-ready aggregates. Built assets are served directly by the MCP server, so rerun `pnpm run build` whenever you update the widget bundle.

#### Data Explorer security configuration

- `DATA_EXPLORER_ALLOWED_UPLOAD_ROOTS` (required for `filePath`/`fileUri` uploads) – list of directories separated by the OS path separator (e.g., `/tmp:/Users/me/datasets` on Unix). Path-based uploads are disabled unless this allowlist is set.
- `DATA_EXPLORER_AUTH_TOKEN` – when set, every HTTP request (including MCP transport) must send `Authorization: Bearer <token>`.
- `DATA_EXPLORER_CORS_ALLOW_ORIGINS` – comma-delimited list of origins (e.g., `https://platform.openai.com,https://studio.openai.com`) that should receive CORS headers. CORS is disabled when this variable is unset.

If you expose the server over the public internet, configure all three variables to avoid leaking local files or running an unauthenticated, cross-origin-accessible endpoint.
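
For example, a hardened deployment might set all three before launching the server. The values below are placeholders; substitute your own directories, token, and origins:

```bash
# Placeholders only: substitute your own directories, token, and origins.
export DATA_EXPLORER_ALLOWED_UPLOAD_ROOTS="/srv/datasets"   # sole directory readable via filePath/fileUri
export DATA_EXPLORER_AUTH_TOKEN="$(openssl rand -hex 32)"   # clients must send: Authorization: Bearer <token>
export DATA_EXPLORER_CORS_ALLOW_ORIGINS="https://platform.openai.com,https://studio.openai.com"
uvicorn data_explorer_server_python.main:app --port 8001
```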

For local development you can continue testing path uploads by pointing the allowlist at directories you control, for example:

```bash
export DATA_EXPLORER_ALLOWED_UPLOAD_ROOTS="$(pwd)/sample-data:/tmp"
uvicorn data_explorer_server_python.main:app --port 8001 --reload
```

Inline (`csvText`) and chunked uploads do not require any of the security environment variables, so you can omit them when doing quick experiments.
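
As a quick smoke test for inline uploads, you can call the upload tool over MCP from Python. This is a minimal sketch using the official `mcp` client package; the `/mcp` endpoint path is an assumption (adjust it if your deployment mounts the transport elsewhere), while the tool name and `csvText` argument come from the tool descriptions:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

CSV = "city,population\nParis,2100000\nBerlin,3700000\n"

async def main() -> None:
    # Assumes the server started via `uvicorn ... --port 8001` serves MCP at /mcp.
    async with streamablehttp_client("http://localhost:8001/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # `csvText` supplies the CSV inline, so no allowlist is needed.
            result = await session.call_tool("data-explorer.upload", {"csvText": CSV})
            print(result.content)

asyncio.run(main())
```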

## Testing in ChatGPT

To add these apps to ChatGPT, enable [developer mode](https://platform.openai.com/docs/guides/developer-mode), and add your apps in Settings > Connectors.
@@ -143,7 +176,7 @@ You can then invoke tools by asking something related. For example, for the Pizz

## Next steps

- Customize the widget data: edit the handlers in `pizzaz_server_node/src`, `pizzaz_server_python/main.py`, `solar-system_server_python`, or `data_explorer_server_python` to fetch data from your systems.
- Create your own components and add them to the gallery: drop new entries into `src/` and they will be picked up automatically by the build script.

### Deploy your MCP server
64 changes: 64 additions & 0 deletions data_explorer_server_python/README.md
@@ -0,0 +1,64 @@
## Data Explorer MCP Server

This FastMCP server backs the Data Explorer demo widget. It accepts CSV uploads, profiles column metadata, serves preview rows with optional filters, and produces chart-ready aggregates.

### Prerequisites

- Python 3.10 or later
- `uv` (recommended) or `pip`
- Frontend assets built via `pnpm run build` (the server loads `assets/data-explorer-*.html`)

### Setup

```bash
cd data_explorer_server_python
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Run

```bash
uvicorn data_explorer_server_python.main:app --port 8001 --reload
```

Once built, the server serves the widget's HTML, JS, and CSS directly over MCP resource requests,
so you don't need to run a separate static asset host. Re-run `pnpm run build` whenever you update
the frontend code to refresh the embedded assets.

When developing locally and uploading files by path, allowlist directories you own before starting the server:

```bash
export DATA_EXPLORER_ALLOWED_UPLOAD_ROOTS="$(pwd)/sample-data:/tmp"
uvicorn data_explorer_server_python.main:app --port 8001 --reload
```

If you only use inline (`csvText`) uploads, you can skip that variable.

Interactive tooling (ChatGPT Apps SDK, `mcp-client`, etc.) can then call the following tools:

- `data-explorer.open` – returns the widget template and recent dataset summaries.
- `data-explorer.uploadInit` – begin a chunked upload session for large CSVs (returns an `uploadId`).
- `data-explorer.uploadChunk` – append CSV text to a session; mark the final chunk with `isFinal=true` to trigger profiling.
- `data-explorer.upload` – store and profile an uploaded CSV. Supply either `csvText` (inline
string data) or a `filePath`/`fileUri` pointing to a local file when the dataset is already on
disk. Path-based uploads require `DATA_EXPLORER_ALLOWED_UPLOAD_ROOTS` to include the directory
that holds the CSV.
- `data-explorer.preview` – fetch filtered table rows with pagination.
- `data-explorer.chart` – build datasets for bar, scatter, or histogram charts.

Restart the server to clear in-memory datasets.
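
For instance, a chunked upload followed by a filtered preview could look like the sketch below, using the official `mcp` Python client. The `/mcp` path, the `filename` and `csvText` argument names, and the `structuredContent` field names are assumptions; `uploadId` and `isFinal` come from the tool descriptions above, and the filter shape mirrors `filters.py`:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    async with streamablehttp_client("http://localhost:8001/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Begin a chunked upload session; the tool returns an `uploadId`.
            init = await session.call_tool(
                "data-explorer.uploadInit", {"filename": "orders.csv"}
            )
            upload_id = init.structuredContent["uploadId"]  # field name assumed

            # Append CSV text; `isFinal=True` on the last chunk triggers profiling.
            await session.call_tool(
                "data-explorer.uploadChunk",
                {"uploadId": upload_id, "csvText": "id,price\n1,9.99\n", "isFinal": False},
            )
            done = await session.call_tool(
                "data-explorer.uploadChunk",
                {"uploadId": upload_id, "csvText": "2,24.50\n", "isFinal": True},
            )
            dataset_id = done.structuredContent["datasetId"]  # field name assumed

            # Fetch a filtered preview page (range filter, as in filters.py).
            preview = await session.call_tool(
                "data-explorer.preview",
                {
                    "datasetId": dataset_id,
                    "filters": [{"type": "range", "column": "price", "min": 10}],
                },
            )
            print(preview.content)

asyncio.run(main())
```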

### Security hardening

The server ships with conservative defaults:

- Path-based uploads are disabled until you set `DATA_EXPLORER_ALLOWED_UPLOAD_ROOTS` (use the OS
  path separator, e.g., `:` on Unix or `;` on Windows) to the directories that should be readable.
- Set `DATA_EXPLORER_AUTH_TOKEN` to require every HTTP request to include `Authorization: Bearer <token>`.
- Provide `DATA_EXPLORER_CORS_ALLOW_ORIGINS` (comma-separated list) to opt into CORS headers for
trusted origins. Leave it empty to block cross-origin callers.

Always combine these knobs with your preferred network isolation (VPN, tunnel, etc.) before exposing
the MCP server to the wider internet.
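
As a quick check that the token gate is active, compare an unauthenticated request with a bearer-token request. This assumes the default streamable HTTP mount at `/mcp`; the exact status codes depend on the transport, so treat this as a sketch:

```bash
# Without credentials the auth middleware should reject the request (e.g., 401).
curl -i http://localhost:8001/mcp
# With the bearer token the request should pass authentication.
curl -i -H "Authorization: Bearer $DATA_EXPLORER_AUTH_TOKEN" http://localhost:8001/mcp
```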
1 change: 1 addition & 0 deletions data_explorer_server_python/__init__.py
@@ -0,0 +1 @@
"""Data explorer MCP server package."""
140 changes: 140 additions & 0 deletions data_explorer_server_python/charts.py
@@ -0,0 +1,140 @@
from __future__ import annotations

from typing import Dict, List

import numpy as np
import pandas as pd

from .schemas import (
    Aggregation,
    BarChartSeries,
    ChartConfig,
    ChartResponse,
    ChartType,
    HistogramBin,
    ScatterPoint,
)
from .utils import ensure_column_exists, to_python_value


def _bar_chart(dataframe: pd.DataFrame, config: ChartConfig) -> List[Dict]:
    ensure_column_exists(dataframe, config.x)
    group_keys = [config.x]
    if config.color:
        ensure_column_exists(dataframe, config.color)
        group_keys.append(config.color)

    working = dataframe.dropna(subset=[config.x])

    if config.aggregation == Aggregation.COUNT:
        grouped = (
            working.groupby(group_keys, dropna=False).size().reset_index(name="value")
        )
    else:
        if config.y is None:
            raise ValueError("Bar charts with sum/avg require `y` column.")
        ensure_column_exists(dataframe, config.y)
        numeric_y = pd.to_numeric(working[config.y], errors="coerce")
        working = working.assign(**{config.y: numeric_y}).dropna(subset=[config.y])
        grouped = working.groupby(group_keys, dropna=False)[config.y]
        if config.aggregation == Aggregation.SUM:
            grouped = grouped.sum().reset_index(name="value")
        else:
            grouped = grouped.mean().reset_index(name="value")

    records: List[Dict] = []
    for _, row in grouped.iterrows():
        base_record = {
            "category": to_python_value(row[config.x]),
            "value": float(row["value"]) if row["value"] is not None else None,
        }
        if config.color:
            base_record["color"] = to_python_value(row[config.color])
        records.append(base_record)
    return records


def _scatter_points(
    dataframe: pd.DataFrame, config: ChartConfig, limit: int = 500
) -> List[Dict]:
    if config.y is None:
        raise ValueError("Scatter charts require `y` column.")

    series_x = pd.to_numeric(ensure_column_exists(dataframe, config.x), errors="coerce")
    series_y = pd.to_numeric(ensure_column_exists(dataframe, config.y), errors="coerce")

    working = pd.DataFrame({config.x: series_x, config.y: series_y})
    if config.color and config.color in dataframe.columns:
        working[config.color] = dataframe[config.color]

    working = working.dropna(subset=[config.x, config.y])
    working = working.iloc[:limit]
    working = working.sort_values(by=config.x)

    points: List[Dict] = []
    for _, row in working.iterrows():
        point = {
            "x": float(row[config.x]),
            "y": float(row[config.y]),
        }
        if config.color and config.color in working.columns:
            point["color"] = to_python_value(row[config.color])
        points.append(point)
    return points


def _histogram_bins(dataframe: pd.DataFrame, config: ChartConfig) -> List[Dict]:
    series = pd.to_numeric(ensure_column_exists(dataframe, config.x), errors="coerce")
    numeric = series.dropna()
    if numeric.empty:
        return []

    bin_count = config.bin_count or 10
    counts, bin_edges = np.histogram(numeric, bins=bin_count)

    bins: List[Dict] = []
    for idx in range(len(counts)):
        bins.append(
            {
                "binStart": float(bin_edges[idx]),
                "binEnd": float(bin_edges[idx + 1]),
                "count": int(counts[idx]),
            }
        )
    return bins


def build_chart_response(
    dataframe: pd.DataFrame, config: ChartConfig, dataset_id: str
) -> ChartResponse:
    if config.chart_type == ChartType.BAR:
        data = _bar_chart(dataframe, config)
        series = [BarChartSeries(**item) for item in data]
        return ChartResponse(
            dataset_id=dataset_id,
            chart_type=config.chart_type,
            series=series,
            config=config,
        )

    if config.chart_type == ChartType.SCATTER:
        data = _scatter_points(dataframe, config)
        points = [ScatterPoint(**item) for item in data]
        return ChartResponse(
            dataset_id=dataset_id,
            chart_type=config.chart_type,
            points=points,
            config=config,
        )

    if config.chart_type == ChartType.HISTOGRAM:
        data = _histogram_bins(dataframe, config)
        bins = [HistogramBin(**item) for item in data]
        return ChartResponse(
            dataset_id=dataset_id,
            chart_type=config.chart_type,
            bins=bins,
            config=config,
        )

    raise ValueError(f"Unsupported chart type: {config.chart_type}")
49 changes: 49 additions & 0 deletions data_explorer_server_python/filters.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
from __future__ import annotations

from typing import Iterable

import pandas as pd

from .schemas import EqualsFilter, Filter, RangeFilter
from .utils import coerce_value_for_series, ensure_column_exists


def apply_filters(dataframe: pd.DataFrame, filters: Iterable[Filter]) -> pd.DataFrame:
    filters_list = list(filters) if filters is not None else []
    if not filters_list:
        return dataframe

    mask = pd.Series(True, index=dataframe.index)

    for raw_filter in filters_list:
        try:
            series = ensure_column_exists(dataframe, raw_filter.column)
        except KeyError:
            # Ignore filters that reference non-existent columns.
            continue

        if raw_filter.type == "equals":
            equals_filter = (
                raw_filter
                if isinstance(raw_filter, EqualsFilter)
                else EqualsFilter.model_validate(raw_filter.model_dump())
            )
            value = coerce_value_for_series(series, equals_filter.value)
            if value is None or pd.isna(value):
                mask &= series.isna()
            else:
                mask &= series == value
        elif raw_filter.type == "range":
            range_filter = (
                raw_filter
                if isinstance(raw_filter, RangeFilter)
                else RangeFilter.model_validate(raw_filter.model_dump())
            )
            if range_filter.min is not None:
                min_value = coerce_value_for_series(series, range_filter.min)
                mask &= series >= min_value
            if range_filter.max is not None:
                max_value = coerce_value_for_series(series, range_filter.max)
                mask &= series <= max_value

    return dataframe[mask]