155 changes: 48 additions & 107 deletions README.md
@@ -45,34 +45,36 @@ graph TB
- 🐍 Python 3.12+
- ⚑ [uv](https://docs.astral.sh/uv/getting-started/installation/) package manager

### πŸš€ Setup
### πŸš€ Setup & Testing
```bash
git clone https://github.com/DeepDiagnostix-AI/spark-history-server-mcp.git
cd spark-history-server-mcp
uv sync --frozen
uv run main.py

# Install Task (if not already installed)
brew install go-task # macOS, see https://taskfile.dev/installation/ for others

# Setup and start testing
task install # Install dependencies
task start-spark-bg # Start Spark History Server with sample data
task start-mcp-bg # Start MCP Server
task start-inspector-bg # Start MCP Inspector

# Opens http://localhost:6274 for interactive testing
# When done: task stop-all
```

### βš™οΈ Configuration
Edit `config.yaml`:
Edit `config.yaml` for your Spark History Server:
```yaml
servers:
local:
default: true # if server name is not provided in tool calls, this Spark History Server is used
default: true
url: "http://your-spark-history-server:18080"
auth: # optional
username: "user"
password: "pass"
```
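
Multiple entries can be defined under `servers:`; the `default: true` flag marks the one used when a tool call does not name a server. As a minimal sketch of addressing a specific entry (assuming the optional parameter is named `server`, mirroring the config key):

```json
Tool: get_application
Parameters: {
  "spark_id": "spark-bcec39f6201b42b9925124595baad260",
  "server": "local"
}
Expected: application details fetched from the "local" Spark History Server
```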

### πŸ”¬ Testing with MCP Inspector
```bash
# Start MCP server with Inspector (opens browser automatically)
npx @modelcontextprotocol/inspector
```

**🌐 Test in Browser** - The MCP Inspector opens at http://localhost:6274 for interactive tool testing!

## πŸ“Έ Screenshots

### πŸ” Get Spark Application
@@ -81,44 +83,34 @@ npx @modelcontextprotocol/inspector
### ⚑ Job Performance Comparison
![Job Comparison](screenshots/job-compare.png)

## πŸ› οΈ Available Tools

### πŸ“Š Application & Job Analysis
| πŸ”§ Tool | πŸ“ Description |
|---------|----------------|
| `get_application` | Get detailed information about a specific Spark application |
| `get_jobs` | Get a list of all jobs for a Spark application |
| `get_slowest_jobs` | Get the N slowest jobs for a Spark application |

### ⚑ Stage & Task Analysis
| πŸ”§ Tool | πŸ“ Description |
|---------|----------------|
| `get_stages` | Get a list of all stages for a Spark application |
| `get_slowest_stages` | Get the N slowest stages for a Spark application |
| `get_stage` | Get information about a specific stage |
| `get_stage_task_summary` | Get task metrics summary for a specific stage |

### πŸ–₯️ Executor & Resource Analysis
| πŸ”§ Tool | πŸ“ Description |
|---------|----------------|
| `get_executors` | Get executor information for an application |
| `get_executor` | Get information about a specific executor |
| `get_executor_summary` | Get aggregated metrics across all executors |
| `get_resource_usage_timeline` | Get resource usage timeline for an application |
## πŸ› οΈ Available Tools

### πŸ” SQL & Performance Analysis
### Core Analysis Tools (All Integrations)
| πŸ”§ Tool | πŸ“ Description |
|---------|----------------|
| `get_slowest_sql_queries` | Get the top N slowest SQL queries for an application |
| `get_job_bottlenecks` | Identify performance bottlenecks in a Spark job |
| `get_environment` | Get comprehensive Spark runtime configuration |

### πŸ“ˆ Comparison Tools
| `get_application` | πŸ“Š Get detailed application information |
| `get_jobs` | πŸ”— List jobs within an application |
| `compare_job_performance` | πŸ“ˆ Compare performance between applications |
| `compare_sql_execution_plans` | πŸ”Ž Compare SQL execution plans |
| `get_job_bottlenecks` | 🚨 Identify performance bottlenecks |
| `get_slowest_jobs` | ⏱️ Find slowest jobs in application |
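
A typical invocation from the MCP Inspector, using one of the bundled sample applications (parameter names follow the test cases in [TESTING.md](TESTING.md)):

```json
Tool: get_job_bottlenecks
Parameters: {
  "spark_id": "spark-cc4d115f011443d787f03a71a476a745"
}
Expected: slowest stages, resource bottlenecks, and optimization recommendations
```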

### Additional Tools (LlamaIndex/LangGraph HTTP Mode)
| πŸ”§ Tool | πŸ“ Description |
|---------|----------------|
| `compare_job_performance` | Compare performance metrics between two Spark jobs |
| `compare_job_environments` | Compare Spark environment configurations between two jobs |
| `compare_sql_execution_plans` | Compare SQL execution plans between two Spark jobs |
| `list_applications` | πŸ“‹ List Spark applications with filtering |
| `get_application_details` | πŸ“Š Get comprehensive application info |
| `get_stage_details` | ⚑ Analyze stage-level metrics |
| `get_task_details` | 🎯 Examine individual task performance |
| `get_executor_summary` | πŸ–₯️ Review executor utilization |
| `get_application_environment` | βš™οΈ Review Spark configuration |
| `get_storage_info` | πŸ’Ύ Analyze RDD storage usage |
| `get_sql_execution_details` | πŸ”Ž Deep dive into SQL queries |
| `get_resource_usage_timeline` | πŸ“ˆ Resource allocation over time |
| `compare_job_environments` | βš™οΈ Compare Spark configurations |
| `get_slowest_stages` | ⏱️ Find slowest stages |
| `get_task_metrics` | πŸ“Š Detailed task performance metrics |
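
As an example of the two-application comparison tools, this sketch mirrors the `compare_job_environments` test case from [TESTING.md](TESTING.md):

```json
Tool: compare_job_environments
Parameters: {
  "spark_id1": "spark-bcec39f6201b42b9925124595baad260",
  "spark_id2": "spark-110be3a8424d4a2789cb88134418217b"
}
Expected: runtime comparison (Java/Scala versions), Spark property differences, system property differences
```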

## πŸš€ Production Deployment

@@ -139,69 +131,13 @@ helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/ \

πŸ“š See [`deploy/kubernetes/helm/`](deploy/kubernetes/helm/) for complete deployment manifests and configuration options.

## πŸ§ͺ Testing & Development

### πŸ”¬ Local Development

#### πŸ“‹ Prerequisites
- Install [Task](https://taskfile.dev/installation/) for running development commands:
```bash
# macOS
brew install go-task

# Other platforms - see https://taskfile.dev/installation/
```

*Note: uv will be automatically installed when you run `task install`*

#### πŸš€ Development Commands

**Quick Setup:**
```bash
# πŸ“¦ Install dependencies and setup pre-commit hooks
task install
task pre-commit-install

# πŸš€ Start services one by one (all in background)
task start-spark-bg # Start Spark History Server
task start-mcp-bg # Start MCP Server
task start-inspector-bg # Start MCP Inspector

# 🌐 Then open http://localhost:6274 in your browser

# πŸ›‘ When done, stop all services
task stop-all
```

**Essential Commands:**
```bash

# πŸ›‘ Stop all background services
task stop-all

# πŸ§ͺ Run tests and checks
task test # Run pytest
task lint # Check code style
task pre-commit # Run all pre-commit hooks
task validate # Run lint + tests

# πŸ”§ Development utilities
task format # Auto-format code
task clean # Clean build artifacts
```

*For complete command reference, see `Taskfile.yml`*

### πŸ“Š Sample Data
## πŸ“Š Sample Data
The repository includes real Spark event logs for testing:
- `spark-bcec39f6201b42b9925124595baad260` - βœ… Successful ETL job
- `spark-110be3a8424d4a2789cb88134418217b` - πŸ”„ Data processing job
- `spark-cc4d115f011443d787f03a71a476a745` - πŸ“ˆ Multi-stage analytics job

They live in the [`examples/basic/events`](examples/basic/events) directory, and the [`start_local_spark_history.sh`](start_local_spark_history.sh) script loads them automatically for local testing.

πŸ“– **Complete testing guide**: **[TESTING.md](TESTING.md)**
πŸ“– **Advanced testing**: **[TESTING.md](TESTING.md)**

## βš™οΈ Configuration

@@ -229,12 +165,17 @@ MCP_DEBUG=false

## πŸ€– AI Agent Integration

For production AI agent integration, see [`examples/integrations/`](examples/integrations/):
### Quick Start Options

- πŸ¦™ [LlamaIndex](examples/integrations/llamaindex.md) - Vector indexing and search
- πŸ”— [LangGraph](examples/integrations/langgraph.md) - Multi-agent workflows
| Integration | Transport | Entry Point | Best For |
|-------------|-----------|-------------|----------|
| **[Local Testing](TESTING.md)** | HTTP | `main.py` | Development, testing tools |
| **[Claude Desktop](examples/integrations/claude-desktop/)** | STDIO | `main_stdio.py` | Interactive analysis |
| **[Amazon Q CLI](examples/integrations/amazon-q-cli/)** | STDIO | `main_stdio.py` | Command-line automation |
| **[LlamaIndex](examples/integrations/llamaindex.md)** | HTTP | `main.py` | Knowledge systems, RAG |
| **[LangGraph](examples/integrations/langgraph.md)** | HTTP | `main.py` | Multi-agent workflows |

πŸ§ͺ **For local testing and development, use [TESTING.md](TESTING.md) with MCP Inspector.**
**Note**: Claude Desktop and Amazon Q CLI use STDIO transport with 6 core tools. LlamaIndex/LangGraph use HTTP transport with 18 comprehensive tools.
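
For the STDIO integrations, Claude Desktop reads its server list from `claude_desktop_config.json`. A minimal sketch, assuming the repository is checked out locally (the server name and directory path are placeholders; see the linked integration guide for the exact entry):

```json
{
  "mcpServers": {
    "spark-history": {
      "command": "uv",
      "args": ["--directory", "/path/to/spark-history-server-mcp", "run", "main_stdio.py"]
    }
  }
}
```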

## 🎯 Example Use Cases

103 changes: 30 additions & 73 deletions TESTING.md
@@ -2,32 +2,44 @@

## πŸ§ͺ Quick Test with MCP Inspector (5 minutes)

**Use this for**: Local development, testing tools, understanding capabilities

### Prerequisites
- Docker must be running (for Spark History Server)
- Node.js installed (for MCP Inspector)
- Python 3.12+ with uv package manager
- Run commands from project root directory

### Setup (2 terminals)
### Setup Repository
```bash
git clone https://github.com/DeepDiagnostix-AI/spark-history-server-mcp.git
cd spark-history-server-mcp

# Install Task (if not already installed)
brew install go-task # macOS
# or see https://taskfile.dev/installation/ for other platforms

# Setup dependencies
task install
```

### Start Testing

```bash
# Terminal 1: Start Spark History Server with sample data
./start_local_spark_history.sh
# One-command setup (recommended)
task start-spark-bg && task start-mcp-bg && task start-inspector-bg

# Terminal 2: Start MCP server with Inspector
npx @modelcontextprotocol/inspector uv run main.py
# This will open http://localhost:6274 in your browser
# Opens http://localhost:6274 automatically in your browser
# When done: task stop-all
```

### Alternative: Start MCP Server Separately
**Alternative** (if you prefer manual control):
```bash
# Terminal 1: Start Spark History Server
./start_local_spark_history.sh

# Terminal 2: Start MCP Server
uv run main.py
task start-spark

# Terminal 3: Start MCP Inspector (connects to existing MCP server)
DANGEROUSLY_OMIT_AUTH=true npx @modelcontextprotocol/inspector
# Terminal 2: Start MCP server with Inspector
task start-inspector
```

#### Expected Output from Terminal 1:
@@ -43,9 +55,11 @@ DANGEROUSLY_OMIT_AUTH=true npx @modelcontextprotocol/inspector

### Test Applications Available
Three real Spark applications (all completed successfully):
- `spark-bcec39f6201b42b9925124595baad260`
- `spark-110be3a8424d4a2789cb88134418217b`
- `spark-cc4d115f011443d787f03a71a476a745`
- `spark-bcec39f6201b42b9925124595baad260` - ETL job (104K events)
- `spark-110be3a8424d4a2789cb88134418217b` - Data processing job (512K events)
- `spark-cc4d115f011443d787f03a71a476a745` - Multi-stage analytics job (704K events)

**Note**: Testing uses HTTP transport with `main.py` providing access to all 18 tools.

## 🌐 Using MCP Inspector

@@ -75,63 +89,6 @@ Once the MCP Inspector opens in your browser (http://localhost:6274), you can:
- `spark_id2` = `spark-110be3a8424d4a2789cb88134418217b`
- **Expected**: Performance comparison metrics
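
That walkthrough corresponds to the following call (assuming `spark_id1` is set to the first sample application):

```json
Tool: compare_job_performance
Parameters: {
  "spark_id1": "spark-bcec39f6201b42b9925124595baad260",
  "spark_id2": "spark-110be3a8424d4a2789cb88134418217b"
}
Expected: performance comparison metrics for the two applications
```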

## πŸ”¬ Detailed Test Cases

### 1. **Basic Connectivity**
```json
Tool: list_applications
Expected: 3 applications returned
```

### 2. **Job Environment Comparison**
```json
Tool: compare_job_environments
Parameters: {
"spark_id1": "spark-bcec39f6201b42b9925124595baad260",
"spark_id2": "spark-110be3a8424d4a2789cb88134418217b"
}
Expected: Configuration differences including:
- Runtime comparison (Java/Scala versions)
- Spark property differences
- System property differences
```

### 3. **Performance Comparison**
```json
Tool: compare_job_performance
Parameters: {
"spark_id1": "spark-bcec39f6201b42b9925124595baad260",
"spark_id2": "spark-cc4d115f011443d787f03a71a476a745"
}
Expected: Performance metrics including:
- Resource allocation comparison
- Executor metrics comparison
- Job performance ratios
```

### 4. **Bottleneck Analysis**
```json
Tool: get_job_bottlenecks
Parameters: {
"spark_id": "spark-cc4d115f011443d787f03a71a476a745"
}
Expected: Performance analysis with:
- Slowest stages identification
- Resource bottlenecks
- Optimization recommendations
```

### 5. **Resource Timeline**
```json
Tool: get_resource_usage_timeline
Parameters: {
"spark_id": "spark-bcec39f6201b42b9925124595baad260"
}
Expected: Timeline showing:
- Executor addition/removal events
- Stage execution timeline
- Resource utilization over time
```

## βœ… Success Criteria

4 changes: 2 additions & 2 deletions Taskfile.yml
@@ -76,13 +76,13 @@ tasks:
pre-commit:
desc: Run all pre-commit checks
cmds:
- pre-commit run --all-files
- uv run pre-commit run --all-files
- echo "βœ… Pre-commit checks completed!"

pre-commit-install:
desc: Install pre-commit hooks
cmds:
- pre-commit install
- uv run pre-commit install
- echo "βœ… Pre-commit hooks installed!"

clean: