bierbios/mcpserver

MCP Server ELK (Read-Only, Production-Oriented)

A Python 3.12 + FastAPI MCP server for safe, read-only analysis of the ELK Stack, compatible with OpenClaw.

1) Architecture (Mermaid)

flowchart TB
    OC[OpenClaw] -->|X-API-Key| API[FastAPI MCP Endpoint]
    API --> SEC[Security Layer\nAPI Key Auth + RBAC + Rate Limit]
    SEC --> REG[Tool Registry\nDiscovery + Execute + Schema Validation]
    REG --> TOOLS[MCP Tools\nELK Cluster/Logs/Kibana/Logstash/Filebeat/APM/Recommendation]
    TOOLS --> CTRL[Controllers]
    CTRL --> SRV[Services\nBusiness Logic]
    SRV --> REPO[Repositories\nRead-Only Data Access]
    REPO --> ES[(Elasticsearch)]
    REPO --> KB[(Kibana API)]
    REPO --> LS[(Logstash Monitoring API)]

    API --> AUDIT[Structured JSON Audit Log]
    API --> METRICS[Prometheus Metrics /metrics]

    classDef safe fill:#e7f7ef,stroke:#1f8f5f,stroke-width:1px;
    class SEC,REG,TOOLS,AUDIT,METRICS safe;

2) Folder Structure

mcpserver-elk/
├── app/
│   ├── main.py
│   ├── api/routes/
│   │   ├── health_controller.py
│   │   ├── mcp_controller.py
│   │   └── metrics_controller.py
│   ├── core/
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   ├── logging.py
│   │   ├── masking.py
│   │   ├── metrics.py
│   │   ├── rate_limit.py
│   │   └── security.py
│   ├── mcp/
│   │   ├── registry.py
│   │   ├── schemas.py
│   │   ├── server.py
│   │   └── tool_base.py
│   ├── models/
│   │   ├── common_models.py
│   │   ├── elk_models.py
│   │   ├── mcp_models.py
│   │   └── schemas.py
│   ├── controllers/
│   │   ├── elk_controller.py
│   │   └── recommendation_controller.py
│   ├── services/
│   │   ├── apm_service.py
│   │   ├── elk_cluster_service.py
│   │   ├── elk_logs_service.py
│   │   ├── filebeat_service.py
│   │   ├── kibana_service.py
│   │   ├── logstash_service.py
│   │   └── recommendation_service.py
│   ├── repositories/
│   │   ├── apm_repository.py
│   │   ├── elasticsearch_repository.py
│   │   ├── kibana_repository.py
│   │   └── logstash_repository.py
│   ├── clients/
│   │   ├── elasticsearch_client.py
│   │   └── http_client.py
│   ├── tools/
│   │   ├── elk_apm.py
│   │   ├── elk_cluster.py
│   │   ├── elk_cluster_tools.py
│   │   ├── elk_filebeat.py
│   │   ├── elk_kibana.py
│   │   ├── elk_logs.py
│   │   ├── elk_logs_tools.py
│   │   ├── elk_logstash.py
│   │   ├── filebeat_tools.py
│   │   ├── kibana_tools.py
│   │   ├── logstash_tools.py
│   │   ├── recommendation.py
│   │   └── recommendation_tools.py
│   └── utils/
│       ├── query_builder.py
│       ├── response_limiter.py
│       └── time_range.py
├── tests/
│   ├── conftest.py
│   ├── integration/
│   └── unit/
├── k8s/
│   ├── configmap.yaml
│   ├── deployment.yaml
│   ├── hpa.yaml
│   ├── ingress.yaml
│   ├── namespace.yaml
│   ├── networkpolicy.yaml
│   ├── pdb.yaml
│   ├── secret.yaml
│   ├── service.yaml
│   └── serviceaccount.yaml
├── .env.example
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
├── requirements.txt
└── README.md

3) Security & Safety Features

  • Read-only by default; there are no write/delete/restart endpoints.
  • API key authentication via the X-API-Key header.
  • Per-tool RBAC (elk_viewer, elk_operator, elk_admin_readonly).
  • Index pattern allowlist (ALLOWED_INDEX_PATTERNS).
  • Denylist for dangerous queries (script, painless, delete_by_query, etc.).
  • Timeouts and bounded retries for ES/HTTP API calls.
  • Rate limiting per API key.
  • Audit log entries before and after each tool execution.
  • Structured JSON logging.
  • Secret masking (password/token/api_key/authorization/cookie).
  • Response size limiter (MAX_RESPONSE_BYTES).
  • TLS verification enabled by default.
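
To make the allowlist and denylist items concrete, here is a minimal sketch of how such checks could work. It is not the repository's implementation: the function names, the example allowlist values, and the choice to split comma-separated patterns are assumptions made for illustration. Only the ALLOWED_INDEX_PATTERNS name and the three denied terms come from the list above.

```python
from fnmatch import fnmatchcase

# Hypothetical values; the real ones would be read from ALLOWED_INDEX_PATTERNS.
ALLOWED_INDEX_PATTERNS = ["logs-*", "filebeat-*"]
# Terms named in the feature list; the real denylist may contain more.
DENIED_QUERY_TERMS = {"script", "painless", "delete_by_query"}


def is_index_allowed(index_pattern: str) -> bool:
    """Return True only if every comma-separated part matches an allowlisted glob."""
    parts = [part.strip() for part in index_pattern.split(",")]
    if not parts or any(not part for part in parts):
        return False
    return all(
        any(fnmatchcase(part, allowed) for allowed in ALLOWED_INDEX_PATTERNS)
        for part in parts
    )


def find_denied_terms(query: object) -> set[str]:
    """Recursively collect denylisted keys and string values from a query body."""
    found: set[str] = set()
    if isinstance(query, dict):
        for key, value in query.items():
            if str(key).lower() in DENIED_QUERY_TERMS:
                found.add(str(key).lower())
            found |= find_denied_terms(value)
    elif isinstance(query, list):
        for item in query:
            found |= find_denied_terms(item)
    elif isinstance(query, str) and query.lower() in DENIED_QUERY_TERMS:
        found.add(query.lower())
    return found
```

Splitting on commas matters because a request such as "logs-*,.security" would otherwise match the "logs-*" glob as a single string and slip past the allowlist.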

4) Endpoints

  • GET /healthz
  • GET /readyz
  • GET /metrics
  • GET /metrics/json
  • GET /mcp/tools
  • POST /mcp/execute
  • POST /mcp (JSON-RPC)

5) Required MCP Tools

  • elk_cluster_health
  • elk_nodes_stats
  • elk_indices_summary
  • elk_search_logs
  • elk_detect_errors
  • elk_logstash_health
  • elk_filebeat_status
  • elk_kibana_status
  • elk_apm_summary
  • elk_recommend_fix

6) Environment Configuration

Start from the .env.example file:

cp .env.example .env

7) Run Locally

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload

8) Docker

Build & run:

docker build -t mcpserver-elk:1.0.0 .
docker run --rm -p 8080:8080 --env-file .env mcpserver-elk:1.0.0

Docker Compose lab (with a sample ELK stack):

docker compose --profile lab up -d --build

9) Kubernetes Deploy

kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml

10) OpenClaw Integration

Example OpenClaw configuration (JSON):

{
  "mcpServers": [
    {
      "name": "elk-prod-readonly",
      "url": "https://mcp-elk.example.com/mcp",
      "headers": {
        "X-API-Key": "ops-key"
      },
      "timeoutSeconds": 30,
      "tools": [
        "elk_cluster_health",
        "elk_nodes_stats",
        "elk_indices_summary",
        "elk_search_logs",
        "elk_detect_errors",
        "elk_logstash_health",
        "elk_filebeat_status",
        "elk_kibana_status",
        "elk_apm_summary",
        "elk_recommend_fix"
      ]
    }
  ]
}

Example OpenClaw prompt:

Use the ELK MCP Server to check Elasticsearch cluster health, search the error logs of payment-service over the last hour, group the most frequent errors, analyze the root cause, and recommend safe fixes.

11) Example MCP Requests/Responses

List tools:

curl -sS -H "X-API-Key: dev-key" http://localhost:8080/mcp/tools | jq

Execute elk_cluster_health:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_cluster_health","input":{"include_shards":true}}' | jq

Execute elk_nodes_stats:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_nodes_stats","input":{"include_thread_pool":false}}' | jq

Execute elk_indices_summary:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_indices_summary","input":{"index_pattern":"logs-*","sort_by":"size","limit":20}}' | jq

Execute elk_search_logs:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_search_logs","input":{"index_pattern":"logs-*","start_time":"now-1h","end_time":"now","service_name":"payment-service","log_level":"error","limit":20}}' | jq

Execute elk_detect_errors:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_detect_errors","input":{"index_pattern":"logs-*","start_time":"now-1h","end_time":"now","service_name":"payment-service","top_n":10}}' | jq

Execute elk_logstash_health:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ops-key" \
  -d '{"tool_name":"elk_logstash_health","input":{"pipeline_id":"main"}}' | jq

Execute elk_filebeat_status:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_filebeat_status","input":{"index_pattern":"filebeat-*","max_delay_minutes":5}}' | jq

Execute elk_kibana_status:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_kibana_status","input":{"include_plugins":true}}' | jq

Execute elk_apm_summary:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_apm_summary","input":{"service_name":"payment-service","start_time":"now-1h","end_time":"now"}}' | jq

Execute elk_recommend_fix:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_recommend_fix","input":{"findings":{"cluster_health":{"status":"yellow","metrics":{"unassigned_shards":2}}}}}' | jq
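
The curl calls above all share one shape: a POST to /mcp/execute with an X-API-Key header and a JSON body holding tool_name and input. The following sketch builds the same request with Python's standard library. The helper name build_execute_request is hypothetical, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_execute_request(base_url: str, api_key: str, tool_name: str, tool_input: dict) -> Request:
    """Build (but do not send) a POST /mcp/execute request."""
    body = json.dumps({"tool_name": tool_name, "input": tool_input}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/mcp/execute",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


request = build_execute_request(
    "http://localhost:8080",
    "dev-key",
    "elk_cluster_health",
    {"include_shards": True},
)
```

To send it, pass the object to urllib.request.urlopen(request, timeout=30) and decode the body with json.load; the 30-second timeout mirrors timeoutSeconds in the OpenClaw configuration example.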

JSON-RPC example:

curl -sS -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"jsonrpc":"2.0","id":"1","method":"mcp.list_tools","params":{}}' | jq

Example elk_cluster_health response:

{
  "ok": true,
  "tool_name": "elk_cluster_health",
  "data": {
    "status": "yellow",
    "summary": "Cluster prod-elk status=yellow, nodes=6, unassigned_shards=2",
    "metrics": {
      "cluster_name": "prod-elk",
      "number_of_nodes": 6,
      "active_shards": 1240,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 2
    },
    "recommendation": [
      "Periksa replica shard yang belum ter-assign.",
      "Jalankan analisa allocation explain untuk shard unassigned (read-only)."
    ]
  }
}

Example elk_detect_errors response:

{
  "ok": true,
  "tool_name": "elk_detect_errors",
  "data": {
    "total_errors": 182,
    "errors_by_service": [
      {"service": "payment-service", "count": 145}
    ],
    "top_error_messages": [
      {"message": "timeout to fraud-service", "count": 72}
    ],
    "samples": [
      {"timestamp": "2026-04-26T01:25:00Z", "service": "payment-service", "log_level": "error", "message": "timeout to fraud-service", "trace_id": "abc"}
    ],
    "recommendation": [
      "Validasi error paling sering dengan trace_id untuk korelasi lintas service."
    ]
  }
}
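
A client will usually condense a response like the one above before showing it to an operator. The sketch below does that for the elk_detect_errors shape. The field names are taken from the example response, while the function name and the output wording are illustrative assumptions.

```python
def summarize_detect_errors(response: dict) -> str:
    """Condense an elk_detect_errors response into a single line."""
    if not response.get("ok"):
        return "tool call failed"
    data = response["data"]
    total = data["total_errors"]
    top_messages = data.get("top_error_messages", [])
    if total == 0 or not top_messages:
        return f"{total} errors"
    # Pick the most frequent message and express it as a share of all errors.
    top = max(top_messages, key=lambda item: item["count"])
    message, count = top["message"], top["count"]
    share = round(100 * count / total)
    return f"{total} errors; most frequent: {message!r} x{count} ({share}% of total)"


example = {
    "ok": True,
    "tool_name": "elk_detect_errors",
    "data": {
        "total_errors": 182,
        "top_error_messages": [{"message": "timeout to fraud-service", "count": 72}],
    },
}
print(summarize_detect_errors(example))
```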

12) Testing

Run all tests:

pytest -q

Quick Trial Scripts

Seed simulated data:

chmod +x scripts/*.sh
./scripts/seed_data.sh

End-to-end smoke test:

./scripts/smoke_test.sh

Example with a custom endpoint/key:

MCP_BASE_URL=http://localhost:8080 \
MCP_VIEWER_KEY=dev-key \
MCP_OPERATOR_KEY=ops-key \
./scripts/smoke_test.sh

Run lint and type checks:

ruff check .
mypy app

Provided Tests

  • Unit test tool registry.
  • Unit test RBAC.
  • Unit test secret masking.
  • Unit test Elasticsearch query builder.
  • Integration test MCP + mock Elasticsearch.
  • Integration test MCP + mock Kibana.
  • Integration test MCP + mock Logstash.
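
As an illustration of what the secret-masking unit test could look like, here is a self-contained sketch. The mask_secrets function, the "***" placeholder, and the substring-based key matching are assumptions, not the code in app/core/masking.py. The sensitive key names are the ones listed under Security & Safety Features.

```python
SENSITIVE_KEYS = ("password", "token", "api_key", "authorization", "cookie")
MASK = "***"


def _is_sensitive(key: object) -> bool:
    """Match keys such as 'X-API-Key' or 'access_token' by normalized substring."""
    normalized = str(key).lower().replace("-", "_")
    return any(name in normalized for name in SENSITIVE_KEYS)


def mask_secrets(value: object) -> object:
    """Return a copy of value with sensitive dict entries replaced by MASK."""
    if isinstance(value, dict):
        return {
            key: MASK if _is_sensitive(key) else mask_secrets(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_secrets(item) for item in value]
    return value


def test_mask_secrets_nested() -> None:
    event = {
        "user": "ops",
        "headers": {"X-API-Key": "dev-key", "Authorization": "Bearer abc"},
        "items": [{"password": "hunter2", "index": "logs-*"}],
    }
    masked = mask_secrets(event)
    assert masked["user"] == "ops"
    assert masked["headers"] == {"X-API-Key": MASK, "Authorization": MASK}
    assert masked["items"] == [{"password": MASK, "index": "logs-*"}]
    # The input must not be mutated, so audit code can still use it.
    assert event["headers"]["X-API-Key"] == "dev-key"
```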

Smoke Test Checklist

  • GET /healthz returns 200.
  • GET /readyz reports ready while ES is up.
  • GET /mcp/tools returns the tool list.
  • POST /mcp/execute with a valid key succeeds.
  • POST /mcp/execute with an invalid key returns 401.
  • elk_logstash_health is rejected (403) for the viewer role.
  • Dangerous queries are rejected.
  • Oversized responses are rejected (413) when they exceed the limit.
  • /metrics can be scraped by Prometheus.
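
The 401 and 403 items in the checklist follow from a two-step decision: authenticate the key, then check the role against the tool. The sketch below models that decision. The key-to-role mapping is inferred from the smoke-test variables (MCP_VIEWER_KEY=dev-key, MCP_OPERATOR_KEY=ops-key). The role ranking, the per-tool minimum roles other than elk_logstash_health, and the 404 for unknown tools are assumptions.

```python
from typing import Optional

# Inferred from the smoke-test environment variables.
API_KEY_ROLES = {"dev-key": "elk_viewer", "ops-key": "elk_operator"}
# Assumed ordering: a higher rank includes the lower ranks' permissions.
ROLE_RANK = {"elk_viewer": 0, "elk_operator": 1, "elk_admin_readonly": 2}
# Only the elk_logstash_health requirement is implied by the checklist.
TOOL_MIN_ROLE = {
    "elk_cluster_health": "elk_viewer",
    "elk_search_logs": "elk_viewer",
    "elk_logstash_health": "elk_operator",
}


def authorize(api_key: Optional[str], tool_name: str) -> int:
    """Return the HTTP status an execute call would get from auth and RBAC alone."""
    role = API_KEY_ROLES.get(api_key or "")
    if role is None:
        return 401  # missing or unknown API key
    required = TOOL_MIN_ROLE.get(tool_name)
    if required is None:
        return 404  # unknown tool (assumed behavior)
    if ROLE_RANK[role] < ROLE_RANK[required]:
        return 403  # authenticated, but the role lacks access to this tool
    return 200
```

With these assumptions, authorize("dev-key", "elk_logstash_health") yields 403 and authorize("ops-key", "elk_logstash_health") yields 200, which matches the checklist item for the viewer role.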

13) Troubleshooting Guide

| Problem | Symptom | Likely Cause | Check Command | Safe Fix |
| OpenClaw cannot connect to the MCP Server | Timeout/connection refused | Wrong DNS/Ingress/Service | kubectl get ingress -n mcpserver-elk | Fix the Ingress host/path and the Service port |
| 401 API key invalid | Response authentication_failed | X-API-Key wrong or not sent | curl -i http://host/mcp/tools | Update the key in OpenClaw, sync the Secret |
| 403 RBAC denied | Response permission_denied | Role has no access to the tool | curl ... /mcp/execute | Use an API key with the right role or adjust the policy |
| Elasticsearch TLS error | CERTIFICATE_VERIFY_FAILED | CA cert wrong/expired | openssl s_client -connect es:9200 -showcerts | Mount a valid CA, keep TLS verification enabled |
| Elasticsearch authentication failed | 401 from ES | Wrong user/password | curl -u user:pass https://es:9200/_cluster/health | Rotate the read-only credential secret |
| Index pattern denied | 403 pattern not allowed | Pattern outside the allowlist | Check ALLOWED_INDEX_PATTERNS | Add a safe pattern to the allowlist |
| Query timeout | 504 timeout | Heavy query / busy cluster | GET /_tasks?detailed=true&actions=*search | Narrow the time range, lower the limit, optimize the index |
| Response too large | 413 response_too_large | Result set too large | Check MAX_RESPONSE_BYTES | Reduce limit/filter, raise the limit in measured steps |
| Kibana 401 | Kibana tool fails to authenticate | Wrong Kibana user | curl -u user:pass https://kibana/api/status -H 'kbn-xsrf:true' | Use a valid read-only Kibana account |
| Kibana status unavailable | Status degraded/down | Kibana/ES backend issue | curl https://kibana/api/status | Check the Kibana -> Elasticsearch connection |
| Logstash monitoring API down | Logstash tool returns error 502 | Port 9600 down/firewalled | curl http://logstash:9600/_node/stats | Enable the monitoring API / fix the network |
| High Filebeat ingestion delay | delayed_hosts increases | Agent disconnected/backpressure | GET filebeat-*/_search | Check the Beat output, the network, and the Logstash queue |
| Cluster yellow | Status yellow | Replicas not yet allocated | GET /_cluster/health + GET /_cat/shards?v | Add nodes/disk, check allocation rules |
| Cluster red | Status red | Primary shard unassigned | GET /_cluster/allocation/explain | Prioritize primary shard recovery |
| Shard unassigned | unassigned_shards > 0 | Disk watermark/node down/allocation filter | GET /_cluster/allocation/explain | Free disk space, repair the node, check awareness settings |
| Disk watermark exceeded | Shards cannot be allocated | Disk usage above the watermark | GET /_cat/allocation?v | Add capacity, ILM cleanup, rebalance |
| High JVM heap | Heap > 80% | Heavy queries/aggregations, too many shards | GET /_nodes/stats/jvm | Optimize queries, reduce shard count, tune the heap |
| Logstash pipeline stuck | Events in rising, events out flat | Output blocked / queue full | GET http://logstash:9600/_node/stats | Check the output plugin, increase workers/queue carefully |

14) Production Checklist

  • The Elasticsearch user is read-only.
  • TLS certificates are valid and verification is enabled.
  • API keys are rotated regularly.
  • Per-tool RBAC is enabled.
  • Audit logging is enabled (JSON).
  • Prometheus scrapes /metrics.
  • A Grafana dashboard is available.
  • NetworkPolicy is enabled.
  • Resource requests/limits are set.
  • HPA is enabled.
  • Secrets do not appear in logs.
  • Dangerous queries are rejected.
  • Config/manifest backups are available.
  • CI/CD security scanning is enabled.

15) Example Bamboo Pipeline (CI/CD)

---
version: 2
plan:
  project-key: MCP
  key: ELK
  name: mcpserver-elk

stages:
  - Build & Test:
      jobs:
        - lint-test-build

jobs:
  - lint-test-build:
      docker:
        image: python:3.12-slim
      tasks:
        - script: |
            python -m pip install --upgrade pip
            pip install -r requirements.txt
            ruff check .
            mypy app
            pytest -q
        - script: |
            docker build -t new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0 .
        - script: |
            trivy image --exit-code 1 new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0
        - script: |
            printf '%s' "$NEXUS_PASS" | docker login new-nexus.bri.co.id -u "$NEXUS_USER" --password-stdin
            docker push new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0
        - script: |
            kubectl apply -f k8s/
            kubectl rollout status deploy/mcpserver-elk -n mcpserver-elk
        - script: |
            curl -fsS https://mcp-elk.example.com/healthz
            curl -fsS https://mcp-elk.example.com/readyz

16) Enterprise Notes

  • Use HTTPS end-to-end (Ingress TLS + upstream TLS).
  • Store secrets in a secret manager (Vault/KMS/ExternalSecret), not as plaintext in the repo.
  • Use image signing and an SBOM for compliance.
  • Make sure the Elasticsearch user has a read-only role (monitor, read; no write/manage).
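
For the read-only role in the last note, the sketch below builds a role body in the shape accepted by the Elasticsearch create-role API (PUT /_security/role/<name>). The index patterns and the added view_index_metadata privilege are assumptions. The monitor and read privileges come from the note itself.

```python
import json

# Privileges that would let the account change data or settings.
WRITE_PRIVILEGES = {"all", "write", "index", "create", "delete", "manage", "create_index", "delete_index"}


def readonly_role(index_patterns: list[str]) -> dict:
    """Build a role body granting cluster monitoring and index reads only."""
    return {
        "cluster": ["monitor"],
        "indices": [
            {
                "names": list(index_patterns),
                "privileges": ["read", "view_index_metadata"],
            }
        ],
    }


# Hypothetical patterns; align them with ALLOWED_INDEX_PATTERNS in practice.
role = readonly_role(["logs-*", "filebeat-*"])
granted = set(role["cluster"]) | {
    privilege for entry in role["indices"] for privilege in entry["privileges"]
}
print(json.dumps(role, indent=2))
```

Keeping the role's index names aligned with the server's allowlist gives two independent layers: even if an allowlist check were bypassed, Elasticsearch itself would refuse the read.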
