MCP server built on Python 3.12 + FastAPI for safe, read-only analysis of the ELK Stack, compatible with OpenClaw.
flowchart TB
OC[OpenClaw] -->|X-API-Key| API[FastAPI MCP Endpoint]
API --> SEC[Security Layer\nAPI Key Auth + RBAC + Rate Limit]
SEC --> REG[Tool Registry\nDiscovery + Execute + Schema Validation]
REG --> TOOLS[MCP Tools\nELK Cluster/Logs/Kibana/Logstash/Filebeat/APM/Recommendation]
TOOLS --> CTRL[Controllers]
CTRL --> SRV[Services\nBusiness Logic]
SRV --> REPO[Repositories\nRead-Only Data Access]
REPO --> ES[(Elasticsearch)]
REPO --> KB[(Kibana API)]
REPO --> LS[(Logstash Monitoring API)]
API --> AUDIT[Structured JSON Audit Log]
API --> METRICS[Prometheus Metrics /metrics]
classDef safe fill:#e7f7ef,stroke:#1f8f5f,stroke-width:1px;
class SEC,REG,TOOLS,AUDIT,METRICS safe;
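The path through the security layer and the tool registry in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `ToolSpec`, `ToolRegistry`, `PermissionDenied`, the stub handlers, and the role-to-tool mapping are hypothetical. Only the role names, the tool names, and the `ok`/`tool_name`/`data` response envelope come from this README.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class PermissionDenied(Exception):
    """Raised when the caller's role may not run the requested tool."""


@dataclass(frozen=True)
class ToolSpec:
    name: str
    handler: Callable[[dict[str, Any]], dict[str, Any]]
    allowed_roles: frozenset[str]
    required_fields: frozenset[str] = frozenset()


@dataclass
class ToolRegistry:
    _tools: dict[str, ToolSpec] = field(default_factory=dict)

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def list_tools(self, role: str) -> list[str]:
        # Discovery: expose only the tools this role is allowed to run.
        return sorted(n for n, s in self._tools.items() if role in s.allowed_roles)

    def execute(self, name: str, role: str, payload: dict[str, Any]) -> dict[str, Any]:
        spec = self._tools.get(name)
        if spec is None:
            raise KeyError(f"unknown tool: {name}")
        # RBAC check happens before any input is inspected or any handler runs.
        if role not in spec.allowed_roles:
            raise PermissionDenied(f"role {role!r} cannot run {name!r}")
        # Very small stand-in for schema validation: required keys only.
        missing = spec.required_fields - set(payload)
        if missing:
            raise ValueError(f"missing input fields: {sorted(missing)}")
        return {"ok": True, "tool_name": name, "data": spec.handler(payload)}


registry = ToolRegistry()
registry.register(ToolSpec(
    name="elk_cluster_health",
    handler=lambda p: {"status": "yellow"},  # stub, no real Elasticsearch call
    allowed_roles=frozenset({"elk_viewer", "elk_operator", "elk_admin_readonly"}),
))
registry.register(ToolSpec(
    name="elk_logstash_health",
    handler=lambda p: {"pipeline": p["pipeline_id"]},  # stub
    allowed_roles=frozenset({"elk_operator", "elk_admin_readonly"}),  # assumed mapping
    required_fields=frozenset({"pipeline_id"}),
))

print(registry.list_tools("elk_viewer"))
```

Checking the role before validating input means a caller without permission learns nothing about a tool's input schema from the error it receives.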
mcpserver-elk/
├── app/
│ ├── main.py
│ ├── api/routes/
│ │ ├── health_controller.py
│ │ ├── mcp_controller.py
│ │ └── metrics_controller.py
│ ├── core/
│ │ ├── config.py
│ │ ├── exceptions.py
│ │ ├── logging.py
│ │ ├── masking.py
│ │ ├── metrics.py
│ │ ├── rate_limit.py
│ │ └── security.py
│ ├── mcp/
│ │ ├── registry.py
│ │ ├── schemas.py
│ │ ├── server.py
│ │ └── tool_base.py
│ ├── models/
│ │ ├── common_models.py
│ │ ├── elk_models.py
│ │ ├── mcp_models.py
│ │ └── schemas.py
│ ├── controllers/
│ │ ├── elk_controller.py
│ │ └── recommendation_controller.py
│ ├── services/
│ │ ├── apm_service.py
│ │ ├── elk_cluster_service.py
│ │ ├── elk_logs_service.py
│ │ ├── filebeat_service.py
│ │ ├── kibana_service.py
│ │ ├── logstash_service.py
│ │ └── recommendation_service.py
│ ├── repositories/
│ │ ├── apm_repository.py
│ │ ├── elasticsearch_repository.py
│ │ ├── kibana_repository.py
│ │ └── logstash_repository.py
│ ├── clients/
│ │ ├── elasticsearch_client.py
│ │ └── http_client.py
│ ├── tools/
│ │ ├── elk_apm.py
│ │ ├── elk_cluster.py
│ │ ├── elk_cluster_tools.py
│ │ ├── elk_filebeat.py
│ │ ├── elk_kibana.py
│ │ ├── elk_logs.py
│ │ ├── elk_logs_tools.py
│ │ ├── elk_logstash.py
│ │ ├── filebeat_tools.py
│ │ ├── kibana_tools.py
│ │ ├── logstash_tools.py
│ │ ├── recommendation.py
│ │ └── recommendation_tools.py
│ └── utils/
│   ├── query_builder.py
│   ├── response_limiter.py
│   └── time_range.py
├── tests/
│ ├── conftest.py
│ ├── integration/
│ └── unit/
├── k8s/
│ ├── configmap.yaml
│ ├── deployment.yaml
│ ├── hpa.yaml
│ ├── ingress.yaml
│ ├── namespace.yaml
│ ├── networkpolicy.yaml
│ ├── pdb.yaml
│ ├── secret.yaml
│ ├── service.yaml
│ └── serviceaccount.yaml
├── .env.example
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
├── requirements.txt
└── README.md
- Read-only by default; there are no write/delete/restart endpoints.
- API key authentication via the X-API-Key header.
- Per-tool RBAC (elk_viewer, elk_operator, elk_admin_readonly).
- Index pattern allowlist (ALLOWED_INDEX_PATTERNS).
- Denylist for dangerous queries (script, painless, delete_by_query, etc.).
- Bounded timeouts and retries for ES/HTTP APIs.
- Rate limiting per API key.
- Audit log before and after each tool execution.
- Structured JSON logging.
- Secret masking (password/token/api_key/authorization/cookie).
- Response size limiter (MAX_RESPONSE_BYTES).
- TLS verification enabled by default.
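To make the allowlist and denylist items concrete, here is a minimal sketch of how such guards could work. The function names, the example allowlist values, and the matching strategy (shell-style globbing via `fnmatch`) are assumptions for illustration; the project's real rules live in its own code and in ALLOWED_INDEX_PATTERNS.

```python
from fnmatch import fnmatchcase
from typing import Any

# Assumed example values; the real allowlist comes from ALLOWED_INDEX_PATTERNS.
ALLOWED_INDEX_PATTERNS = ("logs-*", "filebeat-*")
# Only the keys named in this README; the project's full denylist is not shown here.
DENIED_QUERY_KEYS = frozenset({"script", "painless", "delete_by_query"})


def is_index_allowed(index_pattern: str,
                     allowed: tuple[str, ...] = ALLOWED_INDEX_PATTERNS) -> bool:
    """Accept a requested pattern only if every comma-separated part is allowlisted."""
    # Elasticsearch accepts comma-separated index lists, so check each part on its own;
    # otherwise "logs-*,.security" would slip through the "logs-*" glob.
    parts = [part.strip() for part in index_pattern.split(",")]
    return all(part and any(fnmatchcase(part, a) for a in allowed) for part in parts)


def find_denied_keys(query: Any,
                     denied: frozenset[str] = DENIED_QUERY_KEYS) -> list[str]:
    """Walk a query body and return denied terms found as dict keys or string values."""
    found: set[str] = set()
    stack = [query]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if str(key).lower() in denied:
                    found.add(str(key).lower())
                stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, str) and node.lower() in denied:
            found.add(node.lower())
    return sorted(found)


scripted = {"query": {"bool": {"filter": [
    {"script": {"script": {"lang": "painless", "source": "true"}}}
]}}}
print(is_index_allowed("logs-payment-2026.04"), find_denied_keys(scripted))
```

A request would be rejected when `is_index_allowed` returns `False` or `find_denied_keys` returns a non-empty list. Exact-match checks like these are deliberately strict; a real implementation may need to allow harmless string values that happen to equal a denied word.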
- GET /healthz
- GET /readyz
- GET /metrics
- GET /metrics/json
- GET /mcp/tools
- POST /mcp/execute
- POST /mcp (JSON-RPC)
- elk_cluster_health
- elk_nodes_stats
- elk_indices_summary
- elk_search_logs
- elk_detect_errors
- elk_logstash_health
- elk_filebeat_status
- elk_kibana_status
- elk_apm_summary
- elk_recommend_fix
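Any of these tools can also be called from Python instead of curl. The following sketch builds the request with the standard library only and does not send it; `build_execute_request` is a hypothetical helper, while the `tool_name`/`input` body shape and the X-API-Key header follow the curl examples in this README.

```python
import json
import urllib.request
from typing import Any


def build_execute_request(base_url: str, api_key: str, tool_name: str,
                          tool_input: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST /mcp/execute request for one tool call."""
    body = json.dumps({"tool_name": tool_name, "input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp/execute",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


req = build_execute_request("http://localhost:8080", "dev-key",
                            "elk_cluster_health", {"include_shards": True})
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.load(resp)
```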
Use the .env.example file as a template:
cp .env.example .env

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload

Build and run:
docker build -t mcpserver-elk:1.0.0 .
docker run --rm -p 8080:8080 --env-file .env mcpserver-elk:1.0.0

Docker Compose lab (with a sample ELK stack):
docker compose --profile lab up -d --build

kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml

Example OpenClaw configuration (sample JSON):
{
"mcpServers": [
{
"name": "elk-prod-readonly",
"url": "https://mcp-elk.example.com/mcp",
"headers": {
"X-API-Key": "ops-key"
},
"timeoutSeconds": 30,
"tools": [
"elk_cluster_health",
"elk_nodes_stats",
"elk_indices_summary",
"elk_search_logs",
"elk_detect_errors",
"elk_logstash_health",
"elk_filebeat_status",
"elk_kibana_status",
"elk_apm_summary",
"elk_recommend_fix"
]
}
]
}

Example OpenClaw prompt:
Use the ELK MCP Server to check Elasticsearch cluster health, find error logs for payment-service in the last hour, group the most frequent errors, analyze the root cause, and recommend safe fixes.

List tools:
curl -sS -H "X-API-Key: dev-key" http://localhost:8080/mcp/tools | jq

Execute elk_cluster_health:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_cluster_health","input":{"include_shards":true}}' | jq

Execute elk_nodes_stats:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_nodes_stats","input":{"include_thread_pool":false}}' | jq

Execute elk_indices_summary:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_indices_summary","input":{"index_pattern":"logs-*","sort_by":"size","limit":20}}' | jq

Execute elk_search_logs:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_search_logs","input":{"index_pattern":"logs-*","start_time":"now-1h","end_time":"now","service_name":"payment-service","log_level":"error","limit":20}}' | jq

Execute elk_detect_errors:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_detect_errors","input":{"index_pattern":"logs-*","start_time":"now-1h","end_time":"now","service_name":"payment-service","top_n":10}}' | jq

Execute elk_logstash_health:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: ops-key" \
-d '{"tool_name":"elk_logstash_health","input":{"pipeline_id":"main"}}' | jq

Execute elk_filebeat_status:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_filebeat_status","input":{"index_pattern":"filebeat-*","max_delay_minutes":5}}' | jq

Execute elk_kibana_status:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_kibana_status","input":{"include_plugins":true}}' | jq

Execute elk_apm_summary:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_apm_summary","input":{"service_name":"payment-service","start_time":"now-1h","end_time":"now"}}' | jq

Execute elk_recommend_fix:
curl -sS -X POST http://localhost:8080/mcp/execute \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"tool_name":"elk_recommend_fix","input":{"findings":{"cluster_health":{"status":"yellow","metrics":{"unassigned_shards":2}}}}}' | jq

JSON-RPC example:
curl -sS -X POST http://localhost:8080/mcp \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-key" \
-d '{"jsonrpc":"2.0","id":"1","method":"mcp.list_tools","params":{}}' | jq

Example elk_cluster_health response:
{
"ok": true,
"tool_name": "elk_cluster_health",
"data": {
"status": "yellow",
"summary": "Cluster prod-elk status=yellow, nodes=6, unassigned_shards=2",
"metrics": {
"cluster_name": "prod-elk",
"number_of_nodes": 6,
"active_shards": 1240,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 2
},
"recommendation": [
"Check replica shards that have not been assigned yet.",
"Run an allocation explain analysis for the unassigned shards (read-only)."
]
}
}

Example elk_detect_errors response:
{
"ok": true,
"tool_name": "elk_detect_errors",
"data": {
"total_errors": 182,
"errors_by_service": [
{"service": "payment-service", "count": 145}
],
"top_error_messages": [
{"message": "timeout to fraud-service", "count": 72}
],
"samples": [
{"timestamp": "2026-04-26T01:25:00Z", "service": "payment-service", "log_level": "error", "message": "timeout to fraud-service", "trace_id": "abc"}
],
"recommendation": [
"Validate the most frequent errors by trace_id for cross-service correlation."
]
}
}

Run all tests:
pytest -q

Seed simulated data:
chmod +x scripts/*.sh
./scripts/seed_data.sh

End-to-end smoke test:
./scripts/smoke_test.sh

Example with a custom endpoint/key:
MCP_BASE_URL=http://localhost:8080 \
MCP_VIEWER_KEY=dev-key \
MCP_OPERATOR_KEY=ops-key \
./scripts/smoke_test.sh

Run lint and type checks:
ruff check .
mypy app

- Unit test tool registry.
- Unit test RBAC.
- Unit test secret masking.
- Unit test Elasticsearch query builder.
- Integration test MCP + mock Elasticsearch.
- Integration test MCP + mock Kibana.
- Integration test MCP + mock Logstash.
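As an illustration of what the secret masking unit test could exercise, here is a minimal masking sketch. `mask_secrets`, `SENSITIVE_KEYS`, and the `***` placeholder are assumptions; the key names mirror the ones this README lists (password/token/api_key/authorization/cookie).

```python
from typing import Any

# Key fragments taken from this README; matching is case-insensitive.
SENSITIVE_KEYS = ("password", "token", "api_key", "authorization", "cookie")
MASK = "***"  # assumed placeholder


def _is_sensitive(key: Any) -> bool:
    # Normalise header-style names such as "X-API-Key" to "x_api_key".
    normalized = str(key).lower().replace("-", "_")
    return any(fragment in normalized for fragment in SENSITIVE_KEYS)


def mask_secrets(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries replaced by MASK."""
    if isinstance(value, dict):
        return {k: MASK if _is_sensitive(k) else mask_secrets(v) for k, v in value.items()}
    if isinstance(value, list):
        return [mask_secrets(item) for item in value]
    return value


event = {
    "tool_name": "elk_search_logs",
    "headers": {"X-API-Key": "dev-key", "Accept": "application/json"},
    "backends": [{"user": "readonly", "password": "s3cret"}],
}
masked = mask_secrets(event)
print(masked)
```

Substring matching errs on the side of over-masking (a field such as `token_count` would also be hidden), which is usually the safer failure mode for audit logs.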
- GET /healthz returns 200.
- GET /readyz reports status ready when ES is up.
- GET /mcp/tools returns the list of tools.
- POST /mcp/execute with a valid key succeeds.
- POST /mcp/execute with an invalid key returns 401.
- Tool elk_logstash_health is denied (403) for the viewer role.
- Dangerous queries are rejected.
- Oversized responses are rejected (413) when they exceed the limit.
- /metrics can be scraped by Prometheus.
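The 413 behaviour in this checklist can be pictured with a small sketch: serialize the payload, measure it in bytes, and reject it if it exceeds the limit. `enforce_response_limit`, `ResponseTooLarge`, and the default value used for `MAX_RESPONSE_BYTES` are assumptions for illustration only.

```python
import json
from typing import Any

MAX_RESPONSE_BYTES = 1_000_000  # assumed example value; the real one comes from config


class ResponseTooLarge(Exception):
    """Maps to HTTP 413 in this sketch."""
    status_code = 413


def enforce_response_limit(payload: dict[str, Any],
                           max_bytes: int = MAX_RESPONSE_BYTES) -> bytes:
    """Serialize payload compactly and reject it if the encoded size exceeds max_bytes."""
    encoded = json.dumps(payload, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(encoded) > max_bytes:
        raise ResponseTooLarge(f"response is {len(encoded)} bytes, limit is {max_bytes}")
    return encoded


small = enforce_response_limit({"ok": True}, max_bytes=64)
print(small)
```

Measuring the encoded bytes rather than the number of hits keeps the limit meaningful even when individual log documents vary widely in size.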
| Problem | Symptom | Likely Cause | Check Command | Safe Fix |
|---|---|---|---|---|
| OpenClaw cannot connect to the MCP server | Timeout/connection refused | Wrong DNS/Ingress/Service | kubectl get ingress -n mcpserver-elk | Fix the Ingress host/path and Service port |
| 401 invalid API key | Response authentication_failed | X-API-Key is wrong or not sent | curl -i http://host/mcp/tools | Update the key in OpenClaw and sync the Secret |
| 403 RBAC denied | Response permission_denied | Role has no access to the tool | curl ... /mcp/execute | Use an API key with the right role or adjust the policy |
| Elasticsearch TLS error | CERTIFICATE_VERIFY_FAILED | Wrong or expired CA cert | openssl s_client -connect es:9200 -showcerts | Mount a valid CA and keep TLS verification enabled |
| Elasticsearch authentication failed | 401 from ES | Wrong user/password | curl -u user:pass https://es:9200/_cluster/health | Rotate the read-only credential secret |
| Index pattern denied | 403 pattern not allowed | Pattern outside the allowlist | check ALLOWED_INDEX_PATTERNS | Add a safe pattern to the allowlist |
| Query timeout | 504 timeout | Heavy query / busy cluster | GET /_tasks?detailed=true&actions=*search | Narrow the time range, lower the limit, optimize indices |
| Response too large | 413 response_too_large | Result too large | check MAX_RESPONSE_BYTES | Lower the limit or add filters, or raise the limit in measured steps |
| Kibana 401 | Kibana tool fails auth | Wrong Kibana user | curl -u user:pass https://kibana/api/status -H 'kbn-xsrf:true' | Use a valid read-only Kibana account |
| Kibana status unavailable | status degraded/down | Kibana/ES backend issue | curl https://kibana/api/status | Check the Kibana -> Elasticsearch connection |
| Logstash monitoring API down | Logstash tool error 502 | Port 9600 down/firewall | curl http://logstash:9600/_node/stats | Enable the monitoring API / fix the network |
| High Filebeat ingestion delay | delayed_hosts increasing | Agent disconnected/backpressure | GET filebeat-*/_search | Check beat output, network, Logstash queue |
| Cluster yellow | status yellow | Replicas not yet allocated | GET /_cluster/health + GET /_cat/shards?v | Add nodes/disk, check allocation rules |
| Cluster red | status red | Primary shard unassigned | GET /_cluster/allocation/explain | Prioritize primary shard recovery |
| Unassigned shards | unassigned_shards > 0 | Disk watermark/node down/allocation filter | GET /_cluster/allocation/explain | Free disk, repair the node, check awareness settings |
| Disk watermark exceeded | Shards cannot be allocated | Disk usage above watermark | GET /_cat/allocation?v | Add capacity, ILM cleanup, rebalance |
| High JVM heap | heap > 80% | Heavy queries/aggs, too many shards | GET /_nodes/stats/jvm | Optimize queries, reduce shards, tune heap |
| Logstash pipeline stuck | events in rising, events out stagnant | Output blocked / queue full | GET http://logstash:9600/_node/stats | Check the output plugin, increase workers/queue carefully |
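Several rows of this table are simple enough to express as rules. The sketch below turns a findings object into read-only recommendations in the spirit of elk_recommend_fix. It is not the tool's actual logic: `recommend` is a hypothetical function, the `jvm.heap_used_percent` field is an assumed input, and only the `cluster_health` shape is taken from the elk_recommend_fix curl example in this README.

```python
from typing import Any


def recommend(findings: dict[str, Any]) -> list[str]:
    """Map cluster findings to read-only follow-up checks based on the table above."""
    health = findings.get("cluster_health", {})
    status = health.get("status")
    metrics = health.get("metrics", {})
    advice: list[str] = []

    if status == "red":
        advice.append("Prioritize primary shard recovery; "
                      "inspect GET /_cluster/allocation/explain.")
    elif status == "yellow":
        advice.append("Check replica allocation with GET /_cat/shards?v; "
                      "consider adding nodes or disk.")

    if metrics.get("unassigned_shards", 0) > 0:
        advice.append("Check disk watermarks and allocation filters with "
                      "GET /_cat/allocation?v.")

    # Assumed field name; 80% is the threshold named in the table.
    heap = findings.get("jvm", {}).get("heap_used_percent")
    if heap is not None and heap > 80:
        advice.append("Review heavy queries/aggregations and shard count; "
                      "see GET /_nodes/stats/jvm.")
    return advice


example = {"cluster_health": {"status": "yellow", "metrics": {"unassigned_shards": 2}}}
for line in recommend(example):
    print(line)
```

Every recommendation points at a GET endpoint only, which keeps the output consistent with the server's read-only stance.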
- Elasticsearch user is read-only.
- TLS certificate is valid and verification is enabled.
- API keys are rotated regularly.
- Per-tool RBAC is enabled.
- Audit log (JSON) is enabled.
- Prometheus scrapes /metrics.
- Grafana dashboard is available.
- NetworkPolicy is enabled.
- Resource requests/limits are set.
- HPA is enabled.
- Secrets do not appear in logs.
- Dangerous queries are rejected.
- Config/manifest backups are available.
- CI/CD security scanning is enabled.
---
version: 2
plan:
  project-key: MCP
  key: ELK
  name: mcpserver-elk
stages:
  - Build & Test:
      jobs:
        - lint-test-build
jobs:
  - lint-test-build:
      docker:
        image: python:3.12-slim
      tasks:
        - script: |
            python -m pip install --upgrade pip
            pip install -r requirements.txt
            ruff check .
            mypy app
            pytest -q
        - script: |
            docker build -t new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0 .
        - script: |
            trivy image --exit-code 1 new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0
        - script: |
            docker login new-nexus.bri.co.id -u "$NEXUS_USER" -p "$NEXUS_PASS"
            docker push new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0
        - script: |
            kubectl apply -f k8s/
            kubectl rollout status deploy/mcpserver-elk -n mcpserver-elk
        - script: |
            curl -fsS https://mcp-elk.example.com/healthz
            curl -fsS https://mcp-elk.example.com/readyz

- Use HTTPS end-to-end (Ingress TLS + upstream TLS).
- Store secrets in a secret manager (Vault/KMS/ExternalSecret), not as plaintext in the repo.
- Use image signing and an SBOM for compliance.
- Make sure the Elasticsearch user has a read-only role (monitor, read, without write/manage).