diff --git a/docs/toolhive/concepts/observability.md b/docs/toolhive/concepts/observability.md index 276cba9d..1c4faea7 100644 --- a/docs/toolhive/concepts/observability.md +++ b/docs/toolhive/concepts/observability.md @@ -417,13 +417,15 @@ groups: Now that you understand how ToolHive's observability works, you can: 1. **Choose a monitoring backend** that fits your needs and budget -2. **Enable telemetry** when running your servers: +2. **Follow the tutorial** to set up a local observability stack with + [OpenTelemetry, Jaeger, Prometheus, and Grafana](../tutorials/opentelemetry.mdx) +3. **Enable telemetry** when running your servers: - [using the ToolHive CLI](../guides-cli/telemetry-and-metrics.md) - using the Kubernetes operator (not yet supported - [contributions welcome](https://github.com/stacklok/toolhive/releases/tag/v0.2.0)) -3. **Set up basic dashboards** to track request rates, error rates, and response +4. **Set up basic dashboards** to track request rates, error rates, and response times -4. **Configure alerts** for critical issues +5. **Configure alerts** for critical issues The telemetry system works automatically once enabled, providing immediate insights into your MCP server performance and usage patterns. diff --git a/docs/toolhive/guides-cli/telemetry-and-metrics.md b/docs/toolhive/guides-cli/telemetry-and-metrics.md index 7dee480e..576368c9 100644 --- a/docs/toolhive/guides-cli/telemetry-and-metrics.md +++ b/docs/toolhive/guides-cli/telemetry-and-metrics.md @@ -322,6 +322,8 @@ Telemetry adds minimal overhead when properly configured: ## Related information +- Tutorial: + [Collect telemetry for MCP workloads](../tutorials/opentelemetry.mdx) - [Telemetry and monitoring concepts](../concepts/observability.md) - [`thv run` command reference](../reference/cli/thv_run.md) - [Run MCP servers](run-mcp-servers.mdx) diff --git a/docs/toolhive/guides-k8s/telemetry-and-metrics.md b/docs/toolhive/guides-k8s/telemetry-and-metrics.md index d0cf3d97..7268ed79 100644 --- a/docs/toolhive/guides-k8s/telemetry-and-metrics.md +++ b/docs/toolhive/guides-k8s/telemetry-and-metrics.md @@ -278,6 +278,11 @@ Telemetry adds minimal overhead when properly configured: ## Related information +- Tutorial: + [Collect telemetry for MCP workloads](../tutorials/opentelemetry.mdx) - + Step-by-step guide to set up a local observability stack +- [Telemetry and monitoring concepts](../concepts/observability.md) - Overview + of ToolHive's observability architecture - [Kubernetes CRD reference](../reference/crd-spec.mdx) - Reference for the `MCPServer` Custom Resource Definition (CRD) - [Deploy the operator using Helm](./deploy-operator-helm.md) - Install the diff --git a/docs/toolhive/tutorials/opentelemetry.mdx b/docs/toolhive/tutorials/opentelemetry.mdx new file mode 100644 index 00000000..7b64f5d3 --- /dev/null +++ b/docs/toolhive/tutorials/opentelemetry.mdx @@ -0,0 +1,989 @@ +--- +title: Collect telemetry for MCP workloads +description: + Learn how to collect metrics and traces for MCP workloads using either the + ToolHive CLI or Kubernetes operator with OpenTelemetry, Jaeger, and + Prometheus. 
+toc_max_heading_level: 2 +--- + +import useBaseUrl from '@docusaurus/useBaseUrl'; +import ThemedImage from '@theme/ThemedImage'; + +In this tutorial, you'll set up comprehensive observability for your MCP +workloads using [OpenTelemetry](https://opentelemetry.io/) with +[Jaeger](https://www.jaegertracing.io/) for distributed tracing, +[Prometheus](https://prometheus.io/) for metrics collection, and +[Grafana](https://grafana.com/) for visualization. + +By the end, you'll have a complete, industry-standard observability solution +that captures detailed traces and metrics, giving you visibility into your MCP +server performance and usage patterns. + + + +## Choose your deployment path + +This tutorial offers two paths for MCP observability: + + + + +**ToolHive CLI + Docker observability stack** + +Use the ToolHive CLI to run MCP servers locally, with Jaeger and Prometheus +running in Docker containers. This approach is perfect for: + +- Local development and testing +- Quick setup and experimentation +- Individual developer workflows +- Learning OpenTelemetry concepts + + + + +**ToolHive Kubernetes Operator + in-cluster observability** + +Use the ToolHive Kubernetes operator to manage MCP servers in a cluster, with +Jaeger and Prometheus deployed inside Kubernetes. This approach is ideal for: + +- Production-like environments +- Team collaboration and shared infrastructure +- Container orchestration workflows +- Scalable observability deployments + + + + +:::tip[Choose one path] + +Select your preferred deployment method using the tabs above. All subsequent +steps will show instructions for your chosen path. + +::: + +## What you'll learn + +- How to deploy Jaeger and Prometheus for your chosen environment +- How to configure OpenTelemetry collection for ToolHive MCP servers +- How to analyze traces in Jaeger and metrics in Prometheus +- How to set up queries and monitoring for MCP workloads +- Best practices for observability in your deployment environment + +## Prerequisites + +Before starting this tutorial, make sure you have: + + + + +- Completed the [ToolHive CLI quickstart](./quickstart-cli.mdx) +- A supported container runtime installed and running. + [Docker](https://docs.docker.com/get-docker/) or + [Podman](https://podman-desktop.io/downloads) are recommended for this + tutorial +- [Docker Compose](https://docs.docker.com/compose/install/) or + [Podman Compose](https://podman-desktop.io/docs/compose/setting-up-compose) + available +- A [supported MCP client](../reference/client-compatibility.mdx) for testing + + + + +- Completed the [ToolHive Kubernetes quickstart](./quickstart-k8s.mdx) with a + local kind cluster +- [`kubectl`](https://kubernetes.io/docs/tasks/tools/) configured to access your + cluster +- [Helm](https://helm.sh/docs/intro/install/) (v3.10 minimum) installed +- A [supported MCP client](../reference/client-compatibility.mdx) for testing +- The [ToolHive CLI](./quickstart-cli.mdx) (optional, for client configuration) +- Basic familiarity with Kubernetes concepts + + + + +## Overview + +The architecture for each deployment method: + + + + +```mermaid +graph TB + A[AI client] + THV[ToolHive CLI] + Proxy[Proxy process] + subgraph Docker[**Docker**] + MCP[MCP server
container] + OTEL[OTel Collector] + J[Jaeger] + P[Prometheus] + G[Grafana] + end + + THV -. manages .-> MCP & Proxy + Proxy -- HTTP or stdio --> MCP + Proxy -- OpenTelemetry data --> OTEL + OTEL -- traces --> J + OTEL -- metrics --> P + G -- visualization --> P + A -- HTTP --> Proxy +``` + +Your setup will include: + +- **ToolHive CLI** managing MCP servers in containers +- **Jaeger** for distributed tracing with built-in UI +- **Prometheus** for metrics collection with web UI +- **OpenTelemetry Collector** forwarding data to both backends + +
+ + +```mermaid +graph TB + A[AI client] + subgraph K8s[**K8s Cluster**] + THV[ToolHive Operator] + Proxy[Proxyrunner pod] + MCP[MCP server pod] + OTEL[OTel Collector] + J[Jaeger] + P[Prometheus] + G[Grafana] + end + + A -- HTTP
via ingress --> Proxy + THV -. manages .-> Proxy + Proxy -. manages .-> MCP + Proxy -- OpenTelemetry data --> OTEL + OTEL -- traces --> J + OTEL -- metrics --> P + G -- visualization --> P +``` + +Your setup will include: + +- **ToolHive Operator** managing MCP servers as Kubernetes pods +- **Jaeger** for distributed tracing +- **Prometheus** for metrics collection +- **Grafana** for metrics visualization +- **OpenTelemetry Collector** running as a Kubernetes service + +
+
+ +## Step 1: Deploy the observability stack + +First, set up the observability infrastructure for your chosen environment. + + + + +### Create Docker Compose configuration + +Create a Docker Compose file for the observability stack: + +```yaml title="observability-stack.yml" +services: + jaeger: + image: jaegertracing/jaeger:latest + container_name: jaeger + environment: + - COLLECTOR_OTLP_ENABLED=true + ports: + - '16686:16686' # Jaeger UI + networks: + - observability + + prometheus: + image: prom/prometheus:latest + container_name: prometheus + command: + - '--config.file=/etc/prometheus/prometheus.yml' + - '--storage.tsdb.path=/prometheus' + - '--web.console.libraries=/etc/prometheus/console_libraries' + - '--web.console.templates=/etc/prometheus/consoles' + - '--web.enable-lifecycle' + - '--enable-feature=native-histograms' + ports: + - '9090:9090' + volumes: + - ./prometheus.yml:/etc/prometheus/prometheus.yml + - prometheus-data:/prometheus + networks: + - observability + + grafana: + image: grafana/grafana:latest + container_name: grafana + environment: + - GF_SECURITY_ADMIN_USER=admin + - GF_SECURITY_ADMIN_PASSWORD=admin + - GF_USERS_ALLOW_SIGN_UP=false + ports: + - '3000:3000' + volumes: + - ./grafana-prometheus.yml:/etc/grafana/provisioning/datasources/prometheus.yml + - grafana-data:/var/lib/grafana + networks: + - observability + + otel-collector: + image: otel/opentelemetry-collector-contrib:latest + container_name: otel-collector + command: ['--config=/etc/otel-collector-config.yml'] + volumes: + - ./otel-collector-config.yml:/etc/otel-collector-config.yml + ports: + - '4318:4318' # OTLP HTTP receiver (ToolHive sends here) + - '8889:8889' # Prometheus exporter metrics + depends_on: + - jaeger + - prometheus + networks: + - observability + +volumes: + prometheus-data: + grafana-data: + +networks: + observability: + driver: bridge +``` + +### Configure the OpenTelemetry Collector + +Create the collector configuration to export to both Jaeger and Prometheus: + +```yaml title="otel-collector-config.yml" +receivers: + otlp: + protocols: + http: + endpoint: 0.0.0.0:4318 + +processors: + batch: + timeout: 10s + send_batch_size: 1024 + +exporters: + # Export traces to Jaeger + otlp/jaeger: + endpoint: jaeger:4317 + tls: + insecure: true + + # Expose metrics for Prometheus + prometheus: + endpoint: 0.0.0.0:8889 + const_labels: + service: 'toolhive-mcp-proxy' + +service: + pipelines: + traces: + receivers: [otlp] + processors: [batch] + exporters: [otlp/jaeger] + metrics: + receivers: [otlp] + processors: [batch] + exporters: [prometheus] +``` + +### Configure Prometheus and Grafana + +Create a Prometheus configuration to scrape the OpenTelemetry Collector: + +```yaml title="prometheus.yml" +global: + scrape_interval: 15s + +scrape_configs: + - job_name: 'otel-collector' + static_configs: + - targets: ['otel-collector:8889'] +``` + +Create the Prometheus data source configuration for Grafana: + +```yaml title="grafana-prometheus.yml" +apiVersion: 1 + +datasources: + - name: prometheus + type: prometheus + access: proxy + url: http://prometheus:9090 + isDefault: true + editable: true +``` + +### Start the observability stack + +Deploy the stack and verify it's running: + +```bash +# Start the stack +docker compose -f observability-stack.yml up -d + +# Verify Jaeger is running +curl http://localhost:16686/api/services + +# Verify Prometheus is running +curl http://localhost:9090/-/healthy + +# Verify the OpenTelemetry Collector is ready +curl -I http://localhost:8889/metrics +``` + 
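+Optionally, you can smoke-test the pipeline end to end before involving
+ToolHive. This sketch posts one hand-written OTLP/JSON span to the collector's
+HTTP receiver; the `pipeline-smoke-test` service name and the trace and span
+IDs are arbitrary values for this test:
+
+```bash
+# Build current timestamps in nanoseconds for a 250 ms span
+start_ns=$(($(date +%s) * 1000000000))
+end_ns=$((start_ns + 250000000))
+
+# Post a single span to the collector's OTLP HTTP receiver
+curl -s -X POST http://localhost:4318/v1/traces \
+  -H 'Content-Type: application/json' \
+  -d "{
+    \"resourceSpans\": [{
+      \"resource\": { \"attributes\": [{
+        \"key\": \"service.name\",
+        \"value\": { \"stringValue\": \"pipeline-smoke-test\" }
+      }]},
+      \"scopeSpans\": [{ \"spans\": [{
+        \"traceId\": \"5b8efff798038103d269b633813fc60c\",
+        \"spanId\": \"eee19b7ec3c1b174\",
+        \"name\": \"smoke-test\",
+        \"kind\": 1,
+        \"startTimeUnixNano\": \"$start_ns\",
+        \"endTimeUnixNano\": \"$end_ns\"
+      }]}]
+    }]
+  }"
+```
+
+A `pipeline-smoke-test` service should appear in the Jaeger UI within a few
+seconds.
+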
+Access the interfaces: + +- **Jaeger UI**: `http://localhost:16686` +- **Prometheus Web UI**: `http://localhost:9090` +- **Grafana**: `http://localhost:3000` (login: admin/admin) + + + + +### Prerequisite + +If you've completed the [Kubernetes quickstart](./quickstart-k8s.mdx), skip to +the next step. + +Otherwise, set up a local kind cluster and install the ToolHive operator: + +```bash +kind create cluster --name toolhive +helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds +helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --create-namespace +``` + +Verify the operator is running: + +```bash +kubectl get pods -n toolhive-system +``` + +### Create the monitoring namespace + +Create a dedicated namespace for your observability stack: + +```bash +kubectl create namespace monitoring +``` + +### Deploy Jaeger + +Install Jaeger using Helm with a configuration suited for ToolHive: + +```bash +helm repo add jaegertracing https://jaegertracing.github.io/helm-charts +helm repo update +helm upgrade -i jaeger-all-in-one jaegertracing/jaeger -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.3.6/examples/otel/jaeger-values.yaml -n monitoring +``` + +### Deploy Prometheus and Grafana + +Install Prometheus and Grafana using the kube-prometheus-stack Helm chart: + +```bash +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts +helm repo update +helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack -f https://raw.githubusercontent.com/stacklok/toolhive/v0.3.6/examples/otel/prometheus-stack-values.yaml -n monitoring +``` + +### Deploy OpenTelemetry Collector + +Create the collector configuration and deployment manifest: + +```bash +helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts +helm repo update +helm upgrade -i otel-collector open-telemetry/opentelemetry-collector -f https://raw.githubusercontent.com/stacklok/toolhive/v0.3.6/examples/otel/otel-values.yaml -n monitoring +``` + +### Verify all components + +Verify all components are running: + +```bash +kubectl get pods -n monitoring +``` + +Wait for all pods to be in Running status before proceeding. The output should +look similar to: + +```text +NAME READY STATUS RESTARTS AGE +jaeger-all-in-one-6bf667c984-p5455 1/1 Running 0 2m12s +kube-prometheus-stack-grafana-69c88f77c5-b9f7m 3/3 Running 0 37s +kube-prometheus-stack-kube-state-metrics-55cb9c8889-cnlkt 1/1 Running 0 37s +kube-prometheus-stack-operator-85655fb7cd-rxms9 1/1 Running 0 37s +kube-prometheus-stack-prometheus-node-exporter-zzcvh 1/1 Running 0 37s +otel-collector-opentelemetry-collector-agent-hqtnq 1/1 Running 0 11s +prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 36s +``` + + + + +## Step 2: Configure MCP server telemetry + +Now configure your MCP servers to send telemetry data to the observability +stack. 
+ + + + +### Set global telemetry configuration + +Configure ToolHive CLI with default telemetry settings to send data to the +OpenTelemetry Collector: + +```bash +# Configure the OpenTelemetry endpoint (collector, not directly to Jaeger) +thv config otel set-endpoint localhost:4318 + +# Enable both metrics and tracing +thv config otel set-metrics-enabled true +thv config otel set-tracing-enabled true + +# Set 100% sampling for development +thv config otel set-sampling-rate 1.0 + +# Use insecure connection for local development +thv config otel set-insecure true +``` + +### Run an MCP server with telemetry + +Start an MCP server with enhanced telemetry configuration: + +```bash +thv run \ + --otel-service-name "mcp-fetch-server" \ + --otel-env-vars "USER,HOST" \ + --otel-enable-prometheus-metrics-path \ + fetch +``` + +Verify the server started and is exporting telemetry: + +```bash +# Check server status +thv list + +# Check Prometheus metrics are available on the MCP server +PORT=$(thv list | grep fetch | awk '{print $5}') +curl http://localhost:$PORT/metrics +``` + + + + +### Create an MCP server with telemetry + +Create an MCPServer resource with comprehensive telemetry configuration: + +```yaml {18-30} title="fetch-with-telemetry.yml" +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServer +metadata: + name: fetch-telemetry + namespace: toolhive-system +spec: + image: ghcr.io/stackloklabs/gofetch/server + transport: streamable-http + port: 8080 + targetPort: 8080 + resources: + limits: + cpu: '100m' + memory: '128Mi' + requests: + cpu: '50m' + memory: '64Mi' + telemetry: + openTelemetry: + enabled: true + endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318 + serviceName: mcp-fetch-server + insecure: true # Using HTTP collector endpoint + metrics: + enabled: true + tracing: + enabled: true + samplingRate: '1.0' + prometheus: + enabled: true +``` + +Deploy the MCP server: + +```bash +kubectl apply -f fetch-with-telemetry.yml +``` + +Verify the MCP server is running and healthy: + +```bash +# Verify the server is running +kubectl get mcpserver -n toolhive-system + +# Check the pods are healthy +kubectl get pods -n toolhive-system -l app.kubernetes.io/instance=fetch-telemetry +``` + + + + +## Step 3: Generate telemetry data + +Create some MCP interactions to generate traces and metrics for analysis. + + + + +### Connect your AI client + +Your MCP server is already configured to work with your AI client from the CLI +quickstart. Simply use your client to make requests that will generate telemetry +data. + + + + +### Port-forward to access the MCP server + +In a separate terminal window, create a port-forward to connect your AI client: + +```bash +kubectl port-forward service/mcp-fetch-telemetry-proxy -n toolhive-system 8080:8080 +``` + +Leave this running for the duration of this tutorial. + +### Configure your AI client + +Use the ToolHive CLI to add the MCP server to your client configuration: + +```bash +thv run http://localhost:8080/mcp --name fetch-k8s --transport streamable-http +``` + + + + +### Generate sample data + +Make several requests using your AI client to create diverse telemetry: + +1. **Basic fetch request**: "Fetch the content from https://toolhive.dev and + summarize it" +2. **Multiple requests**: Make 3-4 more fetch requests with different URLs +3. 
**Error generation**: Try an invalid URL to generate error traces + +Each interaction creates rich telemetry data including: + +- Request traces with timing information sent to Jaeger +- Tool call details with sanitized arguments +- Performance metrics sent to Prometheus + +The CLI and Kubernetes deployments will both generate similar telemetry data, +with the Kubernetes setup including additional Kubernetes-specific attributes. + +## Step 4: Access and analyze telemetry data + +Now examine your telemetry data using Jaeger and Prometheus to understand MCP +server performance. + + + + +### Access Jaeger for traces + +Open Jaeger in your browser at `http://localhost:16686`. + +### Explore traces in Jaeger + +1. In the **Service** dropdown, select `mcp-fetch-server` +2. Click **Find Traces** to see recent traces +3. Click on individual traces to see detailed spans + +Look for traces with protocol and MCP-specific attributes like: + +```json +{ + "serviceName": "mcp-fetch-server", + "http.duration_ms": "307.8", + "http.status_code": 200, + "mcp.method": "tools/call", + "mcp.tool.name": "fetch", + "mcp.tool.arguments": "url=https://toolhive.dev", + "mcp.transport": "streamable-http", + "service.version": "v0.3.6" +} +``` + +### Access Grafana for visualization + +Open `http://localhost:3000` in your browser and log in using the default +credentials (`admin` / `admin`). + +### Import the ToolHive dashboard + +1. Click the **+** icon in the top-right of the Grafana interface and select + **Import dashboard** +2. In the **Import via dashboard JSON** model input box, paste the contents of + [this example dashboard file](https://raw.githubusercontent.com/stacklok/toolhive/main/examples/otel/grafana-dashboards/toolhive-cli-mcp-grafana-dashboard-otel-scrape.json) +3. Click **Load**, then **Import** + +Make some requests to your MCP server again and watch the dashboard update in +real-time. + +You can also explore other metrics in Grafana by creating custom panels and +queries. See the +[Observability guide](../concepts/observability.md#grafana-dashboard-queries) +for examples. + + + + +### Port-forward to Jaeger + +Access Jaeger through a port-forward: + +```bash +kubectl port-forward service/jaeger-all-in-one-query -n monitoring 16686:16686 +``` + +Open `http://localhost:16686` in your browser. + +### Explore traces in Jaeger + +1. In the **Service** dropdown, select `mcp-fetch-server` +2. Click **Find Traces** to see recent traces +3. Click on individual traces to see detailed spans + +Review the available information including MCP and Kubernetes-specific +attributes like: + +```json +{ + "serviceName": "mcp-fetch-server", + "http.duration_ms": "307.8", + "http.status_code": 200, + "mcp.method": "tools/call", + "mcp.tool.name": "fetch", + "mcp.tool.arguments": "url=https://toolhive.dev", + "mcp.transport": "streamable-http", + "k8s.deployment.name": "fetch-telemetry", + "k8s.namespace.name": "toolhive-system", + "k8s.node.name": "toolhive-control-plane", + "k8s.pod.name": "fetch-telemetry-7d7d55687c-glvpz", + "service.namespace": "toolhive-system", + "service.version": "v0.3.6" +} +``` + +### Port-forward to Grafana + +Access Grafana through a port-forward: + +```bash +kubectl port-forward service/kube-prometheus-stack-grafana -n monitoring 3000:80 +``` + +Open `http://localhost:3000` in your browser and log in using the default +credentials (`admin` / `admin`). + +### Import the ToolHive dashboard + +1. 
Click the **+** icon in the top-right of the Grafana interface and select + **Import dashboard** +2. In the **Import via dashboard JSON** model input box, paste the contents of + [this example dashboard file](https://raw.githubusercontent.com/stacklok/toolhive/main/examples/otel/grafana-dashboards/toolhive-mcp-grafana-dashboard-otel-scrape.json) +3. Click **Load**, then **Import** + +Make some requests to your MCP server again and watch the dashboard update in +real-time. + +You can also explore other metrics in Grafana by creating custom panels and +queries. See the +[Observability guide](../concepts/observability.md#grafana-dashboard-queries) +for examples. + + + + +## Step 5: Cleanup + +When you're finished exploring, clean up your resources. + + + + +### Stop MCP servers + +```bash +# Stop and remove the MCP server +thv rm fetch + +# Clear telemetry configuration (optional) +thv config otel unset-endpoint +thv config otel unset-metrics-enabled +thv config otel unset-tracing-enabled +thv config otel unset-sampling-rate +thv config otel unset-insecure +``` + +### Stop observability stack + +```bash +# Stop all containers +docker compose -f observability-stack.yml down + +# Remove all data (optional) +docker compose -f observability-stack.yml down -v + +# Clean up provisioning directories (optional) +rm -rf grafana/ +``` + + + + +### Remove MCP servers + +```bash +# Delete the MCP server +kubectl delete mcpserver fetch-telemetry -n toolhive-system +``` + +### Remove observability stack + +```bash +# Delete observability components +helm uninstall otel-collector -n monitoring +helm uninstall kube-prometheus-stack -n monitoring +helm uninstall jaeger-all-in-one -n monitoring + +# Remove the monitoring namespace +kubectl delete namespace monitoring +``` + +### Optional: Remove the kind cluster + +If you're completely done: + +```bash +kind delete cluster --name toolhive +``` + + + + +## What's next? + +Congratulations! You've successfully set up comprehensive observability for +ToolHive MCP workloads using Jaeger and Prometheus. + +To learn more about ToolHive's telemetry capabilities and best practices, see +the [Observability concepts guide](../concepts/observability.md). 
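+
+As a starting point for your own panels, you can list the MCP-related series
+straight from the Prometheus API. A small sketch, assuming ToolHive's metric
+names carry a `toolhive_mcp` prefix (check the `/metrics` output from step 2
+for the exact names) and that Prometheus is reachable on `localhost:9090`
+(port-forward first on the Kubernetes path):
+
+```bash
+# List every metric name Prometheus knows, then keep the ToolHive MCP series
+curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
+  | tr ',' '\n' | grep toolhive_mcp
+```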
+ +Here are some next steps to explore: + +- **Custom dashboards**: Create Grafana dashboards that query both Jaeger and + Prometheus +- **Alerting**: Set up Prometheus AlertManager for performance and error alerts +- **Performance optimization**: Use telemetry data to optimize MCP server + performance +- **Distributed tracing**: Understand request flows across multiple MCP servers + + + + +### CLI-specific next steps + +- **Review the CLI telemetry guide**: Explore + [detailed configuration options](../guides-cli/telemetry-and-metrics.md) +- **Scale to multiple servers**: Run multiple MCP servers with different + configurations +- **Production CLI setup**: Learn about + [secrets management](../guides-cli/secrets-management.mdx) and + [custom permissions](../guides-cli/custom-permissions.mdx) +- **Alternative backends**: Try other observability platforms mentioned in the + [CLI telemetry guide](../guides-cli/telemetry-and-metrics.md) + + + + +### Kubernetes-specific next steps + +- **Review the Kubernetes telemetry guide**: Explore + [detailed configuration options](../guides-k8s/telemetry-and-metrics.md) +- **Production deployment**: Set up production-grade + [Jaeger](https://www.jaegertracing.io/docs/deployment/) and + [Prometheus](https://prometheus.io/docs/prometheus/latest/installation/) with + persistent storage, or configure an OpenTelemetry Collector to work with your + existing observability tools +- **Advanced MCP configurations**: Explore + [Kubernetes MCP deployment patterns](../guides-k8s/run-mcp-k8s.mdx) +- **Secrets integration**: Learn about + [HashiCorp Vault integration](./vault-integration.mdx) +- **Service mesh observability**: Integrate with Istio or Linkerd for enhanced + tracing + + + + +## Related information + +- [Observability concepts](../concepts/observability.md) - Understanding + ToolHive's telemetry architecture +- [CLI telemetry guide](../guides-cli/telemetry-and-metrics.md) - Detailed CLI + configuration options +- [Kubernetes telemetry guide](../guides-k8s/telemetry-and-metrics.md) - + Kubernetes operator telemetry features +- [OpenTelemetry Collector documentation](https://opentelemetry.io/docs/collector/) - + Official OpenTelemetry Collector documentation +- [Jaeger documentation](https://www.jaegertracing.io/docs/) - Official Jaeger + documentation +- [Prometheus documentation](https://prometheus.io/docs/) - Official Prometheus + documentation + +## Troubleshooting + + + + +
+<details>
+<summary>Docker containers won't start</summary>
+
+Check Docker daemon and container logs:
+
+```bash
+# Verify Docker is running
+docker info
+
+# Check container logs
+docker compose -f observability-stack.yml logs jaeger
+docker compose -f observability-stack.yml logs prometheus
+docker compose -f observability-stack.yml logs otel-collector
+```
+
+Common issues:
+
+- Port conflicts with existing services (see the check below)
+- Insufficient Docker memory allocation
+- Missing configuration files
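+
+For port conflicts, here's a minimal check, assuming `lsof` is available and
+the default host ports from the compose file above:
+
+```bash
+# Each command prints nothing when the port is free
+for port in 16686 9090 3000 4318 8889; do
+  lsof -nP -iTCP:"$port" -sTCP:LISTEN
+done
+```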
+
+</details>
+
+<details>
+<summary>ToolHive CLI not sending telemetry</summary>
+
+Verify telemetry configuration:
+
+```bash
+# Check current config
+thv config otel get-endpoint
+thv config otel get-metrics-enabled
+```
+
+Check the ToolHive CLI logs for telemetry export errors (a quick filter is
+sketched below):
+
+- **macOS**: `~/Library/Application Support/toolhive/logs/fetch.log`
+- **Windows**: `%LOCALAPPDATA%\toolhive\logs\fetch.log`
+- **Linux**: `~/.local/share/toolhive/logs/fetch.log`
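+
+For example, on Linux (swap in the log path for your OS from the list above):
+
+```bash
+# Show recent telemetry-related log lines for the fetch server
+grep -iE 'otel|telemetry|export' ~/.local/share/toolhive/logs/fetch.log \
+  | tail -n 20
+```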
+
+</details>
+
+<details>
+<summary>No traces in Jaeger</summary>
+
+Check the telemetry pipeline:
+
+1. **Verify the OTLP receiver is listening**:
+   `curl -I http://localhost:4318/v1/traces` (any HTTP response means the
+   collector is reachable; its internal metrics port `8888` isn't published by
+   the compose file above)
+2. **Check collector logs**: `docker logs otel-collector`
+3. **Verify Jaeger connectivity**: `curl http://localhost:16686/api/services`
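+
+If spans still don't arrive, a temporary `debug` exporter on the collector
+shows whether traces reach it at all. Here's a minimal sketch of the additions
+to `otel-collector-config.yml`:
+
+```yaml
+exporters:
+  # Prints a summary of received telemetry to the collector's stdout
+  debug:
+    verbosity: normal
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [otlp/jaeger, debug]
+```
+
+Restart the stack, then watch `docker logs -f otel-collector` while making
+requests through your client.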
+
+</details>
+
+
+<details>
+<summary>Pods stuck in pending state</summary>
+
+Check cluster resources and pod events:
+
+```bash
+# Check pod status
+kubectl get pods -n monitoring
+
+# Describe problematic pods
+kubectl describe pod -n monitoring
+
+# Check node resources (requires metrics-server, which kind clusters
+# don't include by default)
+kubectl top nodes
+```
+
+Common issues:
+
+- Insufficient cluster resources
+- Image pull failures
+- Network policies blocking communication
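+
+Recent namespace events usually state the scheduling failure directly:
+
+```bash
+# Most recent events last; look for FailedScheduling or ImagePullBackOff
+kubectl get events -n monitoring --sort-by=.lastTimestamp | tail -n 20
+```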
+
+</details>
+
+<details>
+<summary>MCP server not sending telemetry</summary>
+
+Verify the telemetry configuration and connectivity:
+
+```bash
+# Check MCPServer status
+kubectl describe mcpserver fetch-telemetry -n toolhive-system
+
+# Check OpenTelemetry Collector logs (the Helm chart in step 1 deploys the
+# collector as a daemonset agent - see the pod listing in that step)
+kubectl logs daemonset/otel-collector-opentelemetry-collector-agent -n monitoring
+
+# Verify Jaeger connectivity from inside the cluster; the collector image
+# ships without a shell, so use a throwaway curl pod instead of kubectl exec
+kubectl run curl-test -n monitoring --rm -i --restart=Never \
+  --image=curlimages/curl -- -s http://jaeger-all-in-one-query:16686/api/services
+```
+
+</details>
+
+
+
+<details>
+<summary>No metrics in Prometheus</summary>
+
+Common troubleshooting steps:
+
+1. **Verify Prometheus targets**: Check `http://localhost:9090/targets` to
+   ensure the `otel-collector` target is UP
+2. **Check collector metrics endpoint**: `curl http://localhost:8889/metrics`
+   (CLI) or port-forward and check in K8s
+3. **Review collector configuration**: Ensure the Prometheus exporter is
+   properly configured
+4. **Check Prometheus config**: Verify the scrape configuration includes the
+   collector endpoint
+
+</details>
diff --git a/sidebars.ts b/sidebars.ts index 672d1ed2..b0f2e50d 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -157,7 +157,6 @@ const sidebars: SidebarsConfig = { description: 'Learn how to use ToolHive with these step-by-step tutorials.', }, - collapsed: false, items: [ { type: 'link', @@ -166,6 +165,7 @@ const sidebars: SidebarsConfig = { }, 'toolhive/tutorials/custom-registry', 'toolhive/tutorials/vault-integration', + 'toolhive/tutorials/opentelemetry', ], }, diff --git a/static/img/toolhive/otel-grafana-dark.webp b/static/img/toolhive/otel-grafana-dark.webp new file mode 100644 index 00000000..3207d4a3 Binary files /dev/null and b/static/img/toolhive/otel-grafana-dark.webp differ diff --git a/static/img/toolhive/otel-grafana-light.webp b/static/img/toolhive/otel-grafana-light.webp new file mode 100644 index 00000000..23191257 Binary files /dev/null and b/static/img/toolhive/otel-grafana-light.webp differ