nicolevanderhoeven/asimov

Asimov's Zeroth Law of Robotics: Observability for AI

Author: Nicole van der Hoeven (Mastodon)

This is a repository for the slides and code for the talk "Asimov's Zeroth Law of Robotics: Observability for AI" presented at:

This repository consists of:

  • A D&D-based AI game. Its main logic is in two_player_dnd.py, and play.py is the Flask wrapper for it.
  • The OTel configuration, including authentication, in otel-config.yml
  • The Docker compose configuration to run the OTel Collector, in docker-compose.yml.
  • A k6 test to run against the AI app, in test.js.
  • A custom logging framework, in loggingfw.py.
  • A CLI wrapper for the game, in cli_play.py.
  • (new) A k6 test that uses AI to test the AI app, in test-ai.js.

A diagram of the architecture of the AI app, showing the D&D game, the OpenTelemetry Collector, and Grafana Cloud

Setup

Telemetry is now Sigil-only. The Sigil SDK handles both normalized generation export and gen_ai.* OTel metrics/traces, so OpenLIT is no longer needed. Two small modules wire this up:

  • sigil_setup.py — singleton Sigil client + LangChain callback helper (generations).
  • otel_setup.py — bootstraps the global OTel TracerProvider + MeterProvider with OTLP/HTTP exporters. Sigil's histograms (gen_ai.client.operation.duration, gen_ai.client.token.usage, gen_ai.client.time_to_first_token, gen_ai.client.tool_calls_per_operation) and spans flow through these.

Steps

  1. Create a free Grafana Cloud account (or use an existing stack) and enable the Sigil app on that stack.
  2. Install dependencies: pip install -r requirements.txt.
  3. Copy env.example to .env: cp env.example .env.
  4. Fill in .env:
    • ANTHROPIC_API_KEY — your Anthropic API key.
    • OPENAI_API_KEY — your OpenAI API key (if using OpenAI models).
    • OTel (metrics + traces):
      • OTLP_ENDPOINT — your stack's OTLP gateway, e.g. https://otlp-gateway-prod-us-central-0.grafana.net/otlp.
      • OTLP_HEADERS — base64-encoded "<instance_id>:<otlp_write_token>". The app prefixes Basic automatically.
    • Sigil (generations):
      • GRAFANA_CLOUD_SIGIL_ENDPOINT — e.g. https://sigil-prod-us-central-0.grafana.net/api/v1/generations:export.
      • GRAFANA_CLOUD_INSTANCE_ID — your Grafana Cloud instance ID (or set GRAFANA_CLOUD_INSTANCE as an alias).
      • GRAFANA_CLOUD_API_KEY — a Grafana Cloud API key with Sigil-write scope.
    • Optional:
      • ASIMOV_AGENT_VERSION — explicit agent version. Defaults to git-<short-sha>, falling back to 1.0.0.
      • SSL_CERT_FILE — path to your CA bundle if your Python install lacks trust roots (common on python.org macOS builds). certifi/cacert.pem works.
  5. In the Sigil app, link grafanacloud-<stack>-prom as the Prometheus datasource and your stack's Tempo as the traces datasource. Without this, conversations will appear but rollup panels stay empty.
  6. If you want to run load tests, install k6 by following its official installation instructions.
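
Two of the values in step 4 are easy to get wrong, so here is a minimal sketch of how they could be produced. It only illustrates the rules stated above: OTLP_HEADERS holds the base64 of "<instance_id>:<otlp_write_token>" without the Basic prefix, and the agent version falls back from ASIMOV_AGENT_VERSION to git-<short-sha> to 1.0.0. The function names are hypothetical and are not taken from otel_setup.py or sigil_setup.py, and the header shape is an assumption about what the app builds.

```python
import base64
import os
import subprocess


def encode_otlp_credentials(instance_id: str, write_token: str) -> str:
    """Return the value to store in OTLP_HEADERS (no 'Basic ' prefix)."""
    raw = f"{instance_id}:{write_token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def basic_auth_header(encoded: str) -> dict:
    """Assumed shape of the header the app derives from OTLP_HEADERS."""
    return {"Authorization": f"Basic {encoded}"}


def resolve_agent_version(env=os.environ) -> str:
    """Explicit ASIMOV_AGENT_VERSION, else git-<short-sha>, else 1.0.0."""
    explicit = env.get("ASIMOV_AGENT_VERSION")
    if explicit:
        return explicit
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        sha = result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or this is not a git checkout.
        sha = ""
    return f"git-{sha}" if sha else "1.0.0"
```

For example, print(encode_otlp_credentials("<instance_id>", "<otlp_write_token>")) with your real values gives the string to paste into .env as OTLP_HEADERS.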

What you get in the Sigil app

  • Conversations — every LLM call, grouped by conversation_id, with full inputs/outputs, tagged by sigil.component (game_setup, dialogue, classifier, gm_qa, storyteller_single, storyteller_scenario).
  • Rollup metrics — requests, error rate, p50/p95 latency, token consumption, tool calls per operation (from Sigil's gen_ai.client.* histograms).
  • Traces — one span per LLM call, with gen_ai.* semantic-convention attributes.
  • DAG placeholder — the scenario runner emits sigil.run.id on classifier generations and sigil.run.parent_ids on downstream gm_qa / storyteller_scenario generations, so multi-agent links can be rendered once Sigil exposes native DAG support.
  • Request params: gen_ai.request.temperature and gen_ai.request.max_tokens appear on each generation (read from LangChain's invocation params by sigil-sdk-langchain).
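
To make the DAG placeholder concrete, here is a rough sketch of how a run ID on a classifier generation can be referenced by downstream generations. The attribute keys (sigil.component, sigil.run.id, sigil.run.parent_ids) come from the list above. The helper functions and the idea of carrying the attributes in plain dictionaries are assumptions for illustration, not the scenario runner's actual code.

```python
import uuid


def classifier_metadata() -> dict:
    """Tags for a classifier generation, including a fresh run ID."""
    return {
        "sigil.component": "classifier",
        "sigil.run.id": str(uuid.uuid4()),
    }


def downstream_metadata(component: str, parents: list) -> dict:
    """Tags for a downstream generation that point back at its parents."""
    return {
        "sigil.component": component,
        "sigil.run.parent_ids": [p["sigil.run.id"] for p in parents],
    }


# One classifier call feeding two downstream calls.
classifier = classifier_metadata()
gm_qa = downstream_metadata("gm_qa", [classifier])
storyteller = downstream_metadata("storyteller_scenario", [classifier])
```

Both downstream dictionaries carry the classifier's run ID in sigil.run.parent_ids, which is the link a DAG view would need in order to draw the edges.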

Time to first token (TTFT)

TTFT only populates for streaming calls. This app uses .invoke() (non-streaming), so TTFT panels stay empty. Switch specific call sites to .stream() if you want TTFT coverage.
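
The sketch below shows why TTFT needs streaming: the first-token timestamp only exists when output arrives in chunks. It uses a fake token generator as a stand-in for a streaming model call, so fake_token_stream and measure_stream are hypothetical names and nothing here calls LangChain or Sigil.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_token_stream(tokens: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Stand-in for a streaming model call: yields one token at a time."""
    for token in tokens:
        time.sleep(delay_s)
        yield token


def measure_stream(chunks: Iterable[str]) -> Tuple[str, Optional[float], float]:
    """Consume a stream and return (text, time_to_first_token, total_time)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            # First chunk received: this is the TTFT measurement point.
            ttft = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return "".join(parts), ttft, total


text, ttft, total = measure_stream(fake_token_stream(["The ", "ship ", "is ", "empty."]))
```

With a non-streaming call there is only one result, delivered at the end, so there is no separate first-token moment to record and only the total duration can be measured.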

Usage

To replicate the setup I demonstrate in the talk:

  1. Start the Docker daemon and deploy the OTel Collector by running: docker compose up -d.
  2. Start the D&D app with python play.py, or run the CLI version of the game with python cli_play.py.
  3. Interact with the game.

If you're using the Flask app:

  • Start the game by sending a GET request from the command line: curl -X GET http://localhost:5050/.
  • Respond to the game by sending a POST request with your input, like this:

 curl -X POST http://localhost:5050/play \
     -H "Content-Type: application/json" \
     -d '{"message": "I scan the ship for life signs."}'

If you're using the CLI version, type your input directly into the terminal after the welcome message. Type exit or quit to end the game.

  4. Monitor your app using the GenAI Observability dashboard as well as the Drilldown Logs/Metrics/Traces features in Grafana.
  5. Run the k6 test using k6 run test.js.
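
If you would rather drive the Flask app from Python than from curl, the two requests in step 3 can be sketched with the standard library alone. The URL, the /play route, and the {"message": ...} body come from the curl commands above. The function names are hypothetical, and the response is returned as raw text because its format isn't described here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5050"


def build_start_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET / starts a new game."""
    return urllib.request.Request(f"{base_url}/", method="GET")


def build_play_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /play sends one player message as JSON."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/play",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With play.py running, send(build_start_request()) followed by send(build_play_request("I scan the ship for life signs.")) mirrors the two curl commands. The generous timeout is a guess to allow for slow LLM responses.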

Resources

Slides

You can find the slides here.

References

Asimov, I. (1942). Runaround. In I, Robot (pp. 1-42). Gnome Press.

Pictures in presentation:
