AgentShark 🦈

Wireshark for AI-agent token traffic.

AgentShark analyzes AI-agent traces and tells you where your tokens went, which ones were wasted, and how to reduce cost without hurting answer quality.

python agentshark.py analyze examples/sample_trace_rag_overload.json

AgentShark v0.1
========================================
Total tokens:   6,200
Estimated cost: $0.0310

Breakdown:
  System Prompt             800 (13%)
  Conversation History    1,200 (19%)
  Retrieved Context       2,900 (47%)
  Tool Output               700 (11%)
  Final Answer              600 (10%)

Diagnosis:        RAG_OVERLOAD
Explanation:      Retrieved context exceeds 40% of total tokens.

Recommendations:
  1. Reduce RAG top_k
  2. Add reranking before sending to model
  3. Deduplicate retrieved chunks

Estimated savings: 35-50%
Quality risk:      Low

The Problem

Modern AI agents make multiple model calls, retrieve large knowledge chunks, call external tools, use long system prompts, and pass long conversation histories back into the model.

Existing observability tools tell you what happened and how much it cost.

They rarely tell you why it cost so much or what to change.

Developers often don't know:

Which part of the workflow used the most tokens?
Were those tokens actually useful?
Did retrieved documents help the final answer?
Did the agent make unnecessary model calls?
How can cost be reduced without reducing quality?

The Solution

AgentShark reads an AI-agent trace and produces a plain-English explanation:

Token breakdown — where did each token go?
Waste diagnosis — which pattern caused the cost?
Recommendations — what should you change?
Savings estimate — how much could you save?
Quality risk — what is the risk of the optimization?

AgentShark is not another tracing dashboard. It is a focused analysis layer that sits on top of your existing observability setup.

From token counting to token intelligence.

Wireshark Analogy

Wireshark	AgentShark
Understands internet traffic	Understands token traffic
Captures packets	Captures agent steps
Measures packet size	Measures token usage
Finds network bottlenecks	Finds token cost bottlenecks
Helps debug network problems	Helps debug expensive AI workflows

Quickstart

Requirements: Python 3.8+. No external dependencies for v0.1.

# Clone the repo
git clone https://github.com/yourusername/agentshark.git
cd agentshark

# Run against the sample trace
python agentshark.py analyze examples/sample_trace_rag_overload.json

Trace Format

AgentShark reads a simple JSON trace file. You can export this from your own agent or use one of the included examples.

{
  "trace_id": "trace_001",
  "session_id": "session_abc123",
  "total_tokens": 6200,
  "estimated_cost": 0.031,
  "steps": [
    { "type": "system_prompt",          "tokens": 800  },
    { "type": "conversation_history",   "tokens": 1200 },
    { "type": "retrieved_context",      "tokens": 2900, "chunk_count": 8 },
    { "type": "tool_output",            "tokens": 700  },
    { "type": "final_answer",           "tokens": 600  }
  ]
}
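
To make the format concrete, here is a minimal sketch of how a per-step breakdown could be derived from such a trace. It is illustrative only and is not the code in agentshark.py; the helper name `token_breakdown` is hypothetical, and the sketch assumes shares are computed against the sum of the step token counts.

```python
import json

# The example trace from above, inlined so the sketch runs on its own.
TRACE_JSON = """
{
  "trace_id": "trace_001",
  "session_id": "session_abc123",
  "total_tokens": 6200,
  "estimated_cost": 0.031,
  "steps": [
    {"type": "system_prompt", "tokens": 800},
    {"type": "conversation_history", "tokens": 1200},
    {"type": "retrieved_context", "tokens": 2900, "chunk_count": 8},
    {"type": "tool_output", "tokens": 700},
    {"type": "final_answer", "tokens": 600}
  ]
}
"""


def token_breakdown(trace):
    """Return {step_type: (tokens, share_of_total)} for a trace dict.

    Hypothetical helper: sums tokens per step type, then divides each
    sum by the total across all steps.
    """
    totals = {}
    for step in trace["steps"]:
        totals[step["type"]] = totals.get(step["type"], 0) + step["tokens"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {t: (n, n / grand_total) for t, n in totals.items()}


if __name__ == "__main__":
    trace = json.loads(TRACE_JSON)
    for step_type, (tokens, share) in token_breakdown(trace).items():
        # Left-align the type, right-align the count, show the share as a percent.
        print(f"{step_type:<22}{tokens:>7,} ({share:.0%})")
```

For this trace the shares round to the same percentages shown in the sample report at the top of this README.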

Supported step types:

Type	Description
`system_prompt`	Developer or operator system prompt
`conversation_history`	Prior turns passed back to the model
`retrieved_context`	RAG chunks sent as context
`tool_output`	Output from tool or API calls
`final_answer`	Model response to the user
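
A trace exported from your own agent can be sanity-checked before analysis. The sketch below is an assumption about what is worth checking (known step types, non-negative integer token counts, and `total_tokens` matching the sum of the steps). `validate_trace` is a hypothetical helper, and AgentShark itself may be stricter or more lenient.

```python
# Step types listed in the table above.
SUPPORTED_STEP_TYPES = {
    "system_prompt",
    "conversation_history",
    "retrieved_context",
    "tool_output",
    "final_answer",
}


def validate_trace(trace):
    """Return a list of human-readable problems; an empty list means no issues found."""
    steps = trace.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["trace has no 'steps' list"]

    problems = []
    step_sum = 0
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {i}: expected an object")
            continue
        step_type = step.get("type")
        tokens = step.get("tokens")
        if step_type not in SUPPORTED_STEP_TYPES:
            problems.append(f"step {i}: unknown type {step_type!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(tokens, int) or isinstance(tokens, bool) or tokens < 0:
            problems.append(f"step {i}: 'tokens' must be a non-negative integer")
        else:
            step_sum += tokens

    declared = trace.get("total_tokens")
    if declared is not None and declared != step_sum:
        problems.append(f"total_tokens is {declared}, but steps sum to {step_sum}")
    return problems
```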

Waste Taxonomy

AgentShark classifies token waste into named categories.

Category	Trigger	Description
`RAG_OVERLOAD`	Retrieved context > 40% of total tokens	Too many chunks retrieved
`HISTORY_BLOAT`	Conversation history > 30% of total tokens	Old turns bloating input
`TOOL_OUTPUT_BLOAT`	Tool output > 25% of total tokens	Verbose tool responses
`PROMPT_BLOAT`	System prompt > 20% of total tokens	Oversized system prompt
`MULTI_CALL_OVERHEAD`	Many LLM steps for a simple request	Unnecessary model calls
`CACHE_MISS`	Repeated similar questions recomputed	No caching in place
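
The four percentage-based triggers can be read as a small threshold table. The sketch below is one way to express them and is not the contents of rules.py. `classify_waste` is a hypothetical name, and reporting every matching category in table order is an assumption. `MULTI_CALL_OVERHEAD` and `CACHE_MISS` are left out because the table gives no numeric trigger for them.

```python
# (category, step type, threshold as a fraction of total tokens), from the
# table above. The comparison is strictly greater-than, matching the ">" there.
SHARE_RULES = [
    ("RAG_OVERLOAD", "retrieved_context", 0.40),
    ("HISTORY_BLOAT", "conversation_history", 0.30),
    ("TOOL_OUTPUT_BLOAT", "tool_output", 0.25),
    ("PROMPT_BLOAT", "system_prompt", 0.20),
]


def classify_waste(steps):
    """Return every category whose step type exceeds its share threshold.

    Assumption: all matching categories are reported, in table order.
    """
    total = sum(step["tokens"] for step in steps)
    if total == 0:
        return []
    by_type = {}
    for step in steps:
        by_type[step["type"]] = by_type.get(step["type"], 0) + step["tokens"]
    return [
        category
        for category, step_type, threshold in SHARE_RULES
        if by_type.get(step_type, 0) / total > threshold
    ]


# Steps from the sample trace in the Trace Format section.
sample_steps = [
    {"type": "system_prompt", "tokens": 800},
    {"type": "conversation_history", "tokens": 1200},
    {"type": "retrieved_context", "tokens": 2900},
    {"type": "tool_output", "tokens": 700},
    {"type": "final_answer", "tokens": 600},
]
print(classify_waste(sample_steps))  # ['RAG_OVERLOAD']
```

Retrieved context is 2,900 of 6,200 tokens (about 47%), which is above the 40% threshold, while the other three shares stay below theirs.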

Roadmap

v0.1 — CLI Analyzer ✅

JSON trace input
Token breakdown
Waste classification
Savings recommendations
Sample traces

v0.2 — Langfuse Connector

Import traces directly from Langfuse
Batch analysis across sessions
Markdown report export

v0.3 — Simple Dashboard

Cost by trace, client, and session
Top waste reasons
Estimated savings over time

v0.4 — More Connectors

Arize Phoenix
Helicone
OpenTelemetry

v0.5 — Quality Measurement

Before/after optimization comparison
LLM judge scoring
Cost vs quality tradeoff report

How It Fits With Existing Tools

AgentShark is not a replacement for Langfuse, Arize Phoenix, Helicone, or LangSmith.

Those tools are excellent for tracing, monitoring, and evaluation.

AgentShark adds a focused layer on top:

Langfuse / Phoenix / Helicone / Custom Logs
        ↓
AgentShark
        ↓
Token breakdown + Waste diagnosis + Recommendations

Think of it as the analysis engine that explains what your existing traces are telling you.
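
When the upstream source is custom logs, some glue code has to produce the trace JSON described under Trace Format. The following sketch assumes a made-up log shape (records with `kind` and `token_count` keys) and a made-up mapping from log kinds to step types. Only the output shape comes from this README, and `logs_to_trace` is a hypothetical name.

```python
import json

# Hypothetical mapping from a custom log's "kind" field to AgentShark step types.
KIND_TO_STEP_TYPE = {
    "system": "system_prompt",
    "history": "conversation_history",
    "retrieval": "retrieved_context",
    "tool": "tool_output",
    "answer": "final_answer",
}


def logs_to_trace(trace_id, records):
    """Convert custom log records into a trace dict in the Trace Format shape.

    Records whose kind has no matching step type are skipped.
    """
    steps = []
    for record in records:
        step_type = KIND_TO_STEP_TYPE.get(record.get("kind"))
        if step_type is None:
            continue
        steps.append({"type": step_type, "tokens": int(record["token_count"])})
    return {
        "trace_id": trace_id,
        "total_tokens": sum(step["tokens"] for step in steps),
        "steps": steps,
    }


if __name__ == "__main__":
    logs = [
        {"kind": "system", "token_count": 800},
        {"kind": "retrieval", "token_count": 2900},
        {"kind": "debug", "token_count": 0},  # skipped: no matching step type
        {"kind": "answer", "token_count": 600},
    ]
    print(json.dumps(logs_to_trace("trace_custom_001", logs), indent=2))
```

Saving that output to a file would give something to pass to `python agentshark.py analyze`, assuming the analyzer does not require the `session_id` and `estimated_cost` fields, which this sketch omits.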

Example Traces

Three sample traces are included:

File	Scenario
`examples/sample_trace_simple.json`	Clean FAQ response, minimal waste
`examples/sample_trace_rag_overload.json`	RAG overload, 8 chunks retrieved
`examples/sample_trace_tool_loop.json`	Agent loop, repeated tool calls

Contributing

AgentShark is early and welcomes contributions.

Good first issues:

Add a new connector (Langfuse, Phoenix, Helicone)
Add a new waste rule
Improve the report format
Add a new sample trace
Write a test

Please open an issue before starting a large PR.

Origin

AgentShark was built because the author needed it.

Running AI chatbots in production surfaces a problem quickly: you know what your LLM bill is, but not why it is that high. Existing tools show the total. AgentShark shows the breakdown, names the waste, and tells you what to fix.

Built by a finance and AI builder who wanted clearer visibility into AI token costs.

License

MIT

Status

AgentShark v0.1 is a working CLI analyzer. It is early, focused, and intentionally small.

The goal is to be useful immediately — not to be complete eventually.

AgentShark. See where your tokens go.
