
intai2026/AgentShark

AgentShark 🦈

Wireshark for AI-agent token traffic.

AgentShark analyzes AI-agent traces and tells you where your tokens went, which ones were wasted, and how to reduce cost without hurting answer quality.

python agentshark.py analyze examples/sample_trace_rag_overload.json
AgentShark v0.1
========================================
Total tokens:   6,200
Estimated cost: $0.0310

Breakdown:
  System Prompt             800 (13%)
  Conversation History    1,200 (19%)
  Retrieved Context       2,900 (47%)
  Tool Output               700 (11%)
  Final Answer              600 (10%)

Diagnosis:        RAG_OVERLOAD
Explanation:      Retrieved context exceeds 40% of total tokens.

Recommendations:
  1. Reduce RAG top_k
  2. Add reranking before sending to model
  3. Deduplicate retrieved chunks

Estimated savings: 35-50%
Quality risk:      Low

The Problem

Modern AI agents make multiple model calls, retrieve large knowledge chunks, call external tools, use long system prompts, and pass long conversation histories back into the model.

Existing observability tools tell you what happened and how much it cost.

They rarely tell you why it cost so much or what to change.

Developers often cannot answer basic questions:

  • Which part of the workflow used the most tokens?
  • Were those tokens actually useful?
  • Did retrieved documents help the final answer?
  • Did the agent make unnecessary model calls?
  • How can cost be reduced without reducing quality?

The Solution

AgentShark reads an AI-agent trace and produces a plain-English explanation:

  • Token breakdown — where did each token go?
  • Waste diagnosis — which pattern caused the cost?
  • Recommendations — what should you change?
  • Savings estimate — how much could you save?
  • Quality risk — what is the risk of the optimization?
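The five outputs listed above fit naturally into one report record. The following is a minimal sketch, assuming a hypothetical `Report` dataclass that is not part of AgentShark's published code. Its `render()` method reproduces the column layout of the sample CLI output at the top of this README.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    """Hypothetical container for the five parts of an AgentShark report."""

    breakdown: Dict[str, int]  # display label -> tokens, in report order
    diagnosis: str  # waste category, e.g. "RAG_OVERLOAD"
    recommendations: List[str] = field(default_factory=list)
    savings: str = "unknown"  # e.g. "35-50%"
    quality_risk: str = "unknown"  # e.g. "Low"

    def render(self) -> str:
        """Format the report in the same column layout as the sample CLI output."""
        total = sum(self.breakdown.values())
        lines = [f"Total tokens:   {total:,}", "", "Breakdown:"]
        for label, tokens in self.breakdown.items():
            # Whole-number percentage of all tokens; guard against an empty trace.
            pct = round(100 * tokens / total) if total else 0
            lines.append(f"  {label:<22}{tokens:>7,} ({pct}%)")
        lines += ["", f"Diagnosis:        {self.diagnosis}", "", "Recommendations:"]
        lines += [f"  {n}. {rec}" for n, rec in enumerate(self.recommendations, start=1)]
        lines += ["", f"Estimated savings: {self.savings}",
                  f"Quality risk:      {self.quality_risk}"]
        return "\n".join(lines)


# Values taken from the sample RAG-overload run shown at the top of this README.
report = Report(
    breakdown={"System Prompt": 800, "Conversation History": 1200,
               "Retrieved Context": 2900, "Tool Output": 700, "Final Answer": 600},
    diagnosis="RAG_OVERLOAD",
    recommendations=["Reduce RAG top_k", "Add reranking before sending to model"],
    savings="35-50%",
    quality_risk="Low",
)
print(report.render())
```

Keeping the report as plain data and formatting it in one place would make a later Markdown export (listed for v0.2) a second `render`-style method rather than a rewrite of the analysis.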

AgentShark is not another tracing dashboard. It is a focused analysis layer that sits on top of your existing observability setup.

From token counting to token intelligence.


Wireshark Analogy

| Wireshark                    | AgentShark                         |
|------------------------------|------------------------------------|
| Understands internet traffic | Understands token traffic          |
| Captures packets             | Captures agent steps               |
| Measures packet size         | Measures token usage               |
| Finds network bottlenecks    | Finds token cost bottlenecks       |
| Helps debug network problems | Helps debug expensive AI workflows |

Quickstart

Requirements: Python 3.8+. No external dependencies for v0.1.

# Clone the repo
git clone https://github.com/intai2026/AgentShark.git
cd AgentShark

# Run against the sample trace
python agentshark.py analyze examples/sample_trace_rag_overload.json

Trace Format

AgentShark reads a simple JSON trace file. You can export this from your own agent or use one of the included examples.

{
  "trace_id": "trace_001",
  "session_id": "session_abc123",
  "total_tokens": 6200,
  "estimated_cost": 0.031,
  "steps": [
    { "type": "system_prompt",          "tokens": 800  },
    { "type": "conversation_history",   "tokens": 1200 },
    { "type": "retrieved_context",      "tokens": 2900, "chunk_count": 8 },
    { "type": "tool_output",            "tokens": 700  },
    { "type": "final_answer",           "tokens": 600  }
  ]
}
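Computing the breakdown from this format needs only the standard library. The following is a minimal sketch; the helper names `token_breakdown` and `shares` are hypothetical, not AgentShark's actual functions. It sums tokens by step type, so a trace with several `tool_output` steps (such as an agent loop) still yields one line per type.

```python
import json
from typing import Dict

# The sample trace above, inlined so the sketch runs on its own.
SAMPLE_TRACE = """
{
  "trace_id": "trace_001",
  "session_id": "session_abc123",
  "total_tokens": 6200,
  "estimated_cost": 0.031,
  "steps": [
    { "type": "system_prompt",        "tokens": 800  },
    { "type": "conversation_history", "tokens": 1200 },
    { "type": "retrieved_context",    "tokens": 2900, "chunk_count": 8 },
    { "type": "tool_output",          "tokens": 700  },
    { "type": "final_answer",         "tokens": 600  }
  ]
}
"""


def token_breakdown(trace: dict) -> Dict[str, int]:
    """Sum tokens per step type; dicts keep first-seen order (Python 3.7+)."""
    totals: Dict[str, int] = {}
    for step in trace["steps"]:
        totals[step["type"]] = totals.get(step["type"], 0) + step["tokens"]
    return totals


def shares(breakdown: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all tokens spent on each step type (empty if there are none)."""
    total = sum(breakdown.values())
    return {k: v / total for k, v in breakdown.items()} if total else {}


trace = json.loads(SAMPLE_TRACE)
bd = token_breakdown(trace)
for step_type, frac in shares(bd).items():
    print(f"{step_type:<22}{bd[step_type]:>6}  {frac:.0%}")
```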

Supported step types:

| Type                 | Description                          |
|----------------------|--------------------------------------|
| system_prompt        | Developer or operator system prompt  |
| conversation_history | Prior turns passed back to the model |
| retrieved_context    | RAG chunks sent as context           |
| tool_output          | Output from tool or API calls        |
| final_answer         | Model response to the user           |
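Because only these five step types are recognized, it can help to check an exported trace before analyzing it. The following is a minimal sketch, assuming a hypothetical `validate_trace` helper. Treating a mismatch between `total_tokens` and the sum of the steps as a problem is this sketch's own assumption, not documented AgentShark behavior.

```python
from typing import List

# Step types from the "Supported step types" table.
SUPPORTED_TYPES = {
    "system_prompt",
    "conversation_history",
    "retrieved_context",
    "tool_output",
    "final_answer",
}


def validate_trace(trace: dict) -> List[str]:
    """Return a list of problems; an empty list means the trace looks usable."""
    steps = trace.get("steps")
    if not isinstance(steps, list):
        return ["'steps' must be a list"]

    problems: List[str] = []
    counted = 0
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {i}: must be a JSON object")
            continue
        step_type = step.get("type")
        if not isinstance(step_type, str) or step_type not in SUPPORTED_TYPES:
            problems.append(f"step {i}: unsupported type {step_type!r}")
        tokens = step.get("tokens")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(tokens, bool) or not isinstance(tokens, int) or tokens < 0:
            problems.append(f"step {i}: 'tokens' must be a non-negative integer")
        else:
            counted += tokens

    declared = trace.get("total_tokens")
    if declared is not None and declared != counted:
        problems.append(f"total_tokens is {declared} but steps sum to {counted}")
    return problems


# An unknown step type and a wrong total are both reported.
bad = {
    "total_tokens": 900,
    "steps": [{"type": "llm_call", "tokens": 400}, {"type": "final_answer", "tokens": 600}],
}
print(validate_trace(bad))
```

Collecting every problem instead of raising on the first one lets a user fix an exported trace in a single pass.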

Waste Taxonomy

AgentShark classifies token waste into named categories.

| Category            | Trigger                                    | Description                  |
|---------------------|--------------------------------------------|------------------------------|
| RAG_OVERLOAD        | Retrieved context > 40% of total tokens    | Too many chunks retrieved    |
| HISTORY_BLOAT       | Conversation history > 30% of total tokens | Old turns bloating the input |
| TOOL_OUTPUT_BLOAT   | Tool output > 25% of total tokens          | Verbose tool responses       |
| PROMPT_BLOAT        | System prompt > 20% of total tokens        | Oversized system prompt      |
| MULTI_CALL_OVERHEAD | Many LLM calls for a simple request        | Unnecessary model calls      |
| CACHE_MISS          | Repeated similar questions recomputed      | No caching in place          |
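The four ratio-based rules in this table reduce to a share-of-total check per step type. The following is a minimal sketch, assuming the breakdown is a mapping from step type to tokens as in the trace format. The `classify` name and the choice to return every triggered category, ranked by how far it overshoots its threshold, are assumptions of this sketch; the sample report shows only a single diagnosis. MULTI_CALL_OVERHEAD and CACHE_MISS are left out because the table gives no numeric trigger for them.

```python
from typing import Dict, List, Tuple

# (category, step type, threshold share) taken from the Waste Taxonomy table.
RATIO_RULES: List[Tuple[str, str, float]] = [
    ("RAG_OVERLOAD", "retrieved_context", 0.40),
    ("HISTORY_BLOAT", "conversation_history", 0.30),
    ("TOOL_OUTPUT_BLOAT", "tool_output", 0.25),
    ("PROMPT_BLOAT", "system_prompt", 0.20),
]


def classify(breakdown: Dict[str, int]) -> List[str]:
    """Return every triggered category, largest overshoot first."""
    total = sum(breakdown.values())
    if total == 0:
        return []
    hits: List[Tuple[float, str]] = []
    for category, step_type, limit in RATIO_RULES:
        share = breakdown.get(step_type, 0) / total
        if share > limit:  # strictly greater, matching ">" in the table
            hits.append((share - limit, category))
    return [category for _, category in sorted(hits, reverse=True)]


sample = {"system_prompt": 800, "conversation_history": 1200,
          "retrieved_context": 2900, "tool_output": 700, "final_answer": 600}
print(classify(sample))  # ['RAG_OVERLOAD']
```

Keeping the thresholds in a data table rather than in separate `if` statements means a contributor could add a new ratio-based waste rule with a single line.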

Roadmap

v0.1 — CLI Analyzer ✅

  • JSON trace input
  • Token breakdown
  • Waste classification
  • Savings recommendations
  • Sample traces

v0.2 — Langfuse Connector

  • Import traces directly from Langfuse
  • Batch analysis across sessions
  • Markdown report export

v0.3 — Simple Dashboard

  • Cost by trace, client, and session
  • Top waste reasons
  • Estimated savings over time

v0.4 — More Connectors

  • Arize Phoenix
  • Helicone
  • OpenTelemetry

v0.5 — Quality Measurement

  • Before/after optimization comparison
  • LLM judge scoring
  • Cost vs quality tradeoff report

How It Fits With Existing Tools

AgentShark is not a replacement for Langfuse, Arize Phoenix, Helicone, or LangSmith.

Those tools are excellent for tracing, monitoring, and evaluation.

AgentShark adds a focused layer on top:

Langfuse / Phoenix / Helicone / Custom Logs
        ↓
AgentShark
        ↓
Token breakdown + Waste diagnosis + Recommendations

Think of it as the analysis engine that explains what your existing traces are telling you.
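For the "Custom Logs" path, an adapter's job is to reshape whatever your agent already records into the JSON trace format described above. The following is a minimal sketch, assuming hypothetical log events shaped like `{"kind": "retrieval", "tokens": 2900}` and an assumed blended price of $5 per million tokens (the rate implied by the sample trace: $0.031 for 6,200 tokens). The `KIND_TO_STEP` mapping and `to_agentshark_trace` function are illustrative only; the planned Langfuse, Phoenix, and Helicone connectors would instead need to read each tool's own export format.

```python
import json
from typing import Iterable, List

# Hypothetical mapping from your own log event names to AgentShark step types.
KIND_TO_STEP = {
    "system": "system_prompt",
    "history": "conversation_history",
    "retrieval": "retrieved_context",
    "tool": "tool_output",
    "completion": "final_answer",
}

# Assumed blended price; the sample trace works out to $5 per million tokens.
USD_PER_MILLION_TOKENS = 5.0


def to_agentshark_trace(trace_id: str, session_id: str, events: Iterable[dict]) -> dict:
    """Convert hypothetical {'kind', 'tokens'} log events into the JSON trace format."""
    steps: List[dict] = []
    for event in events:
        step_type = KIND_TO_STEP.get(event["kind"])
        if step_type is None:
            continue  # skip events this sketch does not know how to classify
        step = {"type": step_type, "tokens": int(event["tokens"])}
        if "chunk_count" in event:
            step["chunk_count"] = int(event["chunk_count"])
        steps.append(step)
    total = sum(step["tokens"] for step in steps)
    return {
        "trace_id": trace_id,
        "session_id": session_id,
        "total_tokens": total,
        "estimated_cost": round(total * USD_PER_MILLION_TOKENS / 1_000_000, 6),
        "steps": steps,
    }


events = [
    {"kind": "system", "tokens": 800},
    {"kind": "history", "tokens": 1200},
    {"kind": "retrieval", "tokens": 2900, "chunk_count": 8},
    {"kind": "tool", "tokens": 700},
    {"kind": "completion", "tokens": 600},
    {"kind": "heartbeat", "tokens": 0},  # unmapped, so it is dropped
]
print(json.dumps(to_agentshark_trace("trace_001", "session_abc123", events), indent=2))
```

The printed JSON can be saved to a file and passed to `python agentshark.py analyze`. Silently dropping unmapped events keeps the sketch short, but it also hides their tokens from the total, so a real adapter would probably want to report them.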


Example Traces

Three sample traces are included:

| File                                    | Scenario                          |
|-----------------------------------------|-----------------------------------|
| examples/sample_trace_simple.json       | Clean FAQ response, minimal waste |
| examples/sample_trace_rag_overload.json | RAG overload, 8 chunks retrieved  |
| examples/sample_trace_tool_loop.json    | Agent loop, repeated tool calls   |

Contributing

AgentShark is early and welcomes contributions.

Good first issues:

  • Add a new connector (Langfuse, Phoenix, Helicone)
  • Add a new waste rule
  • Improve the report format
  • Add a new sample trace
  • Write a test

Please open an issue before starting a large PR.


Origin

AgentShark was built because the author needed it.

Running AI chatbots in production surfaces a problem quickly: you know what your LLM bill is, but you do not know why. Existing tools show the total. AgentShark shows the breakdown, names the waste, and tells you what to fix.

Built by someone working across finance and AI who wanted clearer visibility into AI token costs.

License

MIT


Status

AgentShark v0.1 is a working CLI analyzer. It is early, focused, and intentionally small.

The goal is to be useful immediately — not to be complete eventually.


AgentShark. See where your tokens go.
