# LLM vs Greedy playground

Run end-to-end games between the LLM agent (via OpenRouter) and a baseline Greedy agent.

Requirements:
- Set `OPENROUTER_API_KEY` in your environment.
- Network access enabled.

You can adjust seeds, budgets, and model names as needed.

In [2]:
import os
from azul_engine import GameEngine, LLMAgent, GreedyFillAgent, play_game

# Fail fast if the API key is missing, before any agent is configured.
assert os.getenv("OPENROUTER_API_KEY"), "Set OPENROUTER_API_KEY before running"

# Configure agents: two LLM agents routed through OpenRouter, plus the Greedy baseline.
llm_agent = LLMAgent(model="openai/gpt-oss-120b", provider_priority=("fireworks", "together", "novita/fp4"))
llm_agent2 = LLMAgent(model="openai/gpt-oss-20b", provider_priority=("fireworks", "together", "novita/fp4"))
greedy_agent = GreedyFillAgent()

In [3]:
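def play_llm_vs_greedy(seed: int = 0):
    engine = GameEngine(seed=seed)
    state = engine.reset()
    # NOTE: despite the function name, seat 1 is a second LLM agent here;
    # use [llm_agent, greedy_agent] for an actual LLM-vs-Greedy game.
    agents = [llm_agent, llm_agent2]
    turn = 0
    while not state.is_terminal():
        current = state.current_player  # seat index of the player to move
        agent = agents[current]
        action = agent.select_action(state)
        state = engine.step(action)
        # Debug trace: resulting state, running scores, then who moved and how.
        print(state)
        print([p.score for p in state.players])
        print(current, agent)
        print(action)
        turn += 1
    return state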

state = play_llm_vs_greedy(seed=0)
scores = [p.score for p in state.players]
scores

GameState(players=[PlayerBoard(pattern_lines=[[], [<TileColor.RED: 'red'>, <TileColor.RED: 'red'>], [], [], []], wall=[[False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False]], floor_line=[], has_first_player_token=False, score=0), PlayerBoard(pattern_lines=[[], [], [], [], []], wall=[[False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False], [False, False, False, False, False]], floor_line=[], has_first_player_token=False, score=0)], current_player=1, phase=<GamePhase.DRAFTING: 'drafting'>, supply=Supply(bag=[<TileColor.YELLOW: 'yellow'>, <TileColor.BLUE: 'blue'>, <TileColor.BLUE: 'blue'>, <TileColor.BLUE: 'blue'>, <TileColor.RED: 'red'>, <TileColor.BLUE: 'blue'>, <TileColor.BLUE: 'blue'>, <TileColor.WHITE: 'white'>, <TileColor.WHITE: 'white'>, <TileColor.RED: 'red'>, <Ti

[22, 9]

In [7]:
# Inspect the last LLM reasoning/raw output after a game
llm_agent.last_reasoning, llm_agent.last_raw


(None,
 '{\n  "action_id": 0,\n  "rationale": "Taking the single red tile adds only one more penalty (-2) to the floor, minimizing point loss and removes the only red from the center, forcing the opponent to take the two whites."\n}')
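The raw reply above is a JSON object with an `action_id` and a `rationale`. A minimal sketch of extracting those fields with the standard library follows, assuming replies have that shape. `find_key` and `extract_action` are hypothetical helpers, not part of `azul_engine`, and the sample strings are shortened stand-ins. The lookup also descends into nested objects, because a reply can wrap the fields in an extra outer key.

```python
import json
from typing import Any, Optional, Tuple


def find_key(obj: Any, key: str) -> Optional[Any]:
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def extract_action(raw: Optional[str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (action_id, rationale), or (None, None) if the reply is unusable."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):  # None input or malformed JSON
        return None, None
    action_id = find_key(payload, "action_id")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(action_id, bool) or not isinstance(action_id, int):
        return None, None
    rationale = find_key(payload, "rationale")
    return action_id, rationale if isinstance(rationale, str) else None


# Well-formed reply.
print(extract_action('{"action_id": 0, "rationale": "take red"}'))
# → (0, 'take red')

# Fields wrapped in an unexpected outer key are still recovered.
print(extract_action('{"final{": {"action_id": 2, "rationale": "fill line 1"}}'))
# → (2, 'fill line 1')

# No action_id anywhere: report failure instead of guessing.
print(extract_action('{"final{": 1, "rationale": "complete row 0"}'))
# → (None, None)
```

Returning `(None, None)` rather than guessing keeps the decision to retry or fall back with the caller.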

In [4]:
llm_agent.last_error
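
`last_error` appears to expose only the most recent error of a single agent. To see how often each model fell back to a random action over a longer run, one option is to aggregate the warning lines from captured output. The sketch below assumes warnings of the form `LLMAgent fallback to random action after N attempts (model=... status=... ...)`; `count_fallbacks` is a hypothetical helper, and lines truncated before the `model=` field are skipped.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of a fallback warning and captures the model and HTTP status.
FALLBACK_RE = re.compile(
    r"LLMAgent fallback to random action after \d+ attempts? "
    r"\(model=(?P<model>\S+) status=(?P<status>\S+)"
)


def count_fallbacks(lines: Iterable[str]) -> Counter:
    """Count fallback warnings per (model, status) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = FALLBACK_RE.search(line)
        if match:
            counts[(match.group("model"), match.group("status"))] += 1
    return counts


sample_log = [
    "LLMAgent fallback to random action after 2 attempts "
    "(model=openai/gpt-oss-120b status=None error=The read operation timed out "
    "reason=missing_action_id raw=None)",
    "LLMAgent fallback to random action after 2 attempts "
    "(model=openai/gpt-5-mini status=403 error=... reason=None raw=None)",
    "LLMAgent fallback to random action after 2 attempts "
    "(model=openai/gpt-5-mini status=403 error=... reason=None raw=None)",
    "LLMAgent fallback to random action after 2 attempt",  # truncated, ignored
]

for (model, status), n in sorted(count_fallbacks(sample_log).items()):
    print(model, status, n)
# → openai/gpt-5-mini 403 2
# → openai/gpt-oss-120b None 1
```

Grouping by status separates transient failures (timeouts, malformed replies) from account-level ones such as a 403 key limit, where every later move in the run becomes random.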

In [1]:
import os

from evals.arena import run_arena

run_arena(
    ["openai/gpt-oss-120b", "google/gemini-3-flash-preview", "x-ai/grok-4.1-fast", "openai/gpt-5-mini"],
    games_per_pair=16,
    parallel=24,
    out_dir="runs_final_new",
    providers=[["fireworks", "together"], None, None, None],
    progress=True,
    swap_sides=True,
)


openai/gpt-5-mini vs openai/gpt-oss-120b | 33-27
openai/gpt-oss-120b vs openai/gpt-5-mini | 44-45


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=None error=The read operation timed out reason=missing_action_id raw=None)


openai/gpt-5-mini vs openai/gpt-oss-120b | 32-42


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=200 error=None reason=missing_action_id raw={"final{"             :      1, "rationale": "Completes row 0 (white tile) for +2 row bonus, scores highest immediate points (white tile 3, yellow tile 4) despite -2 floor penalty, net gain > other op...)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=200 error=None reason=missing_action_id raw={"final_schema{"            : 0, "rationale": "Placing the single red onto line 5 incurs no penalty and begins building the long line for row 5, whereas taking to the floor would lose points."})


openai/gpt-oss-120b vs google/gemini-3-flash-preview | 46-61
google/gemini-3-flash-preview vs openai/gpt-oss-120b | 63-24
openai/gpt-5-mini vs openai/gpt-oss-120b | 76-34
openai/gpt-oss-120b vs openai/gpt-5-mini | 62-62
google/gemini-3-flash-preview vs openai/gpt-5-mini | 54-49
openai/gpt-oss-120b vs google/gemini-3-flash-preview | 20-60
google/gemini-3-flash-preview vs openai/gpt-oss-120b | 61-27
openai/gpt-5-mini vs x-ai/grok-4.1-fast | 20-43
google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 63-39
x-ai/grok-4.1-fast vs openai/gpt-5-mini | 34-31
x-ai/grok-4.1-fast vs openai/gpt-5-mini | 34-55
openai/gpt-5-mini vs x-ai/grok-4.1-fast | 52-66
openai/gpt-5-mini vs x-ai/grok-4.1-fast | 49-57
x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 50-49
google/gemini-3-flash-preview vs openai/gpt-oss-120b | 37-56
openai/gpt-5-mini vs x-ai/grok-4.1-fast | 62-14
x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 22-37
openai/gpt-oss-120b vs google/gemini-3-flash-preview | 44-70
x-ai/grok-4.1-fast vs openai/

LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=200 error=None reason=missing_action_id raw={"final{"  	 	   		:   "action_id"    })
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=None error=IncompleteRead(429 bytes read) reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=200 error=None reason=missing_action_id raw={"final{"   : 2, "rationale": "Adds a white tile to line 3 (row 3), a legal color that advances a pattern line without incurring floor penalty, setting up future scoring."})
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=200 error=None reason=missing_action_id raw={"final{"      : {"action_id": 2, "rationale": "Place black from center onto line 1 (fills it, scores 2 points) without overflow and removes a useful tile from the center."}  })
LLMAgent fallback to random action after 2 attempts (model=openai/gp

openai/gpt-5-mini vs google/gemini-3-flash-preview | 41-55
openai/gpt-5-mini vs google/gemini-3-flash-preview | 74-61
google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 32-43
x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 17-54
openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 50-42
openai/gpt-5-mini vs google/gemini-3-flash-preview | 42-45
google/gemini-3-flash-preview vs openai/gpt-oss-120b | 41-33
openai/gpt-oss-120b vs google/gemini-3-flash-preview | 63-56


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempt

x-ai/grok-4.1-fast vs openai/gpt-5-mini | 2-10



x-ai/grok-4.1-fast vs openai/gpt-5-mini | 10-44



openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 32-25



google/gemini-3-flash-preview vs openai/gpt-5-mini | 47-46



openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 17-29



openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 3-21


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action aft

x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 58-55



x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 2-0



openai/gpt-oss-120b vs google/gemini-3-flash-preview | 0-16



openai/gpt-oss-120b vs openai/gpt-5-mini | 2-0



openai/gpt-5-mini vs x-ai/grok-4.1-fast | 2-5



x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 20-20



openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 15-44



openai/gpt-oss-120b vs openai/gpt-5-mini | 6-0



google/gemini-3-flash-preview vs openai/gpt-5-mini | 19-31



google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 41-8



openai/gpt-5-mini vs google/gemini-3-flash-preview | 14-0



x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 20-2



openai/gpt-5-mini vs openai/gpt-oss-120b | 2-3



openai/gpt-5-mini vs openai/gpt-oss-120b | 41-2



openai/gpt-5-mini vs google/gemini-3-flash-preview | 15-28



x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 8-3



google/gemini-3-flash-preview vs openai/gpt-5-mini | 2-2



openai/gpt-5-mini vs google/gemini-3-flash-preview | 5-2



x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 12-7


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallbac

x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 2-2
openai/gpt-5-mini vs openai/gpt-oss-120b | 2-1


LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action a

openai/gpt-5-mini vs x-ai/grok-4.1-fast | 12-0


LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action aft

openai/gpt-oss-120b vs google/gemini-3-flash-preview | 0-2


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 atte

google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 21-2
google/gemini-3-flash-preview vs openai/gpt-5-mini | 18-0
openai/gpt-oss-120b vs google/gemini-3-flash-preview | 0-5


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to rando

google/gemini-3-flash-preview vs openai/gpt-oss-120b | 6-0
google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 2-0


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action afte

openai/gpt-oss-120b vs openai/gpt-5-mini | 6-0


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action af

x-ai/grok-4.1-fast vs openai/gpt-5-mini | 17-10


LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action aft

google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 2-0


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action afte

x-ai/grok-4.1-fast vs openai/gpt-5-mini | 10-2


LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to rando

x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 3-2


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action af

google/gemini-3-flash-preview vs x-ai/grok-4.1-fast | 2-0


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attemp

openai/gpt-5-mini vs google/gemini-3-flash-preview | 2-0


LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 atte

openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 2-7


LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action a

openai/gpt-5-mini vs openai/gpt-oss-120b | 2-0


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to rand

x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 0-2


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action

openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 2-2


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action a

google/gemini-3-flash-preview vs openai/gpt-oss-120b | 8-0


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action af

openai/gpt-oss-120b vs x-ai/grok-4.1-fast | 3-0


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after

openai/gpt-5-mini vs x-ai/grok-4.1-fast | 2-25


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action af

openai/gpt-oss-120b vs openai/gpt-5-mini | 2-0


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random

google/gemini-3-flash-preview vs openai/gpt-5-mini | 0-2


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action 

x-ai/grok-4.1-fast vs openai/gpt-oss-120b | 11-47


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action a

google/gemini-3-flash-preview vs openai/gpt-oss-120b | 0-2


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 att

x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 5-2


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempt

google/gemini-3-flash-preview vs openai/gpt-5-mini | 4-6


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 att

google/gemini-3-flash-preview vs openai/gpt-oss-120b | 12-2


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attem

openai/gpt-5-mini vs openai/gpt-oss-120b | 0-9


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 atte

openai/gpt-oss-120b vs openai/gpt-5-mini | 2-6


LLMAgent fallback to random action after 2 attempts (model=openai/gpt-5-mini status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=openai/gpt-oss-120b status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action af

openai/gpt-oss-120b vs openai/gpt-5-mini | 2-0


LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=x-ai/grok-4.1-fast status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)
LLMAgent fallback to random action after 2 attempts (model=google/gemini-3-flash-preview status=403 error={"error":{"message":"Key limit exceeded (total limit). Manage it using https://openrouter.ai/settings/keys","code":403}} reason=None raw=None)


x-ai/grok-4.1-fast vs google/gemini-3-flash-preview | 2-9


[PosixPath('runs_final_new/x-ai_grok-4.1-fast_vs_openai_gpt-5-mini_012.jsonl'),
 PosixPath('runs_final_new/google_gemini-3-flash-preview_vs_x-ai_grok-4.1-fast_005.jsonl'),
 PosixPath('runs_final_new/x-ai_grok-4.1-fast_vs_openai_gpt-5-mini_010.jsonl'),
 PosixPath('runs_final_new/google_gemini-3-flash-preview_vs_openai_gpt-5-mini_007.jsonl'),
 PosixPath('runs_final_new/x-ai_grok-4.1-fast_vs_openai_gpt-5-mini_014.jsonl'),
 PosixPath('runs_final_new/openai_gpt-oss-120b_vs_openai_gpt-5-mini_014.jsonl'),
 PosixPath('runs_final_new/openai_gpt-oss-120b_vs_google_gemini-3-flash-preview_001.jsonl'),
 PosixPath('runs_final_new/openai_gpt-oss-120b_vs_google_gemini-3-flash-preview_007.jsonl'),
 PosixPath('runs_final_new/openai_gpt-oss-120b_vs_openai_gpt-5-mini_015.jsonl'),
 PosixPath('runs_final_new/google_gemini-3-flash-preview_vs_x-ai_grok-4.1-fast_006.jsonl'),
 PosixPath('runs_final_new/x-ai_grok-4.1-fast_vs_openai_gpt-5-mini_004.jsonl'),
 PosixPath('runs_final_new/openai_gpt-oss-120b_vs_openai_

In [1]:
#!uv pip install tqdm
from tqdm import tqdm

In [6]:
from analysis import summarize
results, matchups = summarize("runs_test")



model_a                        model_b              total  wins_a  wins_b  draws  wr_a  wr_b
google/gemini-3-flash-preview  openai/gpt-oss-120b  6      4       2       0      0.67  0.33
google/gemini-3-flash-preview  openai/gpt-oss-20b   6      6       0       0      1.00  0.00
openai/gpt-oss-120b            openai/gpt-oss-20b   6      6       0       0      1.00  0.00
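
The `wr_a` and `wr_b` columns above are consistent with each side's wins divided by the total number of games in the matchup, rounded to two decimals. The internals of `summarize` are not shown here, so the sketch below only assumes that relationship; `MatchupRow` and `win_rates` are hypothetical names used for illustration, not part of the `analysis` module.

```python
from dataclasses import dataclass


@dataclass
class MatchupRow:
    """Hypothetical container for one row of the matchup table."""
    model_a: str
    model_b: str
    wins_a: int
    wins_b: int
    draws: int

    @property
    def total(self) -> int:
        # Every game ends as a win for A, a win for B, or a draw.
        return self.wins_a + self.wins_b + self.draws

    def win_rates(self) -> tuple[float, float]:
        # Assumed definition: wins / total, rounded to two decimals.
        if self.total == 0:
            return 0.0, 0.0
        return (round(self.wins_a / self.total, 2),
                round(self.wins_b / self.total, 2))


# Counts copied from the first row of the table above.
row = MatchupRow("google/gemini-3-flash-preview", "openai/gpt-oss-120b", 4, 2, 0)
print(row.total, row.win_rates())
```

With only six games per matchup, a single result moves the win rate by about 0.17, so these figures are best read as rough indications.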
