feat: adds ability to optimize for cost#172

Merged
andrewklatzke merged 10 commits into aklatzke/AIC-2263/sdk-dx-improvements from aklatzke/AIC-2465/cost-optimization
May 18, 2026

Conversation

@andrewklatzke andrewklatzke commented May 6, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Describe the solution you've provided

Implements cost optimization in the same manner as latency optimization. The acceptance statement is searched for keywords pertaining to token usage or cost (e.g. costs, pricing, bill); when one is found, instructions are added to variation generation to optimize for cost. The acceptance statement prompt also returns instructions for variation generation (e.g. use a cheaper model).
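
A minimal sketch of the keyword scan described above. The keyword list, the function names, and the instruction wording are all assumptions made for illustration; the PR only names "costs", "pricing", and "bill" as example keywords, and the real logic lives in `ldai_optimizer`.

```python
import re

# Hypothetical keyword list. Only "costs", "pricing", and "bill" are named in the PR text.
COST_KEYWORDS = ("cost", "costs", "pricing", "bill", "billing", "cheaper")

# Word-boundary match so that unrelated words containing a keyword are not flagged.
_COST_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in COST_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def mentions_cost(acceptance_statement: str) -> bool:
    """Return True when the statement contains any cost-related keyword."""
    return bool(_COST_PATTERN.search(acceptance_statement))


def variation_instructions(acceptance_statement: str) -> list:
    """Build extra instructions for variation generation (illustrative wording)."""
    instructions = []
    if mentions_cost(acceptance_statement):
        instructions.append(
            "Prefer changes that reduce token usage or switch to a cheaper model."
        )
    return instructions
```

With this shape, a statement that never mentions cost produces no extra instructions, which matches the claim that base usage is unaffected.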

Describe alternatives you've considered

This is a feature addition.

Additional context

We'll be adding UI options for both latency and cost with adjustable thresholds. The keyword-based behavior remains valid once those arrive, since a mention of cost or latency means the user is trying to optimize for it.


Note

Medium Risk
Adds new cost-gating logic and changes iteration/batch bookkeeping (baseline tracking, history trimming, token-limit handling), which can affect optimization outcomes and persisted result records. Risk is moderated by extensive new unit tests covering the new gates and edge cases.

Overview
Adds cost optimization support alongside existing latency optimization: acceptance statements are scanned for cost keywords, agent calls get per-turn estimated_cost_usd (via model pricing when available), and a new _cost_gate is applied similarly to _latency_gate, with both gates recorded as synthetic judge scores for visibility.
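
As a rough illustration of a per-turn cost estimate from model pricing, here is a sketch under stated assumptions. `TokenUsage` below is a stand-in for the SDK type, `ModelPricing` and its per-million-token fields are hypothetical, and the body of `estimate_cost` is a guess at the behavior described in this thread (no price data yields no estimate, and a `None` token count is skipped).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    # Stand-in for the SDK's TokenUsage; either field may be None.
    input: Optional[int] = None
    output: Optional[int] = None


@dataclass
class ModelPricing:
    # Hypothetical pricing record, in USD per one million tokens.
    input_per_million: float
    output_per_million: float


def estimate_cost(usage: TokenUsage, pricing: Optional[ModelPricing]) -> Optional[float]:
    """Return an estimated cost in USD, or None when nothing can be priced."""
    if pricing is None or (usage.input is None and usage.output is None):
        return None
    cost = 0.0
    if usage.input is not None:
        cost += usage.input * pricing.input_per_million / 1_000_000
    if usage.output is not None:
        cost += usage.output * pricing.output_per_million / 1_000_000
    return cost
```

Returning `None` rather than `0.0` when pricing is unavailable lets a caller skip the cost gate instead of treating the turn as free.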

Improves optimization loop correctness and observability by explicitly tracking baselines (duration and cost), trimming _history to bounded windows (standard and GT), counting variation-generation tokens into the run total, stamping accumulated_token_usage into result payloads, and refining token-limit behavior (treat 0 as unlimited and evaluate pass/fail before halting on budget). Also tightens model ID prefix stripping to avoid breaking Bedrock region-style IDs and updates package metadata naming/description.
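
The token-limit and history-trimming behavior summarized above might look roughly like the following. All names here are hypothetical, and the choice of `>=` for the budget comparison and the treatment of `None` as unlimited are assumptions, not details confirmed by the PR.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def budget_exhausted(accumulated_tokens: int, token_limit: Optional[int]) -> bool:
    """A limit of 0 means unlimited; None is also treated as unlimited here (assumption)."""
    if not token_limit:
        return False
    return accumulated_tokens >= token_limit


def next_step(passed: bool, accumulated_tokens: int, token_limit: Optional[int]) -> str:
    """Check pass/fail first so a passing run is not reported as a budget failure."""
    if passed:
        return "succeed"
    if budget_exhausted(accumulated_tokens, token_limit):
        return "halt_on_budget"
    return "continue"


def trim_history(history: List[T], max_entries: int) -> List[T]:
    """Keep only the most recent max_entries items."""
    if max_entries <= 0:
        return []
    return history[-max_entries:]
```

Trimming like this is only safe when the baseline is stored separately, since the oldest entry is the first one dropped.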

Reviewed by Cursor Bugbot for commit 4fc1ecf. Bugbot is set up for automated code reviews on this repo. Configure here.

@andrewklatzke andrewklatzke requested a review from a team as a code owner May 6, 2026 23:10
Comment thread packages/optimization/src/ldai_optimizer/client.py
Comment thread packages/optimization/src/ldai_optimizer/prompts.py
Comment thread packages/optimization/src/ldai_optimizer/prompts.py
@andrewklatzke andrewklatzke requested a review from jsonbailey May 7, 2026 22:03
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

This is intended to demystify some of the results we're receiving from
the optimization package - namely:
- Total token counts are now accrued and reported with each result so
that we can see if a user crosses the total allowed tokens threshold
- Score results are reported for cost or latency if they're being
optimized against as an item in the `score` result so that it can be
shown on the UI
- Finally, if quality has already met the required threshold the prompt
now contains instructions to optimize only against cost (if cost is
being optimized against)
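
A small sketch of how a running token total could be accrued and attached to each result. `TokenAccumulator`, `result_payload`, and the payload shape are hypothetical; only the `accumulated_token_usage` name appears in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenAccumulator:
    # Hypothetical helper that keeps one running total for the whole run.
    total: int = 0

    def add(self, input_tokens: Optional[int] = None,
            output_tokens: Optional[int] = None) -> int:
        """Add one call's usage, treating missing counts as zero."""
        self.total += (input_tokens or 0) + (output_tokens or 0)
        return self.total


def result_payload(status: str, acc: TokenAccumulator) -> dict:
    """Illustrative payload shape; the real PATCH body may differ."""
    return {"status": status, "accumulated_token_usage": acc.total}
```

For example, feeding agent, judge, and variation calls through the same accumulator gives a single number to compare against the allowed token threshold.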

**Describe alternatives you've considered**

This is in some ways a bug fix, since it was not previously clear to
the user what was causing a failure. It is technically additional
functionality, but it is likely required to surface the information
the user needs to act on a result.

**Additional context**

Cost and latency are only optimized for, and only contribute scores,
when the acceptance statement contains the keywords that trigger them.
"Base" implementations that do not use these features are unaffected.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes optimization pass/fail logic and persisted result payloads
(new gate scores, baseline handling, token-budget semantics), which
could affect when runs succeed/fail and what the UI/API receives.
> 
> **Overview**
> Improves optimization run reporting by tracking and persisting a
single `accumulated_token_usage` total across agent, judge, and
variation calls, and including it in result PATCH payloads (extending
`generationTokens` to allow `accumulated_total`).
> 
> Refactors latency/cost optimization to use explicit baseline values
(not `history[0]`), caps history growth (`_trim_history`) for both
standard and ground-truth flows, and adds synthetic
`_latency_gate`/`_cost_gate` score entries so gate failures are visible
in results.
> 
> Adjusts run control flow so pass/fail is evaluated before token-limit
checks (including GT batches and validation), and updates variation
prompting to focus purely on cost reduction when quality is already
passing; also relaxes the cost gate tolerance from 20% to 10%
improvement and expands tests accordingly.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
365fa94. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
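
One way to picture a cost gate that compares against an explicit baseline and reports itself as a synthetic score entry. This is a sketch, not the package's implementation: the function signature and the entry's fields are hypothetical, and reading "10% improvement" as "current cost must be at most 90% of the baseline" is an assumption.

```python
from typing import Optional

# Assumed meaning: the candidate must be at least 10% cheaper than the baseline.
COST_TOLERANCE = 0.10


def cost_gate(baseline_cost: Optional[float], current_cost: Optional[float],
              tolerance: float = COST_TOLERANCE) -> Optional[dict]:
    """Return a synthetic `_cost_gate` score entry, or None if the gate cannot run."""
    if baseline_cost is None or current_cost is None or baseline_cost <= 0:
        return None
    threshold = baseline_cost * (1.0 - tolerance)
    return {
        "name": "_cost_gate",
        "passed": current_cost <= threshold,
        "rationale": (
            f"cost ${current_cost:.6f} vs threshold ${threshold:.6f} "
            f"(baseline ${baseline_cost:.6f})"
        ),
    }
```

Passing the baseline in as an argument, rather than reading it from the first history entry, keeps the gate correct after the history has been trimmed.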
Comment thread packages/optimization/src/ldai_optimizer/client.py
f"The agent's response used {agent_usage.input} input tokens "
f"and {agent_usage.output} output tokens "
f"(estimated cost: ${current_cost:.6f}). "
)

Token count f-string may print None values

Low Severity

When agent_usage.input or agent_usage.output is None (which TokenUsage allows — estimate_cost explicitly guards against this), the f-string at this location would produce text like "used None input tokens and 40 output tokens" in the judge instructions. This happens because estimate_cost can return a non-None cost using only the non-None token count, but the f-string unconditionally formats both fields.

Reviewed by Cursor Bugbot for commit d267832. Configure here.
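
One possible way to avoid printing `None` in that sentence is to build it from only the counts that are present. The helper below is a hypothetical sketch, not the fix that was merged; `describe_usage` and its parameters are illustrative names.

```python
from typing import Optional


def describe_usage(input_tokens: Optional[int], output_tokens: Optional[int],
                   cost: Optional[float]) -> str:
    """Build the usage sentence, omitting any token count that is None."""
    parts = []
    if input_tokens is not None:
        parts.append(f"{input_tokens} input tokens")
    if output_tokens is not None:
        parts.append(f"{output_tokens} output tokens")
    if not parts:
        # Nothing to report, so contribute no text to the judge instructions.
        return ""
    sentence = "The agent's response used " + " and ".join(parts)
    if cost is not None:
        sentence += f" (estimated cost: ${cost:.6f})"
    return sentence + ". "
```

Called as `describe_usage(None, 40, 0.0006)`, this yields a sentence that mentions only the output tokens and the estimated cost.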

Comment thread packages/optimization/src/ldai_optimizer/util.py Outdated
Comment thread packages/optimization/src/ldai_optimizer/client.py Outdated
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit f2f0894. Configure here.

Comment thread packages/optimization/src/ldai_optimizer/client.py
@andrewklatzke andrewklatzke merged commit 3b4baa3 into aklatzke/AIC-2263/sdk-dx-improvements May 18, 2026
5 checks passed
@andrewklatzke andrewklatzke deleted the aklatzke/AIC-2465/cost-optimization branch May 18, 2026 21:18
2 participants