Skip to content

feat: Budget skill metadata and surface trimming as a warning#18298

Merged
xl-openai merged 10 commits intomainfrom
xl/skills
Apr 18, 2026
Merged

feat: Budget skill metadata and surface trimming as a warning#18298
xl-openai merged 10 commits intomainfrom
xl/skills

Conversation

@xl-openai
Copy link
Copy Markdown
Collaborator

Cap the model-visible skills section to a small share of the context window, with a fallback character budget, and keep only as many implicit skills as fit within that budget.

Emit a non-fatal warning when enabled skills are omitted, and add a new app-server warning notification

Record thread-start skill metrics for total enabled skills, kept skills, and whether truncation happened

Comment thread codex-rs/core/src/codex.rs Outdated
)
})
.or_else(|| implicit_skills.is_empty().then_some((0, 0, false)));
emit_thread_start_skill_metrics(&self.services.session_telemetry, skill_counts);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will move into rendered_skills 1 sec.

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce11cd4e1a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +74 to 76
if !skill_lines.is_empty() {
lines.extend(skill_lines);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Surface skill-list truncation inside the rendered skills section

When entries are dropped by budget, render_skills_section silently omits them and leaves ### Available skills without any truncation notice. That makes the model treat the list as complete and miss that additional enabled skills exist. The new behavior should include an explicit in-prompt trim warning whenever report.omitted_count > 0.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes any difference. Model can't do anything here.

Comment on lines +74 to 76
if !skill_lines.is_empty() {
lines.extend(skill_lines);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Emit a trim warning inside the skills prompt

When report.omitted_count > 0, the rendered ### Available skills list is shortened but no in-prompt notice is added. The model sees an apparently complete list and cannot infer missing enabled skills, which breaks discoverability of trimmed skills.

Useful? React with 👍 / 👎.

Comment on lines +69 to 78
let (skill_lines, report) = render_skill_lines(skills, budget);
let mut lines: Vec<String> = Vec::new();
lines.push("## Skills".to_string());
lines.push("A skill is a set of local instructions to follow that is stored in a `SKILL.md` file. Below is the list of skills that can be used. Each entry includes a name, description, and file path so you can open the source for full instructions when using a specific skill.".to_string());
lines.push("### Available skills".to_string());

for skill in skills {
let path_str = skill.path_to_skills_md.to_string_lossy().replace('\\', "/");
let name = skill.name.as_str();
let description = skill.description.as_str();
lines.push(format!("- {name}: {description} (file: {path_str})"));
if !skill_lines.is_empty() {
lines.extend(skill_lines);
}

lines.push("### How to use skills".to_string());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Count fixed skills-section text in the metadata budget

The budget is only applied to per-skill lines, while the long static header/how-to text is always emitted. On small context windows this can exceed the intended context share even when zero skills are included, so the section is not actually capped by the configured budget.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current budget intentionally applies only to the per-skill metadata entries

Comment on lines +149 to +154
fn prompt_scope_rank(scope: SkillScope) -> u8 {
match scope {
SkillScope::System => 0,
SkillScope::Admin => 1,
SkillScope::Repo => 2,
SkillScope::User => 3,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep repo/user skills ahead of system/admin in trimming

Budget ordering now prioritizes System/Admin before Repo/User. Existing skill precedence treats repo-local skills as highest priority, so truncation can drop the most task-relevant repo skills first and keep less relevant global ones.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is ok.

Comment on lines +74 to 76
if !skill_lines.is_empty() {
lines.extend(skill_lines);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Emit a trim warning inside the skills prompt

When skill entries are omitted by budget, the rendered ### Available skills section is silently shortened. The model sees an apparently complete list and cannot infer that more enabled skills exist, which undermines discoverability of trimmed skills.

Useful? React with 👍 / 👎.

Comment on lines +69 to 78
let (skill_lines, report) = render_skill_lines(skills, budget);
let mut lines: Vec<String> = Vec::new();
lines.push("## Skills".to_string());
lines.push("A skill is a set of local instructions to follow that is stored in a `SKILL.md` file. Below is the list of skills that can be used. Each entry includes a name, description, and file path so you can open the source for full instructions when using a specific skill.".to_string());
lines.push("### Available skills".to_string());

for skill in skills {
let path_str = skill.path_to_skills_md.to_string_lossy().replace('\\', "/");
let name = skill.name.as_str();
let description = skill.description.as_str();
lines.push(format!("- {name}: {description} (file: {path_str})"));
if !skill_lines.is_empty() {
lines.extend(skill_lines);
}

lines.push("### How to use skills".to_string());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Count fixed skills text against metadata budget

Budgeting is applied only to per-skill lines. The long static header/how-to text is always emitted, so small context windows can still get a large skills block even when zero skills fit. This violates the intended cap for model-visible skill metadata.

Useful? React with 👍 / 👎.

Comment on lines +149 to +154
fn prompt_scope_rank(scope: SkillScope) -> u8 {
match scope {
SkillScope::System => 0,
SkillScope::Admin => 1,
SkillScope::Repo => 2,
SkillScope::User => 3,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prioritize repo/user skills before system/admin when trimming

Trimming order now prefers System/Admin skills first. Existing skill precedence elsewhere is repo/user-first, so budget pressure can drop the most task-relevant local skills while keeping less relevant global ones.

Useful? React with 👍 / 👎.

Comment thread codex-rs/core/src/codex_tests.rs Outdated
Comment on lines +4711 to +4714
explicit_invocation_only: vec![],
allowlisted: vec![],
denylisted: vec![],
activated: vec![
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Construct SkillLoadOutcome with current fields in test

This test initializes SkillLoadOutcome with fields like explicit_invocation_only and activated, which do not match the struct shape used by core-skills (skills, errors, disabled_paths, ...). The test target will not compile as written.

Useful? React with 👍 / 👎.

Comment on lines +6570 to +6574
pub struct WarningNotification {
/// Concise warning message for the user.
pub message: String,
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add thread_id and make the warning thread-scoped? otherwise the notification isnt bound to a single thread and would show up globally regardless of what the active thread is

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should it be tied to a specific thread? The backend should decide when to send it, but the UI should just render it regardless?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there's a notion of global vs thread server notification targets here. if that distinction is not relevant for how we plan to render it in app-server clients, global is fine

Copy link
Copy Markdown
Collaborator Author

@xl-openai xl-openai Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me make thread_id optional. In general, WarningNotification can be global, so tying it to a thread should be an option rather than a requirement. For the current case it does apply.

Comment thread codex-rs/tui/src/chatwidget.rs Outdated
Comment thread codex-rs/core/src/codex/session.rs Outdated
pub(crate) guardian_review_session: GuardianReviewSessionManager,
pub(crate) services: SessionServices,
pub(super) js_repl: Arc<JsReplHandle>,
pub(super) thread_start_skill_reported: AtomicBool,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont like tracking this in top-level Session. IIUC this is necessary because we can render skills multiple times in a thread (compaction), but we dont want to emit the warning multiple times.

i cant think of a better alternative though, other than making build_initial_context take or return a bool for the same reason (which also sucks).

@aibrahim-oai any ideas?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it for now.

pub const SKILL_METADATA_CONTEXT_WINDOW_PERCENT: usize = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkillMetadataBudget {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: does this need to be pub?

Copy link
Copy Markdown
Collaborator Author

@xl-openai xl-openai Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one need to be used in core but I clean up other ones.

Copy link
Copy Markdown
Collaborator

@sayan-oai sayan-oai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the changes to how we render skills make sense. my main concern is how we track warning state.

also should have an integration test

xl-openai and others added 2 commits April 17, 2026 11:12
Co-authored-by: Codex <noreply@openai.com>
Comment thread codex-rs/core/src/session/tests.rs
@jgershen-oai
Copy link
Copy Markdown
Collaborator

@codex review

Comment thread codex-rs/app-server/README.md Outdated

Recoverable configuration and initialization warnings use the existing `configWarning` notification: `{ summary, details?, path?, range? }`. App-server may emit it during initialization for config parsing and related setup diagnostics.

Generic runtime warnings use the `warning` notification: `{ message }`. App-server emits this for non-fatal warnings from the core event stream, including cases where not all enabled skills are included in the model-visible skills list for a session.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update to show optional threadid

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a514dd4cd4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 96 to 104
let mut lines: Vec<String> = Vec::new();
lines.push("## Skills".to_string());
lines.push("A skill is a set of local instructions to follow that is stored in a `SKILL.md` file. Below is the list of skills that can be used. Each entry includes a name, description, and file path so you can open the source for full instructions when using a specific skill.".to_string());
lines.push("### Available skills".to_string());

for skill in skills {
let path_str = skill.path_to_skills_md.to_string_lossy().replace('\\', "/");
let name = skill.name.as_str();
let description = skill.description.as_str();
lines.push(format!("- {name}: {description} (file: {path_str})"));
if !skill_lines.is_empty() {
lines.extend(skill_lines);
}

lines.push("### How to use skills".to_string());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Budget the full skills section, not only skill lines

render_skill_lines enforces budget only for per-skill bullets, but render_skills_section always appends the long static skills instructions afterward. With very small context windows, this still injects a large fixed block, so the model-visible skills section is not actually capped to the configured budget and can still consume substantial prompt space.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's the design.

Comment on lines +2425 to +2435
SkillRenderSideEffects::ThreadStart {
session_telemetry: &self.services.session_telemetry,
},
);
if let Some(rendered_skills) = rendered_skills {
if rendered_skills.emit_warning {
self.send_event_raw(Event {
id: String::new(),
msg: EventMsg::Warning(WarningEvent {
message: THREAD_START_SKILLS_TRIMMED_WARNING_MESSAGE.to_string(),
}),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate thread-start side effects to true thread start

build_initial_context always passes SkillRenderSideEffects::ThreadStart and emits the trimmed-skills warning on truncation. This helper is reused outside initial thread setup (e.g., context reinjection/compaction), so "thread-start" metrics and warnings can be emitted multiple times in one thread, skewing telemetry and producing repeated warnings mid-session.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consequence of not tracking metric emission state in Session, which i still think is worth it. should follow up with an improvement to this

Comment on lines +325 to +326
warning.message,
"Some enabled skills were not included in the model-visible skills list for this session. Mention a skill by name or path if you need it."
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; this uses "warning emitted" as a proxy for "skills trimmed". can we check that the skills are actually trimmed/omitted from the outgoing request?

you can extend the app-server integration test or use test_codex, whichever is clearer.

@xl-openai xl-openai merged commit 3f7222e into main Apr 18, 2026
25 checks passed
@xl-openai xl-openai deleted the xl/skills branch April 18, 2026 01:11
@github-actions github-actions bot locked and limited conversation to collaborators Apr 18, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants