Conversation
…in agentic analysis Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e0315538-7b99-4c78-9bf3-2dd647e93fdf Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes inflated writeCount values used in agentic analysis by removing double-counting between createdItems and SafeItemsCount, improving downstream ActuationStyle / ResourceProfile classification and assessment evidence.
Changes:
- Update
buildBehaviorFingerprintto derivewriteCountfromSafeItemsCount, falling back tolen(createdItems)only whenSafeItemsCountis zero. - Update
buildAgenticAssessmentsto use the samewriteCountderivation, preventing over-reporting ofwrite_actionsevidence.
Comments suppressed due to low confidence (1)
pkg/cli/audit_agentic_analysis.go:232
- Same writeCount fallback logic is repeated here (also present in buildBehaviorFingerprint). To reduce drift risk, consider using a shared helper so both the fingerprint thresholds and assessment evidence always use the exact same write action count heuristic.
writeCount := processedRun.Run.SafeItemsCount
if writeCount == 0 {
writeCount = len(createdItems)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| writeCount := processedRun.Run.SafeItemsCount | ||
| if writeCount == 0 { | ||
| writeCount = len(createdItems) | ||
| } |
There was a problem hiding this comment.
The writeCount “SafeItemsCount with fallback to len(createdItems)” logic is now duplicated here and again in buildAgenticAssessments. Consider extracting a small helper (e.g., deriveWriteCount(processedRun.Run.SafeItemsCount, createdItems)) so the classification logic stays consistent if this heuristic changes (and so other call sites like computeAgenticFraction can reuse the same definition of write actions).
This issue also appears on line 229 of the same file.
See below for a potential fix:
func deriveWriteCount(safeItemsCount int, createdItems []CreatedItemReport) int {
if safeItemsCount > 0 {
return safeItemsCount
}
return len(createdItems)
}
func buildBehaviorFingerprint(processedRun ProcessedRun, metrics MetricsData, toolUsage []ToolUsageInfo, createdItems []CreatedItemReport, awContext *AwContext) *BehaviorFingerprint {
toolTypes := len(toolUsage)
writeCount := deriveWriteCount(processedRun.Run.SafeItemsCount, createdItems)
| writeCount := processedRun.Run.SafeItemsCount | ||
| if writeCount == 0 { | ||
| writeCount = len(createdItems) | ||
| } |
There was a problem hiding this comment.
There’s no unit test asserting the intended behavior change (avoid double-counting when both createdItems and SafeItemsCount are populated). Adding a focused test that covers cases like (SafeItemsCount>0, createdItems non-empty) and (SafeItemsCount==0, createdItems non-empty) would help prevent regressions in ActuationStyle/ResourceProfile classification and assessment evidence.
|
writeCountin bothbuildBehaviorFingerprintandbuildAgenticAssessmentswas computed aslen(createdItems) + SafeItemsCount— but both fields derive from the sameextractCreatedItemsFromManifestcall, so the same signal was summed twice. This inflatedwriteCount, misclassifyingActuationStyle/ResourceProfileand bloatingwrite_actionsevidence inresource_heavy_for_domainassessments.Changes
buildBehaviorFingerprintandbuildAgenticAssessments: derivewriteCountfrom a single source —SafeItemsCountas primary (canonical persisted field), falling back tolen(createdItems)only whenSafeItemsCountis zero