
Fix: firewall proxy cap inflating blocked-request severity to "high"#28890

Merged
pelikhan merged 1 commit into main from copilot/investigate-blocked-requests-report
Apr 28, 2026

Copilot AI commented Apr 28, 2026

The Squid firewall proxy truncates blocked_requests at 50. Because the network insight's severity = "high" upgrade fired on blocked >= 10, any run hitting the cap was guaranteed high severity regardless of actual block rate, inflating risk scores across the fleet.
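To make the inflation concrete, here is a minimal Go sketch of the pre-fix rule as described above. `legacySeverity` and its `blocked`/`total` parameters are hypothetical simplifications of the logic in `buildAuditObservabilityInsights`, not the actual code; the block rate is assumed to be `blocked / total`, and the `"none"` result for zero blocks is an assumption.

```go
package main

import "fmt"

// legacySeverity is a hypothetical sketch of the pre-fix network severity rule.
// Any blocked request yields "medium"; the result is upgraded to "high" when
// the block rate is at least 50% OR at least 10 requests were blocked.
func legacySeverity(blocked, total int) string {
	if blocked <= 0 {
		return "none" // assumption: no network-friction insight without blocks
	}
	blockedRate := 0.0
	if total > 0 {
		blockedRate = float64(blocked) / float64(total) // assumed definition
	}
	if blockedRate >= 0.5 || blocked >= 10 {
		return "high"
	}
	return "medium"
}

func main() {
	// A run truncated at the proxy cap: 50 of 1,000 requests blocked is only a
	// 5% rate, but the absolute-count path (50 >= 10) still forces "high".
	fmt.Println(legacySeverity(50, 1000))
}
```

Because the cap (50) is above the absolute threshold (10), every capped run takes the absolute-count path, whatever its real block rate.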

Changes

  • EpisodeData: new BlockedRequestAtCap bool field (blocked_request_at_cap,omitempty); set when any contributing run's FirewallAnalysis.BlockedRequests >= firewallBlockedRequestCap (50).
  • workflowObservabilityStats: new blockedAtCap bool field, propagated during firewall accumulation in buildLogsObservabilityInsights.
  • Severity logic (both single-run buildAuditObservabilityInsights and multi-run buildLogsObservabilityInsights): the absolute-count path (blocked >= 10 → "high") is gated on !blockedAtCap. The rate-based path (>= 50% block rate) is unchanged — a genuinely high block rate still produces "high" even when at cap.
```go
blockedAtCap := processedRun.FirewallAnalysis.BlockedRequests >= firewallBlockedRequestCap
// The absolute-count upgrade is skipped at cap; the rate-based upgrade still applies.
if blockedRate >= 0.5 || (processedRun.FirewallAnalysis.BlockedRequests >= 10 && !blockedAtCap) {
	severity = "high"
}
```
  • 5 new unit tests: cap detection (positive/negative), audit severity suppressed at cap, audit high severity preserved at high block rate, logs hotspot severity suppressed at cap.
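The gated single-run rule can be exercised in isolation with a small sketch. `networkSeverity` is a hypothetical stand-in for the severity branch of `buildAuditObservabilityInsights`; only `firewallBlockedRequestCap` and the gated condition come from this PR, and the rate definition (`blocked / total`) and the `"none"` result are assumptions.

```go
package main

import "fmt"

// firewallBlockedRequestCap mirrors the constant described in this PR: the
// Squid proxy stops counting blocked requests at 50.
const firewallBlockedRequestCap = 50

// networkSeverity is a hypothetical sketch of the post-fix single-run rule.
// The absolute-count path (blocked >= 10) is skipped when the count sits at
// the proxy cap; the rate-based path (>= 50% blocked) still applies at cap.
func networkSeverity(blocked, total int) string {
	if blocked <= 0 {
		return "none" // assumption: no network-friction insight without blocks
	}
	blockedRate := 0.0
	if total > 0 {
		blockedRate = float64(blocked) / float64(total) // assumed definition
	}
	blockedAtCap := blocked >= firewallBlockedRequestCap
	if blockedRate >= 0.5 || (blocked >= 10 && !blockedAtCap) {
		return "high"
	}
	return "medium"
}

func main() {
	fmt.Println(networkSeverity(50, 1000)) // at cap, 5% rate: medium
	fmt.Println(networkSeverity(50, 60))   // at cap, ~83% rate: high
	fmt.Println(networkSeverity(12, 1000)) // below cap, absolute path: high
}
```

One consequence of this gating, visible in the sketch: at a low block rate, 49 blocked requests still classify as "high" while a capped 50 drops to "medium", because a capped value is treated as carrying no information about the true total.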

…count is at cap

Add `firewallBlockedRequestCap = 50` constant documenting the Squid proxy
truncation limit. When any contributing run's `FirewallAnalysis.BlockedRequests`
equals or exceeds this value, `EpisodeData.BlockedRequestAtCap` is set to true
so consumers can distinguish a real high count from a proxy-truncated sentinel.
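A minimal sketch of the episode-level flag, assuming the episode sums per-run counts into `BlockedRequestCount` and that runs without firewall data are represented as `nil`. The trimmed `FirewallAnalysis` and `EpisodeData` types and the `aggregateEpisode` helper are hypothetical; only the field names, the cap, and the `blocked_request_at_cap,omitempty` tag come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const firewallBlockedRequestCap = 50

// Hypothetical, trimmed-down stand-ins for the real types.
type FirewallAnalysis struct {
	TotalRequests   int
	BlockedRequests int
}

type EpisodeData struct {
	BlockedRequestCount int
	BlockedRequestAtCap bool `json:"blocked_request_at_cap,omitempty"`
}

// aggregateEpisode sums per-run blocked counts and flags the episode when any
// contributing run reported a value at or above the proxy cap.
func aggregateEpisode(runs []*FirewallAnalysis) EpisodeData {
	var ep EpisodeData
	for _, fa := range runs {
		if fa == nil {
			continue // runs without firewall data contribute nothing
		}
		ep.BlockedRequestCount += fa.BlockedRequests
		if fa.BlockedRequests >= firewallBlockedRequestCap {
			ep.BlockedRequestAtCap = true
		}
	}
	return ep
}

func main() {
	ep := aggregateEpisode([]*FirewallAnalysis{
		{TotalRequests: 200, BlockedRequests: 50}, // truncated by the proxy
		{TotalRequests: 40, BlockedRequests: 3},
		nil,
	})
	out, err := json.Marshal(ep)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"BlockedRequestCount":53,"blocked_request_at_cap":true}
}
```

Note that the aggregated count (53 here) can exceed 50 across runs, and `omitempty` drops `blocked_request_at_cap` from the JSON entirely when no run hit the cap.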

In both `buildAuditObservabilityInsights` (single run) and
`buildLogsObservabilityInsights` (multi-run hotspot), the absolute-count upgrade
path (`blocked >= 10 → severity = "high"`) is suppressed when the count is at
cap. The rate-based upgrade path (`>= 50% block rate`) is unaffected, so a
genuinely high block rate still surfaces as high severity even at cap.

Adds 5 new unit tests covering cap detection, the below-cap negative case, and
both severity paths for audit and logs insights.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f5658b1d-9be4-445c-908c-ab434c633611

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
pelikhan marked this pull request as ready for review April 28, 2026 09:36
Copilot AI review requested due to automatic review settings April 28, 2026 09:36
pelikhan merged commit 88d1f1e into main Apr 28, 2026
32 checks passed
pelikhan deleted the copilot/investigate-blocked-requests-report branch April 28, 2026 09:39

Copilot AI left a comment


Pull request overview

Adjusts network observability severity so Squid firewall proxy truncation at 50 blocked requests no longer automatically inflates insights to "high" based on absolute blocked counts.

Changes:

  • Adds “at-cap” tracking (blockedAtCap / BlockedRequestAtCap) when FirewallAnalysis.BlockedRequests >= 50.
  • Gates the "high" severity absolute-count trigger (blocked >= 10) on !blockedAtCap in both audit and logs insights; keeps the rate-based "high" path unchanged.
  • Adds unit tests covering cap detection and the updated severity behavior.
| File | Description |
|---|---|
| pkg/cli/observability_insights.go | Tracks firewall cap and gates absolute-count escalation for network severity in audit and aggregated logs insights. |
| pkg/cli/observability_insights_test.go | Adds tests ensuring severity is suppressed at cap (unless block rate is high). |
| pkg/cli/logs_episode.go | Introduces firewallBlockedRequestCap constant and records EpisodeData.BlockedRequestAtCap. |
| pkg/cli/logs_episode_test.go | Adds tests validating episode cap detection behavior. |

Copilot's findings


Comments suppressed due to low confidence (1)

pkg/cli/observability_insights.go:287

  • The new blockedAtCap gating fixes severity inflation, but the emitted hotspot insight still reports blocked as an exact count in Summary/Evidence even when blockedAtCap is true (the underlying per-run counts may have been truncated at the proxy cap). Consider adjusting the hotspot message to reflect that the blocked total is a lower bound when blockedAtCap is set.
```go
		if networkRate >= 0.5 || (networkHotspot.blocked >= 10 && !networkHotspot.blockedAtCap) {
			severity = "high"
		}
		insights = append(insights, ObservabilityInsight{
			Category: "network",
```
  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment thread pkg/cli/logs_episode.go
Comment on lines +17 to +19
```go
// firewallBlockedRequestCap is the maximum blocked-request count the Squid firewall
// proxy reports before truncating. A BlockedRequestCount equal to this value may
// represent any number of requests >= 50 and should not be treated as an exact figure.
```

Copilot AI Apr 28, 2026


The comment for firewallBlockedRequestCap refers to BlockedRequestCount, but this cap is about the per-run firewall metric (FirewallAnalysis.BlockedRequests) being truncated by Squid. As written it can be confused with the aggregated episode field EpisodeData.BlockedRequestCount (which can exceed 50 across runs). Consider rewording to explicitly reference FirewallAnalysis.BlockedRequests (and that 50 means “>= 50”, not an exact count).

Suggested change
```diff
-// firewallBlockedRequestCap is the maximum blocked-request count the Squid firewall
-// proxy reports before truncating. A BlockedRequestCount equal to this value may
-// represent any number of requests >= 50 and should not be treated as an exact figure.
+// firewallBlockedRequestCap is the maximum per-run blocked-request count that the
+// Squid firewall proxy reports in FirewallAnalysis.BlockedRequests before truncating.
+// A FirewallAnalysis.BlockedRequests value equal to this cap means ">= 50", not an
+// exact count of 50.
```

Copilot uses AI. Check for mistakes.
Comment on lines 109 to 115
```diff
 summary := fmt.Sprintf("The firewall observed %d request(s) with %d blocked, for a %.0f%% block rate.", processedRun.FirewallAnalysis.TotalRequests, processedRun.FirewallAnalysis.BlockedRequests, blockedRate*100)
 if processedRun.FirewallAnalysis.BlockedRequests > 0 {
 	title = "Network friction detected"
 	severity = "medium"
-	if blockedRate >= 0.5 || processedRun.FirewallAnalysis.BlockedRequests >= 10 {
+	blockedAtCap := processedRun.FirewallAnalysis.BlockedRequests >= firewallBlockedRequestCap
+	if blockedRate >= 0.5 || (processedRun.FirewallAnalysis.BlockedRequests >= 10 && !blockedAtCap) {
 		severity = "high"
```

Copilot AI Apr 28, 2026


When blockedAtCap is true, the blocked count is a lower bound (Squid truncates at the cap), but the insight Summary/Evidence still presents the blocked count as an exact number (e.g., “50 blocked”). Consider adjusting the wording when at cap (e.g., “>= 50 blocked” and/or appending a note that the count may be truncated) to avoid misleading output.

This issue also appears on line 283 of the same file.
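One way to act on this suggestion, sketched in Go under the assumption that only the wording changes: `blockedPhrase` and `auditSummary` are hypothetical helpers (not functions in pkg/cli), and the format string is adapted from the `summary` line quoted in this comment thread.

```go
package main

import "fmt"

const firewallBlockedRequestCap = 50

// blockedPhrase renders the blocked count for an insight summary. When the
// count came from a capped run it is only a lower bound, so say so.
func blockedPhrase(blocked int, atCap bool) string {
	if atCap {
		return fmt.Sprintf("at least %d blocked (count truncated by the firewall proxy)", blocked)
	}
	return fmt.Sprintf("%d blocked", blocked)
}

// auditSummary rebuilds the single-run summary sentence with cap-aware wording.
func auditSummary(total, blocked int) string {
	atCap := blocked >= firewallBlockedRequestCap
	rate := 0.0
	if total > 0 {
		rate = float64(blocked) / float64(total)
	}
	return fmt.Sprintf("The firewall observed %d request(s) with %s, for a %.0f%% block rate.",
		total, blockedPhrase(blocked, atCap), rate*100)
}

func main() {
	fmt.Println(auditSummary(1000, 50))
	fmt.Println(auditSummary(20, 5))
}
```

The same helper would fit the multi-run hotspot, passing the workflow's `blockedAtCap` flag so a summed total such as 54 reads as "at least 54 blocked".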

github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

Excellent test quality

| Metric | Value |
|---|---|
| New/modified tests analyzed | 5 |
| ✅ Design tests (behavioral contracts) | 5 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 5 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (62 test lines vs 9 production lines) |
| 🚨 Coding-guideline violations | None |

Test Classification Details

| Test | File | Classification | Issues Detected |
|---|---|---|---|
| TestBuildEpisodeDataSetsBlockedAtCapWhenFirewallCountHitsCap | pkg/cli/logs_episode_test.go | ✅ Design | None |
| TestBuildEpisodeDataDoesNotSetBlockedAtCapBelowThreshold | pkg/cli/logs_episode_test.go | ✅ Design | None |
| TestBuildAuditObservabilityInsightsSuppressesHighSeverityAtFirewallCap | pkg/cli/observability_insights_test.go | ✅ Design | None |
| TestBuildAuditObservabilityInsightsHighSeverityWhenHighBlockRate | pkg/cli/observability_insights_test.go | ✅ Design | None |
| TestBuildLogsObservabilityInsightsSuppressesHighSeverityAtFirewallCap | pkg/cli/observability_insights_test.go | ✅ Design | None |

Flagged Tests — Requires Review

No tests flagged. All new tests demonstrate strong behavioral coverage.


Test Inflation Note

The test files added 138 lines total (62 + 76) versus 16 production lines (9 + 7), a ratio well above 2:1. However, this is expected and appropriate here — the fix targets nuanced boundary conditions in the severity-classification logic that require several explicit scenario tests to cover the cap/rate matrix. The inflation penalty (−10 pts) is applied but does not indicate a quality problem in this case.


Language Support

Tests analyzed:


Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 5 tests verify observable behavioral contracts — BlockedRequestAtCap flag semantics and severity output under different cap/block-rate combinations. Build tags present, no mock libraries, assertion messages included throughout.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25045413707

🧪 Test quality analysis by Test Quality Sentinel · ● 384.9K ·


github-actions bot left a comment


✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 5 new tests verify observable behavioral contracts with proper build tags, assertion messages, and no mock libraries.

github-actions

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (154 new lines in pkg/) but does not have a linked Architecture Decision Record (ADR).

AI has analyzed the PR diff and generated a draft ADR to help you get started:

📄 Draft ADR: docs/adr/28890-suppress-firewall-severity-inflation-at-proxy-cap.md

What to do next

  1. Review the draft ADR committed to your branch — it was generated from the PR diff
  2. Complete the missing sections — add context the AI couldn't infer, refine the decision rationale, and list real alternatives you considered
  3. Commit the finalized ADR to docs/adr/ on your branch
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-28890: Suppress Firewall Severity Inflation at Proxy Cap

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say 'I'll deal with this later.' Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.


📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 28890-suppress-firewall-severity-inflation-at-proxy-cap.md for PR #28890).

🔒 This PR cannot merge until an ADR is linked in the PR body.

References: §25045413687

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 118.2K ·
