Agent-Logs-Url: https://github.com/github/gh-aw/sessions/79f6e4e0-65c5-4b39-9685-7a75154546cc
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Expands the homepage “Guardrails Built-In” section to give non-security readers a clearer mental model of GitHub Agentic Workflows’ layered security design.
Changes:
- Rewrites the security section into five plain-English subsections (tokens, secrets isolation, firewalling, safe outputs, threat detection).
- Replaces the minimal 3-node diagram with a fuller Mermaid flow showing sandbox → artifact → detection gate → write job or block.
- Updates the agent factory status table (updates the DeepReport engine entry and adds a Smoke Service Ports workflow entry).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/src/content/docs/index.mdx | Replaces dense paragraph with layered security explanation + expanded Mermaid flow diagram. |
| docs/src/content/docs/agent-factory-status.mdx | Updates/extends the workflow status table entries (DeepReport engine + new smoke workflow). |
| ### Zero secrets in the agent | ||
| The agent process never receives write tokens, API keys, or other sensitive credentials. Those secrets exist only in separate, isolated jobs that run _after_ the agent has finished and its output has passed review. A compromised agent has nothing to steal and no credentials to misuse. |
"passed review" is ambiguous here and reads like a human review step, but the security model described is automated gating (safe-outputs validation + optional threat detection) before any secrets/write-token job runs. Consider rephrasing to something like "after the agent completes and its output passes validation/threat-detection gates" to avoid implying manual review.
| The agent process never receives write tokens, API keys, or other sensitive credentials. Those secrets exist only in separate, isolated jobs that run _after_ the agent has finished and its output has passed review. A compromised agent has nothing to steal and no credentials to misuse. | |
| The agent process never receives write tokens, API keys, or other sensitive credentials. Those secrets exist only in separate, isolated jobs that run _after_ the agent has finished and its output has passed automated validation and threat-detection gates. A compromised agent has nothing to steal and no credentials to misuse. |
| ### Containerized with a network firewall | ||
| The agent runs inside an isolated container. A built-in network firewall—the [Agent Workflow Firewall](/gh-aw/introduction/architecture/#agent-workflow-firewall-awf)—routes all outbound traffic through a Squid proxy enforcing an explicit domain allowlist. Traffic to any other destination is dropped at the kernel level, so a compromised agent cannot exfiltrate data or call out to unexpected servers. |
The AWF description here doesn’t match the architecture doc: AWF uses iptables to redirect HTTP/HTTPS through a Squid proxy (domain allowlist), rather than routing all outbound traffic through Squid. Also, the allowlist is enforced at the proxy while iptables performs the redirection/blocking. Suggest tightening this sentence to reflect the documented behavior (e.g., "redirects HTTP/HTTPS through a Squid proxy with a domain allowlist" and "blocks non-allowed egress").
| The agent runs inside an isolated container. A built-in network firewall—the [Agent Workflow Firewall](/gh-aw/introduction/architecture/#agent-workflow-firewall-awf)—routes all outbound traffic through a Squid proxy enforcing an explicit domain allowlist. Traffic to any other destination is dropped at the kernel level, so a compromised agent cannot exfiltrate data or call out to unexpected servers. | |
| The agent runs inside an isolated container. A built-in network firewall—the [Agent Workflow Firewall](/gh-aw/introduction/architecture/#agent-workflow-firewall-awf)—uses iptables to redirect HTTP and HTTPS traffic through a Squid proxy with an explicit domain allowlist, and to drop non-allowed egress at the kernel level. This prevents a compromised agent from exfiltrating data or calling out to unexpected servers. |
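The split described in this comment (redirect HTTP/HTTPS to a proxy, let the proxy apply the domain allowlist, drop everything else) can be pictured with a small model. This is only an illustrative sketch: `ALLOWLIST`, `is_allowed`, and `decide` are hypothetical names, the subdomain-matching rule and the "ports 80 and 443 only" rule are assumptions, and none of it is taken from AWF's or Squid's actual implementation.

```python
# Hypothetical allowlist entries, for illustration only.
ALLOWLIST = frozenset({"api.github.com", "example.org"})


def is_allowed(host: str, allowlist: frozenset) -> bool:
    """Return True if host is an allowed domain or a subdomain of one.

    Assumption: suffix matching on whole labels; real proxy ACLs may differ.
    """
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowlist)


def decide(host: str, port: int, allowlist: frozenset = ALLOWLIST) -> str:
    """Model the two layers: packet-level redirection, then proxy allowlist."""
    # Assumption: only HTTP (80) and HTTPS (443) are redirected to the proxy;
    # any other egress is dropped before it reaches the proxy.
    if port not in (80, 443):
        return "drop"
    # The proxy, not the packet filter, applies the domain allowlist.
    return "forward" if is_allowed(host, allowlist) else "deny"
```

In this model, `decide("api.github.com", 443)` is forwarded, a request to an unlisted domain is denied at the proxy, and a connection on any other port is dropped without consulting the allowlist.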
| ### Agentic threat detection | ||
| Before any output is applied, a dedicated [threat detection job](/gh-aw/reference/threat-detection/) runs an AI-powered scan of the agent's proposed changes. It checks for prompt injection attacks, leaked credentials, and malicious code patterns. If anything looks suspicious, the workflow fails immediately and nothing is written to your repository. |
Threat detection can be explicitly disabled in workflows (e.g., threat-detection: false). To keep the homepage accurate, consider qualifying this with "by default" / "when enabled" or "when safe outputs are configured" rather than stating it always runs before any output is applied.
| Before any output is applied, a dedicated [threat detection job](/gh-aw/reference/threat-detection/) runs an AI-powered scan of the agent's proposed changes. It checks for prompt injection attacks, leaked credentials, and malicious code patterns. If anything looks suspicious, the workflow fails immediately and nothing is written to your repository. | |
| When threat detection is enabled, before any output is applied, a dedicated [threat detection job](/gh-aw/reference/threat-detection/) runs an AI-powered scan of the agent's proposed changes. It checks for prompt injection attacks, leaked credentials, and malicious code patterns. If anything looks suspicious, the workflow fails immediately and nothing is written to your repository. |
The "Guardrails Built-In" section on the index page was a single dense paragraph with a 3-node diagram (Agent → Detection → SafeOutputs), too thin to give readers a meaningful mental model of the security properties.

Changes
Rewrote the section into five named subsections, each explaining one security layer in plain English:
Replaced the minimal diagram with a full-flow Mermaid chart showing the isolated sandbox → artifact → threat detection gate → blocked or write job → GitHub API path
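The path in the new diagram can also be read as a short gating function. The sketch below is a simplified model under stated assumptions, not the project's implementation: `Artifact`, `gate`, `non_empty`, and `looks_suspicious` are hypothetical names, the two toy checks stand in for safe-outputs validation and the AI-powered scan, and threat detection is modeled as optional because the review notes it can be disabled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Artifact:
    """Output handed from the sandboxed agent job to later jobs (hypothetical)."""
    proposed_changes: str


def gate(
    artifact: Artifact,
    validate: Callable[[Artifact], bool],
    detect_threats: Optional[Callable[[Artifact], bool]] = None,
) -> str:
    """Return "write" if the write job may run, otherwise "blocked"."""
    # Validation always runs before anything is written.
    if not validate(artifact):
        return "blocked"
    # Threat detection is skipped when disabled (detect_threats is None).
    if detect_threats is not None and detect_threats(artifact):
        return "blocked"
    return "write"


def non_empty(artifact: Artifact) -> bool:
    # Toy stand-in for output validation.
    return bool(artifact.proposed_changes.strip())


def looks_suspicious(artifact: Artifact) -> bool:
    # Toy stand-in for the scan; a real detector would be far richer.
    return "ignore previous instructions" in artifact.proposed_changes.lower()
```

The point of the shape is that the write step is only reachable through the gate: the agent produces an artifact, and a separate job decides whether anything is applied.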