Merged
196 changes: 37 additions & 159 deletions docs/src/content/docs/guides/threat-detection.md
@@ -9,12 +9,7 @@ GitHub Agentic Workflows includes automatic threat detection to analyze agent output

## How It Works

Threat detection provides an additional security layer that:

1. **Analyzes Agent Output**: Reviews all safe output items (issues, comments, PRs) for malicious content
2. **Scans Code Changes**: Examines git patches for suspicious patterns, backdoors, and vulnerabilities
3. **Uses Workflow Context**: Leverages the workflow source to distinguish legitimate actions from threats
4. **Runs Automatically**: Executes after the main agentic job completes but before safe outputs are applied
Threat detection provides an additional security layer by analyzing agent output for malicious content, scanning code changes for suspicious patterns, using workflow context to distinguish legitimate actions from threats, and running automatically after the main job completes but before safe outputs are applied.

**Security Architecture:**

@@ -49,11 +44,7 @@ safe-outputs:
create-pull-request:
```

The default configuration uses AI-powered analysis with the Agentic engine to detect:

- **Prompt Injection**: Malicious instructions attempting to manipulate AI behavior
- **Secret Leaks**: Exposed API keys, tokens, passwords, or credentials
- **Malicious Patches**: Code changes introducing vulnerabilities, backdoors, or suspicious patterns
The default configuration uses AI-powered analysis to detect prompt injection (malicious instructions manipulating AI behavior), secret leaks (exposed API keys, tokens, passwords, credentials), and malicious patches (code changes introducing vulnerabilities, backdoors, or suspicious patterns).

## Configuration Options

@@ -89,13 +80,12 @@ safe-outputs:

**Configuration Fields:**

- **`enabled`** (boolean): Enable or disable threat detection. Default: `true` when safe-outputs exist
- **`prompt`** (string): Additional custom instructions appended to the default threat detection prompt
- **`engine`** (string | object | false): AI engine configuration for detection
- String format: `"copilot"`
- Object format: Full engine configuration (same as main workflow engine)
- `false`: Disable AI-based detection, run only custom steps
- **`steps`** (array): Additional GitHub Actions steps to run after AI analysis
| Field | Type | Description |
|-------|------|-------------|
| `enabled` | boolean | Enable or disable detection (default: `true` when safe-outputs exist) |
| `prompt` | string | Custom instructions appended to default detection prompt |
| `engine` | string/object/false | AI engine config (`"copilot"`, full config object, or `false` for no AI) |
| `steps` | array | Additional GitHub Actions steps to run after AI analysis |
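
Putting the table's fields together, a full configuration might look like the sketch below. The nesting under `safe-outputs:` follows the snippets elsewhere on this page; the prompt text and the custom step are hypothetical placeholders:

```yaml
safe-outputs:
  create-pull-request:
  threat-detection:
    enabled: true
    # Hypothetical extra instruction appended to the default detection prompt
    prompt: "Treat any change to CI workflow files as suspicious."
    engine: copilot
    steps:
      # Placeholder custom step; runs after the AI analysis
      - name: Custom scan
        run: echo "run an extra scanner here"
```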

## AI-Based Detection (Default)

@@ -116,10 +106,7 @@ safe-outputs:
Analyze code and create pull requests with improvements.
```

The AI engine receives the workflow source context and analyzes:
- Agent output items (issues, comments, PRs)
- Git patch files with code changes
- Workflow intent and legitimate use cases
The AI engine receives the workflow source context and analyzes agent output items (issues, comments, PRs), git patch files with code changes, and workflow intent to distinguish legitimate actions.

**Output Format:**

@@ -212,20 +199,9 @@ safe-outputs:
path: /tmp/gh-aw/threat-detection/
```

**Available Artifacts:**

Custom steps have access to these downloaded artifacts:
**Available Artifacts:** Custom steps have access to `/tmp/gh-aw/threat-detection/prompt.txt` (workflow prompt), `agent_output.json` (safe output items), and `aw.patch` (git patch file).

- `/tmp/gh-aw/threat-detection/prompt.txt` - Workflow prompt
- `/tmp/gh-aw/threat-detection/agent_output.json` - Safe output items
- `/tmp/gh-aw/threat-detection/aw.patch` - Git patch file

**Execution Order:**

1. Download artifacts (prompt, output, patch)
2. Run AI-based analysis (if engine not disabled)
3. Execute custom steps
4. Upload detection log artifact
**Execution Order:** Download artifacts → Run AI analysis (if enabled) → Execute custom steps → Upload detection log.

## Example: LlamaGuard Integration

@@ -243,74 +219,32 @@ safe-outputs:
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');

// Install Ollama
// Install and start Ollama service
await exec.exec('curl', ['-fsSL', 'https://ollama.com/install.sh', '-o', '/tmp/install.sh']);
await exec.exec('sh', ['/tmp/install.sh']);

// Start Ollama service
exec.exec('ollama', ['serve'], { detached: true });

// Wait for service
let ready = false;
for (let i = 0; i < 30; i++) {
try {
await exec.exec('curl', ['-f', 'http://localhost:11434/api/version'], { silent: true });
ready = true;
break;
} catch (e) {
await new Promise(r => setTimeout(r, 1000));
}
}

if (!ready) {
core.setFailed('Ollama service failed to start');
return;
}

// Pull LlamaGuard model


Copilot AI Nov 25, 2025


The simplified code example in lines 222-238 has a critical flaw: the Ollama service is started in detached mode without any wait or readiness check. This can lead to race conditions where the subsequent commands (pull model, curl API) fail because the service isn't ready yet.

The original implementation included proper service readiness polling which was essential. While the goal is to simplify the example, removing all error handling and readiness checks makes this code unreliable in practice. Consider keeping at least a minimal wait/retry mechanism or adding a comment warning that service readiness checking is needed for production use.

Suggested change
// Wait for Ollama service to be ready (minimal polling)
const http = require('http');
let ready = false;
for (let i = 0; i < 20; i++) { // up to ~10 seconds
try {
await new Promise((resolve, reject) => {
const req = http.get('http://localhost:11434/api/tags', res => {
if (res.statusCode === 200) {
ready = true;
resolve();
} else {
setTimeout(resolve, 500);
}
});
req.on('error', () => setTimeout(resolve, 500));
});
if (ready) break;
} catch (e) {}
}
if (!ready) {
core.setFailed('Ollama service did not become ready in time');
return;
}

// Pull model and scan output
await exec.exec('ollama', ['pull', 'llama-guard3:1b']);

// Scan agent output
const outputPath = '/tmp/gh-aw/threat-detection/agent_output.json';
if (fs.existsSync(outputPath)) {
const content = fs.readFileSync(outputPath, 'utf8');

const response = await exec.getExecOutput('curl', [
'-X', 'POST',
'http://localhost:11434/api/chat',
'-H', 'Content-Type: application/json',
'-d', JSON.stringify({
model: 'llama-guard3:1b',
messages: [{ role: 'user', content }],
stream: false
})
]);

const result = JSON.parse(response.stdout);
const output = result.message?.content || '';

// Check if safe
const isSafe = output.toLowerCase().trim() === 'safe' || output.includes('s8');

if (!isSafe) {
core.setFailed(`LlamaGuard detected threat: ${output}`);
} else {
core.info('✅ Content appears safe');
}
}

timeout-minutes: 20 # Allow time for model download
const content = require('fs').readFileSync('/tmp/gh-aw/threat-detection/agent_output.json', 'utf8');

Copilot AI Nov 25, 2025


The simplified code at lines 229-230 lacks error handling for missing files. If agent_output.json doesn't exist, readFileSync will throw an exception and crash the script. Consider wrapping this in a try-catch block or adding an existence check, or at minimum, add a comment noting that production implementations should check file existence (as referenced in the complete implementation).

Suggested change
const content = require('fs').readFileSync('/tmp/gh-aw/threat-detection/agent_output.json', 'utf8');
// In production, check file existence before reading. Here, we handle missing file gracefully.
let content;
try {
content = require('fs').readFileSync('/tmp/gh-aw/threat-detection/agent_output.json', 'utf8');
} catch (err) {
core.setFailed('agent_output.json not found: ' + err.message);
return;
}

const response = await exec.getExecOutput('curl', [
'-X', 'POST', 'http://localhost:11434/api/chat',
'-H', 'Content-Type: application/json',
'-d', JSON.stringify({ model: 'llama-guard3:1b', messages: [{ role: 'user', content }] })
]);

const result = JSON.parse(response.stdout);
const isSafe = result.message?.content.toLowerCase().includes('safe');

Copilot AI Nov 25, 2025


The condition isSafe check on line 237 is overly simplified and may produce incorrect results. The original implementation checked for both output.toLowerCase().trim() === 'safe' and output.includes('s8'), but this only checks if 'safe' appears anywhere in the content (case-insensitive). This could lead to false negatives if the response contains 'safe' as part of a larger warning message. Consider adding a comment noting this simplification or being more explicit about the check (e.g., checking for 'safe' at the start of the response).

Suggested change
const isSafe = result.message?.content.toLowerCase().includes('safe');
// Check for exact "safe" response or model-specific code (e.g., "s8").
const output = result.message?.content?.toLowerCase().trim();
const isSafe = output === 'safe' || output?.includes('s8'); // optional chaining guards against an empty response

if (!isSafe) core.setFailed('LlamaGuard detected threat');

Copilot AI Nov 25, 2025


The error message "LlamaGuard detected threat" on line 238 is less informative than the original implementation which included the actual threat output. When a threat is detected, users need to know what the threat was, not just that one exists. Consider including at least a reference to checking the logs or adding ${result.message?.content} to provide actionable information.

Suggested change
if (!isSafe) core.setFailed('LlamaGuard detected threat');
if (!isSafe) core.setFailed(`LlamaGuard detected threat: ${result.message?.content}`);


timeout-minutes: 20
---

# Code Review Agent

Analyze and improve code with LlamaGuard threat scanning.
```

:::tip
For a complete LlamaGuard implementation, see `.github/workflows/shared/ollama-threat-scan.md` in the repository.
For a complete implementation with error handling and service readiness checks, see `.github/workflows/shared/ollama-threat-scan.md` in the repository.
:::

## Combined AI and Custom Detection
@@ -354,78 +288,22 @@ If the detection process itself fails (e.g., network issues, tool errors), the workflow

## Best Practices

### When to Use AI Detection

**Use AI-based detection when:**
- Analyzing natural language content (issues, comments, discussions)
- Detecting sophisticated prompt injection attempts
- Understanding context-specific security risks
- Identifying intent-based threats

### When to Use Custom Steps

**Add custom steps when:**
- Integrating specialized security tools (Semgrep, Snyk, TruffleHog)
- Enforcing organization-specific security policies
- Scanning for domain-specific vulnerabilities
- Meeting compliance requirements

### Performance Considerations
**Use AI-based detection** for analyzing natural language content, detecting sophisticated prompt injection, understanding context-specific risks, and identifying intent-based threats.

- **AI Analysis**: Typically completes in 10-30 seconds
- **Custom Tools**: Varies by tool (LlamaGuard: 5-15 minutes with model download)
- **Timeout**: Set appropriate `timeout-minutes` for custom tools
- **Artifact Size**: Large patches may require truncation for analysis
**Add custom steps** for integrating specialized security tools (Semgrep, Snyk, TruffleHog), enforcing organization policies, scanning domain-specific vulnerabilities, and meeting compliance requirements.

### Security Recommendations
**Performance:** AI analysis typically completes in 10-30 seconds. Custom tools vary (LlamaGuard: 5-15 minutes with model download). Set appropriate `timeout-minutes` and truncate large patches if needed.

1. **Defense in Depth**: Use both AI and custom detection for critical workflows
2. **Regular Updates**: Keep custom security tools and models up to date
3. **Test Thoroughly**: Validate detection with known malicious samples
4. **Monitor False Positives**: Review blocked outputs to refine detection logic
5. **Document Rationale**: Comment why specific detection rules exist
**Security:** Use defense-in-depth with both AI and custom detection for critical workflows. Keep tools updated, validate with known malicious samples, monitor false positives, and document detection rationale.

## Troubleshooting

### AI Detection Always Fails

**Symptom**: Every workflow execution reports threats

**Solutions**:
- Review custom prompt for overly strict instructions
- Check if legitimate workflow patterns trigger detection
- Adjust prompt to provide better context
- Use `threat-detection.enabled: false` temporarily to test

### Custom Steps Not Running

**Symptom**: Steps in `threat-detection.steps` don't execute

**Check**:
- Verify YAML indentation is correct
- Ensure steps array is properly formatted
- Review workflow compilation output for errors
- Check if AI detection failed before custom steps

### Large Patches Cause Timeouts

**Symptom**: Detection times out with large code changes

**Solutions**:
- Increase `timeout-minutes` in workflow frontmatter
- Configure `max-patch-size` to limit patch size
- Truncate content before analysis in custom steps
- Split large changes into smaller PRs

### False Positives

**Symptom**: Legitimate content flagged as malicious

**Solutions**:
- Refine custom prompt with specific exclusions
- Adjust custom detection tool thresholds
- Add workflow context explaining legitimate patterns
- Review detection logs to understand trigger patterns
| Issue | Solution |
|-------|----------|
| **AI detection always fails** | Review custom prompt for overly strict instructions, check if legitimate patterns trigger detection, adjust prompt context, or temporarily disable to test |
| **Custom steps not running** | Verify YAML indentation, ensure steps array is properly formatted, review compilation output, check if AI detection failed first |
| **Large patches cause timeouts** | Increase `timeout-minutes`, configure `max-patch-size`, truncate content before analysis, or split changes into smaller PRs |
| **False positives** | Refine prompt with specific exclusions, adjust tool thresholds, add workflow context explaining patterns, review detection logs |
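
For the timeout row, the two knobs can be sketched in workflow frontmatter as below. The placement of `max-patch-size` under `safe-outputs:` is an assumption (the key name comes from the table above), and its units are not specified here:

```yaml
timeout-minutes: 30        # raise the overall budget for slow detection tools
safe-outputs:
  max-patch-size: 1024     # assumed placement: cap the patch size passed to analysis
  create-pull-request:
```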

## Related Documentation
