
@shivasurya
Owner

Objective

Implement human-readable text output for the scan command with detection type badges, code snippets, severity grouping, and taint flow visualization.

Changes

New Files

  • output/text_formatter.go (268 lines) - TextFormatter implementation
  • output/text_formatter_test.go (596 lines) - Comprehensive tests

Modified Files

  • cmd/scan.go - Integrated enrichment pipeline and text formatter

Features

Detection Type Badges: [Pattern], [Taint-Local], [Taint-Global]
Severity Grouping: Critical → High → Medium → Low (ordered by priority)
Detail Levels:

  • Critical/High: Full details with code snippets, taint flow, confidence
  • Medium/Low: Single-line abbreviated format

Code Snippets: Line numbers with highlight markers (>)
Taint Flow Visualization: Source → Sink with variable tracking
Summary Statistics: Total findings, severity breakdown
Verbose Mode: Detection method breakdown
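Badges, severity grouping, and the two detail levels combine roughly as follows. This is a minimal sketch under assumptions, not TextFormatter's actual code: `Finding`, `badge`, and `render` are hypothetical names, severities are assumed to be lowercase strings, and the full-detail block is reduced to a location line.

```go
package main

import (
	"fmt"
	"strings"
)

// Finding is a hypothetical, trimmed-down finding used only for this sketch.
type Finding struct {
	Severity      string // "critical", "high", "medium", or "low"
	DetectionType string // "pattern", "taint-local", or "taint-global"
	RuleID        string
	File          string
	Line          int
}

// severityOrder lists groups from highest to lowest priority.
var severityOrder = []string{"critical", "high", "medium", "low"}

// badge maps a detection type to the bracketed label shown in text output.
func badge(detectionType string) string {
	switch detectionType {
	case "taint-local":
		return "[Taint-Local]"
	case "taint-global":
		return "[Taint-Global]"
	default:
		return "[Pattern]"
	}
}

// render groups findings by severity in priority order. Critical/high get a
// multi-line block; medium/low get one abbreviated line. Findings with an
// unrecognized severity are skipped in this sketch.
func render(findings []Finding) string {
	groups := make(map[string][]Finding)
	for _, f := range findings {
		groups[f.Severity] = append(groups[f.Severity], f)
	}
	var b strings.Builder
	for _, sev := range severityOrder {
		fs := groups[sev]
		if len(fs) == 0 {
			continue
		}
		fmt.Fprintf(&b, "%s Issues (%d):\n", strings.ToUpper(sev[:1])+sev[1:], len(fs))
		for _, f := range fs {
			if sev == "critical" || sev == "high" {
				fmt.Fprintf(&b, "  [%s] %s %s\n    %s:%d\n", sev, badge(f.DetectionType), f.RuleID, f.File, f.Line)
			} else {
				fmt.Fprintf(&b, "  [%s] %s %s %s:%d\n", sev, badge(f.DetectionType), f.RuleID, f.File, f.Line)
			}
		}
	}
	return b.String()
}

func main() {
	fmt.Print(render([]Finding{
		{Severity: "low", DetectionType: "pattern", RuleID: "debug-enabled", File: "app.py", Line: 3},
		{Severity: "critical", DetectionType: "taint-local", RuleID: "command-injection", File: "auth/login.py", Line: 10},
	}))
}
```

Iterating over a fixed `severityOrder` slice, rather than over the map, is what guarantees Critical is always printed before Low regardless of input order.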

Test Results

  • ✅ All Go tests passing (19 packages)
  • ✅ Text formatter coverage: 100%
  • ✅ Output package coverage: 98.4%
  • ✅ All Python tests passing (185 tests)
  • ✅ Linting: 0 issues

Commits

  1. Add text formatter with rich output - Core formatter implementation with 100% test coverage
  2. Integrate text formatter in scan command - Replace old output with enrichment pipeline

Example Output

Code Pathfinder Security Scan

Results:

Critical Issues (1):

  [critical] [Taint-Local] command-injection: Command Injection
    CWE-78 | A03:2021

    auth/login.py:10

      > 10 | eval(user_input)

    Flow: user_input (line 5) -> eval (line 10)
    Tainted variable 'user_input' reaches dangerous sink without sanitization

    Confidence: High | Detection: Intra-procedural taint analysis

Summary:
  1 findings across 5 rules
  1 critical

Dependencies

Tech Spec Reference

Implements Section 4.1 of output-standardization tech spec


🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

@safedep

safedep bot commented Nov 21, 2025

SafeDep Report Summary

Badges: Malicious Packages (green), Vulnerable Packages (green), Risky License (green)

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep GitHub App

@codecov

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 90.11628% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.66%. Comparing base (962a10c) to head (3f7d794).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sourcecode-parser/cmd/scan.go | 0.00% | 17 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #393      +/-   ##
==========================================
+ Coverage   79.33%   79.66%   +0.33%     
==========================================
  Files          74       75       +1     
  Lines        7378     7542     +164     
==========================================
+ Hits         5853     6008     +155     
- Misses       1283     1292       +9     
  Partials      242      242              

Owner Author

shivasurya commented Nov 22, 2025

Merge activity

  • Nov 22, 12:38 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Nov 22, 12:39 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 22, 12:40 AM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya shivasurya changed the base branch from shiva/output-logging-system to graphite-base/393 November 22, 2025 00:38
@shivasurya shivasurya changed the base branch from graphite-base/393 to main November 22, 2025 00:39
shivasurya and others added 2 commits November 22, 2025 00:39
- Detection type badges (Pattern, Taint-Local, Taint-Global)
- Severity-based grouping with detail levels
- Code snippets with line numbers and highlight
- Taint flow visualization
- Summary statistics
- Comprehensive tests with 100% coverage

Part of output standardization feature.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace old printDetections() with enrichment pipeline
- Add enricher to add context and metadata to detections
- Connect enricher -> formatter flow for rich output
- Keep printDetections() for query command compatibility

Part of output standardization feature.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@shivasurya shivasurya force-pushed the shiva/output-text-formatter branch from d6b8b1c to 3f7d794 Compare November 22, 2025 00:39
@shivasurya shivasurya merged commit dd0a468 into main Nov 22, 2025
3 checks passed
@shivasurya shivasurya deleted the shiva/output-text-formatter branch November 22, 2025 00:40
shivasurya added a commit that referenced this pull request Nov 22, 2025
## Summary
Implements JSON and CSV output formatters for the `ci` command, replacing the old inline JSON generation with a modular, well-tested implementation.

**Part of output-standardization tech spec (Stacked PRs)**
- ✅ PR #1: Logging System Infrastructure (#391) - **Merged**
- ✅ PR #2: Output Package Foundation (#392) - **In Review**
- ✅ PR #3: Text Formatter for Scan Command (#393) - **In Review**
- 🔄 PR #4: JSON and CSV Formatters ← **This PR**

## Changes

### New Files
- `output/json_formatter.go` (235 lines)
  - Enhanced JSON output with rich metadata structure
  - Tool, scan, results, summary, and errors sections
  - Code snippets with configurable context lines
  - Taint flow source/sink information
  - CWE, OWASP, and reference metadata
  
- `output/csv_formatter.go` (123 lines)
  - CSV output for CI/CD integration
  - 17 columns: severity, confidence, rule_id, rule_name, cwe, owasp, file, line, column, function, message, detection_type, detection_scope, source_line, sink_line, tainted_var, sink_call
  - Proper escaping via encoding/csv package

- `output/json_formatter_test.go` (415 lines)
  - Comprehensive tests achieving 100% coverage
  - Structure validation, snippet handling, metadata, pattern vs taint detection

- `output/csv_formatter_test.go` (395 lines)
  - Comprehensive tests achieving 100% coverage
  - Header validation, escaping, multiple rows, zero values

### Modified Files
- `cmd/ci.go`
  - Replaced old `generateJSONOutput()` with new formatter integration
  - Added enrichment pipeline using `output.NewEnricher()`
  - Updated output format validation to include "csv"
  - Added CSV formatter support
  - Updated help text and examples
  - Retains exit code 1 when vulnerabilities are found (for CI/CD)

- `cmd/ci_test.go`
  - Skipped obsolete `TestGenerateJSONOutput` (replaced by new formatter tests)

- `main_test.go`
  - Updated expected help text to include CSV output format

## JSON Output Structure
```json
{
  "tool": {
    "name": "Code Pathfinder",
    "version": "1.0.0",
    "url": "https://codepathfinder.dev"
  },
  "scan": {
    "target": "/path/to/project",
    "timestamp": "2025-01-21T10:30:00Z",
    "duration": 5.43,
    "rules_executed": 12
  },
  "results": [{
    "rule_id": "sql-injection",
    "rule_name": "SQL Injection",
    "message": "Unsanitized user input flows to SQL query",
    "severity": "critical",
    "confidence": "high",
    "location": {
      "file": "src/main.py",
      "line": 42,
      "column": 8,
      "function": "process_user",
      "snippet": {
        "start_line": 40,
        "end_line": 44,
        "lines": ["...", "query = f\"SELECT * FROM users WHERE id={user_id}\"", "..."]
      }
    },
    "detection": {
      "type": "taint-local",
      "scope": "intra-procedural",
      "confidence_score": 0.95,
      "source": {"line": 38, "variable": "user_id"},
      "sink": {"line": 42, "call": "execute"}
    },
    "metadata": {
      "cwe": ["CWE-89"],
      "owasp": ["A03:2021"],
      "references": ["https://..."]
    }
  }],
  "summary": {
    "total": 5,
    "by_severity": {"critical": 2, "high": 3},
    "by_detection_type": {"taint-local": 4, "pattern": 1}
  },
  "errors": []
}
```

## CSV Output Format
```csv
severity,confidence,rule_id,rule_name,cwe,owasp,file,line,column,function,message,detection_type,detection_scope,source_line,sink_line,tainted_var,sink_call
critical,high,sql-injection,SQL Injection,CWE-89,A03:2021,src/main.py,42,8,process_user,Unsanitized user input flows to SQL query,taint-local,intra-procedural,38,42,user_id,execute
```

## Testing
- All tests passing (100% coverage for both formatters)
- Output package overall: 98.1% coverage
- Linting checks passed
- Integration tests with ci command verified

## Usage Examples
```bash
# Generate JSON report
pathfinder ci --rules rules/ --project . --output json > results.json

# Generate CSV report  
pathfinder ci --rules rules/ --project . --output csv > results.csv

# Generate SARIF report (existing)
pathfinder ci --rules rules/ --project . --output sarif > results.sarif
```

## Breaking Changes
- Old `generateJSONOutput()` function removed from cmd/ci.go
- JSON output structure changed to new rich format (snake_case fields)
- Exit code behavior unchanged (exits 1 when vulnerabilities found)

## Stack Status
This PR stacks on:
- **PR #3**: shiva/output-text-formatter (#393) ← base branch
- **PR #2**: shiva/output-logging-system (#392)
- **main**: Production branch

Next PR:
- PR #5: SARIF Formatter Enhancement (will stack on this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
shivasurya added a commit that referenced this pull request Nov 22, 2025
## Summary
Implements enhanced SARIF formatter with code flows, related locations, and rich metadata for optimal GitHub Code Scanning integration.

**Part of output-standardization tech spec (Stacked PRs)**
- ✅ PR #1: Logging System Infrastructure (#391) - **Merged**
- ✅ PR #2: Output Package Foundation (#392) - **In Review**
- ✅ PR #3: Text Formatter for Scan Command (#393) - **In Review**
- ✅ PR #4: JSON and CSV Formatters (#394) - **In Review**
- 🔄 PR #5: Enhanced SARIF Formatter ← **This PR**

## Changes

### New Files
- `output/sarif_formatter.go` (290 lines)
  - SARIF 2.1.0 compliant output formatter
  - Code flows for taint path visualization (source → sink)
  - Related locations for taint sources
  - Help text with markdown and CWE references
  - Security severity scores (9.0, 7.0, 5.0, 3.0)
  - Rule properties: tags, precision
  - Deduplicates rules across multiple detections

- `output/sarif_formatter_test.go` (519 lines)
  - Comprehensive tests achieving 97.5% coverage
  - Tests for version, tool metadata, rules, results
  - Code flow generation tests (taint-local, taint-global)
  - Related locations validation
  - Pattern vs taint detection differentiation
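Rule deduplication across detections can be done in one pass while preserving first-seen order. This is a hypothetical sketch (`Rule`, `Hit`, and `uniqueRules` are invented names); the returned index map corresponds to SARIF's `result.ruleIndex`, which points into the driver's `rules` array.

```go
package main

import "fmt"

// Rule is a hypothetical subset of a SARIF reportingDescriptor.
type Rule struct {
	ID   string
	Name string
}

// Hit is a hypothetical detection that references a rule.
type Hit struct {
	Rule Rule
	Line int
}

// uniqueRules returns each rule once, in first-seen order, plus a map from
// rule ID to its position in that slice.
func uniqueRules(hits []Hit) ([]Rule, map[string]int) {
	rules := []Rule{}
	index := make(map[string]int)
	for _, h := range hits {
		if _, seen := index[h.Rule.ID]; seen {
			continue
		}
		index[h.Rule.ID] = len(rules)
		rules = append(rules, h.Rule)
	}
	return rules, index
}

func main() {
	sqli := Rule{ID: "sql-injection", Name: "SQL Injection"}
	cmdi := Rule{ID: "command-injection", Name: "Command Injection"}
	rules, index := uniqueRules([]Hit{{sqli, 42}, {cmdi, 10}, {sqli, 77}})
	fmt.Println(len(rules), index["sql-injection"], index["command-injection"])
}
```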

### Modified Files
- `cmd/ci.go`
  - Replaced old `generateSARIFOutput()` with new formatter
  - Uses enriched detections for rich output
  - Removed unused imports (sarif library, encoding/json)
  - Consistent pattern with JSON and CSV formatters

- `cmd/ci_test.go`
  - Skipped obsolete SARIF tests
  - Removed unused helper functions

## Key Features

### Code Flows
Taint detections automatically include code flows showing the path from source to sink:

```json
{
  "codeFlows": [{
    "message": {"text": "Taint flow from line 10 to line 20"},
    "threadFlows": [{
      "locations": [
        {
          "location": {"physicalLocation": {"region": {"startLine": 10}}},
          "message": {"text": "Taint source: user_input"}
        },
        {
          "location": {"physicalLocation": {"region": {"startLine": 20}}},
          "message": {"text": "Taint sink: os.system"}
        }
      ]
    }]
  }]
}
```

### Help Text with Markdown
Rules include rich help text with CWE references:

```markdown
## Command Injection

User input flows to shell command without sanitization

### References
- [CWE-78](https://cwe.mitre.org/data/definitions/78.html)
```

### Security Severity Scores
GitHub-compatible severity scores for prioritization:
- Critical: 9.0
- High: 7.0
- Medium: 5.0
- Low: 3.0

### Rule Properties
```json
{
  "properties": {
    "tags": ["security"],
    "security-severity": "9.0",
    "precision": "high"
  }
}
```

## Benefits over Old Implementation

| Feature | Old | New |
|---------|-----|-----|
| Code flows | ❌ None | ✅ Source → Sink visualization |
| Related locations | ❌ None | ✅ Taint sources highlighted |
| Help text | ❌ Plain text | ✅ Markdown with references |
| Security severity | ❌ Level only | ✅ Numeric scores for GitHub |
| Rule properties | ❌ None | ✅ Tags, precision |
| Pattern detection | ❌ Same as taint | ✅ No code flows (correct) |
| Test coverage | ❌ ~60% | ✅ 97.5% |

## Testing
- All tests passing (97.5% coverage on SARIF formatter)
- Output package overall: 97.5% coverage
- Linting checks passed
- Integration with ci command verified

## Usage Examples
```bash
# Generate enhanced SARIF report with code flows
pathfinder ci --rules rules/ --project . --output sarif > results.sarif

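# Upload to GitHub Code Scanning (the REST endpoint expects the SARIF file
# gzip-compressed and base64-encoded, plus the commit SHA and ref it describes)
gzip -c results.sarif | base64 | tr -d '\n' > results.sarif.b64
gh api repos/{owner}/{repo}/code-scanning/sarifs \
  -f commit_sha="$(git rev-parse HEAD)" \
  -f ref="$(git symbolic-ref HEAD)" \
  -F sarif=@results.sarif.b64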

# View in GitHub UI with code flows highlighted
```

## SARIF Output Sample
```json
{
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "Code Pathfinder",
        "version": "0.0.25",
        "rules": [{
          "id": "sql-injection",
          "name": "SQL Injection",
          "fullDescription": {"text": "Unsanitized user input flows to SQL query (CWE-89, A03:2021)"},
          "helpUri": "https://github.com/shivasurya/code-pathfinder",
          "defaultConfiguration": {"level": "error"},
          "properties": {
            "tags": ["security"],
            "security-severity": "9.0",
            "precision": "high"
          }
        }]
      }
    },
    "results": [{
      "ruleId": "sql-injection",
      "message": {"text": "Unsanitized user input flows to SQL query (sink: execute, confidence: 95%)"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/db/queries.py"},
          "region": {"startLine": 42, "startColumn": 8}
        }
      }],
      "codeFlows": [...],
      "relatedLocations": [...]
    }]
  }]
}
```

## Breaking Changes
- Old `generateSARIFOutput()` function removed
- SARIF output structure enhanced with additional fields
- Pattern matches no longer include code flows (correct behavior)

## Stack Status
This PR stacks on:
- **PR #4**: shiva/output-json-csv-formatters (#394) ← base branch
- **PR #3**: shiva/output-text-formatter (#393)
- **PR #2**: shiva/output-logging-system (#392)
- **main**: Production branch

Next PR:
- PR #6: Exit Code Standardization (will stack on this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)