Reduce PR Iteration Time: Too Many CI Failures Before Success

# Reduce PR Iteration Time: Too Many CI Failures Before Success

## Problem Statement

Currently, it takes **6-8 iterations** for a PR to pass CI, which is inefficient and frustrating. The industry standard is **2-4 iterations**, and we should aim for **~3 commits to green CI**.

### Recent Example (PR #506)

Test fixes after PYTHONPATH removal required multiple iterations:
1. Initial commit: Test fixes
2. Pre-commit hook failures (ruff config paths)
3. YAML syntax errors in pre-commit config
4. Secrets detection failures
5. Ansible-lint broken
6. Playwright import errors in CI
7. Final success

**Total time wasted:** ~45 minutes across iterations

## Root Causes

### 1. **The "Unseen State" Problem**
Issues only surface at runtime:
- Pre-commit hooks have specific configurations
- CI environment differs from local (Python versions, dependencies, paths)
- Some hooks are broken (ansible-lint) but hidden until commit
- Test collection happens differently in CI vs local

### 2. **The "Cascade Effect"**
```
Fix tests → Commit → Pre-commit fails → Fix config → Commit → New issue → Repeat
```

Each fix reveals a new problem because:
- Can't run pre-commit hooks **before** committing (chicken-and-egg)
- Each hook has different requirements/environments
- 3 different environments with different failure modes:
  - Local development
  - Pre-commit isolated virtualenvs
  - GitHub Actions CI

### 3. **No Pre-flight Checks**
We lack automated checks that run **before** attempting to commit, leading to:
- Discovering issues one at a time
- Repeated commit → fail → fix → commit cycles
- Wasted time waiting for CI

## Proposed Solutions

### Strategy 1: **Pre-flight Check Script** (Priority: HIGH)

Create `.local/pre-commit-check.sh`:

```bash
#!/bin/bash
set -e

echo "🔍 Running pre-flight checks..."

# 1. Run all test suites
echo "1️⃣ Running tests..."
make test-unit-fast || exit 1
make test-atomic || exit 1
make test-integration || exit 1

# 2. Run pre-commit on all files
echo "2️⃣ Running pre-commit checks..."
pre-commit run --all-files || echo "⚠️  Pre-commit issues found - review above"

# 3. Check for common issues
echo "3️⃣ Checking for common issues..."
echo "  Checking for backend. prefix in tests..."
if grep -r "backend\.rag_solution" tests/ 2>/dev/null; then
    echo "  ❌ Found backend. prefix in test files"
    exit 1
fi

echo "  Checking for PYTHONPATH in Makefile..."
if grep "PYTHONPATH" Makefile 2>/dev/null; then
    echo "  ❌ PYTHONPATH still in Makefile"
    exit 1
fi

# 4. Check pytest can collect all tests
echo "4️⃣ Checking pytest collection..."
poetry run pytest --collect-only tests/ -q || exit 1

# 5. Validate YAML configs
echo "5️⃣ Validating configs..."
pre-commit validate-config || exit 1
python3 -c "import yaml; yaml.safe_load(open('.pre-commit-config.yaml'))" || exit 1

echo "✅ All pre-flight checks passed! Safe to commit."
```

**Usage:**
```bash
chmod +x .local/pre-commit-check.sh
./.local/pre-commit-check.sh  # Run before every commit
```

**Benefits:**
- Catches 80% of issues before committing
- Runs in 2-3 minutes (faster than commit → CI → fix cycle)
- Can be added to git hooks or Makefile

### Strategy 2: **Local CI Testing with `act`** (Priority: MEDIUM)

Install and use `act` to run GitHub Actions locally:

```bash
# Install act (macOS)
brew install act

# Run CI pipeline locally BEFORE pushing
act -j pytest              # Run pytest job
act -j lint                # Run lint job
act -j security            # Run security job

# Run all jobs
act
```

**Benefits:**
- Catches CI-specific issues before pushing
- Tests in same environment as GitHub Actions
- Saves CI minutes and iteration time

### Strategy 3: **Incremental Commits with Draft PRs** (Priority: MEDIUM)

Change workflow to fail fast:

```bash
# Old way: Try to get everything perfect, then commit
# New way: Commit early, iterate in draft PR

# 1. Make changes, commit immediately
git add tests/
git commit -m "fix: test fixes (WIP)"
git push origin feature-branch

# 2. Create DRAFT PR immediately
gh pr create --draft --title "[WIP] Test fixes"

# 3. Let CI run, see what breaks
# 4. Fix CI issues in separate small commits
git commit -m "fix(ci): exclude playwright"
git push

# 5. Mark ready when green
gh pr ready
```

**Benefits:**
- See CI failures incrementally
- Smaller commits easier to debug
- Less "surprise" issues at the end
- CI runs while you work on other things

### Strategy 4: **CI Simulation Make Target** (Priority: HIGH)

Add to Makefile:

```makefile
.PHONY: ci-local
ci-local: clean
	@echo "$(CYAN)🚀 Running full CI pipeline locally...$(NC)"
	@echo ""
	@echo "$(CYAN)1️⃣ Validating configuration...$(NC)"
	@pre-commit validate-config
	@poetry check --lock
	@echo "$(GREEN)✅ Config valid$(NC)"
	@echo ""
	@echo "$(CYAN)2️⃣ Running linters...$(NC)"
	@$(MAKE) lint
	@echo "$(GREEN)✅ Linting passed$(NC)"
	@echo ""
	@echo "$(CYAN)3️⃣ Running security checks...$(NC)"
	@$(MAKE) security-check
	@echo "$(GREEN)✅ Security passed$(NC)"
	@echo ""
	@echo "$(CYAN)4️⃣ Running test suite...$(NC)"
	@$(MAKE) test-atomic
	@$(MAKE) test-unit-fast
	@$(MAKE) test-integration
	@echo "$(GREEN)✅ All tests passed$(NC)"
	@echo ""
	@echo "$(CYAN)5️⃣ Checking pytest collection...$(NC)"
	@poetry run pytest --collect-only tests/ -q
	@echo "$(GREEN)✅ Test collection passed$(NC)"
	@echo ""
	@echo "$(GREEN)✅✅✅ Local CI pipeline passed! Safe to push. ✅✅✅$(NC)"
```

**Usage:**
```bash
make ci-local  # Run before pushing
```

### Strategy 5: **Fix Broken Pre-commit Hooks** (Priority: HIGH)

Current broken hooks need fixing:

1. **ansible-lint**: Missing ansible core dependency
   ```bash
   # Fix: Either install ansible-core in pre-commit env or skip for Python projects
   # Recommendation: Remove ansible-lint hook (not applicable to Python backend)
   ```

2. **detect-secrets**: Flags test fixtures as secrets
   ```bash
   # Fix: Update .secrets.baseline regularly
   detect-secrets scan --baseline .secrets.baseline
   git add .secrets.baseline
   ```

3. **markdownlint**: Line length too strict (80 chars)
   ```bash
   # Already fixed: .markdownlint.json with 120 char limit
   ```

### Strategy 6: **Pre-commit Staging Configuration** (Priority: LOW)

Update `.pre-commit-config.yaml` to separate fast vs slow hooks:

```yaml
# Fast hooks (run on commit)
- repo: https://github.com/pre-commit/pre-commit-hooks
  hooks:
    - id: trailing-whitespace
      stages: [commit]
    - id: end-of-file-fixer
      stages: [commit]

# Slow hooks (run on push)
- repo: https://github.com/astral-sh/ruff-pre-commit
  hooks:
    - id: ruff
      stages: [push]  # Already configured
    - id: ruff-format
      stages: [push]
```

## Implementation Plan

### Phase 1: Quick Wins (1-2 hours)
- [ ] Create `.local/pre-commit-check.sh` script
- [ ] Add `make ci-local` target to Makefile
- [ ] Update `.secrets.baseline` to include test fixtures
- [ ] Remove broken ansible-lint hook from pre-commit config
- [ ] Document in CLAUDE.md: "Always run `make ci-local` before pushing"

### Phase 2: Environment Improvements (2-3 hours)
- [ ] Install and configure `act` for local GitHub Actions testing
- [ ] Create `.actrc` configuration file
- [ ] Add `make test-ci-local` using act
- [ ] Test act with current CI workflows

### Phase 3: Process Changes (Ongoing)
- [ ] Adopt "draft PR early, iterate in public" workflow
- [ ] Update contributing guide with new workflow
- [ ] Add pre-flight check to git pre-push hook (optional)
- [ ] Train team on new tools and processes

### Phase 4: CI/CD Optimization (Separate Issue)
- [ ] Audit all GitHub Actions workflows for duplication
- [ ] Optimize parallel job execution
- [ ] Review caching strategies
- [ ] See issue #XXX for full CI/CD optimization

## Success Metrics

**Current State:**
- Average iterations to green CI: 6-8
- Average time per PR: 45-60 minutes wasted on iterations
- Developer frustration: High

**Target State:**
- Average iterations to green CI: 2-3
- Average time per PR: 15-20 minutes (pre-flight check + 1-2 quick fixes)
- Developer frustration: Low
- Confidence before pushing: High

**How to Measure:**
- Track "commits per PR" metric
- Track "CI failure rate" (failures / total runs)
- Survey developer satisfaction quarterly

## Related Issues

- #505 - E2E test refactoring
- #348 - Makefile simplification
- #XXX - CI/CD optimization (to be created)

## Questions for Discussion

1. Should we make `make ci-local` mandatory before pushing (pre-push hook)?
2. Should we invest in `act` for local CI testing or just improve pre-flight checks?
3. Should we adopt "draft PR early" as standard workflow?
4. Should we remove non-applicable hooks (ansible-lint) or fix them?

---

**Priority**: Medium  
**Effort**: 4-6 hours total (spread across phases)  
**Impact**: High - reduces developer friction, increases velocity, improves morale


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Reduce PR Iteration Time: Too Many CI Failures Before Success #507

Reduce PR Iteration Time: Too Many CI Failures Before Success

Problem Statement

Recent Example (PR #506)

Root Causes

1. The "Unseen State" Problem

2. The "Cascade Effect"

3. No Pre-flight Checks

Proposed Solutions

Strategy 1: Pre-flight Check Script (Priority: HIGH)

Strategy 2: Local CI Testing with `act` (Priority: MEDIUM)

Strategy 3: Incremental Commits with Draft PRs (Priority: MEDIUM)

Strategy 4: CI Simulation Make Target (Priority: HIGH)

Strategy 5: Fix Broken Pre-commit Hooks (Priority: HIGH)

Strategy 6: Pre-commit Staging Configuration (Priority: LOW)

Implementation Plan

Phase 1: Quick Wins (1-2 hours)

Phase 2: Environment Improvements (2-3 hours)

Phase 3: Process Changes (Ongoing)

Phase 4: CI/CD Optimization (Separate Issue)

Success Metrics

Related Issues

Questions for Discussion

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Reduce PR Iteration Time: Too Many CI Failures Before Success #507

Description

Reduce PR Iteration Time: Too Many CI Failures Before Success

Problem Statement

Recent Example (PR #506)

Root Causes

1. The "Unseen State" Problem

2. The "Cascade Effect"

3. No Pre-flight Checks

Proposed Solutions

Strategy 1: Pre-flight Check Script (Priority: HIGH)

Strategy 2: Local CI Testing with act (Priority: MEDIUM)

Strategy 3: Incremental Commits with Draft PRs (Priority: MEDIUM)

Strategy 4: CI Simulation Make Target (Priority: HIGH)

Strategy 5: Fix Broken Pre-commit Hooks (Priority: HIGH)

Strategy 6: Pre-commit Staging Configuration (Priority: LOW)

Implementation Plan

Phase 1: Quick Wins (1-2 hours)

Phase 2: Environment Improvements (2-3 hours)

Phase 3: Process Changes (Ongoing)

Phase 4: CI/CD Optimization (Separate Issue)

Success Metrics

Related Issues

Questions for Discussion

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Strategy 2: Local CI Testing with `act` (Priority: MEDIUM)