Skip to content

Reduce PR Iteration Time: Too Many CI Failures Before Success #507

@manavgup

Description

@manavgup

Reduce PR Iteration Time: Too Many CI Failures Before Success

Problem Statement

Currently, it takes 6-8 iterations for a PR to pass CI, which is inefficient and frustrating. The industry standard is 2-4 iterations, and we should aim for ~3 commits to green CI.

Recent Example (PR #506)

Test fixes after PYTHONPATH removal required multiple iterations:

  1. Initial commit: Test fixes
  2. Pre-commit hook failures (ruff config paths)
  3. YAML syntax errors in pre-commit config
  4. Secrets detection failures
  5. Ansible-lint broken
  6. Playwright import errors in CI
  7. Final success

Total time wasted: ~45 minutes across iterations

Root Causes

1. The "Unseen State" Problem

Issues only surface at runtime:

  • Pre-commit hooks have specific configurations
  • CI environment differs from local (Python versions, dependencies, paths)
  • Some hooks are broken (ansible-lint) but hidden until commit
  • Test collection happens differently in CI vs local

2. The "Cascade Effect"

Fix tests → Commit → Pre-commit fails → Fix config → Commit → New issue → Repeat

Each fix reveals a new problem because:

  • Can't run pre-commit hooks before committing (chicken-and-egg)
  • Each hook has different requirements/environments
  • 3 different environments with different failure modes:
    • Local development
    • Pre-commit isolated virtualenvs
    • GitHub Actions CI

3. No Pre-flight Checks

We lack automated checks that run before attempting to commit, leading to:

  • Discovering issues one at a time
  • Repeated commit → fail → fix → commit cycles
  • Wasted time waiting for CI

Proposed Solutions

Strategy 1: Pre-flight Check Script (Priority: HIGH)

Create .local/pre-commit-check.sh:

#!/bin/bash
set -e

echo "🔍 Running pre-flight checks..."

# 1. Run all test suites
echo "1️⃣ Running tests..."
make test-unit-fast || exit 1
make test-atomic || exit 1
make test-integration || exit 1

# 2. Run pre-commit on all files
echo "2️⃣ Running pre-commit checks..."
pre-commit run --all-files || echo "⚠️  Pre-commit issues found - review above"

# 3. Check for common issues
echo "3️⃣ Checking for common issues..."
echo "  Checking for backend. prefix in tests..."
if grep -r "backend\.rag_solution" tests/ 2>/dev/null; then
    echo "  ❌ Found backend. prefix in test files"
    exit 1
fi

echo "  Checking for PYTHONPATH in Makefile..."
if grep "PYTHONPATH" Makefile 2>/dev/null; then
    echo "  ❌ PYTHONPATH still in Makefile"
    exit 1
fi

# 4. Check pytest can collect all tests
echo "4️⃣ Checking pytest collection..."
poetry run pytest --collect-only tests/ -q || exit 1

# 5. Validate YAML configs
echo "5️⃣ Validating configs..."
pre-commit validate-config || exit 1
python3 -c "import yaml; yaml.safe_load(open('.pre-commit-config.yaml'))" || exit 1

echo "✅ All pre-flight checks passed! Safe to commit."

Usage:

chmod +x .local/pre-commit-check.sh
./.local/pre-commit-check.sh  # Run before every commit

Benefits:

  • Catches 80% of issues before committing
  • Runs in 2-3 minutes (faster than commit → CI → fix cycle)
  • Can be added to git hooks or Makefile

Strategy 2: Local CI Testing with act (Priority: MEDIUM)

Install and use act to run GitHub Actions locally:

# Install act (macOS)
brew install act

# Run CI pipeline locally BEFORE pushing
act -j pytest              # Run pytest job
act -j lint                # Run lint job
act -j security            # Run security job

# Run all jobs
act

Benefits:

  • Catches CI-specific issues before pushing
  • Tests in same environment as GitHub Actions
  • Saves CI minutes and iteration time

Strategy 3: Incremental Commits with Draft PRs (Priority: MEDIUM)

Change workflow to fail fast:

# Old way: Try to get everything perfect, then commit
# New way: Commit early, iterate in draft PR

# 1. Make changes, commit immediately
git add tests/
git commit -m "fix: test fixes (WIP)"
git push origin feature-branch

# 2. Create DRAFT PR immediately
gh pr create --draft --title "[WIP] Test fixes"

# 3. Let CI run, see what breaks
# 4. Fix CI issues in separate small commits
git commit -m "fix(ci): exclude playwright"
git push

# 5. Mark ready when green
gh pr ready

Benefits:

  • See CI failures incrementally
  • Smaller commits easier to debug
  • Less "surprise" issues at the end
  • CI runs while you work on other things

Strategy 4: CI Simulation Make Target (Priority: HIGH)

Add to Makefile:

.PHONY: ci-local
ci-local: clean
	@echo "$(CYAN)🚀 Running full CI pipeline locally...$(NC)"
	@echo ""
	@echo "$(CYAN)1️⃣ Validating configuration...$(NC)"
	@pre-commit validate-config
	@poetry check --lock
	@echo "$(GREEN)✅ Config valid$(NC)"
	@echo ""
	@echo "$(CYAN)2️⃣ Running linters...$(NC)"
	@$(MAKE) lint
	@echo "$(GREEN)✅ Linting passed$(NC)"
	@echo ""
	@echo "$(CYAN)3️⃣ Running security checks...$(NC)"
	@$(MAKE) security-check
	@echo "$(GREEN)✅ Security passed$(NC)"
	@echo ""
	@echo "$(CYAN)4️⃣ Running test suite...$(NC)"
	@$(MAKE) test-atomic
	@$(MAKE) test-unit-fast
	@$(MAKE) test-integration
	@echo "$(GREEN)✅ All tests passed$(NC)"
	@echo ""
	@echo "$(CYAN)5️⃣ Checking pytest collection...$(NC)"
	@poetry run pytest --collect-only tests/ -q
	@echo "$(GREEN)✅ Test collection passed$(NC)"
	@echo ""
	@echo "$(GREEN)✅✅✅ Local CI pipeline passed! Safe to push. ✅✅✅$(NC)"

Usage:

make ci-local  # Run before pushing

Strategy 5: Fix Broken Pre-commit Hooks (Priority: HIGH)

Current broken hooks need fixing:

  1. ansible-lint: Missing ansible core dependency

    # Fix: Either install ansible-core in pre-commit env or skip for Python projects
    # Recommendation: Remove ansible-lint hook (not applicable to Python backend)
  2. detect-secrets: Flags test fixtures as secrets

    # Fix: Update .secrets.baseline regularly
    detect-secrets scan --baseline .secrets.baseline
    git add .secrets.baseline
  3. markdownlint: Line length too strict (80 chars)

    # Already fixed: .markdownlint.json with 120 char limit

Strategy 6: Pre-commit Staging Configuration (Priority: LOW)

Update .pre-commit-config.yaml to separate fast vs slow hooks:

# Fast hooks (run on commit)
- repo: https://github.com/pre-commit/pre-commit-hooks
  hooks:
    - id: trailing-whitespace
      stages: [commit]
    - id: end-of-file-fixer
      stages: [commit]

# Slow hooks (run on push)
- repo: https://github.com/astral-sh/ruff-pre-commit
  hooks:
    - id: ruff
      stages: [push]  # Already configured
    - id: ruff-format
      stages: [push]

Implementation Plan

Phase 1: Quick Wins (1-2 hours)

  • Create .local/pre-commit-check.sh script
  • Add make ci-local target to Makefile
  • Update .secrets.baseline to include test fixtures
  • Remove broken ansible-lint hook from pre-commit config
  • Document in CLAUDE.md: "Always run make ci-local before pushing"

Phase 2: Environment Improvements (2-3 hours)

  • Install and configure act for local GitHub Actions testing
  • Create .actrc configuration file
  • Add make test-ci-local using act
  • Test act with current CI workflows

Phase 3: Process Changes (Ongoing)

  • Adopt "draft PR early, iterate in public" workflow
  • Update contributing guide with new workflow
  • Add pre-flight check to git pre-push hook (optional)
  • Train team on new tools and processes

Phase 4: CI/CD Optimization (Separate Issue)

  • Audit all GitHub Actions workflows for duplication
  • Optimize parallel job execution
  • Review caching strategies
  • See issue #XXX for full CI/CD optimization

Success Metrics

Current State:

  • Average iterations to green CI: 6-8
  • Average time per PR: 45-60 minutes wasted on iterations
  • Developer frustration: High

Target State:

  • Average iterations to green CI: 2-3
  • Average time per PR: 15-20 minutes (pre-flight check + 1-2 quick fixes)
  • Developer frustration: Low
  • Confidence before pushing: High

How to Measure:

  • Track "commits per PR" metric
  • Track "CI failure rate" (failures / total runs)
  • Survey developer satisfaction quarterly

Related Issues

Questions for Discussion

  1. Should we make make ci-local mandatory before pushing (pre-push hook)?
  2. Should we invest in act for local CI testing or just improve pre-flight checks?
  3. Should we adopt "draft PR early" as standard workflow?
  4. Should we remove non-applicable hooks (ansible-lint) or fix them?

Priority: Medium
Effort: 4-6 hours total (spread across phases)
Impact: High - reduces developer friction, increases velocity, improves morale

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions