AI-powered test automation quality evaluator - Assess your test automation projects with structured, repeatable evaluations
Agentic QA is a specialized evaluation tool for test automation projects. It provides comprehensive, structured assessments of test automation quality across multiple dimensions, helping teams identify issues and track improvements over time.
Agentic QA uses Claude Code's CLAUDE.md feature to provide structured, multi-step evaluations of test automation projects. Unlike generic AI assistance, it follows a systematic evaluation protocol with:
- Structured workflows - Consistent 5-step evaluation process
- Metric-specific analysis - Deep-dive into dependency health, performance, and code quality
- Actionable recommendations - Specific commands and steps to improve
- Historical tracking - Compare evaluations over time to measure progress
- Framework-agnostic - Works with .NET, Node.js, Python, and Java projects
Evaluate package freshness and security vulnerabilities.
What it checks:
- Outdated packages (major, minor, patch versions)
- Security vulnerabilities (critical, high, moderate, low)
- Framework-specific dependency commands
- Generates actionable update commands
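To make the major/minor/patch distinction concrete, here is a minimal Python sketch of how an outdated package could be classified. It is not the evaluator's implementation: `parse_version` and `classify_update` are hypothetical helper names, and the sketch assumes plain numeric `MAJOR.MINOR.PATCH` version strings with no pre-release suffixes.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into three integers; missing parts default to 0.

    Assumption: purely numeric components (no '-beta' style suffixes).
    """
    numbers = [int(part) for part in version.split(".")[:3]]
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def classify_update(current: str, latest: str) -> str:
    """Return 'major', 'minor', 'patch', or 'up-to-date' for a package."""
    cur = parse_version(current)
    new = parse_version(latest)
    if new <= cur:
        return "up-to-date"
    if new[0] > cur[0]:
        return "major"
    if new[1] > cur[1]:
        return "minor"
    return "patch"


# Version numbers taken from the sample report in this README.
print(classify_update("3.141.0", "4.15.0"))  # major
```

Comparing versions as integer tuples avoids the string-comparison trap where "3.141.0" would sort after "4.15.0" character by character in some positions.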
How to use:
cd /path/to/agentic-qa
claude
# Then type:
"Evaluate this project /path/to/your-automation-project"
Measure test execution speed and identify bottlenecks.
Planned checks:
- Average test duration
- Total suite execution time
- Slow test identification (>30s threshold)
- Parallelization opportunities
- Setup/teardown overhead
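Since this metric is still planned, the following Python sketch only illustrates how the first three checks could be computed from per-test durations. The function name `summarize_durations` and the input shape (a mapping of test name to seconds) are assumptions for illustration; only the 30-second threshold comes from the list above.

```python
from statistics import mean

SLOW_THRESHOLD_SECONDS = 30.0  # threshold taken from the planned checks above


def summarize_durations(durations: dict, threshold: float = SLOW_THRESHOLD_SECONDS) -> dict:
    """Summarize test durations (test name -> seconds).

    Returns the total suite time, the average test duration, and the names
    of tests slower than the threshold, slowest first.
    """
    if not durations:
        return {"total_seconds": 0.0, "average_seconds": 0.0, "slow_tests": []}
    slow = sorted(
        (name for name, seconds in durations.items() if seconds > threshold),
        key=lambda name: durations[name],
        reverse=True,
    )
    return {
        "total_seconds": sum(durations.values()),
        "average_seconds": mean(durations.values()),
        "slow_tests": slow,
    }


# Made-up durations, for illustration only.
summary = summarize_durations({"login": 12.0, "checkout": 45.0, "search": 3.0})
print(summary["slow_tests"])  # ['checkout']
```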
Evaluate test code maintainability using Clean Code principles.
Planned checks:
- Long methods (>50 lines)
- Large classes (>300 lines)
- Code duplication
- Magic numbers
- Deep nesting (>3 levels)
- Naming conventions
- Claude Code installed (CLI, Desktop, or Web)
- A test automation project to evaluate
- Project-specific CLI tools (dotnet, npm, pip, or mvn)
Option 1: Use as Standalone Tool (Recommended)
Clone this repository to use as a central evaluation workspace:
git clone https://github.com/lovelynfnt/agentic-qa.git
cd agentic-qa
Option 2: Copy into Your Project
Copy the .claude directory into your test automation project:
cp -r agentic-qa/.claude /path/to/your-test-project/
cd /path/to/your-test-project
From Standalone Workspace:
cd agentic-qa
claude
# Then ask Claude:
"Evaluate this project /path/to/your-automation-project"
From Your Project:
cd /path/to/your-test-project
claude
# Then ask Claude:
"Evaluate this project"
- High-Level Pre-Check - Detects project type, test framework, test count
- Metric Selection - Choose which metric to evaluate
- Metric-Specific Pre-Check - Validates requirements for selected metric
- Run Evaluation - Executes analysis and calculates score
- Generate Report - Creates markdown report and JSON history
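The project-type detection in the first step can be pictured as matching well-known marker files. The Python sketch below is an assumption about one way to do this, not the evaluator's actual logic: the marker patterns and the name `detect_project_types` are illustrative, and only the four stacks named in this README are covered.

```python
from fnmatch import fnmatch

# Assumed marker files per stack; a real project may use others
# (for example, a Java project built with Gradle has no pom.xml).
MARKER_PATTERNS = {
    ".NET": ["*.csproj", "*.sln"],
    "Node.js": ["package.json"],
    "Python": ["requirements.txt", "pyproject.toml"],
    "Java": ["pom.xml"],
}


def detect_project_types(file_names) -> list:
    """Return the stacks whose marker files appear among the given file names."""
    names = list(file_names)
    return [
        project_type
        for project_type, patterns in MARKER_PATTERNS.items()
        if any(fnmatch(name, pattern) for name in names for pattern in patterns)
    ]


print(detect_project_types(["Tests.csproj", "README.md"]))  # ['.NET']
```

Taking plain file names keeps the function independent of the filesystem; a caller could pass something like `(p.name for p in pathlib.Path(project_dir).rglob("*"))`.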
🤖 Test Automation Evaluator v1.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Analyzing project structure...
✅ Detected: .NET project with SpecFlow
Found: 42 test scenarios across 8 files
Ready to evaluate!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Protocol: High-Level Pre-Check | Step 1 of 5
[After selecting Dependency Health...]
## Dependency Health Evaluation Complete - Score: 85/100 🟡
### 📦 Package Analysis
- Total Packages: 42 packages (15 direct, 27 transitive)
- Outdated: 8 packages (2 major, 3 minor, 3 patch)
- Vulnerabilities: 1 security issue (High: 1)
### 🚨 Critical Issues (2 found)
1. Selenium.WebDriver is 1 major version behind
   - Current: 3.141.0 → Latest: 4.15.0
   - Fix: `dotnet add package Selenium.WebDriver --version 4.15.0`
2. CVE-2023-12345: High vulnerability in Newtonsoft.Json
   - Fix: Update to version 13.0.2 or higher
Detailed report: .claude/evaluations/2026-05-28_091234_dependency-health.md
.claude/
├── CLAUDE.md                              # Entry point (routes to evaluator)
├── personas/
│   └── evaluator/
│       ├── evaluator.md                   # Main orchestrator (Steps 1-2)
│       └── metrics/
│           ├── dependency-health.md       # Metric implementation
│           ├── dependency-health-scoring.json
│           ├── test-performance.md        # Coming soon
│           ├── test-performance-scoring.json
│           ├── code-quality.md            # Coming soon
│           └── code-quality-scoring.json
├── evaluations/                           # Generated reports
└── history/                               # JSON data for trends
Works with any test automation stack:
- ✅ .NET (SpecFlow, NUnit, xUnit, MSTest)
- ✅ JavaScript/TypeScript (Jest, Playwright, Cypress, Mocha)
- ✅ Python (pytest, unittest, Robot Framework)
- ✅ Java (JUnit, TestNG, Cucumber)
Evaluations are saved for trend analysis:
- Markdown reports - Human-readable, shareable (.md files)
- JSON history - Machine-readable for tracking trends (.json files)
- Configurable scoring - Adjust thresholds per metric
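To show how the JSON history could support trend tracking, here is a small Python sketch that compares the two most recent evaluations. The record layout (`timestamp`, `metric`, `score`) is an assumption, since this README does not document the history schema, and the sample records are made up for illustration.

```python
import json
from typing import Optional


def score_trend(records: list) -> Optional[float]:
    """Return the latest score minus the previous one.

    Returns None when fewer than two records exist. Assumes each record has
    an ISO-8601 'timestamp' string (which sorts chronologically) and a 'score'.
    """
    if len(records) < 2:
        return None
    ordered = sorted(records, key=lambda record: record["timestamp"])
    return ordered[-1]["score"] - ordered[-2]["score"]


# Illustrative history entries in a hypothetical schema.
history = [
    json.loads(line)
    for line in (
        '{"timestamp": "2026-05-14T09:00:00", "metric": "dependency-health", "score": 78}',
        '{"timestamp": "2026-05-28T09:12:34", "metric": "dependency-health", "score": 85}',
    )
]
print(score_trend(history))  # 7
```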
Run evaluations at the start of each sprint to identify technical debt and prioritize improvements.
Evaluate dependency security before major releases to ensure no critical vulnerabilities.
Use evaluation reports to help new QA engineers understand project health and improvement areas.
Regular evaluations show trends over time, helping teams measure the impact of quality initiatives.
Generate reports showing dependency health and security posture for compliance reviews.
Edit .claude/personas/evaluator/metrics/dependency-health-scoring.json:
{
"criteria": {
"outdated_major": {
"threshold": 0,
"severity": "critical",
"points_deduction": 20
},
"security_vulnerabilities": {
"points_deduction_per_vuln": {
"critical": 25,
"high": 15,
"moderate": 8,
"low": 3
}
}
}
}
{
"grade_thresholds": {
"A": 90,
"B": 80,
"C": 70,
"D": 60,
"F": 0
}
}
- Create `metrics/my-metric.md` with evaluation instructions
- Create `metrics/my-metric-scoring.json` with scoring rules
- Update `evaluator.md` to include the new metric option
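As an illustration of how a scoring file such as `dependency-health-scoring.json` and the grade thresholds could be applied, the Python sketch below starts from 100 points and subtracts deductions. The arithmetic is an assumption (a single deduction once the major-version threshold is exceeded, one deduction per vulnerability, clamped at 0); the evaluator's actual calculation may differ, so it will not necessarily reproduce scores shown in sample reports. `compute_score` and `grade_for` are hypothetical names.

```python
# Values copied from the configuration examples in this README.
SCORING = {
    "criteria": {
        "outdated_major": {"threshold": 0, "severity": "critical", "points_deduction": 20},
        "security_vulnerabilities": {
            "points_deduction_per_vuln": {"critical": 25, "high": 15, "moderate": 8, "low": 3}
        },
    }
}
GRADE_THRESHOLDS = {"A": 90, "B": 80, "C": 70, "D": 60, "F": 0}


def compute_score(outdated_major: int, vulnerabilities: dict) -> int:
    """Start at 100 and subtract deductions (assumed rule, see lead-in)."""
    criteria = SCORING["criteria"]
    score = 100
    major_rule = criteria["outdated_major"]
    if outdated_major > major_rule["threshold"]:
        score -= major_rule["points_deduction"]
    per_vuln = criteria["security_vulnerabilities"]["points_deduction_per_vuln"]
    for severity, count in vulnerabilities.items():
        score -= per_vuln[severity] * count
    return max(score, 0)


def grade_for(score: int) -> str:
    """Return the highest grade whose minimum score is met."""
    for grade, minimum in sorted(GRADE_THRESHOLDS.items(), key=lambda item: item[1], reverse=True):
        if score >= minimum:
            return grade
    return "F"


score = compute_score(outdated_major=2, vulnerabilities={"high": 1})
print(score, grade_for(score))  # 65 D
```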
agentic-qa/
├── README.md                              # This file
├── .claude/
│   ├── CLAUDE.md                          # Entry point
│   ├── settings.json                      # Claude Code settings
│   ├── personas/
│   │   └── evaluator/
│   │       ├── evaluator.md               # Main persona
│   │       └── metrics/
│   │           ├── dependency-health.md
│   │           ├── dependency-health-scoring.json
│   │           ├── test-performance.md
│   │           ├── test-performance-scoring.json
│   │           ├── code-quality.md
│   │           └── code-quality-scoring.json
│   ├── evaluations/                       # Generated reports
│   └── history/                           # JSON data
└── docs/                                  # Additional documentation (Coming Soon)
We welcome contributions! Whether it's:
- ✨ New metrics (test coverage, flakiness detection, etc.)
- 📝 Documentation improvements
- 🎨 Scoring templates for different project types
- 💡 Feature suggestions
- 🐛 Bug reports
- ✅ Modular architecture with per-metric files
- ✅ Dependency Health metric (complete)
- ✅ Framework-agnostic detection (.NET, Node.js, Python, Java)
- ✅ Structured 5-step evaluation flow
- ✅ Historical tracking with JSON data
- ✅ Visual protocol indicators (banners, step tracking)
- 🚧 Test Performance metric
- 🚧 Code Quality metric
- 🚧 "All Metrics" evaluation option
- 🚧 Trend comparison (current vs previous)
- 🚧 Grade improvement suggestions
Built with Claude Code by Anthropic.
Note: Agentic QA requires Claude Code and uses natural language commands (not custom slash commands). Type phrases like "Evaluate this project [path]" to activate the evaluator.