Medical device design verification and process validation sample size calculator compliant with ISO/TR 80002-2 standards.
Test project for:
- Spec-Driven Development (SDD) with coding agents (I used Kiro, Antigravity, and Opencode with different local LLMs)
- The feasibility of self-validating software (compliant with ISO/TR 80002-2)
- Main functionality provided
- Software validation still far from complete
- Example applications are explained in a Jupyter Notebook (see ./testdata), including the generation of corresponding test data.
SDD development with coding agents is incredibly fast and practically a must for any commercial software development. On some simple tasks, however, it is almost unbearable because of sheer stupidity. The problem is that you often get "stuck" in the workflow (i.e., get lazy) and spend too much time trying to solve the issue through the coding agent. But coding agents are improving so rapidly that this problem will quickly resolve itself.
The three coding agents are in a neck-and-neck race. Especially since the latest update of the qwen3-coder-next LLM, my local assistant feels on par with the "big ones"; I would even say it works better in some ways. But this is constantly changing: at times the same version worked better on Windows, then again on Linux. This shows how rapidly the technology is developing.
BUT
Without human support, even such a small, simple project is not possible. And software validation and E2E testing with meaningful data are definitely not the strong points of current agents. In principle, though, the approach of self-validating software is feasible and, in my view, more secure than other approaches; I'll stick with it.
The Sample Size Calculator is a Python-based web application for determining statistically valid sample sizes for medical device design verification and process validation. This is critical QMS (Quality Management System) software that ensures compliance with ISO/TR 80002-2 standards through comprehensive validation, hash-based integrity verification, and complete audit trail logging.
- Module A (Attribute Data Analysis): Binary Pass/Fail data analysis using Success Run Theorem and Cumulative Binomial Distribution
- Module V (Variable Data Analysis): Continuous measurement analysis with strict 4-phase sequential workflow
- Phase 1: Specification definition and pilot data input with IQR-based outlier detection
- Phase 2: Outlier exclusion and automatic transformation cascade (Log → Box-Cox → Yeo-Johnson)
- Phase 3: Sample size calculation using parametric or non-parametric methods
- Phase 4: Final validation data analysis with tolerance interval calculation
- SHA-256 Hash Verification: Ensures calculation engine integrity and validated state tracking
- Comprehensive Audit Trail: All user interactions and system events logged with timestamps
- Automated Validation Suite: IQ/OQ/PQ testing with Verification Traceability Matrix (VTM) generation
- PDF Report Generation: User calculation reports, validation certificates, and comprehensive full reports
- Method Transparency: Clear display of active mathematical paths and statistical methods
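The hash-based integrity check described above can be sketched as follows. This is a minimal illustration, not the project's actual `hash_verifier.py`; the function names and the JSON key `validated_hash` are assumptions based on the file names mentioned in this README.

```python
import hashlib
import json
from pathlib import Path

def compute_engine_hash(engine_path: str) -> str:
    """Return the SHA-256 hex digest of the calculation engine source file."""
    return hashlib.sha256(Path(engine_path).read_bytes()).hexdigest()

def is_validated(engine_path: str, hash_file: str) -> bool:
    """Compare the current engine hash against the last validated hash."""
    stored = json.loads(Path(hash_file).read_text()).get("validated_hash")
    return compute_engine_hash(engine_path) == stored
```

Any edit to the engine source changes its digest, so reports can flag an "unverified change" until a fresh validation run records the new hash.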
Docker Compose (no need to install playwright)
# build it
docker compose build
# start it
docker compose up -d
# optionally validate it (takes some time)
docker compose exec sample-size-calculator uv run python scripts/run_validation.py --tester "Your Name"
# connect to it / use it
http://localhost:8080
# shut down
docker compose down

Prerequisites:
- Python 3.11 or higher
- uv package manager
# Clone the repository
git clone <repository-url>
cd sample-size-calculator
# Install dependencies using uv
uv sync
# Install with development dependencies (for testing and validation)
uv sync --all-groups

# Check Python version
uv run python --version
# Run a quick test
uv run pytest tests/validation/test_iq.py -q

Start the application locally:
uv run python src/sample_size_calculator/main.py

The web interface will be available at http://localhost:8080
Module A is designed for binary (Pass/Fail) test scenarios.
Workflow:
- Navigate to the Module A tab
- Enter Confidence Level (e.g., 95%)
- Enter Reliability Level (e.g., 95%)
- Enter Allowable Failures (c):
- Enter a specific value (e.g., 0, 1, 2) for single calculation
- Leave empty for sensitivity analysis (calculates for c=0, 1, 2, 3)
- Click Calculate Sample Size
- Review results showing required sample size
- Click Generate PDF Report to create documentation
Example Use Case:
- Confidence: 95%
- Reliability: 95%
- Allowable Failures: 0
- Result: n = 59 samples (Success Run Theorem)
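The example above can be reproduced with a short sketch of the two methods the module names. The function names are illustrative, not the app's API; the math is the standard Success Run Theorem (n = ln(1-C)/ln(R) for zero failures) and, for allowable failures c > 0, a search over the cumulative binomial distribution.

```python
import math
from scipy.stats import binom

def success_run_n(confidence: float, reliability: float) -> int:
    """Success Run Theorem (c = 0): smallest n with R**n <= 1 - C."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def binomial_n(confidence: float, reliability: float, c: int) -> int:
    """Smallest n with P(X <= c) <= 1 - C, where X ~ Binomial(n, 1 - R)."""
    n = c + 1
    while binom.cdf(c, n, 1 - reliability) > 1 - confidence:
        n += 1
    return n

print(success_run_n(0.95, 0.95))   # 59, matching the example above
print(binomial_n(0.95, 0.95, 1))   # n for one allowable failure
```

For 95% confidence / 95% reliability, ln(0.05)/ln(0.95) ≈ 58.4, which rounds up to the n = 59 shown in the example.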
Module V provides a comprehensive 4-phase workflow for continuous measurement data.
- Select Specification Type:
- One-Sided: Define either Lower Specification Limit (LSL) or Upper Specification Limit (USL)
- Two-Sided: Define both LSL and USL
- Enter specification limits
- Enter Confidence and Reliability levels
- Input pilot data:
- Dataset Method: Enter comma-separated measurements
- Statistics Method: Enter estimated mean and standard deviation
- Click Analyze Pilot Data
- Review outlier detection results (Q1, Q3, IQR, flagged outliers)
Note: Pilot datasets with fewer than 30 points will trigger a validation warning.
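The IQR-based outlier screening in Phase 1 follows Tukey's fences. A minimal sketch (the function name and return shape are illustrative; the app's quartile interpolation method may differ from NumPy's default):

```python
import numpy as np

def iqr_outliers(data, k: float = 1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in data if x < lo or x > hi]
    return flagged, (q1, q3, iqr)
```

For example, in the dataset `[1, 2, ..., 9, 100]` only the value 100 falls outside the fences and would be flagged for review.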
- Review detected outliers
- Optionally exclude outliers (requires engineering rationale)
- Choose transformation approach:
- Automatic Cascade: System tries Log → Box-Cox → Yeo-Johnson transformations
- Manual Override: Select specific transformation method
- Click Process Normality Testing
- Review results:
- Shapiro-Wilk p-values for each transformation
- Locked transformation method
- Analysis method (Parametric or Non-Parametric)
Transformation Cascade Logic:
- If original data is normal (p > 0.05): Lock as Parametric
- If not normal: Try Log transformation (requires all positive values)
- If Log fails: Try Box-Cox transformation (requires all positive values)
- If Box-Cox fails: Try Yeo-Johnson transformation (handles zero/negative values)
- If all fail: Lock as Non-Parametric (Wilks method)
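The cascade logic above can be sketched with SciPy's transformation and normality-testing functions. This is an assumption about the structure, not the project's `transformations.py`; the locking rules (Shapiro-Wilk at α = 0.05, positivity requirement for Log/Box-Cox) follow the list above.

```python
import numpy as np
from scipy import stats

def lock_transformation(data, alpha: float = 0.05):
    """Try identity -> Log -> Box-Cox -> Yeo-Johnson; lock the first normal fit."""
    x = np.asarray(data, dtype=float)
    candidates = [("none", lambda v: v)]
    if np.all(x > 0):  # Log and Box-Cox require strictly positive values
        candidates.append(("log", np.log))
        candidates.append(("box-cox", lambda v: stats.boxcox(v)[0]))
    candidates.append(("yeo-johnson", lambda v: stats.yeojohnson(v)[0]))
    for name, fn in candidates:
        transformed = fn(x)
        if stats.shapiro(transformed).pvalue > alpha:
            return name, transformed  # lock as Parametric with this transform
    return "non-parametric", x  # fall back to the Wilks method
```

Right-skewed (e.g., lognormal-shaped) pilot data would typically fail the identity test and lock the Log transformation; data with zeros or negatives skips straight to Yeo-Johnson.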
- Review locked method and specification type
- Click Calculate Required Sample Size
- Review results:
- Capability margin (k_margin)
- Tolerance factor (k_factor)
- Required sample size (N)
- Formula used (e.g., Howe-Guenther Approximation)
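For the two-sided parametric case, the tolerance factor can be computed with Howe's approximation (the formula the README's "Howe-Guenther Approximation" presumably refers to), and the required N is the smallest n whose factor fits inside the capability margin. This sketch uses the textbook formula; the app's exact search strategy is an assumption.

```python
import numpy as np
from scipy import stats

def howe_k2(n: int, p: float, gamma: float) -> float:
    """Howe's approximation for the two-sided tolerance factor k2."""
    df = n - 1
    z = stats.norm.ppf((1 + p) / 2)            # normal quantile for coverage p
    chi2 = stats.chi2.ppf(1 - gamma, df)       # lower chi-square quantile
    return np.sqrt(df * (1 + 1 / n) * z**2 / chi2)

def required_n(k_margin: float, p: float, gamma: float, n_max: int = 10000) -> int:
    """Smallest n whose tolerance factor fits inside the capability margin."""
    for n in range(3, n_max + 1):
        if howe_k2(n, p, gamma) <= k_margin:
            return n
    raise ValueError("margin too tight; no feasible n up to n_max")
```

For example, at 95/95 the factor for n = 10 is about 3.38, very close to the exact tabulated value of 3.379, so a capability margin of 3.5 would yield N = 10.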
- Collect final validation dataset of size N
- Enter final data (comma-separated)
- Click Calculate Tolerance Limits
- Review results:
- Tolerance limits in transformed and original space
- Comparison to specification limits
- Pass/Fail determination
- Process capability index (Ppk) for parametric methods
- Click Generate PDF Report to document results
The application includes a built-in validation runner accessible from the UI:
- Click the Run Full Validation (IQ/OQ/PQ) button in the UI header
- Enter your name as the validation tester
- Click Start Validation
- Monitor progress as the system runs:
- IQ (Installation Qualification): Verifies dependencies and installation
- OQ (Operational Qualification): Tests all calculation formulas against known values
- PQ (Performance Qualification): Runs end-to-end UI tests (skipped when app is running)
- Review validation results
- Download the validation certificate from ./reports/validation/
Note: PQ tests are automatically skipped when running validation from the UI since they require the application to be stopped. For complete validation including PQ tests, use the command-line approach below.
For complete validation including PQ tests:
# Stop the application first
# Then run the validation script
uv run python scripts/run_validation.py --tester "Your Name"

This generates:
- Validation certificate PDF in ./reports/validation/
- Verification Traceability Matrix (VTM) CSV
- Updated validated hash in config/validated_hash.json
# Build and start the container
docker compose up -d
# View logs
docker compose logs -f
# Stop the container
docker compose down

The application will be available at http://localhost:8080 (or a custom port via the PORT environment variable).
Create a .env file in the project root to customize deployment:
PORT=8080
LOG_LEVEL=INFO
LOG_RETENTION_DAYS=90

The docker-compose configuration automatically mounts:
- ./logs: Audit trail logs (read/write)
- ./config: Configuration files including validated_hash.json (read-only)
- ./reports: Generated PDF reports (read/write)
All reports and logs persist across container restarts.
The Docker container includes automatic health checks:
- Endpoint: http://localhost:8080/
- Interval: 30 seconds
- Timeout: 10 seconds
- Retries: 3
Check container health:
docker compose ps

The Docker image includes Chromium and all dependencies required for automated UI testing (PQ tests). This ensures the validation suite can run completely within the container.
All generated reports are organized in the ./reports/ directory:
reports/
├── validation/ # IQ/OQ/PQ validation certificates
├── calculations/ # Sample size calculation reports
└── full/ # Comprehensive full reports
Generated when you click "Generate PDF Report" after completing a calculation. Includes:
- Timestamp and session information
- All input parameters
- Calculated results
- Statistical method used
- Engine hash and validation state
Naming: calculation_report_YYYYMMDD_HHMMSS.pdf
Generated by the IQ/OQ/PQ validation suite. Includes:
- Test execution date and tester name
- System information (OS, Python version)
- Complete test results with URS traceability
- Verification Traceability Matrix (VTM)
- Validated engine hash
Naming: validation_certificate_YYYYMMDD_HHMMSS.pdf
Comprehensive reports combining:
- Current calculation report
- Latest validation certificates
- Audit trail logs (filtered for session)
- Calculator signature (engine hash and validation state)
Naming: full_report_YYYYMMDD_HHMMSS.pdf
Click the Generate Full Report button in the UI after completing a calculation to create a comprehensive report with complete traceability.
# Run all tests
uv run pytest -q
# Run specific test suites
uv run pytest tests/validation/test_iq.py -q # Installation Qualification
uv run pytest tests/validation/test_oq.py -q # Operational Qualification
uv run pytest tests/validation/test_pq.py -q # Performance Qualification
# Run property-based tests
uv run pytest tests/property/ -q
# Run with coverage
uv run pytest --cov=src/sample_size_calculator --cov-report=html

# Run linter
uv run ruff check src/
# Format code
uv run ruff format src/
# Type checking
uv run ty check src/

# Add a runtime dependency
uv add <package-name>
# Add a development dependency
uv add --group dev <package-name>
# Sync dependencies after changes
uv sync

┌─────────────────────────────────────────────────────────┐
│ NiceGUI Web Interface │
│ (Module A | Module V) │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────┐
│ UI Controller │
│ (Session Management, Workflow Enforcement) │
└─────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌───────▼────────┐ ┌──────▼──────┐ ┌────────▼────────┐
│ Calculation │ │Transformation│ │ Tolerance │
│ Engine │ │ Engine │ │ Calculator │
└────────────────┘ └──────────────┘ └─────────────────┘
│ │ │
└───────────────────┼───────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌───────▼────────┐ ┌──────▼──────┐ ┌────────▼────────┐
│ Hash Verifier │ │Audit Logger │ │Report Generator │
└────────────────┘ └──────────────┘ └─────────────────┘
- Single Source of Truth: Pydantic models define all data structures
- Sequential Workflow Enforcement: UI prevents phase-skipping in Module V
- Method Transparency: Active mathematical paths clearly displayed
- Validation-First: Hash-based verification ensures calculation integrity
- Audit Trail: Comprehensive logging of all interactions
- Reproducibility: Deterministic calculations with locked transformations
- Web Framework: NiceGUI (Python reactive UI)
- Calculation Engine: NumPy, SciPy (statistical computations)
- Data Validation: Pydantic (models and validation)
- PDF Generation: ReportLab (reports and certificates)
- Testing: pytest (unit/OQ), playwright (UI/PQ), hypothesis (property-based)
- Logging: Python logging with rotation
- Deployment: Docker Compose
- Package Management: uv (hash-based lockfile)
Issue: Application fails to start or shows import errors
Solution:
# Ensure dependencies are installed
uv sync --all-groups
# Check Python version (requires 3.11+)
uv run python --version
# Check for port conflicts
lsof -i :8080 # On Unix/Linux/Mac
netstat -ano | findstr :8080 # On Windows

Issue: IQ/OQ/PQ tests fail during validation
Solution:
# Check dependency versions
uv run pip list
# Ensure scipy version is 1.x.x
uv run python -c "import scipy; print(scipy.__version__)"
# Run tests individually to identify failures
uv run pytest tests/validation/test_iq.py -v
uv run pytest tests/validation/test_oq.py -v

Issue: Container fails health checks or won't start
Solution:
# Check container logs
docker compose logs
# Rebuild container
docker compose down
docker compose build --no-cache
docker compose up -d
# Verify volume permissions
ls -la ./logs ./reports ./config

Issue: Reports show "VALIDATED STATE: NO - UNVERIFIED CHANGE"
Solution: This indicates the calculation engine has been modified since the last validation. To restore validated state:
- Review changes to src/sample_size_calculator/calculations.py
- If changes are intentional, run full validation: uv run python scripts/run_validation.py --tester "Your Name"
- This will update the validated hash in config/validated_hash.json
Issue: PDF reports fail to generate or save
Solution:
# Check reports directory permissions
ls -la ./reports
# Ensure subdirectories exist
mkdir -p ./reports/validation ./reports/calculations ./reports/full
# Check disk space
df -h # On Unix/Linux/Mac

Issue: Module V Phase 2 fails or locks as Non-Parametric unexpectedly
Solution:
- Ensure pilot data has at least 3 data points
- Check for non-numeric values in dataset
- For Log/Box-Cox transformations, ensure all values are positive
- Review Shapiro-Wilk p-values in the UI
- Consider using manual override to select specific transformation
Issue: PQ tests fail with browser errors
Solution:
# Install Playwright browsers
uv run playwright install --with-deps chromium
# For Docker, rebuild the image
docker compose build --no-cache

Issue: Log directory consuming excessive disk space
Solution:
- Logs automatically rotate at 10MB per file
- Retention is 90 days by default
- Adjust retention in the .env file: LOG_RETENTION_DAYS=30
- Manually clean old logs:
find ./logs -name "*.log.*" -mtime +30 -delete
sample-size-calculator/
├── src/
│ └── sample_size_calculator/
│ ├── __init__.py
│ ├── main.py # Application entry point
│ ├── models.py # Pydantic data models
│ ├── calculations.py # Core calculation engine
│ ├── transformations.py # Data transformation engine
│ ├── outliers.py # IQR outlier detection
│ ├── normality.py # Shapiro-Wilk testing
│ ├── tolerance.py # Tolerance interval calculations
│ ├── hash_verifier.py # SHA-256 verification
│ ├── audit_logger.py # Audit trail logging
│ ├── report_generator.py # PDF report generation
│ ├── full_report_generator.py # Comprehensive report generation
│ ├── vtm_generator.py # VTM generation
│ ├── ui_controller.py # NiceGUI interface
│ ├── validation_runner.py # IQ/OQ/PQ runner
│ └── report_paths.py # Report path management
├── tests/
│ ├── property/ # Property-based tests (Hypothesis)
│ ├── validation/ # IQ/OQ/PQ validation tests
│ │ ├── test_iq.py # Installation Qualification
│ │ ├── test_oq.py # Operational Qualification
│ │ └── test_pq.py # Performance Qualification
│ └── test_*.py # Unit and integration tests
├── config/
│ └── validated_hash.json # Validated engine hash storage
├── logs/
│ └── audit.log # Audit trail logs (rotated)
├── reports/
│ ├── validation/ # Validation certificates
│ ├── calculations/ # Calculation reports
│ └── full/ # Full comprehensive reports
├── scripts/
│ └── run_validation.py # Validation runner script
├── docker-compose.yml # Docker Compose configuration
├── Dockerfile # Docker image definition
├── pyproject.toml # Project metadata and dependencies
├── uv.lock # Locked dependency versions
└── README.md # This file
This application is designed for use in regulated medical device environments and follows ISO/TR 80002-2 guidelines for computer software validation.
- IQ (Installation Qualification): Verifies correct installation and dependency versions
- OQ (Operational Qualification): Tests all mathematical formulas against known standard values
- PQ (Performance Qualification): End-to-end UI testing with realistic workflows
- All requirements linked to test cases via URS markers
- Verification Traceability Matrix (VTM) generated automatically
- Complete audit trail of all user interactions
- SHA-256 hash verification of calculation engine
All events are logged with:
- ISO 8601 timestamps
- Session identifiers
- Event types and context
- Input/output values
- Validation results
Logs are stored in ./logs/ with 90-day retention and automatic rotation.
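A rotating audit logger with the fields listed above might look like the following sketch. This is not the project's `audit_logger.py`; the logger name, format fields, and 10 MB rotation threshold are assumptions drawn from this README's description.

```python
import logging
from logging.handlers import RotatingFileHandler

def make_audit_logger(path: str = "./logs/audit.log") -> logging.Logger:
    """Rotating audit logger with ISO 8601-style timestamps."""
    logger = logging.getLogger("audit")
    # rotate at 10 MB per file, as described in the troubleshooting section
    handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=10)
    fmt = logging.Formatter(
        "%(asctime)s | session=%(session)s | %(event)s | %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    handler.setFormatter(fmt)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

# usage: supply session id and event type via `extra`
# logger.info("n=59", extra={"session": "abc123", "event": "CALCULATION"})
```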
See LICENSE file for details.
For issues, questions, or contributions, please refer to the project repository.
Version: 0.1.0
Last Updated: 2026.02.26
Compliance: ISO/TR 80002-2