Medical device design verification and process validation sample size calculator compliant with ISO/TR 80002-2 standards.
Test project for:
- Spec-Driven Development (SDD) with coding agents (I used Kiro, Antigravity, and Opencode with different local LLMs)
- The feasibility of self-validating software (compliant with ISO/TR 80002-2)
- Main functionality provided
- Software validation still far from complete
- Example applications are explained in a Jupyter Notebook (see ./testdata), including the generation of corresponding test data.
SDD development with coding agents is incredibly fast and practically a must for any commercial software development. On some simple tasks, however, it is almost unbearable because of sheer stupidity. The problem is that you often get "stuck" in the workflow (i.e., get lazy) and spend too much time trying to solve the issue through the coding agent. But coding agents are improving so rapidly that this problem will quickly resolve itself.
The three coding agents are in a neck-and-neck race. Especially since the latest update of the qwen3-coder-next LLM, my local assistant feels on par with the "big ones"; I would even say it works better in some ways. But this is constantly changing: at times the same version worked better on Windows, then again on Linux. This shows how rapidly the technology is developing.
BUT
Without human support, even such a small, simple project is not possible. And software validation and E2E testing with meaningful data are definitely not the strong points of current agents. In principle, though, the approach of self-validating software is feasible and, in my view, more secure than other approaches; I'll stick with it.
The Sample Size Calculator is a Python-based web application for determining statistically valid sample sizes for medical device design verification and process validation. This is critical QMS (Quality Management System) software that ensures compliance with ISO/TR 80002-2 standards through comprehensive validation, hash-based integrity verification, and complete audit trail logging.
- Module A (Attribute Data Analysis): Binary Pass/Fail data analysis using Success Run Theorem and Cumulative Binomial Distribution
- Module V (Variable Data Analysis): Continuous measurement analysis with strict 4-phase sequential workflow
- Phase 1: Specification definition and pilot data input with IQR-based outlier detection
- Phase 2: Outlier exclusion and automatic transformation cascade (Log → Box-Cox → Yeo-Johnson)
- Phase 3: Sample size calculation using parametric or non-parametric methods
- Phase 4: Final validation data analysis with tolerance interval calculation
- SHA-256 Hash Verification: Ensures calculation engine integrity and validated state tracking
- Comprehensive Audit Trail: All user interactions and system events logged with timestamps
- Automated Validation Suite: IQ/OQ/PQ testing with Verification Traceability Matrix (VTM) generation
- PDF Report Generation: User calculation reports, validation certificates, and comprehensive full reports
- Method Transparency: Clear display of active mathematical paths and statistical methods
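The hash-based integrity check described above can be sketched as follows. This is a minimal illustration, not the project's actual `hash_verifier.py`; the function names and the JSON key `validated_hash` are assumptions based on the file names mentioned in this README.

```python
import hashlib
import json
from pathlib import Path

def compute_engine_hash(engine_path: str) -> str:
    """Return the SHA-256 hex digest of the calculation engine source file."""
    return hashlib.sha256(Path(engine_path).read_bytes()).hexdigest()

def is_validated(engine_path: str, hash_file: str) -> bool:
    """Compare the current engine hash against the last validated hash."""
    stored = json.loads(Path(hash_file).read_text()).get("validated_hash")
    return compute_engine_hash(engine_path) == stored
```

Any edit to the engine source changes its digest, so reports can flag an "unverified change" until a fresh validation run records the new hash.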
Docker Compose (no need to install playwright)
# build it
docker compose build
# start it
docker compose up -d
# optionally validate it (takes some time)
docker compose exec sample-size-calculator uv run python scripts/run_validation.py --tester "Your Name"
# connect to it / use it
http://localhost:8080
# shut down
docker compose down

Prerequisites:
- Python 3.11 or higher
- uv package manager
# Clone the repository
git clone <repository-url>
cd sample-size-calculator
# Install dependencies using uv
uv sync
# Install with development dependencies (for testing and validation)
uv sync --all-groups

# Check Python version
uv run python --version
# Run a quick test
uv run pytest tests/validation/test_iq.py -q

Start the application locally:
uv run python src/sample_size_calculator/main.py

The web interface will be available at http://localhost:8080
Module A is designed for binary (Pass/Fail) test scenarios.
Workflow:
- Navigate to the Module A tab
- Enter Confidence Level (e.g., 95%)
- Enter Reliability Level (e.g., 95%)
- Enter Allowable Failures (c):
- Enter a specific value (e.g., 0, 1, 2) for single calculation
- Leave empty for sensitivity analysis (calculates for c=0, 1, 2, 3)
- Click Calculate Sample Size
- Review results showing required sample size
- Click Generate PDF Report to create documentation
Example Use Case:
- Confidence: 95%
- Reliability: 95%
- Allowable Failures: 0
- Result: n = 59 samples (Success Run Theorem)
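The example above can be reproduced with a short sketch of the two methods the module names. The function names are illustrative, not the app's API; the math is the standard Success Run Theorem (n = ln(1-C)/ln(R) for zero failures) and, for allowable failures c > 0, a search over the cumulative binomial distribution.

```python
import math
from scipy.stats import binom

def success_run_n(confidence: float, reliability: float) -> int:
    """Success Run Theorem (c = 0): smallest n with R**n <= 1 - C."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def binomial_n(confidence: float, reliability: float, c: int) -> int:
    """Smallest n with P(X <= c) <= 1 - C, where X ~ Binomial(n, 1 - R)."""
    n = c + 1
    while binom.cdf(c, n, 1 - reliability) > 1 - confidence:
        n += 1
    return n

print(success_run_n(0.95, 0.95))   # 59, matching the example above
print(binomial_n(0.95, 0.95, 1))   # n for one allowable failure
```

For 95% confidence / 95% reliability, ln(0.05)/ln(0.95) ≈ 58.4, which rounds up to the n = 59 shown in the example.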
Module V provides a comprehensive 4-phase workflow for continuous measurement data.
- Select Specification Type:
- One-Sided: Define either Lower Specification Limit (LSL) or Upper Specification Limit (USL)
- Two-Sided: Define both LSL and USL
- Enter specification limits
- Enter Confidence and Reliability levels
- Input pilot data:
- Dataset Method: Enter comma-separated measurements
- Statistics Method: Enter estimated mean and standard deviation
- Click Analyze Pilot Data
- Review outlier detection results (Q1, Q3, IQR, flagged outliers)
Note: Pilot datasets with fewer than 30 points will trigger a validation warning.
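The IQR-based outlier screening in Phase 1 follows Tukey's fences. A minimal sketch (the function name and return shape are illustrative; the app's quartile interpolation method may differ from NumPy's default):

```python
import numpy as np

def iqr_outliers(data, k: float = 1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in data if x < lo or x > hi]
    return flagged, (q1, q3, iqr)
```

For example, in the dataset `[1, 2, ..., 9, 100]` only the value 100 falls outside the fences and would be flagged for review.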
- Review detected outliers
- Optionally exclude outliers (requires engineering rationale)
- Choose transformation approach:
- Automatic Cascade: System tries Log → Box-Cox → Yeo-Johnson transformations
- Manual Override: Select specific transformation method
- Click Process Normality Testing
- Review results:
- Shapiro-Wilk p-values for each transformation
- Locked transformation method
- Analysis method (Parametric or Non-Parametric)
Transformation Cascade Logic:
- If original data is normal (p > 0.05): Lock as Parametric
- If not normal: Try Log transformation (requires all positive values)
- If Log fails: Try Box-Cox transformation (requires all positive values)
- If Box-Cox fails: Try Yeo-Johnson transformation (handles zero/negative values)
- If all fail: Lock as Non-Parametric (Wilks method)
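The cascade logic above can be sketched with SciPy's transformation and normality-testing functions. This is an assumption about the structure, not the project's `transformations.py`; the locking rules (Shapiro-Wilk at α = 0.05, positivity requirement for Log/Box-Cox) follow the list above.

```python
import numpy as np
from scipy import stats

def lock_transformation(data, alpha: float = 0.05):
    """Try identity -> Log -> Box-Cox -> Yeo-Johnson; lock the first normal fit."""
    x = np.asarray(data, dtype=float)
    candidates = [("none", lambda v: v)]
    if np.all(x > 0):  # Log and Box-Cox require strictly positive values
        candidates.append(("log", np.log))
        candidates.append(("box-cox", lambda v: stats.boxcox(v)[0]))
    candidates.append(("yeo-johnson", lambda v: stats.yeojohnson(v)[0]))
    for name, fn in candidates:
        transformed = fn(x)
        if stats.shapiro(transformed).pvalue > alpha:
            return name, transformed  # lock as Parametric with this transform
    return "non-parametric", x  # fall back to the Wilks method
```

Right-skewed (e.g., lognormal-shaped) pilot data would typically fail the identity test and lock the Log transformation; data with zeros or negatives skips straight to Yeo-Johnson.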
- Review locked method and specification type
- Click Calculate Required Sample Size
- Review results:
- Capability margin (k_margin)
- Tolerance factor (k_factor)
- Required sample size (N)
- Formula used (e.g., Howe-Guenther Approximation)
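For the two-sided parametric case, the tolerance factor can be computed with Howe's approximation (the formula the README's "Howe-Guenther Approximation" presumably refers to), and the required N is the smallest n whose factor fits inside the capability margin. This sketch uses the textbook formula; the app's exact search strategy is an assumption.

```python
import numpy as np
from scipy import stats

def howe_k2(n: int, p: float, gamma: float) -> float:
    """Howe's approximation for the two-sided tolerance factor k2."""
    df = n - 1
    z = stats.norm.ppf((1 + p) / 2)            # normal quantile for coverage p
    chi2 = stats.chi2.ppf(1 - gamma, df)       # lower chi-square quantile
    return np.sqrt(df * (1 + 1 / n) * z**2 / chi2)

def required_n(k_margin: float, p: float, gamma: float, n_max: int = 10000) -> int:
    """Smallest n whose tolerance factor fits inside the capability margin."""
    for n in range(3, n_max + 1):
        if howe_k2(n, p, gamma) <= k_margin:
            return n
    raise ValueError("margin too tight; no feasible n up to n_max")
```

For example, at 95/95 the factor for n = 10 is about 3.38, very close to the exact tabulated value of 3.379, so a capability margin of 3.5 would yield N = 10.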
- Collect final validation dataset of size N
- Enter final data (comma-separated)
- Click Calculate Tolerance Limits
- Review results:
- Tolerance limits in transformed and original space
- Comparison to specification limits
- Pass/Fail determination
- Process capability index (Ppk) for parametric methods
- Click Generate PDF Report to document results
The application includes a built-in validation runner accessible from the UI:
- Click the Run Full Validation (IQ/OQ/PQ) button in the UI header
- Enter your name as the validation tester
- Click Start Validation
- Monitor progress as the system runs:
- IQ (Installation Qualification): Verifies dependencies and installation
- OQ (Operational Qualification): Tests all calculation formulas against known values
- PQ (Performance Qualification): Runs end-to-end UI tests (skipped when app is running)
- Review validation results
- Download the validation certificate from ./reports/validation/
Note: PQ tests are automatically skipped when running validation from the UI since they require the application to be stopped. For complete validation including PQ tests, use the command-line approach below.
For complete validation including PQ tests:
# Stop the application first
# Then run the validation script
uv run python scripts/run_validation.py --tester "Your Name"

This generates:
- Validation certificate PDF in ./reports/validation/
- Verification Traceability Matrix (VTM) CSV
- Updated validated hash in config/validated_hash.json
# Build and start the container
docker compose up -d
# View logs
docker compose logs -f
# Stop the container
docker compose down

The application will be available at http://localhost:8080 (or a custom port via the PORT environment variable).
Create a .env file in the project root to customize deployment:
PORT=8080
LOG_LEVEL=INFO
LOG_RETENTION_DAYS=90

The docker-compose configuration automatically mounts:
- ./logs: Audit trail logs (read/write)
- ./config: Configuration files including validated_hash.json (read-only)
- ./reports: Generated PDF reports (read/write)
All reports and logs persist across container restarts.
The Docker container includes automatic health checks:
- Endpoint: http://localhost:8080/
- Interval: 30 seconds
- Timeout: 10 seconds
- Retries: 3
Check container health:
docker compose ps

The Docker image includes Chromium and all dependencies required for automated UI testing (PQ tests). This ensures the validation suite can run completely within the container.
All generated reports are organized in the ./reports/ directory:
reports/
├── validation/ # IQ/OQ/PQ validation certificates
├── calculations/ # Sample size calculation reports
└── full/ # Comprehensive full reports
Generated when you click "Generate PDF Report" after completing a calculation. Includes:
- Timestamp and session information
- All input parameters
- Calculated results
- Statistical method used
- Engine hash and validation state
Naming: calculation_report_YYYYMMDD_HHMMSS.pdf
Generated by the IQ/OQ/PQ validation suite. Includes:
- Test execution date and tester name
- System information (OS, Python version)
- Complete test results with URS traceability
- Verification Traceability Matrix (VTM)
- Validated engine hash
Naming: validation_certificate_YYYYMMDD_HHMMSS.pdf
Comprehensive reports combining:
- Current calculation report
- Latest validation certificates
- Audit trail logs (filtered for session)
- Calculator signature (engine hash and validation state)
Naming: full_report_YYYYMMDD_HHMMSS.pdf
Click the Generate Full Report button in the UI after completing a calculation to create a comprehensive report with complete traceability.
# Run all tests
uv run pytest -q
# Run specific test suites
uv run pytest tests/validation/test_iq.py -q # Installation Qualification
uv run pytest tests/validation/test_oq.py -q # Operational Qualification
uv run pytest tests/validation/test_pq.py -q # Performance Qualification
# Run property-based tests
uv run pytest tests/property/ -q
# Run with coverage
uv run pytest --cov=src/sample_size_calculator --cov-report=html

# Run linter
uv run ruff check src/
# Format code
uv run ruff format src/
# Type checking
uv run ty check src/

# Add a runtime dependency
uv add <package-name>
# Add a development dependency
uv add --group dev <package-name>
# Sync dependencies after changes
uv sync

┌─────────────────────────────────────────────────────────┐
│ NiceGUI Web Interface │
│ (Module A | Module V) │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────┐
│ UI Controller │
│ (Session Management, Workflow Enforcement) │
└─────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌───────▼────────┐ ┌──────▼──────┐ ┌────────▼────────┐
│ Calculation │ │Transformation│ │ Tolerance │
│ Engine │ │ Engine │ │ Calculator │
└────────────────┘ └──────────────┘ └─────────────────┘
│ │ │
└───────────────────┼───────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌───────▼────────┐ ┌──────▼──────┐ ┌────────▼────────┐
│ Hash Verifier │ │Audit Logger │ │Report Generator │
└────────────────┘ └──────────────┘ └─────────────────┘
- Single Source of Truth: Pydantic models define all data structures
- Sequential Workflow Enforcement: UI prevents phase-skipping in Module V
- Method Transparency: Active mathematical paths clearly displayed
- Validation-First: Hash-based verification ensures calculation integrity
- Audit Trail: Comprehensive logging of all interactions
- Reproducibility: Deterministic calculations with locked transformations
- Web Framework: NiceGUI (Python reactive UI)
- Calculation Engine: NumPy, SciPy (statistical computations)
- Data Validation: Pydantic (models and validation)
- PDF Generation: ReportLab (reports and certificates)
- Testing: pytest (unit/OQ), playwright (UI/PQ), hypothesis (property-based)
- Logging: Python logging with rotation
- Deployment: Docker Compose
- Package Management: uv (hash-based lockfile)
Issue: Application fails to start or shows import errors
Solution:
# Ensure dependencies are installed
uv sync --all-groups
# Check Python version (requires 3.11+)
uv run python --version
# Check for port conflicts
lsof -i :8080 # On Unix/Linux/Mac
netstat -ano | findstr :8080 # On Windows

Issue: IQ/OQ/PQ tests fail during validation
Solution:
# Check dependency versions
uv run pip list
# Ensure scipy version is 1.x.x
uv run python -c "import scipy; print(scipy.__version__)"
# Run tests individually to identify failures
uv run pytest tests/validation/test_iq.py -v
uv run pytest tests/validation/test_oq.py -v

Issue: Container fails health checks or won't start
Solution:
# Check container logs
docker compose logs
# Rebuild container
docker compose down
docker compose build --no-cache
docker compose up -d
# Verify volume permissions
ls -la ./logs ./reports ./config

Issue: Reports show "VALIDATED STATE: NO - UNVERIFIED CHANGE"
Solution: This indicates the calculation engine has been modified since the last validation. To restore validated state:
- Review changes to src/sample_size_calculator/calculations.py
- If changes are intentional, run full validation: uv run python scripts/run_validation.py --tester "Your Name"
- This will update the validated hash in config/validated_hash.json
Issue: PDF reports fail to generate or save
Solution:
# Check reports directory permissions
ls -la ./reports
# Ensure subdirectories exist
mkdir -p ./reports/validation ./reports/calculations ./reports/full
# Check disk space
df -h # On Unix/Linux/Mac

Issue: Module V Phase 2 fails or locks as Non-Parametric unexpectedly
Solution:
- Ensure pilot data has at least 3 data points
- Check for non-numeric values in dataset
- For Log/Box-Cox transformations, ensure all values are positive
- Review Shapiro-Wilk p-values in the UI
- Consider using manual override to select specific transformation
Issue: PQ tests fail with browser errors
Solution:
# Install Playwright browsers
uv run playwright install --with-deps chromium
# For Docker, rebuild the image
docker compose build --no-cache

Issue: Log directory consuming excessive disk space
Solution:
- Logs automatically rotate at 10MB per file
- Retention is 90 days by default
- Adjust retention in the .env file: LOG_RETENTION_DAYS=30
- Manually clean old logs:
find ./logs -name "*.log.*" -mtime +30 -delete
sample-size-calculator/
├── src/
│ └── sample_size_calculator/
│ ├── __init__.py
│ ├── main.py # Application entry point
│ ├── models.py # Pydantic data models
│ ├── calculations.py # Core calculation engine
│ ├── transformations.py # Data transformation engine
│ ├── outliers.py # IQR outlier detection
│ ├── normality.py # Shapiro-Wilk testing
│ ├── tolerance.py # Tolerance interval calculations
│ ├── hash_verifier.py # SHA-256 verification
│ ├── audit_logger.py # Audit trail logging
│ ├── report_generator.py # PDF report generation
│ ├── full_report_generator.py # Comprehensive report generation
│ ├── vtm_generator.py # VTM generation
│ ├── ui_controller.py # NiceGUI interface
│ ├── validation_runner.py # IQ/OQ/PQ runner
│ └── report_paths.py # Report path management
├── tests/
│ ├── property/ # Property-based tests (Hypothesis)
│ ├── validation/ # IQ/OQ/PQ validation tests
│ │ ├── test_iq.py # Installation Qualification
│ │ ├── test_oq.py # Operational Qualification
│ │ └── test_pq.py # Performance Qualification
│ └── test_*.py # Unit and integration tests
├── config/
│ └── validated_hash.json # Validated engine hash storage
├── logs/
│ └── audit.log # Audit trail logs (rotated)
├── reports/
│ ├── validation/ # Validation certificates
│ ├── calculations/ # Calculation reports
│ └── full/ # Full comprehensive reports
├── scripts/
│ └── run_validation.py # Validation runner script
├── docker-compose.yml # Docker Compose configuration
├── Dockerfile # Docker image definition
├── pyproject.toml # Project metadata and dependencies
├── uv.lock # Locked dependency versions
└── README.md # This file
This application is designed for use in regulated medical device environments and follows ISO/TR 80002-2 guidelines for computer software validation.
- IQ (Installation Qualification): Verifies correct installation and dependency versions
- OQ (Operational Qualification): Tests all mathematical formulas against known standard values
- PQ (Performance Qualification): End-to-end UI testing with realistic workflows
- All requirements linked to test cases via URS markers
- Verification Traceability Matrix (VTM) generated automatically
- Complete audit trail of all user interactions
- SHA-256 hash verification of calculation engine
All events are logged with:
- ISO 8601 timestamps
- Session identifiers
- Event types and context
- Input/output values
- Validation results
Logs are stored in ./logs/ with 90-day retention and automatic rotation.
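A rotating audit logger with the fields listed above might look like the following sketch. This is not the project's `audit_logger.py`; the logger name, format fields, and 10 MB rotation threshold are assumptions drawn from this README's description.

```python
import logging
from logging.handlers import RotatingFileHandler

def make_audit_logger(path: str = "./logs/audit.log") -> logging.Logger:
    """Rotating audit logger with ISO 8601-style timestamps."""
    logger = logging.getLogger("audit")
    # rotate at 10 MB per file, as described in the troubleshooting section
    handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=10)
    fmt = logging.Formatter(
        "%(asctime)s | session=%(session)s | %(event)s | %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    handler.setFormatter(fmt)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

# usage: supply session id and event type via `extra`
# logger.info("n=59", extra={"session": "abc123", "event": "CALCULATION"})
```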
See LICENSE file for details.
For issues, questions, or contributions, please refer to the project repository.
Version: 0.1.0
Last Updated: 2026.02.26
Compliance: ISO/TR 80002-2