Merged
23 commits
5c97555
Add seed and validate design spec
randoneering Apr 16, 2026
b0844fb
feat(seed): rewrite 01_seed_static_checks with full structural seeds
randoneering Apr 16, 2026
637de4f
fix(seed): drop pgfirstaid_seed_role before create for clean re-runs
randoneering Apr 16, 2026
71b7f5c
fix(seed): address code review feedback on 01_seed_static_checks
randoneering Apr 16, 2026
f132157
initial commit for testing harness
randoneering Apr 16, 2026
350bcf5
feat(seed): scaffold seed_and_validate.py with CLI and connection hel…
randoneering Apr 16, 2026
168a0a6
feat(seed): add threshold patching for test-sized thresholds
randoneering Apr 16, 2026
c8efbb8
feat(seed): add database lifecycle functions (create, drop, install)
randoneering Apr 16, 2026
da17007
feat(seed): add seed file runners and replication slot guard
randoneering Apr 16, 2026
996ce49
feat(seed): add live session thread functions and wait helpers
randoneering Apr 16, 2026
9b30f1a
feat(seed): add validation logic, expected check sets, and reporting
randoneering Apr 16, 2026
290cc5e
feat(seed): wire main() orchestrator with full seed-validate-cleanup …
randoneering Apr 16, 2026
a2f923e
fix(seed): handle pg_stat_statements installed-but-not-loadable edge …
randoneering Apr 19, 2026
c887825
chore: uv.lock
randoneering Apr 19, 2026
ecc3708
feat(seed): add --managed flag to test view_pgFirstAid_managed.sql
randoneering Apr 19, 2026
3163988
resolve false flags and skipped tests
randoneering Apr 19, 2026
56856c8
added seed/validation testing into workflow
randoneering Apr 19, 2026
0bd730a
resolving comments/reviews from greptile/qodo
randoneering Apr 21, 2026
bf1b3bb
runner is now nixos, adjusting workflows
randoneering Apr 21, 2026
7c2cc0d
uv added to runner
randoneering Apr 21, 2026
4db33c5
further troubleshooting
randoneering Apr 21, 2026
ff697da
uv at shell
randoneering Apr 21, 2026
396758d
changing env for nix runner
randoneering Apr 21, 2026
2 changes: 1 addition & 1 deletion .github/workflows/README.md
@@ -2,7 +2,7 @@

 This repo keeps `managed-db-validate.yml` as a reusable validation workflow.

-`managed-db-validate.yml` installs `pgFirstAid.sql`, recreates `view_pgFirstAid_managed.sql`, and runs integration tests, including the pgTAP-backed checks in the integration harness.
+`managed-db-validate.yml` installs `pgFirstAid.sql`, recreates `view_pgFirstAid_managed.sql`, runs integration tests including the pgTAP-backed checks, and then runs `testing/seed_and_validate.py --managed` against the same target.

 ## Supported connection modes

25 changes: 18 additions & 7 deletions .github/workflows/integration-pg-matrix.yml
@@ -30,6 +30,7 @@ jobs:

     defaults:
       run:
+        shell: bash -l {0}
         working-directory: testing/integration

     env:
@@ -48,13 +49,12 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4

-      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.14"
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v4
+      - name: Add Nix profile paths
+        run: |
+          echo "/run/current-system/sw/bin" >> "$GITHUB_PATH"
+          echo "/nix/var/nix/profiles/default/bin" >> "$GITHUB_PATH"
+          echo "$HOME/.nix-profile/bin" >> "$GITHUB_PATH"
+          echo "/etc/profiles/per-user/$USER/bin" >> "$GITHUB_PATH"

       - name: Validate required PG env vars
         run: |
@@ -77,6 +77,14 @@ jobs:
           fi
           psql --version

+      - name: Verify uv is installed on runner
+        run: |
+          if ! command -v uv >/dev/null 2>&1; then
+            echo "::error::uv not found on runner. Install uv on the self-hosted NixOS runner."
+            exit 1
+          fi
+          uv --version
+
       - name: Sync dependencies
         run: uv sync

@@ -91,3 +99,6 @@ jobs:

       - name: Run integration tests
         run: uv run python -m pytest tests/integration -m integration
+
+      - name: Run seed and validate harness
+        run: uv run python ../seed_and_validate.py --managed
27 changes: 19 additions & 8 deletions .github/workflows/managed-db-validate.yml
@@ -62,6 +62,7 @@ jobs:
       contents: read
     defaults:
       run:
+        shell: bash -l {0}
         working-directory: testing/integration
     env:
       PGHOST: ${{ inputs.pg_host }}
@@ -79,6 +80,13 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4

+      - name: Add Nix profile paths
+        run: |
+          echo "/run/current-system/sw/bin" >> "$GITHUB_PATH"
+          echo "/nix/var/nix/profiles/default/bin" >> "$GITHUB_PATH"
+          echo "$HOME/.nix-profile/bin" >> "$GITHUB_PATH"
+          echo "/etc/profiles/per-user/$USER/bin" >> "$GITHUB_PATH"
+
       - name: Configure AWS credentials
         if: ${{ inputs.cloud_provider == 'aws' }}
         uses: aws-actions/configure-aws-credentials@v4
@@ -127,14 +135,6 @@ jobs:

           echo "PGHOST=$host" >> "$GITHUB_ENV"

-      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.14"
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v4
-
       - name: Validate required PG env vars
         run: |
           missing=0
@@ -156,6 +156,14 @@ jobs:
           fi
           psql --version

+      - name: Verify uv is installed on runner
+        run: |
+          if ! command -v uv >/dev/null 2>&1; then
+            echo "::error::uv not found on runner. Install uv on the self-hosted NixOS runner."
+            exit 1
+          fi
+          uv --version
+
       - name: Sync dependencies
         run: uv sync

@@ -169,3 +177,6 @@ jobs:

       - name: Run integration tests (includes pgTAP suite)
         run: uv run python -m pytest tests/integration -m integration
+
+      - name: Run seed and validate harness
+        run: uv run python ../seed_and_validate.py --managed
1 change: 1 addition & 0 deletions README.md
@@ -208,6 +208,7 @@ pgFirstAid is designed to be lightweight and safe to run on production systems:
 - Query and health-check coverage is validated with pgTAP assertions grouped by severity.
 - Integration tests cover live runtime behavior, function/view parity, and checks that need concurrent sessions or timing control.
 - A coverage guard ensures every `check_name` in `pgFirstAid.sql` is referenced by at least one pgTAP assertion.
+- The GitHub Actions validation workflows also run `testing/seed_and_validate.py --managed` against live database targets to confirm the seeded checks actually fire end-to-end.
 - Managed database validation is exercised through the reusable workflow in `.github/workflows/managed-db-validate.yml`.

 > **Important:** We currently validate managed-database testing against AWS, but we do not have the funding or credits needed to keep Azure and GCP test environments running. If you have access to Azure Database for PostgreSQL or GCP Cloud SQL and want to help validate pgFirstAid there, we would be happy to have the help.
188 changes: 188 additions & 0 deletions docs/superpowers/specs/2026-04-15-seed-and-validate-design.md
@@ -0,0 +1,188 @@
# pgFirstAid Seed & Validation Script — Design Spec

**Date:** 2026-04-15
**Branch:** feature/load_testing
**Status:** Approved

---

## Goal

A Python script that creates a throwaway PostgreSQL database, seeds it with data that triggers every health check in `pgFirstAid.sql`, runs the function, and reports which checks fired vs. which were missing. Exit code 0 = all expected checks fired; exit code 1 = gaps found.

---

## Entry Point

**`testing/seed_and_validate.py`**

Single script. No third-party dependencies beyond `psycopg` (psycopg3). Invoked as:

```bash
python testing/seed_and_validate.py [--host localhost] [--port 5432] [--user postgres]
```

Connection parameters default to standard env vars (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`) with CLI overrides.
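
A minimal sketch of how the env-var defaults and CLI overrides could be wired with `argparse`. The function names `build_parser` and `resolve_connection` are hypothetical, and reading the password only from `PGPASSWORD` is an assumption; the spec itself only names the three flags shown in the invocation above.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from the standard libpq env vars."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Seed and validate pgFirstAid checks")
    parser.add_argument("--host", default=env.get("PGHOST", "localhost"))
    parser.add_argument("--port", type=int, default=int(env.get("PGPORT", "5432")))
    parser.add_argument("--user", default=env.get("PGUSER", "postgres"))
    return parser


def resolve_connection(argv, env=None):
    """Return connection keyword arguments; CLI flags win over env vars."""
    env = os.environ if env is None else env
    args = build_parser(env).parse_args(argv)
    return {
        "host": args.host,
        "port": args.port,
        "user": args.user,
        # Assumption: the password is never passed on the command line.
        "password": env.get("PGPASSWORD"),
    }
```

Passing `env` explicitly keeps the precedence logic testable without touching the real process environment.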

---

## Database Lifecycle

1. Connect to `postgres` (maintenance database) as superuser
2. Drop `pgfirstaid_test` if it exists
3. Create `pgfirstaid_test`
4. Run all seeding against `pgfirstaid_test`
5. Drop `pgfirstaid_test` on exit (success or failure) via a `finally` block
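
The lifecycle above can be sketched as a context manager so the final drop runs on success and on failure. `run_admin_sql` is a hypothetical callable that executes one statement on an autocommit connection to the `postgres` maintenance database; it is not part of the spec.

```python
from contextlib import contextmanager

TEST_DB = "pgfirstaid_test"


@contextmanager
def throwaway_database(run_admin_sql, name=TEST_DB):
    """Create a scratch database and always drop it afterwards.

    `run_admin_sql` (hypothetical) must run each statement outside a
    transaction block, because CREATE DATABASE and DROP DATABASE cannot
    execute inside one.
    """
    run_admin_sql(f'DROP DATABASE IF EXISTS "{name}"')
    run_admin_sql(f'CREATE DATABASE "{name}"')
    try:
        yield name
    finally:
        # Mirrors step 5: cleanup happens whether seeding succeeded or raised.
        run_admin_sql(f'DROP DATABASE IF EXISTS "{name}"')
```

If sessions are still connected at cleanup time the plain `DROP DATABASE` will fail; terminating them first, or using `DROP DATABASE ... WITH (FORCE)` on PostgreSQL 13 and later, is one way to handle that.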

---

## Threshold Patching

`pgFirstAid.sql` contains thresholds that are impractical in a test environment. The script reads the file, applies regex substitutions in memory, and installs the patched function into the test DB. The original file is never modified.

| Check | Original threshold | Test threshold |
|---|---|---|
| Unused Large Index | `> 104857600` (100MB) | `> 8192` (8KB) |
| Tables larger than 100GB | `> 107374182400` (100GB) | `> 1048576` (1MB) |
| Tables larger than 50GB | `between 53687091200 and 107374182400` (50GB to 100GB) | `between 524288 and 1048576` (512KB to 1MB) |
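
A sketch of the in-memory patching step, under the assumption that the thresholds appear in `pgFirstAid.sql` as the literal comparisons shown in the table (spacing and keyword case may differ, so the patterns are tolerant of both). `THRESHOLD_PATCHES` and `patch_thresholds` are illustrative names, not taken from the script.

```python
import re

# (pattern, replacement) pairs derived from the threshold table.
# The range pattern runs first so its upper bound is not rewritten
# by the plain "> 107374182400" pattern.
THRESHOLD_PATCHES = [
    (r"between\s+53687091200\s+and\s+107374182400\b", "between 524288 and 1048576"),
    (r">\s*107374182400\b", "> 1048576"),
    (r">\s*104857600\b", "> 8192"),
]


def patch_thresholds(sql_text):
    """Return a patched copy of the SQL; raise if any pattern matches nothing."""
    patched = sql_text
    for pattern, replacement in THRESHOLD_PATCHES:
        patched, count = re.subn(pattern, replacement, patched, flags=re.IGNORECASE)
        if count == 0:
            # A silent miss would leave a production-sized threshold in place.
            raise ValueError(f"threshold pattern not found: {pattern}")
    return patched
```

Failing loudly on a zero-match substitution guards against the seed quietly passing or failing because the upstream SQL was reworded.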

---

## SQL Seed Files

Located in `testing/healthcheck_seed/`. Each file is idempotent and targets the `pgfirstaid_seed` schema.

### `01_seed_static_checks.sql`

Seeds all structural checks that fire from schema state alone:

| Check triggered | Seeded object |
|---|---|
| Missing Primary Key (CRITICAL) | Table with no PK |
| Unused Large Index (CRITICAL) | Index on a table that is never scanned, sized above 8KB threshold |
| Duplicate Index (HIGH) | Two identical indexes on the same table and columns |
| Table with more than 200 columns (HIGH) | Table with 201 columns |
| Missing Statistics (HIGH) | Table with >1000 inserts, never analyzed |
| Outdated Statistics (MEDIUM) | Table with dead tuples exceeding autovacuum threshold |
| Table with more than 50 columns (MEDIUM) | Table with 60 columns |
| Low Index Efficiency (MEDIUM) | Non-selective index scanned >100 times; each scan reads many tuples (seeded via a loop of 110 queries using a predicate that matches most rows) |
| Excessive Sequential Scans (MEDIUM) | Table with >1000 seq scans produced by a seeding loop of sequential scans against a large table |
| Missing FK Index (LOW) | Table with FK constraint and no supporting index |
| Table With Single Or No Columns (LOW) | Table with 1 column |
| Table With No Activity Since Stats Reset (LOW) | Table created but never read or written |
| Role Never Logged In (LOW) | Role with LOGIN created but never connected |
| Empty Table (LOW) | Table with 0 rows |
| Index With Very Low Usage (LOW) | Index with 1–99 scans and size > 1MB — seeded via a loop that scans the index a small number of times |

> **Note:** Low Index Efficiency requires `idx_scan > 100` and `idx_tup_read / idx_scan > 1000`. The seed loop runs 110 queries using a predicate that hits the indexed column but matches a large fraction of rows, so each scan reads thousands of tuples.
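
As a quick arithmetic check of the condition in the note, the predicate can be written out directly. The tuple counts below are hypothetical numbers chosen for illustration, not measurements from the seed.

```python
def low_index_efficiency_fires(idx_scan, idx_tup_read):
    """Mirror the quoted condition: idx_scan > 100 and idx_tup_read / idx_scan > 1000."""
    return idx_scan > 100 and idx_tup_read / idx_scan > 1000


# 110 scans that each read about 5,000 tuples: 550,000 / 110 = 5,000 per scan.
seeded = low_index_efficiency_fires(idx_scan=110, idx_tup_read=550_000)

# Exactly 100 scans does not satisfy the strict "> 100" comparison.
too_few_scans = low_index_efficiency_fires(idx_scan=100, idx_tup_read=500_000)
```

This is why the loop runs 110 iterations rather than 100: the scan-count comparison is strict.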

### `02_seed_pg_stat_statements.sql`

Existing file. Seeds all pg_stat_statements checks. No changes needed.

---

## Live Session Strategy

Four background threads open `psycopg3` connections and hold them for the duration of the validation window.

| Thread | What it does | Checks triggered |
|---|---|---|
| **Blocker** | Opens `BEGIN`, runs `UPDATE` on a row, calls `pg_sleep(600)`, then `ROLLBACK` | Current Blocked/Blocking Queries, Lock-Wait-Heavy Active Queries |
| **Blocked** | Waits for blocker to establish, then attempts `UPDATE` on the same row | Current Blocked/Blocking Queries, Lock-Wait-Heavy Active Queries |
| **Idle-in-transaction** | Opens `BEGIN`, runs `SELECT 1`, then sleeps in Python for 6 minutes (transaction stays open) | Idle In Transaction Over 5 Minutes |
| **Long query** | Runs `SELECT pg_sleep(360)` | Long Running Queries (>5 min), Top 10 Expensive Active Queries (>30 sec) |

### Startup sequencing

1. Start Blocker thread; wait until its lock is confirmed held (poll `pg_locks`)
2. Start Blocked thread; wait until it appears in `pg_stat_activity` with `wait_event_type = 'Lock'`
3. Start Idle-in-transaction thread; wait until it appears in `pg_stat_activity` with `state = 'idle in transaction'`
4. Start Long query thread; wait until it appears in `pg_stat_activity` with runtime > 30 seconds
5. Proceed to validation

All threads are daemon threads. They are cancelled via `pg_terminate_backend()` during cleanup if they haven't exited naturally.
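
Each "wait until" step in the sequencing list can share one polling helper. This is a sketch under the assumption that every readiness condition is expressed as a zero-argument callable (for example, a function that queries `pg_locks` or `pg_stat_activity` and returns whether a matching row exists); `wait_until` is a hypothetical name.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5, sleep=time.sleep, clock=time.monotonic):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout. `sleep` and
    `clock` are injectable so the helper can be tested without real delays.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Returning `False` instead of raising lets the orchestrator decide whether a session that never became ready should abort the run or only mark its checks as skipped.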

---

## Replication Slot Guard

```python
import psycopg

# `conn` is an existing psycopg connection to the test database.
try:
    # Requires wal_level = logical and superuser
    conn.execute("SELECT pg_create_logical_replication_slot('pgfirstaid_test_slot', 'test_decoding')")
    # Slot is inactive by definition (no consumer attached)
    # Triggers: Inactive Replication Slots (HIGH)
    replication_slot_created = True
except psycopg.errors.ObjectNotInPrerequisiteState:
    print("SKIP: wal_level != logical; replication slot checks not seeded")
    replication_slot_created = False
except psycopg.errors.InsufficientPrivilege:
    print("SKIP: insufficient privilege to create replication slot")
    replication_slot_created = False
```

The slot is dropped during cleanup if it was created.

---

## Checks Not Seeded

| Check | Reason |
|---|---|
| High Connection Count (>50 active) | Requires 50+ concurrent connections — out of scope for a seed script; pgbench covers this (existing `07_pgbench_active_query.sql`) |
| Inactive Replication Slots Near Max WAL | Requires sustained WAL generation to push retained WAL near `safe_wal_size` — not deterministic in a test environment |
| shared_buffers At Default / work_mem At Default | These fire based on server config, not seeded data — always present on a default-configured server |
| Server Role (standby) | Always fires as INFO, content depends on actual server role |
| INFO checks (version, uptime, extensions, log size, etc.) | Always fire — no seeding needed |

---

## Validation

Once the idle-in-transaction session and the long-running query have both been active for more than 5 minutes (and before their 6-minute sleeps end), run:

```sql
SELECT check_name, count(*) AS findings
FROM pg_firstAid()
GROUP BY check_name
ORDER BY check_name;
```

Compare results against an expected set defined in the script. Print a table:

```
PASS Missing Primary Key
PASS Duplicate Index
PASS Idle In Transaction Over 5 Minutes
SKIP Inactive Replication Slots (skipped: wal_level)
...
```

Exit 0 if all non-skipped checks fired. Exit 1 if any non-skipped check produced 0 rows.
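
The comparison and exit-code rule can be sketched as a pure function over the query results. `evaluate` and `format_report` are hypothetical names, and the `SKIP` status label is an assumption about how skipped checks are reported.

```python
def evaluate(expected, fired_counts, skipped=()):
    """Classify each expected check and derive the process exit code.

    expected      -- iterable of check names the seed should trigger
    fired_counts  -- mapping of check_name -> findings returned by the query
    skipped       -- check names whose seeding was skipped
    """
    skipped = set(skipped)
    rows = []
    for name in sorted(expected):
        if name in skipped:
            status = "SKIP"
        elif fired_counts.get(name, 0) > 0:
            status = "PASS"
        else:
            status = "FAIL"
        rows.append((status, name))
    # Only non-skipped checks with zero findings fail the run.
    exit_code = 1 if any(status == "FAIL" for status, _ in rows) else 0
    return rows, exit_code


def format_report(rows):
    """Render one 'STATUS name' line per check."""
    return "\n".join(f"{status:<5} {name}" for status, name in rows)
```

Keeping this logic free of database calls means the expected-set bookkeeping can be unit-tested separately from the slow live-session seeding.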

---

## File Layout

```
testing/
  seed_and_validate.py                    ← new: orchestrator
  healthcheck_seed/
    01_seed_static_checks.sql             ← rewrite: full structural seed
    02_seed_pg_stat_statements.sql        ← existing, unchanged
    03_session_blocker.sql                ← kept for manual use
    04_session_blocked.sql                ← kept for manual use
    05_session_idle_in_transaction.sql    ← kept for manual use
    06_session_long_running_query.sql     ← kept for manual use
    07_pgbench_active_query.sql           ← kept for manual use
    99_validate_seed_results.sql          ← kept for manual use
```

---

## Dependencies

- Python 3.11+
- `psycopg` (psycopg3): `pip install "psycopg[binary]"`
- PostgreSQL superuser access to the target server
4 changes: 4 additions & 0 deletions pgFirstAid.sql
@@ -22,6 +22,7 @@ begin
 return;
 end if;

+begin
 return query
 with pss as (
 select
@@ -264,6 +265,9 @@ with pss as (
 order by
 ((to_jsonb(pss)->>'wal_bytes')::numeric / NULLIF(pss.calls, 0)) desc
 limit 10;
+exception when object_not_in_prerequisite_state then
+return;
+end;
 end;
 $$ language plpgsql;

11 changes: 11 additions & 0 deletions pyproject.toml
@@ -0,0 +1,11 @@
[project]
name = "pgfirstaid-testing"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "psycopg2-binary>=2.9.12",
    "pytest>=8.0",
]

[tool.pytest.ini_options]
testpaths = ["testing"]