fix: replace per-test databases with shared DB + per-test pools by ilblackdragon · Pull Request #429 · nearai/cloud-api

ilblackdragon · 2026-02-12T17:07:22Z

Summary

- Replaces heavyweight per-test CREATE DATABASE ... TEMPLATE / DROP DATABASE pattern with a single shared database (platform_api_e2e) and per-test connection pools
- Eliminates all admin connections for test setup/teardown, reducing peak PG connections from ~300 to ~64
- Fixes shared-state concurrency bugs where tests using models.data[0] or mutating shared model names would interfere with parallel tests
- Deletes cleanup_test_databases.rs (no per-test DBs to clean up)
- Adds nextest config for capping parallel e2e threads at 16
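
The shared-state bug in the third item can be illustrated with a minimal sketch. The `Model` struct, both helper functions, and the model IDs are hypothetical and not taken from the PR. The point is only that in a shared database the first list entry may belong to another test, so a test is safer looking up the model it created by explicit ID.

```rust
/// Hypothetical stand-in for one entry in a model-list response.
#[derive(Debug, Clone, PartialEq)]
struct Model {
    id: String,
}

/// Fragile in a shared database: returns whatever happens to be listed first,
/// which may have been created by a test running in parallel.
fn first_model(models: &[Model]) -> Option<&Model> {
    models.first()
}

/// More robust: look up the model this test created by its explicit ID.
fn find_own_model<'a>(models: &'a [Model], own_id: &str) -> Option<&'a Model> {
    models.iter().find(|m| m.id == own_id)
}
```

With a list that contains another test's model first, `first_model` returns the foreign entry while `find_own_model` returns the intended one.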

Performance

| Metric | Before (main) | After | Change |
|---|---|---|---|
| Sum of test runtimes | 116s | 25s | 4.6x faster |
| Slowest binary | 18.4s | 3.4s | 5.4x faster |
| Test binary failures | 23/37 | 1/37 (pre-existing) | stable |

Test plan

- cargo fmt --all -- --check
- cargo clippy -p api --tests (zero warnings)
- cargo test -p api --tests (all pass except 1 pre-existing failure unrelated to this PR)
- No individual test over 4s, no timeouts

🤖 Generated with Claude Code

Note

Medium Risk
Moderate risk because it changes test infrastructure and database bootstrapping/concurrency behavior; failures are likely to manifest as flaky or deadlocked CI tests rather than production runtime issues.

Overview
Switches API e2e/integration tests from per-test database cloning/teardown to a single shared test database bootstrapped once with migrations, with each test creating its own 4-connection pool (via deadpool-postgres) and using advisory locks to serialize bootstrap across nextest binaries.

Updates CI/nextest to cap e2e parallelism at 16 and increases PostgreSQL max_connections (plus --shm-size) to prevent connection exhaustion; removes the orphaned test DB cleanup test and adjusts e2e tests to avoid shared-state flakiness (e.g., using explicit model IDs / higher list limits) and drops the now-unneeded DB guard plumbing.

Written by Cursor Bugbot for commit 1d45d41. This will update automatically on new commits.

E2E tests intermittently stalled for ~60s in CI due to PostgreSQL connection exhaustion. With num-cpus parallelism on 32+ core runners, 448+ tests each creating 4-6 admin connections exceeded PG's default max_connections=100.

- Cache admin DB name probe in a static OnceCell (saves 1-2 connections per test)
- Remove redundant pg_terminate_backend/DROP on UUID-named test DBs (can't pre-exist)
- Use DROP DATABASE ... WITH (FORCE) instead of separate terminate + drop
- Remove unnecessary 50ms sleep after template migration
- Cap e2e tests at 16 parallel threads via nextest test group
- Increase PG max_connections to 300 in CI as safety net
- Remove debug println that dumped full config on every test server init

Per-test admin connections: 4-6 -> 2. Peak connections: unbounded -> ~64.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
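
The connection arithmetic behind these numbers can be checked with a small sketch. It is a simplification that assumes every concurrently running test holds the same number of connections and ignores any other clients of the server. The function names are illustrative and do not appear in the PR.

```rust
/// Peak connections if every concurrently running test holds `per_test` connections.
fn peak_connections(parallel_tests: u32, per_test: u32) -> u32 {
    parallel_tests * per_test
}

/// True when the estimated peak fits under the server's `max_connections`.
fn fits(parallel_tests: u32, per_test: u32, max_connections: u32) -> bool {
    peak_connections(parallel_tests, per_test) <= max_connections
}
```

Under these assumptions, 16 parallel tests with 4 connections each stay below the default limit of 100, while 32 parallel tests with 4 connections each already exceed it.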

claude · 2026-02-12T17:08:53Z

Code Review

I've reviewed the changes for critical issues.

✅ No Critical Issues Found

The PR effectively addresses PostgreSQL connection exhaustion during parallel test execution. The implementation is sound with good safety measures.

Key Validations:

✅ PostgreSQL Compatibility - CI uses postgres:15, and DROP DATABASE ... WITH (FORCE) requires PG 13+, so this is safe.

✅ Thread Safety - ADMIN_DB_NAME caching with tokio::sync::OnceCell ensures only one probe happens across concurrent tests.

✅ Nextest Filter - The pattern test(/^e2e/) correctly matches test functions starting with e2e, properly capping parallelism to 16 threads.

✅ Service Restart - The workflow correctly restarts PostgreSQL after ALTER SYSTEM SET max_connections with a 30s retry loop using pg_isready.

✅ Connection Math - "16 parallel tests × 4 connections = 64" is validated by the optimization reducing admin connections from 4-6 to 2 per test.

✅ Safety Preserved - validate_db_identifier() checks remain in place, preventing SQL injection.

✅ Test-Only Changes - No production impact except removing debug println! in api/src/lib.rs:142.
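
The caching described under Thread Safety can be sketched with the standard library. This is an approximation: the PR uses `tokio::sync::OnceCell`, whereas the sketch swaps in `std::sync::OnceLock` so that it needs no external crates. The probe function and the call counter are hypothetical and exist only to make the behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Process-wide cache for the admin database name.
static ADMIN_DB_NAME: OnceLock<String> = OnceLock::new();

/// Counts how often the simulated probe runs.
static PROBE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical probe; a real one would open a connection to find out
/// whether `postgres` or `template1` is usable as the admin database.
fn probe_admin_db_name() -> String {
    PROBE_CALLS.fetch_add(1, Ordering::SeqCst);
    "postgres".to_string()
}

/// Returns the cached name, running the probe only on the first call.
fn get_admin_db_name() -> &'static str {
    ADMIN_DB_NAME.get_or_init(probe_admin_db_name).as_str()
}
```

The cache is per process, so it saves probes within one test binary but does not coordinate between separate binaries.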

Summary

Well-designed optimization that:

- Reduces connection pressure through caching and removing redundant operations
- Uses modern PostgreSQL features correctly
- Maintains all safety validations
- Properly bounds parallelism via nextest test-groups

✅ Approved - Ready to merge pending CI validation.

Code review by Claude Code

Copilot

Pull request overview

Reduces PostgreSQL connection pressure during api e2e/integration tests to prevent intermittent CI stalls caused by exhausting Postgres’ default max_connections.

Changes:

- Cache the “admin DB” name probe (postgres vs template1) and remove a post-migration sleep to cut per-test admin connections.
- Simplify test DB lifecycle: create UUID-named DBs without redundant terminate/drop, and drop DBs using DROP DATABASE ... WITH (FORCE).
- Bound parallelism for DB-heavy tests via nextest test groups and raise Postgres max_connections in CI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| `crates/api/tests/common/db_setup.rs` | Lowers per-test admin DB connections and simplifies create/drop operations for test databases. |
| `crates/api/src/lib.rs` | Removes debug `println!` dumping config during auth init. |
| `.github/workflows/test.yml` | Increases CI Postgres resources (`--shm-size`) and raises `max_connections` to reduce stalls. |
| `.config/nextest.toml` | Caps parallelism for `api` e2e tests to limit peak DB connections. |

Copilot · 2026-02-12T17:10:31Z

.github/workflows/test.yml

+        for i in $(seq 1 30); do
+          docker exec ${{ job.services.postgres.id }} pg_isready -U postgres && break
+          sleep 1
+        done


The readiness wait loop doesn’t fail the step if PostgreSQL never becomes ready after the restart (the for loop always exits 0). Add an explicit check after the loop (or use a timeout helper) to exit 1 when pg_isready still fails, so the job doesn’t proceed with a broken DB and produce misleading test failures/timeouts.

Suggested change

-        done
+        done
+        if ! docker exec ${{ job.services.postgres.id }} pg_isready -U postgres; then
+          echo "PostgreSQL did not become ready after restart within 30 seconds" >&2
+          exit 1
+        fi
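
The same wait-then-fail pattern can be expressed in Rust for illustration. The workflow itself is shell, so this is only a sketch of the control flow, with a hypothetical `wait_until_ready` helper that takes the readiness probe and the pause as closures.

```rust
/// Polls `is_ready` up to `max_attempts` times, calling `pause` after each failed try.
/// Returns the 1-based attempt number that succeeded, or an error if none did.
fn wait_until_ready(
    max_attempts: u32,
    mut is_ready: impl FnMut() -> bool,
    mut pause: impl FnMut(),
) -> Result<u32, String> {
    for attempt in 1..=max_attempts {
        if is_ready() {
            return Ok(attempt);
        }
        pause();
    }
    // Running out of attempts is reported as a failure instead of falling through.
    Err(format!("not ready after {max_attempts} attempts"))
}
```

The key design point is the explicit error after the loop: a loop that merely stops iterating gives the caller no way to tell success from exhaustion.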

.config/nextest.toml

gemini-code-assist · 2026-02-12T17:20:43Z

Summary of Changes

Hello @ilblackdragon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses intermittent CI stalls caused by PostgreSQL connection exhaustion during parallel E2E tests. It introduces several optimizations to reduce the number of database connections used by tests and improves the efficiency of test database setup and teardown. The changes aim to stabilize CI by bounding peak connection usage and streamlining database operations, ensuring a more robust testing environment.

Highlights

- Reduced E2E Test Connection Pressure: Implemented a nextest test group to cap e2e/integration tests at 16 parallel threads, significantly reducing peak PostgreSQL connections and preventing CI stalls.
- Optimized Admin Database Connection: The get_admin_db_name function now caches the admin database name (postgres or template1) using OnceCell, reducing connection overhead by 1-2 connections per test.
- Streamlined Test Database Lifecycle: Removed redundant pg_terminate_backend and DROP DATABASE IF EXISTS calls during the creation of UUID-named test databases, as they cannot pre-exist. Also, adopted DROP DATABASE ... WITH (FORCE) for cleanup, simplifying the process and automatically terminating active connections (PostgreSQL 13+).
- Removed Unnecessary Delay and Debug Output: Eliminated an unnecessary 50ms sleep after template database migration and removed a debug println! statement that previously dumped configuration on every test server initialization.

Changelog

.config/nextest.toml
- Added a test-groups.e2e-db section to define a maximum of 16 threads for this group.
- Applied the e2e-db test group override to api package tests starting with e2e/.
crates/api/src/lib.rs
- Removed a println! debug statement within the init_auth_services function.
crates/api/tests/common/db_setup.rs
- Introduced a static OnceCell<String> named ADMIN_DB_NAME for caching the admin database name.
- Refactored get_admin_db_name to use ADMIN_DB_NAME.get_or_try_init for caching the admin database name.
- Removed a tokio::time::sleep call from create_template_database_internal.
- Eliminated pg_terminate_backend and DROP DATABASE IF EXISTS calls from create_test_database_from_template.
- Modified drop_test_database to use DROP DATABASE IF EXISTS {db_name} WITH (FORCE).
- Modified drop_all_test_databases to use DROP DATABASE IF EXISTS {db_name} WITH (FORCE) for each database.
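
Because the drop statements interpolate `{db_name}` directly into SQL, the identifier check matters. The review mentions `validate_db_identifier()` but does not show its body, so the validation rules below are an assumption (ASCII letters, digits, and underscores, not starting with a digit, at most 63 bytes), and `drop_database_sql` is a hypothetical helper.

```rust
/// Hypothetical validator: accepts only `[A-Za-z_][A-Za-z0-9_]*`, at most 63 bytes
/// (PostgreSQL's default identifier length limit).
fn validate_db_identifier(name: &str) -> Result<(), String> {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    if first_ok && rest_ok && name.len() <= 63 {
        Ok(())
    } else {
        Err(format!("invalid database identifier: {name:?}"))
    }
}

/// Builds the forced-drop statement only for names that pass validation.
fn drop_database_sql(db_name: &str) -> Result<String, String> {
    validate_db_identifier(db_name)?;
    Ok(format!("DROP DATABASE IF EXISTS {db_name} WITH (FORCE)"))
}
```

A name containing spaces or a semicolon is rejected before any SQL string is built.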

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/test.yml

Activity

The pull request summary and initial description were generated by Claude Code and Cursor Bugbot, indicating automated assistance in drafting the PR.

gemini-code-assist

Code Review

This pull request effectively addresses CI stalls from PostgreSQL connection exhaustion through optimizations like caching the admin database name, capping test parallelism via nextest, and streamlining database operations with DROP DATABASE ... WITH (FORCE). Crucially, it also enhances security by removing a debug statement that logged sensitive application configuration. These improvements lead to cleaner, more efficient code, a more stable test pipeline, and a confirmed absence of other vulnerabilities, with SQL query modifications safely implemented to prevent injection.

gemini-code-assist · 2026-02-12T17:31:27Z

crates/api/src/lib.rs


    let auth_service: Arc<dyn AuthServiceTrait> = if config.auth.mock {
        // TODO: fix this, it should not use the database pool
-        println!("config: {config:?}");


The application previously logged the entire ApiConfig struct. This configuration object contains sensitive information, including database credentials, API keys, and encryption keys. Logging this data, even in a test context, poses a risk as such code could be accidentally activated in production or secrets could be exposed in CI/CD logs. This change correctly removes the sensitive logging statement, mitigating the vulnerability.

- Use binary(/^e2e/) instead of test(/^e2e/) so the 16-thread cap actually applies to e2e test binaries (test() matches function names)
- Fail the CI step if PostgreSQL never becomes ready after restart

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ilblackdragon · 2026-02-12T18:04:56Z

Addressed both review comments in 227d53b:

- Nextest filter (Cursor Bugbot): Changed test(/^e2e/) to binary(/^e2e/). The test() predicate matches function names (which start with test_), not binary names. binary() correctly matches the integration test binaries derived from e2e_*.rs filenames. Great catch, this was the most important part of the PR.
- pg_isready guard (Copilot): Added explicit readiness check after the wait loop that exits 1 if PostgreSQL never becomes ready after restart.
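
Why the original filter never applied the cap can be shown with a rough sketch. It approximates the anchored pattern `^e2e` with a plain prefix check and is not how nextest evaluates filters internally. The `TestCase` struct and the function name `test_create_org` are hypothetical; the binary name follows the `e2e_*.rs` naming mentioned above.

```rust
/// A test as a runner sees it: the binary it lives in and its function name.
struct TestCase<'a> {
    binary: &'a str,
    function: &'a str,
}

/// Approximates `test(/^e2e/)`: the pattern is applied to the function name.
fn matched_by_test_predicate(t: &TestCase<'_>) -> bool {
    t.function.starts_with("e2e")
}

/// Approximates `binary(/^e2e/)`: the pattern is applied to the binary name.
fn matched_by_binary_predicate(t: &TestCase<'_>) -> bool {
    t.binary.starts_with("e2e")
}
```

For a function named `test_create_org` inside the `e2e_function_tools` binary, only the binary-based predicate matches, so only that form places the test in the capped group.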

Eliminates heavyweight per-test CREATE/DROP DATABASE operations by sharing a single database across all e2e tests, with each test getting its own 4-connection pool. Tests are already isolated via UUID-scoped orgs, workspaces, and API keys. Also fixes shared-state concurrency issues where tests using models.data[0] or mutating shared model names would interfere with parallel tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

crates/api/tests/common/db_setup.rs

- Replace check-then-create with idempotent CREATE DATABASE that catches SQLSTATE 42P04 (duplicate_database), fixing the TOCTOU race when multiple test binaries bootstrap concurrently
- Bump admin model list queries to limit=500 so tests find their models in the shared DB which accumulates entries across runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
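
The idempotent-create idea in the first item can be sketched as follows. The `DbError` struct is a hypothetical stand-in, since real drivers expose the SQLSTATE through their own error types, and `ignore_duplicate_database` is an illustrative name. The SQLSTATE is modeled as optional because an error may arrive without one.

```rust
/// Hypothetical error shape; real drivers expose the SQLSTATE differently.
#[derive(Debug, PartialEq)]
struct DbError {
    sqlstate: Option<String>,
    message: String,
}

/// SQLSTATE for `duplicate_database`.
const DUPLICATE_DATABASE: &str = "42P04";

/// Treats "database already exists" as success and propagates everything else.
fn ignore_duplicate_database(result: Result<(), DbError>) -> Result<(), DbError> {
    match result {
        Err(e) if e.sqlstate.as_deref() == Some(DUPLICATE_DATABASE) => Ok(()),
        other => other,
    }
}
```

Note that an error carrying no SQLSTATE at all is still propagated by this approach, so it only helps when the driver actually reports the code.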

crates/api/tests/common/db_setup.rs

The previous fix checked for SQLSTATE 42P04 (DUPLICATE_DATABASE), but in CI the error doesn't carry a SQL state code, causing a panic. Since multiple test binaries race to create the shared DB (OnceCell is per-process), just swallow any CREATE DATABASE error and let the migration pool connection be the real validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

crates/api/tests/common/db_setup.rs

The previous fix swallowed CREATE DATABASE errors but multiple binaries still raced on refinery's schema_history table creation during migrations. Now the entire bootstrap (CREATE DATABASE + migrations) is serialized with a PostgreSQL advisory lock on the admin connection. OnceCell still gates within a single binary; the advisory lock handles cross-process coordination.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
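
The lock-bootstrap-unlock ordering described here can be sketched without a live database. The `AdminConn` trait, the `RecordingConn` test double, `with_bootstrap_lock`, and the lock key value are all hypothetical; only `pg_advisory_lock` and `pg_advisory_unlock` are real PostgreSQL functions. The sketch records the statements that would be sent instead of executing them.

```rust
/// Hypothetical lock key; any fixed 64-bit value shared by all test binaries would do.
/// The hex digits are grouped uniformly in fours.
const BOOTSTRAP_LOCK_KEY: i64 = 0x0E2E_DB00_0000_0001;

/// Minimal stand-in for an admin connection that can run SQL statements.
trait AdminConn {
    fn execute(&mut self, sql: &str) -> Result<(), String>;
}

/// Runs `bootstrap` while holding a session-level advisory lock and releases
/// the lock afterwards, even when `bootstrap` fails.
fn with_bootstrap_lock<C: AdminConn>(
    conn: &mut C,
    bootstrap: impl FnOnce(&mut C) -> Result<(), String>,
) -> Result<(), String> {
    conn.execute(&format!("SELECT pg_advisory_lock({})", BOOTSTRAP_LOCK_KEY))?;
    let outcome = bootstrap(conn);
    conn.execute(&format!("SELECT pg_advisory_unlock({})", BOOTSTRAP_LOCK_KEY))?;
    outcome
}

/// Test double that records every statement instead of talking to PostgreSQL.
#[derive(Default)]
struct RecordingConn {
    statements: Vec<String>,
}

impl AdminConn for RecordingConn {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        self.statements.push(sql.to_string());
        Ok(())
    }
}
```

Because `pg_advisory_lock` blocks until the lock is free, a second process running the same code would wait until the first has finished its bootstrap and released the lock.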

Copilot AI review requested due to automatic review settings February 12, 2026 17:07

ilblackdragon temporarily deployed to Cloud API test env February 12, 2026 17:07 — with GitHub Actions Inactive

Copilot started reviewing on behalf of ilblackdragon February 12, 2026 17:07 View session

Copilot AI reviewed Feb 12, 2026

cursor bot reviewed Feb 12, 2026

.config/nextest.toml (outdated, resolved)

gemini-code-assist bot reviewed Feb 12, 2026

ilblackdragon temporarily deployed to Cloud API test env February 12, 2026 17:45 — with GitHub Actions Inactive

ilblackdragon and others added 3 commits February 12, 2026 11:04

Merge remote-tracking branch 'origin/main' into fix/300-slow-e2e-tests

d2fe76d

fix: update e2e_function_tools.rs for shared DB pattern after merge

9ec9ab9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ilblackdragon had a problem deploying to Cloud API test env February 12, 2026 21:06 — with GitHub Actions Failure

cursor bot reviewed Feb 12, 2026

crates/api/tests/common/db_setup.rs (resolved)

ilblackdragon changed the title ~~fix: reduce e2e test connection pressure to prevent CI stalls~~ fix: replace per-test databases with shared DB + per-test pools Feb 12, 2026

ilblackdragon had a problem deploying to Cloud API test env February 12, 2026 22:49 — with GitHub Actions Failure

cursor bot reviewed Feb 12, 2026

crates/api/tests/common/db_setup.rs (resolved)

ilblackdragon had a problem deploying to Cloud API test env February 12, 2026 23:20 — with GitHub Actions Failure

cursor bot reviewed Feb 12, 2026

crates/api/tests/common/db_setup.rs (resolved)

ilblackdragon had a problem deploying to Cloud API test env February 13, 2026 00:25 — with GitHub Actions Failure

fix: use consistent hex digit grouping for advisory lock key

1d45d41

CI runs clippy with -D warnings and caught unusual_byte_groupings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ilblackdragon temporarily deployed to Cloud API test env February 13, 2026 00:39 — with GitHub Actions Inactive

ilblackdragon requested a review from PierreLeGuen February 13, 2026 02:20

ilblackdragon temporarily deployed to Cloud API test env February 15, 2026 04:42 — with GitHub Actions Inactive

Merge branch 'main' into fix/300-slow-e2e-tests

9ae5fd0

PierreLeGuen had a problem deploying to Cloud API test env February 16, 2026 18:56 — with GitHub Actions Failure

fix: update e2e_provider_errors to match 4-element setup_test_server_…

01d5024

…with_pool return type Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PierreLeGuen temporarily deployed to Cloud API test env February 16, 2026 19:31 — with GitHub Actions Inactive

PierreLeGuen approved these changes Feb 16, 2026

PierreLeGuen merged commit f8a3cfc into main Feb 16, 2026
2 checks passed

PierreLeGuen merged 11 commits into main from fix/300-slow-e2e-tests
