Skip to content

Conversation

@jnovinger
Copy link
Contributor

@jnovinger jnovinger commented Nov 11, 2025

Fixes: #358

Branch-specific database connections never close after CONN_MAX_AGE expires, causing connection exhaustion when users visit multiple branches. Django's close_old_connections() iterates DATABASES.keys() to discover aliases/names for connections to cleanup, but branch aliases are served dynamically via DynamicSchemaDict.__getitem__() without being yielded by __iter__(), so they're never checked for expiry. They're essentially invisible to Django's CONN_MAX_AGE machinery.

Solution Approaches

  1. Caching aliases in DATABASES dict: Fixed cleanup but broke all 43 existing tests with "Database connections not allowed in this test." Django's test framework checks DATABASES.keys() during setup and blocks non-default connections. The issue wasn't just about adding databases = '__all__' to existing tests (or to some base test class), which this would have required. The deeper concern was that by making branch aliases visible in DATABASES.keys() during test setup, we'd be changing the fundamental isolation guarantees. Tests that never needed to know about branch databases would suddenly have them in scope, potentially causing cross-contamination between tests or allowing tests to accidentally depend on branch state.

  2. Custom __iter__ using private API: Used connections._connections._storage.__dict__ to discover cached aliases. Fixed cleanup and all tests passed, but relied on undocumented Django internals. Rejected for maintenance concerns.

  3. (what I ended up with) Track branch aliases in our own thread-local storage when first accessed, register cleanup handlers on request_started and request_finished signals. Uses only public APIs (asgiref.Local, Django signals, connections.close_if_unusable_or_obsolete). Matches Django's pattern where DATABASES.keys() is static and implements thread-specific tracking in the same manner as Django.

Django's close_old_connections() only handles database aliases found in
DATABASES.keys(). Branch aliases (schema_*) are served dynamically via
DynamicSchemaDict.__getitem__ but aren't yielded by __iter__, so they
were never cleaned up, causing connection leaks.

Solution: Track branch connection aliases in thread-local storage when
they're first accessed. Register cleanup handler on request_started and
request_finished signals to close expired branch connections using the
same timing as Django's built-in cleanup.

Implementation uses public APIs only (asgiref.Local, Django signals,
connections.close_if_unusable_or_obsolete) and matches Django's pattern
where DATABASES.keys() is effectively static - aliases are tracked once
and never removed.

Tests verify connections close after CONN_MAX_AGE for single branch,
multiple branches, and deleted branch schemas.
@jnovinger jnovinger requested a review from a team November 11, 2025 00:58
@mgaruccio
Copy link

this may not be worth fixing, but FYI that this seems to only close connections properly if they get initiated via api or UI, if you create a branch with mgmt commands it will leave the connection idle indefinitely.

probably not a problem outside of my testing scenario but could be if there's ever another path to create connections that bypasses the http api?

@jnovinger
Copy link
Contributor Author

this may not be worth fixing, but FYI that this seems to only close connections properly if they get initiated via api or UI, if you create a branch with mgmt commands it will leave the connection idle indefinitely.

probably not a problem outside of my testing scenario but could be if there's ever another path to create connections that bypasses the http api?

Thanks for mention that, @mgaruccio. Looking in to that now.

@jnovinger
Copy link
Contributor Author

From what I can tell, default RQ workers fork a new process for each job. When the job completes and the forked child exits via os._exit(0), the OS automatically closes all file descriptors including database connections. This means branch connections opened during job execution are automatically cleaned up without needing CONN_MAX_AGE handling.

Non-forking worker configurations (like SimpleWorker) would need additional connection cleanup handling. NetBox uses the default worker class that provides pre-fork connection cleanup.

You could argue that the mechanism that django-rq uses to close any open connections before the work loop starts will also miss our "invisible" branch-related connections, but none of those should be open until an individual background job is run needing one of those connections. As a worker-based connection, they should be automatically cleaned up by the mechanism described above.

@jnovinger
Copy link
Contributor Author

Sorry, @mgaruccio, the above only considers things running in background jobs in django-rq. You mentioned management commands as well, but my brain went immediately to django-rq. Any other management commands should be relatively short lived and either explicitly clean up their own connections or rely on them being cleaned up when the command finishes.

@mgaruccio
Copy link

No worries! And yea I only noticed since I had a test set up using manage.py to check for the connection leaks. Highly unlikely to be an issue in normal use but figured I’d mention in case there are other potential misses. Appreciate you looking into it!

Copy link
Contributor

@arthanson arthanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice work @jnovinger

@arthanson arthanson merged commit 7dfa0bb into main Nov 13, 2025
12 checks passed
@jeremystretch jeremystretch deleted the 358-fix-connection-leak branch November 17, 2025 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Database connections increase over time, leading to errors

4 participants