Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Reduce Idle Connections from Health Checks [#1102] #1107

Merged
merged 4 commits into from
Aug 17, 2022

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented Aug 17, 2022

Purpose

The health check endpoints are opening up too many connections against the fidesops application database. Shoutout to @RobertKeyser and @sanders41 for spotting.

Changes

  • Stop creating a new engine as part of the process of running a health check.
  • Create an engine to be used across the application, and pass this into get_db_session when we go to get a single connection, instead of creating a new engine. (Note we are currently using the default pool_size and max_overflow parameters. This is something that may need to be configurable in the future). This is used for all our API endpoints.
    • Note that our celery processes intentionally create their own separate engines for managing connections for running privacy requests.

Background

The typical usage of create_engine() is once per particular database URL, held globally for the lifetime of a single application process. A single Engine manages many individual DBAPI connections on behalf of the process and is intended to be called upon in a concurrent fashion. The Engine is not synonymous to the DBAPI connect function, which represents just one connection resource - the Engine is most efficient when created just once at the module level of an application, not per-object or per-function call.

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #1102

@pattisdr pattisdr marked this pull request as ready for review August 17, 2022 23:27
@sanders41 sanders41 merged commit fc644d9 into main Aug 17, 2022
@sanders41 sanders41 deleted the fidesops_1102_health_check_connections branch August 17, 2022 23:35
@pattisdr
Copy link
Contributor Author

thanks for merging @sanders41

yield db
finally:
db.close()


def _get_session() -> Session:
"""Gets a database session"""
global _engine # pylint: disable=W0603
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good solution. Is there a limit to how many connections one engine can make?

Copy link
Contributor Author

@pattisdr pattisdr Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I noted this in the description, we're currently using the defaults, so I believe the pool_size is 5, and there's a default max_overflow of 10, so you take the sum - 15. This is something that we'll probably want to make configurable at some point.

sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* Don't create a new engine as part of running the health checks and share a single engine across the application, including for the health checks.  Currently we're using the default pool_size and max_overflow.

* Update changelog.

* Fix that health checks are still supposed to run, even if the database is disabled.

* Need to yield instead -  'generator' object has no attribute 'query'
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too many health check db connections
3 participants