RDS proxy #3827

Merged
merged 4 commits into from
Jan 29, 2024

@codyebberson codyebberson commented Jan 28, 2024

Experimenting with RDS Proxy for better database upgrade recovery time.

Learn more: https://aws.amazon.com/rds/proxy/

Deploying to staging.


Update 1

Using RDS Proxy reduced the server downtime caused by a database upgrade.

Without RDS Proxy, the database upgrade led to 19 seconds of downtime.

With RDS Proxy, the database upgrade led to 4 seconds of downtime.

I believe those 4 seconds are actually due to this open issue in pg: brianc/node-postgres#2112
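
The PR does not say how the downtime figures above were measured. As an illustration only, one possible approach is to probe the server at a fixed interval during the upgrade and then compute the longest run of failed probes. The `Probe` type and `longestOutageMs` function below are hypothetical names, not part of Medplum, and the sketch assumes the probes are already sorted by time.

```typescript
// Hypothetical record of one availability probe (for example, an HTTP request to a health endpoint).
interface Probe {
  timestampMs: number; // when the probe was sent
  ok: boolean; // whether the probe succeeded
}

// Returns the longest outage in milliseconds, measured from the first failed probe
// of an outage to the next successful probe. Assumes probes are sorted by time.
// An outage that is still open at the end of the list is ignored.
function longestOutageMs(probes: Probe[]): number {
  let longest = 0;
  let outageStart: number | undefined;
  for (const probe of probes) {
    if (!probe.ok) {
      if (outageStart === undefined) {
        outageStart = probe.timestampMs;
      }
    } else if (outageStart !== undefined) {
      longest = Math.max(longest, probe.timestampMs - outageStart);
      outageStart = undefined;
    }
  }
  return longest;
}
```

The resolution of such a measurement is limited by the probe interval, so a shorter interval gives a tighter estimate.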


Update 2

Downtime was reduced to 2 seconds using this workaround for brianc/node-postgres#2112:

  process.on('uncaughtException', (err) => {
    globalLogger.error('Uncaught exception thrown', err);

    if (err.message && typeof err.message === 'string' && err.message.includes('Connection terminated unexpectedly')) {
      // The pg-pool library throws this error when the database connection is lost.
      // This can happen when the database server is restarted.
      // We do *not* want to exit the process in this case.
      return;
    }

    process.exit(1);
  });

The remaining 2 seconds are due to a connection error.

Server side log:

{
    "level": "ERROR",
    "timestamp": "2024-01-28T18:36:27.517Z",
    "msg": "Uncaught exception thrown",
    "error": "Error: Connection terminated unexpectedly",
    "stack": [
        "Error: Connection terminated unexpectedly",
        "    at Connection.<anonymous> (/usr/src/medplum/node_modules/pg/lib/client.js:132:73)",
        "    at Object.onceWrapper (node:events:632:28)",
        "    at Connection.emit (node:events:518:28)",
        "    at Socket.<anonymous> (/usr/src/medplum/node_modules/pg/lib/connection.js:63:12)",
        "    at Socket.emit (node:events:530:35)",
        "    at TCP.<anonymous> (node:net:337:12)",
        "    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)"
    ]
}

Client side log:

p [Error]: Client has encountered a connection error and is not queryable
    at Or.<anonymous> (C:\Users\cody\dev\medplum\packages\core\dist\cjs\index.cjs:42:7650)
    at Generator.next (<anonymous>)
    at fulfilled (C:\Users\cody\dev\medplum\packages\core\dist\cjs\index.cjs:5:58)
    at processTicksAndRejections (node:internal/process/task_queues:95:5) {
  outcome: {
    resourceType: 'OperationOutcome',
    issue: [ [Object] ],
    extension: [ [Object] ]
  },
  cause: undefined
}

That could potentially be fixed or handled by running a health check whenever a connection is borrowed from the pool.
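
A minimal sketch of what a health check on borrow could look like follows. This is not the code in this PR. `connectWithHealthCheck`, `PoolLike`, and `ClientLike` are hypothetical names; the two interfaces are modeled on the `connect()`, `query()`, and `release()` methods of pg's pool and pool client so the example runs without a database. It relies on the pg-pool behavior that passing a truthy value to `release()` destroys the client rather than returning it to the pool.

```typescript
// Structural stand-ins for the parts of pg's Pool and PoolClient used here (assumption).
interface ClientLike {
  query(sql: string): Promise<unknown>;
  // In pg-pool, a truthy argument tells the pool to destroy the client.
  release(err?: Error | boolean): void;
}

interface PoolLike {
  connect(): Promise<ClientLike>;
}

// Hypothetical helper: borrow a client and verify it with a cheap query
// before handing it to the caller. Broken clients are discarded and the
// borrow is retried up to maxAttempts times. Errors thrown by connect()
// itself are not retried and propagate to the caller.
async function connectWithHealthCheck(pool: PoolLike, maxAttempts = 3): Promise<ClientLike> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const client = await pool.connect();
    try {
      await client.query('SELECT 1');
      return client; // The caller is responsible for calling release().
    } catch (err) {
      lastError = err;
      client.release(true); // Discard the broken connection.
    }
  }
  throw new Error(`No healthy database connection after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

The trade-off is one extra round trip per borrow, which may or may not be acceptable depending on query volume.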


Update 3

Consider trying postgres-pool as a replacement for pg-pool. It is less popular, but appears to have a more robust implementation.


Update 4

Abandoning postgres-pool - it is not completely API compatible with pg-pool. Given its relatively low popularity, I'm not inclined to do a major refactor just to try it.

@codyebberson codyebberson requested a review from a team as a code owner January 28, 2024 00:26

github-actions bot commented Jan 28, 2024

Messages
📖 @medplum/core: 153.9 kB
📖 @medplum/react: 338.4 kB

Generated by 🚫 dangerJS against 0609cfa


@reshmakh reshmakh added this to the January 31st, 2024 milestone Jan 28, 2024
@codyebberson codyebberson added this pull request to the merge queue Jan 29, 2024
Merged via the queue into main with commit e3411d0 Jan 29, 2024
16 checks passed
@codyebberson codyebberson deleted the cody-rds-proxy branch January 29, 2024 01:43
medplumbot added a commit that referenced this pull request Jan 31, 2024
Fixes #3794 - MeasureReport.period search (#3850)
Extra check for vmcontext bots (#3863)
Add and use vite-plugin-turbosnap (#3849)
Downgrade chromatic (#3848)
Repo sql fixes for cockroachdb (#3844)
Remove Health Gorilla from medplum-demo-bots (#3845)
fix-3815 cache presigned s3 binary urls (#3834)
Use tsvector index for token text search (#3791)
rate limit should return `OperationOutcome` (#3843)
Add global var "module" to vm context bots (#3842)
Fix lookup table tsv indexes (#3841)
Always use estimate count first (#3840)
Disambiguate getClient (#3839)
Fix invalid mermaid graph in diagnostic catalog docs (#3836)
fix-3809 race condition in Subscription extension fhir-path-criteria-expression %previous value lookup (#3810)
Fix Sonar code smells: mark React props readonly (#3832)
RDS proxy (#3827)
Fixed lookup tables in migration generator (#3830)
Fixed deprecated jest matchers (#3831)
Update README.md (#3828)
Update fhir-basics.md (#3829)
Case study content and images (#3820)
Added rdsReaderInstanceType and RDS upgrade docs (#3826)
Dependency upgrades (#3825)
Separate search popup menus for 'text' and 'token' (#3824)
Improve performance of token sort (#3823)
Additional logging (#3790)
Fix calendar input button style (#3817)
Don't add _total default in SearchControl (#3818)
Dark mode (#3814)
Fixes #3812 - FHIR profile cache bug (#3813)
Document using medplum client to integrate with external FHIR servers (#3811)
Use specific advisory locks (#3805)
Nested transactions (#3788)
Fix signin page on graphiql (#3802)
fix(heartbeat): start heartbeat on first bind to sub (#3793)
Fix async job tests (#3795)
Document using vm context bots (#3784)
Refactored access policy docs based on customer feedback (#3785)
Support Redis TLS config from Env (#3787)
feat(subscriptions): add `heartbeat` for WS subs (#3740)
Update Bot metrics (#3763)
Labels
self-host Features and fixes related to self hosting