-
Notifications
You must be signed in to change notification settings - Fork 2.2k
sqldb+lncfg: add retry logic for postgres #10564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @Abdulkbk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances LND's resilience by introducing a retry mechanism for its initial connection to a PostgreSQL database. Previously, LND would fail and exit if the database wasn't immediately available, requiring manual intervention or external workarounds. The new changes allow LND to gracefully wait and retry connecting, making it more robust in dynamic environments like Kubernetes where database services might not start simultaneously with LND. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a retry mechanism for the initial PostgreSQL database connection, enhancing LND's resilience in environments like Kubernetes. However, a critical security flaw exists in the WaitForPostgresReady function in sqldb/postgres.go, where sensitive database credentials can be leaked into logs and error messages due to incomplete DSN sanitization. It is recommended to improve the sanitization logic or avoid logging the full DSN. Additionally, consider making the ping timeout configurable for better flexibility.
| sanitizedDSN, err := replacePasswordInDSN(cfg.Dsn) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| db, err := sql.Open("pgx", cfg.Dsn) | ||
| if err != nil { | ||
| return fmt.Errorf("error creating postgres connection: %w", | ||
| err) | ||
| } | ||
| defer db.Close() | ||
|
|
||
| for attempt := 0; attempt < cfg.StartupMaxRetries; attempt++ { | ||
| pingCtx, cancel := context.WithTimeout(ctx, 5*time.Second) | ||
| err = db.PingContext(pingCtx) | ||
| cancel() | ||
|
|
||
| if err == nil { | ||
| log.Infof("Postgres is ready at '%s'", sanitizedDSN) | ||
| return nil | ||
| } | ||
|
|
||
| log.Warnf("Failed to connect to postgres at '%s' "+ | ||
| "(attempt %d/%d): %v", sanitizedDSN, attempt+1, | ||
| cfg.StartupMaxRetries, err) | ||
|
|
||
| select { | ||
| case <-time.After(cfg.StartupRetryDelay): | ||
| case <-ctx.Done(): | ||
| return fmt.Errorf("context canceled while waiting "+ | ||
| "for postgres: %w", ctx.Err()) | ||
| } | ||
| } | ||
|
|
||
| return fmt.Errorf("failed to connect to postgres at '%s' after %d "+ | ||
| "attempts: %w", sanitizedDSN, cfg.StartupMaxRetries, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WaitForPostgresReady function in this section (lines 201-236) has a critical security flaw: sensitive database credentials (passwords) can be leaked into application logs and error messages. The replacePasswordInDSN function (defined at line 48) only sanitizes DSNs in URL format, failing to handle "keyword=value" formats. This can result in unsanitized passwords being logged (lines 219, 223) or included in errors (line 235). It is crucial to improve replacePasswordInDSN to robustly sanitize both DSN formats or avoid logging the full DSN. Furthermore, the ping timeout is currently hardcoded to 5 seconds (lines 214-215). For better flexibility and environmental tuning, consider making this configurable by utilizing cfg.Timeout from PostgresConfig if available, otherwise defaulting to 5 seconds.
| sanitizedDSN, err := replacePasswordInDSN(cfg.Dsn) | |
| if err != nil { | |
| return err | |
| } | |
| db, err := sql.Open("pgx", cfg.Dsn) | |
| if err != nil { | |
| return fmt.Errorf("error creating postgres connection: %w", | |
| err) | |
| } | |
| defer db.Close() | |
| for attempt := 0; attempt < cfg.StartupMaxRetries; attempt++ { | |
| pingCtx, cancel := context.WithTimeout(ctx, 5*time.Second) | |
| err = db.PingContext(pingCtx) | |
| cancel() | |
| if err == nil { | |
| log.Infof("Postgres is ready at '%s'", sanitizedDSN) | |
| return nil | |
| } | |
| log.Warnf("Failed to connect to postgres at '%s' "+ | |
| "(attempt %d/%d): %v", sanitizedDSN, attempt+1, | |
| cfg.StartupMaxRetries, err) | |
| select { | |
| case <-time.After(cfg.StartupRetryDelay): | |
| case <-ctx.Done(): | |
| return fmt.Errorf("context canceled while waiting "+ | |
| "for postgres: %w", ctx.Err()) | |
| } | |
| } | |
| return fmt.Errorf("failed to connect to postgres at '%s' after %d "+ | |
| "attempts: %w", sanitizedDSN, cfg.StartupMaxRetries, err) | |
| pingTimeout := 5 * time.Second | |
| if cfg.Timeout > 0 { | |
| pingTimeout = cfg.Timeout | |
| } | |
| pingCtx, cancel := context.WithTimeout(ctx, pingTimeout) | |
| err = db.PingContext(pingCtx) |
Add StartupMaxRetries and StartupRetryDelay fields to PostgresConfig to control connection retry behavior at startup. This prepares for retry logic that helps in environments like Kubernetes where the database container may not be ready when LND starts.
Add a WaitForPostgresReady function that pings postgres in a retry loop with a fixed delay before any backends are opened. This prevents LND from failing immediately with "connection refused" when postgres isn't ready yet. The function is called at the top of the PostgresBackend case in GetBackends, gating all downstream connection attempts. Retries are enabled by default (10 attempts, 1s delay) and can be disabled by setting startup-max-retries=0.
613df35 to
94fef43
Compare
🟠 PR Severity: HIGH
🟠 High (2 files)
🟡 Medium (1 file)
🟢 Low (4 files)
AnalysisThis PR adds retry logic for Postgres database connections during LND startup. The highest severity comes from changes to the sqldb/ package, which manages the Postgres backend implementation. Key areas of impact:
Why HIGH severity:
Mitigating factors:
To override, add a |
94fef43 to
85a8b5b
Compare
🟠 PR Severity: HIGH
🟠 High (2 files)
🟡 Medium (1 file)
🟢 Low (3 files)
🧪 Test Files (1 file, excluded from severity)
AnalysisThis PR is classified as HIGH severity because it modifies core database connection logic in the Key considerations:
The configuration changes in To override, add a |
Change Description
fixes #10522
When LND starts with a Postgres backend, and Postgre isn't ready yet LND fails immediately and exits. There is no retry logic for the initial database connection, forcing users to use shell script workarounds.
This PR does the following:
startup-max-retriesandstartup-retry-delayconfig options to the postgres backend.WaitForPostgresReady()that pings postgres in a fixed-delay retry loop before opening the database.startup-max-retries=0.Steps to test
cd sqldb && go test -v -run TestWaitForPostgres).