-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make db connect more robust #5738
Make db connect more robust #5738
Conversation
Please also rebase on to Gitea/master. For some reason these patches aren't applying properly on to master and I suspect your master might be out of date |
cc47fe3
to
5dc2abc
Compare
…ea into Make_DB_Connect_More_Robust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more little things, mostly just for consistency and nicer logging.
There are likely errors returned from models.NewEngine
that would mean we should stop earlier but we can deal with those later - so long as we get told them early (hence the logging).
routers/init.go
Outdated
// In case of delays connecting to DB, retry connection. Eg, PGSQL in Docker Container on Synolgy | ||
log.Info("Beginning ORM engine init...") | ||
for i := 0; i < setting.DBConnectRetries; i++ { | ||
if err := models.NewEngine(migrations.Migrate); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log at the start of the loop:
log.Info("ORM engine initialization (Attempt: #%d/%d).", i, setting.DBConnectRetries)
in the function turn the :=
into =
routers/init.go
Outdated
if err := models.NewEngine(migrations.Migrate); err != nil { | ||
log.Fatal(4, "Failed to initialize ORM engine: %v", err) | ||
// In case of delays connecting to DB, retry connection. Eg, PGSQL in Docker Container on Synolgy | ||
log.Info("Beginning ORM engine init...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I think we should strip this out to a separate function:
func initDBEngine() (err error) {
// With the contents of lines 61 - 72 (as amended)
// return the last error which you then log.Fatal(4, "Failed to initialize ORM engine: %v", err)
}
Be consistent with the log message. So make this line:
log.Info("Beginning ORM engine initialization...")
After these changes, the new pertinent log sections looks like:
|
Codecov Report
@@ Coverage Diff @@
## master #5738 +/- ##
=========================================
Coverage ? 37.74%
=========================================
Files ? 327
Lines ? 47840
Branches ? 0
=========================================
Hits ? 18059
Misses ? 27187
Partials ? 2594
Continue to review full report at Codecov.
|
Anything happening here? I saw there was changes requested, but they appear to be outdated. NVM I see the one "log error" comment now. My browser must have gone stale. Just to clarify there.. the error is already being logged at the end of the connect attempt. You are saying now that you want to log the same error over and over for each attempt? As the error is very likely the same error each time, this seems redundant. If you insist on this, it may be advisable to increase the log level for the intermediate error logging. But, your project, so jsut let me know. |
You should log the error before sleeping in the backoff as suggested above and I'll then lgtm it. |
OK sure. So, did you want to keep it at the info log level as well? And then also did you want to remove the logging of the error at the final of all the attempts (which also would then become redundant). |
Yeah keep it at Info. The point is that if you're watching your logs and you see it fail you can terminate early if it's an unexpected failure. |
Well, as it is, you know the last error from the last try, if the ORM fails to init at all, since the last error is printed after the new fcn returns. But so if you want to print the error for each failed attempt & keep it all at info, it seems like you would want to remove the final info error log print after the fcn call, as it will be completely redundant. Let me know on that & I will proceed. |
Hmm I'm not sure I see it log twice from the current suggested change. Your current code returns early when there's an error at the end. If you really don't think it's helpful to log the backed off errors at Info, drop em to Debug. I'm just cautious that we're going to make Gitea take ten times longer to fail is there's a database problem and not actually tell people what the problem is until the end of that of 10 attempts. |
Doesn't this make it log the final error after the new fcn call:
I assume fatal gets logged no matter the log setting? Hopefully I didn't screw this code up - I think it was mainly recycled from the pre-this-PR code. Now I understand a little better where you were coming from. So for failure modes OTHER than the startup order issue, you may frustrate some real-time debuggers. I think that's a fair point. So now I am glad I was not dogmatically standing my ground there :) But yeah I think making the retry failures debug level makes the most sense. Peeps who are actively debugging will surely have their log levels turned up, and the rest will have less noisy logs. I will go ahead and do that. |
Yeah log.Fatal logs no matter the setting and kills the program too. I was meaning replace line 56 (the sleep command) with the 2 log lines plus sleep command. If you look carefully when you've backed off n -1 times and still don't get a dB on the nth, that is the last time, line 53's test evaluates true and line 54 returns the error, so on the nth time you only get the log.Fatal. One other thing, because this code is using a go log, I'm not sure that there is a log.Debug in that logger, but if there is feel free to drop it the logging down to debug. The other benefit of logging this stuff, even for those not debugging Gitea, is that we can find out what xorm returns when it can't connect because the db isn't ready and we can adjust this code in the future to report better information and/or not back off if the error isn't the right type. At present I at least have no idea what xorm would return, and in some ways noisy logs that annoy users means we get that information so we can make better code in future. |
This is what the new output looks like, with log level set to Debug. You can see the log level of each print with the letter in brackets between the timestamp and the message.
|
OK incoming. Looks like I only need to set the ini value as we already were reading it as a Duration & not a constant. I tested that out locally & was able to set the Duration in the file to 2s & it actually worked, lol. Stand by. |
…ea into Make_DB_Connect_More_Robust
Thanks for your work Paul sorry there's been so many little changes. |
Nah its nothing. Thank y'all. I do like giving back a little bit on projects that have helped me & yours is a good one. I was SO glad to get rid of GitLab, haha. Maybe c yall again u never know.... Keep up the good work! |
Ok Paul, you've already got my lgtm. Need one more maintainer to look at this and say it's a good idea. Btw Is it really the case that the xorm dB just returns a error when there's a failure to connect? That seems weird! |
NOOOooooo!!! Heh. Well I appreciate the merge, but guess what? We need one more change. After your last comment I went back to look at the output before this PR & saw that it was not |
BTW, with the patch, the output looks like:
|
Just push the changes to this branch we should be able to remerge off it. Neither of us reviewers caught it. |
log.Info("Beginning ORM engine initialization.") | ||
for i := 0; i < setting.DBConnectRetries; i++ { | ||
log.Info("ORM engine initialization attempt #%d/%d...", i+1, setting.DBConnectRetries) | ||
if err := models.NewEngine(migrations.Migrate); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here should have been:
if err = ...
The reason the compiler didn't alarm is because there was a named err always available. I knew I never liked that feature of the language!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ur right. I did have it that way the first go around, then I changed it due to a compile error (which was for another unrelated problem). I currently have em on separate lines tho. u want me to condense it back down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Push your current changes up, and let's see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed it already. It didn't show up here I guess becz its merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I hit the "compare & new PR" button. u reckon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well i went ahead and condensed it back down to a one-liner, sans the re-declaration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK the new PR is is #5780
Refer to #2907
(It was auto-closed by your bot, but still an issue.)
Note that your contributing instructions for running drone CLI did not work for me. It seems like the instructions may be outdated or incomplete. Here is what I got:
/home/grokman/go_projects/src/code.gitea.io/gitea % drone exec --local --build-event "pull_request" Incorrect Usage: flag provided but not defined: -local
However, I did build & run this successfully. Here is a pertinent log snippet:
I hope this is helpful to your project which - other than this issue - has worked well for me on my Synology Diskstation. If you like my change and what to pull it, but want the tests working on my side first, just guide me in there. & I'd be happy to have another go at that.