fix config reload crash via introducing on_config_inited in LogPipe #3176
This user does not have permission to start the build. Can one of the admins verify this patch and start the build? |
@kira-syslogng ok to test |
Build FAILURE |
I tried to copy what the failing test does, and I experience a crash during shutdown. Syslog-ng started with
The HTTP server does not matter; anything that answers an HTTP POST with 200 OK is fine. For example:

    from flask import Flask
    from flask import request

    app = Flask(__name__)

    @app.route('/', methods=["POST"])
    def hello():
        return "Hello World!"

    if __name__ == '__main__':
        app.run()

After starting syslog-ng, I commented out the sql source, reloaded, then stopped it. During stop, syslog-ng crashes:
|
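For anyone reproducing this without Flask installed, the following is a minimal sketch of an equivalent stand-in built only on the Python standard library. The names `OkHandler`, `start_server` and `post` are made up for this illustration and are not part of the test suite; the only requirement taken from the comment above is that an HTTP POST is answered with 200 OK.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Answers every POST with 200 OK, like the Flask app above."""

    def do_POST(self):
        # Read the request body so the client is not left waiting.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"Hello World!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the console quiet; remove this override to see each request.
        pass


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post(port, payload=b"test message"):
    """Send one POST to the local server and return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/", body=payload)
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print("listening on port", port, "status:", post(port))
    server.shutdown()
    server.server_close()
```

To use it as the HTTP endpoint for a reproduction, call `start_server` with a fixed port and point the destination at that port.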
Build FAILURE |
Is it possible to let me know what the failing test does? |
It gives me HTTP 404 Error. |
Hmm. You should be able to log in via github credentials and then you can
see the results.
|
I can access this page https://ci.syslog-ng.com/jenkins/job/kira-starter/5486/, but it shows 404 when trying to access https://ci.syslog-ng.com/jenkins/job/kira-test-compile-automake-run-testdb/5424/ |
Hello @0140454, Thank you for your pull request. The macOS job has been fixed recently, rebasing your branch to our master will fix that check. Kira-starter runs our internal test suite, you have 2 failing tests there:
The test case starts syslog-ng with the above config, then activates
So you need a total of 3
|
4578f9d to bf7920e
Build SUCCESS |
bf7920e to 09ace9a
Build SUCCESS |
@0140454 I checked this change with your original test from #3173 (executing syslog-ng without
However, with |
In my understanding, there are three stages in

    while (1) {
        iv_run_timers(st);
        iv_run_tasks(st);
        iv_fd_poll_and_run(st, abs);
    }

The module To avoid crash when using Before reverting to the old config, we need all threads to exit. The situation becomes that worker threads wait for the main thread to perform I have no idea how to deal with this issue currently. |
@0140454 I try to catch up with the details, and see if I can find out something. |
I think I found a resolution for the example-msg-generator problem. As that is a new problem with a different root cause than the one you fixed here, I opened a pull request to handle it separately: #3196. Now I will look into your patchset and try to understand the details. |
I have tried to use a timer to make the revert_config job run earlier. However, it doesn't work. The following is the implementation of

    while (1) {
        struct timespec _abs;
        const struct timespec *abs;

        if (run_timers)
            iv_run_timers(st);
        iv_run_tasks(st);

        if (st->quit || !st->numobjs)
            break;

        if (iv_pending_tasks(st)) {
            _abs.tv_sec = 0;
            _abs.tv_nsec = 0;
            abs = &_abs;
        } else {
            abs = iv_get_soonest_timeout(st);
        }

        run_timers = iv_fd_poll_and_run(st, abs);
    }

When syslog-ng received a signal, the handler is executed in
The return value from |
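One ordering consequence that can be read off the quoted loop is sketched below as a toy Python model. It is not ivykis: `ToyLoop`, its methods, and the task names `reload` and `source_emits_message` are invented for illustration, timers are reduced to "expire at the next opportunity", and file descriptors are left out. Whether this is exactly the scenario meant in the comment above is an assumption; the sketch only shows that a timer registered from inside a task handler cannot fire before the tasks already queued in the same pass.

```python
from collections import deque


class ToyLoop:
    """Toy model of the quoted loop: timers, then tasks, then poll."""

    def __init__(self):
        self.tasks = deque()
        self.timers = []  # zero-delay timers only, to keep the model small
        self.trace = []   # records the order in which handlers ran

    def register_task(self, name, handler=None):
        self.tasks.append((name, handler))

    def register_timer(self, name):
        self.timers.append(name)

    def run_timers(self):
        expired, self.timers = self.timers, []
        for name in expired:
            self.trace.append("timer:" + name)

    def run_tasks(self):
        # Assumption: only the tasks pending at the start of this step run now.
        pending, self.tasks = self.tasks, deque()
        for name, handler in pending:
            self.trace.append("task:" + name)
            if handler is not None:
                handler(self)

    def poll(self):
        # Stand-in for iv_fd_poll_and_run(): reports whether timers are due.
        return bool(self.timers)

    def run(self):
        run_timers = True
        while True:
            if run_timers:
                self.run_timers()
            self.run_tasks()
            if not self.tasks and not self.timers:
                break
            run_timers = self.poll()
        return self.trace


def demo():
    loop = ToyLoop()
    # The first task tries to schedule the revert "as early as possible".
    loop.register_task("reload", lambda l: l.register_timer("revert_config"))
    # A second task is already queued behind it.
    loop.register_task("source_emits_message")
    return loop.run()


if __name__ == "__main__":
    for step in demo():
        print(step)
```

In this model the queued task still runs before the timer, because timers are only considered at the top of the next iteration.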
Ok, I see the timers cannot work. First I was confused because many of the poll methods in ivykis just return 1, but then I compiled and checked with gdb, and on my system

In summary, I agree with your patch, and I do not know any better way to handle the problem. The timer-related ordering problems need to be addressed separately.

I would like to ask for a minor change, though. I am afraid that it is easy to miss the reason for this register/unregister dance, and someone might just refactor it back to the original version without checking the commit message. Could you please encapsulate the

Apart from that, it looks good to me. I am now moving on to your second patch. I experience something a little different from what you mention in the commit message. After applying the first patch, but without the second, my internal messages about the failed reload are not emitted immediately, but they are not lost either. Interestingly, they appear during shutdown. I want to look into that and understand how the patch works in detail. |
1a54258 to ee0af4e
Build FAILURE |
484c0cd to 58ccbc9
Build FAILURE |
I am not sure why the build failed. However, I think the debug output is different from the current version mentioned in #3176 (comment) because dedicated worker threads are now started after syslog-ng has applied the new config successfully, instead of halfway through. Current
Patch
@furiel Could you review my patch again and provide some hints about the build failure? I cannot view the Jenkins job output. Thank you. |
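The idea described in this comment, starting dedicated workers only after the whole config has been applied successfully, can be sketched as a two-phase startup. This is a simplified Python model, not syslog-ng code: `ToyDriver`, `apply_config` and the `worker_running` flag are hypothetical, and only the hook name `on_config_inited` is taken from the PR title.

```python
class ToyDriver:
    """Hypothetical driver with a cheap init phase and a late start hook."""

    def __init__(self, name, init_ok=True):
        self.name = name
        self.init_ok = init_ok
        self.initialized = False
        self.worker_running = False

    def init(self):
        # Phase 1: set up state only; no worker is started here.
        self.initialized = self.init_ok
        return self.init_ok

    def deinit(self):
        self.initialized = False

    def on_config_inited(self):
        # Phase 2: called only once every driver has initialized.
        # A real driver would start its dedicated worker thread here.
        self.worker_running = True


def apply_config(drivers):
    """Initialize all drivers; start workers only if every init succeeded."""
    inited = []
    for driver in drivers:
        if not driver.init():
            # Roll back: nothing is running yet, so there is nothing to wait for.
            for done in reversed(inited):
                done.deinit()
            return False
        inited.append(driver)
    for driver in drivers:
        driver.on_config_inited()
    return True


if __name__ == "__main__":
    good = [ToyDriver("src"), ToyDriver("dst")]
    bad = [ToyDriver("src"), ToyDriver("broken", init_ok=False)]
    print(apply_config(good), [d.worker_running for d in good])
    print(apply_config(bad), [d.worker_running for d in bad])
```

The point of the split is that a failed init leaves no running worker behind, so rolling back a half-applied config does not have to synchronize with any thread.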
@kira-syslogng retest this please test branch=furiel-followup-on-config-inited; |
Build FAILURE |
@kira-syslogng retest this please test branch=furiel-followup-on-config-inited; |
Build FAILURE |
@kira-syslogng retest this please test branch=furiel-followup-on-config-inited; |
Build SUCCESS |
@0140454 that test needed a followup. It started syslog-ng, and did a failed reload, and counted the number of I will review the code this week. |
There are a few drivers that are not implemented through the threaded source or threaded destination interface, but through LogReader. For example: program source/destination, network source/destination, file source/destination, journald-source (maybe others?)
I am not too afraid of the destinations. It was the sources that started to send messages too early, which caused the crash.
For the LogReader-based sources, on the other hand (just checking the code, I did not try it out): log_reader_init calls log_reader_start_watches, so the LogReader-based drivers may not work properly either. @MrAnno @bazsi what do you think?
Do we want to check those too in the scope of this PR? Or handle them separately?
    if (!cfg_tree_start(&cfg->tree))
      return FALSE;

    g_assert(cfg_tree_on_inited(&cfg->tree));
Asserting on the return value: does it mean that cfg_tree_on_inited should not have a return value? log_threaded_source_driver_start_worker returns TRUE unconditionally too. Just an idea, but we might want to turn every on_inited to void consistently?
Since I think someone may want syslog-ng to be terminated if there is an error in cfg_tree_on_inited, I use g_assert here to achieve this goal.
@furiel In my understanding, with this modification and the partial revert of #2555 (8054b23#diff-d527f62d8ea146c1e6ac8145a3b466b8R281), all drivers should work now. |
In my understanding,
Since the partial revert of #2555, config revert is performed immediately. |
When the config load is unsuccessful, |
This is really nice work, thanks for that!
Based on my tests, non-threaded sources (internal() and msg-generator() too) and threaded sources/destinations are working correctly.
(The name I suggested (on_config_inited()) might not be the best, I hope someone can suggest a better one.)
Thanks! This puzzle piece was missing. This PR reverts 6b7d9d0 too. As no new threads will be started (as long as all future drivers will start thread only during This is why @0140454 did not need to move |
\o/ |
with syslog-ng#3176 this function became obsolete Signed-off-by: Laszlo Szemere <laszlo.szemere@oneidentity.com>
We cannot fail in a threaded workers init(), because of lib/cfg.c:344.

```
/*
 * TLDR: A half-initialized pipeline turned out to be really hard to deinitialize
 * correctly when dedicated source/destination threads are spawned (because we
 * would have to wait for workers to stop and guarantee some internal
 * task/timer/fdwatch ordering in ivykis during this action).
 * See: syslog-ng#3176 (comment)
 */
g_assert(cfg_tree_post_config_init(&cfg->tree));
```

Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
In the current version, syslog-ng registers the revert_config task after cfg_init.
When applying the config fails, the ivykis task list can be illustrated as follows:
If a log message arrives before _revert_config has finished, syslog-ng will use an uninitialized log pipe to process it. This will lead to a crash.
Therefore, main_loop_reload_config_revert is scheduled to be called immediately via main_loop_worker_sync_call after applying the config fails.
To avoid recursion in main_loop_worker_sync_call, there is a global flag named sync_call_running. Based on this flag, syslog-ng can determine whether it has to call _invoke_sync_call_actions.
fixes: #3173
closes #3196
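The recursion guard described above can be sketched as follows. This is a toy Python model under stated assumptions, not the syslog-ng implementation: `ToySyncCall` and its members are hypothetical names that only echo `sync_call_running` and `_invoke_sync_call_actions`, and the real function's interaction with worker threads is left out entirely.

```python
class ToySyncCall:
    """Toy re-entrancy guard: nested calls queue work instead of recursing."""

    def __init__(self):
        self.running = False  # plays the role of sync_call_running
        self.pending = []     # actions waiting to be invoked
        self.trace = []

    def _invoke_actions(self):
        # Drain the queue; actions may enqueue further actions while running.
        while self.pending:
            action = self.pending.pop(0)
            action()

    def sync_call(self, action):
        self.pending.append(action)
        if self.running:
            # Already inside a sync call: the outer drain loop will run it.
            return
        self.running = True
        try:
            self._invoke_actions()
        finally:
            self.running = False


def demo():
    sc = ToySyncCall()

    def inner():
        sc.trace.append("inner")

    def revert():
        sc.trace.append("revert:start")
        sc.sync_call(inner)  # nested call, must not recurse
        sc.trace.append("revert:end")

    sc.sync_call(revert)
    return sc.trace


if __name__ == "__main__":
    print(demo())
```

In this model the nested action runs only after the outer one has returned, so the invoke step is entered once per outermost call.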