-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix coverity erro (CID 349552) double lock #6970
Conversation
We had a double lock before, this commit fix this
Manage this branch in SquashTest this branch here: https://thiagoftsmdim-template-fix-neb5x.squash.io |
I don't think this is enough, since you are only holding the host lock and you allow the dimension to be deleted during: |
This commit fix the order process
I push another commit after to read your newest text in the ISSUE related to this PR. |
The only question remaining to be answered is whether a caller of |
@@ -204,7 +208,6 @@ RRDDIM *rrddim_add_custom(RRDSET *st, const char *id, const char *name, collecte | |||
return rd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to handle error cases such as this one. You could add something like:
if (host_locked)
rrdhost_unlock(host);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was verifying Netdata code here, the function rrdhost_wrlock
and rrdhost_unlock
are definitions to other internal functions, netdata_rwlock_wrlock_debug
and netdata_rwlock_unlock_debug
, these other two functions call pthread functions to try lock and try unlock, these functions already have error messages inside it.
I tried to use pthread_rwlock_trywrlock to check errors and I received a lot RW_LOCK: failed to obtain write lock (code 35)
and the alarms where not created, os I returned for the current code we have here, because it is working fine.
Without to use function
and after this Netdata exit with failure, so it is necessary to call this function. |
I am returning for the first solution, because the others are generating:
I tested all options that you wrote in the original issue and here, and I returned for the first solution that you gave us, because it is the unique that does not give us |
This will not work because the code is referencing the chart and dimension by pointers
and if for example a thread is deleting a dimension ( |
That is a theoretical problem of course, since it's hard to think of a practical case for this, since the dimension has just been created. But still, it's a potential bug, and if there's no chance for it now, there may be in the future. |
This solution try to lock the host before to move in front
You are correct, I thought the situation again and I found a way to avoid the error reported and do not raise errors with our current code in this last commit, please, verify this solution now, I think we can accept 5 tries to lock the host to create the alarms, mainly considering that this did not happen in my tests and probably during the review, because nobody reported errors related to this. But, I have a doubt related to the time I am waiting for to try lock again. |
database/rrddim.c
Outdated
hostlocked = pthread_rwlock_trywrlock(&host->rrdhost_rwlock); | ||
if(!hostlocked) | ||
{ | ||
rrdcalc_link_to_rrddim(rd, st, host); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you are reusing the rd
dimension pointer without holding the lock for chart st
. If rd
is deleted in the meantime it would crash. Moreover, even if you retook the chart lock, still you would be reusing the rd
pointer which could have been freed during the period when you did not actually hold the chart lock.
We need to think this through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are creating rd
in this function, so we cannot delete it at this point. What we can do is to move back rrdset_unlock
this will avoid we delete the chart. Considering that the write lock for chart is different of write lock for host, I do not think the problem was there. Right now I really believe that the error reported from Coverity was related I try to lock something that was possibly locked. I moved the function back and I did not notice any error reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, since you moved the lock again rd
is safe again but you still keep the wrong lock order only now you're try-locking instead of locking. So, in the previous cases of deadlock, now you're delaying and failing, which is probably why coverity does not write this down. But still, there's probably cases where we delay and fail during try-locking. Is that a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not detect any error while Netdata was trying to lock to create the alarm, but I had when I tried to do some changes in the previous commits.
About your question, I thought about it and I decided to try 5 times and finally raise an error. What can happen is we do not have an alarm created, but I used Netdata simple pattern to create 11 dimensions in a chart and I did not have any error when I tried to lock, so I think this is not a problem, at least it was not a problem with the charts I tried.
After to read your message I decided to repeat a test that I did in your PR, I changed the example.python to have 160 dimensions and I created an alarm using foreach
to create an alarm for each dimensions and again I did not have any error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you fix the grammar of the error message:
After to try creating an alarm for dimension %s on chart %s 5 times without success, I am moving in front
to something like:
Failed to create an alarm for dimension %s of chart %s 5 times. Skipping alarm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I executed the changes asked and Netdata continues working without raise errors and with all alarms created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please reduce the number of lines according to our latest code style specifications
@thiagoftsm clion can autoformat code, just use it, see code
section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check this, thanks @ilyam8!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm waiting for the code style change to approve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit fixes the format. Thanks!
To avoid the chart to be deleted while we are linking the alarm I am moving bak the chart lock
This commit fixes the grammar of an error message
Bring the defined pattern to the code and use netdata_rwlock_trywrlock
This commit fixes a format missing
* dim_template_fix: Fix lock We had a double lock before, this commit fix this * dim_template_fix: Fix order This commit fix the order process * dim_template_fix: Return I am returning for the first solution, because the others are generating: * dim_template_fix: Try to lock This solution try to lock the host before to move in front * dim_template_fix: Move chart lock To avoid the chart to be deleted while we are linking the alarm I am moving bak the chart lock * dim_template_fix: Fix grammar This commit fixes the grammar of an error message * dim_template_fix: bring pattern Bring the defined pattern to the code and use netdata_rwlock_trywrlock * dim_template_fix Fix format This commit fixes a format missing
* dim_template_fix: Fix lock We had a double lock before, this commit fix this * dim_template_fix: Fix order This commit fix the order process * dim_template_fix: Return I am returning for the first solution, because the others are generating: * dim_template_fix: Try to lock This solution try to lock the host before to move in front * dim_template_fix: Move chart lock To avoid the chart to be deleted while we are linking the alarm I am moving bak the chart lock * dim_template_fix: Fix grammar This commit fixes the grammar of an error message * dim_template_fix: bring pattern Bring the defined pattern to the code and use netdata_rwlock_trywrlock * dim_template_fix Fix format This commit fixes a format missing
* dim_template_fix: Fix lock We had a double lock before, this commit fix this * dim_template_fix: Fix order This commit fix the order process * dim_template_fix: Return I am returning for the first solution, because the others are generating: * dim_template_fix: Try to lock This solution try to lock the host before to move in front * dim_template_fix: Move chart lock To avoid the chart to be deleted while we are linking the alarm I am moving bak the chart lock * dim_template_fix: Fix grammar This commit fixes the grammar of an error message * dim_template_fix: bring pattern Bring the defined pattern to the code and use netdata_rwlock_trywrlock * dim_template_fix Fix format This commit fixes a format missing
Summary
Fixes #6969
We had a deadlock inside the database after to merge template to dimension, this commit fixes this error.
Component Name
Health
Additional Information