Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash in slow timer #2120

Closed
lazedo opened this issue Oct 31, 2019 · 7 comments
Closed

crash in slow timer #2120

lazedo opened this issue Oct 31, 2019 · 7 comments

Comments

@lazedo
Copy link
Contributor

lazedo commented Oct 31, 2019

Description

on busy servers, there's a sporadic crash in slow timer.
we don't have any debugging data anymore nor a way to reliably reproduce it.
just posting here in case someone sees the same and can find a way to reliably reproduce it and maybe test the patch below.

Troubleshooting

Reproduction

Possible Solutions

with the patch below we don't see any crashes
either tl or tl->f is null when it crashes. this is probably some race condition

diff --git a/src/core/timer.c b/src/core/timer.c
index 0e0dc8812..3c59db3da 100644
--- a/src/core/timer.c
+++ b/src/core/timer.c
@@ -1128,7 +1128,7 @@ void slow_timer_main()
 #endif
 				SET_RUNNING_SLOW(tl);
 				UNLOCK_SLOW_TIMER_LIST();
-					ret=tl->f(*ticks, tl, tl->data);
+					ret= (tl->f ? tl->f(*ticks, tl, tl->data) : 0);
 					/* reset the configuration group handles */
 					cfg_reset_all();
 					if (ret==0){

Additional Information

  • Kamailio Version - output of kamailio -v
5.1 / 5.2
  • Operating System:
CentOS7
@miconda
Copy link
Member

miconda commented Nov 5, 2019

Are the latest 5.1 / 5.2 versions? It seems to be some reset timer struct used there.

I am fine to push such safety check, but I would do it with some warning log message. I am going to push a patch and then you can adjust further if you need something else.

@lazedo
Copy link
Contributor Author

lazedo commented Nov 5, 2019

@miconda yes, latest 5.1 / 5.2 under heavy load. thanks

@miconda
Copy link
Member

miconda commented Nov 5, 2019

Pushed 574b080 .

Have noticed some blocking of/slow message processing before that happening?

@lazedo
Copy link
Contributor Author

lazedo commented Nov 5, 2019

no warnings , just crash.
what are the odds of tl being null ? we had some crashes where fl was null. maybe something is grabbing or invalidating the timer between UNLOCK_SLOW_TIMER_LIST and next instruction ? is that even possible ?

@miconda
Copy link
Member

miconda commented Nov 5, 2019

Did you mean f or tl in the "we had some crashes where fl was null"?

Probably the tl should not be null, because its value (pointer) should be copied in the timer process list. But its content depends on each module that adds items to the timer list. So tl in timer list should be a non-null value, but can end up pointing to an address which is no longer a timer_ln struct.

@lazedo
Copy link
Contributor Author

lazedo commented Nov 5, 2019

sorry, meant to write tl

@miconda
Copy link
Member

miconda commented Nov 11, 2019

OK.

Closing this one, with the patch applied -- if you get new data, reopen or create a new issue.

@miconda miconda closed this as completed Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants