sqlops causes infinite loop if database server is unreachable #1690

Closed
qmphan opened this issue Oct 25, 2018 · 3 comments
Comments

@qmphan
Contributor

qmphan commented Oct 25, 2018

Description

The sqlops module runs into an infinite loop and continuously prints this line to stderr (or stdout):

22(17903) ERROR: <core> [core/io_wait.h:1040]: io_wait_loop_epoll(): epoll_wait(10, 0x2ab72bb7b690, 0, 5000): Invalid argument [22]

Troubleshooting

Reproduction

Set log_stderror=yes
Load the sqlops module and configure it with a bad DB host.

Example:

log_stderror=yes
loadmodule "sqlops.so"
modparam("sqlops","sqlcon","main=>postgres://abc:abc@INVALID_HOST/test")
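Before starting Kamailio with a configuration like this, the host in the sqlcon value can be checked separately. The snippet below is only a sketch in Python, not part of Kamailio: it assumes the sqlcon value has the form name=>url as in the example above, and the helper names parse_sqlcon and host_resolves are made up for illustration.

```python
import socket
from urllib.parse import urlsplit


def parse_sqlcon(value):
    """Split a 'name=>url' sqlcon value into (name, scheme, hostname)."""
    name, _, url = value.partition("=>")
    parts = urlsplit(url.strip())
    # urlsplit lowercases the hostname and drops the user:password part.
    return name.strip(), parts.scheme, parts.hostname


def host_resolves(host):
    """Return True if the host name resolves, False otherwise."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError.
        return False


if __name__ == "__main__":
    name, scheme, host = parse_sqlcon("main=>postgres://abc:abc@INVALID_HOST/test")
    print(name, scheme, host, "resolves" if host_resolves(host) else "does not resolve")
```

A successful lookup only shows that the name resolves; it does not prove that the database server accepts connections.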

Debugging Data

Log Messages

22(17903) ERROR: <core> [core/io_wait.h:1040]: io_wait_loop_epoll(): epoll_wait(10, 0x2ab72bb7b690, 0, 5000): Invalid argument [22]
22(17903) ERROR: <core> [core/io_wait.h:1040]: io_wait_loop_epoll(): epoll_wait(10, 0x2ab72bb7b690, 0, 5000): Invalid argument [22]
22(17903) ERROR: <core> [core/io_wait.h:1040]: io_wait_loop_epoll(): epoll_wait(10, 0x2ab72bb7b690, 0, 5000): Invalid argument [22]
...

Possible Solutions

Additional Information

  • Kamailio Version - output of kamailio -v: 5.1.6
  • Operating System: CentOS 7, kernel 3.10.0-514.26.1.el7.x86_64
@miconda
Member

miconda commented Oct 26, 2018

sqlops doesn't connect to external systems itself; likely there was an error related to sqlops logged before the ones you can see. Since this happens at Kamailio startup, those messages are likely printed because of the large number of child processes, but startup should fail at some point and everything should stop. How long did you let Kamailio run while those messages kept appearing in the logs?

@qmphan
Contributor Author

qmphan commented Oct 26, 2018

You are right. It stopped printing (and stopped running too) about 70 seconds after being started.

I guess it tried to connect to the database, the socket got closed immediately by the OS, and the handle to the socket became invalid. However, the handle was still used in some epoll_wait call, which caused epoll_wait to return immediately instead of waiting for the timeout passed as an argument...
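A related observation, offered as a sketch and not as a diagnosis: in the epoll_wait(2) signature the third argument is maxevents, and Linux rejects a value of zero or less with EINVAL right away, without waiting for the timeout. If the third value in the logged call (0) is indeed maxevents, that alone would produce an immediate "Invalid argument [22]". The Python snippet below calls libc's epoll_wait through ctypes on Linux to show this behaviour; the helper name is hypothetical and this is not Kamailio code.

```python
import ctypes
import errno
import select
import time


def epoll_wait_zero_maxevents(timeout_ms=5000):
    """Call epoll_wait(2) with maxevents=0 and report (rc, errno, seconds)."""
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process (Linux)
    ep = select.epoll()                       # a valid epoll file descriptor
    try:
        events = ctypes.create_string_buffer(64)  # room for a few epoll_event structs
        start = time.monotonic()
        rc = libc.epoll_wait(ep.fileno(), events, 0, timeout_ms)
        err = ctypes.get_errno()
        elapsed = time.monotonic() - start
        return rc, err, elapsed
    finally:
        ep.close()


if __name__ == "__main__":
    rc, err, elapsed = epoll_wait_zero_maxevents()
    print(rc, errno.errorcode.get(err), round(elapsed, 3))
```

A loop that retries such a call without a pause would log the error continuously, because each call fails immediately and the 5000 ms timeout never takes effect.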

@miconda
Member

miconda commented Oct 26, 2018

The duration may be the result of the main process waiting for the child processes to report that they started successfully, combined with the exit_timeout global parameter value, which is 60 seconds. So if one of the child processes does not start properly (does not send back "I am ok"), the main process will start the shutdown procedure.
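The handshake described here can be modelled roughly as follows. This is a toy sketch in Python and not Kamailio's implementation: the function name start_child_and_wait, the "ok" message, and the use of a stdout pipe are all assumptions made for illustration.

```python
import select
import subprocess
import sys


def start_child_and_wait(child_code, timeout_s):
    """Start a child and wait up to timeout_s for it to report 'ok'.

    Returns True if the child reported success in time. Otherwise the
    child is killed, standing in for the parent's shutdown procedure.
    """
    child = subprocess.Popen([sys.executable, "-c", child_code],
                             stdout=subprocess.PIPE)
    # Block until the pipe is readable (data or EOF) or the timeout expires.
    ready, _, _ = select.select([child.stdout], [], [], timeout_s)
    ok = bool(ready) and child.stdout.read(2) == b"ok"
    if not ok:
        child.kill()
    child.wait()
    child.stdout.close()
    return ok


if __name__ == "__main__":
    good = "import sys; sys.stdout.write('ok'); sys.stdout.flush()"
    stuck = "import time; time.sleep(30)"
    print(start_child_and_wait(good, 10.0))
    print(start_child_and_wait(stuck, 0.5))
```

In this model a child that never reports back keeps running, and possibly logging, until the parent's timeout expires, which would be consistent with the messages stopping only after about a minute.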

This is how it was designed. Maybe things can be improved; if anyone wants to submit something here, it's more than welcome.

I am closing this issue for now.

@miconda miconda closed this as completed Oct 26, 2018