Licode stops automatically after several days #688

Open
prakharsingh opened this Issue Jan 11, 2017 · 7 comments

prakharsingh commented Jan 11, 2017

I have deployed Licode on Ubuntu Server 14.04 with Node v6.9.3. After a few days, Licode stops on its own. Here are the last log lines:

2017-01-07 18:36:29,593  - DEBUG [0x7b20f6ddd700] DtlsTransport - id: 124823771280335060,  message: closed
[erizo-b1e74e2c-1905-9f6a-217a-a67af5ce3246] 2017-01-07 18:36:29,593  - DEBUG [0x7b20f6ddd700] WebRtcConnection - id: 124823771280335060,  message: Destructor ended
[erizo-b1e74e2c-1905-9f6a-217a-a67af5ce3246] 2017-01-07 18:36:29,594  - DEBUG [0x7b20f6ddd700] DtlsTransport - id: 124823771280335060,  message: destroying
2017-01-07 18:36:29,594  - DEBUG [0x7b20f6ddd700] DtlsTransport - id: 124823771280335060,  message: destroyed
[erizo-b1e74e2c-1905-9f6a-217a-a67af5ce3246] 2017-01-07 18:36:29,594  - DEBUG [0x7b20f6ddd700] NiceConnection - id: 124823771280335060,  message: destroying
[erizo-b1e74e2c-1905-9f6a-217a-a67af5ce3246] 2017-01-07 18:36:29,594  - DEBUG [0x7b20f6ddd700] NiceConnection - id: 124823771280335060,  message: destroyed
[erizo-b1e74e2c-1905-9f6a-217a-a67af5ce3246] 2017-01-07 18:36:29.595  - INFO: ErizoJSController - message: muxer closed succesfully, id: 124823771280335060, OK
[erizo-b1e74e2c-1905-9f6a-217a-a67af5ce3246] 2017-01-07 18:36:29.595  - INFO: ErizoJSController - message: Removed all publishers. Killing process.
2017-01-07 18:36:29.604  - INFO: ErizoAgent - message: closed, erizoId: b1e74e2c-1905-9f6a-217a-a67af5ce3246
2017-01-07 18:36:29.608  - INFO: ErizoAgent - message: launched new ErizoJS, erizoId: b550798e-d480-f518-028b-3a416aa069aa
[erizo-b550798e-d480-f518-028b-3a416aa069aa] 2017-01-07 18:36:29.830  - INFO: ErizoJS - message: Started, erizoId: b550798e-d480-f518-028b-3a416aa069aa
2017-01-07 18:36:47.621  - INFO: EcCloudHandler - message: deleting erizoJS, erizoId: b1e74e2c-1905-9f6a-217a-a67af5ce3246
2017-01-07 23:53:33.641  - WARN: CloudHandler - I received a keepAlive message from an unknown erizoController
2017-01-07 23:53:33.642  - ERROR: ErizoController - message: This ErizoController does not exist in cloudHandler to avoid unexpected behavior this ErizoController will die
2017-01-07 23:53:33.646  - INFO: CloudHandler - [CLOUD HANDLER]: ErizoController in host  158.69.124.95 wants to be killed.

prakharsingh commented Jan 19, 2017

@lodoyun Do you see anything abnormal in the logs? This happens to me roughly every two weeks. I am on an OVH server running Ubuntu Server 14.04.

zevarito commented Jan 19, 2017

jcague commented Feb 1, 2017

As @zevarito commented, one possible reason is that the RabbitMQ connection gets closed at some point, so it might be a network issue. We haven't seen this issue on our servers, though.
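
If the RabbitMQ connection does get closed, one common pattern is to reconnect with capped exponential backoff instead of carrying on with a dead connection. The sketch below is generic: `connect` is a hypothetical stand-in for whatever opens the AMQP connection (Licode's amqper code is not shown in this thread), the delay constants are arbitrary, and it uses async/await, which needs Node 7.6 or newer (the original report uses v6.9.3).

```javascript
// Hypothetical sketch: retry an async `connect` function with capped
// exponential backoff. `connect` stands in for the real AMQP connection setup.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  // attempt 0 -> baseMs, 1 -> 2 * baseMs, 2 -> 4 * baseMs, ... never above maxMs
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function connectWithRetry(connect, { maxAttempts = 10, baseMs = 500, maxMs = 30000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect(); // success: hand back whatever connect() resolved to
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // no point sleeping after the last try
      const delay = backoffDelay(attempt, baseMs, maxMs);
      console.warn(`connect failed (attempt ${attempt + 1}/${maxAttempts}), retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
  throw lastError; // all attempts failed; let the caller decide (e.g. exit and be restarted)
}
```

After a successful reconnect the process would presumably still need to re-declare its queues and consumers; that part depends on Licode's internals and is not sketched here.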

HemanthAnakapalle commented Aug 30, 2017

Hi,

We are also facing the same issue. Please find the logs below (IP addresses and room names replaced with *'s).

NOTE: In our case, the issue does not appear weeks after Licode starts; it happened after a few days (~2 days).

2017-08-24 23:08:53.898  - WARN: CloudHandler - ErizoController 0  in  ***.***.com does not respond. Deleting it.
2017-08-24 23:08:53.905  - WARN: CloudHandler - I received a keepAlive message from an unknown erizoController
2017-08-24 23:08:53.905  - ERROR: ErizoController - message: This ErizoController does not exist in cloudHandler to avoid unexpected behavior this ErizoController will die
2017-08-24 23:08:53.907  - INFO: CloudHandler - [CLOUD HANDLER]: ErizoController in host  ***.***.com wants to be killed.
2017-08-25 03:22:50.769  - INFO: NuveAuthenticator - message: authenticate fail - MAuth header not present
2017-08-25 14:55:27.769  - INFO: RoomsResource - message: createRoom success, roomName:******, serviceId: superService, p2p: undefined
2017-08-25 14:55:27.913  - ERROR: CloudHandler - No erizoController is available.
2017-08-25 14:55:38.922  - ERROR: TokensResource - message: createToken error, errorMgs: No Erizo Controller available

In this scenario, erizoController.js and RabbitMQ are both still running, so nothing triggers a service restart; however, CloudHandler no longer has any ErizoController registered, which results in the createToken error.
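
The log sequence above can be reproduced with a toy model of the bookkeeping: Nuve drops a controller that misses its keepAlive deadline, the next keepAlive from that controller is answered with `'whoareyou'`, the controller then gives up, and Nuve is left with no controller. This is a simplified, hypothetical model reconstructed from the log messages and the keepAlive snippet in this thread, not Licode's actual CloudHandler code; `ToyCloudHandler`, `controllerTick`, and the `TIMEOUT_MS` value are made up for illustration.

```javascript
// Toy model (hypothetical, not Licode's CloudHandler) of the keepAlive
// bookkeeping implied by the log messages. TIMEOUT_MS is an arbitrary value.
const TIMEOUT_MS = 10000;

class ToyCloudHandler {
  constructor() {
    this.controllers = new Map(); // controller id -> time of last keepAlive (ms)
  }

  addController(id, now) {
    this.controllers.set(id, now);
  }

  // keepAlive RPC from a controller.
  keepAlive(id, now) {
    if (!this.controllers.has(id)) {
      // cf. "I received a keepAlive message from an unknown erizoController"
      return 'whoareyou';
    }
    this.controllers.set(id, now);
    return 'ok';
  }

  // Periodic sweep; cf. "ErizoController 0 in ... does not respond. Deleting it."
  checkKeepAlives(now) {
    for (const [id, lastSeen] of this.controllers) {
      if (now - lastSeen > TIMEOUT_MS) this.controllers.delete(id);
    }
  }

  // cf. "No erizoController is available."
  getController() {
    if (this.controllers.size === 0) throw new Error('No erizoController is available.');
    return this.controllers.keys().next().value;
  }
}

// A controller that gives up on 'whoareyou', like the snippet in this thread.
function controllerTick(handler, id, now) {
  return handler.keepAlive(id, now) === 'whoareyou' ? 'dead' : 'alive';
}
```

In this model, a controller whose keepAlives stop arriving for longer than `TIMEOUT_MS` (say, during a broker or network hiccup) is swept; its next keepAlive gets `'whoareyou'`, and `getController()` throws from then on. Further keepAlives never bring it back; only a new registration would.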

To work around this, we added process.exit(0) so that the entire service gets restarted:


            var intervarId = setInterval(function () {

                amqper.callRpc('nuve', 'keepAlive', myId, {'callback': function (result) {
                    if (result === 'whoareyou') {

                        // TODO: It should try to register again in Cloud Handler.
                        // But taking into account current rooms, users, ...
                        log.error('message: This ErizoController does not exist in cloudHandler ' +
                                  'to avoid unexpected behavior this ErizoController will die');
                        clearInterval(intervarId);
                        amqper.callRpc('nuve', 'killMe', publicIP, {callback: function () {}});
                        process.exit(0);  // <=== Introduced this line.
                    }
                }});

            }, INTERVAL_TIME_KEEPALIVE);

Is this fine?

And how can we fix this issue?

Thanks & Regards,
Hemanth
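
The TODO in the keepAlive snippet above suggests registering again with the Cloud Handler instead of dying. A minimal sketch of that control flow follows; `createKeepAliveHandler`, `sendKeepAlive`, and `register` are hypothetical, with the last two injected as stand-ins for the real `amqper.callRpc` calls, so the names, callback shapes, and returned ids are assumptions. As the TODO warns, a real implementation would also have to reconcile the rooms and users the controller is still serving, which this sketch ignores.

```javascript
// Hypothetical sketch: re-register with the Cloud Handler on 'whoareyou'
// instead of exiting. `sendKeepAlive(id, cb)` and `register(cb)` are injected
// stand-ins for the real RPC calls; their names and signatures are assumptions.
function createKeepAliveHandler({ sendKeepAlive, register, log = console }) {
  let myId = null;
  let registering = false;

  function doRegister() {
    if (registering) return; // avoid overlapping registration attempts
    registering = true;
    register((newId) => {
      registering = false;
      myId = newId;
      log.info(`registered with id ${newId}`);
    });
  }

  // Meant to be called once per keepAlive interval.
  function tick() {
    if (myId === null) {
      doRegister();
      return;
    }
    sendKeepAlive(myId, (result) => {
      if (result === 'whoareyou') {
        log.warn(`Cloud Handler does not know id ${myId}; registering again`);
        myId = null;
        doRegister();
      }
    });
  }

  return { tick, getId: () => myId };
}
```

Calling `tick` from the existing `setInterval(..., INTERVAL_TIME_KEEPALIVE)` loop would take the place of the `clearInterval`/`killMe` branch.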

arsenicraghav commented Sep 25, 2017

I am also facing the same issue; more specifically, Nuve stops responding on port 3000 after Licode has been running for a day or more.

The node process is still running on the machine.

Thanks
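
Because the node process stays up while Nuve stops answering, a process-level check (is the PID alive?) will not catch this state. Below is a minimal sketch of an HTTP-level probe using Node's built-in `http` module; `probeHttp` is a hypothetical helper, and its defaults (127.0.0.1, port 3000, path `/`, 5 s timeout) are assumptions. Any HTTP response counts as alive, since requests without credentials are rejected by Nuve's authenticator (see the `MAuth header not present` log line earlier in this thread), presumably with an error status.

```javascript
const http = require('http');

// Hypothetical health probe: resolves true if the server returns any HTTP
// response within timeoutMs, false on timeout or connection error.
// The defaults (127.0.0.1:3000, path '/') are assumptions.
function probeHttp({ host = '127.0.0.1', port = 3000, path = '/', timeoutMs = 5000 } = {}) {
  return new Promise((resolve) => {
    // agent: false -> fresh connection per probe, no pooled keep-alive socket
    const req = http.get({ host, port, path, agent: false }, (res) => {
      res.resume(); // drain the body; only responsiveness matters here
      resolve(true);
    });
    req.setTimeout(timeoutMs, () => {
      req.destroy(); // give up on a server that accepts but never answers
      resolve(false);
    });
    req.on('error', () => resolve(false)); // e.g. ECONNREFUSED
  });
}
```

A cron job or systemd timer could run this and restart Licode after a few consecutive `false` results; how many failures to tolerate is a judgment call.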

j1122 commented Mar 21, 2018

Hi @HemanthAnakapalle

I tried adding process.exit(0), but it has no effect when I stop the RabbitMQ or MongoDB service.
Is there any condition that makes the entire service restart?
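
process.exit(0) only ends the process; something else has to start it again, and that depends on how Licode was launched. The sketch below is a hypothetical minimal supervisor (`supervise` is a made-up name) built on Node's `child_process.spawn`; the restart cap and delay are arbitrary. In production, a process manager such as systemd (`Restart=always`) or supervisord would normally fill this role. It reacts only to the child exiting, so it would not help when RabbitMQ or MongoDB is stopped while the Licode processes keep running.

```javascript
const { spawn } = require('child_process');

// Hypothetical minimal supervisor: relaunches `command args` each time the
// child exits, up to maxRestarts times, waiting delayMs between launches.
function supervise(command, args, { maxRestarts = 5, delayMs = 1000, onExit = () => {} } = {}) {
  let restarts = 0;

  function launch() {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('exit', (code, signal) => {
      onExit({ code, signal, restarts });
      if (restarts >= maxRestarts) return; // stop relaunching after the cap
      restarts += 1;
      setTimeout(launch, delayMs);
    });
  }

  launch();
}
```

For example, `supervise('node', ['erizoController.js'])` run from the controller's directory (hypothetical path) would relaunch the controller after the `process.exit(0)` discussed above; whether a relaunched controller registers with Nuve again is worth verifying.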

neumerance commented May 12, 2018

Do we have a fix for this? I'm having this issue; it happens to me many times.
