Library does not handle server/network failure (disconnect basically). #14

Closed
adelin-mcbsoft opened this issue Jun 5, 2019 · 15 comments
Labels: bug (Something isn't working), resolved (the issue was resolved)

Comments

@adelin-mcbsoft
Contributor

adelin-mcbsoft commented Jun 5, 2019

Hi,

I'm using the code from the Client Example for an SSL server.
Most of it works as expected; however, if I shut down the WebSocket server I'm connected to, or even disconnect the network, no event is triggered.
Surprisingly, even when I call client.available() it returns true, even though the server is shut down or there's no network connection to it...

I'm using an ESP-WROOM-32 board and an nginx WebSocket server. The socket server works without any issue with any browser client (tested Edge, Firefox, Chrome), and these browsers successfully report a server disconnect.

As mentioned, the code is the one from the client example:

#include <ArduinoWebsockets.h>
#include <WiFi.h>

const char* ssid = "WiFi Name"; //Enter SSID
const char* password = "WiFi Password"; //Enter Password
const char* websockets_server = "wss://url.com:443/ws/"; // server address and port

using namespace websockets;

void onMessageCallback(WebsocketsMessage message) {
    Serial.print("Got Message: ");
    Serial.println(message.data());
}

void onEventsCallback(WebsocketsEvent event, String data) {
    if(event == WebsocketsEvent::ConnectionOpened) {
        Serial.println("Connnection Opened");
    } else if(event == WebsocketsEvent::ConnectionClosed) {
        Serial.println("Connnection Closed");
    } else if(event == WebsocketsEvent::GotPing) {
        Serial.println("Got a Ping!");
    } else if(event == WebsocketsEvent::GotPong) {
        Serial.println("Got a Pong!");
    }

    Serial.println("Event triggered...");
}

WebsocketsClient client;
void setup() {
    Serial.begin(115200);
    // Connect to wifi
    WiFi.begin(ssid, password);

    // Wait some time to connect to wifi
    for(int i = 0; i < 10 && WiFi.status() != WL_CONNECTED; i++) {
        Serial.println(".");
        delay(1000);
    }

    Serial.println("Connected to WiFi... probably.");

    // Setup Callbacks
    client.onMessage(onMessageCallback);
    client.onEvent(onEventsCallback);
    
    // Connect to server
    client.connect(websockets_server);

    // Send a message
    client.send("This is a nice little message sent from ESP32.");
}

void loop() {
  if (client.available()) {
    client.poll();
  } else {
    Serial.println("OOPS!! Client Disconnected...");
    delay(5000);
  }
}

If there's any detail I can help you with to debug this issue, just let me know.
All the best,
A.

@gilmaimon
Owner

gilmaimon commented Jun 5, 2019

Hi,

Are you sure it even connects to the server? It looks like you are using SSL but never provide a certificate.

Can you share the full serial output?

Also, client.available() definitely should not return true in your case (mainly because I believe you never connect)

Also, if you could turn on debug logs and share those here, it might help.

Thank you,
Gil.

@adelin-mcbsoft
Contributor Author

Hi,

I’m pretty sure I was connected, as I was able to (send and) receive messages in the Serial output, while I was writing them in the other endpoints (Firefox, Chrome clients).

However, as I said, when I shut down the nginx server, nothing happened. I was still in the client.available() loop, while the other endpoints did notice the disconnect. No other event was triggered.

Regarding providing a certificate... usually this part is handled server-side (I mean, nginx handles this)... I didn't know it needs to be provided on the client side as well, as the CA is public (most implementations go this way).

Anyway, the thing is I was connected. I will provide you a screen recording later so you can see.

Hope it helps,
Thanks,
A.

@gilmaimon
Owner

That's so odd. During the SSL handshake the client usually wants to validate the server's certificate. In esp8266's WiFiClientSecure you can either choose not to do it (trusting that the server is the one you want to talk to) or use a fingerprint. In esp32's implementation (which I of course rely on) you have to provide a certificate, and chain validation is mandatory as far as I know.

If you use SSL to communicate with the proxy server, you should still be required to provide a certificate in order to validate that you are indeed communicating with the right endpoint.

Web clients usually have a store of known root Certificate Authorities they can use to validate any certificate; esp32 does not.

I consider this a bug, and I will look into it properly this weekend. I would appreciate any help (links and info) with reproducing the same server setup you use.

Gil.

@gilmaimon gilmaimon added the bug Something isn't working label Jun 5, 2019
@adelin-mcbsoft
Contributor Author

adelin-mcbsoft commented Jun 6, 2019

Unfortunately, unlike ESP8266, on ESP32 I don't have an option menu from which I can turn on Debugging. At least not in Arduino IDE, so I'm not really sure how to do it.

The configuration file of the nginx server (which behaves as a proxy for the SSL implementation):

map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
}

server {
        listen 443 ssl default_server;
        ssl_prefer_server_ciphers on;
        ssl_certificate /path/to/certificate.pem;
        ssl_certificate_key /path/to/privkey.pem;

        access_log /path/to/domain-access.log;
        error_log /path/to/domain-error.log;

        root /path/to/domain-root/;
        index index.html index.htm;

        server_name local-server.name;

        location /ws/ {
            proxy_pass http://unix:/path/to/websocket-server.sock;
            # also, IP:PORT can be used on the line above
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_redirect off;
            proxy_read_timeout 86400s;
            proxy_send_timeout 86400s;
            keepalive_timeout 86400s;
            # prevents 502 bad gateway error
            proxy_buffers 8 32k;
            proxy_buffer_size 64k;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
            reset_timedout_connection on;
        }
}

In my configuration, I'm running nginx 1.12 compiled with OpenSSL under Debian 9.

As a test chat implementation, I'm using a fork of this draft: https://github.com/sanwebe/Chat-Using-WebSocket-and-PHP-Socket .
There are many other implementations online, and you can use any WebSocket implementation you want; the issue persists with any of them once you shut down the nginx server.

I've tested it today with a public WebSocket server as well ( https://www.websocket.org/echo.html -- which is SSL ) and I can connect without any issue, without the ESP32 asking for any CA or reporting any error.

I noticed a difference though: if you cut the WiFi to the ESP32, client.available() returns false, but still no disconnect WebsocketsEvent is triggered. Unfortunately, I have no control over the server at the URL above, so I couldn't stop the socket server to see whether it behaves like my example above, but most probably it would.

Let me know if I can help you with anything else,
I'll be glad to help.

Best,
Adelin

@gilmaimon
Owner

Great, the information is super helpful! One last question: does this (zombie state) happen when you shut down the nginx server or when you shut down the websockets server behind it? Or both?

Gil.

@adelin-mcbsoft
Contributor Author

adelin-mcbsoft commented Jun 6, 2019

Mhm, great question; I just tested: it happens only when I shut down nginx. If I shut down the server behind it (the PHP Socket implementation in my case), the ESP32 does notice and client.available() returns false (but still no event is triggered).
That's an interesting scenario.

Also, I tried restarting nginx, but the ESP32 doesn't notice; so the connection is lost at some point... but client.available() still returns true.

Let me know if you want any more testing on anything.
Have a great day,
A.

Later edit: Anyway, that's odd... you would expect it to happen exactly vice-versa. At least from my point of view.

@gilmaimon gilmaimon self-assigned this Jun 7, 2019
@gilmaimon
Owner

gilmaimon commented Jun 7, 2019

Ok so now I'm not on mobile and can expand my thoughts on this.

For connectivity checking I rely on the built-in WiFiClient from the esp32 platform libraries. What I believe is happening is that closing the proxy does not send a websockets close message, but only closes the TCP connection, which the WiFiClient (library) does not handle properly.

"but still no event is being triggered"

That is unwanted behavior for sure. The first thing I will address is the callback not being called when availability changes state. You should know, however, that the event callback is for websockets events. So if no callback is being called, it's probably because no websockets close message is being sent (which makes sense).

Second thing I will try to address is a websockets connection over a proxy and the scenarios you described.

Third thing is certificates on esp32. From what I know you can't connect without setting a CA, but you seem to have no problem, so either the library changed or something else is going on.

Thank you for the detailed issue. I might split this into several other issues in order to better track tasks.
Also, you should know this might take a while.

Gil

@gilmaimon
Owner

UPDATE:

I've fixed the close event issues. That means that the callback will be properly called even when the connection is dropped (Not merged yet).
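
As an illustration only, the behavior described here can be pictured with a toy model: a connection object that reports ConnectionClosed exactly once, whether the close arrives as a websockets close frame or as a dropped transport. This is a sketch in plain C++, not the library's code; ToyEvent, ToyConnection and all of its methods are hypothetical names invented for the example.

```cpp
#include <functional>
#include <utility>

// Toy model only: these names are hypothetical and do not mirror the
// library's internals.
enum class ToyEvent { ConnectionClosed };

class ToyConnection {
public:
    explicit ToyConnection(std::function<void(ToyEvent)> onEvent)
        : onEvent_(std::move(onEvent)) {}

    // The peer sent a websockets close frame (the graceful path).
    void handleCloseFrame() { markClosed(); }

    // The transport reported that the TCP connection is gone, with no
    // close frame ever arriving (for example, the proxy was shut down).
    void handleTransportLost() { markClosed(); }

    bool available() const { return open_; }

private:
    // Flip the state and notify exactly once, whichever path got here first.
    void markClosed() {
        if (!open_) return;
        open_ = false;
        if (onEvent_) onEvent_(ToyEvent::ConnectionClosed);
    }

    std::function<void(ToyEvent)> onEvent_;
    bool open_ = true;
};
```

The point of routing both paths through one function is that the user callback and available() can never disagree about the state.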

Regarding the nginx issue, I believe the problem is inherited from esp32's WiFiClient implementation and has nothing to do with the library itself. When I test the exact same code on desktop (I have a project called TinyWebsockets, which is the same codebase but for desktop), shutting down either the server behind the proxy or the proxy itself is detected as expected.

What you can do: you can try doing an "active available call". Basically, send a ping before calling available. That should test the TCP connection and the esp32 will notice that something isn't right. Available should return false and an event will be triggered. You can either do it manually by calling ping before available or you could pass true to available, like this:

if(client.available(true)) {
    // code...
}

Notice that if you have no delay between calls to available, this will spam your server with pings. So I suggest you either do it periodically or have a decent delay between calls.
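
A minimal sketch of the "do it periodically" option, assuming the current time comes from millis(). PeriodicCheck is a hypothetical helper written for this example and is not part of the library; only client.available(), client.available(true) and client.poll() come from the discussion above.

```cpp
#include <stdint.h>

// Hypothetical helper (not part of ArduinoWebsockets): reports when at least
// intervalMs milliseconds have elapsed since the last time it fired.
class PeriodicCheck {
public:
    explicit PeriodicCheck(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // nowMs is expected to come from millis(). Unsigned subtraction keeps the
    // comparison correct when the millisecond counter wraps around.
    bool due(uint32_t nowMs) {
        if (!started_ || static_cast<uint32_t>(nowMs - lastMs_) >= intervalMs_) {
            started_ = true;
            lastMs_ = nowMs;
            return true;  // the first call always fires
        }
        return false;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
    bool started_ = false;
};

// Intended use inside loop(), with `client` being the WebsocketsClient:
//
//   static PeriodicCheck activeCheck(60000);  // one active check per minute
//   bool up = activeCheck.due(millis()) ? client.available(true)
//                                       : client.available();
//   if (up) client.poll();
```

The interval is a trade-off: a shorter one detects a dead proxy sooner but sends more pings, so the value above is only a placeholder.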

Sadly, there isn't much I can do about this issue. You could however open an issue on espressif's github.

Could you do these two extra things:

  • Try running the sketch with the board as ESP32 Dev Module with the highest debug level (verbose) and share the full logs for each scenario here?
  • Try running the same sketch on an esp8266, if you have one.

Thank You,
Gil.

@adelin-mcbsoft
Contributor Author

adelin-mcbsoft commented Jun 8, 2019

Testing done, both ESP8266 and ESP32.
Results with the forked branch (ArduinoWebsockets-fix_issue_14):

ESP8266: The library notices when the nginx proxy server goes down. I only tested with client.available(false), which was enough for me, so I expect client.available(true) to behave the same.

ESP32: The library notices that the nginx proxy server went down ONLY when using client.available(true), but that indeed spams the server big-time. For me personally this is an issue: I cannot allow this level of spamming, nor any delay. Instant communication with no delays is the main reason I'm using WebSockets in the first place.

Attached are 3 logs covering all the situations, plus the code used for testing. As said, I didn't test client.available(true) on the ESP8266, as it already works with (false).

debug_esp32_client_available_true.txt
debug_esp32_client_available_false.txt
debug_esp8266_client_available_false.txt
code_used_for_testing.ino.txt

By the way, in the ESP32 log you'll find some surprising information about SSL, but that's another story.

[V][ssl_client.cpp:53] start_ssl_client(): Free internal heap before TLS 277316
[V][ssl_client.cpp:55] start_ssl_client(): Starting socket
[V][ssl_client.cpp:90] start_ssl_client(): Seeding the random number generator
[V][ssl_client.cpp:99] start_ssl_client(): Setting up the SSL/TLS structure...
[I][ssl_client.cpp:153] start_ssl_client(): WARNING: Use certificates for a more secure communication!
[V][ssl_client.cpp:177] start_ssl_client(): Setting hostname for TLS session...
[V][ssl_client.cpp:192] start_ssl_client(): Performing the SSL/TLS handshake...
[V][ssl_client.cpp:213] start_ssl_client(): Verifying peer X.509 certificate...
[V][ssl_client.cpp:222] start_ssl_client(): Certificate verified.
[V][ssl_client.cpp:237] start_ssl_client(): Free internal heap after TLS 236136
[E][WiFiClient.cpp:282] setOption(): fail on fd -1, errno: 9, "Bad file number"

Regarding our issue, right now I'm going to look at how WiFi.h differs from ESP8266WiFi.h, as it seems (as you said too) that's where the problem lies. I'm looking for the fastest way to address this.

Anyway, I appreciate your time for addressing these issues, this library is really important to me; hope I will be able to support your work one day, if my on-going project succeeds :) (right now, I'm building something from scratch from my own time & funds).

Let me know if you have feedback on the attached logs (including the SSL part),
Great work Gil,
Many thanks,
A.

@gilmaimon
Owner

gilmaimon commented Jun 8, 2019

I'm glad to hear that this library is important to you. It is important to me as well, and seeing other users using it and helping to improve it makes me really happy. Opening this issue is very helpful, and is the only kind of support I need 😃

A note: consider using active available calls periodically (for example, every minute or so), so the server won't be under too much pressure. By the way, trying to send any kind of message to the server will have the same effect as an active available call.

Thank you for the kind words and the detailed issue. I have merged the changes into the master branch and released a new patch version. I will keep the branch open so I remember to look into the SSL logs. If you don't mind, I will also tag this as resolved.

Thanks again and good luck with your project!

Best wishes,
Gil.

@adelin-mcbsoft
Contributor Author

adelin-mcbsoft commented Jun 11, 2019

Hi Gil,

Sorry for the delayed reply, the weekend kept me way too long this time... (and it's Tuesday already) :D

Yes, somehow I'll find a solution using periodic calls to client.available(true). I'm determined to use your library as it completely fulfills my needs.

I hope Espressif will also fix the issue in the WiFi library in the near future. I would like to open an issue on their GitHub page regarding this, but the only thing standing in my way is that I'm not really sure how to describe the issue without going into the Sockets library story, as I'm afraid they won't care once I mention that I'm using another library to trigger the issue.
Perhaps, when you have a bit of time, you could shed some light here.

Thank you as well and once again for the work you've put in all this project, really helpful for me and I'm sure for other people too! When my project will be a successful one, I won't forget your name :))

All the best,
A.

@gilmaimon
Owner

The discussion continued on #18.

@mikegoubeaux

Hi Gil,

I'm having a related issue with the latest (0.5) version of the library.

I am connecting to a Red digital cinema camera that has a websocket server in it - though I have no access or control to that server - it's sealed up inside the camera.

Everything is working really well to send and receive websocket data to the camera.

However, like the OP, when I power down the camera, remove the wifi connection to the camera, or otherwise break the TCP connection, client.available() is not returning false, even when doing a manual client.ping before client.available.

Pongs stop coming back, but available still returns true. Am I doing something wrong to force an availability check?

I'm using an esp8266 (ESP-12E). And am currently using the example to test this issue:

#include <ArduinoWebsockets.h>
#include <ESP8266WiFi.h>

const char* ssid = "gfunkish"; //Enter SSID
const char* password = "11111111"; //Enter Password
const char* websockets_server_host = "192.168.7.142"; // Enter server address
const uint16_t websockets_server_port = 9998; // Enter server port

using namespace websockets;

void onMessageCallback(WebsocketsMessage message) {
    Serial.print("Got Message: ");
    Serial.println(message.data());
}

void onEventsCallback(WebsocketsEvent event, String data) {
    if(event == WebsocketsEvent::ConnectionOpened) {
        Serial.println("Connnection Opened");
    } else if(event == WebsocketsEvent::ConnectionClosed) {
        Serial.println("Connnection Closed");
    } else if(event == WebsocketsEvent::GotPing) {
        Serial.println("Got a Ping!");
    } else if(event == WebsocketsEvent::GotPong) {
        Serial.println("Got a Pong!");
    }
}

WebsocketsClient client;
void setup() {
    Serial.begin(500000);
    // Connect to wifi
    WiFi.begin(ssid, password);

    // Wait some time to connect to wifi
    for(int i = 0; i < 10 && WiFi.status() != WL_CONNECTED; i++) {
        Serial.print(".");
        delay(1000);
    }

    // run callback when messages are received
    client.onMessage(onMessageCallback);
    
    // run callback when events occur
    client.onEvent(onEventsCallback);

    // Connect to server
    client.connect(websockets_server_host, websockets_server_port, "/");

    // Send a message
    client.send("{\"type\":\"rcp_config\",\"strings_decoded\":0,\"json_minified\":0,\"include_cacheable_flags\":0,\"encoding_type\":\"legacy\"}");

    // Send a ping
    client.ping();
}

void loop() {
  client.ping();
  if(client.available()){
    client.poll();
    Serial.println("true");
  }
}

@adelin-mcbsoft
Contributor Author

Hi @mikegoubeaux ,

Just a tip (for now, as I don't have the necessary time to dive into code at the moment):
Try to monitor the socket close event:

client.onEvent([&](WebsocketsEvent event, String data) {
    if(event == WebsocketsEvent::ConnectionOpened) {
        Serial.println(F("WebSockets: Event // Connnection opened."));
    } else if(event == WebsocketsEvent::ConnectionClosed) {
        Serial.print(F("WebSockets: Event // Connnection closed. Reason: "));
        Serial.println(client.getCloseReason());
        /* Do your connection close routine here */
    } else if(event == WebsocketsEvent::GotPing) {
        Serial.println(F("WebSockets: Event // Got a Ping!"));
    } else if(event == WebsocketsEvent::GotPong) {
        Serial.println(F("WebSockets: Event // Got a Pong!"));
    }
});

If you indeed get the ConnectionClosed event triggered but client.available() still returns true, it could be a bug in the library, and it can be fixed for sure. However, as I said, I can't test it right now.

On my end, with the ESP8266 I managed to get the status correctly all the time. Only the ESP32 caused trouble.
Alternatively, you could use client.available(true), which sends a ping to check the server's availability, and use millis() (or even better, a timer library, e.g. the ArduinoTimer library) to check the status once every x seconds, just to prevent flooding the server with ping calls.

Hope it helps,
Best,
A.

@mikegoubeaux

Thanks @gilmaimon !
I've been monitoring all of the socket events. All of them trigger (ConnectionOpened, GotPing, and GotPong) except ConnectionClosed.

Great suggestion on ArduinoTimer; I am using it to call client.available(true). While the server is running, GotPong events are triggered. Then, when I shut down the server (power off the camera), client.available() still returns true, yet GotPong stops being triggered.

I'm calling client.available(true) every 2 seconds, though I've tried other rates as well.

Sure you're busy, but not sure where to go from here. LOVE the library. Thanks for all of your great work.
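
One possible workaround for the symptom described here (pongs stop arriving while client.available() stays true) is an application-level watchdog that keys off the GotPong event. The following is only a sketch under that assumption: PongWatchdog is a hypothetical helper, not part of the library, and the timeout value is up to the application.

```cpp
#include <stdint.h>

// Hypothetical helper (not part of ArduinoWebsockets): treats the link as dead
// when no pong has been seen for more than timeoutMs milliseconds.
class PongWatchdog {
public:
    // nowMs is the time the connection was opened, e.g. millis().
    PongWatchdog(uint32_t timeoutMs, uint32_t nowMs)
        : timeoutMs_(timeoutMs), lastPongMs_(nowMs) {}

    // Call from the GotPong branch of the event callback.
    void notePong(uint32_t nowMs) { lastPongMs_ = nowMs; }

    // Call from loop(). Unsigned subtraction stays correct when the
    // millisecond counter wraps around.
    bool expired(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastPongMs_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastPongMs_;
};
```

With a ping sent every 2 seconds, a timeout covering a few missed pongs would be a plausible starting point. Once expired() returns true, the sketch can treat the connection as lost regardless of what client.available() reports, for instance by closing the client and connecting again.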
