Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esp8266 client not connecting on local broker (re-open) #50

Closed
pottendo opened this issue Dec 30, 2020 · 2 comments
Closed

esp8266 client not connecting on local broker (re-open) #50

pottendo opened this issue Dec 30, 2020 · 2 comments

Comments

@pottendo
Copy link

hi,
I'm stuck... so I have to reopen this:
#47

However there is a difference now: my code works well when connected to my local WiFi (Fritzbox + Netgear Wifi-extender);
-> therefore I reported success and closed issue #47
However it only works on this particular WiFi network... If I try to span a hotspot with android phones (tried 3 different brands) or I use another WiFi Router (Huawai) it gets stuck, where the mqtt client connects to the local broker (i.e. the broker sits on the same 8266). WiFi.isConnected() reports success before trying to connect to the broker.

The behavior is deterministic: it always works on the one particular Wifi, and it fails on all others (even outside at the final place on another router).

I've tried many combinations on Wifi settings (different encryptions, etc.) but no success. I don't have an idea why a local (!) connection (i.e. a client connects to the broker on the same uController) shall fail dependent on the real WiFi connected...

Here the good case (logs from uMQTT enabled MQTT_INFO):
[...]
17:15:56.149 -> Wifi connecting...
17:15:56.645 -> Setting hostname fcce
17:15:56.645 -> MDNS add Service: 1
17:15:56.645 -> fcce IP = 192.168.188.34, GW = 192.168.188.1, DNS = 192.168.188.1
17:15:56.645 -> Mode: STA
17:15:56.645 -> PHY mode: N
17:15:56.645 -> Channel: 11
17:15:56.645 -> AP id: 0
17:15:56.645 -> Status: 5
17:15:56.645 -> Auto connect: 1
17:15:56.645 -> SSID (12): EXT
17:15:56.645 -> Passphrase (13): XXXXXX
17:15:56.645 -> BSSID set: 0
17:16:01.641 -> Starting MQTT server on port 1883
17:16:01.674 -> MQTT:InitConnection
17:16:01.674 -> MQTT:InitClient
17:16:01.674 -> MQTT: Client task activated - state 0
17:16:01.707 -> TCP: Connect to ip 192.168.188.34:1883
17:16:01.740 -> MQTT: queue subscribe, topic"fcce/config", id: 1
17:16:01.740 -> MQTT: queue subscribe, topic"fcc/#", id: 2
17:16:01.740 -> MQTT: queuing publish, length: 51, queue size(34/2048)
17:16:01.773 -> MQTT: Client task activated - state 12
see below, here the difference starts:
17:16:01.972 -> MQTT: Connected to broker 192.168.188.34:1883
17:16:01.972 -> MQTT: Sending, type: 1, id: 0000
17:16:01.972 -> MQTT: Client task activated - state 16
17:16:01.972 -> MQTT_ClientCon_connected_cb(): Client connected
17:16:01.972 -> MQTT: InitClientCon
17:16:02.005 -> MQTT_ClientCon_recv_cb(): 33 bytes of data received
17:16:02.005 -> MQTT: TCP: data received 33 bytes (State: 14)
17:16:02.005 -> MQTT: total_len: 33
17:16:02.005 -> MQTT: message_type: 1
17:16:02.005 -> MQTT: Connect received, message_len: 33
17:16:02.005 -> MQTT: Connect flags 6
17:16:02.005 -> MQTT: Keepalive 120
17:16:02.005 -> MQTT: Client id "fcce"
17:16:02.005 -> MQTT: LWT topic /lwt
17:16:02.005 -> MQTT: 7 bytes of LWT data
17:16:02.005 -> MQTT: Server task activated - state 19
17:16:02.005 -> MQTT: Sending, type: 2, id: 0000
17:16:02.005 -> TCP: Sent
[...]

and here the bad case (I have a second router with a different local IP range):
17:18:09.686 -> Wifi connecting...
17:18:10.183 -> Setting hostname fcce
17:18:10.183 -> MDNS add Service: 1
17:18:10.183 -> fcce IP = 10.0.0.2, GW = 10.0.0.138, DNS = 10.0.0.138
17:18:10.183 -> Mode: STA
17:18:10.183 -> PHY mode: N
17:18:10.183 -> Channel: 1
17:18:10.183 -> AP id: 0
17:18:10.183 -> Status: 5
17:18:10.183 -> Auto connect: 1
17:18:10.183 -> SSID (9): T
17:18:10.183 -> Passphrase (13): XXXXXXX
17:18:10.183 -> BSSID set: 0
17:18:15.177 -> Starting MQTT server on port 1883
17:18:15.210 -> MQTT:InitConnection
17:18:15.210 -> MQTT:InitClient
17:18:15.210 -> MQTT: Client task activated - state 0
17:18:15.288 -> TCP: Connect to ip 10.0.0.2:1883
17:18:15.288 -> MQTT: queue subscribe, topic"fcce/config", id: 1
17:18:15.288 -> MQTT: queue subscribe, topic"fcc/#", id: 2
17:18:15.288 -> MQTT: queuing publish, length: 51, queue size(34/2048)
17:18:15.288 -> MQTT: Client task activated - state 12
see above here the difference starts
17:18:18.286 -> MQTT: queuing publish, length: 21, queue size(87/2048)
17:18:18.286 -> MQTT: Client task activated - state 12
17:18:20.271 -> fcce - ping /uptime 00h:00m:19s, free mem = 35232
17:18:20.304 -> MQTT: queuing publish, length: 52, queue size(110/2048)
17:18:20.304 -> MQTT: Client task activated - state 12
17:18:21.296 -> MQTT: queuing publish, length: 21, queue size(164/2048)
17:18:21.296 -> MQTT: Client task activated - state 12
17:18:21.461 -> TCP: Reconnect to 10.0.0.2:1883
17:18:21.461 -> MQTT: Client task activated - state 10
17:18:22.321 -> MQTT: queuing publish, length: 20, queue size(187/2048)
17:18:22.321 -> MQTT: Client task activated - state 10
17:18:24.306 -> MQTT: queuing publish, length: 21, queue size(209/2048)
[...]

I could extract the code to relevant minimum to be able to ship it to someone - if this is needed, let me know and I'll invest the work to do this.
I went through the uMQTT-code and from the debug info I find that in mqtt.c:
[...]
espconn_regist_connectcb(mqttClient->pCon, mqtt_tcpclient_connect_cb);
espconn_regist_reconcb(mqttClient->pCon, mqtt_tcpclient_recon_cb);
[...]
the first callback seems never being called as
MQTT_INFO("MQTT: Connected to broker %s:%d\r\n"
is not seen in the bad case log above - seen in the good case, though.

Interesting is that the second callback registered to espconn is being called on reconnection:
MQTT_INFO("TCP: Reconnect to %s:%d\r\n"
is seen in the log above (bad case).

Any support would be much appreciated!

thanx in advance, pottendo

PS: I've tried several WiFi setups (e.g. AutoConnect) - now I'm back to hardwire the IP adresses - tried also with DHCP but no difference.
I also move the project to arduino 1.8.13 IDE - to rule out some issue with platformIO (my main dev. platform). There I've tried variants of LwIP - which made no difference. I couldn't enable the CORE platform debug flags, as the code won't build then (some missing snprintf(..)).

My code looks like this (If 'TEST' or 'MOBILE' is enable, connection fails, with 'EXT' it works)"

#include <ESP8266WiFi.h>
#include "defs.h"
//#define MOBILE
#define TEST
//#define EXT

void setup_wifi(void)
{
//WiFi.setPhyMode(WIFI_PHY_MODE_11N);
WiFi.mode(WIFI_STA);

#ifdef MOBILE
IPAddress ip(192, 168, 43, 67);
IPAddress gw(192, 168, 43, 1);
IPAddress dns(192, 168, 43, 1);
IPAddress subnet(255, 255, 255, 0);
WiFi.config(ip, gw, subnet, dns);
WiFi.begin("M2", "XXXXX");
#endif
#ifdef TEST
IPAddress ip(10,0,0,2);
IPAddress gw(10,0,0,138);
IPAddress dns(10,0,0,138);
IPAddress subnet(255, 255, 255, 0);
WiFi.config(ip, gw, subnet, dns);
WiFi.begin("T", "XXXXX");
#endif
#ifdef EXT
IPAddress ip(192, 168, 188, 34);
IPAddress gw(192, 168, 188, 1);
IPAddress dns(192, 168, 188, 1);
IPAddress subnet(255, 255, 255, 0);
WiFi.config(ip, gw, subnet, dns);
WiFi.begin("EXT", "XXXX");
#endif
while (!WiFi.isConnected())
{
log_msg("Wifi connecting...");
delay(500);
}
log_msg("Setting hostname " + hostname);
while (!MDNS.begin(hostname.c_str()))
{
log_msg("DNS setup failed.");
}
printf("MDNS add Service: %d\n", MDNS.addService("mqtt", "tcp", 1883));

printf("fcce IP = %s, GW = %s, DNS = %s\n",
       WiFi.localIP().toString().c_str(),
       WiFi.gatewayIP().toString().c_str(),
       WiFi.dnsIP().toString().c_str());
WiFi.printDiag(Serial);

delay(2000);
mqtt_broker.init();
delay(2000);

}

and the client connects then like this on the same esp8266:

#define MQTT_SERVER WiFi.localIP().toString().c_str()
void setup_mqtt(void)
{
mqtt_client = new MQTT{my_clientID.c_str(), MQTT_SERVER, 1883};
// setup callbacks
mqtt_client->onConnected(myConnectedCb);
mqtt_client->onDisconnected(myDisconnectedCb);
mqtt_client->onPublished(myPublishedCb);
mqtt_client->onData(myDataCb);
log_msg("connect mqtt client...");
delay(20);
mqtt_client->connect();
delay(20);
mqtt_client->subscribe("fcce/config");
mqtt_client->subscribe("fcc/#");
log_msg("mqtt setup done");
mqtt_publish("/sensor-alive", "Formicula embedded starting...");
fcc_last_seen = millis(); /* give fcc 10min to connect, otherwise restart */
}

@pottendo
Copy link
Author

pottendo commented Jan 3, 2021

hi,
having spent some hours convincing my setup to get all debug-logs out of lwip, I conclude that the problem is (not totally surprising) the missing 'loopback' support of the lwip stack. Strangely there seems to be some code inside lwip to support loopback devices (which is deactivated by #defines), but even if activated in the lwip code, the netif for 'lo' is not registered by default.
Calling the respective routine (netif_init()) ultimately results in crashes. So I suspect that the code is either incomplete, broken otherwise or I just used it wrongly.
I didn't check lwip2, as there's the 5 connection limit - a show stopper for me.

I have no clue, why it works on my Fritzbox featured WiFi. I found a dependency to mDNS. If this is not used in my esp-code, it even doesn't work on this Fritzbox WiFi. With my limited understanding, I guess (which my be totally wrong) that the mDNS/Service register broadcasts trigger some magic in my router, which helps my esp8266 finding itself on the network.

So I give up for now, move to another client (with full TLS support) and need to setup a separate broker - experimenting with a public now (myqtthub).

I'd be happy to support here to make it work - however, this may need some profound understanding of the inner workings of IP (lwip). Another solution could be to have 'bypass' lwip in the uMQTT client directly into the local uMQTT broker, if a loopback scenario is somehow detected. This could be a rather slim change by 'just' calling some callbacks, expected to be called from the IP stack? @martin-ger, what do you think?

Anyway, I appreciate all the work here very much. Thanks to @martin-ger and the community. Let me know if I can help to test/debug or even contribute...
pottendo

@pottendo
Copy link
Author

pottendo commented Jan 4, 2021

hi again,
I've revisited the uMQTTBroker examples and found the direct API calls of the broker to subscribe/publish.
So my suggestion from above to have a 'shortcut' is already there - not in the shipped mqtt-client but still fine for my use case.

Now my architecture works as expected - even more stable, as it seems, compared to the usage of the mqtt client; but this needs to be confirmed; I have some tests running now.
Spending the time to investigate the (finally non-)problem was not useless for me. Learned alot about the arduino platform, platform IO and mqtt.

bye, pottendo

@pottendo pottendo closed this as completed Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant