Library won't reconnect. #18

Closed
skorokithakis opened this issue Jul 28, 2016 · 96 comments

@skorokithakis
Contributor

I have the library set to reconnect on disconnection, but that doesn't work, and I find the clients frequently disconnected even though they are on WiFi. Is there a setting I can use, or another way I can make them keep trying to reconnect?

@eos1d3

eos1d3 commented Jul 28, 2016

Confirmed: reconnection never works for me, and keepalive never works either. These two issues make the library unusable. Otherwise this library would be nice; it has everything.

I reported these issues a while ago. I hope there is a chance to fix them.

@skorokithakis
Contributor Author

Yeah, agreed. I'm thinking of going back to the synchronous library, but first I'll see if I can sidestep the issues by adding a call every 5 seconds to connect if not connected.
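The polling workaround mentioned above (try to connect every 5 seconds while the client is down) can be sketched in plain C++ without any MQTT library. `ReconnectPoller` and `shouldAttempt` are hypothetical names invented for this sketch; on an ESP8266 the timestamp would presumably come from `millis()` and the flag from the client's own connected-state query.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides when the main loop should try to connect again.
class ReconnectPoller {
public:
  explicit ReconnectPoller(uint32_t intervalMs) : intervalMs_(intervalMs) {
    assert(intervalMs > 0);  // a zero interval would retry on every loop pass
  }

  // Returns true when a connection attempt is due.
  // nowMs: current time in milliseconds.
  // connected: whether the client currently reports a live connection.
  bool shouldAttempt(uint32_t nowMs, bool connected) {
    if (connected) {
      return false;
    }
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    if (hasAttempted_ && (nowMs - lastAttemptMs_) < intervalMs_) {
      return false;
    }
    hasAttempted_ = true;
    lastAttemptMs_ = nowMs;
    return true;
  }

private:
  uint32_t intervalMs_;
  uint32_t lastAttemptMs_ = 0;
  bool hasAttempted_ = false;
};
```

In a sketch's `loop()`, the idea would be to call the client's connect function whenever `shouldAttempt(...)` returns true. This only helps when the client knows it is disconnected; it does nothing for a connection that has died silently.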

@eos1d3

eos1d3 commented Jul 28, 2016

I tested this with Adafruit IO and also Mosquitto. I remember that even when I publish every 60 seconds, there is eventually a disconnection that breaks the connection permanently until a restart.
The Adafruit IO keepalive is 5 minutes, so publishing every 60 seconds should be enough to keep the connection alive. But what I found is:

  1. Sometimes there is a disconnection after a random period of time.
  2. Sometimes the disconnection is never detected and the program keeps running as if the connection were still good, but nothing is actually published.
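The keepalive reasoning above can be checked with a little arithmetic. Under MQTT 3.1.1 the client has to send some control packet at least once per keepalive period, and the broker closes the connection after one and a half keepalive periods of silence. The helper names below are made up for illustration; the 300-second and 60-second figures are the ones quoted in this comment.

```cpp
#include <cassert>
#include <cstdint>

// Longest silence an MQTT 3.1.1 broker tolerates before dropping the client:
// one and a half times the keepalive the client announced in CONNECT.
uint32_t brokerGraceSeconds(uint32_t keepAliveSeconds) {
  assert(keepAliveSeconds <= 65535);  // keepalive is a 16-bit field on the wire
  return keepAliveSeconds + keepAliveSeconds / 2;
}

// True when regular publishes alone satisfy the keepalive contract.
// A keepalive of 0 turns the mechanism off, so any interval is acceptable.
bool publishesSatisfyKeepAlive(uint32_t publishIntervalSeconds,
                               uint32_t keepAliveSeconds) {
  if (keepAliveSeconds == 0) {
    return true;
  }
  return publishIntervalSeconds <= keepAliveSeconds;
}
```

With these numbers, `publishesSatisfyKeepAlive(60, 300)` holds and the broker's grace period is 450 seconds, so keepalive expiry alone would not explain the drops described here.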

@skorokithakis
Contributor Author

Ouch, the second kills it. I can deal with random disconnections if I have the script try to connect every 10-15 seconds, but if it doesn't even know it's disconnected, that's a no-go...

@NeilSpp

NeilSpp commented Jul 29, 2016

I have the same case: once disconnected, the connection stays down permanently until restart. I publish to the server every 60 seconds.

@marvinroger
Owner

You're right, the library is not stable, that is why it is not actually released as v1. I've never had this kind of issue however. So you experience this behavior with the example sketch, for example?

We will obviously try to solve these issues. Unfortunately I won't have any ESPs for the next 2 weeks, so I cannot do anything during that time.

@eos1d3

eos1d3 commented Jul 29, 2016

Right! I just tested the FullFeatured example with Adafruit IO and with Mosquitto on a PI3. Both show the keepalive and disconnection problems. Once disconnected, there is no way to reconnect until restart.

I am using a NodeMCU, which is actually an ESP12E. All related software and libraries are updated to the newest versions.

This library has good potential. Don't give up!

I can help if you find something to change. You can tell me what to try if you want.

I also tried to make the library force a reconnection by passing the variable inside the cpp file. It still does not connect.

@marvinroger
Owner

I won't give up, don't worry. 😉
Take a look at #1... This is weird! When I tested it, even with public brokers, it was very stable, and even reconnections were stable. However, I am not sure about the keep-alive.

Maybe an esp8266/Arduino or me-no-dev/ESPAsyncTCP update changed some things that we will need to figure out.

@eos1d3

eos1d3 commented Jul 29, 2016

For me, with the FullFeatured example, I will have two conditions:

  1. The example program detects a disconnection at random time intervals, sometimes after 60 seconds, sometimes after 5 minutes. Once it is detected, the library tries to reconnect, but it never gets connected. I tried publishing at a fixed interval; while it is disconnected, the publish never reports an error. It looks like the connection is good, but I see nothing on the broker.
  2. The example program never detects the disconnection when the connection is actually broken. I even tried waiting for some hours and it just cannot resume.

@skorokithakis
Contributor Author

skorokithakis commented Jul 29, 2016

You can see my usage here:

https://gitlab.com/stavros/sonoff/blob/08ac399becc01eb61666fdaa44c9daabd84d8e04/src/main.ino

I also experience disconnections that never reconnect. I have a hunch that onDisconnect fires, onConnect is called, and, if that fails, then the library never retries.
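The hunch above amounts to saying that a failed reconnect attempt has to re-arm the retry, otherwise the client stalls after the first failure. A minimal model of that behaviour, not taken from the library, is sketched below. `ScriptedClient` and `connectWithRetry` are hypothetical names, and the scripted outcomes stand in for a real network.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an async client. Each connect() attempt succeeds
// or fails according to a scripted list of outcomes.
class ScriptedClient {
public:
  explicit ScriptedClient(std::vector<bool> outcomes)
      : outcomes_(std::move(outcomes)) {}

  void onConnect(std::function<void()> cb) { onConnect_ = std::move(cb); }
  void onDisconnect(std::function<void()> cb) { onDisconnect_ = std::move(cb); }

  // Runs one attempt and fires the matching callback if one is registered.
  void connect() {
    assert(next_ < outcomes_.size());  // the script must cover every attempt
    connected_ = outcomes_[next_++];
    if (connected_) {
      if (onConnect_) onConnect_();
    } else {
      if (onDisconnect_) onDisconnect_();
    }
  }

  bool connected() const { return connected_; }
  std::size_t attempts() const { return next_; }

private:
  std::vector<bool> outcomes_;
  std::size_t next_ = 0;
  bool connected_ = false;
  std::function<void()> onConnect_;
  std::function<void()> onDisconnect_;
};

// Keeps trying until an attempt succeeds. The disconnect handler re-arms the
// retry flag on every failure, including a failed reconnect.
std::size_t connectWithRetry(ScriptedClient& client) {
  bool retryPending = false;
  client.onConnect([&retryPending]() { retryPending = false; });
  client.onDisconnect([&retryPending]() { retryPending = true; });
  client.connect();
  while (retryPending) {
    client.connect();
  }
  // Drop the handlers so they cannot outlive the local flag they reference.
  client.onConnect(nullptr);
  client.onDisconnect(nullptr);
  return client.attempts();
}
```

On real hardware the retry would more likely be deferred through a timer than run in a tight loop; the loop here only keeps the model short.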

@marvinroger
Owner

Can you test #19?

@kiralikbeyin

Will there be a fix very soon? Otherwise I am planning to go with https://github.com/i-n-g-o/esp-mqtt-arduino

@marvinroger
Owner

The thing is I am not able to reproduce... I've been running a sample sketch for 72 hours, publishing at different rates, and no disconnection so far. So I am kind of stuck!

@marvinroger
Owner

When I restart my mosquitto instance, it reconnects just fine. I am pretty sure of the answer, but you're all using a modern mosquitto broker, right? I am using v1.4.9.

@skorokithakis
Contributor Author

I'm using 1.4.8 :/ Too bad you can't reproduce this, it happened pretty regularly for me (disconnection after a few days, without reconnection).

@eos1d3

eos1d3 commented Aug 16, 2016

Right, I use the newest version of everything (Arduino IDE, ESP8266 library, ESPAsyncTCP, Mosquitto broker on PI3...).
And it can NEVER reconnect, not even once.

@kiralikbeyin

I'm using mosca; I think this is not related to the broker.

Maybe we all use ESPAsyncWebServer, except @marvinroger.

@eos1d3

eos1d3 commented Aug 16, 2016

It may not be an MQTT broker issue. The problem exists with both Adafruit IO and the Mosquitto broker.

@skorokithakis
Contributor Author

I switched the exact same code from async MQTT client to the normal, synchronous MQTT client and reconnections work fine.

@marvinroger
Owner

@kiralikbeyin said he was using ESPAsyncWebServer, are you @skorokithakis and @eos1d3? As it uses the same underlying ESPAsyncTCP, there might be a conflict or something.

@skorokithakis
Contributor Author

@kiralikbeyin

kiralikbeyin commented Aug 16, 2016

#include <ESP8266WiFi.h>
#include <ESP8266mDNS.h>
#include <ArduinoOTA.h>
#include <FS.h>
#include <Hash.h>
#include <ESPAsyncTCP.h>
#include <ESPAsyncWebServer.h>
#include <ESP8266HTTPClient.h>
#include <ESP8266httpUpdate.h>
#include <StreamString.h>
#include <TimeLib.h>
#include <NtpClientLib.h>

#include "AsyncJson.h"
#include "ArduinoJson.h"
#include <ESPAsyncWiFiManager.h>

Should be a DNS problem

@eos1d3

eos1d3 commented Aug 16, 2016

I use this: https://github.com/me-no-dev/ESPAsyncTCP

I did try two versions of the above in July, both have the same problem.

One thing I cannot confirm: when I tested this library for the first time (around the time you released it), I remember seeing a lot of disconnections and reconnections. Then, after various changes, it never reconnects at all.

@marvinroger
Owner

Are you all using hostnames instead of direct IP?

@skorokithakis
Contributor Author

I'm using a hostname, yes. You think resolution is failing?

@eos1d3

eos1d3 commented Aug 16, 2016

I use Mosquitto broker on PI3 at home with IP (192.168.2.xxx). It never works!

@kiralikbeyin

mqttClient.setServer(IPAddress(172, 20, 10, 7), 1883);
BUT
MDNS.begin(config.DeviceName.c_str());

@eos1d3

eos1d3 commented Aug 16, 2016

It can't be a DNS problem, as it can connect, publish and subscribe normally for a while and then die.
If it were DNS, it would not reach the broker at all.

@eos1d3

eos1d3 commented Aug 16, 2016

@marvinroger What is the date of your ESPAsyncTCP library?

@me-no-dev

@eos1d3 can you add a check here to see if maybe ipaddr is NULL?

@eos1d3

eos1d3 commented Aug 20, 2016

@me-no-dev Wait! Why is my version different from the above?

void AsyncClient::_s_dns_found(const char *name, ip_addr_t *ipaddr, void *arg){
  AsyncClient* c = reinterpret_cast<AsyncClient*>(arg);
  IPAddress remote_addr;
  remote_addr = ipaddr->addr;
  c->connect(remote_addr, c->_connect_port);
}

And it starts on line 359.

@me-no-dev

i pushed some commits since, but nothing important :)

@eos1d3

eos1d3 commented Aug 20, 2016

Waiting for exception, it takes time.

@eos1d3

eos1d3 commented Aug 20, 2016

I changed this for NULL checking.

void AsyncClient::_s_dns_found(const char *name, ip_addr_t *ipaddr, void *arg){
  AsyncClient* c = reinterpret_cast<AsyncClient*>(arg);
  IPAddress remote_addr;
  if (ipaddr == NULL)
      Serial.println("------- ipaddr is NULL --------------");
  else
      Serial.println("------- ipaddr is not NULL --------------");
  remote_addr = ipaddr->addr;
  c->connect(remote_addr, c->_connect_port);
}

It has been a while and I still have not seen any exception.
Is there any chance of a timing issue? (Does it need to wait a while to get ipaddr?)

@me-no-dev

can't be timing :) this function is called by lwip when there is a result. Interesting though... I hope you get the same exception so we can figure out what is going on.

@eos1d3

eos1d3 commented Aug 20, 2016

It was easy to get exceptions before. But now I still can't get one.

@eos1d3

eos1d3 commented Aug 20, 2016

Finally got it!

** Disconnected from the broker **
Reconnecting to MQTT...
------- ipaddr is NULL --------------

Exception (28):
epc1=0x4020477b epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys 
sp: 3ffffd50 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffef0:  3ffeb438 00000000 3ffee9e4 4020477b  
3fffff00:  402261e8 00000000 3ffe8c50 00000000  
3fffff10:  3ffeec38 3fff0230 3ffefff0 40223a04  
3fffff20:  3fff0234 3fff0230 4e61563d 40223ad5  
3fffff30:  3fff0234 3fff0230 4e6137b8 40105e3c  
3fffff40:  40104fee 4e6174bd 60000600 40226391  
3fffff50:  40105e30 00000000 4e6174bd 00000000  
3fffff60:  4021be9c 3ffee3e8 3ffee410 60000600  
3fffff70:  fad17e6f 3ffee410 3ffee3e8 4021bea9  
3fffff80:  4021beee 3fffdab0 00000000 3fffdcb0  
3fffff90:  3ffee420 3fffdad0 3ffeed18 402056cf  
3fffffa0:  40000f49 40000f49 3fffdab0 40000f49  
<<<stack<<<

Decoding 16 results
0x4020477b: AsyncClient::_s_dns_found(char const*, ip_addr*, void*) at D:\Users\andy\Codes\Arduino\libraries\ESPAsyncTCP\src/ESPAsyncTCP.cpp line 510
0x4020477b: AsyncClient::_s_dns_found(char const*, ip_addr*, void*) at D:\Users\andy\Codes\Arduino\libraries\ESPAsyncTCP\src/ESPAsyncTCP.cpp line 510
0x402261e8: tcpip_tcp_timer at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/timers.c line 81
0x40223a04: dns_check_entry at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/dns.c line 680
0x40223ad5: dns_check_entries at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/dns.c line 727
:  (inlined by) dns_tmr at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/dns.c line 311
0x40105e3c: dns_timer at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/timers.c line 237
0x40104fee: wdt_feed at ?? line ?
0x40226391: sys_check_timeouts at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/timers.c line 420
0x40105e30: dns_timer at /Users/igrokhotkov/espressif/arduino/tools/sdk/lwip/src/core/timers.c line 233
0x4021be9c: ets_timer_handler_isr at ?? line ?
0x4021bea9: ets_timer_handler_isr at ?? line ?
0x4021beee: ets_timer_handler_isr at ?? line ?
0x402056cf: loop_task at C:\Users\andy\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.3.0\cores\esp8266/core_esp8266_main.cpp line 56

@me-no-dev

what do you guys think is the best way to handle this case? call onDisconnect?

@eos1d3

eos1d3 commented Aug 20, 2016

Please explain the cause to me. Is the problem caused by onDisconnect being called twice?

@me-no-dev

please pull the latest version of ESPAsyncTCP. It will call onError and onDisconnect in the case where DNS returns an empty result (could not resolve the host).
It can happen for many reasons, but probably the packet did not reach the DNS server or the ESP did not get the response.
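The handling described above (treat an empty DNS result as an error followed by a disconnect, instead of dereferencing the pointer) can be modelled in standalone C++. This is not the actual ESPAsyncTCP change: `FakeIpAddr`, `FakeClient`, `reportError`, `reportDisconnect` and `onDnsFound` are all hypothetical stand-ins, and only the shape of the null check follows the thread.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for lwIP's ip_addr_t; only the addr field is modelled.
struct FakeIpAddr {
  uint32_t addr;
};

// Hypothetical stand-in for the async TCP client.
struct FakeClient {
  uint32_t connectedTo = 0;
  int errors = 0;
  int disconnects = 0;

  void connect(uint32_t addr) { connectedTo = addr; }
  void reportError() { ++errors; }
  void reportDisconnect() { ++disconnects; }
};

// DNS result callback. ipaddr is null when the host could not be resolved.
void onDnsFound(const char* name, const FakeIpAddr* ipaddr, void* arg) {
  (void)name;  // unused in this sketch
  assert(arg != nullptr);
  FakeClient* c = static_cast<FakeClient*>(arg);
  if (ipaddr == nullptr) {
    // Resolution failed: surface it through the normal failure path so the
    // application's reconnect logic runs, and never touch the null pointer.
    c->reportError();
    c->reportDisconnect();
    return;
  }
  c->connect(ipaddr->addr);
}
```

Routing the failure through the same path as any other failed connection means an application that already reconnects on disconnect needs no extra handling for DNS errors.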

@marvinroger
Owner

I think onDisconnect is misused. It is called in case the connection failed (so you're not actually disconnected because you weren't actually connected). I would call this an error.

@me-no-dev

many users might not attach to onError as it always leads to disconnect afterwards. I'm worried that they might miss the failure.

@marvinroger
Owner

Makes sense. We can work around this anyway, at the application level. ;)

Alright, let's wait for @eos1d3's feedback, so we can close both issues here and on ESPAsyncTCP!

@eos1d3

eos1d3 commented Aug 20, 2016

Just tell me what you want me to do.
I still have time to test it.

@me-no-dev

not get exception :)

@eos1d3

eos1d3 commented Aug 20, 2016

@marvinroger Will you update your example so that we can confirm it works? That would save time when testing again with the example.

@marvinroger
Owner

There's nothing to change, as @me-no-dev handled a DNS error the same way a normal failure would be handled, so it would reconnect anyway. Nothing needs to change in the example :) Just pull the latest ESPAsyncTCP git and check that there are no exceptions anymore.

@eos1d3

eos1d3 commented Aug 20, 2016

OK. I will test it for at least one day. I will report back if I find an error.

BTW, is there any timeout setting in ESPAsyncTCP? And what is the default timeout?

@me-no-dev

there is a 5-second timeout between the time you send data and the time you get an ACK from the other side. It is used to detect a lost connection and calls onTimeout.

@eos1d3

eos1d3 commented Aug 20, 2016

When a proxy or VPN is used, 5 seconds may not be long enough. And in some countries, like China, connections to servers abroad are very slow. I suggest adding an option for a custom timeout setting.

@me-no-dev

@me-no-dev

ah nevermind... i see that I use the defined value here: https://github.com/me-no-dev/ESPAsyncTCP/blob/master/src/ESPAsyncTCP.cpp#L309

@me-no-dev

done: me-no-dev/ESPAsyncTCP@5987225

@eos1d3

eos1d3 commented Aug 21, 2016

It has been running for 15 hours without any error, with disconnections at 60-second intervals. This issue can be closed.

Great work! Thanks!

@me-no-dev Is ASYNC_MAX_ACK_TIME applied to both RX and TX operations?

@marvinroger
Owner

Awesome, kudos @me-no-dev :)

@me-no-dev

@eos1d3 you wait for ack only when you are sending a packet. RxTimeout deals with time since last incoming transmission (in cases where you want to close a connection if nothing has been received for some time)
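The difference between the two timers described above can be modelled in a few lines: the ACK timer runs only while sent data is unacknowledged, and the receive timer measures idle time since the last incoming data. `ConnectionTimers` and its methods are hypothetical names for this sketch and say nothing about how ESPAsyncTCP implements them. The 5000 ms value in the usage below mirrors the default mentioned in the thread, while the 30000 ms idle limit is an arbitrary example.

```cpp
#include <cassert>
#include <cstdint>

enum class TimeoutKind { None, Ack, Rx };

// Hypothetical model of the two timers; all times are in milliseconds.
class ConnectionTimers {
public:
  ConnectionTimers(uint32_t ackTimeoutMs, uint32_t rxTimeoutMs)
      : ackTimeoutMs_(ackTimeoutMs), rxTimeoutMs_(rxTimeoutMs) {
    assert(ackTimeoutMs > 0);  // rxTimeoutMs == 0 disables the idle check
  }

  // Data was handed to the network; start waiting for its ACK unless an
  // earlier packet is still outstanding.
  void onSent(uint32_t nowMs) {
    if (!awaitingAck_) {
      awaitingAck_ = true;
      sentAtMs_ = nowMs;
    }
  }

  // The peer acknowledged everything sent so far.
  void onAck() { awaitingAck_ = false; }

  // Any incoming data resets the idle timer.
  void onReceived(uint32_t nowMs) { lastRxMs_ = nowMs; }

  // Reports which timer, if any, has expired at nowMs.
  TimeoutKind check(uint32_t nowMs) const {
    if (awaitingAck_ && (nowMs - sentAtMs_) >= ackTimeoutMs_) {
      return TimeoutKind::Ack;
    }
    if (rxTimeoutMs_ != 0 && (nowMs - lastRxMs_) >= rxTimeoutMs_) {
      return TimeoutKind::Rx;
    }
    return TimeoutKind::None;
  }

private:
  uint32_t ackTimeoutMs_;
  uint32_t rxTimeoutMs_;
  uint32_t sentAtMs_ = 0;
  uint32_t lastRxMs_ = 0;
  bool awaitingAck_ = false;
};
```

In this model a publish into a dead connection trips the ACK timer within `ackTimeoutMs`, which is the kind of check that catches the silent disconnection reported earlier in the thread. Making both limits constructor arguments reflects the request for a configurable timeout on slow links.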

@rmjjanssen1

rmjjanssen1 commented Oct 29, 2017

Hi There,

I seem to have the same issue with my setup when trying to connect to Adafruit IO: it keeps connecting, then disconnecting with error codes 0 and 6. Can somebody summarise how to fix the issue? Do I need to apply a fix to ESPAsyncTCP?

Thanks for helping out....

Using v1.0.0 of the ESPAsyncTCP library.

R
