Attempting MQTT connection...failed, rc=-2 try again in 5 seconds #13
Comments
Thanks for the detailed description, it makes identifying potential
problems a lot easier.
My first guess is that if you are getting interrupt errors on every
attempted packet, interrupts are not working. You should be able to confirm
this by just using the standard ncurses example instead of ncursesInt.
The most likely causes of interrupts not working are of course a: the pin
not being connected, b: the interrupt pin not being specified correctly in
the example, c: wiringPi not being installed (the RF24 lib uses wiringPi for
interrupt handling). wiringPi is not installed by default on Raspberry Pi OS Lite.
In the mqtt client examples, you will notice the line *char clientID[] =
{"arduinoClient "};* which specifies a unique identifier for the
client.
If two devices are using the same ID they will be constantly disconnected
from MQTT.
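A minimal sketch of one way to keep client IDs unique, assuming each node already has a distinct mesh nodeID. makeClientId is a hypothetical helper, not part of PubSubClient or RF24Ethernet, and the three-digit suffix is just one possible convention.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds an MQTT client ID that differs per node by
// appending the node's ID as a zero-padded, three-digit suffix.
std::string makeClientId(const char* prefix, std::uint8_t nodeId) {
    assert(prefix != nullptr);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%s%03u", prefix,
                  static_cast<unsigned>(nodeId));
    return std::string(buf);
}
```

The resulting string would be used wherever the example currently uses clientID, so two nodes flashed with the same sketch but different node IDs no longer collide on the broker.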
The BCM2835 driver was the previous default for RF24, so you may want to
try with that as well. You will need to recompile all libraries and restart
SPI/reboot after changing the driver.
If you try the following:
1. Stop your RF24Gateway master
2. Delete the dhcplist.txt file from the RF24Gateway example directory you
are using as master.
3. Restart your Arduino node(s) then the gateway
4. Do they show up in the address list relatively quickly?
5. Can you ping them from the gateway?
If they don't show up in the list, then I would suggest troubleshooting
basic functionality with RF24 core examples.
If they show up in the list, but can't connect to MQTT, can you ping them?
If not able to ping:
I would run an RF24Mesh example on the Arduino and see if the Gateway (no
interrupt version) shows user packets being received.
If it gets RF24Mesh user packets, then RF24 and RF24Mesh are working fine,
and it might be an MQTT or Gateway issue.
If no user packets, again troubleshoot/test with RF24 core examples.
hope this helps, let me know,
TMRh20
…On Wed, Aug 26, 2020 at 11:00 AM SimonMerrett ***@***.***> wrote:
Hi, I realise I'm clutching at straws here but I'm starting to run out of
options. Before the recent releases in July and August I set up a link from
an Arduino Nano to mqtt2zigbee running on a Pi 4 with Raspbian desktop. I
recently tried to port this over to an Arduino Nano and a Pi 3 A+ on
Raspbian light. I used the most recent mqtt2zigbee, mosquitto and couldn't
get things working with the ncurses, gatewayNode or ncursesInt to start
with. After updating the pubsubclient and RF24 etc libraries for the
Arduino, I can get intermittent connections but I get long periods of Attempting
MQTT connection...failed, rc=-2 try again in 5 seconds.
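For readers unsure what rc=-2 refers to: the value printed by the example is the PubSubClient state code. The sketch below reproduces the constants as I recall them from PubSubClient.h (worth checking against the installed version); describeState is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <string>

// Maps a PubSubClient state code to the constant name used in PubSubClient.h.
// The values are reproduced from that header as an assumption; verify locally.
std::string describeState(int rc) {
    switch (rc) {
        case -4: return "MQTT_CONNECTION_TIMEOUT";
        case -3: return "MQTT_CONNECTION_LOST";
        case -2: return "MQTT_CONNECT_FAILED";
        case -1: return "MQTT_DISCONNECTED";
        case 0:  return "MQTT_CONNECTED";
        case 1:  return "MQTT_CONNECT_BAD_PROTOCOL";
        case 2:  return "MQTT_CONNECT_BAD_CLIENT_ID";
        case 3:  return "MQTT_CONNECT_UNAVAILABLE";
        case 4:  return "MQTT_CONNECT_BAD_CREDENTIALS";
        case 5:  return "MQTT_CONNECT_UNAUTHORIZED";
        default: return "unknown";
    }
}
```

On that reading, -2 means the underlying network connection to the broker could not be opened, which points at the radio/IP path rather than at MQTT credentials or the client ID.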
I have tried going back to the original setup with the Pi 4 and the Nano
but the Nano is now working from the latest RF24 etc version so it isn't
exactly the same (same intermittent connection dropping and failure to
reconnect). I have interrupt errors on the ncursesInt example for every
attempted packet and this is using the same hardware connections that were
working when testing my last issue (ie no changes). I did go through the
example and explicitly made sure interrupts were turned back on with
gw.interrupts(1) rather than gw.interrupts() - like I said, running out of
things to check/tweak.
I notice that the installer now gives options for Pi SPI drivers which
weren't there before. I selected SPIdev for compatibility on the A+ but I
don't know if that's what would have been installed by default on the
pre-Jul update version on the Pi 4.
I acknowledge that this could be a problem with mosquitto or pubsubclient
but given the raft of recent changes to RF24, I thought it would at least
be worth asking if there are any thoughts on this. Even confirming that the
process followed in the guide (
https://tmrh20.blogspot.com/2019/05/automationiot-with-nrf24l01-and-mqtt.html)
still results in reliable performance would be really helpful. I set up
mosquitto local service following
https://randomnerdtutorials.com/how-to-install-mosquitto-broker-on-raspberry-pi/
.
Not sure what else to include/mention at this stage because it is such a
tricky issue to localise. Please shout for more detail that might be
helpful.
Brilliant pointers - thanks. I will probably start with a fresh install of Raspberry Pi OS with desktop on the A+ as it will make setup and reinstalling everything faster. I'm definitely going to try the original SPI drivers too. Then I can run through the checklist. I have been deleting the dhcplist.txt files every now and then. Sometimes the node is quickly detected, other times not. Oh yes, I have also set the Pi examples and Arduino code to radio.setPALevel(RF24_PA_MIN) as the antennas are only about a metre apart for testing.
Hmm, I'm running 15 nodes right now, 9 mqtt clients publishing every 1 to 3
seconds, 2 clients running LEDs, 1 http client, 1 server, 2 RPi pinging the
master, so not sure why it would be so spotty with 1 node.
Have you tried a different radio channel? Usually 1 and 50 are pretty open
in my area.
Master, channel 50:
uint8_t nodeID=0;
gw.begin(nodeID,50);
Nodes, channel 50:
mesh.begin(50);
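To make the channel choice a bit more concrete, here is a rough sketch that converts an nRF24L01+ channel number to its carrier frequency (2400 MHz plus the channel number) and checks it against a 2.4 GHz Wi-Fi channel. The function names are hypothetical, and the 22 MHz Wi-Fi channel width is an assumption; real occupancy depends on the local networks.

```cpp
#include <cassert>

// nRF24L01+ carrier frequency in MHz for RF channel 0-125: 2400 + channel.
int nrf24FrequencyMHz(int channel) {
    assert(channel >= 0 && channel <= 125);
    return 2400 + channel;
}

// Centre frequency in MHz of 2.4 GHz Wi-Fi channels 1-13: 2407 + 5 * n.
int wifiCentreMHz(int wifiChannel) {
    assert(wifiChannel >= 1 && wifiChannel <= 13);
    return 2407 + 5 * wifiChannel;
}

// True if the nRF24 carrier lies within +/- halfWidthMHz of the Wi-Fi centre.
// The default half-width of 11 MHz (22 MHz channel) is an assumption.
bool overlapsWifi(int nrfChannel, int wifiChannel, int halfWidthMHz = 11) {
    const int f = nrf24FrequencyMHz(nrfChannel);
    const int c = wifiCentreMHz(wifiChannel);
    return f >= c - halfWidthMHz && f <= c + halfWidthMHz;
}
```

Under these assumptions channel 50 sits at 2450 MHz, in the gap between Wi-Fi channels 6 and 11, which may be part of why it tends to be quiet.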
…On Wed, Aug 26, 2020 at 11:35 AM SimonMerrett ***@***.***> wrote:
***@***.***:~ $ ping 10.10.3.5
PING 10.10.3.5 (10.10.3.5) 56(84) bytes of data.
64 bytes from 10.10.3.5: icmp_seq=1 ttl=64 time=33.3 ms
64 bytes from 10.10.3.5: icmp_seq=2 ttl=64 time=30.1 ms
64 bytes from 10.10.3.5: icmp_seq=3 ttl=64 time=44.4 ms
^C
--- 10.10.3.5 ping statistics ---
36 packets transmitted, 3 received, 91.6667% packet loss, time 332ms
rtt min/avg/max/mdev = 30.075/35.936/44.395/6.131 ms
***@***.***:~ $ ping 10.10.3.5
PING 10.10.3.5 (10.10.3.5) 56(84) bytes of data.
64 bytes from 10.10.3.5: icmp_seq=9 ttl=64 time=1022 ms
64 bytes from 10.10.3.5: icmp_seq=42 ttl=64 time=29.7 ms
64 bytes from 10.10.3.5: icmp_seq=44 ttl=64 time=28.9 ms
64 bytes from 10.10.3.5: icmp_seq=46 ttl=64 time=23.3 ms
64 bytes from 10.10.3.5: icmp_seq=52 ttl=64 time=24.1 ms
64 bytes from 10.10.3.5: icmp_seq=54 ttl=64 time=14.0 ms
64 bytes from 10.10.3.5: icmp_seq=56 ttl=64 time=20.1 ms
64 bytes from 10.10.3.5: icmp_seq=57 ttl=64 time=36.2 ms
64 bytes from 10.10.3.5: icmp_seq=58 ttl=64 time=34.7 ms
64 bytes from 10.10.3.5: icmp_seq=60 ttl=64 time=24.4 ms
64 bytes from 10.10.3.5: icmp_seq=61 ttl=64 time=25.5 ms
64 bytes from 10.10.3.5: icmp_seq=62 ttl=64 time=22.2 ms
64 bytes from 10.10.3.5: icmp_seq=63 ttl=64 time=20.5 ms
64 bytes from 10.10.3.5: icmp_seq=64 ttl=64 time=22.1 ms
64 bytes from 10.10.3.5: icmp_seq=66 ttl=64 time=29.3 ms
64 bytes from 10.10.3.5: icmp_seq=68 ttl=64 time=29.7 ms
64 bytes from 10.10.3.5: icmp_seq=71 ttl=64 time=15.8 ms
^C
--- 10.10.3.5 ping statistics ---
71 packets transmitted, 17 received, 76.0563% packet loss, time 644ms
rtt min/avg/max/mdev = 14.027/83.679/1022.149/234.688 ms
for an example of what happens to ping as the node stops being connected.
BTW, the decent ping times in the second run are still while the Arduino
Zero is failing mqtt connection. While I have been typing this message it
has reconnected! Just so sporadic.
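As a quick check of the figures in the ping summaries above, the loss percentage is simply the share of transmitted packets that never came back. A small sketch (packetLossPercent is a hypothetical helper):

```cpp
#include <cassert>

// Percentage of transmitted packets for which no reply was received.
double packetLossPercent(int transmitted, int received) {
    assert(transmitted > 0 && received >= 0 && received <= transmitted);
    return 100.0 * (transmitted - received) / transmitted;
}
```

With 36 sent and 3 received this gives roughly 91.67%, and with 71 sent and 17 received roughly 76.06%, matching the two runs.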
A couple of things:
a: Make sure to update all calls to mesh.begin(50); including lower down in
the gateway master example, else it may restart on the wrong channel
b: Regarding the failure handling part where the mesh is restarted, I
believe there is just a problem with the millis() functionality on RPi where
it is not consistent and/or jumps around in value, causing false detections
of errors
c: The aforementioned issue appears to be more prevalent at 250Kbps
d: I am/will be looking at ways to remedy the situation.
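For illustration only, and not the fix that ended up in the library: one common way to avoid a jumping millisecond counter on Linux is to derive it from a monotonic clock and compare timestamps with wrap-safe unsigned arithmetic. The sketch assumes the jumps come from a clock that can be adjusted while the program runs; monotonicMillis and hasElapsed are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Milliseconds since the first call, taken from a clock that never goes
// backwards (unlike wall-clock time, which can be adjusted at runtime).
std::uint32_t monotonicMillis() {
    using namespace std::chrono;
    static const steady_clock::time_point start = steady_clock::now();
    const auto elapsed = duration_cast<milliseconds>(steady_clock::now() - start);
    return static_cast<std::uint32_t>(elapsed.count());
}

// Wrap-safe timeout test: unsigned subtraction stays correct even when the
// 32-bit counter rolls over between 'since' and 'now'.
bool hasElapsed(std::uint32_t now, std::uint32_t since, std::uint32_t timeoutMs) {
    return static_cast<std::uint32_t>(now - since) >= timeoutMs;
}
```

A failure detector built on hasElapsed(monotonicMillis(), lastSeen, timeout) would then only fire when the timeout has genuinely passed.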
…On Thu, Aug 27, 2020 at 3:26 PM SimonMerrett ***@***.***> wrote:
Just a quick update. I changed to channel 50 without doing any RF survey
and moved the Arduino Zero node away from the desktop with laptop, raspi,
mobile phone etc. There seemed to be a significant improvement in "uptime"
but I don't have the serial monitor available away from the desktop and am
only able to monitor with RF24Gateway_ncurses packet flow. There was a
point where the gateway wasn't receiving anything and a power cycle on the
Zero didn't help - it took ctrl+c and restarting RF24Gateway_ncurses but
it did get going again. I will try and do better testing soon. I'm
considering going to RF24_250KBPS to see if that helps.
Brilliant tips again - I will check if I'm restarting on the correct channel.
So I changed the failure detection section in [...]. Edit: I rechecked 'failLog.txt' and there was a [...], and I also wondered about the commented-out delay here https://github.com/nRF24/RF24Mesh/blob/fce52350416070d3fae40a866119010fad631cc8/RF24Mesh.cpp#L17 as I'm using SPIDEV.
So I left it again and after about two hours the connection dropped again. I left it 40 mins and there was no reconnection. I have created a gist with my node code here, in case you can see something silly I have done in there. After deleting the [...]
Not sure how this works without causing problems, because radio.begin() is not called until mesh.begin();
It is difficult to identify issues just looking at code, so I would suggest running the default mqtt example on another device if possible, with the only changes made being ones required to make it operational: pin #s, IP address.
RF24Network_config.h:
Arduino Sketch, AVR devices need printf enabled for debug output:
You said you are using a Nano? In the debug output, there should be a bunch of these during renewal:
...
That is mainly what you should see, along with some type 198 messages (lookups) and partial printouts of the IP data.
How about adding at top of sketch?
#define printf Serial.printf
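In case the board core in use does not provide Serial.printf, a fallback along the following lines may work: format into a fixed buffer with vsnprintf and hand the result to whatever print call is available. This is only a sketch; debugPrintf and debugSink are hypothetical names, and the std::string sink stands in for the serial port so the idea can be tried on a PC.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Stand-in for the serial port; on a board the text would go to Serial.print().
std::string debugSink;

// printf-style wrapper: formats into a fixed buffer, then forwards the text.
// Output longer than the buffer is truncated rather than overrunning it.
int debugPrintf(const char* fmt, ...) {
    char buf[128];
    va_list args;
    va_start(args, fmt);
    const int n = std::vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    if (n < 0) {
        return n;  // formatting error: nothing is forwarded
    }
    debugSink += buf;  // on Arduino: Serial.print(buf);
    return n;
}
```

The sketch could then map printf onto this wrapper instead of onto Serial.printf.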
… On Aug 28, 2020, at 10:52 AM, SimonMerrett ***@***.***> wrote:
Oof, really struggling here. I was using an arduino Zero but I can't see a way to enable printf() for SAMD series* so I switched across to a Nano and I can't even get as far as I was with the Zero! The debug output is in the gist but it isn't representative of the issues I was observing with the Zero.
*Tried the libprintf to no avail
No, that suggests problems with your code (memory issues: buffer overruns
etc).
Again, test with a known working example subscribed to the same topic...
As a developer, I really don't have time to provide technical support,
debugging other people's code etc; it just doesn't make sense and isn't possible.
Users need to test with known working code if finding issues.
…On Fri, Aug 28, 2020 at 12:10 PM SimonMerrett ***@***.***> wrote:
[image: image]
<https://user-images.githubusercontent.com/26767525/91600906-a30ea680-e960-11ea-903d-b137596f357a.png>
Well I was just about to try that when this happened! I can't even
copy/paste it because Win10 clipboard is refusing to capture it properly.
So do I need to change the buffer size in #define MAX_PAYLOAD_SIZE 144 in
RF24NetworkConfig.h?
Sorry - as I said at the beginning I was clutching at straws and I'm going to use the example and build back up. I agree it's not your job to debug my code - sorry and thanks for everything you've done.
No need to apologise, just explaining my point of view. Development and
ongoing maintenance take a huge hit due to all the requests for
programming assistance etc, and many have nothing to do with RF24, so it's
just a matter of limiting things. Like I said, if you can identify issues
using the examples etc, please open another issue.
…On Fri, Aug 28, 2020 at 12:51 PM SimonMerrett ***@***.***> wrote:
Closed #13 <#13>.
I just noticed the removal of [...]
Excellent. Updates are a double-edged sword sometimes, fixing one issue and causing another.
More tests over the last couple of days suggest that I was also suffering from a poor RF link, as a more favourable position for RF propagation and reception has also improved stability (still testing with RF24_PA_MIN - moved after mesh.begin()).
I'm not sure about what you mean by mirroring that on the gateway. How would the gateway know when to adjust the power level? I keep my nodes at the max PA level. One of my nodes dropped into a state where it could renew its address and verify connectivity, but not connect to mqtt. A quick reset brought it back up, so there could be some issue either in RF24Ethernet or mqtt. I may try a few things with the default examples I'm using as time allows, maybe a different mqtt client too just to be sure.
The gateway absolutely wouldn't know the power level, but there may be a scheme which allows you to ensure that at some point all combinations of power are tried at both ends. The wrong but easiest analogy is that a 12hr clock that is slow will eventually be in sync with an accurate clock because they're cycling at different speeds. I really don't know if it's possible, but if it were, I imagine it would employ a similar scheme. If the gateway were expecting a known node to always be connected, it could ramp through power levels at rate A for a window of time since the known node was last connected. Meanwhile the known node could ramp through power levels at rate B as it tried to reconnect. Anyway I'm dragging this off topic.
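Purely as a thought experiment (neither RF24Gateway nor RF24Mesh does this), the two-rate idea can be simulated: each end steps through its power levels at its own period, and we count how long it takes until every pairing has occurred. The function name and the tick-based model are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Counts ticks until every (node level, gateway level) pair has been tried at
// least once, when the node advances one level every nodePeriod ticks and the
// gateway every gatewayPeriod ticks. Returns -1 if maxTicks is not enough.
int ticksToCoverAllPairs(int levels, int nodePeriod, int gatewayPeriod,
                         int maxTicks) {
    assert(levels > 0 && nodePeriod > 0 && gatewayPeriod > 0);
    std::set<std::pair<int, int>> seen;
    for (int t = 0; t < maxTicks; ++t) {
        const int nodeLevel = (t / nodePeriod) % levels;
        const int gatewayLevel = (t / gatewayPeriod) % levels;
        seen.insert({nodeLevel, gatewayLevel});
        if (static_cast<int>(seen.size()) == levels * levels) {
            return t + 1;
        }
    }
    return -1;
}
```

With the four RF24 power levels, a node stepping every tick and a gateway stepping every four ticks cover all 16 pairings in 16 ticks, whereas two ends stepping at the same rate from the same starting level never do.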
Interesting, trying not to get too far off topic, that would probably require some internal changes to RF24Gateway unless it was just done randomly or based off RF24Mesh/Network level messages.
Still trying to find time to fully investigate this issue. I've been testing with the https://github.com/256dpi/arduino-mqtt library and just filed an issue to get it working with IP address, so you need to download the library from GitHub, not the library manager, until a release is made to have it work out of the box. It's working nicely, so should help rule out/in issues between RF24Ethernet and pubsub library.
Thanks. I tried implementing an OLED display and buttons on my samd21-based device but the various calls to mqtt etc were taking so long it killed the responsiveness to the user. So I resorted to a samd21 attached to the Pi with an ethernet SPI adapter and a 5cm ethernet cable between them, and basic rf24 messages from the gui samd21 and the rpi samd21. The project was terribly overrun so that was the easiest avenue available. But I do want to see where the delays may have originated. BTW, for integration with e.g. zigbee2mqtt where the messages can get quite long, being able to whack the pubsub payload buffer size up to 2048 bytes on the samd21 without fear of piling in on memory is very nice.
I've been using the alternate MQTT library for the past while on all of my nodes and have not seen any downtime on them. This of course indicates a problem between RF24Ethernet and the pubsub lib, although I'm not certain if it is the combination of them or just the mqtt library.
Thanks for testing it out with known-working radios. I will take a look at the alternate library.