
ThroughSerial over RS485 repeated packets (v11) #195

Closed
ibantxo opened this issue Apr 9, 2018 · 16 comments

Comments

@ibantxo
Contributor

ibantxo commented Apr 9, 2018

Hi,

My slaves (id 33 and id 44) are sending HEARTBEATs to a PJON gateway. In each heartbeat a slave sends its millis() time and the number of calls made to the pjon.send function. Slave 33 sends a heartbeat every 500 ms.

The gateway detects repeated messages.

Any suggestions to avoid the repeated info?

Many thanks in advance.

Iban

I am using sync ack:

SLAVE's config:

```cpp
//#define INCLUDE_ASYNC_ACK true
#define TS_RESPONSE_TIME_OUT 100000
#include <PJON.h>

Serial.begin(57600); //57600 115200
//PJON
pjon_bus.strategy.set_serial(&Serial);
// Avoid default sync ack
pjon_bus.set_synchronous_acknowledge(true);
pjon_bus.set_asynchronous_acknowledge(false);
// Set enable pins
pjon_bus.strategy.set_enable_RS485_pin(2);
pjon_bus.set_receiver(pjon_receiver_function);
pjon_bus.set_crc_32(true);
pjon_bus.set_error(error_handler);
//pjon_bus.set_id(77);
pjon_bus.begin();
```

GATEWAY's config:

```cpp
#define PJON_MAX_PACKETS 20 //default 5
#define TS_MAX_ATTEMPTS 20 //default
#define PJON_MAX_RECENT_PACKET_IDS 100000
#define PJON_INCLUDE_TS
#include <PJON.h>

//PJON
pjon_bus.strategy.set_serial(&Serial1);
// Avoid default sync ack
pjon_bus.set_synchronous_acknowledge(true);
pjon_bus.set_asynchronous_acknowledge(false);
// Set enable pins
pjon_bus.strategy.set_enable_RS485_pin(SET_ENABLE_RS485_PIN);
pjon_bus.set_receiver(pjon_receiver_function);

pjon_bus.set_crc_32(true);
pjon_bus.begin();
```

FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25509654 50894 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25510155 50895 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25510661 50896 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25511165 50897 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25511666 50898 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25512167 50899 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25512667 50900 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25513167 50901 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25513668 50902 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25514168 50903 | 10 0 0
FROM_GW_BETA FROM_PJON 44 HEARTBEAT 25516217 2551 | 0 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25514670 50904 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25514670 50904 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25514670 50904 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25515174 50905 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25515174 50905 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25515676 50906 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25515676 50906 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25516177 50907 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25516678 50908 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25517178 50909 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25517679 50910 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25518179 50911 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25518179 50911 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25518681 50912 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25519181 50913 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25519681 50914 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520182 50915 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520182 50915 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520182 50915 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520683 50916 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520683 50916 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520683 50916 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25521186 50917 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520683 50916 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520182 50915 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25521686 50918 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25521686 50918 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520182 50915 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25522187 50919 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25522187 50919 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25520182 50915 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25522688 50920 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25522688 50920 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25522688 50920 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25523188 50921 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25523689 50922 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25523689 50922 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25524190 50923 | 10 0 0
FROM_GW_BETA FROM_PJON 44 HEARTBEAT 25526217 2552 | 0 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25524690 50924 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25525191 50925 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25525691 50926 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25526192 50927 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25526692 50928 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25527193 50929 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25527693 50930 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25528194 50931 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25528694 50932 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25529195 50933 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25529195 50933 | 10 0 0
FROM_GW_BETA FROM_PJON 33 HEARTBEAT 25529697 50934 | 10 0 0

@gioblu
Owner

gioblu commented Apr 9, 2018

Ciao @ibantxo, that could be provoked by a non-optimal timing setup. If your application needs packet uniqueness and must avoid duplications, include the packet id feature: https://github.com/gioblu/PJON/blob/master/documentation/configuration.md#extended-configuration

Here is an example showing how to use the packet id feature:
https://github.com/gioblu/PJON/blob/master/examples/ARDUINO/Local/SoftwareBitBang/UsePacketId/Transmitter/Transmitter.ino
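For reference, a minimal sketch of enabling the feature, assuming the v11 API used elsewhere in this thread (the device id 45 and the ThroughSerial strategy are illustrative choices, not taken from the reporter's setup):

```cpp
// The packet id feature must be enabled at compile time, before the include
#define PJON_INCLUDE_PACKET_ID true
#define PJON_INCLUDE_TS
#include <PJON.h>

PJON<ThroughSerial> pjon_bus(45); // 45 is an arbitrary example device id

void setup() {
  Serial.begin(57600);
  pjon_bus.strategy.set_serial(&Serial);
  pjon_bus.set_packet_id(true); // tag packets so receivers can detect and drop duplicates
  pjon_bus.begin();
}
```

Note this costs extra RAM on the receiver to remember recent packet ids (see PJON_MAX_RECENT_PACKET_IDS in the gateway config above).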

@gioblu
Owner

gioblu commented Apr 9, 2018

@ibantxo these are the things I would try to reduce the amount of repeats:

  • TS_RESPONSE_TIME_OUT could be too short on the gateway, device 1 or device 2
  • in the gateway, call bus.receive() on both instances without any duration and see if that helps
  • in the devices, call bus.receive() without any duration and see if that helps
  • or, on the contrary, in the devices try passing a higher duration to bus.receive() and see if that helps
  • if the gateway runs a real-time operating system and is also printing the result, some of the repeats could be caused by the time the gateway loses printing the output or executing other processes
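The receive() variants above amount to changing how the polling loop is written; a sketch, with instance names and durations purely illustrative:

```cpp
void loop() {
  // Gateway variant: poll each bus instance once per loop, no blocking duration
  pjon_bus.receive();
  pjon_bus.update(); // dispatch any queued outgoing packets

  // Device variant: pass a duration in microseconds, so receive() keeps
  // polling the medium for that long before returning, e.g. 5 ms:
  // pjon_bus.receive(5000);
}
```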

@gioblu gioblu changed the title Repeated packets received (v11) ThroughSerial over RS485 repeated packets (v11) Apr 10, 2018
@gioblu
Owner

gioblu commented May 8, 2018

Ciao @ibantxo, how is it going? Have you solved this issue?

@ibantxo
Contributor Author

ibantxo commented May 9, 2018

Umm... I cannot include the packet id feature because of memory usage... lol...
The packet id feature makes the messages reach the destination, but:

  1. sometimes too late
  2. sometimes not in the correct order.

I have been improving the network load impedance. It is difficult to do because my TTL to RS485 transceivers have fixed resistors. They are designed for a "peer-to-peer" (2 peers) network configuration, not multipoint.

I have to improve the network impedance adjustments much more.

I hope to tell you something soon! lol

Many Thanks!!!!

@gioblu gioblu added unsettled and removed question labels May 14, 2018
@ibantxo
Contributor Author

ibantxo commented Aug 10, 2018

I am looking forward to testing ThroughAsyncSerial strategy!! lol
Many thanks to @sticilface, @gioblu and @everybody!!!

@ibantxo
Contributor Author

ibantxo commented Nov 18, 2018

😳 😃 I have just tested with v11.1 (TSA) and it is working now with 4 nodes.
One of them is the "main" node, which processes all the messages. The configuration is multi-master mode.
It is working better than with TS.
The 3 "secondary" nodes are each sending one "heartbeat" every 500 ms. No errors, but when the "main" node starts asking for information, some errors are reported by the secondary nodes.
Testing baudrate now: 57600.

Normal operation is not 1 heartbeat every 500 ms, but I am using that frequency to test the solution.

I have made some modifications to the MAX485 to TTL devices, because the resistors must differ per node in a 485 network: only the first node gets the pull-up and pull-down (biasing) resistors, the first and last nodes get the termination ("load") resistor, and the other nodes get nothing!

@ibantxo
Contributor Author

ibantxo commented Nov 19, 2018

After a whole night of testing, every "secondary" node has sent about 50000 messages to the "main" node.
Two of them reported about 200 messages with a connection_lost error and about 50 messages with a packets_buffer_full error. The third one had only 6 connection_lost errors.
I see a lot of repeated packets at the receiver.

I am moving to T=1s to test again.

@gioblu
Owner

gioblu commented Nov 19, 2018

Ciao @ibantxo, thank you very much for your feedback. I am also testing TS and TSA and, as you described, I experience higher performance while using TSA. In multi-master mode I see occasional re-transmissions too. The packets_buffer_full error you get shows that for some reason the first 2 devices are colliding continuously until the buffer fills. It may help to set a higher PJON_RESPONSE_TIME_OUT or a higher TS_COLLISION_DELAY.

If the mode of operation is arbitrary sending from nodes at a certain frequency, it may make sense not to use the acknowledgement procedure and let the application tolerate occasional missing packets; it would be nice to know the test results without the acknowledgement.

Consider that because of the serial medium's access-mode limitations (there is no way to effectively implement carrier sense before transmission), TS and TSA are designed to be used in master-slave mode with the request-response procedure, which does not require the acknowledgement and practically avoids any possible collision.

Because of the serial limitations stated above, in multi-master mode TS and TSA implement the slotted ALOHA medium access method, whose maximum data throughput is only 36.8% of the available bandwidth. The use of the acknowledgement increases the chances of collision, reducing the available bandwidth even more. So, TS and TSA in multi-master mode may be used without acknowledgement to implement a link similar to UDP: fast, but without reception certainty. Using a fast baud rate, containing the medium usage of each device and limiting the number of devices should yield stable and acceptable performance, although in most cases it is probably more efficient and safer to use master-slave mode.

@ibantxo
Contributor Author

ibantxo commented Nov 19, 2018

T=1s. Multi-master: 3 slaves speaking and one master listening (ack-ing). I am using this configuration on the slaves and on the master.

```cpp
pjon_bus.set_synchronous_acknowledge(true);
pjon_bus.set_asynchronous_acknowledge(false);
```

About 50000 messages sent by each slave; 113, 92 and 42 connection_lost errors respectively. No other error type detected. Repeated packets received on the "master" node.

I have moved to:

```cpp
pjon_bus.set_synchronous_acknowledge(false);
pjon_bus.set_asynchronous_acknowledge(true);
```

for testing, but the slaves' memory usage grows a lot when adding #define PJON_INCLUDE_ASYNC_ACK true

I am testing now

```cpp
pjon_bus.set_synchronous_acknowledge(false);
pjon_bus.set_asynchronous_acknowledge(false);
```

but nothing is being received... umm... something wrong in the code. lol

@gioblu
Owner

gioblu commented Nov 19, 2018

Ciao @ibantxo, try setting a higher TSA_RS485_DELAY: https://github.com/gioblu/PJON/blob/master/src/strategies/ThroughSerialAsync/Timing.h#L76

In some cases the packet is not received because the enable pin on transmitter's side must be set a little earlier.
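Since Timing.h only supplies the default when the macro is not already defined, a higher value can be set from the sketch before the include (the 50 below is an arbitrary illustrative value, not a recommendation):

```cpp
// Raise the RS485 enable-pin delay above the default; 50 is an example value
#define TSA_RS485_DELAY 50
#define PJON_INCLUDE_TSA true
#include <PJON.h>
```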

@gioblu
Owner

gioblu commented Nov 22, 2018

Ciao @ibantxo, how did the test go? Were you able to get it working without the ack?

@ibantxo
Contributor Author

ibantxo commented Nov 23, 2018

Yes, @gioblu. That test has been working for a day now. Packets are received, but I do not know how many problems (lost packets) there are in this test. I see packets from each node (very fast). Each node is sending 1 packet / 500 ms.

I have to count the packets from each node at the receiver to calculate approximately how many have been lost. Some work to do...

@ibantxo
Contributor Author

ibantxo commented Nov 24, 2018

Hi! I have changed the code at the receiver to count the number of lost packets.
So far... 3 nodes sending 1 heartbeat every 500 ms, about 1500 messages sent by each one... 0*3 errors. Not a single error. I will run this test for some hours.

Umm... thinking about my configuration then... "many speakers":

  • some "nodes": they need to start speaking without a polling wait-time (Modbus is not possible)
  • one master (it is a gateway), which can query the nodes at any time
  • the needed bandwidth is small; almost all the time the bus is free.

One suggestion to improve the "multimaster-ack" communication:

  • the gateway controls everything: MASTER, but without doing polling.
  • the gateway sends a "who is next?" message periodically, very fast and frequently if there is no response.
  • if a slave wants to communicate with the master, it does not send the whole message, only a small message asking for permission to "get the bus" for a time
  • the master grants the bus to that slave by sending a "silence please, except device_id" broadcast message to all the slaves.
    Maybe too much time for a realtime conversation... lol...

@ibantxo
Contributor Author

ibantxo commented Nov 24, 2018

With no ack...

3 nodes sending 1 heartbeat every 500 ms:
about 32000 messages sent by each node...
192, 333, 318 messages lost from each sender at the receiver ("master").

3 nodes sending 1 heartbeat every 5 s:
about 6300 messages sent by each node...
12, 13, 17 messages lost from each sender at the receiver ("master").

@ibantxo
Contributor Author

ibantxo commented Nov 25, 2018

With ACK again (async)...
3 nodes sending 1 heartbeat every 500 ms:
about 25000 messages sent by each node...
161, 110, 173 messages with problems at the receiver ("master"): connection_lost error

I have set #define TSA_TIME_IN 1000000 (1 second) on node number 1, while every node sends 1 heartbeat / 500 ms. Umm... node number 1 still sends its info, because it is being received. With that value for TSA_TIME_IN, shouldn't it wait forever?

```cpp
#define PJON_INCLUDE_PACKET_ID true
#define PJON_INCLUDE_ASYNC_ACK true
#define TSA_INITIAL_DELAY 1000
#define TSA_COLLISION_DELAY 32
#define TSA_RESPONSE_TIME_OUT 10000
#define TSA_TIME_IN 1000000
#define TSA_READ_INTERVAL 100
#define TSA_BYTE_TIME_OUT 1000000
#define TSA_MAX_ATTEMPTS 20
#define TSA_BACK_OFF_DEGREE 4
#define TSA_RS485_DELAY 10
#define TSA_FLUSH_OFFSET 152

#define PJON_INCLUDE_TSA true
#include <PJON.h>
```

@gioblu
Owner

gioblu commented Nov 29, 2018

Ciao @ibantxo, thank you for your valuable feedback and excuse me for the late answer.
About the scheme you propose:

> One suggestion to improve the "multimaster-ack" communication:
>
> the gateway controls everything: MASTER, but without doing polling.
> the gateway sends a "who is next?" message periodically, very fast and frequently if there is no response.
> if a slave wants to communicate with the master, it does not send the whole message, only a small message asking for permission to "get the bus" for a time
> the master grants the bus to that slave by sending a "silence please, except device_id" broadcast message to all the slaves.
> Maybe too much time for a realtime conversation... lol...

It is probably more efficient for the master to roll-call all known nodes, asking each whether there is something new to transmit. Each slave could be configured to know it can transmit for a given amount of time, and transmit a "NO" if there is nothing to send, to spare bandwidth; collisions should be practically excluded.
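A rough master-side sketch of such a roll-call, where the poll payload "P", the slave id list, and the listening window are all hypothetical application-level choices (only id 33 and 44 appear in this thread):

```cpp
// Hypothetical poll payload: "P" asks a slave whether it has data to send.
const uint8_t slave_ids[] = {33, 44, 55}; // example ids

void roll_call() {
  for(uint8_t i = 0; i < sizeof(slave_ids); i++) {
    pjon_bus.send(slave_ids[i], "P", 1); // queue the poll request
    pjon_bus.update();                   // dispatch it on the bus
    pjon_bus.receive(10000);             // listen ~10 ms for that slave's answer
  }
}
```

Because only the polled slave is allowed to speak during its window, the medium is never contended and no ack round-trip is needed.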

Another way to reduce the chances of a collision is to use a higher baudrate and set a longer TSA_COLLISION_DELAY.

About TSA_TIME_IN and the 500 ms sending test: at what rate do you receive the packets?

Thank you very much for your support.

@gioblu gioblu closed this as completed Mar 22, 2021