New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with big packets #26

Closed
TeeTrizZz opened this Issue Feb 21, 2016 · 14 comments

Comments

Projects
None yet
3 participants
@TeeTrizZz

TeeTrizZz commented Feb 21, 2016

I'm sorry for the long delay in my response. I hadn't had the time to test everything until this week. Fortunately, after a lot of testing, I can say it is very stable now when sending short messages. I haven't noticed any more problems so far. I added two more bytes to my messages (a total of 4 bytes now) to be more stable in general, though.

After the problems with small packets, however, I have now problems with big packets (e.g. 50 bytes or more) :-) The problem is not the message's size itself, though. There must be another bug in the receive function. The problem might be that the CRC could be correct although the message isn't.

1.) When PACKET_MAX_LENGTH (e.g. = 50) was lower than the actual size of the received package (e.g. = 60), the first strange thing happened. The sender tried to send the message over and over again but failed with 256 (FAIL). The receiver, however, received the message and instead of declining it, this->_receiver() was called with the data of the last correct message (e.g. the last short message). Long story short, with every FAIL at the sender, I got the same old message at the receiver.

2.) I have to send about 30 packets with a size of about 250 bytes each. As this didn't work very well, I added an id to the messages to see if the order was correct and if all packets were received correctly. I was very surprised when I checked the ids in my receiver function:

0, 0, 1, 2, 3, 4, 5, 5, 8, 9, 10, ...

So, in some cases I recevied the same package twice (while the sender failed with 256). In other cases, I didn't receive the package at all (while the sender reported ACK). It was definitely better after I changed if(!CRC) { to if(!CRC && (package_length > 4)) { in your library to ensure that the package might be correct.

My assumption is that there are cases where the loop breaks after one or two bytes (because package_length is 1 or 2 or something) but the CRC is still correct. Then it calls this->_receiver() with old data (data is never set back to null, is it?) and sends ACK. If the sender is still sending then it doesn't notice the ACK, fails with 256 and resends the package. Could this be the cause? Any ideas how to fix that? I have no idea why some messages are skipped, though. Could be the same reason: ACK when there shouldn't be an ACK.

I have no problem with the fact that big packets might cause a lot of trouble (several FAILs before ACK, interference, etc.) but the fact that the library seems to behave unexpected is the main problem in this case. It works if I send the data splitted into very small packets but that makes it slower. Any idea?

Thanks!

P.S.: As promised I will also reply to your e-mail in the next few days.

@gioblu gioblu added the bug label Feb 22, 2016

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 22, 2016

Hei! Thank you again for your support.
I should investigate further but I think you are right

I think a solution might be at line 409 https://github.com/gioblu/PJON/blob/master/PJON.cpp#L409

if(i == 1)
  if(data[i] > 3 && data[i] < PACKET_MAX_LENGTH)
    package_length = data[i];
  else return FAIL;

So every time a length is saved if less than 4 return FAIL.
If you can try this modification and let me know your results.

@domonetic

This comment has been minimized.

Contributor

domonetic commented Feb 25, 2016

when you make changes, please launch a new release

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 25, 2016

Hi domonetic. You mean this changes in particular of you are suggesting to make a new release for every changes made?

I any case I am working on this problem, but seems the code I proposed is a solution.
I would hear feedback also from Fabian to understand if my results are matching his.

@domonetic

This comment has been minimized.

Contributor

domonetic commented Feb 25, 2016

I follow the project because I am interested .
If there are major changes or bug fixes . Should be create a new release.
Thus Arduino knows to update the library.

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 25, 2016

You are absolutely right. WIll be done as soon as this problem will be cleared out.
Thank you for your support.

@TeeTrizZz wait for your feedback and thank you again for your great suggestions.

@domonetic

This comment has been minimized.

Contributor

domonetic commented Feb 25, 2016

Thank you for your sharing

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 27, 2016

So I tested over and over again, and I have obtained similar results. As described filtering away possible packets with a length from 0 to 3 (that in any case should not be considered) the problem seems solved. By my side I agree with @TeeTrizZz .

To give a fix to the second issue I added the CONTENT_TOO_LONG error and modified send function to catch the case and call _error with a dedicated error symbol and the length of the packet returning FAIL and so avoiding to start the sending execution with a too long packet.

I will now make a new release as @domonetic correctly suggested.

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 27, 2016

@TeeTrizZz

This comment has been minimized.

TeeTrizZz commented Feb 27, 2016

Thank you for the tests and the fix gioblu! I can't test it with large messages right now. I had a few more problems with short messages in the last few days, though. I used six Arduino Nanos and after starting them, the communication between the devices usually worked flawlessly for a while. But after a few seconds, minutes, or whatever, it broke. It felt like the communication was stuck somehow. I had to restart the Nanos to get it back working. As this was on a trade fair, I had no time to debug the system. I will do that on Monday. I assume that all these problems have the same reason.

I just wrote a litte console application to "simulate" your library and all of the messages our system currently uses. This tool basically checks what happens when an Arduino starts to read the data stream at an unknown time. It calculates the original CRC and compares it to the CRC of a message where the first byte, the first two bytes, the first three bytes, etc. are missing. It outputs all messages where the CRC was correct although one or more bytes were missing or are part of the next message (i.e. all messages which would lead to a faulty ACK). A faulty ACK also means that two messages are sent over the channel at the same time if the sender is still sending or is already sending the next message.

I will post the results later this day!

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 27, 2016

So, @TeeTrizZz if you are ok with it and you perceive this problem fixed I would close this issue and open anotherone for the "freezing after a while" problem. I am sorry if this affected your showcase, but I am in any case sure we will identify and fix the problem soon. Thank you again for your support and I am so curious to see your console application.

@TeeTrizZz TeeTrizZz closed this Feb 27, 2016

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 27, 2016

Ok, I started a test with only two devices on the bus, I think it will not crash, and this will be a good indication of where the bug could be located in the codebase.

You want me to create the issue or you are doing so (only want to avoid duplication :) )?
Will let you know results after a 12 hours test.

Thank you again!

@gioblu

This comment has been minimized.

Owner

gioblu commented Feb 27, 2016

@TeeTrizZz if this clue can help you: another user, Mauro, confirms that a freezed device doesn't transmit but is able to receive a message and execute the _receive() function.

@TeeTrizZz

This comment has been minimized.

TeeTrizZz commented Feb 27, 2016

Unfortunately - or fortunately :-) -, I have no relevant results yet. Only two of the 3900 possible messages might be problematic every now and then. While this is actually pretty good, the CRC respectively the checksum could be a bit more reliable, though. Maybe it should take the position of a byte into account or something.

Anyway, I guess something else was the reason for the frozen communication. A buffer overflow could've been one of the problems. I have changed MAX_PACKETS but I'm not sure if I did this for every device. I also forgot to handle the MEMORY_FULL error. However, this shouldn't break everything and so this can't be the only reason, though. I'm going to think about a test program which runs directly on the Arduinos. I will test this on Monday when our six devices are back from the trade fair. I will open an issue as soon as I know more!

P.S.: I like the CONTENT_TOO_LONG error. This is pretty useful!

@domonetic

This comment has been minimized.

Contributor

domonetic commented Feb 28, 2016

Dontt use dinámic memory as getmem() in embedded devices. This not free or not can fit in next call

gioblu added a commit to gioblu/PJON_ASK that referenced this issue Feb 29, 2016

gioblu pushed a commit that referenced this issue Apr 1, 2017

Merge pull request #26 from gioblu/master
Merged from bioblu master at v7.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment