Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receiving wrong data, etc. #24

Closed
TeeTrizZz opened this issue Jan 21, 2016 · 15 comments
Closed

Receiving wrong data, etc. #24

TeeTrizZz opened this issue Jan 21, 2016 · 15 comments
Labels

Comments

@TeeTrizZz
Copy link

Hi! I have a lot of issues right now. My head is a bit spinning as I was working on this the whole day. Maybe I miss something important. Anyway, I thought I should ask you before I continue with debugging. Hope you can help me.

My setup consists of two Unos and one Nano. When I start my software (a small C# tool) it connects to one of the two Unos ("master", e.g. device id 1) via USB. Then, the Uno sends a "ping" message to the devices 2, 3, 4, 5, and 6 (= "slaves"). The two other devices should then respond with a "I'm alive" message while the non-existing devices 4, 5, and 6 should raise a CONNECTION_LOST error. This usually works perfectly fine but not always.

In a few cases the Nano seems to respond, but the master device doesn't care (receiver function is not called). I started to debug this case. The Nano definitely gets an ACK and removes the package from the queue. I'm not sure if the ACK was sent by the master device or by something else, though.

When I fiddled around to find the cause, "device 4" suddenly stopped raising a CONNECTION_LOST error. I found out that my second Uno was responding to the ping message with ACK although its device id is 3. I removed the Nano to simplify the whole system and added more Serial.print() stuff in your code to see what is being sent and what is being received.

Master sends (this->send_byte is called with these values):

3-4-3-4 // ping for device 3
4-4-3-3 // ping for device 4

Slave receives (this->receive_byte() returns these values):

3-3-0-0 // incorrect!
3-4-0-0 // incorrect!
3-4-0-0 // incorrect!
3-4-3-4 // correct!

This is what happens with the original PJON.cpp. As I said in my last issue, I need to include the sender id. Therefore, I have changed a few parts of your code so the sender id is always included as the first byte of the content. In this case, the messages look like this:

Master sends (this->send_byte is called with these values):

3-5-1-3-4 // ping for device 3
4-5-1-3-3 // ping for device 4

Slave receives (this->receive_byte() returns these values):

3-5-1-3-4 // correct!
3-3-0-0-4 // incorrect!

I have no idea what's happening. Maybe I changed something, I shouldn't have changed, something is interfering, the timings are incorrect, or something is being converted wrongly. I hope my explanations are clear. If you need more information feel free to ask. Any help is appreciated!

@gioblu
Copy link
Owner

gioblu commented Jan 21, 2016

Ciao. I would suggest you this procedure:

  • Start with master codebase.
  • Try NetworkAnalysis and ErrorTest examples with your devices in every combination (one to one)

If you can share here your results and I will help you to fix.
I think it seems a really high interference amount scenario.

Can you please document precisely which devices and microcontrollers you are using?

Please read also:
https://github.com/gioblu/PJON/wiki/Troubleshooting

Speaking about the sender id inclusion I would suggest you to include it on higher level in the packet's content. So you will be able to pull the new versions and so be updated following library's development without great deal and modifications.

@TeeTrizZz
Copy link
Author

Thanks! I'll run those examples tomorrow. Interference might be possible but the received packets are always the same: 3-5-... is always transmitted correctly, 4-5-... is always transmitted incorrectly. So at least it is not some kind of random interference.

Currently, I have two Arduino Genuino UNO (ATmega328) and one Arduino Nano (ATmega328). In the final configuration I will only use Nanos (about six of them).

I have forked your project and added the sender id inclusion as an option with a dedicated receiver function. This allows me to easily pull and merge new versions. You are right, though. I could/should move the code to a higher level probably.

Thanks again! I'll get back to you tomorrow.

@gioblu
Copy link
Owner

gioblu commented Jan 21, 2016

Good. So following your report you should test with NetworkAnalysis transmitting the content usually is transmitted incorrectly. I will also give it a try in an hour. Will give you feedback using the same devices you have in your setup.

@gioblu
Copy link
Owner

gioblu commented Jan 21, 2016

Ciao again :).

I tested the actual master with a UNO and a Nano with NetworkAnalysis, and communication is stable. After this I modified NetworkAnalysis transmitter's sketch to roll a byte value from 1 to 255 and back again to see if all bytes are correctly received, I did this adding to networkAnalysis receiver's sketch the setter and the receiver function with Serial.print to see what was received.

It works smoothly without any missing byte or mistake detected by CRC. I at last tried to send a packet with inside only 4, 5 as you where saying before, but the sending is executed smoothly also in this case.

Considering the actual common setup we are testing I think you could have missed one of the issues described in the troubleshooting page, or that some modifications to the code are affecting sync of your test devices.

Consider that every slight change in the receiving and transmitting code layer of PJON will variate its execution time, this means that probably timing values of all supported devices will have to be updated accordingly (and is a work of a day, much zen approach and all devices in hands to test cross-platform compatibility).

All tests were made on pin 12 on both devices, The two test devices are grounded by usb cable and common laptop as power source. 10cm male/male chinese insulated jumper used as communication medium.

@TeeTrizZz
Copy link
Author

Ciao! Thank you for the suggestions! I'm back at the office and ran the tests using two Arduino Uno (instead of one Uno and Nano). Besides that, I use the same setup you've used (pin 12, common ground by USB and common laptop, 10 cm jumper cable).

Now, the good news is: I have encountered no error. Even with my customized PJON library (and a lot of Serial.println() in the PJON.cpp) I get this:

// ...
Rcvd Data: 44-7-100-100-100-... // just my debugging stuff
Rcvd Data: 44-7-101-101-101-...
Rcvd Data: 44-7-102-102-102-...
Rcvd Data: 44-7-103-103-103-...
Rcvd Data: 44-7-104-104-104-...
Rcvd Data: 44-7-105-105-105-...
Rcvd Data: 44-7-106-106-106-...
Rcvd Data: 44-7-107-107-107-...
Rcvd Data: 44-7-108-108-108-...
Rcvd Data: 44-7-109-109-109-...

Undetected errors: 0.00 Transmission total (kB): 23.46
BER or Bit Error Rate expressed in b/s: 0.000000

So, the problem is not your library (I thought so :-)) and it is not the hardware. There has to be something wrong in my own Arduino program. Probably a conversion error or something? But unfortunately, even if I get this to work, this doesn't mean that all problems are solved. I assume that this particular error happens because I changed my program for debugging purposes (so I don't have to use my C# tool everytime). We'll see. First things first. :-)

@TeeTrizZz
Copy link
Author

So, after a few more hours of debugging, I found several potential issues in my code. One was a delay(1); in my main loop. I used it because of this example on the Arduino website. I guess this is not too much of a problem if MAX_ATTEMPTS is high enough. 1ms + other stuff like Serial.readStringUntil() and digitalRead() can add up to a critical amount of time, though. I will try to rearange stuff so network.receive() is called more often.

The problematic code, however, was this conversion from uint8_t to char:

char msg[1];
msg[0] = (char) data;
return network.send(rcv, msg, 1);

While this works with higher values (or when receiver id = 3), it does not when data = 3 and receiver id = 4. As I said, I compared what was being sent with what was being received. I didn't notice any problems, though. In your send_string function send_byte definitely was called with 3-5-1-3-4 and 4-5-1-3-3. Well, at least that's what it looked like...

Anyway, I changed my conversion code to:

char msg[1] = {data};
return network.send(rcv, msg, 1);

This solves a few problems. I'm probably going to check my code at the weekend to see if there are more problems. Thanks for your help so far!

@gioblu
Copy link
Owner

gioblu commented Jan 22, 2016

So for sure I should add in troubleshooting the check to conversion and type of the content you are triyng to send, I also had a lot of bugs because of this, so, thank you to have underlined this issue and potentially saved time to the future users.

@gioblu gioblu closed this as completed Jan 22, 2016
@TeeTrizZz
Copy link
Author

Unfortunately, it's still bugged. I have fixed several other problems and reverted my changes to your code (sender id inclusion is now on a higher level). But even with the original library I have problems sending a message from a device with id 1 to a device with id 3 or 4. This is strange enough but the more ridiculous thing is this:

Setup 1:

  • two devices (Arduino UNO), one with id 1, one with id 3
  • device 1 sends a message to device 2, 3, 4, 5, and 6 (one after the other in a Serial.available() loop with network.update() in between)
  • after a few seconds the CONNECTION LOST error is raised for the packets 2, 5, and 6.

=> packet 3 and packet 4 arrived, although there is no device with id 4 - wrong packet content, as described!

Setup 2:

  • two devices (Arduino UNO), one with id 1, one with id 4
  • device 1 sends a message to device 2, 3, 4, 5, and 6 (same as above)
  • after a few seconds the CONNECTION LOST error is raised for the packets 2, 3, 5, and 6.

=> packet 3 is lost, packet 4 arrived - everything is alright!

So, although the setup is the same, something strange happens when the id of the second device is changed. It works with every other id (at least with 2 and 5) but 3. That doesn't make any sense, does it?

One thing I've noticed, however, there seems to be a problem when using Serial.read*()/Serial.print() and sending (or receiving?) messages. I had a bug where I accidentally sent one message per loop. This affected my serial connection, too. I received a lot of garbage in my C# program and in my Arduino program instead of the expected serial data. And I can only use 9600 as the baud rate - otherwise the transmitted data is garbage, too. Maybe the reason for all this strange stuff is the connection to C#...?!

I will check that tomorrow and see if I can come up with a very simple broken program (like the ErrorTest with more serial stuff, more message loops, or something). I'm running out of ideas...

@TeeTrizZz
Copy link
Author

(Looks like I have to change the baud rate in Windows' device manager, too. Can't check that right now but the baud rate can't be the reason anyway. One thing I should check tomorrow is if the content of the packets matter for the two described setups.)

@TeeTrizZz
Copy link
Author

Finally, I found a very very simple test case. Please try to run these two programs on your Arduinos.

TestCase.zip

You should get a "Lost Connection to 2" message (as there is no device 2, of course). Then change the message in line 16 of Transmitter.ino to 1, 3 instead of 1, 4 and run the program again. I don't get the "Lost Connection" message anymore in this case (and your library will receive a lot of NAKs!).

Could you please check if this is the same for you? If so, something in your library might be broken.

@gioblu
Copy link
Owner

gioblu commented Jan 27, 2016

Ciao Tetrizzz Thank you so much, and I am sorry if I closed this issue, when was not still closed by your side. I will as soon as possible try your test case and give you feedback.

@gioblu gioblu reopened this Jan 28, 2016
@gioblu
Copy link
Owner

gioblu commented Jan 28, 2016

HI Tetrizzz. You found a bug 👍 and I want to really say you thank you for your support to the library.

Facts:

  • Issue appears only with short messages, with low content byte values and low values of ID
  • Issue stops if I change the initial value of the CRC from 0 to B10101010 (in send_string and receive), seems a fix, but could have moved the problem on a different range of values. I suspect that this CRC fix is fitting particulary only to this test case scenario.

I suspect this happens because a byte that is in the middle of a packet or its CRC is detected by receiver (that is starting to read in a unkown time), as its own ID (3), after this because of packet's content length near zero value, there is a high chance of reading a really short erroneous length, the rest of the stream is detected as part of the packet and saved, CRC is calculated and the packet content is considered, correctly, as damaged. With a really high length receive() function would likely end with a FAIL return not causing the problem.

This weekend I will develop a strategy I will post here to prove my ipothesys and to solve this issue.
Thank you so much again.

@gioblu gioblu added bug and removed question labels Jan 28, 2016
@gioblu
Copy link
Owner

gioblu commented Feb 1, 2016

You found an enormous bug O_O. Thank you 👍 .
FAIL that values 0x100 or 256 is saved in data that is an array of uint8_t (that is wrong -_-). For this reason the condition data[i] == FAIL was never true. For this reason every incoming byte resulted FAIL are saved as 0 instead of returning FAIL.

Will write here tomorrow a sketch code with a possible solution.

gioblu added a commit that referenced this issue Feb 2, 2016
@gioblu
Copy link
Owner

gioblu commented Feb 2, 2016

I finally completely uncovered this problem, and I am really ashamed :)

Here the solution:
2548e3c

int PJON::receive() {
  int state;
  int package_length = PACKET_MAX_LENGTH;
  uint8_t CRC = 0;

  for(uint8_t i = 0; i < package_length; i++) {
    data[i] = state = this->receive_byte();

    if(state == FAIL) return FAIL;

    if(i == 0 && data[i] != _device_id && data[i] != BROADCAST)
      return BUSY;

    if(i == 1)
      if(data[i] > 0 && data[i] < PACKET_MAX_LENGTH)
        package_length = data[i];
      else return FAIL;

    CRC ^= data[i];
  }

This bug appears because of two different mistakes I did in the receive() function.
Firstly I was saving this->receive_byte() in data, that is a uint8_t array, loosing the second part of FAIL (or 256) that comes as an integer, with a result of 0. For this reason data[i] == FAIL was never true. In addition to this really basic mistake there was an off by one in the for() domain i <= package_length, These two mistakes were affecting the functionality in a chained way:

After some tests I understood that this unwanted byte at the end was retuning always 0 as llogically I was expecting, so breaking the functionality if the first bug was removed. Thanks to the fact that the function was never returning FAIL this unwanted byte was not affecting communication with both mistakes present.

The master version is now fixed. You can simply pull and test it.

If you can, give me a feedback on your results so we can finally close this issue :).
Thanks again for your support.

@gioblu gioblu closed this as completed Feb 4, 2016
@TeeTrizZz
Copy link
Author

Thank you very much! I will try this today and answer your email!

gioblu pushed a commit that referenced this issue Apr 1, 2017
Merged from gioblu master
gioblu pushed a commit that referenced this issue Jul 17, 2018
Merged from gioblu master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants