Number of received bytes differ from message length (SmplMsgConnection::receiveMsg). #25

liborw · 2013-07-25T06:41:39Z

When testing the motoman driver, I got this error:

$ rosrun fs100 robot_state
[ INFO] [1374677672.437211315]: Added message handler for message type: 0
[ INFO] [1374677672.440014448]: Robot state connecting to IP address: '192.168.0.101:50241'
[ WARN] [1374677672.440910442]: Unable to find user-specified joint names in 'controller_joint_names'
[ WARN] [1374677672.441766703]: Unable to find URDF joint names in 'robot_description'
[ INFO] [1374677672.441864812]: Using standard 6-DOF joint names: [joint_1, joint_2, joint_3, joint_4, joint_5, joint_6]
[ INFO] [1374677672.442550563]: Connected to server
[ INFO] [1374677672.442607446]: Initializing message manager with default comms fault handler
[ INFO] [1374677672.442657083]: Default communications fault handler successfully initialized
[ INFO] [1374677672.442685145]: Initializing message manager
[ INFO] [1374677672.442726762]: Added message handler for message type: 1
[ INFO] [1374677672.446381376]: Added message handler for message type: 15
[ INFO] [1374677672.447653822]: Entering message manager spin loop
[ERROR] [1374677672.495079875]: Get unload pointer failed, buffer size: 3, smaller than byte size: 4
[ERROR] [1374677672.495149280]: Unload pointer returned NULL
[ERROR] [1374677672.495206461]: Failed to unload message joint: 5 from data[3]
[ERROR] [1374677672.495241827]: Failed to unload joint feedback positions
[ERROR] [1374677672.495278751]: Failed to unload joint feedback message data
[ERROR] [1374677672.495314845]: Failed to initialize joint feedback message
[ERROR] [1374677672.495351389]: Failed to convert SimpleMessage
[ WARN] [1374677672.495409474]: Recieved zero bytes: 0
[ERROR] [1374677672.495445331]: Failed to initialize message
[ERROR] [1374677672.495485843]: Failed to receive incoming message
[ WARN] [1374677672.495522284]: Send failure, no callback support
^C[ INFO] [1374677674.162045284]: Connection failed, attempting reconnect
[ERROR] [1374677674.162110342]: Failed to connect to server, rc: -1, errno: 106
[ WARN] [1374677674.162132291]: Not connected, bytes not sent
[ERROR] [1374677674.162152122]: Failed to receive message length
[ERROR] [1374677674.162164475]: Failed to receive incoming message
[ WARN] [1374677674.162174342]: Send failure, no callback support

When started with DEBUG logging level the ouput.

So, the problem is that the size of the ByteArray returned by receiveBytes is shorter that the expected message length (that should be checked). I have looked on the packets using Wireshark and they seems to be ok.

The text was updated successfully, but these errors were encountered:

gavanderhoorn · 2013-07-25T06:44:27Z

'looked at the packets using Wireshark': did you use packet-simplemessage?

Could you attach / upload a Wireshark capture somewhere?

liborw · 2013-07-25T08:30:27Z

Yes I have, the file can be downloaded here, but it will expire in one day.

liborw · 2013-07-25T08:34:42Z

I have an update on this, adding a MSG_WAITALL flag to the RECV command in simple_message/src/socket/tcp_socket.cpp:84 solved the problem. Is there a reason not to have this flag there?

gavanderhoorn · 2013-07-25T08:42:27Z

Ah, this might then be related to swri-ros-pkg/issues/23.

Yes, the first few simple_message packets arrive over 2 TCP frames (packet 786 in your capture). Seems the deserialisation step is confused by this.

JeremyZoss · 2013-07-25T19:30:40Z

I think this assessment is probably correct.

While Libor's fix probably solves this issue in the short-term, I think we should hold off on pushing this change to the public repository. This change causes the socket code to wait forever until the requested bytes are read. In the event of datastream errors, this might lead to long waits with no errors. A better solution would be to properly handle both split and merged simple_message packets in a mechanism that also provides timeout-capable waits. So that we could wait for the expected # of bytes, possibly across multiple packets, but return after a specified timeout period if the expected # of bytes is not received. This is indeed the issue logged in bug #23.

Please close this bug if your issue is resolved.

gavanderhoorn · 2013-07-25T19:35:21Z

One idea might be to utilise the following pattern:

read all bytes currently in the receive buffer (a single recv(..))
copy those bytes to a internal buffer
if len(my_buffer) >= nr_of_bytes_needed: handle(my_buffer[:nr_of_bytes_needed])
remove bytes from my_buffer
goto 'start'

Fairly standard. If there aren't enough bytes, wait for them to come in the next recv(..) (but with a configurable timeout).

JeremyZoss · 2013-07-25T19:54:04Z

That's mostly what I was thinking. Here are two possible variations:

only read up to the number of expected bytes (not full receive buffer)
- this prevents the need to shift bytes in my_buffer
- but potentially means slightly more reads of the socket buffers (not very expensive)
use SO_RCVTIMEO option on the socket, to automatically enforce the desired recv timeout. This may not be well-supported cross-platform (e.g. DX100 C++ server code), but could be #ifdef controlled to the linux side only. This would be a relatively painless fix, I think.

We should probably log this as an enhancement, or continue discussion on the original bug, since I don't know that the current discussion is directly relevant to Libor's error messages.

shaun-edwards · 2013-07-25T19:54:52Z

This is probably a good solution, but how would we test it? Has anybody seen this error before? Is there anything special about Libor's setup that would cause this? Perhaps network topology? We've always communicated with the robot on a small (single switch) network.

JeremyZoss · 2013-07-25T20:04:56Z

I think testing would be fairly easy. We could "hand-craft" TCP message packets that are either the combination of two SimpleMessage packets, or split a single message into multiple packets. These packets could be sent (with optional time-delays) to test the merged/split packet handling. I think that with TCP_NODELAY set, all packets should be sent "immediately".

I do remember this being an issue before. I can't remember the details, but I think it might have been Ed Venator using a wireless link to his ABB robot. The wireless delays caused split packets.

gavanderhoorn · 2013-07-25T20:14:37Z

If you go the receive timeout way, why not just use a select(..) with O_NONBLOCK on the socket. That is the 'normal' way AFAIK. Again, #ifdefed, as I don't think many of the controllers -- if they support C/C++ -- will support select.

As for the multiple-packets-in-single-TCP-frames or single-packet-over-multiple-TCP-frames: I've been able to reproduce that fairly easily (at least for the Fanuc). Either set a very high update rate on the controller side (so send out a lot of packets/sec), or configure the socket in 'interactive' mode. The latter is probably mapped onto TCP_NODELAY: every write results in a packet. These two result in both extremes.

gavanderhoorn · 2013-07-25T20:16:39Z

As for testing: if you mean manually: there are tools that can replay a Wireshark capture, I think I can get some fairly complicated (de)serialisation situations using the technique described above. ~~If you mean automatically: just create some hand-crafted packets (byte buffers) and feed them to the deserialisation routines by hand. That should be doable.~~ (right, JeremyZoss already said that)

JeremyZoss · 2013-07-25T20:25:40Z

select-based timeouts would work, and are probably more "standard", as you suggest. But I think the SO_RCVTIMEO method is more elegant (when supported), because it can be dropped into the current implementation with minimal changes. It also allows you to block until the desired # of bytes are read, while select() only considers whether any data is available or not.

In my opinion, select is a cleaner way to do the implementation you originally suggested (loop over recv(NOWAIT) until timeout or all data received). This eliminates the need to do a "polling" or "busy-wait" loop waiting for data to arrive. The select() call will block until data (or timeout).

But I still think the SO_RCVTIMEO option is more elegant, if it works. That's just my vote, though. I'm probably happy to leave it up to whoever feels motivated to do the implementation. =)

shaun-edwards · 2013-07-29T02:24:04Z

Another related issue (google code): https://code.google.com/p/swri-ros-pkg/issues/detail?id=55

…ecuted in parallel with colliding on the same ports

Issue #25 fix

shaun-edwards · 2013-09-16T22:14:45Z

Fixed

ghost assigned shaun-edwards Jul 25, 2013

shaun-edwards added a commit that referenced this issue Aug 8, 2013

Added test for issue #25 and added port macro to allow tests to be ex…

0bcdc06

…ecuted in parallel with colliding on the same ports

shaun-edwards mentioned this issue Aug 13, 2013

Issue #25 fix #33

Merged

JeremyZoss added a commit that referenced this issue Aug 23, 2013

Merge pull request #33 from ros-industrial/issue25_recv_bytes

dd0490e

Issue #25 fix

JeremyZoss mentioned this issue Aug 23, 2013

simple_message should include receive timeout #34

Closed

shaun-edwards closed this as completed Sep 16, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Number of received bytes differ from message length (SmplMsgConnection::receiveMsg). #25

Number of received bytes differ from message length (SmplMsgConnection::receiveMsg). #25

liborw commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

liborw commented Jul 25, 2013

liborw commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

shaun-edwards commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

shaun-edwards commented Jul 29, 2013

shaun-edwards commented Sep 16, 2013

Number of received bytes differ from message length (SmplMsgConnection::receiveMsg). #25

Number of received bytes differ from message length (SmplMsgConnection::receiveMsg). #25

Comments

liborw commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

liborw commented Jul 25, 2013

liborw commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

shaun-edwards commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

gavanderhoorn commented Jul 25, 2013

JeremyZoss commented Jul 25, 2013

shaun-edwards commented Jul 29, 2013

shaun-edwards commented Sep 16, 2013