Fix timeout mutation #80

Closed
wants to merge 4 commits into
from


The modbus_set_response_timeout() API stores a timeout that is subsequently used to limit TCP/IP connect() requests. Unfortunately, the C select() API on Linux is allowed to mutate the struct timeval passed to it. This results in permanent changes to the struct timeval that represents the timeout in the modbus_t structure! This is (at best) unexpected: most users of the libmodbus API would expect a timeout specified with modbus_set_response_timeout() to remain as specified for all future libmodbus API calls, until it is changed by a subsequent modbus_set_response_timeout() call.

The attached commits update the API so that the modbus_t's timeout is never passed by reference in a mutable way (using "const timeval *" instead of "timeval *"), and so that a copy of the provided timeval is used in calls to select(), to avoid corrupting the original.
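
A minimal sketch of the copy-before-select approach described above. The helper name select_with_copy is hypothetical and is not a libmodbus function; it only illustrates taking the timeout as const and handing select() a local copy, so that whatever select() writes back never reaches the caller's value.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not part of libmodbus): wait for readability on
 * the descriptors in rset without modifying the caller's timeout. */
int select_with_copy(int nfds, fd_set *rset, const struct timeval *timeout)
{
    struct timeval tv;

    assert(timeout != NULL);

    /* select() may overwrite tv (Linux stores the time not slept),
     * but it never sees the caller's struct. */
    tv = *timeout;
    return select(nfds, rset, NULL, NULL, &tv);
}
```

With this shape, a caller can reuse the same struct timeval for every call and still get the full timeout each time.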

Owner

stephane commented Nov 8, 2012

Very good patch, I'm surprised to have missed this!

Owner

stephane commented Nov 8, 2012

Oh wait! The timeval structure is set by _modbus_receive_msg with response_timeout on first call and with byte_timeout between each byte.

So how did you come to this conclusion?

stephane was assigned Nov 8, 2012

Owner

stephane commented Jan 7, 2013

No answer to my question.

stephane closed this Jan 7, 2013

Sorry, Stephane; it is certainly a defect, and we have extensive unit tests in our project verifying it. We work around it by always re-setting the timeout before every call.

I will work toward back-porting some of our internal unit tests into the libmodbus project...

Owner

stephane commented Jan 7, 2013

OK, I'll stay tuned!

stephane reopened this Jan 7, 2013

OK, I've implemented a unit test in unit-test-client.c, which fails in 'master', but passes in branch 'fix-timeout-mutation'.

Basically, the select call on Linux alters the supplied struct timeval by whatever time elapses during the select. In most cases, this was harmless; for example, in modbus-tcp.c in _modbus_tcp_select, the supplied struct timeval used when modbus_t::select is invoked was always a copy, so no harm was done. However, these uses of select were also fixed, to avoid future problems if modbus.c was ever altered to pass a struct timeval that was expected to remain intact.

The real problem was in the use of modbus_t::connect on a Modbus/TCP connection; _modbus_tcp_connect calls _connect, which mutated the supplied struct timeval (in this case, a pointer to modbus_t::receive_timeout), subtracting whatever time elapsed during the TCP/IP "Eager Connection". This was often only a few microseconds in a LAN environment (e.g., during any tests using loopback on the same host). However, in a WAN environment, this time could be significant.

As a result, the modbus_t::receive_timeout would be permanently reduced, affecting every subsequent communication I/O attempt!
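
The effect can be reproduced outside libmodbus with a rough sketch like the following. Everything here is an assumption for illustration: struct ctx is a hypothetical stand-in for the part of modbus_t that stores the timeout, and the two functions are not libmodbus code. The first passes a pointer to the stored value straight to select(), the second passes a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical stand-in for the timeout stored in modbus_t. */
struct ctx {
    struct timeval timeout;
};

/* Problematic pattern: select() gets a pointer to the stored value.
 * On Linux it writes the time not slept back into c->timeout, so the
 * stored timeout shrinks after every call. */
int wait_in_place(struct ctx *c)
{
    assert(c != NULL);
    return select(0, NULL, NULL, NULL, &c->timeout);
}

/* Safe pattern: select() only ever sees a local copy. */
int wait_with_copy(const struct ctx *c)
{
    struct timeval tv;

    assert(c != NULL);
    tv = c->timeout;
    return select(0, NULL, NULL, NULL, &tv);
}
```

After wait_with_copy() the stored timeout is unchanged. After wait_in_place() on Linux it holds only what was left when select() returned, which is zero if the wait ran to expiry.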

Owner

stephane commented Oct 23, 2013

You're right about the connect behavior, but I'm against Cargo Cult code for handling timeouts in select calls; it clutters the code for nothing. So I'll write a patch based on your work for connect, plus a comment and a test.
To simplify my current timeout handling, I intend to use pselect.

Thank you for your bug report and patches.
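
For reference, a small sketch of what waiting with pselect could look like. This is not the patch that was written; both function names are hypothetical. It relies on pselect() taking its timeout as a const struct timespec *, so the caller's value is not modified, and on a NULL signal mask leaving signal handling as it would be with select().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes pselect() under strict -std modes */
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: convert a stored struct timeval (microseconds)
 * to the struct timespec (nanoseconds) that pselect() expects. */
struct timespec timeval_to_timespec(const struct timeval *tv)
{
    struct timespec ts;

    assert(tv != NULL);
    ts.tv_sec = tv->tv_sec;
    ts.tv_nsec = (long)tv->tv_usec * 1000L;
    return ts;
}

/* Hypothetical helper: wait for readability using pselect(). The
 * timeout is const all the way down, so no defensive copy is needed. */
int wait_readable_pselect(int nfds, fd_set *rset,
                          const struct timespec *timeout)
{
    assert(timeout != NULL);
    return pselect(nfds, rset, NULL, NULL, timeout, NULL);
}
```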

stephane closed this in 5665306 Nov 20, 2013

mk8 added a commit to mk8/libmodbus that referenced this pull request Jan 29, 2014

stephane + mk8 Fix response timeout modification on connect (closes #80)
Thanks to Perry Kundert for bug report and initial patches.
50a7ccc