Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Fix timeout mutation #80

Closed
wants to merge 4 commits into from

2 participants

@pjkundert

The modbus_set_response_timeout() API stores a timeout, that is subsequently used to limit TCP/IP connect() requests. Unfortunately, the C select() API on Linux is allowed to mutate the provided struct timeval. This results in permanent changes to the struct timeval representing the timeout, in the modbus_t structure! This is (at best) unexpected, because most users of the libmodbus API would expect that the timeout specified using a modbus_set_response_timeout() call should remain as they specified it, for all future libmodbus API calls, until it is changed using a subsequent modbus_set_response_timeout call.

The attached commits update the API to specify that the modbus_t's timeout is never passed by reference in a mutable way (use "const timeval *" instead of "timeval *"), and a copy of the provided timeval is used in calls to select(), to avoid corrupting the original.

@stephane
Owner

Very good patch I'm surprised to have missed this!

@stephane
Owner

Oh wait! The timeval structure is set by _modbus_receive_msg with response_timeout on first call and with byte_timeout between each byte.

So how did you come to this conclusion?

@stephane stephane was assigned
@stephane
Owner

No answer to my question.

@stephane stephane closed this
@pjkundert

Sorry, Stephane; it is certainly a defect, and we have extensive unit tests in our project verifying it. We work around it by always re-setting the timeout before every call.

I will work toward back-porting some of our internal unit tests into the libmodbus project...

@stephane
Owner

Ok I stay tuned!

@stephane stephane reopened this
@pjkundert

OK, I've implemented a unit test in unit-test-client.c, which fails in 'master', but passes in branch 'fix-timeout-mutation'.

Basically, the select call on Linux alters the supplied struct timeval by whatever time elapses during the select. In most cases, this was harmless; for example, in modbus-tcp.c in _modbus_tcp_select, the supplied struct timeval used when modbus_t::select is invoked was always a copy, so no harm was done. However, these uses of select were also fixed, to avoid future problems if modbus.c was ever altered to pass a struct timeval that was expected to remain intact.

The real problem was in the use of modbus_t::connect on a Modbus/TCP connection; _modbus_tcp_connect calls _connect, which mutated the supplied struct timeval (in this case, a pointer to modbus_t::receive_timeout, subtracting whatever time elapsed during the TCP/IP "Eager Connection". This was often only a few microseconds in a LAN environment (ie. during any tests using loopback on the same host). However, in a WAN environment, this time could be significant.

As a result, the modbus_t::receive_timeout would be permanently reduced, affecting every subsequent communication I/O attempt!

@stephane
Owner

You're right about connect behavior but I'm against Cargo Cult code for handling of timeouts in select calls, it clutters code for nothing so I'll write a patch based on your work for connect + comment + test.
To simplify my current timeout handling, I'm intend to use pselect.

Thank you for your bug report and patches.

@stephane stephane closed this pull request from a commit
@stephane Fix response timeout modification on connect (closes #80)
Thanks to Perry Kundert for bug report and initial patches.
5665306
@stephane stephane closed this in 5665306
@mk8 mk8 referenced this pull request from a commit in mk8/libmodbus
@stephane Fix response timeout modification on connect (closes #80)
Thanks to Perry Kundert for bug report and initial patches.
50a7ccc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
This page is out of date. Refresh to see the latest.
View
2  src/modbus-private.h
@@ -110,7 +110,7 @@ typedef struct _modbus_backend {
int (*connect) (modbus_t *ctx);
void (*close) (modbus_t *ctx);
int (*flush) (modbus_t *ctx);
- int (*select) (modbus_t *ctx, fd_set *rset, struct timeval *tv, int msg_length);
+ int (*select) (modbus_t *ctx, fd_set *rset, const struct timeval *tv, int msg_length);
void (*free) (modbus_t *ctx);
} modbus_backend_t;
View
12 src/modbus-rtu.c
@@ -949,12 +949,18 @@ static int _modbus_rtu_flush(modbus_t *ctx)
}
static int _modbus_rtu_select(modbus_t *ctx, fd_set *rset,
- struct timeval *tv, int length_to_read)
+ const struct timeval *tv, int length_to_read)
{
int s_rc;
+ // Select is allowed to modify its timeout timeval; take a copy
+ // Note: may be NULL (indicates no timeout)
+ struct timeval to;
+ if ( tv )
+ to = *tv;
+
#if defined(_WIN32)
s_rc = win32_ser_select(&(((modbus_rtu_t*)ctx->backend_data)->w_ser),
- length_to_read, tv);
+ length_to_read, ( tv ? &to : NULL ));
if (s_rc == 0) {
errno = ETIMEDOUT;
return -1;
@@ -964,7 +970,7 @@ static int _modbus_rtu_select(modbus_t *ctx, fd_set *rset,
return -1;
}
#else
- while ((s_rc = select(ctx->s+1, rset, NULL, NULL, tv)) == -1) {
+ while ((s_rc = select(ctx->s+1, rset, NULL, NULL, ( tv ? &to : NULL ))) == -1) {
if (errno == EINTR) {
if (ctx->debug) {
fprintf(stderr, "A non blocked signal was caught\n");
View
76 src/modbus-tcp.c
@@ -199,9 +199,40 @@ static int _modbus_tcp_pre_check_confirmation(modbus_t *ctx, const uint8_t *req,
}
errno = EMBBADDATA;
return -1;
- } else {
- return 0;
}
+
+ /* TODO: Check Protocol ID
+ *
+ * if (req[2] != rsp[2] || req[3] != rsp[3]) {
+ * if (ctx->debug) {
+ * fprintf(stderr, "Invalid Protocol ID received 0x%X (not 0x%X)\n",
+ * (rsp[2] << 8) + rsp[3], (req[2] << 8) + req[3]);
+ * }
+ * errno = EMBBADDATA;
+ * return -1;
+ * }
+ */
+
+ /*
+ * Check that the Modbus/TCP header length field, matches? Not really
+ * necessary, because check_confirmation has already confirmed that the
+ * Modbus/TCP code (eg. read holding registers) is correct. So, don't
+ * bother to check it here. We don't need to support "unknown" (new) Modbus
+ * codes.
+ */
+
+ /* TODO: Check Unit ID
+ *
+ * if (req[6] != rsp[6]) {
+ * if (ctx->debug) {
+ * fprintf(stderr, "Invalid Unit ID received 0x%X (not 0x%X)\n",
+ * (int)rsp[6], (int)req[6]);
+ * }
+ * errno = EMBBADDATA;
+ * return -1;
+ * }
+ */
+ return 0;
}
static int _modbus_tcp_set_ipv4_options(int s)
@@ -249,23 +280,42 @@ static int _modbus_tcp_set_ipv4_options(int s)
return 0;
}
+/*
+ * A TCP/IP "Eager Connection" setup allows a connect() call to complete after a
+ * server listen(), but before it invokes accept(). Therefore, this select
+ * complete as quickly as the client's and server's kernel TCP/IP will allow --
+ * regardless of whether the server has a delay between detecting readability on
+ * its listen socket and invoking accept(). This resulted in a subtle defect,
+ * difficult to detect in a LAN environment; passing the incoming timeval
+ * pointer directly to select would allow select to modify it by whatever
+ * (usually small) time was required to complete the connection -- permanently
+ * altering the supplied timeval. In the case of _connect, this was the
+ * modbus_t::response_timeout. Ensure we don't ever allow modification of the
+ * supplied timeval.
+ */
static int _connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen,
- struct timeval *tv)
+ const struct timeval *tv)
{
int rc;
-
rc = connect(sockfd, addr, addrlen);
if (rc == -1 && errno == EINPROGRESS) {
fd_set wset;
int optval;
socklen_t optlen = sizeof(optval);
+ /* Select is allowed to modify its timeout timeval; take a copy.
+ * WARNING: may be NULL (indicates no timeout) */
+ struct timeval to;
+ if ( tv )
+ to = *tv;
/* Wait to be available in writing */
FD_ZERO(&wset);
FD_SET(sockfd, &wset);
- rc = select(sockfd + 1, NULL, &wset, NULL, tv);
+ rc = select(sockfd + 1, NULL, &wset, NULL, ( tv ? &to : NULL ));
if (rc <= 0) {
- /* Timeout or fail */
+ /* Timeout or fail; ensures errno is always set on timeout! */
+ if (rc == 0)
+ errno = ETIMEDOUT;
return -1;
}
@@ -323,6 +373,10 @@ static int _modbus_tcp_connect(modbus_t *ctx)
addr.sin_port = htons(ctx_tcp->port);
addr.sin_addr.s_addr = inet_addr(ctx_tcp->ip);
rc = _connect(ctx->s, (struct sockaddr *)&addr, sizeof(addr), &ctx->response_timeout);
+ if (ctx->debug) {
+ printf("Connecting to %s:%d; %s\n", ctx_tcp->ip, ctx_tcp->port,
+ (( rc == -1 ) ? strerror( errno ) : "successful" ));
+ }
if (rc == -1) {
close(ctx->s);
return -1;
@@ -642,10 +696,16 @@ int modbus_tcp_pi_accept(modbus_t *ctx, int *socket)
return ctx->s;
}
-static int _modbus_tcp_select(modbus_t *ctx, fd_set *rset, struct timeval *tv, int length_to_read)
+static int _modbus_tcp_select(modbus_t *ctx, fd_set *rset, const struct timeval *tv, int length_to_read)
{
int s_rc;
- while ((s_rc = select(ctx->s+1, rset, NULL, NULL, tv)) == -1) {
+ // Select is allowed to modify its timeout timeval; take a copy
+ // Note: may be NULL (indicates no timeout)
+ struct timeval to;
+ if ( tv )
+ to = *tv;
+
+ while ((s_rc = select(ctx->s+1, rset, NULL, NULL, ( tv ? &to : NULL ))) == -1) {
if (errno == EINTR) {
if (ctx->debug) {
fprintf(stderr, "A non blocked signal was caught\n");
View
13 src/modbus.c
@@ -358,6 +358,9 @@ int _modbus_receive_msg(modbus_t *ctx, uint8_t *msg, msg_type_t msg_type)
* received */
p_tv = NULL;
} else {
+ /* Wait for the initial message payload, for up to the full
+ * response_timeout; if incomplete, subsequent characters must be
+ * received within ctx->byte_timeout (enforced below) */
tv.tv_sec = ctx->response_timeout.tv_sec;
tv.tv_usec = ctx->response_timeout.tv_usec;
p_tv = &tv;
@@ -526,7 +529,7 @@ static int check_confirmation(modbus_t *ctx, uint8_t *req,
if (function != req[offset]) {
if (ctx->debug) {
fprintf(stderr,
- "Received function not corresponding to the requestd (0x%X != 0x%X)\n",
+ "Received function not corresponding to the request (0x%X != 0x%X)\n",
function, req[offset]);
}
if (ctx->error_recovery & MODBUS_ERROR_RECOVERY_PROTOCOL) {
@@ -585,6 +588,14 @@ static int check_confirmation(modbus_t *ctx, uint8_t *req,
errno = EMBBADDATA;
rc = -1;
}
+
+ /*
+ * TODO: For Write Register(s), check for incorrect register and count
+ * in response (not matching request). Presently, this will allow
+ * incorrect write responses, containing non-matching register and
+ * count.
+ */
+
} else {
if (ctx->debug) {
fprintf(stderr,
View
38 tests/unit-test-client.c
@@ -32,8 +32,8 @@ enum {
int main(int argc, char *argv[])
{
- uint8_t *tab_rp_bits;
- uint16_t *tab_rp_registers;
+ uint8_t *tab_rp_bits = 0;
+ uint16_t *tab_rp_registers = 0;
uint16_t *tab_rp_registers_bad;
modbus_t *ctx;
int i;
@@ -81,11 +81,37 @@ int main(int argc, char *argv[])
modbus_set_slave(ctx, SERVER_ID);
}
- if (modbus_connect(ctx) == -1) {
- fprintf(stderr, "Connection failed: %s\n", modbus_strerror(errno));
- modbus_free(ctx);
- return -1;
+
+ /* Save original timeout before/after connect, and ensure it isn't mutated */
+ printf("\nTEST TIMEOUT MUTATION (CONNECT):\n");
+
+ modbus_get_response_timeout(ctx, &old_response_timeout);
+ {
+ struct timeval beg, end;
+ gettimeofday( &beg, NULL );
+ if (modbus_connect(ctx) == -1) {
+ fprintf(stderr, "Connection failed: %s\n", modbus_strerror(errno));
+ modbus_free(ctx);
+ return -1;
+ }
+ gettimeofday( &end, NULL );
+ printf( "\n CONNECTED in %fs\n", (((double)end.tv_sec + (double)end.tv_usec / 1000000)
+ - ((double)beg.tv_sec + (double)beg.tv_usec / 1000000)));
}
+ modbus_get_response_timeout(ctx, &response_timeout);
+ rc = memcmp( &response_timeout, &old_response_timeout, sizeof response_timeout );
+ if (rc == 0) {
+ printf("OK\n");
+ } else {
+ printf("FAILED: modbus_t::response_timeout has been permanently modified by %fs (from %d/%d to %d/%d)\n",
+ ((double)response_timeout.tv_sec + (double)response_timeout.tv_usec / 1000000)
+ - ((double)old_response_timeout.tv_sec + (double)old_response_timeout.tv_usec / 1000000),
+ (int)old_response_timeout.tv_sec, (int)old_response_timeout.tv_usec,
+ (int)response_timeout.tv_sec, (int)response_timeout.tv_usec );
+ goto close;
+ }
+
+
/* Allocate and initialize the memory to store the bits */
nb_points = (UT_BITS_NB > UT_INPUT_BITS_NB) ? UT_BITS_NB : UT_INPUT_BITS_NB;
Something went wrong with that request. Please try again.