Embedded environment, multi-client connection, system deadlock. #66

Closed
xuwei2015 opened this issue Jun 5, 2018 · 2 comments

@xuwei2015

Problem description:
I am running the server from libiec61850 1.2.1 in an embedded Linux environment (ARM, big-endian). The first client connects normally. When a second client connects, the system hangs (the console stops responding). If I then disconnect the first client, the second client connects immediately and the system returns to normal.

Configuration:
In stack_config.h I added a macro definition for big-endian platforms:
#define PLATFORM_IS_BIGENDIAN 1

The rest are default and unchanged. The key parts are as follows:

/* Maximum MMS PDU SIZE - default is 65000 */
#define CONFIG_MMS_MAXIMUM_PDU_SIZE 120000

/*
 * Enable single threaded mode
 *
 * 1 ==> server runs in single threaded mode (a single thread for the server and all client connections)
 * 0 ==> server runs in multi-threaded mode (one thread for each connection and
 *       one server background thread)
 */
#define CONFIG_MMS_SINGLE_THREADED 0

/*
 * Optimize stack for threadless operation - don't use semaphores
 *
 * WARNING: If set to 1 normal single- and multi-threaded server are no longer working!
 */
#define CONFIG_MMS_THREADLESS_STACK 0

/* number of concurrent MMS client connections the server accepts, -1 for no limit */
#define CONFIG_MAXIMUM_TCP_CLIENT_CONNECTIONS 5

/* activate TCP keep alive mechanism. 1 -> activate */
#define CONFIG_ACTIVATE_TCP_KEEPALIVE 1

/* time (in s) between last message and first keepalive message */
#define CONFIG_TCP_KEEPALIVE_IDLE 5

/* time between subsequent keepalive messages if no ack received */
#define CONFIG_TCP_KEEPALIVE_INTERVAL 2

/* number of not missing keepalive responses until socket is considered dead */
#define CONFIG_TCP_KEEPALIVE_CNT 2

/* maximum COTP (ISO 8073) TPDU size - valid range is 1024 - 8192 */
#define CONFIG_COTP_MAX_TPDU_SIZE 8192

/* timeout while reading from TCP stream in ms */
#define CONFIG_TCP_READ_TIMEOUT_MS 1000

/* Ethernet interface ID for GOOSE and SV */
#define CONFIG_ETHERNET_INTERFACE_ID "eth0"
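
For reference, the three keepalive settings above correspond to standard Linux socket options. The following is only a minimal sketch of how such values are typically applied to a TCP socket; the helper name sketch_enable_keepalive is hypothetical and this is not taken from the library's source. With the values shown (idle 5 s, interval 2 s, count 2), Linux would consider a silent peer dead after roughly 5 + 2 × 2 = 9 seconds.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux only), not part of libiec61850:
 * apply keepalive settings to an existing TCP socket.
 * Returns 0 on success, -1 if any setsockopt() call fails. */
static int
sketch_enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    assert(idle_s > 0 && interval_s > 0 && count > 0);

    /* Turn the keepalive mechanism on for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;

    /* Idle time in seconds before the first keepalive probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;

    /* Seconds between probes while no acknowledgement arrives. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;

    /* Unanswered probes after which the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;

    return 0;
}
```

Reading the values back with getsockopt() on the target is a quick way to check whether a customized kernel accepts them unchanged.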

Cause analysis:
I compared the debug output for the first and second client connections and found that Socket_read() always returns 0 for the second client. More precisely, "recv(self->fd, buf, size, MSG_DONTWAIT)" always returns -1 with errno = EAGAIN. As a result, CotpConnection_readToTpktBuffer() returns TPKT_WAITING indefinitely and the system hangs.
This does not happen with the earlier version 0.7.6. Comparing socket_linux.c shows that version 0.7.6 uses "read(self->fd, buf, size)" at this point.
When I change the recv back to read, multiple clients can connect. I also tried a blocking call, "recv(self->fd, buf, size, MSG_WAITALL)", which solves the problem as well.
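
To make the return values concrete, here is a minimal sketch of how a non-blocking read is usually classified. It is not the library's Socket_read(); the name sketch_nonblocking_read and its return convention are assumptions chosen to match the behavior described above (0 when recv reports EAGAIN). On a non-blocking call, EAGAIN only means the kernel has no data queued for that socket at that moment, so a result that stays at 0 forever suggests the data never reaches the socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not the library's Socket_read():
 *   > 0  number of bytes read
 *     0  no data available right now (try again later)
 *    -1  peer closed the connection or a real error occurred */
static int
sketch_nonblocking_read(int fd, unsigned char *buf, size_t size)
{
    assert(buf != NULL);

    ssize_t n = recv(fd, buf, size, MSG_DONTWAIT);

    if (n > 0)
        return (int) n;

    if (n == 0)
        return -1; /* orderly shutdown by the peer */

    /* n == -1: decide whether this is "no data yet" or a failure. */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 0;

    return -1;
}
```

A caller that gets 0 is expected to come back later, typically after select() or poll() reports the socket as readable, rather than retrying in a tight loop.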

This is my first time reporting a bug on GitHub. If I have done anything wrong, please let me know. ^_^

@mzillgith
Contributor

Hi.
Thank you for the report. I don't know what the reason could be. I tested on big-endian Linux (PowerPC) with the same configuration in stack_config.h and it works without problems. I also can't see a problem in the socket code. The behavior of the recv function in your case is strange. As you noticed, there have been some changes in the past: the socket handling was changed to non-blocking so that multiple connections can be handled in the same thread.
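
As an illustration of the general pattern (a sketch only, not the library's actual implementation), one thread can serve several connections by waiting on all sockets at once and then doing a single non-blocking read on each one that is ready. The helper name sketch_service_connections, the buffer size, and the use of poll() are assumptions made for this example.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SKETCH_MAX_CONNECTIONS 5 /* same value as CONFIG_MAXIMUM_TCP_CLIENT_CONNECTIONS above */

/* Hypothetical helper: wait up to timeout_ms for activity on any of the
 * given sockets, then do one non-blocking read on each ready socket.
 * bytes_read[i] receives the recv() result for fds[i] (0 if not ready).
 * Returns the number of ready sockets, or -1 if poll() fails. */
static int
sketch_service_connections(const int *fds, int count, int timeout_ms,
                           ssize_t *bytes_read)
{
    struct pollfd pfds[SKETCH_MAX_CONNECTIONS];
    unsigned char buf[256];
    int ready = 0;

    assert(count >= 0 && count <= SKETCH_MAX_CONNECTIONS);

    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
        bytes_read[i] = 0;
    }

    if (poll(pfds, (nfds_t) count, timeout_ms) < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            /* Never blocks, so one slow client cannot stall the others. */
            bytes_read[i] = recv(pfds[i].fd, buf, sizeof buf, MSG_DONTWAIT);
            ready++;
        }
    }

    return ready;
}
```

In this pattern a blocking read on one connection would hold up every other connection served by the same thread, which is why the reads have to be non-blocking.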

@xuwei2015
Author

Hi.
You must be very busy maintaining such a large project. Thank you for taking the time to look into my question.
On reflection, this problem may well be on our side. We have customized some kernel parameters and even rewritten some of the packet-receiving drivers, so there may be a latent problem there.
I spent a lot of time debugging this. My intention in posting here is that anyone else who runs into the same problem can quickly find out what is wrong.
Thanks.
