socketcan write() errors in csp_can_tx_frame(): ENOBUFS vs EAGAIN #453

Buzzzz · 2023-09-20T12:56:26Z

I'm using a Raspi 3B+ as a CAN bus participant, with the can0 interface. I have noticed that when sending many CSP packets, the write() in csp_can_tx_frame() eventually fails with EAGAIN (Resource temporarily unavailable). But the code only retries until timeout when errno is ENOBUFS.
Is it problematic to check for both errno values? I have changed the code to do this and so far have had no problems. The EAGAIN condition goes away within the timeout. I have never seen write() fail with ENOBUFS.

Original code in src/drivers/can/can_socketcan.c, function csp_can_tx_frame():

    while (write(ctx->socket, &frame, sizeof(frame)) != sizeof(frame)) {
        if ((errno != ENOBUFS) || (elapsed_ms >= 1000)) {

My new code:

    while (write(ctx->socket, &frame, sizeof(frame)) != sizeof(frame)) {
        if (((errno != ENOBUFS) && (errno != EAGAIN)) || (elapsed_ms >= 1000)) {

Is ENOBUFS actually a possible error for an AF_CAN socket? The man page says "ENOBUFS No buffer space available (POSIX.1 (XSI STREAMS option))."


yashi · 2023-09-20T13:28:59Z

I'd do

Check whether write() returned a negative value; if so, get out.
Check the returned value against sizeof(frame). Because this is writing to a network socket, write() could return after a partial write. I doubt it happens, though, since the frame carries only 8 data bytes. Handle it if we need to. Otherwise write() could return 1 every time it's called, and then you would be sending one byte 1000 times. Edit: I assume we want a partial write to be a failure.
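
The two checks above could be sketched roughly as follows. This is only an illustration, not the libcsp implementation: the names `tx_result_t` and `tx_once` are hypothetical, and the set of errno values treated as retryable (ENOBUFS, EAGAIN, EINTR) is an assumption drawn from this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; not part of libcsp. */
typedef enum { TX_OK, TX_RETRY, TX_FAIL } tx_result_t;

/* Try one write() and classify the outcome. */
static tx_result_t tx_once(int fd, const void *buf, size_t len) {
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
        /* Copy errno right away, before any other call can overwrite it. */
        int err = errno;
        if (err == ENOBUFS || err == EAGAIN || err == EINTR) {
            return TX_RETRY; /* assumed transient: the caller may try again */
        }
        return TX_FAIL;      /* any other errno is treated as fatal */
    }

    if ((size_t)n != len) {
        return TX_FAIL;      /* partial write: treated as a failure here */
    }

    return TX_OK;
}
```

Separating "error", "partial" and "complete" keeps the errno test away from the length comparison, so errno is only inspected when write() actually returned -1.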

Note that,

Adding yet another error number doesn't catch all errors. The man page for write(2) lists many more, including EINTR.
errno is basically a thread-local global variable, which means you have to set it to 0 before calling a system call. Apparently we don't. If a system call fails with -1, errno is guaranteed to be updated. No function ever sets it to 0.
It might be better to replace the usleep() counting with a monotonic timer, because counting usleep() intervals doesn't include the time spent in write(). This might not matter, since the write doesn't take long and the loop doesn't run that many times.

Just my two cents.

johandc · 2023-09-28T16:18:58Z

ENOBUFS occurs when the TX queue on the CAN bus becomes full. This happens quite frequently unless the CAN interface init increases the number of TX buffers. We run the TX buffers at 1000, but we do still see ENOBUFS after losing the connection to the CAN bus for a very long time.

I think adding EAGAIN to the list does not hurt, if that's what your system is giving you. The philosophical trouble with errno is annoying, and indeed we should have something like write_r that returns its error code reentrantly. That would be better, but it's not standardised. I think using TLS (thread-local storage) instead is the preferred method, but I might be wrong.

The usleep() does not need to be exactly precise, so for simplicity I think we should keep it.

yashi self-assigned this Apr 30, 2024
