max of 320 messages on RX. Dropping packets #657

Closed
SpiRaiL opened this issue Jul 26, 2019 · 15 comments · Fixed by #1108

@SpiRaiL

SpiRaiL commented Jul 26, 2019

I noticed that I can only fit a maximum of 320 messages (extended ID, DLC=8) in the receive queue. If I don't pull them off the socket fast enough, the newest ones get dropped. Is there any way to increase the queue size in can.interfaces.socketcan.socketcan?

@karlding
Collaborator

No, unfortunately this isn't currently exposed via the python-can API.

As such, the size of this buffer will be your system's default. If you need this functionality, as a temporary workaround you can try increasing /proc/sys/net/core/rmem_default (and /proc/sys/net/core/rmem_max if needed) as shown below, but note that this will increase the size for all sockets.

# Read current values
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/rmem_default

# Temporarily set new values. To permanently set, do it via sysctl.conf
echo $NEW_VALUE | sudo tee /proc/sys/net/core/rmem_max
echo $NEW_VALUE | sudo tee /proc/sys/net/core/rmem_default

You can try and see if that works.

If we were to implement support for this, we could probably set SO_RCVBUF on the socket to configure the receive buffer size. We could then read SO_RCVBUF back afterwards: if the value read back is less than requested, we know that we're being limited by /proc/sys/net/core/rmem_max and can raise an appropriate exception. Some work would probably be needed to flesh out the API, but the implementation shouldn't be terribly difficult.
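A minimal sketch of that set-then-read-back check, using only Python's socket module. The helper name and the ReceiveBufferLimited exception are hypothetical, not python-can API. The sketch relies on documented Linux behaviour from socket(7): the kernel doubles the value passed to SO_RCVBUF to leave room for bookkeeping overhead and getsockopt() reports the doubled value, so the comparison has to account for that factor of two. Because the request is silently capped at rmem_max, a read-back below twice the request is the signal that the limit was hit.

```python
import socket
import sys


class ReceiveBufferLimited(Exception):
    """Hypothetical error: the kernel granted a smaller buffer than requested."""


def set_receive_buffer(sock: socket.socket, requested: int) -> int:
    """Request an SO_RCVBUF size and verify what the kernel actually granted.

    Returns the size reported by getsockopt(SO_RCVBUF).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # Linux doubles the requested value (see socket(7)) and reports the
    # doubled figure; other platforms report the value as-is.
    expected = 2 * requested if sys.platform.startswith("linux") else requested
    if granted < expected:
        raise ReceiveBufferLimited(
            f"asked for {requested} bytes, kernel granted {granted}; "
            "likely capped by /proc/sys/net/core/rmem_max"
        )
    return granted


if __name__ == "__main__":
    # A UDP socket is used only so the demo runs without a CAN interface.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print(set_receive_buffer(s, 65536))
```

The same call should work unchanged on a raw SocketCAN socket created with socket.AF_CAN, socket.SOCK_RAW and socket.CAN_RAW.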

Note: if we have the CAP_NET_ADMIN capability, we can also set SO_RCVBUFFORCE, which lets us bypass the system limit. I'm not familiar with capabilities in Python, but that's probably something we can address later if requested.
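On the SO_RCVBUFFORCE idea: Python itself needs no special capability handling, since the setsockopt() call simply fails with EPERM unless the process holds CAP_NET_ADMIN. The sketch below tries the forcing option first and falls back to plain SO_RCVBUF. Assumptions: force_receive_buffer is a hypothetical helper, and because Python's socket module is not guaranteed to export SO_RCVBUFFORCE, the sketch falls back to 33, the value in Linux's asm-generic/socket.h; a few architectures use different numbers, so verify this on the target.

```python
import socket

# Assumption: 33 is SO_RCVBUFFORCE in Linux's asm-generic/socket.h
# (x86, ARM); some architectures define a different value.
SO_RCVBUFFORCE = getattr(socket, "SO_RCVBUFFORCE", 33)


def force_receive_buffer(sock: socket.socket, requested: int) -> int:
    """Try SO_RCVBUFFORCE (needs CAP_NET_ADMIN), else fall back to SO_RCVBUF.

    Returns the buffer size reported by getsockopt(SO_RCVBUF).
    """
    try:
        # Not capped by /proc/sys/net/core/rmem_max, but requires CAP_NET_ADMIN.
        sock.setsockopt(socket.SOL_SOCKET, SO_RCVBUFFORCE, requested)
    except OSError:
        # EPERM without the capability (or the option is unknown on this
        # platform): use the regular option, which is capped by rmem_max.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```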

@felixdivo
Collaborator

We should consider adding this to the documentation.

@thbuch

thbuch commented Aug 2, 2021

Hi there!
Thanks for the information. I am using a Raspberry Pi 3B+, a Seeed Studio 2CAN-FD HAT and a Tkinter GUI.
I am noticing packet drops even with the simplest examples from the library when logging to BLF.
I thought making the buffer bigger might help, but it didn't work for me.

Have you ever tested on a Raspberry Pi? Any clues about the message drops? Thank you.

@felixdivo
Collaborator

I tested it on a Raspi 4B with SocketCAN (from Ubuntu 20.04) via USB, and everything works there.

@thbuch

thbuch commented Aug 16, 2021

I have always used it with Raspbian. I will give Ubuntu a try. My CAN adapter communicates via SPI; I guess that may affect it too.
I will come back with my test results in a few days. Thanks.

@thbuch

thbuch commented Aug 17, 2021

I am having a lot of problems just installing the drivers for the HAT on Ubuntu.
Actually, it does appear to work fine on Raspbian, but when logging CAN frames at a certain rate, I noticed I was missing some data.
I need to log frames quite fast (4 frames/ms). I also tried using buffered listeners, but I still miss around 10% of the messages.
I don't know if it is a hardware or a software problem...

@hartkopp
Collaborator

I have Raspbian with a Linux 5.10 kernel, which works out of the box WITHOUT compiling any drivers/scripts from Seeed Studio.
I only added the CAN interface assignments in config.txt.

The Seeed Studio install procedure might have worked for older kernels, but the best SPI drivers are in mainline Linux now, and compiling and installing the Seeed Studio stuff made things worse and finally sabotaged the working setup (in my case).

@thbuch

thbuch commented Aug 18, 2021

Sounds great! I actually have this Linux kernel version installed.
@hartkopp, can you give me an example of how to add the CAN interfaces correctly in config.txt?
Currently, the last line of my config.txt is:
dtoverlay=seeed-can-fd-hat-v2

@hartkopp
Collaborator

Yes, similar here:
dtoverlay=seeed-can-fd-hat-v1.dtbo

This was the only thing I changed on my RasPi (no compiling or installing of the out-of-tree CAN drivers that came with the Seeed package).

@thbuch

thbuch commented Aug 19, 2021

I see. I am using v2 because of the real-time clock on the HAT.
Some time ago I started with a different dtoverlay (pointing, I guess, to different drivers). When I noticed these problems, I updated everything and set it up with dtoverlay=seeed-can-fd-hat-v2.
What are these CAN drivers you mention? How can I check whether I have them installed, and how do I uninstall them?

@hartkopp
Collaborator

An earlier installation procedure compiled and installed a driver (not only the dtoverlay):
https://github.com/Seeed-Studio/seeed-linux-dtoverlays/tree/master/modules/CAN-HAT

As you can see here https://github.com/Seeed-Studio/seeed-linux-dtoverlays/blob/master/modules/CAN-HAT/Makefile#L47, the mcp25xxfd.ko module is copied into the kernel modules directory. If the file date of mcp25xxfd.ko differs from that of the other drivers in that directory, you likely compiled (and installed) the out-of-tree driver.
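The file-date check described above can be automated roughly as follows. This is a sketch under stated assumptions: modules live under /lib/modules/<kernel release>, the out-of-tree file is named mcp25xxfd.ko as in the linked Makefile, the exact subdirectory it was copied to is unknown (so the tree is searched), and the one-day tolerance is arbitrary. find_out_of_tree_module is a hypothetical helper name.

```python
import os
import platform
import statistics
from typing import List, Tuple

# Kernel module file suffixes (plain or compressed).
MODULE_SUFFIXES = (".ko", ".ko.xz", ".ko.gz", ".ko.zst")


def find_out_of_tree_module(root: str,
                            name: str = "mcp25xxfd.ko",
                            tolerance_s: float = 24 * 3600) -> List[Tuple[str, float, float]]:
    """Find copies of `name` under `root` whose modification time differs from
    the median of the other modules in the same directory by more than
    `tolerance_s` seconds. Returns (path, module_mtime, reference_mtime);
    reference_mtime is NaN when there are no neighbouring modules."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name not in filenames:
            continue
        target = os.path.join(dirpath, name)
        mtime = os.path.getmtime(target)
        neighbours = [os.path.getmtime(os.path.join(dirpath, f))
                      for f in filenames
                      if f != name and f.endswith(MODULE_SUFFIXES)]
        if not neighbours:
            # Alone in its directory: nothing to compare with, report it anyway.
            suspects.append((target, mtime, float("nan")))
            continue
        reference = statistics.median(neighbours)
        if abs(mtime - reference) > tolerance_s:
            suspects.append((target, mtime, reference))
    return suspects


if __name__ == "__main__":
    modules_root = os.path.join("/lib/modules", platform.release())
    for path, _mtime, _ref in find_out_of_tree_module(modules_root):
        print(f"{path}: date differs from neighbouring modules, possibly an out-of-tree build")
```

A hit is only a hint; the dates can also differ for harmless reasons, such as a partial kernel update.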

@thbuch

thbuch commented Aug 20, 2021

All right! I've seen how to uninstall it.
I'll run some tests and come back with the results in a few days. Thanks a lot!

@thbuch

thbuch commented Sep 10, 2021

Hi, things improved quite a lot after uninstalling the old driver, setting up the CAN interfaces correctly (thanks hartkopp) and setting large buffers (thanks karlding).
However, it only works fine for approximately half an hour. After that period the RX buffer apparently fills up, and I find gaps in the log files compared to the one logged with Vector CANoe.

If I run dmesg | grep spi in the terminal, I get:
mcp251xfd spi0.0 can0: RX-0: FIFO overflow.

I guess that if the buffer is full, it's because I am not taking the messages out fast enough, right?
I am using Tkinter as the user interface. It starts two threads that read can0 and can1 and log the data to two separate files on a USB memory stick.
Things I tried, without much difference:
· saving files to the SD card (not the USB memory stick)
· logging to ASC or BLF format
· running without the Tkinter interface

Is there any way to check the buffer status? And to clear it?
Any ideas to make it faster?
Any other suggestions?
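On the "make it faster" question, one common pattern (a sketch, not python-can's own design) is to keep the receiving thread doing nothing but recv() and hand frames to a second thread that does the formatting and file I/O, so slow writes to the USB stick never delay reading the socket. start_logger, recv and write are hypothetical names; recv is assumed to behave like python-can's Bus.recv(timeout), returning a message or None on timeout. This can only help with drops in user space or in the socket buffer; it does not address the mcp251xfd "FIFO overflow", which happens in the kernel before any application sees the frame.

```python
import queue
import threading
from typing import Any, Callable, List, Optional

_DONE = object()  # sentinel telling the writer thread to finish


def start_logger(recv: Callable[[float], Optional[Any]],
                 write: Callable[[Any], None],
                 stop: threading.Event,
                 poll_timeout: float = 0.1) -> List[threading.Thread]:
    """Start a reader thread and a writer thread connected by a queue."""
    pending: "queue.Queue[Any]" = queue.Queue()  # unbounded: the reader never blocks

    def reader() -> None:
        # Only pull frames off the socket here, as fast as possible.
        while not stop.is_set():
            msg = recv(poll_timeout)
            if msg is not None:
                pending.put(msg)
        pending.put(_DONE)

    def writer() -> None:
        # Slow work (formatting, file or USB I/O) happens here.
        while True:
            msg = pending.get()
            if msg is _DONE:
                break
            write(msg)

    threads = [threading.Thread(target=reader, name="can-reader", daemon=True),
               threading.Thread(target=writer, name="can-writer", daemon=True)]
    for t in threads:
        t.start()
    return threads
```

With python-can, something like start_logger(bus.recv, blf_writer.on_message_received, stop) could connect a bus to one of the library's log writers (for example can.BLFWriter); to shut down, set stop, join the returned threads, then call the writer's stop(). The unbounded queue trades memory for never blocking the reader: if the writer is permanently slower than the bus, the queue keeps growing.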

Thanks a lot

@marckleinebudde

If you get a

mcp251xfd spi0.0 can0: RX-0: FIFO overflow.

that means the kernel failed to get the messages out of the hardware fast enough. This is independent of any user-space buffers or their size, and of the applications receiving CAN frames.

However, if the system is loaded, the processor has less time to fetch incoming CAN frames from the hardware, and eventually the buffer will overrun.
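On the earlier question of how to check the buffer status, one option is to watch the kernel's per-interface counters rather than the application. The sketch below snapshots a few files from /sys/class/net/<iface>/statistics/ and reports how much they grew between two snapshots; the same figures are shown by ip -details -statistics link show can0. Assumptions: read_rx_counters and counter_deltas are hypothetical helpers, and which counter the mcp251xfd driver increments on an RX FIFO overflow is not confirmed here (most likely rx_over_errors together with rx_errors), so check it against the driver source.

```python
import os
from typing import Dict, Iterable

# A few of the per-interface counters Linux exposes under
# /sys/class/net/<iface>/statistics/.
RX_COUNTERS = ("rx_packets", "rx_errors", "rx_dropped", "rx_over_errors", "rx_fifo_errors")


def read_rx_counters(iface: str, root: str = "/sys/class/net",
                     names: Iterable[str] = RX_COUNTERS) -> Dict[str, int]:
    """Snapshot the selected counters of one interface; missing files are skipped."""
    stats_dir = os.path.join(root, iface, "statistics")
    snapshot: Dict[str, int] = {}
    for name in names:
        try:
            with open(os.path.join(stats_dir, name)) as f:
                snapshot[name] = int(f.read().strip())
        except FileNotFoundError:
            continue
    return snapshot


def counter_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """How much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}
```

For example, call read_rx_counters("can0") before and after a logging run and pass both snapshots to counter_deltas. Comparing deltas sidesteps the question of clearing the counters.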

@thbuch

thbuch commented Sep 20, 2021

I ran some tests again, this time checking the CPU load. It was slightly over 50%.
I inserted a time.sleep(0.1) in the GUI part, which reduced the CPU usage of the main Python program by around 20%.
Then, with a button, I open a new window which starts the CAN logging.
To read the CAN channels I launch a thread, which increases the CPU load by around 20%.
As far as I know, this time.sleep should not stop the thread from continuously executing recv(), right?
In summary:
if I read 1 CAN channel: CPU load around 15-30% (50% without time.sleep)
if I read 2 CAN channels: CPU load around 35-50% (65-70% without time.sleep)

In all cases, even with the low CPU load, I am still getting FIFO overflows.

Is that what you mean?
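On whether time.sleep() in the GUI code stops the reader thread: in CPython, time.sleep() suspends only the calling thread and releases the GIL while it waits, so a separate thread calling recv() keeps running. A small self-check, with a dummy loop standing in for the CAN reader (worker_progress_during_sleep is a hypothetical name):

```python
import threading
import time


def worker_progress_during_sleep(sleep_s: float = 0.2) -> int:
    """Return how many iterations a background thread completes while the
    calling thread is blocked in time.sleep()."""
    stop = threading.Event()
    counter = [0]

    def worker() -> None:
        # Stand-in for the CAN reader loop: each iteration waits 1 ms,
        # roughly like a recv() call waiting for the next frame.
        while not stop.is_set():
            counter[0] += 1
            time.sleep(0.001)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    before = counter[0]
    time.sleep(sleep_s)  # only the calling thread is paused here
    after = counter[0]
    stop.set()
    t.join()
    return after - before
```

One caveat for Tkinter: if the sleep runs on the main (Tk) thread, it also freezes the event loop for that long, so periodic GUI work is usually scheduled with widget.after(ms, callback) instead.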
