Ethernet connection stops if two packets are sent too close (ECP5 5A75B) #1268

Closed
faeboli opened this issue Apr 7, 2022 · 13 comments

@faeboli

faeboli commented Apr 7, 2022

Hello,
I'm having a problem with Etherbone: the connection stops working if two packets are sent too close to each other.
The only way to restore the connection is to power cycle the board.
The packets need to be very close (a few tens of µs apart) for this to happen, but the problem also appears after some minutes on my
Raspberry Pi 4 when I send 1 packet per ms to the board.
I was able to reproduce the problem on my Ubuntu 20.04 LTS host machine with the default board target file:
LiteX updated this evening.
Board 5A75B v8.0
colorlight_5a_75x.py edited with increased buffer depth on line 166:
self.add_etherbone(phy=self.ethphy, ip_address=eth_ip,buffer_depth=1060)

build command:
./colorlight_5a_75x.py --revision=8.0 --uart-name=crossover --eth-ip="192.168.2.50" --with-etherbone --csr-csv=csr.csv --build

To reproduce the problem I've used the Python code attached here; the data is a packet that asks to read back the contents of 45
contiguous addresses starting from 0x00000000.
If I comment out the delay, the board stops working and needs a power cycle.
This situation can happen, for example, if the host machine sends ARP messages close to my UDP Etherbone packets.
Is there something I'm doing wrong? Or is there a workaround?
If anybody can test, please let me know whether the problem can be reproduced.
Thanks


#!/usr/bin/env python3

import time
import socket

s = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00-\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x08\x00\x00\x00\x0c\x00\x00\x00\x10\x00\x00\x00\x14\x00\x00\x00\x18\x00\x00\x00\x1c\x00\x00\x00 \x00\x00\x00$\x00\x00\x00(\x00\x00\x00,\x00\x00\x000\x00\x00\x004\x00\x00\x008\x00\x00\x00<\x00\x00\x00@\x00\x00\x00D\x00\x00\x00H\x00\x00\x00L\x00\x00\x00P\x00\x00\x00T\x00\x00\x00X\x00\x00\x00\\\x00\x00\x00`\x00\x00\x00d\x00\x00\x00h\x00\x00\x00l\x00\x00\x00p\x00\x00\x00t\x00\x00\x00x\x00\x00\x00|\x00\x00\x00\x80\x00\x00\x00\x84\x00\x00\x00\x88\x00\x00\x00\x8c\x00\x00\x00\x90\x00\x00\x00\x94\x00\x00\x00\x98\x00\x00\x00\x9c\x00\x00\x00\xa0\x00\x00\x00\xa4\x00\x00\x00\xa8\x00\x00\x00\xac\x00\x00\x00\xb0'   
for i in range(1,10) :
	s.sendto(data,("192.168.2.50", 1234))
	print(data)
#	time.sleep(0.001)
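
The hard-coded byte string above can also be generated. The following is a minimal sketch, assuming the layout that can be read off the literal packets in this thread: an 8-byte packet header (magic `No`, version byte `0x10`, size byte `0x44`, 4 padding bytes), a 4-byte record header (flags, byte enable, write count, read count), a base return address, and then one 32-bit big-endian address per read. The function name and its parameters are hypothetical and not part of LiteX or LiteEth.

```python
import struct

ETHERBONE_MAGIC = 0x4E6F    # the two bytes b"No" at the start of each packet
ETHERBONE_VERSION = 1       # encoded in the upper nibble of the third byte


def build_read_packet(addresses, base_ret_addr=0, byte_enable=0x0F):
    """Build a read-only Etherbone packet for a list of 32-bit addresses.

    Hypothetical helper: the field layout is inferred from the byte strings
    posted in this issue, not taken from a LiteEth API.
    """
    if not 1 <= len(addresses) <= 255:
        raise ValueError("the read count is a single byte: 1..255 reads")
    # Packet header: magic, version/flags, address/port size, 4 padding bytes.
    header = struct.pack(">HBB4x", ETHERBONE_MAGIC, ETHERBONE_VERSION << 4, 0x44)
    # Record header: flags, byte enable, write count (0), read count.
    record = struct.pack(">BBBB", 0, byte_enable, 0, len(addresses))
    # Base return address followed by the addresses to read.
    body = struct.pack(">I", base_ret_addr)
    body += b"".join(struct.pack(">I", addr) for addr in addresses)
    return header + record + body
```

With this helper, `build_read_packet(range(0, 45 * 4, 4))` should produce the 45-register request used in the script, which makes it easier to vary the number of reads while testing.
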
@enjoy-digital
Owner

enjoy-digital commented Apr 8, 2022

Hi @faeboli,

buffer_depth=1060 is higher than what Etherbone can support; I just added an assertion in LiteEth with enjoy-digital/liteeth@bc9162d to avoid future misconfiguration. The Colorlight probably already has trouble meeting timing, and such a high buffer_depth value will not help.

Could you check whether the issue also happens with the default value of 16 and a maximum number of reads of 4, 8, or 16?
If so, could you try increasing the sys_clk_freq?

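It could also be interesting to put a LiteScope instance in your design (over UART or JTAG) and see what happens internally:

  • https://github.com/enjoy-digital/litex/wiki/Use-Host-Bridge-to-control-debug-a-SoC
  • https://github.com/enjoy-digital/litex/wiki/Use-LiteScope-To-Debug-A-SoC

Now that JTAGBone is supported on ECP5, it's pretty easy/convenient to use it by just adding self.add_jtagbone() to your SoC.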

@faeboli
Author

faeboli commented Apr 8, 2022

Thank you for your answer!
Update:

  • removed the edit to buffer_depth, so that it is back to the default
  • slightly increased the clock frequency: ./colorlight_5a_75x.py --revision=8.0 --uart-name=crossover --sys-clk-freq=100e6 --eth-ip="192.168.2.50" --with-etherbone --csr-csv=csr.csv --build
  • tried to read back only 4, 2, or 1 registers with this code:
#!/usr/bin/env python3

import time
import socket

s = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)

#data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00' #read 1 register
data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04' #read 2 registers
#data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x08\x00\x00\x00\x0c' #read 4 registers
for i in range(1,3) :
	s.sendto(data,("192.168.2.50", 1234))
	print(data)
#	time.sleep(0.001)

Results are similar, i.e. if the delay is commented out, LiteX fails when 2 packets are sent less than about 50 µs apart.
The failure behavior changes with packet length:

  • a 4-register readback fails by locking up the board
  • 2- and 1-register readbacks fail by flooding the connection with loads of packets
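
To tell the two failure modes apart from the host side, one option is to count the replies that come back after each burst. This is a minimal sketch using only the standard `socket` module; the helper names, the timeout, and the reply limit are assumptions chosen for illustration, and nothing here is specific to LiteEth.

```python
import socket
import time


def send_burst(sock, payload, target, count=2, gap_s=0.0):
    """Send `count` copies of `payload` to `target`, `gap_s` seconds apart."""
    for i in range(count):
        sock.sendto(payload, target)
        if gap_s > 0 and i < count - 1:
            time.sleep(gap_s)


def count_replies(sock, limit=100, timeout_s=0.5):
    """Count received datagrams until the socket times out or `limit` is hit."""
    sock.settimeout(timeout_s)
    received = 0
    while received < limit:
        try:
            sock.recvfrom(2048)
        except socket.timeout:
            break
        received += 1
    return received


def classify(sent, received):
    """Rough verdict: fewer replies than requests, or more."""
    if received < sent:
        return "missing replies (possible lock-up)"
    if received > sent:
        return "extra replies (possible flooding)"
    return "ok"
```

A typical use would be `send_burst(s, data, ("192.168.2.50", 1234), count=2)` followed by `classify(2, count_replies(s))`, repeated with different `gap_s` values to find the spacing at which the board starts to misbehave.
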

@faeboli
Author

faeboli commented Apr 8, 2022

I have now configured the JTAG bridge and I'd like to use LiteScope to understand more
and try to help, but I'm a little lost about what to measure and where.
Do you have a suggestion of possible signals to acquire?
Thanks

@enjoy-digital
Owner

@faeboli: Sorry for the delay, I'll try to reproduce here and if so, should be able to fix directly.
BTW I just found this: https://forum.linuxcnc.org/27-driver-boards/44422-colorcnc-colorlight-5a-75e-5a-75b-as-fpga-controller-board Is it related to this project?

@faeboli
Author

faeboli commented Apr 22, 2022

@enjoy-digital
Thank you for your support, hope you can reproduce the problem.
To answer your question, yes it's related to that work, I've posted in that forum as "muvideo"

@enjoy-digital
Owner

@faeboli: OK thanks. Funny thing is that I also played with Linux-CNC in the past and also thought about using LiteEth + FPGA for such purposes (but sadly don't have time for all projects...). So that's great seeing this project and also a motivation for me to provide more support now that things are more concrete and related to something I'm also interested in :)

@enjoy-digital
Owner

@faeboli: The issue should be fixed with edd98c2 (I've been able to reproduce it and no longer see it with this fix). Can you run a test (and set buffer_depth to 255, which is the maximum supported value)?

@faeboli
Author

faeboli commented Apr 25, 2022

@enjoy-digital
Hi, wonderful, I just ran some tests and the problem seems solved in my testbench.
I'll also test the fix on LinuxCNC to confirm that the connection remains stable on a Raspberry Pi with PREEMPT_RT,
where it was first noticed. This will take some time, but I'm positive that the problem will be solved.
Thanks, Fabio
Update: confirmed fix for my setup also in LinuxCNC

@enjoy-digital
Owner

Great, thanks for the feedback @faeboli. Please ask if you run into any issue in the future since, as I said, I find this project very interesting and am willing to help. I now also better understand the request in enjoy-digital/liteeth#103, but still haven't been able to think about it.
Florent

@faeboli
Author

faeboli commented Apr 28, 2022

Thank you for your support. There are a bunch of smart guys on the LinuxCNC forum who are actively working on LiteX for LinuxCNC. In my opinion, the availability of cheap FPGA boards with gigabit Ethernet, together with LinuxCNC on a Raspberry Pi, is a game changer for low-cost CNC builds. The possibility to daisy-chain several boards will help to build robust and dependable control hardware.
LiteX is enabling all this potential to come together.

Fabio.

@enjoy-digital
Owner

It's great to see this; I'll have to think about the best way to enable daisy-chaining in LiteEth. And if features are missing or issues are found during your efforts to use it with LinuxCNC, feel free to ask in GitHub issues or join the #litex channel on libera.chat.

@romanetz

romanetz commented May 10, 2022

Hi @faeboli, @enjoy-digital

> buffer_depth=1060 is higher than what could be supported by Etherbone, I just added an assertion in LiteEth with enjoy-digital/liteeth@bc9162d to avoid future miss-configuration. The colorlight already probably has trouble meeting timings and a such high buffer_depth value will not help.
>
> Could you see if with the default value of 16 and with a maximum number of read of 4, 8, 16 the issue also happens? If so, could you try increasing the sys_clk_freq?
>
> This could also be interesting to put a LiteScope instance in your design (over UART or JTAG) and see what happens internally:
>
> * https://github.com/enjoy-digital/litex/wiki/Use-Host-Bridge-to-control-debug-a-SoC
> * https://github.com/enjoy-digital/litex/wiki/Use-LiteScope-To-Debug-A-SoC
>
> Now that JTABBone is supported in ECP5, it's pretty easy/convenient to use it with just a self.add_jtagbone() to your SoC.

The intention of increasing the buffer size up to 1060 was to increase the number of Wishbone registers that can be exchanged in a single packet.

@enjoy-digital
Owner

enjoy-digital commented May 10, 2022

Hi @romanetz,

buffer_depth is expressed in Wishbone words and a maximum of 255 is supported by the Etherbone protocol, so 1060 was not a valid value. A check has been added to LiteEth to prevent misconfiguration: enjoy-digital/liteeth@bc9162d.
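
Given the 255-word limit, a host that wants to read more registers than that has to split the request across several packets. The following is a minimal sketch of that splitting step only; the function name is hypothetical, and the limit actually usable on a given board may be lower, since it also depends on the buffer_depth the SoC was built with.

```python
def split_reads(addresses, max_reads=255):
    """Split a list of addresses into chunks of at most `max_reads` entries.

    Hypothetical helper: 255 is the maximum quoted in this thread; pass the
    board's configured buffer_depth instead if it is smaller.
    """
    if not 1 <= max_reads <= 255:
        raise ValueError("max_reads must be in the range 1..255")
    addresses = list(addresses)
    return [addresses[i:i + max_reads]
            for i in range(0, len(addresses), max_reads)]
```

For the 1060 registers mentioned above, this yields five requests: four of 255 reads and one of 40.
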
