Test Mellanox ConnectX-3 Pro (Dual port SFP+ Ethernet / MT27520) #139

Closed
Doridian opened this issue May 21, 2021 · 9 comments

Doridian commented May 21, 2021

Had to compile Mellanox card support into the kernel as a module; no further changes so far.

root@raspberrypi:/home/doridian# lspci -v
01:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
        Subsystem: Mellanox Technologies MT27520 Family [ConnectX-3 Pro] (ConnectX-3 Pro 10 GbE Dual Port SFP+ Adapter)
        Flags: bus master, fast devsel, latency 0, IRQ 67
        Memory at 600800000 (64-bit, non-prefetchable) [size=1M]
        Memory at 600000000 (64-bit, prefetchable) [size=8M]
        [virtual] Expansion ROM at 600900000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number ec-0d-9a-03-00-f2-ac-00
        Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [154] Advanced Error Reporting
        Capabilities: [18c] #19
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core
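
For anyone repeating this step, here is a minimal sketch of checking that the mlx4 driver is configured as a module. `CONFIG_MLX4_CORE` and `CONFIG_MLX4_EN` are the mainline Kconfig symbols (they sit behind `CONFIG_NET_VENDOR_MELLANOX`); the function name and the `.config` path argument are illustrative assumptions, not something taken from this thread.

```shell
# Report whether the mlx4 options are set to "module" (=m) in a kernel .config.
# check_mlx4_config is a hypothetical helper; pass it the path to your .config.
check_mlx4_config() {
    cfg=$1
    missing=0
    for opt in CONFIG_MLX4_CORE CONFIG_MLX4_EN; do
        if grep -q "^${opt}=m\$" "$cfg"; then
            echo "$opt=m"
        else
            echo "$opt not built as a module"
            missing=1
        fi
    done
    # Exit status 0 only if both options are =m
    return "$missing"
}
```

Usage would look like `check_mlx4_config /path/to/linux/.config`, where the path depends on where the kernel tree was checked out.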

Picture of the setup (using a cheap no-name PCIe riser I found on eBay, and powering the entire thing off my bench PSU because I don't have a spare ATX PSU lying around):

[Image: IMG_0038]

Will report back with iperf3 results once I manage to wire it up, to see whether both ports work fine.

//EDIT: Forgot to post: both interfaces appear in ifconfig.

root@raspberrypi:/home/doridian# ifconfig -a
[...]

eth1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether ec:0d:9a:f2:ac:00  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether ec:0d:9a:f2:ac:01  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[...]
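
The `flags=4099` value can be decoded by hand: 4099 = 0x1003, which is IFF_UP (0x1) + IFF_BROADCAST (0x2) + IFF_MULTICAST (0x1000). IFF_RUNNING (0x40) is absent, which is consistent with ports that have nothing plugged in yet. A small sketch of that decoding, using the IFF_* bit values from `<linux/if.h>` (the function name is made up for illustration, and only a handful of flags are covered):

```shell
# Decode the numeric flags value printed by ifconfig (e.g. flags=4099).
# decode_if_flags is a hypothetical helper; bit values are the IFF_* constants.
decode_if_flags() {
    flags=$1
    out=""
    # name:bit pairs, limited to the flags relevant here
    for pair in UP:1 BROADCAST:2 LOOPBACK:8 RUNNING:64 MULTICAST:4096; do
        name=${pair%%:*}
        bit=${pair##*:}
        if [ $(( flags & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_if_flags 4099   # the value reported for eth1 and eth2 above
```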

Doridian commented May 21, 2021

A plugged-in SFP+ fiber module gets detected correctly.

root@raspberrypi:/home/doridian# dmesg
[...]
[   44.869757] mlx4_en: eth1: Link Up
[   44.869999] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready

root@raspberrypi:/home/doridian# ethtool -m eth1
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 80m
        Length (62.5um)                           : 30m
        Length (Copper)                           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 850nm
        Vendor name                               : FINISAR CORP.
        Vendor OUI                                : 00:90:65
        Vendor PN                                 : FTLX8571D3BCL
[...]
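
As a quick sanity check on the values above: with 64B/66B encoding, 64 of every 66 line bits carry data, so a nominal 10300 MBd works out to roughly 10 Gbit/s of payload. A sketch of the arithmetic in integer shell math (the function name is hypothetical, and the result is truncated, not rounded):

```shell
# Approximate payload rate in Mbit/s for a 64B/66B-encoded link,
# given the nominal signalling rate in MBd. payload_mbps is a made-up name.
payload_mbps() {
    baud_mbd=$1
    echo $(( baud_mbd * 64 / 66 ))
}

payload_mbps 10300   # "BR, Nominal" from the ethtool -m output above
```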

Doridian commented May 21, 2021

And we have a kernel panic shortly after plugging in the fiber (judging from the call trace, this was likely triggered by me asking for the SFP+ module details):
[Image: ApplicationFrameHost_tT9V5CBCL2, screenshot of the kernel panic]

After a reboot, we don't have a panic (yet), but we cannot seem to send any data. dmesg shows:

[   28.893864] ------------[ cut here ]------------
[   28.893903] NETDEV WATCHDOG: eth1 (mlx4_core): transmit queue 0 timed out
[   28.894011] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x398/0x3a0
[   28.894019] Modules linked in: bnep hci_uart btbcm bluetooth ecdh_generic ecc mlx4_en 8021q garp stp llc brcmfmac vc4 brcmutil cec v3d drm_kms_helper cfg80211 gpu_sched rfkill drm raspberrypi_hwmon drm_panel_orientation_quirks mlx4_core snd_soc_core dwc2 roles bcm2835_v4l2(C) snd_compress bcm2835_codec(C) snd_pcm_dmaengine v4l2_mem2mem bcm2835_isp(C) videobuf2_vmalloc snd_bcm2835(C) syscopyarea bcm2835_mmal_vchiq(C) snd_pcm videobuf2_dma_contig sysfillrect videobuf2_memops i2c_brcmstb videobuf2_v4l2 videobuf2_common sysimgblt snd_timer fb_sys_fops backlight snd videodev rpivid_mem mc vc_sm_cma(C) uio_pdrv_genirq nvmem_rmem uio aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[   28.894385] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G         C        5.10.38-v8+ #3
[   28.894392] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[   28.894404] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   28.894413] pc : dev_watchdog+0x398/0x3a0
[   28.894422] lr : dev_watchdog+0x398/0x3a0
[   28.894429] sp : ffffffc0115e3cf0
[   28.894436] x29: ffffffc0115e3cf0 x28: ffffff804cb13f40
[   28.894454] x27: 0000000000000004 x26: 0000000000000140
[   28.894477] x25: 00000000ffffffff x24: 0000000000000003
[   28.894492] x23: ffffff804cae03dc x22: ffffff804cae0000
[   28.894508] x21: ffffff804cae0480 x20: ffffffc0112a6000
[   28.894523] x19: 0000000000000000 x18: 0000000000000000
[   28.894538] x17: 0000000000000000 x16: 0000000000000000
[   28.894553] x15: ffffff8040238560 x14: ffffffffffffffff
[   28.894568] x13: ffffffc01148ec10 x12: ffffffc01148e864
[   28.894583] x11: ffffffc011335438 x10: ffffffc01131d3f8
[   28.894598] x9 : ffffffc0100e6490 x8 : 0000000000017fe8
[   28.894613] x7 : c0000000ffffefff x6 : 0000000000000003
[   28.894628] x5 : 0000000000000000 x4 : 0000000000000000
[   28.894643] x3 : ffffffc0112aa6a8 x2 : 0000000000000103
[   28.894660] x1 : 0000000000000000 x0 : 0000000000000000
[   28.894683] Call trace:
[   28.894693]  dev_watchdog+0x398/0x3a0
[   28.894712]  call_timer_fn+0x38/0x200
[   28.894722]  run_timer_softirq+0x470/0x5e8
[   28.894732]  __do_softirq+0x1b4/0x514
[   28.894743]  irq_exit+0xec/0x100
[   28.894754]  __handle_domain_irq+0xa0/0x110
[   28.894762]  gic_handle_irq+0xb0/0xf0
[   28.894770]  el1_irq+0xcc/0x180
[   28.894785]  arch_cpu_idle+0x18/0x28
[   28.894795]  default_idle_call+0x3c/0x1d0
[   28.894807]  do_idle+0x248/0x260
[   28.894817]  cpu_startup_entry+0x30/0x70
[   28.894830]  secondary_start_kernel+0x15c/0x188
[   28.894838] ---[ end trace ecca4effcd0c044e ]---
[   28.894869] mlx4_en: eth1: TX timeout on queue: 0, QP: 0x208, CQ: 0x84, Cons: 0xffffffff, Prod: 0x1
[   28.946306] mlx4_en: eth1: Steering Mode 1
[   28.958817] mlx4_en: eth1: Link Down
[   28.996311] mlx4_en: eth1: Link Up
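
To pull the interesting fields out of a longer dmesg log, something like the following sketch may help. It only matches the `TX timeout` line format shown above; the function name is made up, and reading `Cons: 0xffffffff` next to `Prod: 0x1` as "one descriptor queued, none completed" is an interpretation, not something confirmed in this thread.

```shell
# Extract interface, queue, consumer and producer indices from mlx4_en
# "TX timeout" lines read on stdin. tx_timeout_summary is a hypothetical helper.
tx_timeout_summary() {
    sed -n 's/.*mlx4_en: \([^:]*\): TX timeout on queue: \([0-9]*\),.*Cons: \(0x[0-9a-f]*\), Prod: \(0x[0-9a-f]*\).*/\1 queue=\2 cons=\3 prod=\4/p'
}
```

It is meant to be used as `dmesg | tx_timeout_summary`; lines that do not match the pattern are dropped.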

Doridian commented May 23, 2021

Another note: in tcpdump I can see that receiving packets works absolutely fine (all the broadcasts on my LAN etc. arrive).
So this must have something to do with the TX rings, maybe? Or with how these cards transmit in general?
Sadly I don't know much. Right now I'm trying to get Mellanox's official OFED driver to run; we'll see where that goes, if anywhere...

//EDIT: Seems that the driver is really picky about kernel versions, so getting it to run with the RPi's kernel is...hard
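
One way to put numbers on the "RX works, TX doesn't" observation is to compare the kernel's per-interface packet counters under `/sys/class/net/<iface>/statistics/`. A minimal sketch; the function name and the optional root-directory argument are illustrative assumptions:

```shell
# Print RX and TX packet counters for an interface from sysfs.
# print_rx_tx is a hypothetical helper. The optional second argument
# overrides the sysfs root, so it can be tried against a copied tree.
print_rx_tx() {
    iface=$1
    root=${2:-/sys/class/net}
    stats="$root/$iface/statistics"
    rx=$(cat "$stats/rx_packets")
    tx=$(cat "$stats/tx_packets")
    echo "$iface rx_packets=$rx tx_packets=$tx"
}
```

Running `print_rx_tx eth1` a few seconds apart would show which counter is moving. If only receive works, one would expect `rx_packets` to grow while `tx_packets` stays flat, though that is an expectation and not a result reported here.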

geerlingguy (Owner) commented

I also just received two same-generation single-port cards and will be testing them as well: #143

Doridian commented

> I also just received two same-generation single-port cards and will be testing them as well: #143

Let me know if you hit the same issue (probably) with TX not working. Also let me know if you have any pointers on what to look for in the kernel module, or if any specific functions are known to be problematic. The mlx4_en and mlx4_core modules are pretty huge. Is there maybe (or should there be?) a wiki page of "this is what the RPi's PCIe implementation struggles with"?

geerlingguy (Owner) commented

@Doridian - I hit the same issue :P

geerlingguy (Owner) commented

I closed out the single-port ConnectX-3 testing for now... I don't see myself spending more time trying to fix it up or patch the drivers, mostly because at least to Nvidia, it seems like they actively don't support older products and it'd be an uphill battle where there are already other happily-working alternatives like the Intel X520.

I'll leave this open if you'd like, but if you can't make any further progress, we could add the card to the site as 'doesn't work' and close it until someone convinces us otherwise :)

Doridian commented Jul 2, 2021

> I closed out the single-port ConnectX-3 testing for now... I don't see myself spending more time trying to fix it up or patch the drivers, mostly because at least to Nvidia, it seems like they actively don't support older products and it'd be an uphill battle where there are already other happily-working alternatives like the Intel X520.
>
> I'll leave this open if you'd like, but if you can't make any further progress, we could add the card to the site as 'doesn't work' and close it until someone convinces us otherwise :)

Feel free to close it out. I have no clue what I'm doing when it comes to kernel driver changes, so I doubt I can make it fully work.

geerlingguy (Owner) commented

Going to close this now as I have added the relevant info to that new page.

melanj pushed a commit to melanj/raspberry-pi-pcie-devices that referenced this issue Nov 10, 2021