Plan for Mellanox ConnectX-4 support #706

Open
19 of 35 tasks
lukego opened this issue Jan 12, 2016 · 8 comments

Comments

@lukego
Member

lukego commented Jan 12, 2016

This issue spells out a plan for adding support for Mellanox ConnectX-4 NICs to Snabb Switch.

The idea is to develop "native" support for Mellanox cards. This requires understanding the hardware in depth and writing our own drivers. The necessary information is now public (as of June 2016): Mellanox Adapter Programmer's Reference Manual.

Logistics:

  • Get confirmation that Mellanox will release the Programmer's Reference Manual (PRM) without NDA.
  • Order hardware. (Mellanox are sending a dozen NICs to the lab: mixed 10G/40G/100G.)
  • Confirm public release date for the PRM.
  • Install hardware in the Snabb Lab.

Driver:

  • Implement Command Queue for talking to the firmware.
  • Initialize NIC
    • ENABLE_HCA
    • QUERY_ISSI
    • SET_ISSI
    • QUERY_PAGES (boot)
    • MANAGE_PAGES
    • QUERY_HCA_CAP
    • SET_HCA_CAP
    • QUERY_PAGES (init)
    • MANAGE_PAGES
    • INIT_HCA
    • SET_DRIVER_VERSION (Skipped; see comment)
    • CREATE_EQ
    • QUERY_VPORT_STATE
    • MODIFY_VPORT_CONTEXT
  • Transmit
    • Single-queue
      • Transport Interface Send
      • Send Queue
      • Completion Queue
    • Multi-queue
    • VLAN insert
  • Receive
    • Single-queue
    • Hashed
    • Switched
    • VLAN remove
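
The initialization commands listed above can be read as an ordered sequence issued through the command queue. The following is a minimal sketch in plain Lua, assuming a hypothetical `execute(name)` callback that stands in for the real command-queue call (the driver's actual interface is not shown in this issue). The command names and their order are taken from the list; everything else is illustrative.

```lua
-- Command names in the order listed in this plan.
-- SET_DRIVER_VERSION is omitted because the plan marks it as skipped.
local init_sequence = {
   "ENABLE_HCA", "QUERY_ISSI", "SET_ISSI",
   "QUERY_PAGES", "MANAGE_PAGES",        -- boot pages
   "QUERY_HCA_CAP", "SET_HCA_CAP",
   "QUERY_PAGES", "MANAGE_PAGES",        -- init pages
   "INIT_HCA", "CREATE_EQ",
   "QUERY_VPORT_STATE", "MODIFY_VPORT_CONTEXT",
}

-- Run every command in order and stop at the first failure.
-- `execute` is a hypothetical callback that returns true on success.
local function initialize (execute)
   for i, name in ipairs(init_sequence) do
      if not execute(name) then
         return false, ("step %d (%s) failed"):format(i, name)
      end
   end
   return true
end

-- Usage with a stub that only records the calls it receives.
local log = {}
local ok = initialize(function (name) log[#log + 1] = name; return true end)
print(ok, #log)
```

Keeping the sequence in a table makes it easy to see which step a failure belongs to when the firmware rejects a command.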

Integration:

  • Integrate Mellanox support with packetblaster.
  • Integrate Mellanox support with Snabb NFV.
  • Integrate Mellanox support with software dispatching.

Related work:

Risks and unknowns:

  • Public PRM may be released too late or be redacted too heavily.
  • Hardware may have some unanticipated property that disqualifies it for some reason.

Both risks should be resolved once the PRM is released and we have a card to experiment with.

@ghost

ghost commented Jan 12, 2016

Regarding packetblaster, is there a possibility that it could be driver-agnostic?
What if I wanted to use it with a tap device?

@lukego
Member Author

lukego commented Jan 12, 2016

@nnikolaev-virtualopensystems The normal operation of packetblaster is to create transmit descriptors and then reuse them in a loop. This works on Intel NICs and hopefully works on Mellanox NICs too. I suspect it would also work on Virtio-net devices by changing the vring used and avail indexes. If it works for Virtio-net vrings, then it should also work for a tap device if you use /dev/vhost-net to access it with a vring.

Failing all of that, it would also be possible for packetblaster to have a mode where it doesn't use the "DMA reset" trick and simply transmits packets. That should work for any I/O device but would likely become CPU bound if you use many NICs. (packetblaster does 100G - 10x10G - without breaking a sweat and that is because of the trick.)
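
The descriptor-reuse idea can be illustrated with a small simulation. This is a sketch in plain Lua, not packetblaster's actual code: the ring, the `device_transmit` stand-in for the NIC, and all variable names are assumptions made for illustration. It shows that after the descriptors are filled once, the loop only advances an index and never rewrites a descriptor.

```lua
-- Simulated transmit ring: descriptors are written once, then reused.
local ring_size = 4
local ring = {}
for i = 0, ring_size - 1 do
   ring[i] = { payload = "packet-" .. i }   -- filled exactly once
end

local descriptor_writes = ring_size   -- writes performed during setup
local tail = 0                        -- next slot handed to the "NIC"
local sent = 0

-- Stand-in for the NIC: "transmits" the descriptor it is pointed at.
local function device_transmit (slot)
   assert(ring[slot].payload)
   sent = sent + 1
end

-- Replay loop: only the tail index moves; no descriptor is rewritten.
local function replay (npackets)
   for _ = 1, npackets do
      device_transmit(tail)
      tail = (tail + 1) % ring_size
   end
end

replay(10)
print(sent, descriptor_writes, tail)
```

The per-packet CPU cost in this model is one index update, which is why the non-reusing fallback mode described above would be expected to become CPU bound sooner.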

@lukego
Member Author

lukego commented Jan 22, 2016

Added a link to Mellanox firmware release notes. Interesting reading. Gives some insight into the line between hardware/firmware on the cards i.e. which bugs are fixed and features added by firmware upgrades.

@lukego
Member Author

lukego commented Aug 29, 2016

I have pushed a major update of the ConnectX-4 driver in commit 7659eb6.

I have now been able to initialize the card and to transmit and receive packets, and I am restructuring the code more cleanly. The initialization side of this is pushed; next I am doing clean transmit and receive. I also need to provide a suitable API for multiprocess operation and for setting up the Flow Tables and hashing in useful ways.

I squashed the history on that branch before I pushed to GitHub. In the short term I will be fetching some draft code from the more complete history on the lukego/mellanox-refactor branch. Part of the reason to squash is that I had included a relatively large log file from the Linux mlx5 driver in the repo, and I'd like to avoid bloating the snabbco/snabb repo with that.

I also started filing issues for things that I am following up with Mellanox support. See issues with tag 'mellanox'.

@lukego
Member Author

lukego commented Oct 28, 2016

I pushed a big update to the Mellanox driver with commit 21d0dc3. There is still work to do but most things are in place now.

The driver is now designed for multiprocess operation for use with #1021. The design is to have one ConnectX4 app for each NIC that performs initialization of all the queues, then to have any number of IO apps that attach to queue-pairs and can run in other Snabb processes.

Example:

-- Configuration object that the apps are added to
local c = config.new()
-- App to set up the NIC (initializes all the queues)
config.app(c, 'nic', ConnectX4, {pciaddress = '01:00.0',
                                 queues = {'a', 'b', 'c'}})
-- Apps to perform I/O (can be in other processes)
config.app(c, 'io-a', IO, {pciaddress = '01:00.0', queue = 'a'})
config.app(c, 'io-b', IO, {pciaddress = '01:00.0', queue = 'b'})
config.app(c, 'io-c', IO, {pciaddress = '01:00.0', queue = 'c'})

Currently all queues are set up for hashing (RSS).
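
To make the effect of hashing concrete, here is a minimal sketch in plain Lua of how a flow could be mapped onto one of the named queues. The hash is a toy string hash chosen for brevity; it is not the hash the NIC computes, and `toy_hash` and `queue_for_flow` are hypothetical helpers, not part of the driver. Only the queue names come from the example above.

```lua
-- Queue names as in the configuration example.
local queues = {'a', 'b', 'c'}

-- Toy string hash, used here only for illustration.
-- The NIC's hardware hash is a different function.
local function toy_hash (key)
   local h = 0
   for i = 1, #key do
      h = (h * 31 + key:byte(i)) % 65536
   end
   return h
end

-- Pick a queue for a flow identified by source and destination strings.
-- The same flow always maps to the same queue.
local function queue_for_flow (src, dst)
   local h = toy_hash(src .. ">" .. dst)
   return queues[h % #queues + 1]
end

print(queue_for_flow("10.0.0.1:1234", "10.0.0.2:80"))
print(queue_for_flow("10.0.0.3:5678", "10.0.0.2:80"))
```

The useful property is that all packets of one flow land on the same queue, so the IO app attached to that queue sees the flow in order.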

There is more to do:

  • Set up a Hydra job to run the selftests.
  • Create a truly torturous selftest with multiqueue, restarts, etc. (Could this be part of the IO app work in "One IO app to rule them all" #1043?)
  • Support L2 switching in addition to hashing (same "Flow Tables" mechanism).
  • Get "self-loopback" under control (currently broadcasts go onto the wire and back to the NIC).
  • Clean up and document the interface (incl. better names for the apps).
  • Expose configuration options for VLAN insert/remove, MTU, etc.

Current basic selftest output with sending/receiving packets between
two NICs. (Here we see the apparent issue with the NIC duplicating
broadcast packets i.e. sending onto the wire and also back onto local
RX.)

selftest: waiting for both links up
Links up. Sending 10,000,000 packets.

NIC0
2,000,000,000 rx_bcast_octets
  20,000,000 rx_bcast_packets
           0 rx_error_octets
           0 rx_error_packets
           0 rx_mcast_octets
           0 rx_mcast_packets
           0 rx_ucast_octets
           0 rx_ucast_packets
1,000,000,000 tx_bcast_octets
  10,000,000 tx_bcast_packets
           0 tx_error_octets
           0 tx_error_packets
           0 tx_mcast_octets
           0 tx_mcast_packets
           0 tx_ucast_octets
           0 tx_ucast_packets

NIC1
2,000,000,000 rx_bcast_octets
  20,000,000 rx_bcast_packets
           0 rx_error_octets
           0 rx_error_packets
           0 rx_mcast_octets
           0 rx_mcast_packets
           0 rx_ucast_octets
           0 rx_ucast_packets
1,000,000,000 tx_bcast_octets
  10,000,000 tx_bcast_packets
           0 tx_error_octets
           0 tx_error_packets
           0 tx_mcast_octets
           0 tx_mcast_packets
           0 tx_ucast_octets
           0 tx_ucast_packets
selftest: complete
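
The counters above are consistent with the loopback explanation, as a short calculation shows. This sketch in plain Lua uses only figures from the log; the interpretation that each NIC receives its peer's packets plus a looped-back copy of its own is the hypothesis stated in this comment, not a confirmed diagnosis.

```lua
-- Figures taken from the selftest log (identical for NIC0 and NIC1).
local sent_per_nic = 10000000      -- tx_bcast_packets
local tx_octets    = 1000000000    -- tx_bcast_octets

-- Packet size implied by the transmit counters.
local octets_per_packet = tx_octets / sent_per_nic

-- Hypothesis: each NIC receives the peer's broadcasts from the wire
-- plus a copy of its own broadcasts looped back to local RX.
local from_peer   = sent_per_nic
local looped_back = sent_per_nic
local expected_rx_packets = from_peer + looped_back
local expected_rx_octets  = expected_rx_packets * octets_per_packet

print(string.format("%.0f octets/packet, %.0f rx packets, %.0f rx octets",
                    octets_per_packet, expected_rx_packets, expected_rx_octets))
```

The computed values match the rx_bcast_packets and rx_bcast_octets counters reported for both NICs.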

@eugeneia
Member

Great progress! I will try to meet you half way and get the IO(Control) abstractions in order and think about nefarious selftests.

@tsuraan

tsuraan commented Aug 9, 2017

Is this still progressing? It looks like things were close to done, and then the ticket stalled or something. It looks like Snabb is seeing a lot of development; is there just a ton of stuff going on around the I/O 2.0 that is holding up these other tickets?

@lukego
Member Author

lukego commented Aug 11, 2017

Hi @tsuraan. Yes, I am actually planning to loop back this month and try to get the driver branch ready for upstream.

There is a fairly complete driver here: connectx_4.lua. This basically works but often seems to exercise some bad cases in the firmware (sometimes the NIC gets wedged and requires a cold boot, i.e. a full server power cycle, to recover). The more recent firmware releases are also missing some important information from their release notes (definitions of new error codes that sometimes appear). It's a slow and frustrating process to resolve these issues, but I have some new leads that I plan to follow up shortly.
