Thoughts on NUMA #5

Open · emmericp opened this issue Jan 20, 2018 · 0 comments

NUMA is really important for performance. There are two things to consider: thread pinning and memory pinning. Thread pinning is trivial and can be done with the usual affinity mask. The best way to pin memory is by linking against libnuma.
A dependency, eeww. But it's a simple dependency (just a wrapper around a few syscalls) that I'd put on the same level as libpthread: a necessary evil.
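
For reference, a minimal sketch of thread pinning via the affinity mask on Linux/glibc; the core number is a placeholder and a receive thread would call this with a core on its NIC's node right after start-up:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    // Pin the calling thread to a single CPU core via its affinity mask.
    void pin_thread_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        if (err) {
            fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        }
    }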

Let's look at a forwarding application on a NUMA system with NICs connected to both CPUs.
It will typically have at least one thread per NIC that handles incoming packets and forwards them somewhere, possibly crossing a NUMA boundary to do so.
In our experience, it's most efficient to pin both the thread and the packet memory to the CPU node to which the NIC receiving the packets is connected. Sending from the wrong node is not as bad as receiving into the wrong node. Also, we usually can't know where a packet will be sent when we receive it, so we couldn't pin the memory correctly for the transmit side anyway.

How to implement this?

  • read numa_node in the NIC's sysfs directory to figure out which node it's connected to
  • use libnuma to set a memory policy before allocating memory for it
  • pin the thread correctly (see the sketch below)
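
A hedged sketch of these three steps, assuming a PCI address such as "0000:03:00.0" and libnuma (link with -lnuma); the address and the surrounding main() are placeholders for illustration, not ixy code:

    #include <numa.h>   // libnuma: numa_available(), numa_set_preferred(), numa_run_on_node()
    #include <stdio.h>

    // Step 1: read /sys/bus/pci/devices/<pci address>/numa_node.
    int get_nic_numa_node(const char* pci_addr) {
        char path[128];
        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", pci_addr);
        FILE* f = fopen(path, "r");
        if (!f) return -1;
        int node = -1;
        if (fscanf(f, "%d", &node) != 1) node = -1;
        fclose(f);
        return node; // -1 means the kernel has no NUMA info for this device
    }

    int main(int argc, char* argv[]) {
        const char* pci_addr = argc > 1 ? argv[1] : "0000:03:00.0"; // placeholder address
        int node = get_nic_numa_node(pci_addr);
        printf("NIC %s is attached to NUMA node %d\n", pci_addr, node);
        if (node < 0 || numa_available() == -1) {
            return 0; // no NUMA info or no libnuma support, nothing to pin
        }
        // Step 2: prefer this node for all following allocations, so the
        // packet/DMA memory allocated afterwards ends up on the NIC's node.
        numa_set_preferred(node);
        // Step 3: run the current thread only on the CPUs of that node.
        numa_run_on_node(node);
        // ... allocate packet buffers and start the forwarding loop here ...
        return 0;
    }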

Sounds easy, right?
But is it worth implementing? What do we gain besides added complexity?
Sure, this is obviously a must-have feature for a real-world high-performance driver.

But we've decided against implementing it for now.
Almost everyone will just look at the code, and the NUMA stuff is not particularly interesting compared to the rest; it would just add noise.

That doesn't mean you can't use ixy on a NUMA system.
We obviously want to run some benchmarks and performance tests with different NUMA scenarios, and we are just going to use the numactl command for that:

 numactl --strict --membind=0 --cpunodebind=0 ./ixy-pktgen <id> <id>

That works just fine with the current memory allocator and allows us to benchmark all relevant scenarios on a NUMA system with NICs attached to both nodes.
