create dlb blog #1759 (Open): daixiang0 wants to merge 3 commits into linkerd:main from daixiang0:dlb (+88 −0)
---
title: Accelerate Linkerd2 with Intel Dynamic Load Balancer
date: 2024-04-09T00:00:00Z
tags:
- performance
- dlb
author: daixiang0
thumbnail: "/uploads/thumbnail.jpg"
description: 'A combined software and hardware solution: how Intel Dynamic Load Balancer helps accelerate Linkerd2.'
keywords: [performance, dlb]
---

[Intel® Dynamic Load Balancer (Intel® DLB)](https://www.intel.com/content/www/us/en/download/686372/intel-dynamic-load-balancer.html) is a hardware-managed system of queues and arbiters connecting producers and consumers. It is a PCI device that resides in the server CPU uncore and can interact with software running on the cores, and potentially with other devices.

Intel DLB implements the following load-balancing features:
- Offloads queue management from software:
  - Especially useful for multi-producer / multi-consumer scenarios and for enqueue batching to multiple destinations.
  - Provides lockless access to shared queues, which removes the overhead of software locks around them.
- Dynamic, flow-aware load balancing and reordering:
  - Distributes tasks evenly for better CPU core utilization, and can provide flow-based atomicity if required.
  - Distributes high-bandwidth flows across many cores without losing packet order.
  - Improves determinism and avoids excessive queuing latency.
  - Reduces the I/O memory footprint and saves DDR bandwidth.
- Priority queuing (up to 8 levels), which allows for QoS:
  - Lower latency for latency-sensitive traffic.
  - Optional delay measurements in the packets.
- Scalability:
  - Allows applications to be sized dynamically, scaling up and down seamlessly.
  - Power aware: applications can move workers to a lower power state when the load is lighter.

Intel DLB supports three types of load-balancing queues:

- Unordered: For multiple producers and consumers, where the order of tasks is not important. Each task is assigned to the processor core with the lowest current load.
- Ordered: For multiple producers and consumers, where the order of tasks is important. Tasks can be processed by multiple cores in parallel, but they are restored to their original order afterwards.
- Atomic: For multiple producers and consumers, where tasks are grouped by some rule (for example, by flow) and the order of tasks within a group is important. Tasks in the same group are processed by the same resources, so their relative order is preserved.
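To make the three queue types concrete, here is a small, single-threaded software model in Rust. It only illustrates the scheduling semantics and is not the Intel DLB programming interface: the `Task` type, the flow ids, the "pin a flow to its first consumer" rule for atomic queues, and the reorder buffer for ordered queues are all assumptions chosen for clarity.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical unit of work: a flow id plus its arrival sequence number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Task {
    flow: u32,
    seq: u64,
}

/// Unordered: send the task to the consumer with the lowest current load.
fn pick_unordered(loads: &[usize]) -> usize {
    loads
        .iter()
        .enumerate()
        .min_by_key(|&(_, load)| *load)
        .map(|(idx, _)| idx)
        .expect("at least one consumer")
}

/// Atomic (simplified): every task of a flow goes to the consumer that first
/// received that flow, so tasks of one flow never run concurrently.
struct AtomicScheduler {
    pinned: HashMap<u32, usize>,
}

impl AtomicScheduler {
    fn pick(&mut self, task: Task, loads: &[usize]) -> usize {
        *self
            .pinned
            .entry(task.flow)
            .or_insert_with(|| pick_unordered(loads))
    }
}

/// Ordered: tasks may finish out of order on different consumers; this
/// reorder buffer releases them downstream strictly by sequence number.
struct ReorderBuffer {
    next_seq: u64,
    pending: BTreeMap<u64, Task>,
}

impl ReorderBuffer {
    fn complete(&mut self, task: Task) -> Vec<Task> {
        self.pending.insert(task.seq, task);
        let mut released = Vec::new();
        while let Some(t) = self.pending.remove(&self.next_seq) {
            released.push(t);
            self.next_seq += 1;
        }
        released
    }
}

fn main() {
    // Unordered: consumer 1 is idle, so it gets the next task.
    assert_eq!(pick_unordered(&[3, 0, 2]), 1);

    // Atomic: flow 7 stays on its first consumer even after the loads change.
    let mut atomic = AtomicScheduler { pinned: HashMap::new() };
    let first = atomic.pick(Task { flow: 7, seq: 0 }, &[3, 0, 2]);
    let second = atomic.pick(Task { flow: 7, seq: 1 }, &[0, 9, 9]);
    assert_eq!(first, second);

    // Ordered: seq 1 finishes before seq 0, but nothing is released early.
    let mut rob = ReorderBuffer { next_seq: 0, pending: BTreeMap::new() };
    assert!(rob.complete(Task { flow: 1, seq: 1 }).is_empty());
    let out: Vec<u64> = rob
        .complete(Task { flow: 1, seq: 0 })
        .iter()
        .map(|t| t.seq)
        .collect();
    assert_eq!(out, vec![0, 1]);
    println!("all three queue models behave as described");
}
```

In hardware, Intel DLB performs these decisions itself; the point of the model is only to show what each queue type guarantees to the software that consumes it.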

## How Intel DLB accelerates Linkerd2

Intel DLB accelerates Linkerd2 by accelerating [Tokio](https://tokio.rs/), the asynchronous Rust runtime that Linkerd2's proxy is built on.

Rust itself provides only the essentials for writing async code, such as the `Future` trait and `async`/`await` syntax. Because Rust has very strict backward-compatibility requirements, no async runtime has been chosen for the standard library. Tokio fills that gap: it has the largest community support of the available runtimes and many sponsors.

Tokio is general-purpose, reliable, easy to use, and flexible. It fits most cases, but not all of them, and the limiting factor is its scheduler.
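Because the standard library stops at `Future` and its supporting types (`Context`, `Waker`), something else has to poll futures to completion. The sketch below is a minimal single-threaded `block_on` executor, close to the example in the `std::task::Wake` documentation. It is a teaching aid, not how Tokio works: a runtime such as Tokio adds I/O and timer drivers and a multi-threaded scheduler on top of this basic poll-and-wake loop. `ThreadWaker` and `block_on` are names chosen for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor for one future: poll it, and park the current thread
/// until the future's waker is called, then poll again.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let answer = block_on(async { 40 + 2 });
    assert_eq!(answer, 42);
    println!("{answer}");
}
```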

## How Tokio implements its scheduler

Tokio's multi-threaded scheduler is a work-stealing scheduler.

![work-stealing scheduler](/uploads/work-stealing-scheduler.png)

As the figure above shows, in a work-stealing scheduler:

1. Each processor spawns tasks, puts them in its own queue, and runs them.
2. If its own queue is empty, the processor tries to steal tasks from other processors' queues.
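The two steps above can be sketched in Rust with only the standard library. This is a simplified model, not Tokio's implementation: Tokio gives each worker a fixed-size lock-free local run queue plus a shared injection queue, and its stealer takes about half of a victim's queue at once, whereas this sketch uses a `Mutex<VecDeque>` per worker and steals one task at a time. Tasks are plain `u64` values summed into a counter, a hypothetical stand-in for real futures.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// One run queue per worker. A Mutex stands in for Tokio's lock-free queue.
type Queue = Mutex<VecDeque<u64>>;

/// Worker loop: run local tasks first (step 1), steal when empty (step 2).
fn worker(id: usize, queues: &[Queue], sum: &AtomicU64, steals: &AtomicUsize) {
    let n = queues.len();
    loop {
        // Step 1: pop from the front of our own queue.
        let local = queues[id].lock().unwrap().pop_front();
        if let Some(task) = local {
            sum.fetch_add(task, Ordering::Relaxed); // "run" the task
            continue;
        }
        // Step 2: our queue is empty, so try the back of the other queues.
        let stolen = (1..n)
            .map(|offset| (id + offset) % n)
            .find_map(|victim| queues[victim].lock().unwrap().pop_back());
        match stolen {
            Some(task) => {
                steals.fetch_add(1, Ordering::Relaxed);
                sum.fetch_add(task, Ordering::Relaxed);
            }
            // Tasks in this model never spawn new tasks, so finding every
            // queue empty means all work has already been claimed.
            None => return,
        }
    }
}

/// Start one thread per initial queue; return (sum of task values, steal count).
fn run_work_stealing(initial: Vec<Vec<u64>>) -> (u64, usize) {
    let queues: Arc<Vec<Queue>> =
        Arc::new(initial.into_iter().map(|q| Mutex::new(VecDeque::from(q))).collect());
    let sum = Arc::new(AtomicU64::new(0));
    let steals = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..queues.len())
        .map(|id| {
            let (queues, sum, steals) =
                (Arc::clone(&queues), Arc::clone(&sum), Arc::clone(&steals));
            thread::spawn(move || worker(id, &queues, &sum, &steals))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    (sum.load(Ordering::Relaxed), steals.load(Ordering::Relaxed))
}

fn main() {
    // All 1000 tasks start on worker 0; workers 1 to 3 begin idle and must steal.
    let initial: Vec<Vec<u64>> = vec![(1..=1000).collect(), vec![], vec![], vec![]];
    let (sum, steals) = run_work_stealing(initial);
    assert_eq!(sum, 500_500); // every task ran exactly once
    println!("sum = {sum}, tasks stolen = {steals}");
}
```

The steal count varies from run to run, which is the point: how evenly work spreads depends on timing, not on any global view of the load.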

The scheduling overhead comes from synchronization between processors. Compare-and-swap (CAS) operations are a common way to reduce that cost, but CAS does not **scale perfectly with core count**: the more cores that contend for the same data, the more CAS attempts fail and must be retried.

Although this overhead arises only when a processor tries to steal, work stealing makes it hard to keep the workload of all processors balanced, which leads to high tail latency under heavy traffic.
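The CAS cost described above can be observed with a standalone sketch (not Tokio code): several threads increment one shared counter through a `compare_exchange_weak` retry loop, and each thread counts how many of its attempts failed because another thread changed the value first (or, on some architectures, failed spuriously). Those retries are wasted work, and they typically grow as more threads contend for the same cache line. `cas_increment` and `contend` are hypothetical helper names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Add one to `counter` with a CAS loop; return how many attempts failed.
fn cas_increment(counter: &AtomicU64) -> u64 {
    let mut failures = 0;
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return failures,
            Err(actual) => {
                // Another thread won the race (or a spurious failure): retry.
                failures += 1;
                current = actual;
            }
        }
    }
}

/// Run `threads` threads, each doing `per_thread` CAS increments.
/// Returns (final counter value, total failed CAS attempts).
fn contend(threads: usize, per_thread: u64) -> (u64, u64) {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                (0..per_thread).map(|_| cas_increment(&counter)).sum::<u64>()
            })
        })
        .collect();
    let failures: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (counter.load(Ordering::Relaxed), failures)
}

fn main() {
    for threads in [1, 2, 4, 8] {
        let (total, failures) = contend(threads, 100_000);
        assert_eq!(total, threads as u64 * 100_000); // no increment is lost
        println!("{threads} threads: {failures} failed CAS attempts");
    }
}
```

The result is always correct; what changes with the thread count is how much work is thrown away on retries, and the exact numbers depend on the machine.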

## How Intel Dynamic Load Balancer helps Tokio

Intel DLB can act as a lockless multi-producer, multi-consumer queue. We used it to replace the Tokio scheduler, as shown below:
![work-balancing scheduler](/uploads/work-balancing-scheduler.png)

The new work-balancing scheduler works as follows:
1. Threads spawn tasks.
2. Threads send the tasks to Intel DLB.
3. Intel DLB notifies threads to fetch tasks; each thread puts the tasks it receives into its own queue and runs them.

In this way, Intel DLB balances the workload across all threads and scales with core count, without the CAS contention of work stealing.
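The three steps above can be approximated in software by replacing per-worker stealing with one shared multi-producer, multi-consumer queue that hands each task to whichever worker is free. In the sketch below a `Mutex<VecDeque>` plus a `Condvar` stands in for Intel DLB; the real device does this in hardware without locks and is programmed through Intel's own software, whose API is not shown here. `Balancer`, `send`, `recv`, and `run_balanced` are hypothetical names, and the per-thread local queue from step 3 is omitted for brevity.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Software stand-in for the hardware queue: producers `send`, idle workers
/// block in `recv` until a task arrives or the queue is closed.
struct Balancer {
    state: Mutex<(VecDeque<u64>, bool)>, // (pending tasks, closed?)
    ready: Condvar,
}

impl Balancer {
    fn new() -> Self {
        Balancer { state: Mutex::new((VecDeque::new(), false)), ready: Condvar::new() }
    }

    /// Step 2: a thread hands a spawned task to the balancer.
    fn send(&self, task: u64) {
        self.state.lock().unwrap().0.push_back(task);
        self.ready.notify_one(); // wake one idle worker
    }

    /// No more tasks will be sent; wake every worker so it can exit.
    fn close(&self) {
        self.state.lock().unwrap().1 = true;
        self.ready.notify_all();
    }

    /// Step 3: a worker is notified and takes the next task (None = shut down).
    fn recv(&self) -> Option<u64> {
        let mut state = self.state.lock().unwrap();
        loop {
            if let Some(task) = state.0.pop_front() {
                return Some(task);
            }
            if state.1 {
                return None;
            }
            state = self.ready.wait(state).unwrap();
        }
    }
}

/// Run `workers` consumer threads and return how many tasks each one ran.
fn run_balanced(tasks: u64, workers: usize) -> Vec<usize> {
    let balancer = Arc::new(Balancer::new());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let b = Arc::clone(&balancer);
            thread::spawn(move || {
                let mut ran: usize = 0;
                while let Some(_task) = b.recv() {
                    ran += 1; // "run" the task
                }
                ran
            })
        })
        .collect();
    // Step 1: a producer spawns tasks and sends them to the balancer.
    for t in 0..tasks {
        balancer.send(t);
    }
    balancer.close();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let per_worker = run_balanced(10_000, 4);
    assert_eq!(per_worker.iter().sum::<usize>(), 10_000);
    println!("tasks run per worker: {per_worker:?}");
}
```

Because no worker owns a backlog, an idle worker never has to go looking for work; it simply waits to be handed the next task. The exact split between workers depends on OS thread scheduling, so only the total is checked.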

## How to deploy the benchmark

Intel DLB-enabled Tokio benefits most under high traffic, such as at an ingress. Since Linkerd2 is meant to work alongside existing ingress solutions such as Nginx Ingress, we deployed the benchmark environment as shown below:
![dlb-benchmark-env](/uploads/dlb-benchmark-env.png)

In our lab, we compared a baseline of plain Linkerd2-Proxy against Linkerd2-Proxy with Intel DLB. The DLB-enabled proxy achieved higher requests per second and lower latency.
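Latency comparisons like this one usually report tail latency as a high percentile of per-request latencies, such as p99. A minimal nearest-rank percentile helper in Rust is sketched below; `percentile` is a hypothetical helper, and the sample values are made up for illustration and are not measurements from this benchmark.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of all samples are less than or equal to it.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(f64::total_cmp);
    // 1-based rank = ceil(p * n / 100); p = 0 is mapped to the minimum.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // Hypothetical per-request latencies in milliseconds.
    let latencies = [3.0, 1.0, 40.0, 2.0, 3.0, 9.0, 2.0, 5.0, 3.0, 4.0];
    for p in [50.0, 90.0, 99.0] {
        println!("p{p}: {:?} ms", percentile(&latencies, p));
    }
}
```

With these samples the median is 3 ms while p99 is 40 ms: a single slow request dominates the tail, which is why load imbalance shows up in tail latency before it shows up in averages.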

## Conclusion

Intel DLB, the hardware accelerator built into Intel Sapphire Rapids processors, gives Linkerd2 a hardware-accelerated scheduling path. It avoids the CAS scaling and workload imbalance issues of work stealing and reduces latency, which makes it suitable for ingress gateways and other scenarios that need to handle high traffic efficiently.

The 4th Gen Intel Xeon Scalable processor, codenamed Sapphire Rapids, is the successor to Ice Lake. It is built on the Intel 7 process node (formerly 10nm) and has up to 60 Golden Cove cores per processor, along with new built-in accelerators that deliver significant performance improvements over the previous generation. DLB is one of these accelerators. If you are interested, see Intel's official website for [more information on the built-in accelerators](https://www.intel.com/content/www/us/en/now/xeon-accelerated/accelerators.html).

The whole solution is experimental. If you are interested, please contact [me](mailto:loong.dai@intel.com) for details.

We firmly believe that, as cloud computing and service meshes evolve, solutions that combine software and hardware can give users higher performance.
Review comment: Hi @daixiang0. Do you have any data or graphs that can be included to support this statement?
Reply: The lab data cannot be made public; maybe just remove those words?
As discussed in Slack before, I do not have a public environment to test it.