Nicola Bonelli edited this page Jul 14, 2017 · 10 revisions

Description

This page reports the performance of PFQ running on different Intel processors: two Xeon models and a Core i7.

Processors

  • Intel(R) Xeon(R) CPU X5650, 6 cores @ 2.66GHz, 16GB RAM, NIC Intel 10G 82599.

  • Intel(R) Core(TM) CPU i7-2600, 4 cores @ 3.40GHz, 8GB RAM, NIC Intel 10G 82599.

  • Intel(R) Xeon(R) CPU E5-1660 v3 @ 3.00GHz

Software configuration

  • To properly enable DCA, load the ioatdma kernel module first.

  • The device driver is Intel ixgbe-3.23.2.1, compiled with the [pfq-omatic](https://github.com/pfq/PFQ/wiki/PFqOmatic) script.

  • The PFQ kernel module is configured and loaded with pfq-load, using the following config file:

# ~/.pfq.conf 

Config
{
    pfq_module   = "/opt/PFQ/kernel/pfq.ko",

    pfq_options  = [ "capture_incoming=0", "capt_batch_len=32", "xmit_batch_len=128", "skb_pool_size=256" ],

    exclude_core = [],

    irq_affinity = "round-robin",

    drivers =
    [
        Driver
        {
            drvmod  = "/opt/ixgbe/src/ixgbe.ko",
            drvopt  = [ "LRO=0,0", "DCA=1,1", "AtrSampleRate=0,0" ],

            devices =
            [
               Device
               {
                   devname  = "eth2",
                   devspeed = Just 10000,
                   flowctrl = No,
                   ethopt   = [("-G", "tx", 768),
                               ("-C", "tx-frames-irq", 1024),
                               ("-C", "rx-usecs", 50)]
               }
            ]
        }
    ]
}
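
The `ethopt` entries in the config above read like ethtool settings. As a rough sketch, the snippet below expands them into command lines. It assumes that each `(option, parameter, value)` tuple is applied to the device as `ethtool <option> <devname> <parameter> <value>`. That mapping is an assumption of this example, not something stated on this page, and `ethtool_commands` is a hypothetical helper, not part of pfq-load.

```python
# Hypothetical helper, not part of pfq-load.
# Assumption: each (option, parameter, value) tuple in ethopt is applied as
#   ethtool <option> <devname> <parameter> <value>

def ethtool_commands(devname, ethopt):
    """Return one ethtool command line per (option, parameter, value) tuple."""
    return [f"ethtool {opt} {devname} {param} {value}"
            for opt, param, value in ethopt]

# Values copied from the Device block of the config above.
ethopt = [("-G", "tx", 768),
          ("-C", "tx-frames-irq", 1024),
          ("-C", "rx-usecs", 50)]

commands = ethtool_commands("eth2", ethopt)
for cmd in commands:
    print(cmd)
```

Under that assumption, the three entries set the TX ring size (`-G`) and two interrupt coalescing parameters (`-C`) on eth2.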

Test Single Thread

  • traffic generated: 60-byte UDP packets with random IP addresses, at 14.8 Mpps
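
The 14.8 Mpps figure is close to the theoretical packet rate of a 10G Ethernet link carrying minimum-size frames. The sketch below works that out. It assumes that the 60-byte packet length excludes the 4-byte Ethernet FCS, and it adds the standard per-frame overhead on the wire (8 bytes of preamble and start-of-frame delimiter, 12 bytes of inter-frame gap). The function name `line_rate_pps` is illustrative only.

```python
# Standard Ethernet per-frame overhead, in bytes.
FCS_BYTES = 4             # frame check sequence appended to each frame
PREAMBLE_SFD_BYTES = 8    # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12 # minimum idle time between frames

def line_rate_pps(frame_bytes_without_fcs, link_bps):
    """Maximum packets per second for a given frame size and link speed.

    Assumption: the frame size passed in does not include the FCS.
    """
    wire_bytes = (frame_bytes_without_fcs + FCS_BYTES
                  + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES)
    return link_bps / (wire_bytes * 8)

rate_10g = line_rate_pps(60, 10_000_000_000)
print(f"{rate_10g / 1e6:.2f} Mpps")  # prints "14.88 Mpps"
```

The same function with `link_bps=20_000_000_000` gives the upper bound for a capture from two 10G ports.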

Capture 10G

  • Xeon processor: pfq-counters -c 64 -t 0.5.eth2

  • i7 processor: pfq-counters -c 64 -t 0.3.eth2

RSS | setup | Xeon E5-1660 v3 | Xeon X5650 | i7-2600
----|--------------|-----------------|------------|----------
1 | pfq-load -q1 | 8.3 Mpps | 5.72 Mpps | 6.4 Mpps
2 | pfq-load -q2 | ~14.8 Mpps | 10.6 Mpps | 12.1 Mpps
3 | pfq-load -q3 | ~14.8 Mpps | ~14.8 Mpps | 14.6 Mpps
4 | pfq-load -q4 | ~14.8 Mpps | ~14.8 Mpps | 14.4 Mpps

Capture 20G

  • Xeon processor: pfq-counters -c 64 -t 0.7.eth2:eth4

Note: this test uses a single thread capturing traffic from two different NICs. The IRQ affinities of the two NICs do not overlap.

RSS | setup | Xeon E5-1660 v3
----|--------------|----------------
1 | pfq-load -q1 | 15.48 Mpps
2 | pfq-load -q2 | ~26.97 Mpps
3 | pfq-load -q3 | ~29.13 Mpps
4 | pfq-load -q4 | ~29.13 Mpps

Traffic generation

  • Xeon processor: pfq-gen -l 60 -R -t 0.5.eth2 -k 1,2...

  • i7 processor: pfq-gen -l 60 -R -t 0.3.eth2 -k 1,2...

TSS | setup | TX speed (Xeon) | TX speed (i7)
----|------------|-----------------|--------------
1 | -k 1 | 7.5 Mpps | 9.5 Mpps
2 | -k 1,2 | 14.8 Mpps | 13.8 Mpps
3 | -k 1,2,3 | 14.8 Mpps | 13.0 Mpps
4 | -k 1,2,3,4 | 14.8 Mpps | -

Test Multiple Threads

The traffic is balanced across two or more user-space threads with the PFQ/lang function steer_flow. Additional steering functions are described in the [PFQ/lang wiki](http://www.pfq.io/v6.x/lang/haskell/Network-PFQ-Lang-Default.html).

  • traffic generated: 60-byte UDP packets with random IP addresses, at 14.8 Mpps
  • capture tool: user/tool/pfq-counters
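
To illustrate the idea behind flow-based balancing, here is a minimal sketch of how packets can be assigned to threads so that every packet of a flow reaches the same thread. It is not PFQ's implementation of steer_flow. The hash (CRC32), the symmetric ordering of the endpoints, and the function name `flow_to_thread` are all assumptions made for this example, and it handles IPv4 only.

```python
import ipaddress
import struct
import zlib

def flow_to_thread(src_ip, src_port, dst_ip, dst_port, n_threads):
    """Map an IPv4 flow to a thread index in [0, n_threads).

    Illustrative only: the two endpoints are sorted before hashing, so both
    directions of a flow yield the same index (a design choice of this
    sketch, not a documented property of steer_flow).
    """
    a = (int(ipaddress.IPv4Address(src_ip)), src_port)
    b = (int(ipaddress.IPv4Address(dst_ip)), dst_port)
    (ip_lo, port_lo), (ip_hi, port_hi) = sorted((a, b))
    # Pack the ordered endpoints into a fixed 12-byte key and hash it.
    key = struct.pack("!IHIH", ip_lo, port_lo, ip_hi, port_hi)
    return zlib.crc32(key) % n_threads

# Two threads, as in a capture that binds two user-space threads.
forward = flow_to_thread("10.0.0.1", 1234, "10.0.0.2", 80, 2)
reverse = flow_to_thread("10.0.0.2", 80, "10.0.0.1", 1234, 2)
print(forward == reverse)  # prints "True"
```

With random IP addresses in the generated traffic, a hash of this kind spreads flows roughly evenly across the threads, which is what makes the per-thread load comparable in a test like this one.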

Command Line

  • Xeon processor: pfq-counters -f steer_flow -c 64 -t 0.5.eth2 -t 0.4

  • i7 processor: pfq-counters -f steer_flow -c 64 -t 0.3.eth2 -t 0.2

RSS | setup | Xeon E5-1660 v3 | Xeon X5650 | i7-2600
----|--------------|-----------------|------------|-----------
1 | pfq-load -q1 | 7.05 Mpps | 5 Mpps | 5.44 Mpps
2 | pfq-load -q2 | 14.07 Mpps | 9.7 Mpps | 10.33 Mpps
3 | pfq-load -q3 | ~14.8 Mpps | 14.1 Mpps | 14.6 Mpps
4 | pfq-load -q4 | 14.8 Mpps | 14.6 Mpps | 14.5 Mpps