Configuring port recirculation #33

Closed
kfertakis opened this issue Nov 21, 2022 · 6 comments

@kfertakis

Hi,

I'm running some SwitchML benchmarks on a BF2556-1T switch and would like to use some of its QSFP ports in the experiments. However, these ports belong to pipe 3, whose ports are all statically reserved for packet recirculation by the SwitchML P4 code here:

state parse_port_metadata {
    // parse port metadata
    ig_md.port_metadata = port_metadata_unpack<port_metadata_t>(pkt);
    transition select(ig_intr_md.ingress_port) {
        64: parse_recirculate; // pipe 0 CPU Eth port
        68: parse_recirculate; // pipe 0 recirc port
        320: parse_ethernet;   // pipe 2 CPU PCIe port
        0x080 &&& 0x180: parse_recirculate; // all pipe 1 ports
        0x100 &&& 0x180: parse_recirculate; // all pipe 2 ports
        0x180 &&& 0x180: parse_recirculate; // all pipe 3 ports
        default: parse_ethernet;
    }
}
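
To see which ingress ports this `select` sends to `parse_recirculate`, the matches can be replayed outside the switch. The following is a minimal Python sketch, not part of SwitchML. The names `ENTRIES` and `parser_state` are hypothetical, and the sketch assumes that `value &&& mask` matches when `(port & mask) == (value & mask)`, that the first matching entry wins, and that bits 7 and 8 of the 9-bit port number select the pipe, as the comments in the parser suggest.

```python
# Hypothetical model of the select() in parse_port_metadata (illustration only).
# Each entry is (value, mask, next_state); exact matches use a full 9-bit mask.
ENTRIES = [
    (64, 0x1FF, "parse_recirculate"),     # pipe 0 CPU Eth port
    (68, 0x1FF, "parse_recirculate"),     # pipe 0 recirc port
    (320, 0x1FF, "parse_ethernet"),       # pipe 2 CPU PCIe port
    (0x080, 0x180, "parse_recirculate"),  # all pipe 1 ports
    (0x100, 0x180, "parse_recirculate"),  # all pipe 2 ports
    (0x180, 0x180, "parse_recirculate"),  # all pipe 3 ports
]


def parser_state(ingress_port: int) -> str:
    """Return the parser state chosen for a port; the first match wins."""
    for value, mask, state in ENTRIES:
        if (ingress_port & mask) == (value & mask):
            return state
    return "parse_ethernet"  # default branch


if __name__ == "__main__":
    for port in (0, 60, 64, 68, 128, 320, 324, 448):
        print(f"port {port:3d} (pipe {port >> 7}): {parser_state(port)}")
```

Under these assumptions, only pipe 0 front-panel ports and port 320 reach `parse_ethernet`; every other port is treated as a recirculation port.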

Could you let me know what changes I would need to make in order to make use of these ports? Could I perhaps substitute the target ports with some other free ports from pipe 0 for performing packet recirculation? Is there a specific number of ports that need to be in recirculation mode in order for SwitchML to operate normally?

Thank you in advance,

@AmedeoSapio
Member

Hi,
With 256B payloads (64 elements per packet), packets from the workers are processed, or "consumed", without recirculation. Only the packet from the last worker is recirculated once to collect, or "harvest", the results before the packet is sent back to the workers. In this case we recirculate only one packet in every N (N = number of workers), so we use only one port for recirculation (port 68, which is in loopback mode by default).

See the PacketSize.MTU_256 entries of the next_step_selector, particularly this one here:

( PacketSize.MTU_256, None, PacketType.CONSUME0, Flag.FIRST, None, 3, 'recirculate_for_HARVEST7', port[7]),

The 2 passes through Tofino are coded as:
CONSUME0 --- if it is the last packet ---> HARVEST7

In the folded-pipe case, to process 1024B (256 elements) per packet, we recirculate every packet 3 times (we need 4 passes through Tofino to consume the data), and only the packet from the last worker is recirculated 7 additional times to harvest the results (Tofino1 ALUs have asymmetric read and write bandwidth). See the PacketSize.MTU_1024 entries of the next_step_selector.

The 11 passes through Tofino are coded as:
CONSUME0 -- CONSUME1 -- CONSUME2 -- CONSUME3 --- if it is the last packet ---> HARVEST1 -- HARVEST2 -- HARVEST3 -- HARVEST4 -- HARVEST5 -- HARVEST6 -- HARVEST7

Each CONSUME pass writes 256B in Tofino's registers, and each HARVEST pass reads 128B from the registers. We skip HARVEST0 because we are able to merge the CONSUME3 pass with the collection of the first chunk of results.
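
The pass counts above follow from the two per-pass sizes. Here is a small Python sketch of that arithmetic. The function name `passes` is hypothetical, and applying the same "first harvest chunk is merged into the last CONSUME pass" rule to the 256B case is an assumption, although it does reproduce the 2 passes described earlier.

```python
CONSUME_BYTES_PER_PASS = 256  # each CONSUME pass writes 256B to the registers
HARVEST_BYTES_PER_PASS = 128  # each HARVEST pass reads 128B from the registers


def passes(payload_bytes: int) -> tuple[int, int, int]:
    """Return (consume passes, harvest passes, total) for the last worker's packet."""
    consume = payload_bytes // CONSUME_BYTES_PER_PASS
    harvest_chunks = payload_bytes // HARVEST_BYTES_PER_PASS
    # Assumption: the first chunk of results is always collected during the
    # last CONSUME pass, so one HARVEST pass (HARVEST0) is skipped.
    harvest = harvest_chunks - 1
    return consume, harvest, consume + harvest


if __name__ == "__main__":
    print(passes(1024))  # (4, 7, 11)
    print(passes(256))   # (1, 1, 2)
```

Packets from workers other than the last one stop after the CONSUME passes, so they see only the first number.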

For the additional 3 CONSUME passes, we use all the front-panel ports of pipes 1, 2, and 3. Since we recirculate all packets, each recirculation pass needs as much bandwidth as the incoming traffic. For the HARVEST passes (done only for 1 in N packets) we use ports 64 and 68 in all pipes for recirculation, except port 64 in pipe 2 (port 320), because that is the one that goes to the switch CPU:

port = {1: 452, 2: 324, 3: 448, 4: 196, 5: 192, 6: 64, 7: 68}
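
As a cross-check, the port numbers in this dictionary can be split into a pipe and a pipe-local port. This is a minimal Python sketch, assuming the port number is `pipe * 128 + local_port` (consistent with "port 64 in pipe 2" being port 320); `split` and `expected` are hypothetical names used only for illustration.

```python
# Dictionary quoted from the thread: harvest step -> recirculation port.
port = {1: 452, 2: 324, 3: 448, 4: 196, 5: 192, 6: 64, 7: 68}
CPU_PORT = 320  # port 64 in pipe 2, which goes to the switch CPU


def split(dev_port: int) -> tuple[int, int]:
    """Split a port number into (pipe, pipe-local port); assumes 7 local bits."""
    return dev_port >> 7, dev_port & 0x7F


# Ports 64 and 68 in all four pipes, minus the CPU port.
expected = {(pipe << 7) | local for pipe in range(4) for local in (64, 68)} - {CPU_PORT}

if __name__ == "__main__":
    for step, dev_port in sorted(port.items()):
        pipe, local = split(dev_port)
        print(f"port[{step}] = {dev_port}: pipe {pipe}, local port {local}")
    print("matches the description:", set(port.values()) == expected)
```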

You can change the ports used for recirculation by changing:

  1. the parser, as you mentioned
  2. the port dictionary in next_step_selector
  3. the ports that are put in loopback mode at the start of the controller

loopback_ports = (
    [64] +  # Pipe 0 CPU ethernet port
    # Pipe 0: all 16 front-panel ports
    # list(range(0, 0 + 64, 4)) +
    # Pipe 1: all 16 front-panel ports
    list(range(128, 128 + 64, 4)) +
    # Pipe 2: all 16 front-panel ports
    list(range(256, 256 + 64, 4)) +
    # Pipe 3: all 16 front-panel ports
    list(range(384, 384 + 64, 4)))
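
If the set of recirculation pipes needs to change, the list can be generated from the chosen pipes instead of being edited by hand. The following is only a sketch: `front_panel_ports` and `build_loopback_ports` are hypothetical helpers, not functions of the SwitchML controller, and they assume that the front-panel ports of pipe p are `p * 128, p * 128 + 4, ..., p * 128 + 60`, as in the listing above.

```python
def front_panel_ports(pipe: int) -> list[int]:
    """All 16 front-panel ports of a pipe (assumed layout: step 4 from pipe * 128)."""
    base = pipe * 128
    return list(range(base, base + 64, 4))


def build_loopback_ports(recirc_pipes: list[int]) -> list[int]:
    """Port 64 (pipe 0 CPU ethernet port) plus the front-panel ports of the given pipes."""
    ports = [64]
    for pipe in recirc_pipes:
        ports += front_panel_ports(pipe)
    return ports


if __name__ == "__main__":
    default_ports = build_loopback_ports([1, 2, 3])  # same list as above
    print(len(default_ports), "loopback ports:", default_ports)
```

Any change made here still has to be mirrored in the parser and in the port dictionary, as listed in the three steps above.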

One important detail to keep in mind is that registers in Tofino are pipe-local. For this reason:

  • CONSUME0 must happen in the same pipe as HARVEST6 and HARVEST7
  • CONSUME1 must happen in the same pipe as HARVEST4 and HARVEST5
  • CONSUME2 must happen in the same pipe as HARVEST2 and HARVEST3
  • CONSUME3 must happen in the same pipe as HARVEST1

because each harvest pass reads the data written during the corresponding consume pass.
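
When remapping ports, these pairings can be checked mechanically before touching the switch. Below is a minimal Python sketch. `SAME_PIPE` just restates the four constraints above; `violations` is a hypothetical helper, and the pass-to-pipe assignment in `example` is made up for illustration and is not claimed to be the layout SwitchML actually uses.

```python
# CONSUME pass -> HARVEST passes that read what it wrote (from the list above).
SAME_PIPE = {
    "CONSUME0": ["HARVEST6", "HARVEST7"],
    "CONSUME1": ["HARVEST4", "HARVEST5"],
    "CONSUME2": ["HARVEST2", "HARVEST3"],
    "CONSUME3": ["HARVEST1"],
}


def violations(pipe_of_pass: dict[str, int]) -> list[str]:
    """List every HARVEST pass placed in a different pipe than its CONSUME pass."""
    problems = []
    for consume, harvests in SAME_PIPE.items():
        for harvest in harvests:
            if pipe_of_pass[harvest] != pipe_of_pass[consume]:
                problems.append(
                    f"{harvest} runs in pipe {pipe_of_pass[harvest]}, "
                    f"but {consume} wrote in pipe {pipe_of_pass[consume]}"
                )
    return problems


# Illustrative assignment only (assumption, not taken from the SwitchML code).
example = {
    "CONSUME0": 0, "HARVEST6": 0, "HARVEST7": 0,
    "CONSUME1": 1, "HARVEST4": 1, "HARVEST5": 1,
    "CONSUME2": 2, "HARVEST2": 2, "HARVEST3": 2,
    "CONSUME3": 3, "HARVEST1": 3,
}

if __name__ == "__main__":
    print(violations(example))                       # no violations
    print(violations(dict(example, HARVEST1=0)))     # one violation
```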
Let me know if you have additional questions.

AmedeoSapio self-assigned this Nov 22, 2022
AmedeoSapio added the labels documentation (Improvements or additions to documentation) and enhancement (New feature or request) Nov 22, 2022
@kfertakis
Author

Thank you very much for the thorough walkthrough.

So just to clarify: on 2-pipe switches such as the BF2556-1T I'm using, the folded-pipe case is not enabled, so payloads are 256B (64 elements per packet). In that case, to use the additional front-panel ports that belong to the second pipe, all that is needed is to change the parser to whitelist these ports, correct? The only port used for recirculation, as you described, is port 68, which is already in loopback mode by default. So there is no need to change the port dictionary or the list of ports that are manually put in loopback mode for the folded-pipe case. Is this correct, or did I miss something?

@AmedeoSapio
Member

AmedeoSapio commented Nov 24, 2022 via email

@kfertakis
Author

Could there be any other place where the ports of pipes 1, 2, and 3 are disabled? I have commented out the recirculation parsing logic as shown below:

state parse_port_metadata {
        // parse port metadata
        ig_md.port_metadata = port_metadata_unpack<port_metadata_t>(pkt);

        transition select(ig_intr_md.ingress_port) {
            64: parse_recirculate; // pipe 0 CPU Eth port
            68: parse_recirculate; // pipe 0 recirc port
            320: parse_ethernet;   // pipe 2 CPU PCIe port
            // 0x080 &&& 0x180: parse_recirculate; // all pipe 1 ports
            // 0x100 &&& 0x180: parse_recirculate; // all pipe 2 ports
            // 0x180 &&& 0x180: parse_recirculate; // all pipe 3 ports
            default:  parse_ethernet;
        }
    }

but I get strange behavior when I include a worker attached to one of these ports in the execution:

[image]

Packets don't seem to be sent back to the workers; rather, they seem to be dropped. Also, the workers locally report zero received packets. Any idea why that could be? Thank you again.

@kfertakis
Author

Could it perhaps be related to what you described, @AmedeoSapio: registers in Tofino are pipe-local, and thus we can't have workers of the same job spanning different pipes, as that would lead to CONSUME0 happening on a different pipe than the HARVEST passes? Thanks.

@kfertakis
Author

So I think the issue is indeed related to having workers on different pipes concurrently on the same job. I've tested this by only including workers on pipe number 3 (previously reserved in the parser for recirculation) and the execution completed successfully. I'll open a separate issue for this for better readability.
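
A quick way to spot this situation before starting a job is to group the workers' switch ports by pipe. This is a rough Python sketch based only on the observation above; `workers_by_pipe` is a hypothetical helper, the port numbers in the example are illustrative, and the pipe is again assumed to be `port >> 7`.

```python
from collections import defaultdict


def workers_by_pipe(worker_ports: list[int]) -> dict[int, list[int]]:
    """Group worker-facing switch ports by pipe (assumes pipe = port >> 7)."""
    groups = defaultdict(list)
    for dev_port in worker_ports:
        groups[dev_port >> 7].append(dev_port)
    return dict(groups)


if __name__ == "__main__":
    mixed = workers_by_pipe([0, 4, 392])   # workers on pipe 0 and pipe 3
    single = workers_by_pipe([384, 388])   # all workers on pipe 3
    for name, groups in (("mixed", mixed), ("single", single)):
        status = "spans several pipes" if len(groups) > 1 else "stays on one pipe"
        print(name, groups, status)
```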
