INT extensions for domain specific features #54

Merged
Changes from 10 commits
205 changes: 194 additions & 11 deletions telemetry/specs/INT.mdk
@@ -2,7 +2,7 @@ Title : In-band Network Telemetry (INT) Dataplane Specification
Title Note : Working draft. Note: consider using tagged versions for implementation.
Title Footer: 2018-05-08
Author : The P4.org Applications Working Group. Contributions from
Affiliation : *Alibaba, Arista, Barefoot Networks, Dell, Intel, Marvell, Netronome, VMware*
Affiliation : *Alibaba, Arista, Barefoot Networks, Cisco Systems Inc., Dell, Intel, Marvell, Netronome, VMware*
Heading depth: 5
Pdf Latex: xelatex
Document Class: [11pt]article
@@ -752,12 +752,12 @@ hop-by-hop INT header must fit in a single Geneve option.
In this section, we define the format for INT hop-by-hop metadata headers,
and the metadata itself.

INT Metadata Header and Metadata Stack:
INT Metadata Header and Metadata Stack (Version = 1):
`
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ver |Rep|C|E|M| Reserved | Hop ML |RemainingHopCnt|
|Ver = 1|Rep|C|E|M| Reserved | Hop ML |RemainingHopCnt|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Instruction Bitmap | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -821,7 +821,7 @@ The original packet must have C bit set to 0.
switch(es) set the M bit based on knowledge of the network topology
and "Switch ID, Ingress port ID, Egress port ID" tuples in the INT
metadata stack.
- R: Reserved bits.
- R (10b): Reserved bits.
- Hop ML (5b): Per-hop Metadata Length, the length of metadata in 4-Byte words
to be inserted at each INT hop.
- While the largest value of Per-hop Metadata Length is 31, an INT-capable
@@ -854,11 +854,14 @@ each bit corresponds to a specific standard metadata as specified in Section 3.
- bit5: Egress timestamp
- bit6: Level 2 Ingress Port ID + Egress Port ID (4 bytes each)
- bit7: Egress port Tx utilization
- bit14: Domain Specific Metadata
- bit15: Checksum Complement
- The remaining bits are reserved.
Each instruction requests 4 bytes of metadata to be inserted at each hop,
except if bit 6 is set, which requires 8 bytes of metadata. Per-hop
metadata length is set accordingly at the INT source.
except when bit 6 or bit 14 is set. If bit 6 is set, the instruction requires
8 bytes of metadata. If bit 14 is set, the instruction requires domain specific
metadata of n bytes, n being a multiple of 4 bytes. Per-hop metadata length is
set accordingly at the INT source.
* Each INT Transit device along the path that supports INT adds its own metadata
values as specified in the instruction bitmap immediately after the INT metadata
header.
@@ -910,6 +913,184 @@ from (shim header length \* 4).
For INT over Geneve it is 8 bytes subtracted from (length in Geneve tunnel
option header \* 4).


INT Metadata Header and Metadata Stack (Version = 2):
`
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver = 2|Rep|C|E|M| Reserved | Hop ML |RemainingHopCnt|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Instruction Bitmap | Domain Specific ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DS Flags | DS Field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| INT Metadata Stack (Each hop inserts Hop ML * 4B of metadata) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| . . . |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Last INT metadata |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`

* INT metadata header is 12 bytes long (three 4-byte words, as shown in the
diagram above) followed by a stack of INT metadata.
Each metadata is either 4 bytes or 8 bytes in length. Each INT hop adds
the same length of metadata. The total length of the metadata stack is
variable as different packets may traverse different paths and hence
a different number of INT hops.

* The fields in the INT metadata header are interpreted the following way:
- Ver (4b): INT metadata header version. Should be 2 for this version.
- Rep (2b): Replication requested. Support for this request is optional. If
this value is non-zero, the device may replicate the INT packet. This is useful
to explore all the valid physical forwarding paths when multi-path forwarding
techniques (e.g., ECMP, LAG) are used in the network. Note the Rep bits should
be used judiciously (e.g., only for probe packets, not for every data packet).
While we recommend that Rep bits be set only for probe packets, the INT
architecture does not (and perhaps cannot) disallow use of the Rep bits for real
data packets.
- 0: No replication requested.
- 1: Port-level (L2-level) replication requested. If the INT packet is
forwarded through a logical port that is a port-channel (LAG), then replicate
the packet on each physical port in the port-channel and send a single copy per
physical port.
- 2: Next-hop-level (L3-level) replication requested. Forward the packet
to each L3 ECMP next-hop valid for the destination address, with INT headers
replicated in each forwarded copy.
- 3: Port-level and Next-hop-level replication requested.
- C (1b): Copy.
- If replication is requested for data packets, the INT Sink must be
able to distinguish the original packet from replicas so that it can forward
only original packets up the protocol stack, and drop all the replicas. The C
bit must be set to 1 on each copy, whenever an INT hop replicates a packet.
The original packet must have C bit set to 0.
- C bit must be set to 0 in the original packet by INT source
- E (1b): Max Hop Count exceeded.
- This flag must be set if a device cannot prepend its own metadata due to
the Remaining Hop Count reaching zero.
- E bit must be set to 0 by INT source
- M (1b): MTU exceeded
- This flag must be set if a device cannot add all of the requested metadata
because doing so will cause the packet length to exceed egress link MTU.
In this case, the device must not add any metadata to the packet, and set
the M bit in the INT header. Note that it is possible for egress MTU
limitation to prevent INT metadata insertion at multiple hops along a
path. The M bit simply serves as an indication that INT metadata was not
inserted at one or more hops and corrective action such as reconfiguring
MTU at some links may be needed, particularly when INT switches are not
participating in path MTU discovery. The M bit is not aimed at readily
identifying which switch(es) did not insert INT metadata due to egress MTU
limitation. In theory, if this does not occur at consecutive hops,
it may be possible for the monitoring system to derive which
switch(es) set the M bit based on knowledge of the network topology
and "Switch ID, Ingress port ID, Egress port ID" tuples in the INT
metadata stack.
- R (10b): Reserved bits.
- Hop ML (5b): Per-hop Metadata Length, the length of metadata, including the
Domain Specific Metadata, in 4-Byte words to be inserted at each INT hop.
- While the largest value of Per-hop Baseline Metadata Length is 31, the largest
value of Per-hop Domain Specific Metadata is variable and not specified.
An INT-capable device may be limited in the maximum number of instructions
it can process and/or maximum length of metadata it can insert in data packets.
An INT hop that cannot process all instructions must still insert Per-hop
Metadata Length \* 4 bytes, with all-ones reserved value (4 or 8 bytes
of 0xFF depending on the length of metadata) for the metadata
corresponding to instructions it cannot process. An INT hop that
cannot insert Per-hop Metadata Length \* 4 bytes must skip INT
processing altogether and not insert any metadata in the packet.
- Remaining Hop Count (8b): The remaining number of hops that are allowed to
add their metadata to the packet.
- Upon creation of an INT metadata header, the INT Source must set this
value to the maximum number of hops that are allowed to add metadata
instance(s) to the packet. Each INT-capable device on the path, including
the INT Source as well as INT Transit Hops, must decrement the
Remaining Hop Count if and when it pushes its local metadata onto the
stack.
- When a packet is received with the Remaining Hop Count equal to 0, the
device must ignore the INT instruction, pushing no new metadata onto
the stack, and the device must set the E bit.
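
The field widths listed above can be checked with a small decoder. The following is a minimal Python sketch, not part of the specification: it decodes only the first two 32-bit words, which the Version 1 and Version 2 diagrams share, and it assumes network byte order (most significant bit first, as in the diagrams). The names `IntHeaderPrefix`, `parse_int_header_prefix`, and `low16` are hypothetical.

```python
import struct
from typing import NamedTuple


class IntHeaderPrefix(NamedTuple):
    """First two 32-bit words of the INT metadata header (hypothetical type)."""
    ver: int                # 4 bits
    rep: int                # 2 bits
    c: int                  # 1 bit
    e: int                  # 1 bit
    m: int                  # 1 bit
    reserved: int           # 10 bits
    hop_ml: int             # 5 bits, in 4-byte words
    remaining_hop_cnt: int  # 8 bits
    instruction_bitmap: int # 16 bits, bit0 is the MSB
    low16: int              # Reserved (Version 1) or Domain Specific ID (Version 2)


def parse_int_header_prefix(data: bytes) -> IntHeaderPrefix:
    """Decode the first 8 bytes, assuming network byte order."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    word0, bitmap, low16 = struct.unpack("!IHH", data[:8])
    return IntHeaderPrefix(
        ver=(word0 >> 28) & 0xF,
        rep=(word0 >> 26) & 0x3,
        c=(word0 >> 25) & 0x1,
        e=(word0 >> 24) & 0x1,
        m=(word0 >> 23) & 0x1,
        reserved=(word0 >> 13) & 0x3FF,
        hop_ml=(word0 >> 8) & 0x1F,
        remaining_hop_cnt=word0 & 0xFF,
        instruction_bitmap=bitmap,
        low16=low16,
    )
```

For example, `parse_int_header_prefix(bytes.fromhex("2000020690000001"))` yields Ver = 2, Hop ML = 2, Remaining Hop Count = 6, and an instruction bitmap of `0x9000` (bit0 and bit3 set).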
* INT instructions are encoded as a bitmap in the 16-bit INT Instruction field:
each bit corresponds to a specific standard metadata as specified in Section 3.
- bit0 (MSB): Switch ID
- bit1: Level 1 Ingress Port ID (16 bits) + Egress Port ID (16 bits)
- bit2: Hop latency
- bit3: Queue ID (8 bits) + Queue occupancy (24 bits)
- bit4: Ingress timestamp
- bit5: Egress timestamp
- bit6: Level 2 Ingress Port ID + Egress Port ID (4 bytes each)
- bit7: Egress port Tx utilization
- bit14: Domain Specific Instruction
- bit15: Checksum Complement
- The remaining bits are reserved.
Bits 0 - 13 are Baseline INT Instructions and Bit 14, Domain Specific Instruction.
Contributor

"Bit 14 is Domain ..."

Contributor

minor nit, let's correct.

Contributor

Also bit 14 should be added to the list of bits above.

Contributor Author

Removed Bit 14 and added text to address Domain Specific processing

Domain Specific Instruction is an instruction that requires additional
processing of Domain Specific ID and Domain Specific Flags. If the
Domain Specific ID doesn't match the Domain ID of the node, then the transit
node is required to pad the node's INT Metadata stack with the special all-ones
reserved value for Domain Specific Metadata length as calculated by subtracting
the Baseline Metadata Length from Hop_ML.
If the Domain Specific ID matches the node's Domain ID then additional processing
of the Domain Specific Flags and Domain Specific Field is required and
Domain Specific Metadata is appended to the Baseline Metadata before
Checksum Complement is inserted.
Each instruction requests 4 bytes of metadata to be inserted at each hop,
except when bit 6 or bit 14 is set. If bit 6 is set, the instruction requires
8 bytes of metadata. If bit 14 is set, the instruction requires domain specific
metadata of n bytes, n being a multiple of 4 bytes. Per-hop metadata length is
set accordingly at the INT source.
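
The length rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not text from the specification: every set bit other than bit 6 and bit 14 is counted as 4 bytes (including reserved bits and the Checksum Complement), and the Domain Specific Metadata length is taken as whatever remains of Hop ML after all other requested metadata. The function names and the `ds_metadata_bytes` parameter are hypothetical.

```python
DS_BIT = 14    # Domain Specific Instruction
WIDE_BIT = 6   # Level 2 Ingress + Egress Port ID, 8 bytes


def bit_mask(bit: int) -> int:
    """Mask for instruction bit `bit`; bit0 is the MSB of the 16-bit field."""
    return 1 << (15 - bit)


def hop_ml_words(instruction_bitmap: int, ds_metadata_bytes: int = 0) -> int:
    """Per-hop Metadata Length in 4-byte words, as an INT source might compute it."""
    if ds_metadata_bytes < 0 or ds_metadata_bytes % 4 != 0:
        raise ValueError("domain specific metadata must be a multiple of 4 bytes")
    total_bytes = 0
    for bit in range(16):
        if not instruction_bitmap & bit_mask(bit):
            continue
        if bit == WIDE_BIT:
            total_bytes += 8
        elif bit == DS_BIT:
            total_bytes += ds_metadata_bytes
        else:
            total_bytes += 4  # assumption: all other bits request 4 bytes
    return total_bytes // 4


def ds_padding(hop_ml: int, instruction_bitmap: int) -> bytes:
    """All-ones padding a transit node inserts when the Domain Specific ID
    does not match its own Domain ID (assumed reading of the rule above)."""
    other_words = hop_ml_words(instruction_bitmap & ~bit_mask(DS_BIT))
    ds_words = hop_ml - other_words
    if ds_words < 0:
        raise ValueError("Hop ML is smaller than the non-domain-specific metadata")
    return b"\xff" * (ds_words * 4)
```

With Switch ID, queue occupancy, and 8 bytes of domain specific metadata requested, `hop_ml_words(0x9002, ds_metadata_bytes=8)` gives 4 words, and a node in a different domain would pad with `ds_padding(4, 0x9002)`, which is 8 bytes of 0xFF.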
* Each INT Transit device along the path that supports INT adds its own metadata
values as specified in the instruction bitmap immediately after the INT metadata
header.
- When adding a new metadata, each device must prepend its metadata in
front of the metadata that are already added by the upstream devices.
This is similar to the push operation on a stack. Hence, the most recently
added metadata appears at the top of the stack. The device must add
metadata in the order of bits set in the instruction bitmap.
- If a device is unable to provide a metadata value specified in the
instruction bitmap because its value is not available, it must add a special
all-ones reserved value indicating "invalid" (4 or 8 bytes of 0xFF
depending on metadata length).
- If a device cannot add all the metadata required by the instruction bitmap
(irrespective of the availability of the metadata values that are asked
for), it must skip processing that particular INT packet entirely. This
ensures that each INT Transit device adds either zero bytes or
Per-hop Metadata Length\*4 bytes to the packet.
- Reserved bits in the instruction bitmap are to be handled similarly. If an
INT transit hop receives a reserved bit set in the instruction bitmap (e.g.
set by an INT source that is running a newer version), the transit hop must
either add corresponding metadata filled with the reserved value 0xFFFFFFFF
or must not add any INT metadata to the packet. This means that an
instruction bit marked reserved in this specification may be
used for a 4B metadata in a subsequent minor version while still being
backward compatible with this specification. However, an instruction bit
marked reserved in this specification may be used for an 8B metadata only
in the next major version, breaking backward compatibility and requiring all
INT switches to be upgraded to the new major version. For example,
a version 1.0 INT switch cannot operate alongside version 2.0 INT switches
if a new 8B metadata is introduced in version 2.0, as the version 1.0
INT switch could insert the 4-byte 0xFFFFFFFF reserved value for an 8B metadata field,
thus breaking the metadata stack length invariance: the length of the
metadata stack will not be a multiple of Per-Hop Metadata Length \* 4
in this case.
- If an INT transit hop does not add metadata to a packet due to any of the
above reasons, it must not decrement the remaining INT hop count in the INT
metadata header.
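
The transit-hop rules above (all-or-nothing insertion, E and M flag handling, and decrementing the hop count only on a successful push) can be sketched as below. This is a simplified Python model under assumptions: the order in which the hop-count and MTU checks are evaluated is not stated in the text and is chosen arbitrarily here, and `IntState`, `transit_hop`, `packet_len`, and `egress_mtu` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class IntState:
    """Mutable subset of the INT header plus the metadata stack (hypothetical)."""
    hop_ml: int              # Per-hop Metadata Length, in 4-byte words
    remaining_hop_cnt: int
    e: int = 0
    m: int = 0
    stack: bytes = b""


def transit_hop(state: IntState, local_metadata: bytes,
                packet_len: int, egress_mtu: int) -> bool:
    """Apply one transit hop; return True only if metadata was pushed."""
    needed = state.hop_ml * 4
    if state.remaining_hop_cnt == 0:
        state.e = 1   # Max Hop Count exceeded: push nothing
        return False
    if packet_len + needed > egress_mtu:
        state.m = 1   # MTU exceeded: push nothing
        return False
    if len(local_metadata) != needed:
        return False  # cannot supply exactly Hop ML * 4 bytes: skip INT entirely
    # Prepend, like a stack push: newest metadata sits at the top.
    state.stack = local_metadata + state.stack
    state.remaining_hop_cnt -= 1
    return True
```

In every path that returns False the Remaining Hop Count is left untouched, so the stack length stays a multiple of Hop ML \* 4 bytes.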
* Summary of the field usage
- The INT Source must set the following fields:
- Ver, Rep, C, M, Per-hop Metadata Length, Remaining Hop Count,
and Instruction Bitmap.
- INT Source must set all reserved bits to zero.
- Intermediate devices can set the following fields:
- C, E, M, Remaining Hop Count
* The length (in bytes) of the INT metadata stack must always
be a multiple of (Per-hop Metadata Length \* 4). This length can be determined
by subtracting the total INT fixed header sizes (16 bytes: the 4-byte shim header
plus the 12-byte Version 2 INT metadata header) from (shim header length \* 4).
For INT over Geneve it is 12 bytes subtracted from (length in Geneve tunnel
option header \* 4).
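
The stack-length arithmetic above can be sketched as a short Python helper. It is an illustration only: the fixed header size is passed in by the caller because it depends on the encapsulation and header version, and the name `metadata_stack_hops` and its parameters are hypothetical.

```python
def metadata_stack_hops(length_words: int, fixed_header_bytes: int,
                        hop_ml: int) -> int:
    """Number of hops recorded in the INT metadata stack.

    length_words: length field from the shim (or Geneve option) header,
                  in 4-byte words.
    fixed_header_bytes: total size of the fixed INT headers covered by
                  that length field.
    hop_ml: Per-hop Metadata Length, in 4-byte words.
    """
    stack_bytes = length_words * 4 - fixed_header_bytes
    if stack_bytes < 0:
        raise ValueError("length field is smaller than the fixed headers")
    per_hop_bytes = hop_ml * 4
    if per_hop_bytes == 0:
        if stack_bytes != 0:
            raise ValueError("non-empty stack with Hop ML of 0")
        return 0
    if stack_bytes % per_hop_bytes != 0:
        raise ValueError("stack length is not a multiple of Hop ML * 4")
    return stack_bytes // per_hop_bytes
```

With hypothetical values, a length field of 7 words, 12 bytes of fixed headers, and Hop ML = 2 leave 16 bytes of stack, so `metadata_stack_hops(7, 12, 2)` returns 2.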



# Examples

This section shows example INT Headers with two hosts (Host1 and Host2),
@@ -937,6 +1118,7 @@ port ID)
- C = 0
- E = 0 (Max Hop Count not exceeded)
- M = 0 (MTU not exceeded at any switch)
- S = 0
- Per-hop Metadata Length = 2 (for switch id & queue occupancy)
- Remaining hop count starts at 8, decremented by 1 at each hop
that inserts INT metadata
@@ -1003,7 +1185,7 @@ INT Metadata Header and Metadata Stack, followed by TCP payload:
0 1 2 3
Contributor

The length above in the shim header does not seem correct.

Contributor Author

Fixed.

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ver=1 | 0 |0|0|0| Reserved | HopML=2 |RemainingHopC=6|
| Ver=1 | 0 |0|0|0|0| Reserved | HopML=2 |RemainingHopC=6|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -1058,7 +1240,7 @@ INT Metadata Header and Metadata Stack:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ver=1 | 0 |0|0|0| Reserved | HopML=2 |RemainingHopC=5|
| Ver=1 | 0 |0|0|0|0| Reserved | HopML=2 |RemainingHopC=5|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -1115,7 +1297,7 @@ INT Metadata Header and Metadata Stack:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ver=1 | 0 |0|0|0| Reserved | HopML=2 |RemainingHopC=5|
| Ver=1 | 0 |0|0|0|0| Reserved | HopML=2 |RemainingHopC=5|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -1172,7 +1354,8 @@ header int_header_t {
bit<1> c;
bit<1> e;
bit<1> m;
bit<7> rsvd1;
bit<1> s;
bit<6> rsvd1;
bit<3> rsvd2;
bit<5> hop_metadata_len;
bit<8> remaining_hop_cnt;