Skip to content
Permalink
master
Switch branches/tags
Go to file
 
 
Cannot retrieve contributors at this time
Title : Telemetry Report Format Specification
Title Note : Version 2.0
Title Footer : 2020-10-08
Author : The P4.org Applications Working Group
Affiliation: Contributions from *CableLabs, Cisco Systems, Intel, VMware, Xilinx*
Heading depth : 5
Pdf Latex: xelatex
Document Class: [11pt]article
Package: [top=1in, bottom=1.25in, left=1in, right=1in]{geometry}
Package: fancyhdr
Tex Header:
\setlength{\headheight}{30pt}
\renewcommand{\footrulewidth}{0.5pt}
@if html {
body.madoko {
font-family: utopia-std, serif;
}
title,titlenote,titlefooter,authors,h1,h2,h3,h4,h5 {
font-family: helvetica, sans-serif;
font-weight: bold;
}
pre, code {
language: p4;
font-family: monospace;
font-size: 10pt;
}
}
@if tex {
body.madoko {
font-family: UtopiaStd-Regular;
}
title,titlenote,titlefooter,authors {
font-family: sans-serif;
font-weight: bold;
}
pre, code {
language: p4;
font-family: LuxiMono;
font-size: 75%;
}
}
Colorizer: p4
.token.keyword {
font-weight: bold;
}
@if html {
p4example {
replace: "~ Begin P4ExampleBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4ExampleBlock";
padding:6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #ffffdd;
border-width: 0.5pt;
}
}
@if tex {
p4example {
replace: "~ Begin P4ExampleBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4ExampleBlock";
breakable: true;
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #ffffdd;
border-width: 0.5pt;
}
}
@if html {
p4pseudo {
replace: "~ Begin P4PseudoBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4PseudoBlock";
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #e9fce9;
border-width: 0.5pt;
}
}
@if tex {
p4pseudo {
replace: "~ Begin P4PseudoBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4PseudoBlock";
breakable : true;
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
background-color: #e9fce9;
border: solid;
border-width: 0.5pt;
}
}
@if html {
p4grammar {
replace: "~ Begin P4GrammarBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4GrammarBlock";
border: solid;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border-width: 0.5pt;
}
}
@if tex {
p4grammar {
replace: "~ Begin P4GrammarBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4GrammarBlock";
breakable: true;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border: solid;
border-width: 0.5pt;
}
}
[TITLE]
[]{tex-cmd: "\newpage"}
[]{tex-cmd: "\fancyfoot[L]{&date; &time;}"}
[]{tex-cmd: "\fancyfoot[C]{Telemetry Report Format}"}
[]{tex-cmd: "\fancyfoot[R]{\thepage}"}
[]{tex-cmd: "\pagestyle{fancy}"}
[]{tex-cmd: "\sloppy"}
[TOC]
# Introduction
Traditional network monitoring has relied on statistics and probe packets
such as ICMP echo requests/replies. Recent innovations provide greater
insight into network behavior by generating detailed reports of telemetry
metadata such as paths, queue occupancy, latency experienced by data
packets, and timestamps that can be used to determine hop-by-hop and
end-to-end delay. Generation of telemetry reports can be triggered by
various events in categories such as flow monitoring, queue congestion, and
packet drops. Further information regarding the motivation and usage of
detailed telemetry information can be found in the IETF draft for
In-situ OAM [^IOAM_reqs].
Specifications are being defined for embedding telemetry metadata within
data packets, such as INT [^INT] and IOAM [^IOAM]. This allows for telemetry
metadata to be collected as packets traverse a network. When the packets
reach the edge of the network, the telemetry metadata is removed and
telemetry reports are generated.
This specification defines packet formats for telemetry reports from data plane
network devices (e.g. switches, routers, NICs) to a distributed telemetry
monitoring system. The packet formats use headers that describe the contents of
telemetry reports, along with existing (non-telemetry specific) packet
headers that can be used to categorize flows.
## Scope
The scope of this specification is interoperability between network devices
that generate telemetry reports based on what they see in the data plane,
and the initial preprocessors within distributed telemetry monitoring
systems that receive the telemetry reports. This specification is
applicable when telemetry reports are generated by network devices at the
edges of a network, with source and transit network devices embedding
telemetry metadata in data packets according to specifications such as IOAM
[^IOAM] and INT [^INT], when using INT-MD mode. This specification is also
applicable when each network device directly generates telemetry reports,
including transit network devices in the middle of the network, such as in
INT-XD (where data packet formats between successive network devices are
not affected) and in INT-MX (where only INT instructions are embedded in
data packets).
Telemetry report encapsulation formats are defined that allow for the
inclusion of additional telemetry metadata, beyond the (optional) telemetry
metadata embedded between other packet headers as defined in INT-MD and
IOAM. The embedded telemetry metadata is included as is in telemetry
reports, so the packet formats defined in INT-MD and IOAM also define
some aspects of the telemetry report format. See Section [#embedded] for
further discussion.
This specification does not address any of the following, which are
considered out of scope:
* Configuration of network devices so that they can determine when to
generate telemetry reports, and what information to include in those
reports, such as SAI DTel [^SAI_DTel] and SAI TAM 2.0 [^SAI_TAM].
* Events that trigger generation of telemetry reports.
* Selection of particular destinations within distributed telemetry
monitoring systems, to which telemetry reports will be sent.
* Export format for flow statistics or summarized flow records such as
IPFIX [^IPFIX].
# Key Concepts
## Telemetry Report Definition
We define a *telemetry report* as a message that a network device sends to
the monitoring system. A *telemetry report* typically carries a snapshot of the
original data packet (mostly the inner + outer headers), which triggered
the reporting, together with additional telemetry metadata collected from
the reporting network device, and possibly from its upstream network
devices (in case of an in-band mechanism like INT-MD or IOAM). The report
message is encapsulated by IP+UDP, hence it can be forwarded from the
reporting network device through the data network, and to the destination
monitoring system.
The network devices that generate telemetry reports are referred to as
*nodes* in the rest of this specification. Depending on deployment
scenarios, examples of such *nodes* may include network devices such as
switches, routers, and NICs.
The following sections will cover the details of report generation,
report format and encapsulation.
## Telemetry Report Associations
There are many reasons why users may want telemetry reports to be
generated. This specification currently considers three categories for
telemetry report generation:
* Tracked Flows
: Telemetry reports are generated matching certain flow definitions. A
telemetry specific access control list (called a *watchlist* in this
specification) determines which data packets to monitor by matching
packet header fields and optionally identification of the ingress
interface. The action in the matched entry in the *watchlist* may
specify monitoring of this flow, triggering generation of telemetry
reports based on these packets. (Note that the telemetry specific
watchlist is not performing any access control. It only makes
decisions related to monitoring actions.) The telemetry reports
include information about the path that packets traverse as well as
other telemetry metadata such as hop latency and queue occupancy.
* Dropped Packets
: Telemetry reports are generated for dropped packets matching a
telemetry specific access control list (called a *watchlist* in this
specification), when the action in the matched entry specifies
monitoring of dropped packets. This provides visibility into the
impact of packet drops on user traffic.
* Congested Queues
: Telemetry reports are generated for traffic entering a specific queue
during a period of queue congestion. This provides visibility into the
traffic causing and prolonging queue congestion, for example a few
large elephant flows that overwhelm a queue, as well as the victim
traffic (mice flows) getting hurt by the congestion. This also enables
the detection and “re-play” of a short microburst, caused by a large
number of mice flows arriving at the queue at the same time.
Each telemetry report may be associated with one or more of these
categories. This is indicated in the telemetry report by defining
association bits, one for each category, as will be shown in Section
[#individual-report]. New categories (and corresponding association bits) may
be added to future versions of this specification.
Nodes will need to be configured so that they can determine when
to generate telemetry reports, and what information to include in those
reports. Such configuration is considered to be beyond the scope of this
specification. See SAI DTel [^SAI_DTel] for one API proposal to enable
data plane telemetry capabilities in nodes across all three categories.
## Telemetry Report Events
Telemetry reports are typically triggered by packet processing at a node.
However, even when processed packets match a watchlist for a
telemetry report category, it is not necessary for each inspected packet to
trigger generation of a telemetry report. Nodes may apply filters
to determine when significant events occur that should be reported. This is
called event detection in this specification. For example, a node
may trigger telemetry report generation whenever a packet matching a
tracked application flow is received or transmitted on a different path
than previous packets, or if a significant change in latency is experienced
at one particular hop.
Determination of which packets trigger reports, in other words the specific
conditions and logic to determine the events of interest, is left open for
implementations to differentiate themselves, and is considered to be beyond
the scope of this specification.
## Telemetry Reporting Modes
There are different modes which differ with regard to the locations
from which telemetry reports are generated.
### Per Hop Reports in INT-XD/MX modes
~ Figure { #fig-INT-XD-arch; caption: "INT-XD mode - Telemetry Architecture \
with per hop reports generated by each node"; page-align: here }
![INT-XD-arch]
~
[INT-XD-arch]: images/telemetry_arch_per_hop.jpg { width: 5.1in }
In the *INT-XD (eXport Data)* mode, as defined in the INT specification [^INT],
each node generates its own telemetry reports (Figure [#fig-INT-XD-arch]).
The distributed telemetry monitoring system will receive reports from different
nodes, each describing the telemetry metadata (such as node IDs, interface IDs,
latency) for one hop. Within the per hop telemetry reports, the telemetry
metadata precedes the details of the original packet header. There is no change
to data packets traversing the network.
This mode was known as "Postcard" mode in previous versions of this
Telemetry Report specification.
In the *INT-MX (eMbed instruct(X)ions)* mode, as defined in the INT specification [^INT], the
source node embeds instructions in the INT-MX header. Upon receipt of a packet
with this INT type, each node in the path generates its own telemetry reports,
as shown in Figure [#fig-INT-MX-arch]. The distributed telemetry monitoring
system will receive reports from different nodes, each describing the telemetry
metadata (such as node IDs, interface IDs, latency) for one hop. The only
change to data packets is the source node embedding the INT-MX Header with
instructions. The sink node removes the INT-MX Header from such packets. When
using INT-MX mode, the telemetry metadata precedes the details of the original
packet headers within the telemetry report.
~ Figure { #fig-INT-MX-arch; caption: "INT-MX mode - Telemetry Architecture \
with per hop reports generated by each node" }
![INT-MX-arch]
~
[INT-MX-arch]: images/telemetry_arch_int_mx.jpg { width: 5.35in }
### Stacked Reports in INT-MD mode
In the *INT-MD (eMbed Data)* mode, telemetry metadata is embedded in between the
original headers of data packets as they traverse the network, as shown in
Figure [#fig-INT-MD-arch]. This may be done using any of the telemetry data
plane specifications such as INT or IOAM. When a packet enters the
network, the source node may insert a telemetry instruction header,
thereby instructing downstream nodes to add the desired telemetry
metadata. At each hop, the transit node inserts its telemetry metadata at the
top of the stack. The sink node extracts the telemetry instruction header
before progressing the original packet. Depending on the result of event
detection, the sink node may generate a telemetry report containing
the stacked telemetry metadata from all hops across the network.
~ Figure { #fig-INT-MD-arch; caption: "INT-MD mode - Telemetry Architecture \
with stacked reports generated by sink nodes" }
![INT-MD-arch]
~
[INT-MD-arch]: images/telemetry_arch_figure.jpg { width: 6.8in }
In order to reduce complexity at the sink node, some telemetry reports
may include embedded telemetry metadata intermingled with the details of
original packet headers. This simplifies generation of telemetry reports
due to receipt of data packets with embedded telemetry metadata. The
telemetry data plane specification such as INT or IOAM specifies the format
for this portion of the telemetry metadata. This approach reduces data plane
complexity, allowing for all telemetry report processing and generation to
be done in the data plane itself without any need to punt to the control
plane for further processing.
The sink node has the option to add its local telemetry metadata either
in the telemetry report headers defined in this specification, or in the
embedded telemetry metadata intermingled with the original packet headers.
### Using Different Telemetry Modes for Different Telemetry Categories
Even when stacked reports are generated for the category of tracked flows
using INT-MD mode, it is possible to generate per hop reports for other
categories such as dropped packets and congested queues. The latter
categories are often monitored as per node, per port, or per queue local
events, suggesting that telemetry reports should be generated directly from
the affected node(s).
## Correlation of Telemetry Reports
Telemetry reports for a specific application flow matching a watchlist
may be received from multiple nodes. In case of INT-XD and INT-MX modes,
each hop will generate a separate report. Even when stacked telemetry
metadata is embedded in the data plane according to a specification such as
INT or IOAM, telemetry reports for one flow may still be generated by
multiple nodes in case of path change or in case of dropped packets.
The distributed telemetry monitoring system may want to correlate these
telemetry reports on a per flow basis (see Section [#flow-identification] for
details on flow identification). The telemetry reports include one association
bit for each telemetry report category, providing hints to the distributed
telemetry monitoring system that it can use to assist with telemetry report
correlation. In particular, the distributed telemetry monitoring system may
want to apply certain types of telemetry report correlation only when the
corresponding bits are set.
The mechanisms for correlation are left to each implementation, and are
considered to be beyond the scope of this specification.
## Flow Identification { #flow-identification }
There is no explicit metadata defined for flow identification. The
expectation is that either:
- a truncated packet fragment including the original packet headers will
be included in the telemetry report, allowing the distributed telemetry
monitoring system to categorize and identify flows in any manner that
it desires, or
- domain specific flow identification metadata will be included.
Tunneled packets such as VXLAN packets raise the question whether flow
identification should be based on outer or inner headers. The answer may
vary depending on the goals of tracked flow monitoring, deployment aspects,
operational issues, and the capabilities of the distributed telemetry
monitoring system. Note that it is possible to identify flows based on
inner packet headers even when using an INT encapsulation based on outer
headers such as INT over TCP/UDP.
When using INT-MD and flow identification based on inner headers is
desired, the distributed telemetry monitoring system should parse the
truncated packet fragment all the way down past any embedded telemetry
metadata (if present), even when the Individual Report includes optional
metadata such as drop reason. It may also want to process the embedded
telemetry metadata, for example to recognize the case where a path change
directs traffic to a congested node where packets are being dropped.
## Coalescing A Group of Telemetry Reports In A Single Packet
Starting with Version 2.0, the telemetry report format allows for a group of
individual reports, each corresponding to one data plane packet or one flow,
to be coalesced into the same telemetry report packet. This can help reduce
the packet processing overhead associated with telemetry reports. The only
restrictions are that all of the individual reports in the group must be
generated by the same node (or hardware subsystem within the node), and
they are all addressed to the same destination within the distributed
telemetry monitoring system. Beyond those restrictions, implementations are
free to come up with their own methods for deciding which individual
reports to group together.
Support for coalescing a group of telemetry reports in a single packet is
optional.
# Telemetry Report Format { #report-format }
This section specifies the packet format for telemetry reports.
## Outer Encapsulation
Telemetry reports are defined using a UDP-based encapsulation. Various
outer encapsulations may be used to transport the UDP packets. Typically
this would simply be an Ethernet header, followed by an IPv4 or IPv6
header, followed by the UDP header. This specification does not preclude
the use of different transport encapsulations.
The source IP address identifies the node that generates the
telemetry report.
The Destination IP address identifies a location in the distributed
telemetry monitoring system that will receive the telemetry report.
In case of IPv4, as is the case for any other IP packet, either the Don’t
Fragment (DF) bit must be set, or the IPv4 ID field must be set so that the
value does not repeat within the maximum datagram lifetime for a given
source address/destination address/protocol tuple.
### UDP header (8 octets)
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
The Source Port may optionally be used to carry flow entropy, for example
based on a hash of the inner 5-tuple. Otherwise, it should be user
configurable.
The Destination Port is user configurable. The expectation is that the same
Destination Port value will be used for all telemetry reports in a
particular deployment.
## Telemetry Report Group Header (Ver 2.0) (8 octets)
The Telemetry Report Group Header immediately follows the UDP header
whose destination port identifies the contents as a telemetry report.
This header contains the common fields in a telemetry report that
optionally contains multiple coalesced individual reports, each
corresponding to one data plane packet. There is at most one instance
of the Telemetry Report Group Header in a packet.
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ver | hw_id | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Node ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
[]{tex-cmd: "\newpage"}
* Ver (4b): Version
: This specification defines **version 2**.
* hw_id (6b): Hardware ID
: Identifies the hardware subsystem within the node that generated
this report. For example, in a chassis with multiple linecards this could
identify a specific linecard, or a subsystem within a linecard. The hw_id
is unique within the scope of a Node ID.
* Sequence Number (22b): Sequence Number
: Reflects the sequence of reports from a specific combination of
(Node ID, hw_id) to a particular telemetry report destination. This
can be used to detect loss of telemetry reports before they reach their
intended destination.
* Node ID (32b): Node ID
: The unique ID of a node. This is generally administratively assigned.
Node IDs must be unique within a management domain.
## Individual Report Header (Ver 2.0) (4+ octets) { #individual-report }
Each telemetry report packet contains one or more individual reports immediately
following the Telemetry Report Group Header. Each report within the packet
starts with the Individual Report Header. The presence of multiple reports
corresponding to multiple data plane packets, possibly from multiple flows,
can be determined by comparing the *Report Length* in the Individual Report
Header with the length in the UDP header.
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|RepType| InType| Report Length | MD Length |D|Q|F|I| Rsvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
| | |
| Individual Report Main Contents | |
| (varies depending on RepType) | |
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Report
| | Length
| Individual Report Inner Contents | |
| (Truncated Packet or Additional DS Extension Data | |
| or TLV depending on InType) | |
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
```
* RepType (4b): Report Type
: Type of the individual report:
* **0**: Inner Only
* **1**: INT
* **2**: IOAM
* **3 – 15**: Reserved
[]{tex-cmd: "\newpage"}
* InType (4b): Inner Type
: Type of data embedded after the *Individual Report Main Contents*:
* 0: None
* 1: TLV
* 2: Domain Specific Extension Data
* 3: Ethernet
* 4: IPv4
* 5: IPv6
* 6-15: Reserved.
* Report Length (8b): Report Length
: Indicates the length of the Individual Report Header in a multiple of
4-byte words, including the *Individual Report Main Contents* and
*Individual Report Inner Contents*, but excluding the length of the first
4-byte word (*RepType, InType, Report Length, MD Length, D, Q, F, I, Rsvd*).
For *RepType* codepoint 1 *INT*, the *Report Length* includes the lengths
of *RepMdBits, Domain Specific ID, DSMdBits, DSMdstatus, Variable Optional
Baseline Metadata,* and *Variable Optional Domain Specific Metadata* (see
Section [#int-report]).
The *Report Length* value 0xFF is a special value that indicates a length
greater than or equal to 0xFF, extending to the end of the UDP payload,
i.e. there are no subsequent individual reports in this telemetry report.
* MD Length (8b): Metadata Length
: Indicates the length of metadata included in this report in a multiple
of 4-byte words. This may help the telemetry monitoring system
determine where the *Individual Report Inner Contents* begins. Note that
this does not include the length of the fixed portion of the *Individual
Report Main Contents*.
For *RepType* codepoint 1 *INT*, this includes the length of the *Variable
Optional Baseline Metadata* and *Variable Optional Domain Specific
Metadata* in 4-byte words (see Section [#int-report]).
* D (1b): Dropped
: Indicates that at least one packet matching a watchlist was dropped.
* Q (1b): Congested Queue Association
: Indicates the presence of congestion on a monitored queue.
* F (1b): Tracked Flow Association
: Indicates that this telemetry report is for a tracked flow, i.e. the
packet matched a watchlist somewhere (in case of INT-MD, INT-MX or IOAM) or
locally (in case of INT-XD). The report might include INT-MD or IOAM
metadata in the truncated packet. Other telemetry reports are
likely to be received for the same tracked flow, from the same node
and (in case of drop reports, INT-MX, INT-XD or path changes) from
other nodes.
* I (1b): Intermediate Report
: Indicates that a transit node sent this intermediate report for INT-MD.
* Rsvd (4b): Reserved
: Should be set to zero upon transmission, and ignored upon reception
* Individual Report Main Contents
: The metadata that comprises this report, along with associated fields
that assist in processing the metadata. The format varies depending on
*RepType*.
When the *RepType* value is *Inner Only*, then the Individual Report Main
Contents is empty. *MD Length* should be set to zero upon transmission,
and ignored upon reception.
The *INT* Individual Report Main Contents format (see Section [#int-report])
was derived with INT 2.0/2.1 in mind, but it may be used with other INT
versions as well. It is possible that other *RepType* codepoints and
corresponding Individual Report Main Contents formats may be defined for
future versions of INT.
The *IOAM* Individual Report Main Contents format will be defined in a
future version of this specification.
* Truncated Packet
: L2/L3/ESP/L4 of the packet for flow details. Presence of this field is
indicated by *InType* codepoint 3, 4, or 5, which identifies the type of
header at the beginning of the truncated packet. The length of the
truncated packet can be determined as *Report Length* - ((fixed length of
*Individual Report Main Contents*) + *MD Length*), unless the
*Report Length* value is 0xFF.
* Additional DS Extension Data
: Additional Domain Specific Extension Data, whose format can be
determined from the *Domain Specific ID* specified in the *Individual
Report Main Contents*. For *RepType* codepoint 1 *INT*, this is additional
domain specific data that is not associated with *DSMdBits*.
Presence of this field is indicated by *InType* codepoint 2.
* TLV
: Type Length Value format. Multiple TLV formatted data (see Section [#tlv]).
Presence of this field is indicated by *InType* codepoint 1.
### Individual Report Main Contents for *RepType* 1 (*INT*) (8+ octets) { #int-report }
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RepMdBits | Domain Specific ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DSMdBits | DSMdstatus |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
| Variable Optional Baseline Metadata | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ MD Length
| Variable Optional Domain Specific Metadata | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
```
* RepMdBits (16b): Report Metadata Bits
: Bitmap that indicates which optional baseline metadata is present in the
telemetry report header. Each bit represents 4 octets of optional
metadata, except for bits 4, 5 & 6 which represents 8 octets of optional metadata.
* bit 0 (MSB): Reserved
* bit 1: Level 1 Ingress Interface ID (16 bits) & Egress Interface ID (16 bits)
* bit 2: Hop Latency
* bit 3: Queue ID (8 bits) + Queue Occupancy (24 bits)
* bit 4: Ingress Timestamp (64 bits)
* bit 5: Egress Timestamp (64 bits)
* bit 6: Level 2 Ingress Interface ID (32 bits) + Egress Interface ID (32 bits)
* bit 7: Egress Port TX Utilization
* bit 8: Buffer ID (8 bits) + Buffer Occupancy (24 bits)
* bit 9-14: Reserved.
* bit 15: Queue ID (8 bits) + Drop Reason (8 bits) + Padding (16 bits)
This specification defines the following metadata:
* Drop reason
: An enumeration that indicates the reason why a packet was dropped, for
example as defined in
[github.com/p4lang/switch](https://github.com/p4lang/switch/blob/master/p4src/includes/drop_reason_codes.h).
See the INT specification [^INT] for definitions of the remaining metadata.
* Domain Specific ID (16b)
: The unique ID of the INT Domain.
The *Domain Specific ID* value 0x0000 is the default, known to all nodes.
For this value, all *DSMdBits* are treated as reserved. Operators can
assign values in the range 0x0001 to 0xFFFF.
* DSMdBits (16b): Domain Specific Md Bits
: Bitmap that indicates which optional domain specific metadata is
present in the the telemetry report header. Each bit represents
4 octets or a multiple of 4 octets of domain specific optional metadata.
When using INT-MD or INT-MX, if the *Domain Specific ID* does not match
any Domain ID known to this node, then the node may either:
- Set the Telemetry Report *DSMdBits* field to zero and rederive the
Telemetry Report *MD Length* from *RepMdBits*, or
- Not send any of its own metadata to the monitoring systems, doing
any of the following:
- Not generate any Telemetry Report, or
- Clear *RepMdBits* and *MD Length* as well as *DSMdBits* (this only
makes sense for INT-MD), or
- Use *RepType* value *Inner Only* (this only makes sense for INT-MD).
* DSMdstatus (16b): Domain Specific Md Status
: Indicates the domain specific metadata status.
* Variable Optional Baseline Metadata
: The metadata corresponding to *RepMdBits*, 4 octets for each bit, except
8 octets for bits 4, 5 & 6.
If a node receives an INT-MX or INT-MD packet with an *Instruction Bitmap*
that requests one or more metadata values that are not available or
reserved, then the node must ensure that the corresponding bit(s) in the
Telemetry Report *RepMdBits* that specify the unavailable metadata are not
set. The Telemetry Report *MD Length* must be derived based on the adjusted
*RepMdBits* (and *DSMdBits*) values.
* Variable Optional Domain Specific Metadata
: The metadata corresponding to *DSMdBits*, 4 octets or a multiple of
4 octets for each bit.
If a node receives an INT-MX or INT-MD packet with a *DS Instruction*
that requests one or more metadata values that are not available or
reserved, then the node must ensure that the corresponding bit(s) in the
Telemetry Report *DSMdBits* that specify the unavailable metadata are not
set. The Telemetry Report *MD Length* must be derived based on the adjusted
*DSMdBits* (and *RepMdBits*) values.
[]{tex-cmd: "\newpage"}
### Individual Report Inner Contents for *InType* 1 (*TLV*) (4+ octets) { #tlv }
One or more TLVs, each following the format defined in this section.
The presence of multiple TLVs can be determined by comparing the
*TLVLength* in the first TLV with the *Report Length* in the Individual
Report Header.
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<----------+
|TLVType| Rsvd | TLVLength | TLV Data Template | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+ |
| | | May be
| Variable TLV Data as identified by TLVType | TLV repeated
| (Truncated Packet or Domain Specific Extension Data) | Length |
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+<------+
```
* TLVType (4b): TLV data type
:
- **0**: Domain Specific Extension Data
- **1**: Ethernet
- **2**: IPv4
- **3**: IPv6
- **4-15**: Reserved.
* Rsvd (4b) – Reserved
: Should be set to zero upon transmission and ignored upon reception.
* TLVLength (8b)
: Indicates the length, in 4-byte words, of *Variable TLV Data* as identified
by *TLVType*. Note that this does not include the length of the first
4-byte word (*TLVType, Rsvd, TLVLength, TLV Data Template*).
* TLV Data Template (16b)
: Specifies the format of the *Variable TLV Data*.
A non-zero *TLV Data Template* value specifies the template for *TLVType*
codepoint of *Domain Specific Extension Data*. For *TLVType* codepoints
*Ethernet*, *IPv4*, and *IPv6*, the *TLV Data Template* value should be
zero upon transmission and ignored upon reception.
* Variable TLV Data
: Variable length data based upon *TLVType*. The following two fields are
defined in this version.
* Truncated Packet
: L2/L3/ESP/L4 of the packet for flow details. Presence of this field is
indicated by *TLVType* codepoint 1, 2, or 3, which identifies the type of
header at the beginning of the truncated packet.
* Domain Specific Extension Data
: Domain Specific Extension Data, whose format can be determined from the
*Domain Specific ID* specified in the *Individual Report Main Contents* and
the *TLV Data Template*.
Presence of this field is indicated by *TLVType* codepoint 0.
For *RepType* codepoint 1 *INT*, this is additional domain specific data
that is not associated with *DSMdBits*.
[]{tex-cmd: "\newpage"}
## Embedded Telemetry Metadata In Stacked Reports { #embedded }
There may still be further telemetry metadata embedded within a truncated
packet fragment. For example, this is typically the case when there is
stacked telemetry metadata from hops prior to the node generating the
report. The telemetry metadata will typically be encoded using a defined
data plane format such as INT-MD or IOAM.
A node generating a telemetry report with stacked telemetry metadata may
include its local telemetry metadata in any of the following:
* the embedded telemetry metadata in a truncated packet fragment,
* the stacked telemetry metadata in domain specific extension data,
* the *Individual Report Main Contents* in the same Individual Report Header
that contains the stacked telemetry metadata from previous hops, in either a
truncated packet fragment or in domain specific extension data, or
* the *Individual Report Main Contents* in a separate report from the stacked
telemetry metadata from previous hops. Note that in this case the
ingress timestamp (if present) will be the same in both reports.
If the *Tracked Flow Association (F)* bit is set to 0, then there will not
be any embedded telemetry metadata in the report.
If the *Tracked Flow Association (F)* bit is set to 1, there may or may not
be any embedded telemetry metadata in the report.
[]{tex-cmd: "\newpage"}
# Examples of Telemetry Reports
This section shows examples of Telemetry Reports.
These examples are not intended to be complete or exclusive.
## Example with Baseline Metadata and Truncated IPv4
This example shows a telemetry report with baseline metadata consisting of
level 1 interface IDs and queue occupancy, and a truncated IPv4 packet.
The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 1 (*INT*)
- *InType* = 4 (*IPv4*)
- *MD Length* = 2 for level 1 interface IDs and queue occupancy
- *Report Length* = 2 (individual report fixed size) + 2 (*MD Length*)
\+ 5 (original IP header) + 5 (original TCP) \
= 14 (assuming truncated packet fragment ends after the original TCP header)
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0
- *Domain Specific ID*, *DsMdBits* and *DSMdstatus* are all 0
Below is the telemetry report packet starting from the
Telemetry Report Group Header.
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver = 2| hw_id | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Node ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1|0 1 0 0| RepLength=14 | MD Length = 2 |0|0|1|0| Rsvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ingress Interface ID | Egress Interface ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Queue Occupancy |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Truncated IPv4 Packet |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
## Example with Baseline Metadata, Domain Specific Metadata, DS Extension Data and Truncated IPv4
This example shows a telemetry report with baseline metadata consisting of
level 1 interface IDs and queue occupancy, one domain specific metadata that
is 4 bytes long, domain specific extension data and a truncated IPv4 packet.
The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 1 (*INT*)
- *InType* = 1 (*TLV*)
- *MD Length* = 2 for level 1 interface IDs and queue occupancy
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0
- The domain specific metadata that is included is represented by the first
bit (MSB) of *DsMdBits*
- The first TLV uses *TLVType* = 0 (*Domain Specific Extension Data*)
- The second TLV uses *TLVType* = 2 (*IPv4*)
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver = 2| hw_id | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Node ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1|0 0 0 1| Report Length | MD Length = 2 |0|0|1|0| Rsvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0| Domain Specific ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| DSMdstatus |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ingress Interface ID | Egress Interface ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Queue Occupancy |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Domain Specific Metadata |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0| Rsvd | TLVLength | TLV Data Template |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Variable TLV Data (Domain Specific Extension Data) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 1 0| Rsvd | TLVLength | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Variable TLV Data (Truncated IPv4 Packet) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
## Example with Embedded INT-MD in a TCP Packet
This example shows one possibility for the telemetry report corresponding
to the INT packet in Section 6.3 of the INT 2.1 specification [^INT].
Two hosts (Host1 and Host2) communicate over a network path composed of
three network switches (Switch1, Switch2 and Switch3) as shown below.
```
==> packet P travels from Host1 to Host2 ==>
Host1 --------> Switch1 ---------> Switch2 ---------> Switch3 --------> Host2
```
* The ToR switch of host1 (Switch1) acts as the INT source. It adds a new
UDP header, INT-MD headers and its own metadata in the packet. It requests
each INT hop to insert node ID and queue occupancy (For the sake of
illustration we only consider node ID and queue occupancy being inserted
at each hop. Queue IDs are typically defined per port, hence in a real
use-case queue occupancy is likely to be collected along with egress
interface ID.)
* The values of INT metadata header fields in this example are as follows:
- *Ver* = 2
- *D* = 0 (Packet is not a clone/copy, hence the Sink must not Discard)
- *E* = 0 (Max Hop Count not exceeded)
- *M* = 0 (MTU not exceeded at any node)
- Per-hop Metadata Length = 2 (for node id & queue occupancy)
- *Remaining Hop Count* starts at 8, decremented by 1 at each hop
that inserts INT metadata
* Switch2 prepends its node ID and queue occupancy into the metadata stack.
* The ToR switch of host2 (Switch3) acts as the INT sink, removing the UDP
and INT-MD headers before forwarding the packet to host2. It generates a
Telemetry Report packet with a single individual report. It inserts its
node ID and queue occupancy into the Individual Report Header rather than
the embedded INT stack in the truncated packet.
* The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 1 (*INT*)
- *InType* = 4 (*IPv4*)
- *MD Length* = 1 for queue occupancy
- *Report Length* = 2 (individual report fixed size) + 1 (*MD Length*)
\+ 5 (original IP header) + 2 (UDP header for embedded INT) + 1 (INT shim)
\+ 3 (INT fixed) + 4 (INT metadata stack) + 5 (original TCP) \
= 23 (assuming truncated packet fragment ends after the original TCP header)
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0 since the report includes the metadata for all hops
- *Domain Specific ID*, *DsMdBits* and *DSMdstatus* are all 0
Below is the telemetry report packet generated by sink Switch3, starting
from the Telemetry Report Group Header.
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|Ver = 2| hw_id | Sequence Number | Telemetry
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Report
| Node ID of Switch3 | Group
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|0 0 0 1|0 1 0 0| RepLength=23 | MD Length = 1 |0|0|1|0| Rsvd | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| Individual
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Report
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Queue Occupancy of Switch3 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Ver=4 | IHL=5 | DSCP |ECN| Length | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Identification |Flags| Fragment Offset | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IP
| Time to Live | Proto = 17 | Header Checksum | (original)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Source Address | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Destination Address | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Source Port | Destination Port = INT_TBD | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
| Length | Checksum | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|Type=1 | 2 |R R| Length=7 | Reserved | IP proto = 6 | INT Shim
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Ver=2 |0|0|0| Reserved | HopML=2 |RemainingHopC=6| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| INT
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Node ID of hop2 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Queue Occupancy of hop2 | INT
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ metadata
| Node ID of hop1 | stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Queue Occupancy of hop1 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| TCP header | TCP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
```
## Example with Embedded INT-MD over UDP in a VXLAN Packet
Two hosts (Host1 and Host2) communicate over a network path composed of
three network switches (Switch1, Switch2 and Switch3) as shown below.
```
==> packet P travels from Host1 to Host2 ==>
Host1 --------> Switch1 ---------> Switch2 ---------> Switch3 --------> Host2
```
In this example, the two hosts use VXLAN encapsulation. Host1 acts as the
VXLAN tunnel endpoint and INT source, inserts VXLAN and INT-MD over UDP
headers with instruction bits corresponding to the network state to be
reported at intermediate switches. In this example, Host1 and Host2 do not
insert any INT metadata. Intermediate switches process the INT-MD header
and populate the INT metadata. Host2 acts as INT sink and VXLAN tunnel
endpoint, removes the INT-MD and VXLAN headers.
* Host1 sends a VXLAN packet to host2 with inner source IP address 10.10.1.1
(identifying the source workload), inner destination IP address 10.10.2.2
(identifying the destination workload), outer source IP address 192.168.1.1
(identifying host1), outer destination IP address 192.168.2.2 (identifying
host2), UDP source port 56789, and UDP checksum 0.
* Host1 acts as the INT source. It inserts INT-MD headers between the outer
UDP header and the VXLAN header. It requests each INT hop to insert node ID
and level 1 ingress and egress interface IDs.
* The values of INT metadata header fields in this example are as follows:
- *Ver* = 2
- *D* = 0 (Packet is not a clone/copy, hence the Sink must not Discard)
- *E* = 0 (Max Hop Count not exceeded)
- *M* = 0 (MTU not exceeded at any node)
- Per-hop Metadata Length = 2 (for node id & level 1 interface IDs)
- *Remaining Hop Count* starts at 8, decremented by 1 at each hop
that inserts INT metadata
* Switch1, Switch2, and Switch3 each prepend their node ID and the relevant
ingress and egress interface IDs in the metadata stack.
* Host2 acts as the INT sink and VXLAN tunnel endpoint, removes the INT-MD
and VXLAN headers. It generates a Telemetry Report packet with a single
individual report. Since it is not adding any of its own metadata, it uses
*RepType* value *Inner Only* and *InType* value *IPv4* with the embedded
INT stack in the truncated packet.
* The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 0 (*Inner Only*)
- *InType* = 4 (*IPv4*)
- *MD Length* = 0
- *Report Length* = 5 (original outer IP header) + 2 (UDP header)
\+ 1 (INT shim) + 3 (INT fixed) + 6 (INT metadata stack) + 2 (VXLAN)
\+ 4 (inner Ethernet header) + 5 (inner IP header) + 5 (original TCP) \
= 33 (assuming truncated packet fragment ends after the original TCP header)
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0 since the report is being generated by the INT sink, including
the metadata for all hops
- *Domain Specific ID*, *DsMdBits* and *DSMdstatus* are all 0
Below is the telemetry report packet generated by Host2, starting
from the Telemetry Report Group Header. We use INT_TBD for UDP.Destination_Port
to indicate the existence of INT headers.
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|Ver = 2| hw_id | Sequence Number | Telemetry
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Report
| Node ID of Switch3 | Group
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|0 0 0 0|0 1 0 0| RepLength=33 | MD Length = 0 |0|0|1|0| Rsvd | Indiv Report
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Ver=4 | IHL=5 | DSCP |ECN| Length | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Identification |Flags| Fragment Offset | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IP
| Time to Live | Proto = 17 | Header Checksum | (outer)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Source Address = 192.168.1.1 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Destination Address = 192.168.2.2 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Source Port = 56789 | Destination Port = INT_TBD | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
| Length | Checksum = 0 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|Type=1 | 1 |R R| Length=7 | Destination UDP port = 4789 | INT Shim
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Ver=2 |0|0|0| Reserved | HopML=2 |RemainingHopC=5| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| INT
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Node ID of hop3 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Ingress Interface ID of hop3 | Egress Interface ID of hop3 | INT
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ metadata
| Node ID of hop2 | stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
```
[]{tex-cmd: "\newpage"}
```
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Ingress Interface ID of hop2 | Egress Interface ID of hop2 | INT
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ metadata
| Node ID of hop1 | stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (cont'd)
| Ingress Interface ID of hop1 | Egress Interface ID of hop1 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|R|R|R|R|I|R|R|R| Reserved | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ VXLAN
| VXLAN Network Identifier (VNI) | Reserved | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Inner Destination MAC Address | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Inner Destination MAC Address | Inner Source MAC Address | Ethernet
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (inner)
| Inner Source MAC Address | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Ethertype = 0x0800 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| Ver=4 | IHL=5 | DSCP |ECN| Length | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Identification |Flags| Fragment Offset | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IP
| Time to Live | Proto = 17 | Header Checksum | (inner)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Source Address = 10.10.1.1 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Destination Address = 10.10.2.2 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
| TCP header | TCP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
```
# Acknowledgements {@h1:"A"}
We thank the following individuals for their contributions to this
specification.
* Gordon Brebner
* Mukesh Hira
* Jeongkeun Lee
* Randy Levensalor
* Ramesh Sivakolundu
* Mickey Spiegel
# Change log
* 2017-11-10
- Initial release
* 2017-11-10
- **Tag v0.5 spec**
* 2018-2-14
+ Promote *Switch id* to fixed portion of the Telemetry Report Header
- The Switch id is always present.
+ Flexible format allowing for arbitrary combinations of optional telemetry
metadata in the Telemetry Report Header
- Replaces previous Telemetry Drop Header and Telemetry Switch Local
Report Header
- Adds a 4 bit length field indicating the Telemetry Report Header length
in multiples of 4 octets
- Adds a bitmap indicating which optional metadata is present in the
Telemetry Report Header
- Rearranges fields in the first 32 bits of the Telemetry Report Header in
order to achieve proper alignment, and to place reserved bits between
the report metadata bitmap and the association bits so that either one
can expand as necessary
* 2018-04-03
- Editorial changes for v1.0
* 2018-04-20
- **Tag v1.0 spec**
* 2019-07-19
- Added INT modes of operation and nomenclature: INT-XD/MX/MD
* 2020-04-06
- Revised Telemetry Report header 2.0 format supporting:
- coalescing of multiple reports in one packet
- support for domain specific extensions
- some new RepMdBits codepoints
- Increased timestamp size to 8 bytes
- Changed *Switch id* terminology to *Node ID*
* 2020-05-18
- Separated the Telemetry Report Header description into separate sections
for 'Telemetry Report Group Header' (first 8 octets that appear once in
a packet, and 'Individual Report Header' that may be repeated, one per
report in the packet
- Reworded section on Embedded Telemetry Metadata to match v2.0 format,
where truncated packet fragments reside within each Individual Report
Header
- Changed 'switch' and 'device' to 'node' where appropriate
- Reserved Domain Specific ID value 0x0000 as default well known ID
- Added RepType codepoint for 'Inner Only'
* 2020-05-25
- Added 2 examples of telemetry reports with embedded telemetry metadata
* 2020-06-15
- Simplify metadata processing by removing padding option, relying on
clearing of bits instead.
* 2020-10-08
- Specified the 'Report Length' value 0xFF as a special value that indicates
a length greater than or equal to 0xFF in 4-byte words, extending to the
end of the UDP payload, i.e. there are no subsequent individual reports in
this telemetry report.
- **Tag v2.0 spec**
[^IOAM_reqs]: Requirements for In-situ OAM, [draft-brockners-inband-oam-requirements-03](https://tools.ietf.org/html/draft-brockners-inband-oam-requirements-03), March 2017.
[^INT]: [In-band Network Telemetry (INT) Dataplane Specification Version 2.1](https://github.com/p4lang/p4-applications/tree/master/docs/INT_v2_1.pdf), May 2020.
[^IOAM]: Data Fields for In-situ OAM, [draft-ietf-ippm-ioam-data-05](https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09), March 2020.
[^SAI_DTel]: [SAI Data Plane Telemetry API Proposal](https://github.com/opencomputeproject/SAI/blob/master/doc/DTEL/SAI-Proposal-Data-Plane-Telemetry.md), December 2017.
[^SAI_TAM]: [SAI Telemetry and Monitoring (TAM) 2.0](https://github.com/opencomputeproject/SAI/blob/master/doc/TAM/SAI-Proposal-TAM2.0-v2.0.docx), July 2019.
[^IPFIX]: Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information, [RFC 7011](https://www.ietf.org/rfc/rfc7011.txt), September 2013.
[^IOAM_VXLAN_GPE]: VXLAN-GPE Encapsulation for In-situ OAM Data, [draft-brockners-ioam-vxlan-gpe-00](https://tools.ietf.org/html/draft-brockners-ioam-vxlan-gpe-03), November 2019.