Title : Telemetry Report Format Specification
Title Note : Version 2.0
Title Footer : 2020-10-08
Author : The P4.org Applications Working Group
Affiliation: Contributions from *CableLabs, Cisco Systems, Intel, VMware, Xilinx*
Heading depth : 5
Pdf Latex: xelatex
Document Class: [11pt]article
Package: [top=1in, bottom=1.25in, left=1in, right=1in]{geometry}
Package: fancyhdr
Tex Header:
\setlength{\headheight}{30pt}
\renewcommand{\footrulewidth}{0.5pt}
@if html {
body.madoko {
font-family: utopia-std, serif;
}
title,titlenote,titlefooter,authors,h1,h2,h3,h4,h5 {
font-family: helvetica, sans-serif;
font-weight: bold;
}
pre, code {
language: p4;
font-family: monospace;
font-size: 10pt;
}
}
@if tex {
body.madoko {
font-family: UtopiaStd-Regular;
}
title,titlenote,titlefooter,authors {
font-family: sans-serif;
font-weight: bold;
}
pre, code {
language: p4;
font-family: LuxiMono;
font-size: 75%;
}
}
Colorizer: p4
.token.keyword {
font-weight: bold;
}
@if html {
p4example {
replace: "~ Begin P4ExampleBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4ExampleBlock";
padding:6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #ffffdd;
border-width: 0.5pt;
}
}
@if tex {
p4example {
replace: "~ Begin P4ExampleBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4ExampleBlock";
breakable: true;
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #ffffdd;
border-width: 0.5pt;
}
}
@if html {
p4pseudo {
replace: "~ Begin P4PseudoBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4PseudoBlock";
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
border: solid;
background-color: #e9fce9;
border-width: 0.5pt;
}
}
@if tex {
p4pseudo {
replace: "~ Begin P4PseudoBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4PseudoBlock";
breakable : true;
padding: 6pt;
margin-top: 6pt;
margin-bottom: 6pt;
background-color: #e9fce9;
border: solid;
border-width: 0.5pt;
}
}
@if html {
p4grammar {
replace: "~ Begin P4GrammarBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4GrammarBlock";
border: solid;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border-width: 0.5pt;
}
}
@if tex {
p4grammar {
replace: "~ Begin P4GrammarBlock&nl;\
````&nl;&source;&nl;````&nl;\
~ End P4GrammarBlock";
breakable: true;
margin-top: 6pt;
margin-bottom: 6pt;
padding: 6pt;
background-color: #e6ffff;
border: solid;
border-width: 0.5pt;
}
}
[TITLE]
[]{tex-cmd: "\newpage"}
[]{tex-cmd: "\fancyfoot[L]{&date; &time;}"}
[]{tex-cmd: "\fancyfoot[C]{Telemetry Report Format}"}
[]{tex-cmd: "\fancyfoot[R]{\thepage}"}
[]{tex-cmd: "\pagestyle{fancy}"}
[]{tex-cmd: "\sloppy"}
[TOC]
# Introduction
Traditional network monitoring has relied on statistics and probe packets
such as ICMP echo requests/replies. Recent innovations provide greater
insight into network behavior by generating detailed reports of telemetry
metadata such as paths, queue occupancy, latency experienced by data
packets, and timestamps that can be used to determine hop-by-hop and
end-to-end delay. Generation of telemetry reports can be triggered by
various events in categories such as flow monitoring, queue congestion, and
packet drops. Further information regarding the motivation and usage of
detailed telemetry information can be found in the IETF draft for
In-situ OAM [^IOAM_reqs].
Specifications are being defined for embedding telemetry metadata within
data packets, such as INT [^INT] and IOAM [^IOAM]. This allows for telemetry
metadata to be collected as packets traverse a network. When the packets
reach the edge of the network, the telemetry metadata is removed and
telemetry reports are generated.
This specification defines packet formats for telemetry reports from data plane
network devices (e.g. switches, routers, NICs) to a distributed telemetry
monitoring system. The packet formats use headers that describe the contents of
telemetry reports, along with existing (non-telemetry specific) packet
headers that can be used to categorize flows.
## Scope
The scope of this specification is interoperability between network devices
that generate telemetry reports based on what they see in the data plane,
and the initial preprocessors within distributed telemetry monitoring
systems that receive the telemetry reports. This specification is
applicable when telemetry reports are generated by network devices at the
edges of a network, with source and transit network devices embedding
telemetry metadata in data packets according to specifications such as IOAM
[^IOAM] and INT [^INT], when using INT-MD mode. This specification is also
applicable when each network device directly generates telemetry reports,
including transit network devices in the middle of the network, such as in
INT-XD (where data packet formats between successive network devices are
not affected) and in INT-MX (where only INT instructions are embedded in
data packets).
Telemetry report encapsulation formats are defined that allow for the
inclusion of additional telemetry metadata, beyond the (optional) telemetry
metadata embedded between other packet headers as defined in INT-MD and
IOAM. The embedded telemetry metadata is included as is in telemetry
reports, so the packet formats defined in INT-MD and IOAM also define
some aspects of the telemetry report format. See Section [#embedded] for
further discussion.
This specification does not address any of the following, which are
considered out of scope:
* Configuration of network devices so that they can determine when to
generate telemetry reports, and what information to include in those
reports, for example through APIs such as SAI DTel [^SAI_DTel] and
SAI TAM 2.0 [^SAI_TAM].
* Events that trigger generation of telemetry reports.
* Selection of particular destinations within distributed telemetry
monitoring systems, to which telemetry reports will be sent.
* Export format for flow statistics or summarized flow records such as
IPFIX [^IPFIX].
# Key Concepts
## Telemetry Report Definition
We define a *telemetry report* as a message that a network device sends to
the monitoring system. A *telemetry report* typically carries a snapshot of the
original data packet (mostly the inner + outer headers), which triggered
the reporting, together with additional telemetry metadata collected from
the reporting network device, and possibly from its upstream network
devices (in case of an in-band mechanism like INT-MD or IOAM). The report
message is encapsulated by IP+UDP, hence it can be forwarded from the
reporting network device through the data network, and to the destination
monitoring system.
The network devices that generate telemetry reports are referred to as
*nodes* in the rest of this specification. Depending on deployment
scenarios, examples of such *nodes* may include network devices such as
switches, routers, and NICs.
The following sections will cover the details of report generation,
report format and encapsulation.
## Telemetry Report Associations
There are many reasons why users may want telemetry reports to be
generated. This specification currently considers three categories for
telemetry report generation:
* Tracked Flows
: Telemetry reports are generated for packets matching certain flow definitions. A
telemetry specific access control list (called a *watchlist* in this
specification) determines which data packets to monitor by matching
packet header fields and optionally identification of the ingress
interface. The action in the matched entry in the *watchlist* may
specify monitoring of this flow, triggering generation of telemetry
reports based on these packets. (Note that the telemetry specific
watchlist is not performing any access control. It only makes
decisions related to monitoring actions.) The telemetry reports
include information about the path that packets traverse as well as
other telemetry metadata such as hop latency and queue occupancy.
* Dropped Packets
: Telemetry reports are generated for dropped packets matching a
telemetry specific access control list (called a *watchlist* in this
specification), when the action in the matched entry specifies
monitoring of dropped packets. This provides visibility into the
impact of packet drops on user traffic.
* Congested Queues
: Telemetry reports are generated for traffic entering a specific queue
during a period of queue congestion. This provides visibility into the
traffic causing and prolonging queue congestion, for example a few
large elephant flows that overwhelm a queue, as well as the victim
traffic (mice flows) getting hurt by the congestion. This also enables
the detection and “re-play” of a short microburst, caused by a large
number of mice flows arriving at the queue at the same time.
Each telemetry report may be associated with one or more of these
categories. This is indicated in the telemetry report by defining
association bits, one for each category, as will be shown in Section
[#individual-report]. New categories (and corresponding association bits) may
be added to future versions of this specification.
Nodes will need to be configured so that they can determine when
to generate telemetry reports, and what information to include in those
reports. Such configuration is considered to be beyond the scope of this
specification. See SAI DTel [^SAI_DTel] for one API proposal to enable
data plane telemetry capabilities in nodes across all three categories.
## Telemetry Report Events
Telemetry reports are typically triggered by packet processing at a node.
However, even when processed packets match a watchlist for a
telemetry report category, it is not necessary for each inspected packet to
trigger generation of a telemetry report. Nodes may apply filters
to determine when significant events occur that should be reported. This is
called event detection in this specification. For example, a node
may trigger telemetry report generation whenever a packet matching a
tracked application flow is received or transmitted on a different path
than previous packets, or if a significant change in latency is experienced
at one particular hop.
Determination of which packets trigger reports, in other words the specific
conditions and logic to determine the events of interest, is left open for
implementations to differentiate themselves, and is considered to be beyond
the scope of this specification.
## Telemetry Reporting Modes
Telemetry reporting modes differ in the locations from which telemetry
reports are generated.
### Per Hop Reports in INT-XD/MX modes
~ Figure { #fig-INT-XD-arch; caption: "INT-XD mode - Telemetry Architecture \
with per hop reports generated by each node"; page-align: here }
![INT-XD-arch]
~
[INT-XD-arch]: images/telemetry_arch_per_hop.jpg { width: 5.1in }
In the *INT-XD (eXport Data)* mode, as defined in the INT specification [^INT],
each node generates its own telemetry reports (Figure [#fig-INT-XD-arch]).
The distributed telemetry monitoring system will receive reports from different
nodes, each describing the telemetry metadata (such as node IDs, interface IDs,
latency) for one hop. Within the per hop telemetry reports, the telemetry
metadata precedes the details of the original packet header. There is no change
to data packets traversing the network.
This mode was known as "Postcard" mode in previous versions of this
Telemetry Report specification.
In the *INT-MX (eMbed instruct(X)ions)* mode, as defined in the INT specification [^INT], the
source node embeds instructions in the INT-MX header. Upon receipt of a packet
with this INT type, each node in the path generates its own telemetry reports,
as shown in Figure [#fig-INT-MX-arch]. The distributed telemetry monitoring
system will receive reports from different nodes, each describing the telemetry
metadata (such as node IDs, interface IDs, latency) for one hop. The only
change to data packets is the source node embedding the INT-MX Header with
instructions. The sink node removes the INT-MX Header from such packets. When
using INT-MX mode, the telemetry metadata precedes the details of the original
packet headers within the telemetry report.
~ Figure { #fig-INT-MX-arch; caption: "INT-MX mode - Telemetry Architecture \
with per hop reports generated by each node" }
![INT-MX-arch]
~
[INT-MX-arch]: images/telemetry_arch_int_mx.jpg { width: 5.35in }
### Stacked Reports in INT-MD mode
In the *INT-MD (eMbed Data)* mode, telemetry metadata is embedded in between the
original headers of data packets as they traverse the network, as shown in
Figure [#fig-INT-MD-arch]. This may be done using any of the telemetry data
plane specifications such as INT or IOAM. When a packet enters the
network, the source node may insert a telemetry instruction header,
thereby instructing downstream nodes to add the desired telemetry
metadata. At each hop, the transit node inserts its telemetry metadata at the
top of the stack. The sink node extracts the telemetry instruction header
before forwarding the original packet. Depending on the result of event
detection, the sink node may generate a telemetry report containing
the stacked telemetry metadata from all hops across the network.
~ Figure { #fig-INT-MD-arch; caption: "INT-MD mode - Telemetry Architecture \
with stacked reports generated by sink nodes" }
![INT-MD-arch]
~
[INT-MD-arch]: images/telemetry_arch_figure.jpg { width: 6.8in }
In order to reduce complexity at the sink node, some telemetry reports
may include embedded telemetry metadata intermingled with the details of
original packet headers. This simplifies generation of telemetry reports
due to receipt of data packets with embedded telemetry metadata. The
telemetry data plane specification such as INT or IOAM specifies the format
for this portion of the telemetry metadata. This approach reduces data plane
complexity, allowing for all telemetry report processing and generation to
be done in the data plane itself without any need to punt to the control
plane for further processing.
The sink node has the option to add its local telemetry metadata either
in the telemetry report headers defined in this specification, or in the
embedded telemetry metadata intermingled with the original packet headers.
### Using Different Telemetry Modes for Different Telemetry Categories
Even when stacked reports are generated for the category of tracked flows
using INT-MD mode, it is possible to generate per hop reports for other
categories such as dropped packets and congested queues. The latter
categories are often monitored as per node, per port, or per queue local
events, suggesting that telemetry reports should be generated directly from
the affected node(s).
## Correlation of Telemetry Reports
Telemetry reports for a specific application flow matching a watchlist
may be received from multiple nodes. In case of INT-XD and INT-MX modes,
each hop will generate a separate report. Even when stacked telemetry
metadata is embedded in the data plane according to a specification such as
INT or IOAM, telemetry reports for one flow may still be generated by
multiple nodes in case of path change or in case of dropped packets.
The distributed telemetry monitoring system may want to correlate these
telemetry reports on a per flow basis (see Section [#flow-identification] for
details on flow identification). The telemetry reports include one association
bit for each telemetry report category, providing hints to the distributed
telemetry monitoring system that it can use to assist with telemetry report
correlation. In particular, the distributed telemetry monitoring system may
want to apply certain types of telemetry report correlation only when the
corresponding bits are set.
The mechanisms for correlation are left to each implementation, and are
considered to be beyond the scope of this specification.
## Flow Identification { #flow-identification }
There is no explicit metadata defined for flow identification. The
expectation is that either:
- a truncated packet fragment including the original packet headers will
be included in the telemetry report, allowing the distributed telemetry
monitoring system to categorize and identify flows in any manner that
it desires, or
- domain specific flow identification metadata will be included.
Tunneled packets such as VXLAN packets raise the question whether flow
identification should be based on outer or inner headers. The answer may
vary depending on the goals of tracked flow monitoring, deployment aspects,
operational issues, and the capabilities of the distributed telemetry
monitoring system. Note that it is possible to identify flows based on
inner packet headers even when using an INT encapsulation based on outer
headers such as INT over TCP/UDP.
When using INT-MD and flow identification based on inner headers is
desired, the distributed telemetry monitoring system should parse the
truncated packet fragment all the way down past any embedded telemetry
metadata (if present), even when the Individual Report includes optional
metadata such as drop reason. It may also want to process the embedded
telemetry metadata, for example to recognize the case where a path change
directs traffic to a congested node where packets are being dropped.
## Coalescing A Group of Telemetry Reports In A Single Packet
Starting with Version 2.0, the telemetry report format allows for a group of
individual reports, each corresponding to one data plane packet or one flow,
to be coalesced into the same telemetry report packet. This can help reduce
the packet processing overhead associated with telemetry reports. The only
restrictions are that all of the individual reports in the group must be
generated by the same node (or hardware subsystem within the node), and
they are all addressed to the same destination within the distributed
telemetry monitoring system. Beyond those restrictions, implementations are
free to come up with their own methods for deciding which individual
reports to group together.
Support for coalescing a group of telemetry reports in a single packet is
optional.
# Telemetry Report Format { #report-format }
This section specifies the packet format for telemetry reports.
## Outer Encapsulation
Telemetry reports are defined using a UDP-based encapsulation. Various
outer encapsulations may be used to transport the UDP packets. Typically
this would simply be an Ethernet header, followed by an IPv4 or IPv6
header, followed by the UDP header. This specification does not preclude
the use of different transport encapsulations.
The source IP address identifies the node that generates the
telemetry report.
The destination IP address identifies a location in the distributed
telemetry monitoring system that will receive the telemetry report.
In case of IPv4, as is the case for any other IP packet, either the Don’t
Fragment (DF) bit must be set, or the IPv4 ID field must be set so that the
value does not repeat within the maximum datagram lifetime for a given
source address/destination address/protocol tuple.
### UDP header (8 octets)
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
The Source Port may optionally be used to carry flow entropy, for example
based on a hash of the inner 5-tuple. Otherwise, it should be user
configurable.
The Destination Port is user configurable. The expectation is that the same
Destination Port value will be used for all telemetry reports in a
particular deployment.
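
As a rough illustration of carrying flow entropy in the Source Port, the sketch below hashes the inner 5-tuple and maps the result into a port range. The helper name `entropy_source_port`, the use of CRC32, and the 49152 to 65535 range are all assumptions made for this example; this specification does not mandate a particular hash function or port range.

```python
import zlib

# Assumed port range for illustration only; the specification does not define one.
PORT_MIN = 49152
PORT_MAX = 65535


def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        src_port: int, dst_port: int) -> int:
    """Map an inner 5-tuple to a UDP source port (hypothetical helper)."""
    # Serialize the 5-tuple into a stable byte string before hashing.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode("ascii")
    span = PORT_MAX - PORT_MIN + 1
    # CRC32 is an arbitrary choice here; any well-mixed hash would do.
    return PORT_MIN + (zlib.crc32(key) % span)
```

Because the port depends only on the 5-tuple, reports for the same flow would follow the same path through equal-cost multipath hashing, while different flows tend to spread across paths.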
## Telemetry Report Group Header (Ver 2.0) (8 octets)
The Telemetry Report Group Header immediately follows the UDP header
whose destination port identifies the contents as a telemetry report.
This header contains the common fields in a telemetry report that
optionally contains multiple coalesced individual reports, each
corresponding to one data plane packet. There is at most one instance
of the Telemetry Report Group Header in a packet.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Ver  |   hw_id   |              Sequence Number              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Node ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
[]{tex-cmd: "\newpage"}
* Ver (4b): Version
: This specification defines **version 2**.
* hw_id (6b): Hardware ID
: Identifies the hardware subsystem within the node that generated
this report. For example, in a chassis with multiple linecards this could
identify a specific linecard, or a subsystem within a linecard. The hw_id
is unique within the scope of a Node ID.
* Sequence Number (22b): Sequence Number
: Reflects the sequence of reports from a specific combination of
(Node ID, hw_id) to a particular telemetry report destination. This
can be used to detect loss of telemetry reports before they reach their
intended destination.
* Node ID (32b): Node ID
: The unique ID of a node. This is generally administratively assigned.
Node IDs must be unique within a management domain.
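
The field layout above can be sketched in Python as follows. This is a minimal, non-normative illustration: the names `GroupHeader`, `parse_group_header`, `build_group_header`, and `missing_reports` are hypothetical, and the loss estimate assumes that the Sequence Number increases by one per telemetry report packet and that packets are not reordered in transit.

```python
import struct
from typing import NamedTuple

SEQ_BITS = 22
SEQ_MOD = 1 << SEQ_BITS


class GroupHeader(NamedTuple):
    ver: int
    hw_id: int
    seq: int
    node_id: int


def parse_group_header(data: bytes) -> GroupHeader:
    """Decode the 8-octet Telemetry Report Group Header (network byte order)."""
    if len(data) < 8:
        raise ValueError("group header needs at least 8 octets")
    word0, node_id = struct.unpack_from("!II", data, 0)
    ver = word0 >> 28               # top 4 bits
    hw_id = (word0 >> 22) & 0x3F    # next 6 bits
    seq = word0 & (SEQ_MOD - 1)     # low 22 bits
    return GroupHeader(ver, hw_id, seq, node_id)


def build_group_header(hw_id: int, seq: int, node_id: int, ver: int = 2) -> bytes:
    """Encode a group header; values are masked to their field widths."""
    word0 = ((ver & 0xF) << 28) | ((hw_id & 0x3F) << 22) | (seq & (SEQ_MOD - 1))
    return struct.pack("!II", word0, node_id & 0xFFFFFFFF)


def missing_reports(prev_seq: int, cur_seq: int) -> int:
    """Reports skipped between two consecutive arrivals, modulo 2**22.

    Assumes the sender increments the sequence number by one per packet.
    """
    return (cur_seq - prev_seq - 1) % SEQ_MOD
```

A receiver would keep one `prev_seq` per (Node ID, hw_id) pair, since the sequence is scoped to that combination and to a particular destination.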
## Individual Report Header (Ver 2.0) (4+ octets) { #individual-report }
Each telemetry report packet contains one or more individual reports immediately
following the Telemetry Report Group Header. Each report within the packet
starts with the Individual Report Header. The presence of multiple reports
corresponding to multiple data plane packets, possibly from multiple flows,
can be determined by comparing the *Report Length* in the Individual Report
Header with the length in the UDP header.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|RepType| InType| Report Length |   MD Length   |D|Q|F|I|  Rsvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
|                                                               |   |
|                Individual Report Main Contents                |   |
|                 (varies depending on RepType)                 |   |
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Report
|                                                               | Length
|               Individual Report Inner Contents                |   |
|       (Truncated Packet or Additional DS Extension Data       |   |
|                  or TLV depending on InType)                  |   |
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
```
* RepType (4b): Report Type
: Type of the individual report:
* **0**: Inner Only
* **1**: INT
* **2**: IOAM
* **3 – 15**: Reserved
[]{tex-cmd: "\newpage"}
* InType (4b): Inner Type
: Type of data embedded after the *Individual Report Main Contents*:
* **0**: None
* **1**: TLV
* **2**: Domain Specific Extension Data
* **3**: Ethernet
* **4**: IPv4
* **5**: IPv6
* **6 – 15**: Reserved
* Report Length (8b): Report Length
: Indicates the length of the individual report in units of 4-byte words,
including the *Individual Report Main Contents* and
*Individual Report Inner Contents*, but excluding the first
4-byte word (*RepType, InType, Report Length, MD Length, D, Q, F, I, Rsvd*).
For *RepType* codepoint 1 *INT*, the *Report Length* includes the lengths
of *RepMdBits, Domain Specific ID, DSMdBits, DSMdstatus, Variable Optional
Baseline Metadata,* and *Variable Optional Domain Specific Metadata* (see
Section [#int-report]).
The *Report Length* value 0xFF is a special value that indicates a length
greater than or equal to 0xFF, extending to the end of the UDP payload,
i.e. there are no subsequent individual reports in this telemetry report.
* MD Length (8b): Metadata Length
: Indicates the length of metadata included in this report in units of
4-byte words. This may help the telemetry monitoring system
determine where the *Individual Report Inner Contents* begins. Note that
this does not include the length of the fixed portion of the *Individual
Report Main Contents*.
For *RepType* codepoint 1 *INT*, this includes the length of the *Variable
Optional Baseline Metadata* and *Variable Optional Domain Specific
Metadata* in 4-byte words (see Section [#int-report]).
* D (1b): Dropped
: Indicates that at least one packet matching a watchlist was dropped.
* Q (1b): Congested Queue Association
: Indicates the presence of congestion on a monitored queue.
* F (1b): Tracked Flow Association
: Indicates that this telemetry report is for a tracked flow, i.e. the
packet matched a watchlist somewhere (in case of INT-MD, INT-MX or IOAM) or
locally (in case of INT-XD). The report might include INT-MD or IOAM
metadata in the truncated packet. Other telemetry reports are
likely to be received for the same tracked flow, from the same node
and (in case of drop reports, INT-MX, INT-XD or path changes) from
other nodes.
* I (1b): Intermediate Report
: Indicates that a transit node sent this intermediate report for INT-MD.
* Rsvd (4b): Reserved
: Should be set to zero upon transmission, and ignored upon reception.
* Individual Report Main Contents
: The metadata that comprises this report, along with associated fields
that assist in processing the metadata. The format varies depending on
*RepType*.
When the *RepType* value is *Inner Only*, then the Individual Report Main
Contents is empty. *MD Length* should be set to zero upon transmission,
and ignored upon reception.
The *INT* Individual Report Main Contents format (see Section [#int-report])
was derived with INT 2.0/2.1 in mind, but it may be used with other INT
versions as well. It is possible that other *RepType* codepoints and
corresponding Individual Report Main Contents formats may be defined for
future versions of INT.
The *IOAM* Individual Report Main Contents format will be defined in a
future version of this specification.
* Truncated Packet
: The L2/L3/ESP/L4 headers of the original packet, which provide flow details. Presence of this field is
indicated by *InType* codepoint 3, 4, or 5, which identifies the type of
header at the beginning of the truncated packet. The length of the
truncated packet can be determined as *Report Length* - ((fixed length of
*Individual Report Main Contents*) + *MD Length*), unless the
*Report Length* value is 0xFF.
* Additional DS Extension Data
: Additional Domain Specific Extension Data, whose format can be
determined from the *Domain Specific ID* specified in the *Individual
Report Main Contents*. For *RepType* codepoint 1 *INT*, this is additional
domain specific data that is not associated with *DSMdBits*.
Presence of this field is indicated by *InType* codepoint 2.
* TLV
: Data in Type-Length-Value (TLV) format; multiple TLVs may be present (see Section [#tlv]).
Presence of this field is indicated by *InType* codepoint 1.
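
The header fields described above, together with the rule for locating coalesced reports, can be sketched as follows. This is an illustrative reading of the format rather than a reference implementation: the names `IndividualHeader`, `parse_individual_header`, and `iter_reports` are hypothetical, and `payload` is assumed to be the UDP payload with the 8-octet Telemetry Report Group Header already removed.

```python
import struct
from typing import Iterator, NamedTuple, Tuple

LENGTH_TO_END = 0xFF  # special Report Length: report extends to end of payload


class IndividualHeader(NamedTuple):
    rep_type: int
    in_type: int
    report_length: int  # in 4-byte words, excluding the first word
    md_length: int      # in 4-byte words
    d: bool
    q: bool
    f: bool
    i: bool


def parse_individual_header(data: bytes, offset: int = 0) -> IndividualHeader:
    """Decode the first 4-byte word of an individual report."""
    if len(data) - offset < 4:
        raise ValueError("individual report header needs 4 octets")
    b0, report_length, md_length, b3 = struct.unpack_from("!BBBB", data, offset)
    return IndividualHeader(
        rep_type=b0 >> 4,
        in_type=b0 & 0x0F,
        report_length=report_length,
        md_length=md_length,
        d=bool(b3 & 0x80),
        q=bool(b3 & 0x40),
        f=bool(b3 & 0x20),
        i=bool(b3 & 0x10),
    )


def iter_reports(payload: bytes) -> Iterator[Tuple[IndividualHeader, bytes]]:
    """Yield (header, body) for each individual report in the payload."""
    offset = 0
    while offset < len(payload):
        hdr = parse_individual_header(payload, offset)
        start = offset + 4
        if hdr.report_length == LENGTH_TO_END:
            end = len(payload)  # last report: runs to the end of the payload
        else:
            end = start + 4 * hdr.report_length
            if end > len(payload):
                raise ValueError("report runs past end of UDP payload")
        yield hdr, payload[start:end]
        offset = end
```

Each yielded body holds the *Individual Report Main Contents* followed by the *Individual Report Inner Contents*; splitting the two depends on *RepType* and *MD Length*.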
### Individual Report Main Contents for *RepType* 1 (*INT*) (8+ octets) { #int-report }
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           RepMdBits           |      Domain Specific ID       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           DSMdBits            |          DSMdstatus           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
|              Variable Optional Baseline Metadata              |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ MD Length
|          Variable Optional Domain Specific Metadata           |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+
```
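
As a hedged sketch of how a receiver might handle this layout, the code below decodes the fixed 8-octet portion and derives the size of the *Variable Optional Baseline Metadata* from *RepMdBits*, using the rule in the *RepMdBits* field definition: 4 octets per set bit, except 8 octets for bits 4, 5 and 6. The names `parse_int_fixed` and `baseline_md_octets` are hypothetical. Counting reserved bits as 4 octets is an assumption of this example; a strict receiver might instead reject reports with reserved bits set.

```python
import struct

# RepMdBits positions (bit 0 is the MSB) whose metadata occupies 8 octets.
EIGHT_OCTET_BITS = {4, 5, 6}


def parse_int_fixed(data: bytes) -> dict:
    """Decode the four 16-bit fields at the start of the INT main contents."""
    if len(data) < 8:
        raise ValueError("INT main contents needs at least 8 octets")
    rep_md_bits, ds_id, ds_md_bits, ds_md_status = struct.unpack_from("!HHHH", data, 0)
    return {
        "rep_md_bits": rep_md_bits,
        "domain_specific_id": ds_id,
        "ds_md_bits": ds_md_bits,
        "ds_md_status": ds_md_status,
    }


def baseline_md_octets(rep_md_bits: int) -> int:
    """Total octets of baseline metadata implied by the RepMdBits bitmap."""
    total = 0
    for bit in range(16):
        if rep_md_bits & (0x8000 >> bit):  # bit 0 is the most significant bit
            total += 8 if bit in EIGHT_OCTET_BITS else 4
    return total
```

For example, a bitmap with only Hop Latency (bit 2) and Ingress Timestamp (bit 4) set is `0x2800`, which implies 4 + 8 = 12 octets of baseline metadata.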
* RepMdBits (16b): Report Metadata Bits
: Bitmap that indicates which optional baseline metadata is present in the
telemetry report header. Each bit represents 4 octets of optional
metadata, except for bits 4, 5 & 6, which each represent 8 octets of optional metadata.
* bit 0 (MSB): Reserved
* bit 1: Level 1 Ingress Interface ID (16 bits) + Egress Interface ID (16 bits)
* bit 2: Hop Latency
* bit 3: Queue ID (8 bits) + Queue Occupancy (24 bits)
* bit 4: Ingress Timestamp (64 bits)
* bit 5: Egress Timestamp (64 bits)
* bit 6: Level 2 Ingress Interface ID (32 bits) + Egress Interface ID (32 bits)
* bit 7: Egress Port TX Utilization
* bit 8: Buffer ID (8 bits) + Buffer Occupancy (24 bits)
* bits 9-14: Reserved
* bit 15: Queue ID (8 bits) + Drop Reason (8 bits) + Padding (16 bits)
This specification defines the following metadata:
* Drop reason
: An enumeration that indicates the reason why a packet was dropped, for
example as defined in
[github.com/p4lang/switch](https://github.com/p4lang/switch/blob/master/p4src/includes/drop_reason_codes.h).
See the INT specification [^INT] for definitions of the remaining metadata.
* Domain Specific ID (16b)
: The unique ID of the INT Domain.
The *Domain Specific ID* value 0x0000 is the default, known to all nodes.
For this value, all *DSMdBits* are treated as reserved. Operators can
assign values in the range 0x0001 to 0xFFFF.
* DSMdBits (16b): Domain Specific Md Bits
: Bitmap that indicates which optional domain specific metadata is
present in the telemetry report header. Each bit represents
4 octets or a multiple of 4 octets of domain specific optional metadata.
When using INT-MD or INT-MX, if the *Domain Specific ID* does not match
any Domain ID known to this node, then the node may either:
- Set the Telemetry Report *DSMdBits* field to zero and rederive the
Telemetry Report *MD Length* from *RepMdBits*, or
- Not send any of its own metadata to the monitoring systems, doing
any of the following:
- Not generate any Telemetry Report, or
- Clear *RepMdBits* and *MD Length* as well as *DSMdBits* (this only
makes sense for INT-MD), or
- Use *RepType* value *Inner Only* (this only makes sense for INT-MD).
* DSMdstatus (16b): Domain Specific Md Status
: Indicates the domain specific metadata status.
* Variable Optional Baseline Metadata
: The metadata corresponding to *RepMdBits*, 4 octets for each bit, except
8 octets for bits 4, 5 & 6.
If a node receives an INT-MX or INT-MD packet with an *Instruction Bitmap*
that requests one or more metadata values that are not available or
reserved, then the node must ensure that the corresponding bit(s) in the
Telemetry Report *RepMdBits* that specify the unavailable metadata are not
set. The Telemetry Report *MD Length* must be derived based on the adjusted
*RepMdBits* (and *DSMdBits*) values.
* Variable Optional Domain Specific Metadata
: The metadata corresponding to *DSMdBits*, 4 octets or a multiple of
4 octets for each bit.
If a node receives an INT-MX or INT-MD packet with a *DS Instruction*
that requests one or more metadata values that are not available or
reserved, then the node must ensure that the corresponding bit(s) in the
Telemetry Report *DSMdBits* that specify the unavailable metadata are not
set. The Telemetry Report *MD Length* must be derived based on the adjusted
*DSMdBits* (and *RepMdBits*) values.
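The relationship between *RepMdBits* and the length of the baseline metadata can be sketched in Python as follows. This is a minimal illustration and not a reference implementation: the function and constant names are hypothetical, bit 0 is taken as the MSB of the 16-bit field as stated above, and only the baseline part is computed. The contribution of *DSMdBits* to *MD Length* depends on the domain and is therefore left out.

```python
# Hypothetical helpers: bit numbering follows the text (bit 0 is the MSB).
EIGHT_OCTET_BITS = {4, 5, 6}               # timestamps and Level 2 interface IDs
RESERVED_BITS = {0} | set(range(9, 15))    # bit 0 and bits 9-14


def bit_mask(bit):
    """Mask for spec bit number `bit` (0 = MSB) within a 16-bit field."""
    return 1 << (15 - bit)


def baseline_md_words(rep_md_bits):
    """Length of the Variable Optional Baseline Metadata in 4-octet words."""
    words = 0
    for bit in range(16):
        if rep_md_bits & bit_mask(bit):
            words += 2 if bit in EIGHT_OCTET_BITS else 1
    return words


def clear_unavailable(rep_md_bits, unavailable_bits):
    """Clear reserved bits and bits whose metadata this node cannot supply."""
    for bit in RESERVED_BITS | set(unavailable_bits):
        rep_md_bits &= ~bit_mask(bit)
    return rep_md_bits & 0xFFFF
```

With bits 1 and 3 set (`0x5000`), `baseline_md_words` yields 2 words. If bit 4 is also requested but the node has no ingress timestamp, `clear_unavailable(0x5800, {4})` returns `0x5000`, and the length is derived from that adjusted value.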
[]{tex-cmd: "\newpage"}
### Individual Report Inner Contents for *InType* 1 (*TLV*) (4+ octets) { #tlv }
One or more TLVs, each following the format defined in this section.
The presence of multiple TLVs can be determined by comparing the
*TLVLength* in the first TLV with the *Report Length* in the Individual
Report Header.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<----------+
|TLVType| Rsvd  |   TLVLength   |       TLV Data Template       |           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+       |
|                                                               |   |     May be
|          Variable TLV Data as identified by TLVType           |  TLV   repeated
|     (Truncated Packet or Domain Specific Extension Data)      | Length    |
|                                                               |   |       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<--+<------+
```
* TLVType (4b): TLV data type
:
- **0**: Domain Specific Extension Data
- **1**: Ethernet
- **2**: IPv4
- **3**: IPv6
- **4-15**: Reserved.
* Rsvd (4b): Reserved
: Should be set to zero upon transmission and ignored upon reception.
* TLVLength (8b)
: Indicates the length, in 4-byte words, of *Variable TLV Data* as identified
by *TLVType*. Note that this does not include the length of the first
4-byte word (*TLVType, Rsvd, TLVLength, TLV Data Template*).
* TLV Data Template (16b)
: Specifies the format of the *Variable TLV Data*.
A non-zero *TLV Data Template* value specifies the template for *TLVType*
codepoint of *Domain Specific Extension Data*. For *TLVType* codepoints
*Ethernet*, *IPv4*, and *IPv6*, the *TLV Data Template* value should be
zero upon transmission and ignored upon reception.
* Variable TLV Data
: Variable length data based upon *TLVType*. The following two fields are
defined in this version.
* Truncated Packet
: The L2/L3/ESP/L4 headers of the packet, which provide flow details. Presence of this field is
indicated by *TLVType* codepoint 1, 2, or 3, which identifies the type of
header at the beginning of the truncated packet.
* Domain Specific Extension Data
: Domain Specific Extension Data, whose format can be determined from the
*Domain Specific ID* specified in the *Individual Report Main Contents* and
the *TLV Data Template*.
Presence of this field is indicated by *TLVType* codepoint 0.
For *RepType* codepoint 1 *INT*, this is additional domain specific data
that is not associated with *DSMdBits*.
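A receiver could walk the TLVs described in this section along the following lines. This Python sketch is illustrative only: the function name and the tuple layout are hypothetical, and it assumes the caller has already isolated the inner contents (the octets after the metadata, up to the end indicated by *Report Length*).

```python
import struct

# TLVType codepoints defined in this section; 4-15 are reserved.
TLV_TYPES = {0: "Domain Specific Extension Data", 1: "Ethernet", 2: "IPv4", 3: "IPv6"}


def parse_tlvs(data):
    """Split inner contents (bytes) into (tlv_type, template, value) tuples."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated TLV header")
        first, length_words, template = struct.unpack_from("!BBH", data, offset)
        tlv_type = first >> 4                  # upper 4 bits; the Rsvd bits are ignored
        end = offset + 4 + 4 * length_words    # TLVLength excludes the first word
        if end > len(data):
            raise ValueError("TLVLength runs past the end of the report")
        tlvs.append((tlv_type, template, data[offset + 4:end]))
        offset = end
    return tlvs
```

The loop continues until the inner contents are exhausted, which matches the rule that further TLVs are present whenever the first TLV does not account for the whole remaining length.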
[]{tex-cmd: "\newpage"}
## Embedded Telemetry Metadata In Stacked Reports { #embedded }
There may still be further telemetry metadata embedded within a truncated
packet fragment. For example, this is typically the case when there is
stacked telemetry metadata from hops prior to the node generating the
report. The telemetry metadata will typically be encoded using a defined
data plane format such as INT-MD or IOAM.
A node generating a telemetry report with stacked telemetry metadata may
include its local telemetry metadata in any of the following:
* the embedded telemetry metadata in a truncated packet fragment,
* the stacked telemetry metadata in domain specific extension data,
* the *Individual Report Main Contents* in the same Individual Report Header
that contains the stacked telemetry metadata from previous hops, in either a
truncated packet fragment or in domain specific extension data, or
* the *Individual Report Main Contents* in a separate report from the stacked
telemetry metadata from previous hops. Note that in this case the
ingress timestamp (if present) will be the same in both reports.
If the *Tracked Flow Association (F)* bit is set to 0, then there will not
be any embedded telemetry metadata in the report.
If the *Tracked Flow Association (F)* bit is set to 1, there may or may not
be any embedded telemetry metadata in the report.
[]{tex-cmd: "\newpage"}
# Examples of Telemetry Reports
This section shows examples of Telemetry Reports.
These examples are not intended to be complete or exclusive.
## Example with Baseline Metadata and Truncated IPv4
This example shows a telemetry report with baseline metadata consisting of
level 1 interface IDs and queue occupancy, and a truncated IPv4 packet.
The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 1 (*INT*)
- *InType* = 4 (*IPv4*)
- *MD Length* = 2 for level 1 interface IDs and queue occupancy
- *Report Length* = 2 (individual report fixed size) + 2 (*MD Length*)
\+ 5 (original IP header) + 5 (original TCP) \
= 14 (assuming truncated packet fragment ends after the original TCP header)
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0
- *Domain Specific ID*, *DSMdBits* and *DSMdstatus* are all 0
Below is the telemetry report packet starting from the
Telemetry Report Group Header.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver = 2|   hw_id   |              Sequence Number              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Node ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1|0 1 0 0| RepLength=14  | MD Length = 2 |0|0|1|0| Rsvd  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Ingress Interface ID      |      Egress Interface ID      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Queue Occupancy                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Truncated IPv4 Packet                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
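The header values of this example can be reproduced with a short Python sketch. The field positions are read off the diagram above (*RepType* and *InType* of 4 bits each, *Report Length* and *MD Length* of 8 bits each, then the *D*, *Q*, *F* and *I* flags and 4 reserved bits). The function name is hypothetical, and the sketch covers only the Individual Report Header word and the fixed main contents.

```python
import struct


def pack_individual_report_header(rep_type, in_type, report_length, md_length,
                                  d=0, q=0, f=0, i=0):
    """Pack the 32-bit Individual Report Header into 4 network-order octets."""
    word = ((rep_type & 0xF) << 28 | (in_type & 0xF) << 24 |
            (report_length & 0xFF) << 16 | (md_length & 0xFF) << 8 |
            (d & 1) << 7 | (q & 1) << 6 | (f & 1) << 5 | (i & 1) << 4)
    return struct.pack("!I", word)


# Values from this example, all lengths in 4-octet words.
md_length = 2                            # interface IDs + queue occupancy
report_length = 2 + md_length + 5 + 5    # fixed main contents + metadata + IPv4 + TCP
header = pack_individual_report_header(1, 4, report_length, md_length, f=1)

# RepMdBits with bits 1 and 3 set, then Domain Specific ID, DSMdBits, DSMdstatus = 0.
main_contents = struct.pack("!HHHH", 0x5000, 0, 0, 0)
```

Here `header.hex()` evaluates to `140e0220`, which corresponds to the third row of the diagram with the reserved bits set to zero.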
## Example with Baseline Metadata, Domain Specific Metadata, DS Extension Data and Truncated IPv4
This example shows a telemetry report with baseline metadata consisting of
level 1 interface IDs and queue occupancy, one domain specific metadata that
is 4 bytes long, domain specific extension data and a truncated IPv4 packet.
The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 1 (*INT*)
- *InType* = 1 (*TLV*)
- *MD Length* = 3 for level 1 interface IDs, queue occupancy and the 4-byte
domain specific metadata
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0
- The domain specific metadata that is included is represented by the first
bit (MSB) of *DSMdBits*
- The first TLV uses *TLVType* = 0 (*Domain Specific Extension Data*)
- The second TLV uses *TLVType* = 2 (*IPv4*)
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver = 2|   hw_id   |              Sequence Number              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Node ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1|0 0 0 1| Report Length | MD Length = 3 |0|0|1|0| Rsvd  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0|      Domain Specific ID       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|          DSMdstatus           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Ingress Interface ID      |      Egress Interface ID      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Queue Occupancy                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Domain Specific Metadata                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0| Rsvd  |   TLVLength   |       TLV Data Template       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Variable TLV Data (Domain Specific Extension Data)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 1 0| Rsvd  |   TLVLength   |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Variable TLV Data (Truncated IPv4 Packet)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
## Example with Embedded INT-MD in a TCP Packet
This example shows one possibility for the telemetry report corresponding
to the INT packet in Section 6.3 of the INT 2.1 specification [^INT].
Two hosts (Host1 and Host2) communicate over a network path composed of
three network switches (Switch1, Switch2 and Switch3) as shown below.
```
==> packet P travels from Host1 to Host2 ==>
Host1 --------> Switch1 ---------> Switch2 ---------> Switch3 --------> Host2
```
* The ToR switch of Host1 (Switch1) acts as the INT source. It adds a new
UDP header, INT-MD headers and its own metadata to the packet. It requests
each INT hop to insert its node ID and queue occupancy. (For the sake of
illustration we only consider node ID and queue occupancy being inserted
at each hop. Queue IDs are typically defined per port, hence in a real
use case queue occupancy is likely to be collected along with the egress
interface ID.)
* The values of INT metadata header fields in this example are as follows:
- *Ver* = 2
- *D* = 0 (Packet is not a clone/copy, hence the Sink must not Discard)
- *E* = 0 (Max Hop Count not exceeded)
- *M* = 0 (MTU not exceeded at any node)
- Per-hop Metadata Length = 2 (for node id & queue occupancy)
- *Remaining Hop Count* starts at 8, decremented by 1 at each hop
that inserts INT metadata
* Switch2 prepends its node ID and queue occupancy into the metadata stack.
* The ToR switch of Host2 (Switch3) acts as the INT sink, removing the UDP
and INT-MD headers before forwarding the packet to Host2. It generates a
Telemetry Report packet with a single individual report. It inserts its
node ID and queue occupancy into the Individual Report Header rather than
the embedded INT stack in the truncated packet.
* The values of Telemetry Report header fields in this example are as follows:
- *Ver* = 2
- *RepType* = 1 (*INT*)
- *InType* = 4 (*IPv4*)
- *MD Length* = 1 for queue occupancy
- *Report Length* = 2 (individual report fixed size) + 1 (*MD Length*)
\+ 5 (original IP header) + 2 (UDP header for embedded INT) + 1 (INT shim)
\+ 3 (INT fixed) + 4 (INT metadata stack) + 5 (original TCP) \
= 23 (assuming truncated packet fragment ends after the original TCP header)
- *D* = 0 since the packet is not dropped
- *Q* = 0 since the packet does not experience congestion at Switch3
- *F* = 1
- *I* = 0 since the report includes the metadata for all hops
- *Domain Specific ID*, *DSMdBits* and *DSMdstatus* are all 0
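The *Report Length* arithmetic for this example can be checked with a few lines of Python. The word counts are taken directly from the list above. The split of the INT metadata stack into 2 hops of 2 words each is an inference from the description (Switch1 and Switch2 each insert node ID and queue occupancy, while Switch3 reports its values outside the embedded stack), and the variable names are hypothetical.

```python
# All values are in 4-octet words.
PER_HOP_MD_WORDS = 2    # node ID + queue occupancy
HOPS_IN_STACK = 2       # Switch1 (source) and Switch2; Switch3 does not add to the stack

components = {
    "individual report fixed size": 2,
    "MD Length (queue occupancy)": 1,
    "original IP header": 5,
    "UDP header for embedded INT": 2,
    "INT shim": 1,
    "INT fixed header": 3,
    "INT metadata stack": PER_HOP_MD_WORDS * HOPS_IN_STACK,
    "original TCP header": 5,
}

report_length = sum(components.values())
```

The sum comes to 23 words, matching the *Report Length* given above.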
Below is the telemetry report packet generated by sink Switch3, starting
from the Telemetry Report Group Header.