From dedfb16b205911709313f076e82a7f0dfd07d8a3 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Mon, 22 Apr 2024 11:14:23 -0700 Subject: [PATCH 01/21] Create A80-grpc-metrics-for-tcp-connection --- A80-grpc-metrics-for-tcp-connection | 62 +++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 A80-grpc-metrics-for-tcp-connection diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection new file mode 100644 index 000000000..ca3b4a9d3 --- /dev/null +++ b/A80-grpc-metrics-for-tcp-connection @@ -0,0 +1,62 @@ +A80: gRPC Metrics for TCP connection +---- +* Author(s): Yash Tibrewal (@yashykt), Nana Pang (@nanahpang), Yousuk Seung (@yousukseung) +* Approver: Craig Tiller (@ctiller), Mark Roth (@markdroth) +* Status: {Draft, In Review, Ready for Implementation, Implemented} +* language: {...} +* Last updated: 2024-04-18 +* Discussion at: {...} + +## Abstract + +This document proposes adding new TCP connection metrics to gRPC for improved network analysis and debugging. + +## Background + +To improve the network debugging capabilities for gRPC users, we propose adding per-connection TCP metrics in gRPC. The metrics will utilize the metrics framework outlined in [A79]. + +### Related Proposals: +* [A79]: gRPC Non-Per-Call Metrics Framework (pending) + +[A79]: https://github.com/grpc/proposal/pull/421 + +## Proposal + +This document proposes changes to the following gRPC components. + +#### Per-Connection TCP Metrics + +We will provide the following metrics: +- `grpc.tcp.min_rtt` +- `grpc.tcp.delivery_rate` +- `grpc.tcp.packets_sent` +- `grpc.tcp.packets_retransmitted` +- `grpc.tcp.packets_spurious_retransmitted` + +The metrics will have label: + +| Name | Disposition | Description | +| ----------- | ----------- | ----------- | +| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. 
| + +The metrics will be exported as: + +| Name | Type | Unit | Labels | Description | +| ------------- | ----- | ----- | ------- | ----------- | +| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | +| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. | +| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. | +| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | +| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted packets in the calculation period. | + +### Metric Stability + +All metrics added in this proposal will start as experimental. The long term goal will be to +de-experimentalize them and have them be on by default, but the exact +criteria for that change are TBD. + +### Temporary environment variable protection + +This proposal does not include any features enabled via external I/O, so +it does not need environment variable protection. 
+ From ffaeb22f7de15a8f2dab03e7e5d221e2a70b215a Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Tue, 23 Apr 2024 12:27:08 -0700 Subject: [PATCH 02/21] Update A80-grpc-metrics-for-tcp-connection --- A80-grpc-metrics-for-tcp-connection | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection index ca3b4a9d3..4992e19e3 100644 --- a/A80-grpc-metrics-for-tcp-connection +++ b/A80-grpc-metrics-for-tcp-connection @@ -47,7 +47,7 @@ The metrics will be exported as: | grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. | | grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. | | grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | -| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted packets in the calculation period. | +| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted packets in the calculation period. 
These are retransmissions that TCP later discovered unnecessary.| ### Metric Stability From d41329136a403d303db166db1daf723980be2a0d Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 24 Apr 2024 13:36:56 -0700 Subject: [PATCH 03/21] Update A80-grpc-metrics-for-tcp-connection --- A80-grpc-metrics-for-tcp-connection | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection index 4992e19e3..61d4b613d 100644 --- a/A80-grpc-metrics-for-tcp-connection +++ b/A80-grpc-metrics-for-tcp-connection @@ -43,8 +43,8 @@ The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | -| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. | +| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | +| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records latest throughput measured of the TCP connection. | | grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. | | grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. 
| | grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| From 5b5ba3f3cba2d2fa6af72ab6110de211001f83a3 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Thu, 25 Apr 2024 15:36:37 -0700 Subject: [PATCH 04/21] Update A80-grpc-metrics-for-tcp-connection --- A80-grpc-metrics-for-tcp-connection | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection index 61d4b613d..3c2f592b8 100644 --- a/A80-grpc-metrics-for-tcp-connection +++ b/A80-grpc-metrics-for-tcp-connection @@ -5,7 +5,7 @@ A80: gRPC Metrics for TCP connection * Status: {Draft, In Review, Ready for Implementation, Implemented} * language: {...} * Last updated: 2024-04-18 -* Discussion at: {...} +* Discussion at: https://groups.google.com/g/grpc-io/c/AyT0LVgoqFs ## Abstract From 583e6b32ece172499301c2373ee4d5a04b28cb48 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Mon, 29 Apr 2024 14:25:04 -0700 Subject: [PATCH 05/21] Update and rename A80-grpc-metrics-for-tcp-connection to A80-grpc-metrics-for-tcp-connection.md --- ...n => A80-grpc-metrics-for-tcp-connection.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) rename A80-grpc-metrics-for-tcp-connection => A80-grpc-metrics-for-tcp-connection.md (61%) diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection.md similarity index 61% rename from A80-grpc-metrics-for-tcp-connection rename to A80-grpc-metrics-for-tcp-connection.md index 3c2f592b8..45906a316 100644 --- a/A80-grpc-metrics-for-tcp-connection +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -37,17 +37,18 @@ The metrics will have label: | Name | Disposition | Description | | ----------- | ----------- | 
----------- | -| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. | +| grpc.tcp.peer_address | optional | Store the peer address info in URI format such as `ipv4:1.2.3.4:567`. | +| grpc.tcp.local_address | optional | Store the local address info in URI format such as `ipv4:1.2.3.4:567`. | The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | -| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records latest throughput measured of the TCP connection. | -| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. | -| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | -| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +| grpc.tcp.min_rtt | Histogram | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | +| grpc.tcp.delivery_rate | Histogram | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. | +| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. 
| +| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | +| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| ### Metric Stability @@ -60,3 +61,8 @@ criteria for that change are TBD. This proposal does not include any features enabled via external I/O, so it does not need environment variable protection. +## Implementation + +Will be implemented in C-core, but currently have no plans to implement in other languages. + + From 8aa21c1b26afd9043fc6652f6a67ba2787e2c7b3 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Mon, 29 Apr 2024 14:26:35 -0700 Subject: [PATCH 06/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 45906a316..aa82cae02 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -63,6 +63,6 @@ it does not need environment variable protection. ## Implementation -Will be implemented in C-core, but currently have no plans to implement in other languages. +Will be implemented in C-core, and currently have no plans to implement in other languages. 
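The `grpc.tcp.peer_address` and `grpc.tcp.local_address` labels above are described as URI-format strings such as `ipv4:1.2.3.4:567`. A minimal Python sketch of producing such a label value follows. The helper name `address_label` is hypothetical, and the bracketed IPv6 form is an assumption modeled on the IPv4 example, since the proposal text only shows IPv4.

```python
import ipaddress


def address_label(host: str, port: int) -> str:
    """Format an IP/port pair as a URI-style label value.

    Hypothetical helper for illustration only; it is not part of gRPC.
    """
    ip = ipaddress.ip_address(host)  # raises ValueError for non-IP input
    if ip.version == 4:
        # Same shape as the IPv4 example in the label table.
        return f"ipv4:{ip}:{port}"
    # Assumption: IPv6 addresses are bracketed so the port stays unambiguous.
    return f"ipv6:[{ip}]:{port}"


print(address_label("1.2.3.4", 567))
```

Because every distinct address pair creates a new label value, a label like this can add many time series on a busy process, which may be one reason the table marks both labels as optional.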
From 9f8038c61fe5d6f991ae9d12f3449cebc603818c Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 1 May 2024 14:16:17 -0700 Subject: [PATCH 07/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index aa82cae02..52b56be09 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -44,11 +44,11 @@ The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | -| grpc.tcp.delivery_rate | Histogram | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. | -| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | -| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | -| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. 
| +| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. | +| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | +| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | +| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| ### Metric Stability From 59ab138dd1945083c3de18ce27faf6e2ed6f959b Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 1 May 2024 17:35:51 -0700 Subject: [PATCH 08/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 52b56be09..c82cd5f79 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -44,12 +44,18 @@ The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. 
| +| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | | grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. | | grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | | grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +The TCP metrics are collected by enabling `SO_TIMESTAMPING` in kernel TCP through `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))`. The kernel TCP then wil capture packet timestamps on transmission. + +#### Reference: +* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815 +* Kernel TCP Timestamping: https://www.kernel.org/doc/Documentation/networking/timestamping.rst + ### Metric Stability All metrics added in this proposal will start as experimental. 
The long term goal will be to From ce27a6929c95e006dd2df814e9e7afe9316c5a34 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 1 May 2024 17:49:33 -0700 Subject: [PATCH 09/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index c82cd5f79..6109e993d 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -50,7 +50,7 @@ The metrics will be exported as: | grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| -The TCP metrics are collected by enabling `SO_TIMESTAMPING` in kernel TCP through `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))`. The kernel TCP then wil capture packet timestamps on transmission. +The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked. 
#### Reference: * Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815 From d239c39e8f8957dd2c4b0f393d888c2bc247aee4 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Fri, 10 May 2024 13:28:57 -0700 Subject: [PATCH 10/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 6109e993d..ca7c37bbc 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -16,7 +16,7 @@ This document proposes adding new TCP connection metrics to gRPC for improved ne To improve the network debugging capabilities for gRPC users, we propose adding per-connection TCP metrics in gRPC. The metrics will utilize the metrics framework outlined in [A79]. ### Related Proposals: -* [A79]: gRPC Non-Per-Call Metrics Framework (pending) +* [A79]: gRPC Non-Per-Call Metrics Framework [A79]: https://github.com/grpc/proposal/pull/421 @@ -46,9 +46,9 @@ The metrics will be exported as: | ------------- | ----- | ----- | ------- | ----------- | | grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | | grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. | -| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | -| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. 
| -| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | +| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | +| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked. 
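The `getsockopt(TCP_INFO)` read mentioned in the paragraph above can be sketched in Python for Linux as follows. This is an illustration under stated assumptions, not the C-core implementation. The byte offsets follow the layout of `struct tcp_info` in the kernel's `linux/tcp.h`. The choice of which kernel fields to read, the dictionary keys, and the helper names `parse_tcp_info` and `read_tcp_info` are hypothetical. The proposal does not say which `tcp_info` fields feed each metric.

```python
import socket
import struct

# Byte offsets into Linux's `struct tcp_info` (include/uapi/linux/tcp.h).
# Assumption: the running kernel is recent enough to report these fields.
_OFF_TOTAL_RETRANS = 100   # __u32 tcpi_total_retrans
_OFF_MIN_RTT = 148         # __u32 tcpi_min_rtt, in microseconds
_OFF_DATA_SEGS_OUT = 156   # __u32 tcpi_data_segs_out
_OFF_DELIVERY_RATE = 160   # __u64 tcpi_delivery_rate, in bytes per second
_MIN_LEN = _OFF_DELIVERY_RATE + 8


def parse_tcp_info(raw: bytes) -> dict:
    """Extract a few fields from a raw tcp_info buffer, converting units
    to the ones used in the metrics table (seconds and bit/s)."""
    if len(raw) < _MIN_LEN:
        raise ValueError("tcp_info buffer too short for the requested fields")
    (total_retrans,) = struct.unpack_from("=I", raw, _OFF_TOTAL_RETRANS)
    (min_rtt_us,) = struct.unpack_from("=I", raw, _OFF_MIN_RTT)
    (data_segs_out,) = struct.unpack_from("=I", raw, _OFF_DATA_SEGS_OUT)
    (delivery_rate,) = struct.unpack_from("=Q", raw, _OFF_DELIVERY_RATE)
    return {
        "min_rtt_s": min_rtt_us / 1e6,               # microseconds -> seconds
        "delivery_rate_bit_per_s": delivery_rate * 8,  # bytes/s -> bit/s
        "data_segs_out": data_segs_out,
        "total_retrans": total_retrans,
    }


def read_tcp_info(sock: socket.socket) -> dict:
    """Linux only: ask the kernel for a connected TCP socket's statistics."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 256)
    return parse_tcp_info(raw)
```

Keeping the parsing separate from the system call lets the unit conversions be checked against a synthetic buffer without opening a real connection.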
From 0726f6e8af912ab7ca40fe3c7e3a21ba3f97881b Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 15 May 2024 14:26:33 -0700 Subject: [PATCH 11/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index ca7c37bbc..62f417f50 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -18,7 +18,7 @@ To improve the network debugging capabilities for gRPC users, we propose adding ### Related Proposals: * [A79]: gRPC Non-Per-Call Metrics Framework -[A79]: https://github.com/grpc/proposal/pull/421 +[A79]: https://github.com/grpc/proposal/blob/master/A79-non-per-call-metrics-architecture.md ## Proposal From 3bfe76b79892fb4c77bb2905015798df4fee61a3 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 15 May 2024 14:47:33 -0700 Subject: [PATCH 12/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 62f417f50..a9c82d5f8 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -18,7 +18,7 @@ To improve the network debugging capabilities for gRPC users, we propose adding ### Related Proposals: * [A79]: gRPC Non-Per-Call Metrics Framework -[A79]: https://github.com/grpc/proposal/blob/master/A79-non-per-call-metrics-architecture.md +[A79]: A79-non-per-call-metrics-architecture.md ## Proposal From 2ccf768a6820f1b1d1b49579c082252f6035617e Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Tue, 21 May 2024 16:30:59 -0700 Subject: [PATCH 13/21] Update A80-grpc-metrics-for-tcp-connection.md --- 
A80-grpc-metrics-for-tcp-connection.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index a9c82d5f8..22e48b4bd 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -50,11 +50,19 @@ The metrics will be exported as: | grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| -The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked. + +#### Metric Collection Design + +A high-level approach to collecting TCP metrics is as follows: +1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`. +2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. 
This involves calculating the time difference between when a data packet was sent and when it was acknowledged. +3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records. + #### Reference: * Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815 * Kernel TCP Timestamping: https://www.kernel.org/doc/Documentation/networking/timestamping.rst +* Delivery Rate: https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation#name-delivery-rate ### Metric Stability @@ -70,5 +78,3 @@ it does not need environment variable protection. ## Implementation Will be implemented in C-core, and currently have no plans to implement in other languages. - - From 83ac90863229845da872a8c754b1e689774c6b9b Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Tue, 21 May 2024 18:06:44 -0700 Subject: [PATCH 14/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 22e48b4bd..b0dfac344 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -44,8 +44,8 @@ The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | -| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. 
| +| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. RTT: packet acked timestamp - packet sent timestamp. | +| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection. Elapse time = packet acked timestamp - last packet acked timestamp. Delivery rate = packet acked bytes / elapse time. | | grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | | grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| @@ -55,7 +55,7 @@ The metrics will be exported as: A high-level approach to collecting TCP metrics is as follows: 1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`. -2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. 
This involves calculating the time difference between when a data packet was sent and when it was acknowledged. +2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (byte difference / time difference) between last acked data packet and the latest acked data packet. 3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records. From 2a11aea34244e32dcafc348a72702ea1d007e947 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Tue, 21 May 2024 18:08:45 -0700 Subject: [PATCH 15/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index b0dfac344..cc811cca0 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -55,7 +55,7 @@ The metrics will be exported as: A high-level approach to collecting TCP metrics is as follows: 1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`. -2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. 
This involves calculating the (byte difference / time difference) between last acked data packet and the latest acked data packet. +2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (packet bytes / elapse time between last acked data packet and the latest acked data packet). 3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records. From b6dc6d94a0d4439699da802eb18e52b0d985ed2c Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Tue, 21 May 2024 18:35:55 -0700 Subject: [PATCH 16/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index cc811cca0..3b16a4bfb 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -44,8 +44,8 @@ The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. RTT: packet acked timestamp - packet sent timestamp. | -| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection. Elapse time = packet acked timestamp - last packet acked timestamp. Delivery rate = packet acked bytes / elapse time. 
| +| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | +| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | | grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | | grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| @@ -53,10 +53,10 @@ The metrics will be exported as: #### Metric Collection Design -A high-level approach to collecting TCP metrics is as follows: -1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`. -2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (packet bytes / elapse time between last acked data packet and the latest acked data packet). -3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records. 
+A high-level approach to collecting TCP metrics (on Linux) is as follows: +1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission. +2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)). +3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. #### Reference: From 0aceebef5d12b4981447beafee3802243bbcbf9b Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 22 May 2024 14:30:48 -0700 Subject: [PATCH 17/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 3b16a4bfb..3dce9267f 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -37,18 +37,17 @@ The metrics will have label: | Name | Disposition | Description | | ----------- | ----------- | ----------- | -| grpc.tcp.peer_address | optional | Store the peer address info in URI format such as `ipv4:1.2.3.4:567`. | -| grpc.tcp.local_address | optional | Store the local address info in URI format such as `ipv4:1.2.3.4:567`. 
| +| grpc.tcp.server_address | optional | Store the server address info in URI format such as `ipv4:1.2.3.4:567`. For clients, this address is the same as the peer address, while on the server side, it's the same as the local address. | The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | -| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | -| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | -| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | -| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.server_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | +| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.server_address | Records latest goodput measured of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | +| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets TCP sends in the calculation period. | +| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | +| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| #### Metric Collection Design From 052d5cf58272f128da0b53ada7cc496b3a8f0c40 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Wed, 22 May 2024 16:55:23 -0700 Subject: [PATCH 18/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 3dce9267f..efa8c720d 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -33,21 +33,15 @@ We will provide the following metrics: - `grpc.tcp.packets_retransmitted` - `grpc.tcp.packets_spurious_retransmitted` -The metrics will have label: - -| Name | Disposition | Description | -| ----------- | ----------- | ----------- | -| grpc.tcp.server_address | optional | Store the server address info in URI format such as `ipv4:1.2.3.4:567`. For clients, this address is the same as the peer address, while on the server side, it's the same as the local address. 
| - The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.server_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | -| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.server_address | Records latest goodput measured of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | -| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets TCP sends in the calculation period. | -| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | -| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | +| grpc.tcp.delivery_rate | Histogram (double) | bit/s | None | Records latest goodput measured of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | +| grpc.tcp.packets_sent | Counter (uint64) | {packet} | None | Records total packets TCP sends in the calculation period. | +| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | +| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| #### Metric Collection Design From 7e5bc869baa98e4a7132e4fcc7b98a5b16a4af99 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Fri, 24 May 2024 11:15:21 -0700 Subject: [PATCH 19/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index efa8c720d..d881e2ec6 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -24,7 +24,7 @@ To improve the network debugging capabilities for gRPC users, we propose adding This document proposes changes to the following gRPC components. -#### Per-Connection TCP Metrics +### Per-Connection TCP Metrics We will provide the following metrics: - `grpc.tcp.min_rtt` @@ -43,14 +43,12 @@ The metrics will be exported as: | grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records total packets spuriously retransmitted packets in the calculation period. 
These are retransmissions that TCP later discovered unnecessary.| - #### Metric Collection Design A high-level approach to collecting TCP metrics (on Linux) is as follows: 1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission. 2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)). -3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. - +3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. A detailed explanation of the design can be found in the Fathom documentation. 
#### Reference: * Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815 From 092fbc197da455de073419eeb94b60c1f4a68ff7 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Fri, 24 May 2024 13:04:37 -0700 Subject: [PATCH 20/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index d881e2ec6..5b689718e 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -48,7 +48,9 @@ The metrics will be exported as: A high-level approach to collecting TCP metrics (on Linux) is as follows: 1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission. 2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)). -3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. A detailed explanation of the design can be found in the Fathom documentation. +3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. 
+ +A detailed explanation of the design can be found in the Fathom documentation. #### Reference: * Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815 From bd18940ab84ad7ed5acd541d9377ae4867af3ba1 Mon Sep 17 00:00:00 2001 From: nanahpang <31627465+nanahpang@users.noreply.github.com> Date: Fri, 24 May 2024 15:20:35 -0700 Subject: [PATCH 21/21] Update A80-grpc-metrics-for-tcp-connection.md --- A80-grpc-metrics-for-tcp-connection.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md index 5b689718e..f60574777 100644 --- a/A80-grpc-metrics-for-tcp-connection.md +++ b/A80-grpc-metrics-for-tcp-connection.md @@ -37,11 +37,11 @@ The metrics will be exported as: | Name | Type | Unit | Labels | Description | | ------------- | ----- | ----- | ------- | ----------- | -| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | -| grpc.tcp.delivery_rate | Histogram (double) | bit/s | None | Records latest goodput measured of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | -| grpc.tcp.packets_sent | Counter (uint64) | {packet} | None | Records total packets TCP sends in the calculation period. | -| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | -| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.| +| grpc.tcp.min_rtt | Histogram (floating-point) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.
RTT = packet acked timestamp - packet sent timestamp. | +| grpc.tcp.delivery_rate | Histogram (floating-point) | bit/s | None | Records the latest measured goodput of the TCP connection.
Elapsed time = packet acked timestamp - last packet acked timestamp.
Delivery rate = packet acked bytes / elapsed time. | +| grpc.tcp.packets_sent | Counter (integer) | {packet} | None | Records the total number of packets TCP sent in the calculation period. | +| grpc.tcp.packets_retransmitted | Counter (integer) | {packet} | None | Records the total number of packets retransmitted in the calculation period, including retransmissions of lost packets and spurious retransmissions. | +| grpc.tcp.packets_spurious_retransmitted | Counter (integer) | {packet} | None | Records the total number of spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. | #### Metric Collection Design
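
As a rough illustration of the RTT and delivery-rate formulas given in the metrics table, the arithmetic can be sketched in Python as below. This is only a sketch under stated assumptions, not how gRPC or the Linux kernel implements it: the `AckSample` record and all function names are hypothetical, timestamps are assumed to be in seconds, and the multiplication by 8 is an assumption made so that the result matches the `bit/s` unit listed for `grpc.tcp.delivery_rate`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AckSample:
    """Hypothetical record of one acknowledged packet (not a gRPC type)."""
    sent_ts: float    # seconds, when the packet was sent
    acked_ts: float   # seconds, when its acknowledgement arrived
    acked_bytes: int  # bytes covered by this acknowledgement


def rtt_seconds(sample: AckSample) -> float:
    # RTT = packet acked timestamp - packet sent timestamp
    return sample.acked_ts - sample.sent_ts


def min_rtt_seconds(samples: list[AckSample]) -> float:
    # Minimum RTT over the samples seen so far.
    return min(rtt_seconds(s) for s in samples)


def delivery_rate_bps(prev: AckSample, cur: AckSample) -> float:
    # Elapsed time = packet acked timestamp - last packet acked timestamp
    elapsed = cur.acked_ts - prev.acked_ts
    if elapsed <= 0:
        raise ValueError("ACK timestamps must be strictly increasing")
    # Delivery rate = packet acked bytes / elapsed time.
    # Assumption: bytes are converted to bits to match the bit/s unit.
    return cur.acked_bytes * 8 / elapsed


# Usage with made-up samples.
prev = AckSample(sent_ts=0.0, acked_ts=0.25, acked_bytes=1000)
cur = AckSample(sent_ts=0.5, acked_ts=0.75, acked_bytes=1500)
rate = delivery_rate_bps(prev, cur)
lowest = min_rtt_seconds([prev, cur])
```

In the design described in this proposal the kernel performs these calculations and gRPC only reads the results, so the sketch is meant purely to make the table's definitions concrete.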