Tcp rto rfc 6298 compliant #4

danielmgit · 2016-06-14T12:55:29Z

tcp: use RFC6298 compliant TCP RTO calculation

This patch adjusts Linux RTO calculation to be RFC6298 Standard
compliant. MinRTO is no longer added to the computed RTO, RTO damping
and overestimation are decreased.

In RFC 6298 Standard TCP Retransmission Timeout (RTO) calculation the
calculated RTO is rounded up to the Minimum RTO (MinRTO), if it is
less.  The Linux implementation as a discrepancy to the Standard
basically adds the defined MinRTO to the calculated RTO. When
comparing both approaches, the Linux calculation seems to perform
worse for sender limited TCP flows like Telnet, SSH or constant bit
rate encoded transmissions, especially for Round Trip Times (RTT) of
50ms to 800ms.

Compared to the Linux implementation the RFC 6298 proposed RTO
calculation performs better and more precise in adapting to current
network characteristics. Extensive measurements for bulk data did not
show a negative impact of the adjusted calculation.

Using the RFC 6298 compliant implementation, the tcp_sock struct variable
u32 mdev_max_us becomes obsolete and consequently is being removed.

Exemplary performance comparison for sender-limited-flows:

- Rate: 10Mbit/s
- Delay: 200ms, Delay Variation: 10ms
- Time between each scheduled segment: 1s
- Amount Data Segments: 300
- Mean of 11 runs

         Mean Response Waiting Time [milliseconds]

PER [%] |   0.5      1    1.5      2      3      5      7     10
--------+-------------------------------------------------------
old     | 206.4  208.6  218.0  218.6  227.2  249.3  274.7  308.2
new     | 203.9  206.0  207.0  209.9  217.3  225.6  238.7  259.1

Detailed analysis:

https://docs.google.com/document/d/1pKmPfnQb6fDK4qpiNVkN8cQyGE4wYDZukcuZfR-BnnM/

Reasoning for historic design:

Sarolahti, P.; Kuznetsov, A. (2002). Congestion Control in Linux TCP.
Conference Paper. Proceedings of the FREENIX Track. 2002 USENIX Annual
https://www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Daniel Metz <dmetz@mytum.de>

Commit 7f85442 ("phy: Add API for {un}registering an mdio device to a bus.") broke PHY detection on this driver with a copy-paste bug: The code is looking 32 times for a PHY at address 0. Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have their (autodetected) PHYs at address 31. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

This reverts commit a2f2721. I applied the wrong version of this. Signed-off-by: David S. Miller <davem@davemloft.net>

Commit 7f85442 ("phy: Add API for {un}registering an mdio device to a bus.") broke PHY detection on this driver with a copy-paste bug: The code is looking 32 times for a PHY at address 0. Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have their (autodetected) PHYs at address 31. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

>> net/ipv4/ipconfig.c:130:15: warning: 'ic_addrservaddr' defined but not used [-Wunused-variable] static __be32 ic_addrservaddr = NONE; /* IP Address of the IP addresses'server */ Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>

At Qdisc creation or change time, prio_tune() creates missing pfifo qdiscs but does not return an error code if one qdisc could not be allocated. Leaving a qdisc in non operational state without telling user anything about this problem is not good. Also, testing if we replace something different than noop_qdisc a second time makes no sense so I removed useless code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

There is no reason to destroy channels that are destroyed while cpdma_ctlr destroy. In this case no need to remember how much channels where created and destroy them by one, as cpdma_ctlr destroys all of them. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>

This patch adjusts Linux RTO calculation to be RFC6298 Standard compliant. MinRTO is no longer added to the computed RTO, RTO damping and overestimation are decreased. In RFC 6298 Standard TCP Retransmission Timeout (RTO) calculation the calculated RTO is rounded up to the Minimum RTO (MinRTO), if it is less. The Linux implementation as a discrepancy to the Standard basically adds the defined MinRTO to the calculated RTO. When comparing both approaches, the Linux calculation seems to perform worse for sender limited TCP flows like Telnet, SSH or constant bit rate encoded transmissions, especially for Round Trip Times (RTT) of 50ms to 800ms. Compared to the Linux implementation the RFC 6298 proposed RTO calculation performs better and more precise in adapting to current network characteristics. Extensive measurements for bulk data did not show a negative impact of the adjusted calculation. Using the RFC 6298 compliant implementation, the tcp_sock struct variable u32 mdev_max_us becomes obsolete and consequently is being removed. Exemplary performance comparison for sender-limited-flows: - Rate: 10Mbit/s - Delay: 200ms, Delay Variation: 10ms - Time between each scheduled segment: 1s - Amount Data Segments: 300 - Mean of 11 runs Mean Response Waiting Time [milliseconds] PER [%] | 0.5 1 1.5 2 3 5 7 10 --------+------------------------------------------------------- old | 206.4 208.6 218.0 218.6 227.2 249.3 274.7 308.2 new | 203.9 206.0 207.0 209.9 217.3 225.6 238.7 259.1 Detailed analysis: https://docs.google.com/document/d/1pKmPfnQb6fDK4qpiNVkN8cQyGE4wYDZukcuZfR-BnnM/ Reasoning for historic design: Sarolahti, P.; Kuznetsov, A. (2002). Congestion Control in Linux TCP. Conference Paper. Proceedings of the FREENIX Track. 2002 USENIX Annual https://www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Daniel Metz <dmetz@mytum.de> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com>

We don't need to hold the local pinctrl lock here to set irq wake on the summary irq line. Doing so only leads to lockdep warnings instead of protecting us from anything. Remove the locking. WARNING: possible circular locking dependency detected 5.4.11 #2 Tainted: G W ------------------------------------------------------ cat/3083 is trying to acquire lock: ffffff81f4fa58c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 but task is already holding lock: ffffff81f4880c18 (&pctrl->lock){-.-.}, at: msm_gpio_irq_set_wake+0x48/0x7c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&pctrl->lock){-.-.}: _raw_spin_lock_irqsave+0x64/0x80 msm_gpio_irq_ack+0x68/0xf4 __irq_do_set_handler+0xe0/0x180 __irq_set_handler+0x60/0x9c irq_domain_set_info+0x90/0xb4 gpiochip_hierarchy_irq_domain_alloc+0x110/0x200 __irq_domain_alloc_irqs+0x130/0x29c irq_create_fwspec_mapping+0x1f0/0x300 irq_create_of_mapping+0x70/0x98 of_irq_get+0xa4/0xd4 spi_drv_probe+0x4c/0xb0 really_probe+0x138/0x3f0 driver_probe_device+0x70/0x140 __device_attach_driver+0x9c/0x110 bus_for_each_drv+0x88/0xd0 __device_attach+0xb0/0x160 device_initial_probe+0x20/0x2c bus_probe_device+0x34/0x94 device_add+0x35c/0x3f0 spi_add_device+0xbc/0x194 of_register_spi_devices+0x2c8/0x408 spi_register_controller+0x57c/0x6fc spi_geni_probe+0x260/0x328 platform_drv_probe+0x90/0xb0 really_probe+0x138/0x3f0 driver_probe_device+0x70/0x140 device_driver_attach+0x4c/0x6c __driver_attach+0xcc/0x154 bus_for_each_dev+0x84/0xcc driver_attach+0x2c/0x38 bus_add_driver+0x108/0x1fc driver_register+0x64/0xf8 __platform_driver_register+0x4c/0x58 spi_geni_driver_init+0x1c/0x24 do_one_initcall+0x1a4/0x3e8 do_initcall_level+0xb4/0xcc do_basic_setup+0x30/0x48 kernel_init_freeable+0x124/0x1a8 kernel_init+0x14/0x100 ret_from_fork+0x10/0x18 -> #0 (&irq_desc_lock_class){-.-.}: __lock_acquire+0xeb4/0x2388 lock_acquire+0x1cc/0x210 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 msm_gpio_irq_set_wake+0x5c/0x7c set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 cros_ec_rtc_suspend+0x38/0x4c platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x310/0x41c dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x620 pm_suspend+0x210/0x348 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x15c/0x1fc __vfs_write+0x54/0x18c vfs_write+0xe4/0x1a4 ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_handler+0x7c/0x98 el0_svc+0x8/0xc other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pctrl->lock); lock(&irq_desc_lock_class); lock(&pctrl->lock); lock(&irq_desc_lock_class); *** DEADLOCK *** 7 locks held by cat/3083: #0: ffffff81f06d1420 (sb_writers#7){.+.+}, at: vfs_write+0xd0/0x1a4 #1: ffffff81c8935680 (&of->mutex){+.+.}, at: kernfs_fop_write+0x12c/0x1fc #2: ffffff81f4c322f0 (kn->count#337){.+.+}, at: kernfs_fop_write+0x134/0x1fc #3: ffffffe89a641d60 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x348 #4: ffffff81f190e970 (&dev->mutex){....}, at: __device_suspend+0x168/0x41c #5: ffffff81f183d8c0 (lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 #6: ffffff81f4880c18 (&pctrl->lock){-.-.}, at: msm_gpio_irq_set_wake+0x48/0x7c stack backtrace: CPU: 4 PID: 3083 Comm: cat Tainted: G W 5.4.11 #2 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x174 show_stack+0x20/0x2c dump_stack+0xc8/0x124 print_circular_bug+0x2ac/0x2c4 check_noncircular+0x1a0/0x1a8 __lock_acquire+0xeb4/0x2388 lock_acquire+0x1cc/0x210 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 msm_gpio_irq_set_wake+0x5c/0x7c set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 cros_ec_rtc_suspend+0x38/0x4c platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x310/0x41c dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x620 pm_suspend+0x210/0x348 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x15c/0x1fc __vfs_write+0x54/0x18c vfs_write+0xe4/0x1a4 ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_handler+0x7c/0x98 el0_svc+0x8/0xc Fixes: 6aced33 ("pinctrl: msm: drop wake_irqs bitmap") Cc: Douglas Anderson <dianders@chromium.org> Cc: Brian Masney <masneyb@onstation.org> Cc: Lina Iyer <ilina@codeaurora.org> Cc: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20200121180950.36959-1-swboyd@chromium.org Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Ido Schimmel says: ==================== mlxsw: Offload TBF Petr says: In order to allow configuration of shapers on Spectrum family of machines, recognize TBF either as root Qdisc, or as a child of ETS or PRIO. Configure rate of maximum shaper according to TBF rate setting, and maximum shaper burst size according to TBF burst setting. - Patches #1 and #2 make the TBF shaper suitable for offloading. - Patches #3, #4 and #5 are refactoring aimed at easier support of leaf Qdiscs in general. - Patches #6 to torvalds#10 gradually introduce TBF offload. - Patches torvalds#11 to torvalds#14 add selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

mlauss2 and others added 7 commits June 14, 2016 15:35

Revert "net: au1000_eth: fix PHY detection"

3ab51f9

This reverts commit a2f2721. I applied the wrong version of this. Signed-off-by: David S. Miller <davem@davemloft.net>

danielmgit force-pushed the tcp-rto-rfc-6298-compliant branch from 539ba76 to 2ce8335 Compare June 14, 2016 13:48

danielmgit closed this Jun 14, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tcp rto rfc 6298 compliant #4

Tcp rto rfc 6298 compliant #4

danielmgit commented Jun 14, 2016

Tcp rto rfc 6298 compliant #4

Tcp rto rfc 6298 compliant #4

Conversation

danielmgit commented Jun 14, 2016