IPVS low throughput #70747

annProg · 2018-11-07T14:31:13Z

What happened:
Service A with clusterIP 169.169.14.147 and a domain k8s.dev.example.com, the domain resolve to ingress-nginx. Anothor service B will request A, when use domain it works fine. I think using clusterIP directly will get better performance because it saves some time for dns resolve and ingress-nginx proxy, so I change domain to clusterIP. But the result was not expected, it got worse, very low throughput and high latency.

What you expected to happen:
On the above situation, clusterIP should have better performance than ingress-nginx.

How to reproduce it (as minimally and precisely as possible):
ipvs mode, do test with ab

# clusterIP
ab -c100 -t300 -n20000000 -r http://169.169.14.147/

# domain resolve to ingress-nginx
ab -c100 -t300 -n20000000 -r http://k8s.dev.example.com/

Anything else we need to know?:
I install an ipvs enviroment manually and do some test, and found it possiblly because ipvs will use iptables to realize SNAT

Environment:

Kubernetes version (use kubectl version): 1.11.1
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release): centos 7.2
Kernel (e.g. uname -a): 3.10
Install tools: binary
Others:

/kind bug

The text was updated successfully, but these errors were encountered:

annProg · 2018-11-07T15:38:51Z

@kubernetes/sig-network-bugs

k8s-ci-robot · 2018-11-07T15:38:59Z

@annProg: Reiterating the mentions to trigger a notification:
@kubernetes/sig-network-bugs

In response to this:

@kubernetes/sig-network-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

annProg · 2018-11-10T04:49:12Z

I try to change some kernel param and found conn_reuse_mode is related to this issue
net.ipv4.vs.conn_reuse_mode , default 1. change to 0，then complete about 3000k request, pretty good performance.

So, should I change the param in production? Does it have any side effects？

annProg · 2018-11-10T04:57:01Z

/area ipvs

annProg · 2018-11-10T04:59:47Z

@m1093782566 Can you look at it?

m1093782566 · 2018-11-10T05:06:38Z

I try to change some kernel param and found conn_reuse_mode is related to this issue
net.ipv4.vs.conn_reuse_mode , default 1. change to 0，then complte about 3000k request, pretty good performance.

Yes, seems you find the key to the problem - it's set to 0 in my environment and I have not found any issue so far(it's about IPVS module behavior and we will not find net.ipv4.vs.conn_reuse_mode parameter in old kernels ).

BTW, you can also try to set the ipvs conntrack(https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L163) to 0 - it will improve performance as well.

annProg · 2018-11-10T05:34:21Z

I try to change some kernel param and found conn_reuse_mode is related to this issue
net.ipv4.vs.conn_reuse_mode , default 1. change to 0，then complte about 3000k request, pretty good performance.

Yes, seems you find the key to the problem - it's set to 0 in my environment and I have not found any issue so far(it's about IPVS module behavior and we will not find net.ipv4.vs.conn_reuse_mode parameter in old kernels ).

BTW, you can also set the ipvs conntrack(https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L163) to 0 since we are no longer depend on conntrack any more - it will improve performance as well.

Thanks for reply.

I am using kubernetes 1.11.1 , and after set sysctl -w net.ipv4.vs.conntrack=0, clusterIP can not be reachable. Which version do I need ?

m1093782566 · 2018-11-10T06:22:09Z

oops, have you ever configured Cluster-CIDR or iptables masq-all for kube-proxy?

annProg · 2018-11-10T06:24:27Z

oops, have you ever configure Cluster-CIDR or iptables masq-all for kube-proxy?

Yes, I configure Cluster-CIDR

# cat kube-proxy.config.yaml 
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 10.11.3.18
clientConnection:
  kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
clusterCIDR: 172.20.0.0/14
healthzBindAddress: 10.11.3.18:10256
hostnameOverride: 10.11.3.18
kind: KubeProxyConfiguration
metricsBindAddress: 10.11.3.18:10249
mode: "ipvs"

m1093782566 · 2018-11-10T06:26:58Z

Sorry, Cluster IP still needs the IPVS conntrack and nodeport does not rely on it any more - we are trying to figure it out for Cluster IP.

annProg · 2018-11-10T06:30:22Z

Sorry, Cluster IP still need the IPVS conntrack and nodeport does not rely on it any more - we are trying to figure it out for Cluster IP.

Got it, thanks!

m1093782566 · 2018-11-10T07:01:07Z

cc @Lion-Wei

Should we tune the kernel parameter in kube-proxy? Given that many users complain about the performance issue, probably the answer is YES.

dawidmalina · 2018-11-10T10:14:36Z

There is a nice article describing even more kernel parameters to tune ipvs: https://jsravn.com/2017/12/24/ipvs-with-kubernetes-ingress/

annProg · 2018-11-10T14:57:30Z

Side effect like this comment: cloudnativelabs/kube-router#544 (comment)

Note: time in picture below is UTC+8.

Abviously every time scaledown , response time will increase.
@m1093782566 Any idea for this?

m1093782566 · 2018-11-11T03:22:56Z

@annProg

Thanks for reporting. Have you ever read the article posted by @dawidmalina? In this article, some ipvs parameters can be tuned, e.g.

net.ipv4.vs.sloppy_tcp=1 allows the passive node to handle preexisting connections (otherwise IPVS will ignore any TCP packets it doesn’t handle the initial SYN for).
net.ipv4.vs.expire_nodest_conn=1 to remove stale/dead connections so further packets will get resets, letting clients quickly retry.
net.ipv4.vs.conn_reuse_mode=2 so IPVS can properly detect terminated connections with direct return. Without this, it’s possible for source port reuse to lead to broken connections.
# be careful because if we set conn_reuse_mode to 2, ipvs will effectively set expire_nodes_conn to 0.

annProg · 2018-11-13T15:31:14Z

@m1093782566 yes, I had test these parameters

net.ipv4.vs.sloppy_tcp not found in centos 7.2(kernel 3.10)
net.ipv4.vs.expire_nodest_conn will not affect the result
net.ipv4.vs.conn_reuse_mode=2 is similar with 1, will be low throughput

According to this issue #57841 ,I guess it may be the root cause, so I upgrade my setup from 1.11.1 to 1.12.2.
But did not solve this problem, and found a new problem, rs delete failed after scaledown. I had open a new issue to this problem: #70997

Lion-Wei · 2018-11-16T07:34:14Z

/assign

Lion-Wei · 2018-11-16T07:49:17Z

Hi, @annProg @dawidmalina @m1093782566 , I raised a pr for this issue. Please check. : )
@jsravn Looking forward for your advise.

m1093782566 · 2018-11-20T02:07:29Z

@annProg

Could you please check out the HEAD to see if your issues have been resolved?

jsravn · 2018-11-20T13:09:24Z

It's not a big surprise this benchmark performs better with conn_reuse_mode=0. ab uses http/1.0 so is creating a new connection for every single request. With conn_reuse_mode=0 IPVS will reuse the existing connection entry if source port matches - sending repeated requests to the same real server rather than actually load balancing. So my guess is you will see an unbalanced load with using conn_reuse_mode=0 and ab, but it makes the benchmark look better.

I'm not sure this benchmark is a realistic scenario. I suggest using a tool like wrk instead which uses persistent connections and see what happens.

[ Upstream commit f678ce8cc3cb2ad29df75d8824c74f36398ba871 ] ddebug_describe_flags() currently fills a caller provided string buffer, after testing its size (also passed) in a BUG_ON. Fix this by replacing them with a known-big-enough string buffer wrapped in a struct, and passing that instead. Also simplify ddebug_describe_flags() flags parameter from a struct to a member in that struct, and hoist the member deref up to the caller. This makes the function reusable (soon) where flags are unpacked. Acked-by: <jbaron@akamai.com> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20200719231058.1586423-8-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> bcache: fix super block seq numbers comparision in register_cache_set() [ Upstream commit 117f636ea695270fe492d0c0c9dfadc7a662af47 ] In register_cache_set(), c is pointer to struct cache_set, and ca is pointer to struct cache, if ca->sb.seq > c->sb.seq, it means this registering cache has up to date version and other members, the in- memory version and other members should be updated to the newer value. But current implementation makes a cache set only has a single cache device, so the above assumption works well except for a special case. The execption is when a cache device new created and both ca->sb.seq and c->sb.seq are 0, because the super block is never flushed out yet. In the location for the following if() check, 2156 if (ca->sb.seq > c->sb.seq) { 2157 c->sb.version = ca->sb.version; 2158 memcpy(c->sb.set_uuid, ca->sb.set_uuid, 16); 2159 c->sb.flags = ca->sb.flags; 2160 c->sb.seq = ca->sb.seq; 2161 pr_debug("set version = %llu\n", c->sb.version); 2162 } c->sb.version is not initialized yet and valued 0. When ca->sb.seq is 0, the if() check will fail (because both values are 0), and the cache set version, set_uuid, flags and seq won't be updated. The above problem is hiden for current code, because the bucket size is compatible among different super block version. And the next time when running cache set again, ca->sb.seq will be larger than 0 and cache set super block version will be updated properly. But if the large bucket feature is enabled, sb->bucket_size is the low 16bits of the bucket size. For a power of 2 value, when the actual bucket size exceeds 16bit width, sb->bucket_size will always be 0. Then read_super_common() will fail because the if() check to is_power_of_2(sb->bucket_size) is false. This is how the long time hidden bug is triggered. This patch modifies the if() check to the following way, 2156 if (ca->sb.seq > c->sb.seq || c->sb.seq == 0) { Then cache set's version, set_uuid, flags and seq will always be updated corectly including for a new created cache device. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> ACPICA: Do not increment operation_region reference counts for field units [ Upstream commit 6a54ebae6d047c988a31f5ac5a64ab5cf83797a2 ] ACPICA commit e17b28cfcc31918d0db9547b6b274b09c413eb70 Object reference counts are used as a part of ACPICA's garbage collection mechanism. This mechanism keeps track of references to heap-allocated structures such as the ACPI operand objects. Recent server firmware has revealed that this reference count can overflow on large servers that declare many field units under the same operation_region. This occurs because each field unit declaration will add a reference count to the source operation_region. This change solves the reference count overflow for operation_regions objects by preventing fieldunits from incrementing their operation_region's reference count. Each operation_region's reference count will not be changed by named objects declared under the Field operator. During namespace deletion, the operation_region namespace node will be deleted and each fieldunit will be deleted without touching the deleted operation_region object. Link: https://github.com/acpica/acpica/commit/e17b28cf Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> agp/intel: Fix a memory leak on module initialisation failure [ Upstream commit b975abbd382fe442713a4c233549abb90e57c22b ] In intel_gtt_setup_scratch_page(), pointer "page" is not released if pci_dma_mapping_error() return an error, leading to a memory leak on module initialisation failure. Simply fix this issue by freeing "page" before return. Fixes: 0e87d2b06cb46 ("intel-gtt: initialize our own scratch page") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200522083451.7448-1-chris@chris-wilson.co.uk Signed-off-by: Sasha Levin <sashal@kernel.org> video: fbdev: sm712fb: fix an issue about iounmap for a wrong address [ Upstream commit 98bd4f72988646c35569e1e838c0ab80d06c77f6 ] the sfb->fb->screen_base is not save the value get by iounmap() when the chip id is 0x720. so iounmap() for address sfb->fb->screen_base is not right. Fixes: 1461d6672864854 ("staging: sm7xxfb: merge sm712fb with fbdev") Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Teddy Wang <teddy.wang@siliconmotion.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422160719.27763-1-zhengdejin5@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> console: newport_con: fix an issue about leak related system resources [ Upstream commit fd4b8243877250c05bb24af7fea5567110c9720b ] A call of the function do_take_over_console() can fail here. The corresponding system resources were not released then. Thus add a call of iounmap() and release_mem_region() together with the check of a failure predicate. and also add release_mem_region() on device removal. Fixes: e86bb8acc0fdc ("[PATCH] VT binding: Make newport_con support binding") Suggested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200423164251.3349-1-zhengdejin5@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> video: pxafb: Fix the function used to balance a 'dma_alloc_coherent()' call [ Upstream commit 499a2c41b954518c372873202d5e7714e22010c4 ] 'dma_alloc_coherent()' must be balanced by a call to 'dma_free_coherent()' not 'dma_free_wc()'. The correct dma_free_ function is already used in the error handling path of the probe function. Fixes: 77e196752bdd ("[ARM] pxafb: allow video memory size to be configurable") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Jani Nikula <jani.nikula@intel.com> cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Eric Miao <eric.miao@marvell.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200429084505.108897-1-christophe.jaillet@wanadoo.fr Signed-off-by: Sasha Levin <sashal@kernel.org> iio: improve IIO_CONCENTRATION channel type description [ Upstream commit df16c33a4028159d1ba8a7061c9fa950b58d1a61 ] IIO_CONCENTRATION together with INFO_RAW specifier is used for reporting raw concentrations of pollutants. Raw value should be meaningless before being properly scaled. Because of that description shouldn't mention raw value unit whatsoever. Fix this by rephrasing existing description so it follows conventions used throughout IIO ABI docs. Fixes: 8ff6b3bc94930 ("iio: chemical: Add IIO_CONCENTRATION channel type") Signed-off-by: Tomasz Duszynski <tomasz.duszynski@octakon.com> Acked-by: Matt Ranostay <matt.ranostay@konsulko.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org> drm/arm: fix unintentional integer overflow on left shift [ Upstream commit 5f368ddea6fec519bdb93b5368f6a844b6ea27a6 ] Shifting the integer value 1 is evaluated using 32-bit arithmetic and then used in an expression that expects a long value leads to a potential integer overflow. Fix this by using the BIT macro to perform the shift to avoid the overflow. Addresses-Coverity: ("Unintentional integer overflow") Fixes: ad49f8602fe8 ("drm/arm: Add support for Mali Display Processors") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200618100400.11464-1-colin.king@canonical.com Signed-off-by: Sasha Levin <sashal@kernel.org> leds: lm355x: avoid enum conversion warning [ Upstream commit 985b1f596f9ed56f42b8c2280005f943e1434c06 ] clang points out that doing arithmetic between diffent enums is usually a mistake: drivers/leds/leds-lm355x.c:167:28: warning: bitwise operation between different enumeration types ('enum lm355x_tx2' and 'enum lm355x_ntc') [-Wenum-enum-conversion] reg_val = pdata->pin_tx2 | pdata->ntc_pin; ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ drivers/leds/leds-lm355x.c:178:28: warning: bitwise operation between different enumeration types ('enum lm355x_tx2' and 'enum lm355x_ntc') [-Wenum-enum-conversion] reg_val = pdata->pin_tx2 | pdata->ntc_pin | pdata->pass_mode; ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ In this driver, it is intentional, so add a cast to hide the false-positive warning. It appears to be the only instance of this warning at the moment. Fixes: b98d13c72592 ("leds: Add new LED driver for lm355x chips") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sasha Levin <sashal@kernel.org> media: omap3isp: Add missed v4l2_ctrl_handler_free() for preview_init_entities() [ Upstream commit dc7690a73017e1236202022e26a6aa133f239c8c ] preview_init_entities() does not call v4l2_ctrl_handler_free() when it fails. Add the missed function to fix it. Fixes: de1135d44f4f ("[media] omap3isp: CCDC, preview engine and resizer") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> ASoC: Intel: bxt_rt298: add missing .owner field [ Upstream commit 88cee34b776f80d2da04afb990c2a28c36799c43 ] This field is required for ASoC cards. Not setting it will result in a module->name pointer being NULL and generate problems such as cat /proc/asound/modules 0 (efault) Fixes: 76016322ec56 ('ASoC: Intel: Add Broxton-P machine driver') Reported-by: Jaroslav Kysela <perex@perex.cz> Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20200625191308.3322-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> scsi: cumana_2: Fix different dev_id between request_irq() and free_irq() [ Upstream commit 040ab9c4fd0070cd5fa71ba3a7b95b8470db9b4d ] The dev_id used in request_irq() and free_irq() should match. Use 'info' in both cases. Link: https://lore.kernel.org/r/20200625204730.943520-1-christophe.jaillet@wanadoo.fr Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> drm/mipi: use dcs write for mipi_dsi_dcs_set_tear_scanline [ Upstream commit 7a05c3b6d24b8460b3cec436cf1d33fac43c8450 ] The helper uses the MIPI_DCS_SET_TEAR_SCANLINE, although it's currently using the generic write. This does not look right. Perhaps some platforms don't distinguish between the two writers? Cc: Robert Chiras <robert.chiras@nxp.com> Cc: Vinay Simha BN <simhavcs@gmail.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Thierry Reding <treding@nvidia.com> Fixes: e83950816367 ("drm/dsi: Implement set tear scanline") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20200505160329.2976059-3-emil.l.velikov@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> cxl: Fix kobject memleak [ Upstream commit 85c5cbeba8f4fb28e6b9bfb3e467718385f78f76 ] Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Fix it by adding a call to kobject_put() in the error path of kobject_init_and_add(). Fixes: b087e6190ddc ("cxl: Export optional AFU configuration record in sysfs") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Link: https://lore.kernel.org/r/20200602120733.5943-1-wanghai38@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> drm/radeon: fix array out-of-bounds read and write issues [ Upstream commit 7ee78aff9de13d5dccba133f4a0de5367194b243 ] There is an off-by-one bounds check on the index into arrays table->mc_reg_address and table->mc_reg_table_entry[k].mc_data[j] that can lead to reads and writes outside of arrays. Fix the bound checking off-by-one error. Addresses-Coverity: ("Out-of-bounds read/write") Fixes: cc8dbbb4f62a ("drm/radeon: add dpm support for CI dGPUs (v2)") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org> scsi: powertec: Fix different dev_id between request_irq() and free_irq() [ Upstream commit d179f7c763241c1dc5077fca88ddc3c47d21b763 ] The dev_id used in request_irq() and free_irq() should match. Use 'info' in both cases. Link: https://lore.kernel.org/r/20200626035948.944148-1-christophe.jaillet@wanadoo.fr Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> scsi: eesox: Fix different dev_id between request_irq() and free_irq() [ Upstream commit 86f2da1112ccf744ad9068b1d5d9843faf8ddee6 ] The dev_id used in request_irq() and free_irq() should match. Use 'info' in both cases. Link: https://lore.kernel.org/r/20200626040553.944352-1-christophe.jaillet@wanadoo.fr Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> ipvs: allow connection reuse for unconfirmed conntrack [ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 https://github.com/kubernetes/kubernetes/issues/70747 - Apache Bench can fill up ipvs service proxy in seconds #544 https://github.com/cloudnativelabs/kube-router/issues/544 - Additional 1s latency in `host -> service IP -> pod` https://github.com/kubernetes/kubernetes/issues/90854 Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> media: firewire: Using uninitialized values in node_probe() [ Upstream commit 2505a210fc126599013aec2be741df20aaacc490 ] If fw_csr_string() returns -ENOENT, then "name" is uninitialized. So then the "strlen(model_names[i]) <= name_len" is true because strlen() is unsigned and -ENOENT is type promoted to a very high positive value. Then the "strncmp(name, model_names[i], name_len)" uses uninitialized data because "name" is uninitialized. Fixes: 92374e886c75 ("[media] firedtv: drop obsolete backend abstraction") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> media: exynos4-is: Add missed check for pinctrl_lookup_state() [ Upstream commit 18ffec750578f7447c288647d7282c7d12b1d969 ] fimc_md_get_pinctrl() misses a check for pinctrl_lookup_state(). Add the missed check to fix it. Fixes: 4163851f7b99 ("[media] s5p-fimc: Use pinctrl API for camera ports configuration]") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> xfs: fix reflink quota reservation accounting error [ Upstream commit 83895227aba1ade33e81f586aa7b6b1e143096a5 ] Quota reservations are supposed to account for the blocks that might be allocated due to a bmap btree split. Reflink doesn't do this, so fix this to make the quota accounting more accurate before we start rearranging things. Fixes: 862bb360ef56 ("xfs: reflink extents from one file to another") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> PCI: Fix pci_cfg_wait queue locking problem [ Upstream commit 2a7e32d0547f41c5ce244f84cf5d6ca7fccee7eb ] The pci_cfg_wait queue is used to prevent user-space config accesses to devices while they are recovering from reset. Previously we used these operations on pci_cfg_wait: __add_wait_queue(&pci_cfg_wait, ...) __remove_wait_queue(&pci_cfg_wait, ...) wake_up_all(&pci_cfg_wait) The wake_up acquires the wait queue lock, but the add and remove do not. Originally these were all protected by the pci_lock, but cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and p->pi_lock"), moved wake_up_all() outside pci_lock, so it could race with add/remove operations, which caused occasional kernel panics, e.g., during vfio-pci hotplug/unplug testing: Unable to handle kernel read from unreadable memory at virtual address ffff802dac469000 Resolve this by using wait_event() instead of __add_wait_queue() and __remove_wait_queue(). The wait queue lock is held by both wait_event() and wake_up_all(), so it provides mutual exclusion. Fixes: cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and p->pi_lock") Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/T/#u Based-on: https://lore.kernel.org/linux-pci/20191210031527.40136-1-zhengxiang9@huawei.com/ Based-on-patch-by: Xiang Zheng <zhengxiang9@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiang Zheng <zhengxiang9@huawei.com> Cc: Heyi Guo <guoheyi@huawei.com> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org> leds: core: Flush scheduled work for system suspend [ Upstream commit 302a085c20194bfa7df52e0fe684ee0c41da02e6 ] Sometimes LED won't be turned off by LED_CORE_SUSPENDRESUME flag upon system suspend. led_set_brightness_nopm() uses schedule_work() to set LED brightness. However, there's no guarantee that the scheduled work gets executed because no one flushes the work. So flush the scheduled work to make sure LED gets turned off. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Fixes: 81fe8e5b73e3 ("leds: core: Add led_set_brightness_nosleep{nopm} functions") Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sasha Levin <sashal@kernel.org> drm: panel: simple: Fix bpc for LG LB070WV8 panel [ Upstream commit a6ae2fe5c9f9fd355a48fb7d21c863e5b20d6c9c ] The LG LB070WV8 panel incorrectly reports a 16 bits per component value, while the panel uses 8 bits per component. Fix it. Fixes: dd0150026901 ("drm/panel: simple: Add support for LG LB070WV8 800x480 7" panel") Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20200711225317.28476-1-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org> drm/bridge: sil_sii8620: initialize return of sii8620_readb [ Upstream commit 02cd2d3144653e6e2a0c7ccaa73311e48e2dc686 ] clang static analysis flags this error sil-sii8620.c:184:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return ret; ^~~~~~~~~~ sii8620_readb calls sii8620_read_buf. sii8620_read_buf can return without setting its output pararmeter 'ret'. So initialize ret. Fixes: ce6e153f414a ("drm/bridge: add Silicon Image SiI8620 driver") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20200712152453.27510-1-trix@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org> scsi: scsi_debug: Add check for sdebug_max_queue during module init [ Upstream commit c87bf24cfb60bce27b4d2c7e56ebfd86fb9d16bb ] sdebug_max_queue should not exceed SDEBUG_CANQUEUE, otherwise crashes like this can be triggered by passing an out-of-range value: Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 pstate: 20400009 (nzCv daif +PAN -UAO BTYPE=--) pc : schedule_resp+0x2a4/0xa70 [scsi_debug] lr : schedule_resp+0x52c/0xa70 [scsi_debug] sp : ffff800022ab36f0 x29: ffff800022ab36f0 x28: ffff0023a935a610 x27: ffff800008e0a648 x26: 0000000000000003 x25: ffff0023e84f3200 x24: 00000000003d0900 x23: 0000000000000000 x22: 0000000000000000 x21: ffff0023be60a320 x20: ffff0023be60b538 x19: ffff800008e13000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000001 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 00000000000000c1 x5 : 0000020000200000 x4 : dead0000000000ff x3 : 0000000000000200 x2 : 0000000000000200 x1 : ffff800008e13d88 x0 : 0000000000000000 Call trace: schedule_resp+0x2a4/0xa70 [scsi_debug] scsi_debug_queuecommand+0x2c4/0x9e0 [scsi_debug] scsi_queue_rq+0x698/0x840 __blk_mq_try_issue_directly+0x108/0x228 blk_mq_request_issue_directly+0x58/0x98 blk_mq_try_issue_list_directly+0x5c/0xf0 blk_mq_sched_insert_requests+0x18c/0x200 blk_mq_flush_plug_list+0x11c/0x190 blk_flush_plug_list+0xdc/0x110 blk_finish_plug+0x38/0x210 blkdev_direct_IO+0x450/0x4d8 generic_file_read_iter+0x84/0x180 blkdev_read_iter+0x3c/0x50 aio_read+0xc0/0x170 io_submit_one+0x5c8/0xc98 __arm64_sys_io_submit+0x1b0/0x258 el0_svc_common.constprop.3+0x68/0x170 do_el0_svc+0x24/0x90 el0_sync_handler+0x13c/0x1a8 el0_sync+0x158/0x180 Code: 528847e0 72a001e0 6b00003f 540018cd (3941c340) In addition, it should not be less than 1. So add checks for these, and fail the module init for those cases. [mkp: changed if condition to match error message] Link: https://lore.kernel.org/r/1594297400-24756-2-git-send-email-john.garry@huawei.com Fixes: c483739430f1 ("scsi_debug: add multiple queue support") Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> mwifiex: Prevent memory corruption handling keys [ Upstream commit e18696786548244914f36ec3c46ac99c53df99c3 ] The length of the key comes from the network and it's a 16 bit number. It needs to be capped to prevent a buffer overflow. Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200708115857.GA13729@mwanda Signed-off-by: Sasha Levin <sashal@kernel.org> powerpc/vdso: Fix vdso cpu truncation [ Upstream commit a9f675f950a07d5c1dbcbb97aabac56f5ed085e3 ] The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrctly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number. Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing. Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Anton Blanchard <anton@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org Signed-off-by: Sasha Levin <sashal@kernel.org> staging: rtl8192u: fix a dubious looking mask before a shift [ Upstream commit c4283950a9a4d3bf4a3f362e406c80ab14f10714 ] Currently the masking of ret with 0xff and followed by a right shift of 8 bits always leaves a zero result. It appears the mask of 0xff is incorrect and should be 0xff00, but I don't have the hardware to test this. Fix this to mask the upper 8 bits before shifting. [ Not tested ] Addresses-Coverity: ("Operands don't affect result") Fixes: 8fc8598e61f6 ("Staging: Added Realtek rtl8192u driver to staging") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200716154720.1710252-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> PCI/ASPM: Add missing newline in sysfs 'policy' [ Upstream commit 3167e3d340c092fd47924bc4d23117a3074ef9a9 ] When I cat ASPM parameter 'policy' by sysfs, it displays as follows. Add a newline for easy reading. Other sysfs attributes already include a newline. [root@localhost ~]# cat /sys/module/pcie_aspm/parameters/policy [default] performance powersave powersupersave [root@localhost ~]# Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") Link: https://lore.kernel.org/r/1594972765-10404-1-git-send-email-wangxiongfeng2@huawei.com Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> drm/imx: tve: fix regulator_disable error path [ Upstream commit 7bb58b987fee26da2a1665c01033022624986b7c ] Add missing regulator_disable() as devm_action to avoid dedicated unbind() callback and fix the missing error handling. Fixes: fcbc51e54d2a ("staging: drm/imx: Add support for Television Encoder (TVEv2)") Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> USB: serial: iuu_phoenix: fix led-activity helpers [ Upstream commit de37458f8c2bfc465500a1dd0d15dbe96d2a698c ] The set-led command is eight bytes long and starts with a command byte followed by six bytes of RGB data and ends with a byte encoding a frequency (see iuu_led() and iuu_rgbf_fill_buffer()). The led activity helpers had a few long-standing bugs which corrupted the command packets by inserting a second command byte and thereby offsetting the RGB data and dropping the frequency in non-xmas mode. In xmas mode, a related off-by-one error left the frequency field uninitialised. Fixes: 60a8fc017103 ("USB: add iuu_phoenix driver") Reported-by: George Spelvin <lkml@sdf.org> Link: https://lore.kernel.org/r/20200716085056.31471-1-johan@kernel.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor() [ Upstream commit 0f348db01fdf128813fdd659fcc339038fb421a4 ] This condition is reversed and will cause breakage. Fixes: 7440f518dad9 ("thermal/drivers/ti-soc-thermal: Avoid dereferencing ERR_PTR") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200616091949.GA11940@mwanda Signed-off-by: Sasha Levin <sashal@kernel.org> coresight: tmc: Fix TMC mode read in tmc_read_unprepare_etb() [ Upstream commit d021f5c5ff679432c5e9faee0fd7350db2efb97c ] Reading TMC mode register without proper coresight power management can lead to exceptions like the one in the call trace below in tmc_read_unprepare_etb() when the trace data is read after the sink is disabled. So fix this by having a check for coresight sysfs mode before reading TMC mode management register in tmc_read_unprepare_etb() similar to tmc_read_prepare_etb(). SError Interrupt on CPU6, code 0xbe000411 -- SError pstate: 80400089 (Nzcv daIf +PAN -UAO) pc : tmc_read_unprepare_etb+0x74/0x108 lr : tmc_read_unprepare_etb+0x54/0x108 sp : ffffff80d9507c30 x29: ffffff80d9507c30 x28: ffffff80b3569a0c x27: 0000000000000000 x26: 00000000000a0001 x25: ffffff80cbae9550 x24: 0000000000000010 x23: ffffffd07296b0f0 x22: ffffffd0109ee028 x21: 0000000000000000 x20: ffffff80d19e70e0 x19: ffffff80d19e7080 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: dfffffd000000001 x9 : 0000000000000000 x8 : 0000000000000002 x7 : ffffffd071d0fe78 x6 : 0000000000000000 x5 : 0000000000000080 x4 : 0000000000000001 x3 : ffffffd071d0fe98 x2 : 0000000000000000 x1 : 0000000000000004 x0 : 0000000000000001 Kernel panic - not syncing: Asynchronous SError Interrupt Fixes: 4525412a5046 ("coresight: tmc: making prepare/unprepare functions generic") Reported-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Tested-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200716175746.3338735-14-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> MIPS: OCTEON: add missing put_device() call in dwc3_octeon_device_init() [ Upstream commit e8b9fc10f2615b9a525fce56981e40b489528355 ] if of_find_device_by_node() succeed, dwc3_octeon_device_init() doesn't have a corresponding put_device(). Thus add put_device() to fix the exception handling for this function implementation. Fixes: 93e502b3c2d4 ("MIPS: OCTEON: Platform support for OCTEON III USB controller") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org> usb: dwc2: Fix error path in gadget registration [ Upstream commit 33a06f1300a79cfd461cea0268f05e969d4f34ec ] When gadget registration fails, one should not call usb_del_gadget_udc(). Ensure this by setting gadget->udc to NULL. Also in case of a failure there is no need to disable low-level hardware, so return immiedetly instead of jumping to error_init label. This fixes the following kernel NULL ptr dereference on gadget failure (can be easily triggered with g_mass_storage without any module parameters): dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter besl=1 dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter g_np_tx_fifo_size=1024 dwc2 12480000.hsotg: EPs: 16, dedicated fifos, 7808 entries in SPRAM Mass Storage Function, version: 2009/09/11 LUN: removable file: (no medium) no file given for LUN0 g_mass_storage 12480000.hsotg: failed to start g_mass_storage: -22 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000104 pgd = (ptrval) [00000104] *pgd=00000000 Internal error: Oops: 805 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.8.0-rc5 #3133 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events deferred_probe_work_func PC is at usb_del_gadget_udc+0x38/0xc4 LR is at __mutex_lock+0x31c/0xb18 ... Process kworker/0:1 (pid: 12, stack limit = 0x(ptrval)) Stack: (0xef121db0 to 0xef122000) ... [<c076bf3c>] (usb_del_gadget_udc) from [<c0726bec>] (dwc2_hsotg_remove+0x10/0x20) [<c0726bec>] (dwc2_hsotg_remove) from [<c0711208>] (dwc2_driver_probe+0x57c/0x69c) [<c0711208>] (dwc2_driver_probe) from [<c06247c0>] (platform_drv_probe+0x6c/0xa4) [<c06247c0>] (platform_drv_probe) from [<c0621df4>] (really_probe+0x200/0x48c) [<c0621df4>] (really_probe) from [<c06221e8>] (driver_probe_device+0x78/0x1fc) [<c06221e8>] (driver_probe_device) from [<c061fcd4>] (bus_for_each_drv+0x74/0xb8) [<c061fcd4>] (bus_for_each_drv) from [<c0621b54>] (__device_attach+0xd4/0x16c) [<c0621b54>] (__device_attach) from [<c0620c98>] (bus_probe_device+0x88/0x90) [<c0620c98>] (bus_probe_device) from [<c06211b0>] (deferred_probe_work_func+0x3c/0xd0) [<c06211b0>] (deferred_probe_work_func) from [<c0149280>] (process_one_work+0x234/0x7dc) [<c0149280>] (process_one_work) from [<c014986c>] (worker_thread+0x44/0x51c) [<c014986c>] (worker_thread) from [<c0150b1c>] (kthread+0x158/0x1a0) [<c0150b1c>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20) Exception stack(0xef121fb0 to 0xef121ff8) ... ---[ end trace 9724c2fc7cc9c982 ]--- While fixing this also fix the double call to dwc2_lowlevel_hw_disable() if dr_mode is set to USB_DR_MODE_PERIPHERAL. In such case low-level hardware is already disabled before calling usb_add_gadget_udc(). That function correctly preserves low-level hardware state, there is no need for the second unconditional dwc2_lowlevel_hw_disable() call. Fixes: 207324a321a8 ("usb: dwc2: Postponed gadget registration to the udc class driver") Acked-by: Minas Harutyunyan <hminas@synopsys.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> scsi: mesh: Fix panic after host or bus reset [ Upstream commit edd7dd2292ab9c3628b65c4d04514c3068ad54f6 ] Booting Linux with a Conner CP3200 drive attached to the MESH SCSI bus results in EH measures and a panic: [ 25.499838] mesh: configured for synchronous 5 MB/s [ 25.787154] mesh: performing initial bus reset... [ 29.867115] scsi host0: MESH [ 29.929527] mesh: target 0 synchronous at 3.6 MB/s [ 29.998763] scsi 0:0:0:0: Direct-Access CONNER CP3200-200mb-3.5 4040 PQ: 0 ANSI: 1 CCS [ 31.989975] sd 0:0:0:0: [sda] 415872 512-byte logical blocks: (213 MB/203 MiB) [ 32.070975] sd 0:0:0:0: [sda] Write Protect is off [ 32.137197] sd 0:0:0:0: [sda] Mode Sense: 5b 00 00 08 [ 32.209661] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 32.332708] sda: [mac] sda1 sda2 sda3 [ 32.417733] sd 0:0:0:0: [sda] Attached SCSI disk ... snip ... [ 76.687067] mesh_abort((ptrval)) [ 76.743606] mesh: state at (ptrval), regs at (ptrval), dma at (ptrval) [ 76.810798] ct=6000 seq=86 bs=4017 fc= 0 exc= 0 err= 0 im= 7 int= 0 sp=85 [ 76.880720] dma stat=84e0 cmdptr=1f73d000 [ 76.941387] phase=4 msgphase=0 conn_tgt=0 data_ptr=24576 [ 77.005567] dma_st=1 dma_ct=0 n_msgout=0 [ 77.065456] target 0: req=(ptrval) goes_out=0 saved_ptr=0 [ 77.130512] mesh_abort((ptrval)) [ 77.187670] mesh: state at (ptrval), regs at (ptrval), dma at (ptrval) [ 77.255594] ct=6000 seq=86 bs=4017 fc= 0 exc= 0 err= 0 im= 7 int= 0 sp=85 [ 77.325778] dma stat=84e0 cmdptr=1f73d000 [ 77.387239] phase=4 msgphase=0 conn_tgt=0 data_ptr=24576 [ 77.453665] dma_st=1 dma_ct=0 n_msgout=0 [ 77.515900] target 0: req=(ptrval) goes_out=0 saved_ptr=0 [ 77.582902] mesh_host_reset [ 88.187083] Kernel panic - not syncing: mesh: double DMA start ! [ 88.254510] CPU: 0 PID: 358 Comm: scsi_eh_0 Not tainted 5.6.13-pmac #1 [ 88.323302] Call Trace: [ 88.378854] [e16ddc58] [c0027080] panic+0x13c/0x308 (unreliable) [ 88.446221] [e16ddcb8] [c02b2478] mesh_start.part.12+0x130/0x414 [ 88.513298] [e16ddcf8] [c02b2fc8] mesh_queue+0x54/0x70 [ 88.577097] [e16ddd18] [c02a1848] scsi_send_eh_cmnd+0x374/0x384 [ 88.643476] [e16dddc8] [c02a1938] scsi_eh_tur+0x5c/0xb8 [ 88.707878] [e16dddf8] [c02a1ab8] scsi_eh_test_devices+0x124/0x178 [ 88.775663] [e16dde28] [c02a2094] scsi_eh_ready_devs+0x588/0x8a8 [ 88.843124] [e16dde98] [c02a31d8] scsi_error_handler+0x344/0x520 [ 88.910697] [e16ddf08] [c00409c8] kthread+0xe4/0xe8 [ 88.975166] [e16ddf38] [c000f234] ret_from_kernel_thread+0x14/0x1c [ 89.044112] Rebooting in 180 seconds.. In theory, a panic can happen after a bus or host reset with dma_started flag set. Fix this by halting the DMA before reinitializing the host. Don't assume that ms->current_req is set when halt_dma() is invoked as it may not hold for bus or host reset. BTW, this particular Conner drive can be made to work by inhibiting disconnect/reselect with 'mesh.resel_targets=0'. Link: https://lore.kernel.org/r/3952bc691e150a7128b29120999b6092071b039a.1595460351.git.fthain@telegraphics.com.au Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Paul Mackerras <paulus@ozlabs.org> Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> net: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration [ Upstream commit 0f3c66a3c7b4e8b9f654b3c998e9674376a51b0f ] The MV88E6097 chip does not support configuring jumbo frames. Prior to commit 5f4366660d65 only the 6352, 6351, 6165 and 6320 chips configured jumbo mode. The refactor accidentally added the function for the 6097. Remove the erroneous function pointer assignment. Fixes: 5f4366660d65 ("net: dsa: mv88e6xxx: Refactor setting of jumbo frames") Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Smack: fix another vsscanf out of bounds [ Upstream commit a6bd4f6d9b07452b0b19842044a6c3ea384b0b88 ] This is similar to commit 84e99e58e8d1 ("Smack: slab-out-of-bounds in vsscanf") where we added a bounds check on "rule". Reported-by: syzbot+a22c6092d003d6fe1122@syzkaller.appspotmail.com Fixes: f7112e6c9abf ("Smack: allow for significantly longer Smack labels v4") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Smack: prevent underflow in smk_set_cipso() [ Upstream commit 42a2df3e829f3c5562090391b33714b2e2e5ad4a ] We have an upper bound on "maplevel" but forgot to check for negative values. Fixes: e114e473771c ("Smack: Simplified Mandatory Access Control Kernel") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Sasha Levin <sashal@kernel.org> power: supply: check if calc_soc succeeded in pm860x_init_battery [ Upstream commit ccf193dee1f0fff55b556928591f7818bac1b3b1 ] clang static analysis flags this error 88pm860x_battery.c:522:19: warning: Assigned value is garbage or undefined [core.uninitialized.Assign] info->start_soc = soc; ^ ~~~ soc is set by calling calc_soc. But calc_soc can return without setting soc. So check the return status and bail similarly to other checks in pm860x_init_battery and initialize soc to silence the warning. Fixes: a830d28b48bf ("power_supply: Enable battery-charger for 88pm860x") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Bluetooth: hci_serdev: Only unregister device if it was registered [ Upstream commit 202798db9570104728dce8bb57dfeed47ce764bc ] We should not call hci_unregister_dev if the device was not successfully registered. Fixes: c34dc3bfa7642fd ("Bluetooth: hci_serdev: Introduce hci_uart_unregister_device()") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org> selftests/powerpc: Fix CPU affinity for child process [ Upstream commit 854eb5022be04f81e318765f089f41a57c8e5d83 ] On systems with large number of cpus, test fails trying to set affinity by calling sched_setaffinity() with smaller size for affinity mask. This patch fixes it by making sure that the size of allocated affinity mask is dependent on the number of CPUs as reported by get_nprocs(). Fixes: 00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 benchmark") Reported-by: Shirisha Ganta <shiganta@in.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Harish <harish@linux.ibm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609081423.529664-1-harish@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org> PCI: Release IVRS table in AMD ACS quirk [ Upstream commit 090688fa4e448284aaa16136372397d7d10814db ] The acpi_get_table() should be coupled with acpi_put_table() if the mapped table is not used at runtime to release the table mapping. In pci_quirk_amd_sb_acs(), IVRS table is just used for checking AMD IOMMU is supported, not used at runtime, so put the table after using it. Fixes: 15b100dfd1c9 ("PCI: Claim ACS support for AMD southbridge devices") Link: https://lore.kernel.org/r/1595411068-15440-1-git-send-email-guohanjun@huawei.com Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> selftests/powerpc: Fix online CPU selection [ Upstream commit dfa03fff86027e58c8dba5c03ae68150d4e513ad ] The size of the CPU affinity mask must be large enough for systems with a very large number of CPUs. Otherwise, tests which try to determine the first online CPU by calling sched_getaffinity() will fail. This makes sure that the size of the allocated affinity mask is dependent on the number of CPUs as reported by get_nprocs_conf(). Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs") Reported-by: Shirisha Ganta <shiganta@in.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a408c4b8e9a23bb39b539417a21eb0ff47bb5127.1596084858.git.sandipan@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org> s390/qeth: don't process empty bridge port events [ Upstream commit 02472e28b9a45471c6d8729ff2c7422baa9be46a ] Discard events that don't contain any entries. This shouldn't happen, but subsequent code relies on being able to use entry 0. So better be safe than accessing garbage. Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> wl1251: fix always return 0 error [ Upstream commit 20e6421344b5bc2f97b8e2db47b6994368417904 ] wl1251_event_ps_report() should not always return 0 because wl1251_ps_set_mode() may fail. Change it to return 'ret'. Fixes: f7ad1eed4d4b ("wl1251: retry power save entry") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200730073939.33704-1-wanghai38@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org> tools, build: Propagate build failures from tools/build/Makefile.build [ Upstream commit a278f3d8191228212c553a5d4303fa603214b717 ] The '&&' command seems to have a bad effect when $(cmd_$(1)) exits with non-zero effect: the command failure is masked (despite `set -e`) and all but the first command of $(dep-cmd) is executed (successfully, as they are mostly printfs), thus overall returning 0 in the end. This means in practice that despite compilation errors, tools's build Makefile will return success. We see this very reliably with libbpf's Makefile, which doesn't get compilation error propagated properly. This in turns causes issues with selftests build, as well as bpftool and other projects that rely on building libbpf. The fix is simple: don't use &&. Given `set -e`, we don't need to chain commands with &&. The shell will exit on first failure, giving desired behavior and propagating error properly. Fixes: 275e2d95591e ("tools build: Move dependency copy into function") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20200731024244.872574-1-andriin@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org> net: ethernet: aquantia: Fix wrong return value [ Upstream commit 0470a48880f8bc42ce26962b79c7b802c5a695ec ] In function hw_atl_a0_hw_multicast_list_set(), when an invalid request is encountered, a negative error code should be returned. Fixes: bab6de8fd180b ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions") Cc: David VomLehn <vomlehn@texas.net> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> liquidio: Fix wrong return value in cn23xx_get_pf_num() [ Upstream commit aa027850a292ea65524b8fab83eb91a124ad362c ] On an error exit path, a negative error code should be returned instead of a positive return value. Fixes: 0c45d7fe12c7e ("liquidio: fix use of pf in pass-through mode in a virtual machine") Cc: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> net: spider_net: Fix the size used in a 'dma_free_coherent()' call [ Upstream commit 36f28f7687a9ce665479cce5d64ce7afaa9e77ae ] Update the size used in 'dma_free_coherent()' in order to match the one used in the corresponding 'dma_alloc_coherent()', in 'spider_net_init_chain()'. Fixes: d4ed8f8d1fb7 ("Spidernet DMA coalescing") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> fsl/fman: use 32-bit unsigned integer [ Upstream commit 99f47abd9f7bf6e365820d355dc98f6955a562df ] Potentially overflowing expression (ts_freq << 16 and intgr << 16) declared as type u32 (32-bit unsigned) is evaluated using 32-bit arithmetic and then used in a context that expects an expression of type u64 (64-bit unsigned) which ultimately is used as 16-bit unsigned by typecasting to u16. Fixed by using an unsigned 32-bit integer since the value is truncated anyway in the end. Fixes: 414fd46e7762 ("fsl/fman: Add FMan support") Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> fsl/fman: fix dereference null return value [ Upstream commit 0572054617f32670abab4b4e89a876954d54b704 ] Check before using returned value to avoid dereferencing null pointer. Fixes: 18a6c85fcc78 ("fsl/fman: Add FMan Port Support") Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> fsl/fman: fix unreachable code [ Upstream commit cc79fd8f557767de90ff199d3b6fb911df43160a ] The parameter 'priority' is incorrectly forced to zero which ultimately induces logically dead code in the subsequent lines. Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support") Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> fsl/fman: check dereferencing null pointer [ Upstream commit cc5d229a122106733a85c279d89d7703f21e4d4f ] Add a safe check to avoid dereferencing null pointer Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support") Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> fsl/fman: fix eth hash table allocation [ Upstream commit 3207f715c34317d08e798e11a10ce816feb53c0f ] Fix memory allocation for ethernet address hash table. The code was wrongly allocating an array for eth hash table which is incorrect because this is the main structure for eth hash table (struct eth_hash_t) that contains inside a number of elements. Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support") Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> dlm: Fix kobject memleak [ Upstream commit 0ffddafc3a3970ef7013696e7f36b3d378bc4c16 ] Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Set do_unreg = 1 before kobject_init_and_add() to ensure that kobject_put() can be called in its error patch. Fixes: 901195ed7f4b ("Kobject: change GFS2 to use kobject_init_and_add") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> pinctrl-single: fix pcs_parse_pinconf() return value [ Upstream commit f46fe79ff1b65692a65266a5bec6dbe2bf7fc70f ] This patch causes pcs_parse_pinconf() to return -ENOTSUPP when no pinctrl_map is added. The current behavior is to return 0 when !PCS_HAS_PINCONF or !nconfs. Thus pcs_parse_one_pinctrl_entry() incorrectly assumes that a map was added and sets num_maps = 2. Analysis: ========= The function pcs_parse_one_pinctrl_entry() calls pcs_parse_pinconf() if PCS_HAS_PINCONF is enabled. The function pcs_parse_pinconf() returns 0 to indicate there was no error and num_maps is then set to 2: 980 static int pcs_parse_one_pinctrl_entry(struct pcs_device *pcs, 981 struct device_node *np, 982 struct pinctrl_map **map, 983 unsigned *num_maps, 984 const char **pgnames) 985 { <snip> 1053 (*map)->type = PIN_MAP_TYPE_MUX_GROUP; 1054 (*map)->data.mux.group = np->name; 1055 (*map)->data.mux.function = np->name; 1056 1057 if (PCS_HAS_PINCONF && function) { 1058 res = pcs_parse_pinconf(pcs, np, function, map); 1059 if (res) 1060 goto free_pingroups; 1061 *num_maps = 2; 1062 } else { 1063 *num_maps = 1; 1064 } However, pcs_parse_pinconf() will also return 0 if !PCS_HAS_PINCONF or !nconfs. I believe these conditions should indicate that no map was added by returning -ENOTSUPP. Otherwise pcs_parse_one_pinctrl_entry() will set num_maps = 2 even though no maps were successfully added, as it does not reach "m++" on line 940: 895 static int pcs_parse_pinconf(struct pcs_device *pcs, struct device_node *np, 896 struct pcs_function *func, 897 struct pinctrl_map **map) 898 899 { 900 struct pinctrl_map *m = *map; <snip> 917 /* If pinconf isn't supported, don't parse properties in below. */ 918 if (!PCS_HAS_PINCONF) 919 return 0; 920 921 /* cacluate how much properties are supported in current node */ 922 for (i = 0; i < ARRAY_SIZE(prop2); i++) { 923 if (of_find_property(np, prop2[i].name, NULL)) 924 nconfs++; 925 } 926 for (i = 0; i < ARRAY_SIZE(prop4); i++) { 927 if (of_find_property(np, prop4[i].name, NULL)) 928 nconfs++; 929 } 930 if (!nconfs) 919 return 0; 932 933 func->conf = devm_kcalloc(pcs->dev, 934 nconfs, sizeof(struct pcs_conf_vals), 935 GFP_KERNEL); 936 if (!func->conf) 937 return -ENOMEM; 938 func->nconfs = nconfs; 939 conf = &(func->conf[0]); 940 m++; This situtation will cause a boot failure [0] on the BeagleBone Black (AM3358) when am33xx_pinmux node in arch/arm/boot/dts/am33xx-l4.dtsi has compatible = "pinconf-single" instead of "pinctrl-single". The patch fixes this issue by returning -ENOSUPP when !PCS_HAS_PINCONF or !nconfs, so that pcs_parse_one_pinctrl_entry() will know that no map was added. Logic is also added to pcs_parse_one_pinctrl_entry() to distinguish between -ENOSUPP and other errors. In the case of -ENOSUPP, num_maps is set to 1 as it is valid for pinconf to be enabled and a given pin group to not any pinconf properties. [0] https://lore.kernel.org/linux-omap/20200529175544.GA3766151@x1/ Fixes: 9dddb4df90d1 ("pinctrl: single: support generic pinconf") Signed-off-by: Drew Fustini <drew@beagleboard.org> Acked-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20200608125143.GA2789203@x1 Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task [ Upstream commit 8ab49526b53d3172d1d8dd03a75c7d1f5bd21239 ] syzbot found its way in 86_fsgsbase_read_task() and triggered this oops: KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0 RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393 Call Trace: putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876 genregs32_set arch/x86/kernel/ptrace.c:1026 [inline] genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006 copy_regset_from_user include/linux/regset.h:326 [inline] ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline] compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198 __do_compat_sys_ptrace kernel/ptrace.c:1420 [inline] __se_compat_sys_ptrace kernel/ptrace.c:1389 [inline] __ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389 do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline] __do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c This can happen if ptrace() or sigreturn() pokes an LDT selector into FS or GS for a task with no LDT and something tries to read the base before a return to usermode notices the bad selector and fixes it. The fix is to make sure ldt pointer is not NULL. Fixes: 07e1d88adaae ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately") Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Markus T Metzger <markus.t.metzger@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> crypto: aesni - add compatibility with IAS [ Upstream commit 44069737ac9625a0f02f0f7f5ab96aae4cd819bc ] Clang's integrated assembler complains "invalid reassignment of non-absolute variable 'var_ddq_add'" while assembling arch/x86/crypto/aes_ctrby8_avx-x86_64.S. It was because var_ddq_add was reassigned with non-absolute values several times, which IAS did not support. We can avoid the reassignment by replacing the uses of var_ddq_add with its definitions accordingly to have compatilibility with IAS. Link: https://github.com/ClangBuiltLinux/linux/issues/1008 Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-by: Fangrui Song <maskray@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot Linux v5.7.5; clang v11.0.0-git Signed-off-by: Jian Cai <caij2003@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> af_packet: TPACKET_V3: fix fill status rwlock imbalance [ Upstream commit 88fd1cb80daa20af063bce81e1fad14e945a8dc4 ] After @blk_fill_in_prog_lock is acquired there is an early out vnet situation that can occur. In that case, the rwlock needs to be released. Also, since @blk_fill_in_prog_lock is only acquired when @tp_version is exactly TPACKET_V3, only release it on that exact condition as well. And finally, add sparse annotation so that it is clearer that prb_fill_curr_block() and prb_clear_blk_fill_status() are acquiring and releasing @blk_fill_in_prog_lock, respectively. sparse is still unable to understand the balance, but the warnings are now on a higher level that make more sense. Fixes: 632ca50f2cbd ("af_packet: TPACKET_V3: replace busy-wait loop") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> drivers/net/wan/lapbether: Added needed_headroom and a skb->len check [ Upstream commit c7ca03c216acb14466a713fedf1b9f2c24994ef2 ] 1. Added a skb->len check This driver expects upper layers to include a pseudo header of 1 byte when passing down a skb for transmission. This driver will read this 1-byte header. This patch added a skb->len check before reading the header to make sure the header exists. 2. Changed to use needed_headroom instead of hard_header_len to request necessary headroom to be allocated In net/packet/af_packet.c, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and assumes the user to provide the appropriate link layer header. So according to the logic of af_packet.c, dev->hard_header_len should be the length of the header that would be created by dev->header_ops->create. However, this driver doesn't provide dev->header_ops, so logically dev->hard_header_len should be 0. So we should use dev->needed_headroom instead of dev->hard_header_len to request necessary headroom to be allocated. This change fixes kernel panic when this driver is used with AF_PACKET SOCK_RAW sockets. Call stack when panic: [ 168.399197] skbuff: skb_under_panic: text:ffffffff819d95fb len:20 put:14 head:ffff8882704c0a00 data:ffff8882704c09fd tail:0x11 end:0xc0 dev:veth0 ... [ 168.399255] Call Trace: [ 168.399259] skb_push.cold+0x14/0x24 [ 168.399262] eth_header+0x2b/0xc0 [ 168.399267] lapbeth_data_transmit+0x9a/0xb0 [lapbether] [ 168.399275] lapb_data_transmit+0x22/0x2c [lapb] [ 168.399277] lapb_transmit_buffer+0x71/0xb0 [lapb] [ 168.399279] lapb_kick+0xe3/0x1c0 [lapb] [ 168.399281] lapb_data_request+0x76/0xc0 [lapb] [ 168.399283] lapbeth_xmit+0x56/0x90 [lapbether] [ 168.399286] dev_hard_start_xmit+0x91/0x1f0 [ 168.399289] ? irq_init_percpu_irqstack+0xc0/0x100 [ 168.399291] __dev_queue_xmit+0x721/0x8e0 [ 168.399295] ? packet_parse_headers.isra.0+0xd2/0x110 [ 168.399297] dev_queue_xmit+0x10/0x20 [ 168.399298] packet_sendmsg+0xbf0/0x19b0 ...... Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Martin Schiller <ms@dev.tdt.de> Cc: Brian Norris <briannorris@chromium.org> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> net/nfc/rawsock.c: add CAP_NET_RAW check. [ Upstream commit 26896f01467a28651f7a536143fe5ac8449d4041 ] When creating a raw AF_NFC socket, CAP_NET_RAW needs to be checked first. Signed-off-by: Qingyu Li <ieatmuttonchuan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> net: refactor bind_bucket fastreuse into helper [ Upstream commit 62ffc589abb176821662efc4525ee4ac0b9c3894 ] Refactor the fastreuse update code in inet_csk_get_port into a small helper function that can be called from other places. Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> net: Set fput_needed iff FDPUT_FPUT is set [ Upstream commit ce787a5a074a86f76f5d3fd804fa78e01bfb9e89 ] We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: 00e188ef6a7e ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> USB: serial: cp210x: re-enable auto-RTS on open commit c7614ff9b73a1e6fb2b1b51396da132ed22fecdb upstream. CP210x hardware disables auto-RTS but leaves auto-CTS when in hardware flow control mode and UART on cp210x hardware is disabled. When re-opening the port, if auto-CTS is enabled on the cp210x, then auto-RTS must be re-enabled in the driver. Signed-off-by: Brant Merryman <brant.merryman@silabs.com> Co-developed-by: Phu Luu <phu.luu@silabs.com> Signed-off-by: Phu Luu <phu.luu@silabs.com> Link: https://lore.kernel.org/r/ECCF8E73-91F3-4080-BE17-1714BC8818FB@silabs.com [ johan: fix up tags and problem description ] Fixes: 39a66b8d22a3 ("[PATCH] USB: CP2101 Add support for flow control") Cc: stable <stable@vger.kernel.org> # 2.6.12 Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> USB: serial: cp210x: enable usb generic throttle/unthrottle commit 4387b3dbb079d482d3c2b43a703ceed4dd27ed28 upstream. Assign the .throttle and .unthrottle functions to be generic function in the driver structure to prevent da…

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: John Vincent <git@tensevntysevn.cf> Signed-off-by: John Vincent <git@tenseventyseven.cf>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ] YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 kubernetes/kubernetes#70747 - Apache Bench can fill up ipvs service proxy in seconds #544 cloudnativelabs/kube-router#544 - Additional 1s latency in `host -> service IP -> pod` kubernetes/kubernetes#90854 Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: YangYuxi <yx.atom1@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 7, 2018

k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 7, 2018

k8s-ci-robot added the area/ipvs label Nov 10, 2018

m1093782566 mentioned this issue Nov 10, 2018

IPVS latency and throughput is not as stable as iptables #65820

Closed

Lion-Wei mentioned this issue Nov 15, 2018

Track when we can enable the ipvs mode for the kube-proxy by default kubernetes/kubeadm#817

Closed

k8s-ci-robot assigned Lion-Wei Nov 16, 2018

Lion-Wei mentioned this issue Nov 16, 2018

fix IPVS low throughput issue #71114

Merged

k8s-ci-robot closed this as completed in #71114 Nov 16, 2018

yankay mentioned this issue Jun 21, 2022

Change Default kube_proxy_mode to Iptables kubernetes-sigs/kubespray#9015

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IPVS low throughput #70747

IPVS low throughput #70747

annProg commented Nov 7, 2018 •

edited

Loading

annProg commented Nov 7, 2018

k8s-ci-robot commented Nov 7, 2018

annProg commented Nov 10, 2018 •

edited

Loading

annProg commented Nov 10, 2018

annProg commented Nov 10, 2018

m1093782566 commented Nov 10, 2018 •

edited

Loading

annProg commented Nov 10, 2018

m1093782566 commented Nov 10, 2018 •

edited

Loading

annProg commented Nov 10, 2018 •

edited

Loading

m1093782566 commented Nov 10, 2018 •

edited

Loading

annProg commented Nov 10, 2018

m1093782566 commented Nov 10, 2018 •

edited

Loading

dawidmalina commented Nov 10, 2018

annProg commented Nov 10, 2018

m1093782566 commented Nov 11, 2018 •

edited

Loading

annProg commented Nov 13, 2018 •

edited

Loading

Lion-Wei commented Nov 16, 2018

Lion-Wei commented Nov 16, 2018

m1093782566 commented Nov 20, 2018

jsravn commented Nov 20, 2018 •

edited

Loading

IPVS low throughput #70747

IPVS low throughput #70747

Comments

annProg commented Nov 7, 2018 • edited Loading

annProg commented Nov 7, 2018

k8s-ci-robot commented Nov 7, 2018

annProg commented Nov 10, 2018 • edited Loading

annProg commented Nov 10, 2018

annProg commented Nov 10, 2018

m1093782566 commented Nov 10, 2018 • edited Loading

annProg commented Nov 10, 2018

m1093782566 commented Nov 10, 2018 • edited Loading

annProg commented Nov 10, 2018 • edited Loading

m1093782566 commented Nov 10, 2018 • edited Loading

annProg commented Nov 10, 2018

m1093782566 commented Nov 10, 2018 • edited Loading

dawidmalina commented Nov 10, 2018

annProg commented Nov 10, 2018

m1093782566 commented Nov 11, 2018 • edited Loading

annProg commented Nov 13, 2018 • edited Loading

Lion-Wei commented Nov 16, 2018

Lion-Wei commented Nov 16, 2018

m1093782566 commented Nov 20, 2018

jsravn commented Nov 20, 2018 • edited Loading

annProg commented Nov 7, 2018 •

edited

Loading

annProg commented Nov 10, 2018 •

edited

Loading

m1093782566 commented Nov 10, 2018 •

edited

Loading

m1093782566 commented Nov 10, 2018 •

edited

Loading

annProg commented Nov 10, 2018 •

edited

Loading

m1093782566 commented Nov 10, 2018 •

edited

Loading

m1093782566 commented Nov 10, 2018 •

edited

Loading

m1093782566 commented Nov 11, 2018 •

edited

Loading

annProg commented Nov 13, 2018 •

edited

Loading

jsravn commented Nov 20, 2018 •

edited

Loading