Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase device discovery timeout to 30seconds/equal to checker_timeout #78475

Merged
merged 1 commit into from Jun 1, 2019

Conversation

@humblec
Copy link
Contributor

commented May 29, 2019

At present, iscsi plugin wait for 10seconds for a path to appear for a multipath
device, but at certain scenarios this may not be sufficient for device mapper
to get the path. The default multipath configuration has a configuation
called 'checker_timeout' which specify the timeout to user for path checkers
that issue scsi commands with an explicit timeout, in seconds;
default taken from /sys/block/sd*/device/timeout which is 30s.
This patch lift the timeout value from 10s to 30s

Signed-off-by: Humble Chirammal hchiramm@redhat.com

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Till this release, iscsi plugin was waiting 10 seconds for a path to appear in the device list. However this timeout is not enough or less than default device discovery timeout in most of the systems which cause certain device to be not accounted for the volume. This timeout has been lifted to 30seconds from this release and it should help to avoid mount issues due to device discovery.
@humblec

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

/assign @jsafrane

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented May 29, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: humblec

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@humblec humblec force-pushed the humblec:iscsi-higher-timeout branch from 66e989f to 132c936 May 29, 2019

@jsafrane

This comment has been minimized.

Copy link
Member

commented May 29, 2019

If the value is in /sys/block/sd*/device/timeout, can kubelet read it from there and use it? And fall back to 30, if the file is not readable at all?

@humblec

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

If the value is in /sys/block/sd*/device/timeout, can kubelet read it from there and use it? And fall back to 30, if the file is not readable at all?

@jsafrane
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html-single/dm_multipath/index#mpio_configfile says:

**checker_timeout** | The timeout to use for prioritizers and path checkers that issue SCSI  commands with an explicit timeout, in seconds. The default value is  taken from sys/block/sdx/device/timeout

So, checking it first from sysfs as mentioned above and falling back to 30s is a good suggestion. However I am wondering, which sd device to check here. Also what about devices which are named differently or with different device name prefix but still support scsi commands.

@@ -50,7 +50,7 @@ const (
maxAttachAttempts = 5

// How many seconds to wait for a multipath device if at least two paths are available.
multipathDeviceTimeout = 10
multipathDeviceTimeout = 30

This comment has been minimized.

Copy link
@jsafrane

jsafrane May 30, 2019

Member

I'm re-reading the code again and it's a bit confusing. Constant name and the comment above do not correspond to how it is used in the code.

As a fix I would propose:

  • Use multipathDeviceTimeout instead of hardcoded 10 here:

    devicePath = waitForMultiPathToExist(devicePathList, 10, b.deviceUtil)
    - this is the place where kubelet waits for multipath device to appear.

  • Create a new constant for waiting for a single path to appear and use it instead of multipathDeviceTimeout everywhere else. IMO 30s is fine.

This comment has been minimized.

Copy link
@humblec

humblec May 30, 2019

Author Contributor

@jsafrane I see, indeed its bit confusing at current state. I have accomodated your review comment and updated the patchset. Hopefully thats what you meant. Please let me know if I missed any or misinterpreted anything.

@humblec humblec force-pushed the humblec:iscsi-higher-timeout branch from 132c936 to 813da9d May 30, 2019

@k8s-ci-robot k8s-ci-robot added size/S and removed size/XS labels May 30, 2019

@humblec humblec force-pushed the humblec:iscsi-higher-timeout branch from 813da9d to 0eb0159 May 30, 2019

@humblec humblec changed the title Increase multipathDeviceTimeout to 30seconds/equal to checker_timeout. Increase multipath second path device discovery to 30seconds/equal to checker_timeout. May 30, 2019

@humblec humblec force-pushed the humblec:iscsi-higher-timeout branch from 0eb0159 to 77bb366 May 30, 2019

@humblec humblec changed the title Increase multipath second path device discovery to 30seconds/equal to checker_timeout. Increase device discovery timeout to 30seconds/equal to checker_timeout May 30, 2019

@humblec

This comment has been minimized.

Copy link
Contributor Author

commented May 30, 2019

/release-note-none

@@ -52,6 +52,9 @@ const (
// How many seconds to wait for a multipath device if at least two paths are available.
multipathDeviceTimeout = 10

// How many seconds to wait for second path for a multipath device.

This comment has been minimized.

Copy link
@jsafrane

jsafrane May 30, 2019

Member

Fix the commit please

@jsafrane

This comment has been minimized.

Copy link
Member

commented May 30, 2019

/kind bug
setting 1.15 milestone

@k8s-ci-robot k8s-ci-robot added kind/bug and removed needs-kind labels May 30, 2019

@jsafrane jsafrane added this to the v1.15 milestone May 30, 2019

@humblec humblec force-pushed the humblec:iscsi-higher-timeout branch from 77bb366 to ef17014 May 30, 2019

Increase device discovery timeout to 30seconds/equal to checker_timeout.
At present, iscsi plugin wait for 10seconds for a path to appear for a multipath
device, but at certain scenarios this may not be sufficient for device mapper
to get the path. The default multipath configuration has a configuation
called 'checker_timeout' which specify the timeout to user for path checkers
that issue scsi commands with an explicit timeout, in seconds;
default taken from /sys/block/sd*/device/timeout which is 30s.
This patch lift the timeout value from 10s to 30s.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
@jsafrane

This comment has been minimized.

Copy link
Member

commented May 30, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 30, 2019

@humblec humblec force-pushed the humblec:iscsi-higher-timeout branch from ef17014 to d4ea88f May 30, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 30, 2019

@jsafrane

This comment has been minimized.

Copy link
Member

commented May 30, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 30, 2019

@humblec

This comment has been minimized.

Copy link
Contributor Author

commented May 30, 2019

/retest

@fejta-bot

This comment has been minimized.

Copy link

commented May 30, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 8362e9c into kubernetes:master Jun 1, 2019

21 checks passed

cla/linuxfoundation humblec authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-storage-slow Job succeeded.
Details
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.