[release-4.13] OCPBUGS-14357: Disable iscsi.service & re-enable iscsid.socket #1302
@travier: This pull request references Jira Issue OCPBUGS-14357, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
/lgtm
/label backport-risk-assessed
Hm interesting, the newly added test is failing.
Yes, this needs the backports listed at the top.
/label cherry-pick-approved
/hold
overlay: disable iscsi.service by default

`iscsi.service` has `Before=remote-fs-pre.target` *and* `After=network-online.target`. This forces `remote-fs-pre.target` to block on `network-online.target` and hence, in OCP, on `ovs-configuration.service` (which has `Before=network-online.target`). Because `systemd-user-sessions.service` is ordered after `remote-fs.target`, which is itself ordered after `remote-fs-pre.target`, this transitively makes `systemd-user-sessions.service` block on `network-online.target`.

This was an issue in Fedora as well and was discussed in a devel thread [1]. `iscsi.service` was subsequently reworked [2][3] so that it is only activated if iSCSI is actually used by the system.

On RHEL 8, `iscsi.service` and co. were enabled directly by RPM scriptlets rather than through presets. In RHCOS, we explicitly make presets canonical [4], so we shipped with `iscsi.service` disabled by default. On RHEL 9, the units were fixed to use presets [5], which is why we started seeing this issue after moving to RHEL 9.

So in theory, all we need is to have the Fedora patch backported to RHEL 9. However, since RHCOS doesn't need the functionality of `iscsi.service` by default, we can fast-track its (re-)disablement instead of waiting for the `iscsi-starter.service` workaround.

Note that `iscsi.service` is only used to bring up iSCSI sessions marked for autostart in `/var/lib/iscsi/nodes`, and is separate from `iscsid.service`, which is what actually manages the iSCSI connections. In OpenShift, we rely only on the latter (e.g. iSCSI PVCs are set up by the kubelet calling out to `iscsiadm` directly). It's also separate from iSCSI devices that use host bus adapters, which are transparent to RHCOS/OCP.
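To make the transitive chain concrete, here is a toy Python model of systemd ordering, offered as a sketch rather than anything systemd itself runs. Each edge means "X starts after Y" (an `After=` on X, or equivalently a `Before=` on Y), and a breadth-first search collects everything `systemd-user-sessions.service` ends up waiting for. The `iscsi.service` and `ovs-configuration.service` edges come from the text above; the `systemd-user-sessions.service` → `remote-fs.target` → `remote-fs-pre.target` edges are an assumption based on upstream systemd's stock units. The model ignores requirement dependencies and jobs; on a real host, `systemd-analyze critical-chain systemd-user-sessions.service` is the tool to use.

```python
from collections import defaultdict, deque


def build_after_graph(after, before):
    """Map each unit to the set of units it is ordered after.

    `X Before=Y` is equivalent to `Y After=X`, so both kinds of edge are
    folded into a single "must start after" relation.
    """
    graph = defaultdict(set)
    for unit, deps in after.items():
        graph[unit].update(deps)
    for unit, deps in before.items():
        for dep in deps:
            graph[dep].add(unit)
    return graph


def waits_on(graph, unit):
    """Return every unit that `unit` is transitively ordered after (BFS)."""
    seen = set()
    queue = deque([unit])
    while queue:
        for dep in graph.get(queue.popleft(), ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def drop_unit(mapping, unit):
    """Model a disabled unit: it is not part of the boot transaction, so
    none of its ordering edges apply."""
    return {u: [d for d in deps if d != unit]
            for u, deps in mapping.items() if u != unit}


# iscsi.service and ovs-configuration.service edges are from the commit
# message; the user-sessions -> remote-fs -> remote-fs-pre edges are
# assumed from upstream systemd's stock units.
AFTER = {
    "iscsi.service": ["network-online.target"],
    "systemd-user-sessions.service": ["remote-fs.target"],
    "remote-fs.target": ["remote-fs-pre.target"],
}
BEFORE = {
    "iscsi.service": ["remote-fs-pre.target"],
    "ovs-configuration.service": ["network-online.target"],
}

if __name__ == "__main__":
    target = "systemd-user-sessions.service"
    print(sorted(waits_on(build_after_graph(AFTER, BEFORE), target)))
    disabled = build_after_graph(drop_unit(AFTER, "iscsi.service"),
                                 drop_unit(BEFORE, "iscsi.service"))
    print(sorted(waits_on(disabled, target)))
```

With `iscsi.service` in the graph, the first line lists `iscsi.service`, `network-online.target`, `ovs-configuration.service`, `remote-fs-pre.target` and `remote-fs.target`; once it is dropped, only `remote-fs-pre.target` and `remote-fs.target` remain, which is the effect of disabling the unit.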
Fixes: https://issues.redhat.com/browse/OCPBUGS-11124

[1]: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/HACVEJ3FMOIM2TOENOVH5CPOUNR7NCMS
[2]: https://src.fedoraproject.org/rpms/iscsi-initiator-utils/c/1e689cd0c6667eca838c85975a1b7a070209e5ad
[3]: https://src.fedoraproject.org/rpms/fedora-release/pull-request/246
[4]: https://github.com/coreos/fedora-coreos-config/blob/1553518214088a89d6a2360a6fcdddbd3915628a/manifests/ignition-and-ostree.yaml#L35-L44
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1930458

(cherry picked from commit b5c5a05)
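The RHEL 8 vs. RHEL 9 difference comes down to how presets are resolved. As a rough illustration, here is a Python model of the documented `systemd.preset` matching rule: preset files are read in lexical order of their file names, the first rule whose glob matches a unit wins, and if no rule matches, systemd falls back to enabling the unit. The file names and contents in `FILES` are hypothetical, not the actual RHEL 9 or RHCOS preset files, and the parser ignores the template-instance syntax that newer systemd versions accept.

```python
from fnmatch import fnmatchcase


def parse_preset(text):
    """Parse preset-file text into ordered (action, pattern) rules."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' are comments.
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            rules.append((parts[0], parts[1].strip()))
    return rules


def preset_for(unit, files):
    """Return "enable" or "disable" for `unit`.

    `files` maps preset file names to their contents. Files are read in
    lexical order of their names and the first matching rule wins; with
    no match at all, fall back to "enable".
    """
    for name in sorted(files):
        for action, pattern in parse_preset(files[name]):
            if fnmatchcase(unit, pattern):
                return action
    return "enable"


# Hypothetical preset files, for illustration only.
FILES = {
    "40-example-override.preset": "disable iscsi.service\n",
    "90-example-distro.preset": "enable iscsi.service\nenable iscsid.socket\n",
    "99-default-disable.preset": "disable *\n",
}

if __name__ == "__main__":
    for unit in ("iscsi.service", "iscsid.socket", "other.service"):
        print(unit, preset_for(unit, FILES))
```

Here the lower-numbered override file shadows the hypothetical distro `enable iscsi.service` rule, so `iscsi.service` resolves to `disable` while `iscsid.socket` stays enabled; removing the override file flips `iscsi.service` back to `enable`.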
overlay: stop disabling `iscsid.socket`

The iSCSI daemon is now socket-activated, so it only runs when needed rather than being always enabled. Disabling `iscsid.socket` breaks that.

This effectively reverts 929ac48 ("c9s: Disable iscsid.socket"). It's not certain why this was done, but it was likely to work around failing tests. These tests should now be fixed [1], so we should be able to stop doing this.

[1]: coreos/coreos-assembler#3275

(cherry picked from commit 8c94270)
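As background on what socket activation means here: systemd owns the listening socket (`iscsid.socket`) and only starts the daemon when a client such as `iscsiadm` connects, handing over the already-open socket. The Python sketch below shows the receiving side of that hand-off using the documented `sd_listen_fds()` environment protocol (`LISTEN_PID`, `LISTEN_FDS`, fds numbered from 3). It is an illustration of the mechanism, not iscsid's actual implementation, and it skips details such as `LISTEN_FDNAMES` and unsetting the variables.

```python
import os
import socket

# Per the sd_listen_fds(3) protocol, passed fds start at fd 3.
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None, pid=None):
    """Return the fds handed over by systemd socket activation.

    systemd sets LISTEN_PID to the PID of the activated process and
    LISTEN_FDS to the number of passed fds, numbered from fd 3 upward.
    If LISTEN_PID doesn't match (e.g. the variables were inherited by a
    child) or either variable is missing, nothing was passed to us.
    """
    env = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(env.get("LISTEN_PID", "")) != pid:
            return []
        count = int(env.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


if __name__ == "__main__":
    fds = listen_fds()
    if fds:
        # Adopt the already-bound listening socket instead of creating one.
        listener = socket.socket(fileno=fds[0])
        print("socket-activated, listening on", listener.getsockname())
    else:
        print("not socket-activated; would bind our own socket or exit")
```

Because the socket exists before the daemon does, disabling the `.socket` unit removes the only thing that would start the daemon on demand, which is why the commit re-enables it.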
tests/kola: delete ext.config.systemd.network-online test

This was moved to FCOS [1].

[1]: coreos/fedora-coreos-config#2437

(cherry picked from commit 096f0ae)
@travier: This pull request references Jira Issue OCPBUGS-14357, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug. Requesting review from QA contact.
Bump fedora-coreos-config submodule

See: coreos/fedora-coreos-config#2448

```
$ git -C fedora-coreos-config shortlog --no-merges \
    9fae403ff93a090ef4f7436eda8a0d5387b9c862..ed7f4f22c6db1fbcabad678267a747cc70cbc53a
Dusty Mabe (1):
      tests/kola: wait longer in commonlib.sh is_service_active

Jonathan Lebon (1):
      tests/kola: upstream network-online login test from RHCOS
```
Force-pushed from c89acf7 to 5bb4a77.
/unhold
Trivial LGTM
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cgwalters, dustymabe, travier
@travier: all tests passed!
@travier: Jira Issue OCPBUGS-14357: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-14357 has been moved to the MODIFIED state.
Cherry-picked from: `iscsid.socket` #1298
Needs: `systemctl is-system-running` nonzero exit codes coreos/coreos-assembler#3497