METAL-897: Use nmcli instead of legacy network scripts #1631

Merged: 1 commit merged into openshift-metal3:master on Feb 22, 2024

elfosardo
Member

No description provided.

@elfosardo changed the title from "Use nmcli instead of legacy network scripts" to "[WIP] Use nmcli instead of legacy network scripts" on Feb 6, 2024
@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Feb 6, 2024
@elfosardo force-pushed the use-nmcli branch 7 times, most recently from 1473b94 to a5989d6 on February 7, 2024 14:29
@elfosardo
Member Author

/retest
ofcir failure

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest
Ansible Galaxy error, interesting!

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo changed the title from "[WIP] Use nmcli instead of legacy network scripts" to "METAL-897: Use nmcli instead of legacy network scripts" on Feb 9, 2024
@openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Feb 9, 2024
@elfosardo
Member Author

/retest

@elfosardo
Member Author

elfosardo commented Feb 9, 2024

/retest
More Ansible Galaxy errors, scary.

@elfosardo
Member Author

/retest

@mkowalski
Member

mkowalski commented Feb 13, 2024

Hey, the only thing that concerns me (but it may be completely invalid): how am I supposed to "upgrade" after this commit merges? Should I run make clean (or make realclean) before pulling from master and only afterwards use it, or does it not matter?

I feel that without a clean before git pull I may end up with unwanted files in my /etc, but honestly I'm not sure how this is handled.

What I am trying to say: maybe we should keep a non-failing sudo rm -f /etc/sysconfig/network-scripts/ifcfg-[...] in host_cleanup.sh to handle systems that used old dev-scripts in the past?

@elfosardo
Member Author

@mkowalski that sounds like a good idea; I'll update the PR.
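
A minimal sketch of the non-failing cleanup suggested above. The helper name cleanup_legacy_ifcfg and the SUDO override variable are hypothetical, not taken from dev-scripts. It removes leftover ifcfg-<interface> files with rm -f, which already succeeds when a file is missing, and then, only if nmcli is installed, asks NetworkManager to re-read its connection profiles from disk:

```shell
#!/usr/bin/env bash
# Sketch: non-failing cleanup of leftover legacy ifcfg files.
# SUDO is a hypothetical override: unset -> "sudo", SUDO="" -> run commands directly.

cleanup_legacy_ifcfg() {
  local dir="$1"   # e.g. /etc/sysconfig/network-scripts
  shift
  local iface
  for iface in "$@"; do
    # rm -f succeeds on a missing file; "|| true" also covers a missing sudo.
    ${SUDO-sudo} rm -f "${dir}/ifcfg-${iface}" || true
  done
  # If NetworkManager's CLI is present, reload connection profiles from disk
  # so it forgets the removed files; ignore failures (e.g. NM not running).
  if command -v nmcli >/dev/null 2>&1; then
    ${SUDO-sudo} nmcli connection reload >/dev/null 2>&1 || true
  fi
  return 0
}
```

For example, a cleanup script could call cleanup_legacy_ifcfg /etc/sysconfig/network-scripts "${INT_IF}". Passing the directory in, rather than hard-coding it, keeps the function easy to exercise against a scratch directory.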

@elfosardo force-pushed the use-nmcli branch 2 times, most recently from 4adb9b2 to 3b4db99 on February 13, 2024 13:42
@elfosardo
Member Author

/retest
CI is not really ok at the moment

@elfosardo
Member Author

/retest

if [ -e /etc/sysconfig/network-scripts/ifcfg-${INT_IF} ]; then
    sudo rm -f /etc/sysconfig/network-scripts/ifcfg-${INT_IF}
fi

Member

If you add systemctl restart network.service || true here, you will allow systems on Stream 8 that were bootstrapped with very old dev-scripts to keep using the master branch.

Member Author

We can add this in a follow-up as part of the cleanup.
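
The restart discussed in this review thread could be sketched as a small helper. The function name restart_legacy_network_service and the SUDO override are hypothetical. Beyond the plain systemctl restart network.service || true from the review, this version first checks that systemctl exists and that the legacy unit is present, so hosts that never had network.service skip the restart quietly instead of logging an error:

```shell
#!/usr/bin/env bash
# Sketch: restart the legacy network service without ever failing the cleanup,
# following the review suggestion "systemctl restart network.service || true".
# SUDO is a hypothetical override: unset -> "sudo", SUDO="" -> run commands directly.

restart_legacy_network_service() {
  # Nothing to do on hosts without systemctl (e.g. containers).
  command -v systemctl >/dev/null 2>&1 || return 0
  # "systemctl cat" exits non-zero when the unit (or systemd itself) is
  # unavailable, so only hosts that still ship network.service get restarted.
  if systemctl cat network.service >/dev/null 2>&1; then
    ${SUDO-sudo} systemctl restart network.service || true
  fi
  return 0
}
```

Because every path ends in a successful return, calling the helper unconditionally from a cleanup script cannot abort it, even under set -e.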

@mkowalski
Member

/lgtm
Whenever CI passes, good to go

@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Feb 14, 2024
@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest
Ansible Galaxy issue

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest
failure is not related to this change

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest
wow CI is so foobar at the moment

@mkowalski
Member

I am not sure whether this error really matters here. I looked, and the cluster deploys, but something somewhere fails afterwards:

 INFO[2024-02-16T15:01:57Z] Step e2e-metal-ipi-bm-baremetalds-devscripts-setup succeeded after 1h17m5s. 
INFO[2024-02-16T15:01:57Z] Step phase pre succeeded after 1h19m20s.     
INFO[2024-02-16T15:01:57Z] Running multi-stage phase test               
INFO[2024-02-16T15:01:57Z] Running step e2e-metal-ipi-bm-baremetalds-e2e-test. 
INFO[2024-02-16T16:17:19Z] Logs for container test in pod e2e-metal-ipi-bm-baremetalds-e2e-test: 
INFO[2024-02-16T16:17:19Z] time="2024-02-16T16:11:40Z" level=info msg="processed event" event="{{ } {foo-crd.17b463c4bc2ffd43  e2e-horizontal-pod-autoscaling-6430  ed339db1-94bc-4127-9842-38a3ebf7f32d 258089 0 2024-02-16 16:10:55 +0000 UTC <nil> <nil> map[] map[monitor.openshift.io/observed-recreation-count: monitor.openshift.io/observed-update-count:1] [] [] [{kube-controller-manager Update v1 2024-02-16 16:11:40 +0000 UTC FieldsV1 {\"f:count\":{},\"f:firstTimestamp\":{},\"f:involvedObject\":{},\"f:lastTimestamp\":{},\"f:message\":{},\"f:reason\":{},\"f:reportingComponent\":{},\"f:source\":{\"f:component\":{}},\"f:type\":{}} }]} {HorizontalPodAutoscaler e2e-horizontal-pod-autoscaling-6430 foo-crd a2e65cc7-43f1-4f19-a3bf-7a965e1ceb46 autoscaling/v2 257669 } FailedGetResourceMetric failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready) {horizontal-pod-autoscaler } 2024-02-16 16:10:55 +0000 UTC 2024-02-16 16:11:40 +0000 UTC 4 Warning 0001-01-01 00:00:00 +0000 UTC nil  nil horizontal-pod-autoscaler }" 

[...]

 Cleaning up.
found errors fetching in-cluster data: [failed to list files in disruption event folder on node host2.cluster5.ocpci.eng.rdu2.redhat.com: the server could not find the requested resource failed to list files in disruption event folder on node host3.cluster5.ocpci.eng.rdu2.redhat.com: the server could not find the requested resource failed to list files in disruption event folder on node host4.cluster5.ocpci.eng.rdu2.redhat.com: the server could not find the requested resource failed to list files in disruption event folder on node host5.cluster5.ocpci.eng.rdu2.redhat.com: the server could not find the requested resource failed to list files in disruption event folder on node host6.cluster5.ocpci.eng.rdu2.redhat.com: the server could not find the requested resource] 

[...]

 Failing tests:
[sig-cli] oc adm node-logs [Suite:openshift/conformance/parallel]
environment: line 123:   320 Killed                  openshift-tests run "${TEST_SUITE}" ${TEST_ARGS:-} --provider "${TEST_PROVIDER:-}" -o "${ARTIFACT_DIR}/e2e.log" --junit-dir "${ARTIFACT_DIR}/junit"
++ date +%s
+ echo 1708100239
{"component":"entrypoint","error":"wrapped process failed: exit status 137","file":"k8s.io/test-infra/prow/entrypoint/run.go:84","func":"k8s.io/test-infra/prow/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2024-02-16T16:17:19Z"}
error: failed to execute wrapped command: exit status 137 
INFO[2024-02-16T16:17:19Z] Step e2e-metal-ipi-bm-baremetalds-e2e-test failed after 1h15m22s. 

I can't see how this change would suddenly make the cluster fail conformance (if it really failed) without breaking the installation.

@elfosardo
Member Author

@mkowalski thank you for checking that.
It's weird that the error is showing up now, as CI was 100% passing last week, so I don't think the issue is due to this change.
I'm going to retest once more and see.

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest
yet another unrelated failure

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@elfosardo
Member Author

/retest

@derekhiggins
Collaborator

/approve
Tested on CS9 with both IPv4 and IPv6.

openshift-ci bot commented Feb 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekhiggins

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Feb 22, 2024
@openshift-merge-bot bot merged commit 5f47779 into openshift-metal3:master on Feb 22, 2024
12 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
4 participants