Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mcc: accept ign3 & translate down #1474

Conversation

kikisdeliveryservice
Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice commented Feb 14, 2020

Phase 1 of Ign3 transition, the MCC accept Ign3 configs translates them down to version 2 and applies to pool while preserving the original V3 configs. New func IgnParseWrapper() allows us to use one function to parse both configs and can be used for later phases.

Related pre-work : coreos/ign-converter#4

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 14, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2020
@kikisdeliveryservice
Copy link
Contributor Author

/skip

@kikisdeliveryservice
Copy link
Contributor Author

Feb 14 03:38:05.549: INFO: > ERROR: (gcloud.compute.instance-groups.list-instances) could not parse resource []

/retest

@runcom
Copy link
Member

runcom commented Feb 19, 2020

amazing, I think this has to be a combination of this and #996 in order to actually accept Ign v3 and have a smoke e2e that creates a v3 MachineConfig (which is later translated down to v2 internally for the 4.5 timeframe cc @yuqi-zhang )

@yuqi-zhang
Copy link
Contributor

If we want #996 in first we will probably have to adapt this to convert from v3 and rawextention. We also would need a way to detect versioning in a rawextention object before converting, which 996 currently does not seem to have. https://github.com/openshift/machine-config-operator/pull/996/files#r381541065

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 31, 2020
@kikisdeliveryservice kikisdeliveryservice force-pushed the mcc-ign-por-que-no-los-dos branch 5 times, most recently from fa34392 to a91d3ea Compare April 13, 2020 21:04
@kikisdeliveryservice
Copy link
Contributor Author

Finally rebased to pick up new RawExt changes after being blocked but that PR.

@kikisdeliveryservice kikisdeliveryservice force-pushed the mcc-ign-por-que-no-los-dos branch 3 times, most recently from 18e7a7e to bfaade4 Compare April 14, 2020 02:44
@ashcrow
Copy link
Member

ashcrow commented Apr 14, 2020

/retest

@ashcrow
Copy link
Member

ashcrow commented Apr 14, 2020

/retest ci/prow/e2e-gcp-upgrade

@openshift-ci-robot
Copy link
Contributor

@ashcrow: The /retest command does not accept any targets.
The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-disruptive
  • /test e2e-aws-scaleup-rhel7
  • /test e2e-gcp-op
  • /test e2e-gcp-upgrade
  • /test e2e-metal-ipi
  • /test e2e-openstack
  • /test e2e-ovirt
  • /test e2e-vsphere
  • /test images
  • /test unit
  • /test verify

Use /test all to run all jobs.

In response to this:

/retest ci/prow/e2e-gcp-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ashcrow
Copy link
Member

ashcrow commented Apr 14, 2020

/test e2e-gcp-upgrade

@kikisdeliveryservice
Copy link
Contributor Author

@ashcrow those failures don't have anything to do with this pr. no need to retest just yet! :)

@kikisdeliveryservice kikisdeliveryservice force-pushed the mcc-ign-por-que-no-los-dos branch 2 times, most recently from 0f0f7ff to 43ed6b3 Compare April 14, 2020 18:16
@runcom
Copy link
Member

runcom commented Apr 23, 2020

/retest

@runcom
Copy link
Member

runcom commented Apr 23, 2020

found the issue - it's an ordering issue with machineconfig names, we know that MC are ordered when being merged together. I will also make the updateSSHKeys function better as today it just uses the last entry (?!) which is what causes this in tandem

This also explains why it works when you run the test alone in a live cluster but it would have failed in a live cluster as well if MC names were messed up like here. It explains also why Urvashi's PR was failing with a similar error on grep as she had the very same issue about MC names ordering.

This is the content of the rendered MC for this ign3 test:

passwd: {
users: [
{
name: "core",
sshAuthorizedKeys: [
"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUq7W38xCZ9WGSWCvustaMGMT04tRohw6AKGzI7P7xql5lhCAReyt72n9qWQRZsE1YiCSQuTfXI1oc8NpSM7+lMLwj12G8z3I1YT31JHr9LLYg/XIcExkzfBI920CaS82VqmKOpI9+ARHSJBdIbKRI0f5Y+u4xbc5UzKCJX8jcKGG7nEiw8zm+cvAlfOgssMK+qJppIbVcb2iZNTsw5i2aX6FDMyC+b17DQHzBGpNbhZYxuoERZVRcnYctgIzuo6fD60gniX0fVvrchlOnubB1sRYbloP2r6UE22w/dpLKOFE5i7CA0ZzNBERZ94cIKumIH9MiJs1a6bMe89VOjjNV devenv "
]
},
{
name: "core",
sshAuthorizedKeys: [
"1234_test_ign3"
]
},
{
name: "core",
sshAuthorizedKeys: [
"1234_test",
"abc_test"
]
}
]
},

and you can see the MCD is only using the last one:

I0423 09:28:06.841256    2039 update.go:500] user data to be verified before ssh update: {<nil>  []  core false false false <nil>  [1234_test abc_test]  false <nil>}

affected code is here:

// Update a given PasswdUser's SSHKey
func (dn *Daemon) updateSSHKeys(newUsers []igntypes.PasswdUser) error {
	if len(newUsers) == 0 {
		return nil
	}

	// we're also appending all keys for any user to core, so for now
	// we pass this to atomicallyWriteSSHKeys to write.
	var concatSSHKeys string
	for _, k := range newUsers[len(newUsers)-1].SSHAuthorizedKeys {
		concatSSHKeys = concatSSHKeys + string(k) + "\n"
	}
	if !dn.mock {
		// Note we write keys only for the core user and so this ignores the user list
		if err := dn.atomicallyWriteSSHKey(concatSSHKeys); err != nil {
			return err
		}
	}
	return nil
}

I'm fixing it by just updating the name of the MC and update MCD fix (being tested here #1676 as it also fixes the issue here but it's not needed for this PR, just the names change is needed but that PR will prove me wrong and we can go with the other fix maybe and it did lol)

@runcom
Copy link
Member

runcom commented Apr 23, 2020

Pending my fix

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2020
@sinnykumari
Copy link
Contributor

Nice debugging Antonio!

Signed-off-by: Antonio Murdaca <runcom@linux.com>
@runcom
Copy link
Member

runcom commented Apr 23, 2020

was able to push directly (I had wrong remotes setup in my git and assumed it was off because it's the same error, took me some time to figure out and now I was able to push)

/hold cancel

2 similar comments
@runcom
Copy link
Member

runcom commented Apr 23, 2020

was able to push directly (I had wrong remotes setup in my git and assumed it was off because it's the same error, took me some time to figure out and now I was able to push)

/hold cancel

@runcom
Copy link
Member

runcom commented Apr 23, 2020

was able to push directly (I had wrong remotes setup in my git and assumed it was off because it's the same error, took me some time to figure out and now I was able to push)

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2020
@runcom
Copy link
Member

runcom commented Apr 23, 2020

/retest

@runcom
Copy link
Member

runcom commented Apr 23, 2020

/refresh

@runcom
Copy link
Member

runcom commented Apr 23, 2020

/retest

@runcom
Copy link
Member

runcom commented Apr 23, 2020

I think we can leave this as is now and it'll go in as soon as tests turn green as the override is already taken into account

@runcom
Copy link
Member

runcom commented Apr 23, 2020

also github is having issues from time to time today

/retest

@LorbusChris
Copy link
Member

/retest

1 similar comment
@LorbusChris
Copy link
Member

/retest

@yuqi-zhang
Copy link
Contributor

The sshkey fix LGTM, it passes tests on #1676 as well

@runcom
Copy link
Member

runcom commented Apr 23, 2020

fixed finally https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1676/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/2005

I'll pick that commit here as that's the one really fixing the issue here.

Use all available keys all the time, not just last one.
This makes sure the operation can be repeated over and over
and won't depend on ordering either...

Signed-off-by: Antonio Murdaca <runcom@linux.com>
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2020
@runcom
Copy link
Member

runcom commented Apr 23, 2020

/retest

@ashcrow
Copy link
Member

ashcrow commented Apr 23, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, kikisdeliveryservice, LorbusChris, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ashcrow,kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@runcom
Copy link
Member

runcom commented Apr 23, 2020

/skip

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 23, 2020

@kikisdeliveryservice: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 2f7a92b link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@runcom
Copy link
Member

runcom commented Apr 23, 2020

/skip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants