Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1793323: Remove the use of /etc/passwd in favor of cri-o #1015

Merged
merged 3 commits into from
Feb 4, 2020

Conversation

akram
Copy link
Contributor

@akram akram commented Jan 22, 2020

cri-o automatically appends the arbitrary user assigned by openshift to /etc/passwd.
/etc/passwd doesn't have to be group writable then, neither we need nss_wrapper.

This PR removes the use of both for 4.2+

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 22, 2020
@openshift-ci-robot
Copy link
Contributor

@akram: This pull request references Bugzilla bug 1793323, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1793323: Use NSS_WRAPPER instead of /etc/passwd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 22, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2020
@akram
Copy link
Contributor Author

akram commented Jan 22, 2020

aws flake:

Error: Error waiting for AMI (ami-065da375cd98edac8) to be ready: timeout while waiting for state to become 'available' (last state: 'pending', timeout: 40m0s)"
level=error
level=error msg="  on ../tmp/openshift-install-145193433/main.tf line 104, in resource \"aws_ami_copy\" \"main\":"
level=error msg=" 104: resource \"aws_ami_copy\" \"main\" {"

/test e2e-aws-jenkins
/test e2e-aws

@akram
Copy link
Contributor Author

akram commented Jan 22, 2020

/assign @gabemontero
/assign @waveywaves

can you PTAL ?

@waveywaves
Copy link
Contributor

/retest

@waveywaves
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Copy link
Contributor

/hold

A couple of points @akram @waveywaves

  • did you already try, at least against a 4.x install you initiated, or simply removing this stuff, and seeing if the crio manipulations of the password file were sufficient, per @bparees 's suggestion ?
  • if the crio thing works, we only need this in 3.11
  • the last e2e-aws-jenkins failure was a AWS flake, but eventually, I do not believe it will pass for you until move off more nodejs:8 references origin#24434 merges

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 22, 2020
@akram
Copy link
Contributor Author

akram commented Jan 22, 2020

@gabemontero I tried on a 4.2 cluster and indeed, /etc/passwd contains the uid of the the arbitrary user

sh-4.2$ tail -1 /etc/passwd
1000520000:x:1000520000:0:1000520000 user:/:/sbin/nologin

So, this is kind of redundant with the use of NSS_WRAPPER.

sh-4.2$ tail -1 /var/lib/jenkins/passwd
default:x:1000520000:0:Default Application User:/var/lib/jenkins:/sbin/nologin

I wanted to be able to simply cherry-pick the commit to 4.3, 4.2 and 4.1 . So, does it harm to add it?

@gabemontero
Copy link
Contributor

@gabemontero I tried on a 4.2 cluster and indeed, /etc/passwd contains the uid of the the arbitrary user

sh-4.2$ tail -1 /etc/passwd
1000520000:x:1000520000:0:1000520000 user:/:/sbin/nologin

So, this is kind of redundant with the use of NSS_WRAPPER.

OK, good wrt confirming @bparees 's notion

IIRC, the thought was that this did not exist in 4.1, but confirming that would be ideal.

sh-4.2$ tail -1 /var/lib/jenkins/passwd
default:x:1000520000:0:Default Application User:/var/lib/jenkins:/sbin/nologin

I wanted to be able to simply cherry-pick the commit to 4.3, 4.2 and 4.1 . So, does it harm to add it?

My preference would be

  • remove the manipulation vs. switching to nss_wrapper in master/4.3/4.2
  • once validated in those streams, where a bugzilla of course would have to be in play at that point, if we choose to do in 4.1, do a "manual cherry pick" where you employ nss_wrapper
  • and of course, in parallel, employ nss_wrapper in 3.11

But of course if anyone here wants to debate that approach further, let's do so.

@bparees
Copy link
Contributor

bparees commented Jan 22, 2020

IIRC, the thought was that this did not exist in 4.1, but confirming that would be ideal.

it was backported to some 4.1.z: https://coreos.slack.com/archives/C02UD0TT3/p1579544937000700

i think it's acceptable to say that for 4.x the fix requires you upgrade to a level of 4.x that includes the crio behavior, but the risk here is that if we publish a 4.1 image that relies on that crio behavior, it breaks existing customers until they upgrade. that would be bad.

but i agree with @gabemontero that the desired final state is to not use nss_wrapper, it pulls in an additional package which has been problematic in the past. So i would agree that we should only do the nss_wrapper fix in the fewest possible releases (3.11, maybe 4.1).

@akram
Copy link
Contributor Author

akram commented Jan 22, 2020

@bparees and @gabemontero
I will remove then the use of nss_wrapper and delete the corresponding code for this PR

@gabemontero
Copy link
Contributor

@bparees and @gabemontero
I will remove then the use of nss_wrapper and delete the corresponding code for this PR

great thanks @akram

btw, the fix in openshift/origin#24434 had passing in e2e-aws-jenkins ... just waiting for various flakes to be avoided to get that mergeable

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2020
@gabemontero
Copy link
Contributor

update the PR title @akram

@akram akram changed the title Bug 1793323: Use NSS_WRAPPER instead of /etc/passwd Bug 1793323: Remove the use of /etc/passwd in favor or cri-o Jan 22, 2020
@akram
Copy link
Contributor Author

akram commented Jan 22, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 22, 2020
@akram akram changed the title Bug 1793323: Remove the use of /etc/passwd in favor or cri-o Bug 1793323: Remove the use of /etc/passwd in favor of cri-o Jan 22, 2020
@openshift-ci-robot
Copy link
Contributor

@akram: This pull request references Bugzilla bug 1793323, which is valid.

In response to this:

Bug 1793323: Remove the use of /etc/passwd in favor of cri-o

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Copy link
Contributor

/hold

no reason to have this PR retest (cause the bot sees the lgtm) until the origin PR @waveywaves has up merges to change the config option used.

we can unhold once the origin PR merges.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2020
@gabemontero
Copy link
Contributor

from the last failing run:

fail [github.com/openshift/origin/test/extended/util/url/url.go:256]: Test Panicked: parse http://Flag --config has been deprecated, use --kubeconfig instead
172.30.0.124:Flag --config has been deprecated, use --kubeconfig instead
80/: net/url: invalid control character in URL

@waveywaves
Copy link
Contributor

/retest

@waveywaves
Copy link
Contributor

/test e2e-aws-jenkins

2 similar comments
@akram
Copy link
Contributor Author

akram commented Jan 31, 2020

/test e2e-aws-jenkins

@akram
Copy link
Contributor Author

akram commented Jan 31, 2020

/test e2e-aws-jenkins

@gabemontero
Copy link
Contributor

I believe you all have a problem related perhaps to this change @akram @waveywaves

From the e2e-aws-jenkins test logs:

2020-01-31 12:20:32.057+0000 [id=107]	SEVERE	o.c.j.p.k.KubernetesLauncher#logLastLines: Error in provisioning; agent=KubernetesSlave name: maven-t8qbf, template=PodTemplate{inheritFrom='', name='maven', slaveConnectTimeout=0, label='maven', serviceAccount='jenkins', nodeSelector='', containers=[ContainerTemplate{name='jnlp', image='image-registry.openshift-image-registry.svc:5000/openshift/jenkins-agent-maven:latest', alwaysPullImage=true, workingDir='/tmp', command='', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='', resourceRequestMemory='', resourceLimitCpu='', resourceLimitMemory=''}], envVars=[KeyValueEnvVar [getValue()=null, getKey()=http_proxy], KeyValueEnvVar [getValue()=null, getKey()=https_proxy], KeyValueEnvVar [getValue()=null, getKey()=no_proxy]]}. Container jnlp. Logs: INFO: Using Remoting version: 3.36
Jan 31, 2020 12:20:30 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /.jenkins/cache/jars
Exception in thread "main" java.io.IOException: Failed to initialize FileSystem JAR Cache in /.jenkins/cache/jars
	at hudson.remoting.Engine.startEngine(Engine.java:284)
	at hudson.remoting.Engine.startEngine(Engine.java:243)
	at hudson.remoting.jnlp.Main.main(Main.java:275)
	at hudson.remoting.jnlp.Main._main(Main.java:270)
	at hudson.remoting.jnlp.Main.main(Main.java:228)
Caused by: java.lang.IllegalArgumentException: Root directory not writable: /.jenkins/cache/jars
	at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:62)
	at hudson.remoting.Engine.startEngine(Engine.java:282)
	... 4 more
Caused by: java.nio.file.AccessDeniedException: /.jenkins
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
	at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389)
	at java.base/java.nio.file.Files.createDirectory(Files.java:689)
	at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:796)
	at java.base/java.nio.file.Files.createDirectories(Files.java:782)
	at hudson.remoting.Util.mkdirs(Util.java:163)
	at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:60)
	... 5 more

@gabemontero
Copy link
Contributor

/hold

@akram
Copy link
Contributor Author

akram commented Feb 3, 2020

This is related to cri-o that is setting home as "/" for the injected user.

jenkins:x:999:997:Jenkins Automation Server:/var/lib/jenkins:/bin/false
1000580000:x:1000580000:0:1000580000 user:/:/sbin/nologin

however:

oc rsh jenkins-1-lv984
sh-4.2$ echo $HOME
/var/lib/jenkins
```

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 3, 2020
@gabemontero
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 3, 2020
@akram
Copy link
Contributor Author

akram commented Feb 3, 2020

/retest

@akram
Copy link
Contributor Author

akram commented Feb 3, 2020

/test e2e-aws-jenkins
/test e2e-aws

@akram
Copy link
Contributor Author

akram commented Feb 4, 2020

Now, the error happens also with maven agent, where the mvn command in agent also needs to get specified the user.home to be specified.

@waveywaves
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akram, waveywaves

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 24f155a into openshift:master Feb 4, 2020
@openshift-ci-robot
Copy link
Contributor

@akram: All pull requests linked via external trackers have merged. Bugzilla bug 1793323 has been moved to the MODIFIED state.

In response to this:

Bug 1793323: Remove the use of /etc/passwd in favor of cri-o

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@akram
Copy link
Contributor Author

akram commented Feb 4, 2020

/cherry-pick release-4.3

@openshift-cherrypick-robot

@akram: #1015 failed to apply on top of branch "release-4.3":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	2/Dockerfile.localdev
M	2/Dockerfile.rhel7
M	2/contrib/s2i/run
A	HACKING.md
Falling back to patching base and 3-way merge...
Removing slave-base/contrib/bin/generate_container_user
CONFLICT (modify/delete): HACKING.md deleted in HEAD and modified in Remove /etc/passwd use on behfalf of cri-o injecting user. Version Remove /etc/passwd use on behfalf of cri-o injecting user of HACKING.md left in tree.
Auto-merging 2/contrib/s2i/run
Auto-merging 2/Dockerfile.rhel7
Auto-merging 2/Dockerfile.localdev
Patch failed at 0001 Remove /etc/passwd use on behfalf of cri-o injecting user

In response to this:

/cherry-pick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants