Jenkins docker-plugin passes /usr/sbin/sshd command by default #62

Closed
GunArm opened this issue Sep 3, 2020 · 2 comments

Comments

@GunArm
Contributor

GunArm commented Sep 3, 2020

I have been using these images under the impression that they are made to be used with docker-plugin in jenkins to start these containers as ephemeral agents. However I've today come to the conclusion that most people must be using them for long-running/manually started agents since there's little talk of this issue.

I had previously been having lots of really strange trouble with my Linux agents, especially #46, which I still haven't figured out. Then this week I needed to add some infra for Windows build containers and ran into another weird issue: these containers simply were not working for me out of the box.

As a baseline I was able to run the windows agent image manually and ssh into it successfully to get to a powershell prompt. BUT when the windows containers were being started by docker plugin, then they would die within 3-4 seconds.

By calling 'docker logs {container}' soon after a container died I was able to see the following error:

PS C:\docker_testing\ssh-key-temp> docker logs wonderful_thompson
&: C:\ProgramData\Jenkins\setup-sshd.ps1:88
Line |
  88 |          & $Cmd
     |            ~~~~
     | The term '/usr/sbin/sshd' is not recognized as the name of a
     | cmdlet, function, script file, or operable program. Check the
     | spelling of the name, or if a path was included, verify that
     | the path is correct and try again.

Something in the Windows container is trying to run /usr/sbin/sshd? That's definitely not right! I scoured the code for references to the string /usr/sbin/sshd, and it doesn't exist in the repo. I looked at line 88 and saw that this variable holds the command-line arguments passed into setup-sshd. It would seem Jenkins is for some reason passing this command into my Windows container. I looked all over my configuration for this agent and there was nothing unusual that should be causing that. I made a custom version of the image with extra logging that bypassed line 88, and the container was able to start. I kept searching for a cause and eventually found:
This Docker Plugin doc says:

By default, the docker plugin will execute /usr/sbin/sshd -D, therefore it is not recommended that you set the ENTRYPOINT unless you plan to pass extra arguments from Jenkins

and
This Docker Plugin issue corroborates this default behavior in the context of complaining about it causing problems for their entry point scripts. The general reaction again is "don't use entrypoint scripts with docker plugin", which I found odd because that's how docker-ssh-agent works, and understandably since it has to add the ssh public keys at startup.

We need our entrypoint script, and there is no (proper) way to make docker-plugin NOT do this, but there is a hacky way: The logic seems to be if the command you provide in your agent configuration is empty, it will run the image with "/usr/sbin/sshd -D", otherwise it will use what you provide for the command. So you can put some kind of no-op in there and it will pass that instead of /usr/sbin/sshd.
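The fallback described above can be modeled in a few lines of shell. This is only a sketch of the behavior as I understand it, not docker-plugin's actual code (the plugin is written in Java); the function name `effective_command` and the `noop` placeholder are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of how docker-plugin appears to pick the container command.
# effective_command is an illustrative name, not part of the plugin.
effective_command() {
    configured="$1"
    if [ -z "$configured" ]; then
        # Empty command field in the agent config: the plugin's default is used
        printf '%s\n' "/usr/sbin/sshd -D"
    else
        # Anything non-empty is passed through unchanged
        printf '%s\n' "$configured"
    fi
}

effective_command ""        # prints: /usr/sbin/sshd -D
effective_command "noop"    # prints: noop ("noop" is a placeholder value)
```

This is why putting any non-empty placeholder in the command field keeps /usr/sbin/sshd from reaching the entrypoint script.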

Setup-sshd, in both its Windows and Linux versions, looks at that argument. If it is an ssh key, the script adds the key. If it is not an ssh key, the script tries to execute whatever the argument is. If it's null or empty, the script moves on to make the environment variables global and then runs sshd at the end.

I have always left the command empty, and apparently this has gone unnoticed on my Linux agents because /usr/sbin/sshd exists so it doesn't error, but it DOES mean that sshd starts before (and forever blocks) exporting the environment variables to /etc/environment. (I'm thinking, has THIS been the cause of #46 for me all along? I will test this.)

On Windows the same logic follows: setup-sshd detects that "/usr/sbin/sshd" is not an ssh key, so it tries to execute it, which is an error on Windows, and the container dies immediately.

I understand the need for the piece of code that looks like this:

if(![System.String]::IsNullOrWhiteSpace($Cmd)) {
    if($Cmd -match "^ssh-.*") {
        Write-Key $Cmd
    } else {
        & $Cmd
        exit
    }
}

You want to easily take the ssh key as a run argument but also allow the flexibility of running alternate commands.

But I propose it be modified to specifically ignore "/usr/sbin/sshd" when passed as the command, which docker-plugin does by default. On Windows this breaks everything, and on Linux it causes subtle unwanted behavior. And if someone really was passing the sshd command, we know we're going to run it a few lines later anyway.

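# UNTESTED
if(![System.String]::IsNullOrWhiteSpace($Cmd)) {
    if($Cmd -match "^ssh-.*") {
        # argument looks like a public key: register it
        Write-Key $Cmd
    } elseif($Cmd -match "/usr/sbin/sshd") {
        # ignore default command from jenkins docker plugin;
        # fall through so the rest of the script still runs
    } else {
        # any other argument is run as an alternate command
        & $Cmd
        exit
    }
}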

....
Start-Service sshd

Ultimately I think the true fix for this is in the docker plugin. Some general solution is needed now that Windows containers are a thing. But doing this would make these images work with docker plugin as a stopgap. The alternative would be some big confusing explanation in the documentation to tell people to put a janky no-op in the command field of their agent configuration.
The same logical fix could be applied to the Linux setup-sshd script to make the behavior more predictable (i.e. not skipping the last part of the script).
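For the Linux side, the same three-way decision could look roughly like the sketch below. It is untested against the real image and does not reproduce the actual setup-sshd script; `classify_arg` is a hypothetical helper that only shows how the first argument would be sorted.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed argument handling for the Linux script.
# classify_arg is an illustrative helper, not a function in the real setup-sshd.
classify_arg() {
    case "$1" in
        "")              echo "none" ;;    # nothing passed: continue with normal startup
        ssh-*)           echo "key" ;;     # looks like a public key: add it
        /usr/sbin/sshd*) echo "ignore" ;;  # docker-plugin default: skip it, sshd is started later anyway
        *)               echo "exec" ;;    # anything else: run it as an alternate command
    esac
}

classify_arg "/usr/sbin/sshd"   # prints: ignore
classify_arg "ssh-rsa AAAA"     # prints: key
```

Only the "exec" case would run the argument and exit early; "none", "key", and "ignore" would all fall through to exporting the environment variables and starting sshd.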

If I put one together, would someone be willing to merge a pull request like this, in theory?

@GunArm
Contributor Author

GunArm commented Sep 3, 2020

I did find some discussion about docker-plugin behavior regarding the command, intersected with mention of docker-ssh-agent
jenkinsci/docker-plugin#745

But notably their fix for the command is tied up in a complex of other things they want to do, and they say it's stalled. So I still think this would be a stop gap, as well as making it backward compatible (in the future) with versions of docker-plugin prior to that fix having come out.

@GunArm
Contributor Author

GunArm commented Sep 4, 2020

I went ahead and made a pull request of what I did locally. I couldn't figure out how to link it to the issue.
