Some provisioner commands oddly fail on Windows guests #700

Closed
sneal opened this issue Dec 10, 2013 · 10 comments

Comments

@sneal
Contributor

sneal commented Dec 10, 2013

Adding a reconnect call to the Communicator newSession method allows some Windows provisioning commands to succeed. The problem is I don't know why.

func (c *comm) newSession() (session *ssh.Session, err error) {
    log.Println("opening new ssh session")
    if c.client == nil {
        err = errors.New("client not available")
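    } else {
        // Added: reconnect before every session so each command runs over a
        // fresh TCP connection.
        if err := c.reconnect(); err != nil {
            return nil, err
        }
        session, err = c.client.NewSession()
    }

    if err != nil {
        // Pre-existing fallback: reconnect only if opening the session failed.
        log.Printf("ssh session open error: '%s', attempting reconnect", err)
        if err := c.reconnect(); err != nil {
            return nil, err
        }

        return c.client.NewSession()
    }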

    return session, nil
}
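To make the difference between the two strategies concrete, here is a minimal, network-free sketch in Go. It assumes hypothetical Conn, Dialer, and Session types standing in for the SSH client, plus a fake connection that only counts dials; it is not Packer's actual communicator code. reuseSession mirrors the original fallback (reconnect only when opening a session fails), and freshSession mirrors the patch (reconnect before every session).

```go
package main

import (
	"errors"
	"fmt"
)

// Session is a hypothetical stand-in for an SSH session.
type Session struct {
	ConnID int // which TCP connection this session rides on
}

// Conn is a hypothetical stand-in for an SSH client connection.
type Conn interface {
	NewSession() (*Session, error)
	Close() error
}

// Dialer opens a brand-new connection (a new TCP + SSH handshake).
type Dialer func() (Conn, error)

// comm holds the current connection, loosely modeled on Packer's communicator.
type comm struct {
	dial   Dialer
	client Conn
}

// reconnect closes the old connection (if any) and dials a new one.
func (c *comm) reconnect() error {
	if c.client != nil {
		c.client.Close() // error ignored: the connection is being replaced anyway
	}
	client, err := c.dial()
	if err != nil {
		c.client = nil
		return err
	}
	c.client = client
	return nil
}

// reuseSession: original behaviour, reconnect only if opening a session fails.
func (c *comm) reuseSession() (*Session, error) {
	if c.client == nil {
		return nil, errors.New("client not available")
	}
	if s, err := c.client.NewSession(); err == nil {
		return s, nil
	}
	if err := c.reconnect(); err != nil {
		return nil, err
	}
	return c.client.NewSession()
}

// freshSession: patched behaviour, every session gets its own new connection.
func (c *comm) freshSession() (*Session, error) {
	if c.client == nil {
		return nil, errors.New("client not available")
	}
	if err := c.reconnect(); err != nil {
		return nil, err
	}
	return c.client.NewSession()
}

// fakeConn lets the sketch run without a network; its id is the dial number.
type fakeConn struct{ id int }

func (f *fakeConn) NewSession() (*Session, error) { return &Session{ConnID: f.id}, nil }
func (f *fakeConn) Close() error                  { return nil }

// newFakeComm returns a connected comm and a pointer to its dial counter.
func newFakeComm() (*comm, *int) {
	dials := 0
	dial := func() (Conn, error) {
		dials++
		return &fakeConn{id: dials}, nil
	}
	c := &comm{dial: dial}
	c.reconnect() // initial connection, as during communicator setup
	return c, &dials
}

func main() {
	c, dials := newFakeComm()
	a, _ := c.reuseSession()
	b, _ := c.reuseSession()
	fmt.Println("reuse:", a.ConnID, b.ConnID, "dials:", *dials)

	c, dials = newFakeComm()
	a, _ = c.freshSession()
	b, _ = c.freshSession()
	fmt.Println("fresh:", a.ConnID, b.ConnID, "dials:", *dials)
}
```

With the reuse strategy both sessions share connection 1; with the fresh strategy each session gets its own connection, at the cost of one extra TCP and SSH handshake per command.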

Without this change, using an inline shell provisioner to install the Windows NetFx3 feature via dism fails with exit code 5:

"type": "shell", "inline": ["c:/windows/system32/cmd.exe /c dism.exe /online /LogLevel:4 /enable-feature /featurename:NetFx3 /norestart"]

This started when I ran the chef-solo provisioner to install SQL Server, which failed. I pared the SQL install failure down to a failure to install .NET 3.5.1, and with the shell provisioner I was able to reproduce the same odd behavior.

Running the same dism command from a standard SSH session works, as does running it from a test Go program I wrote. This finally led me to add the reconnect call in Packer, which made the shell provisioner work.

I also tried gutting the shell provisioner so that it skips uploading the shell script and runs the inline command directly (cmd := &packer.RemoteCmd{Command: p.config.Inline[0]}), but then it only works if it's the first command sent over SSH (i.e. the first provisioner).

Do you have any insight into why adding an SSH reconnect fixed my provisioning command?

@mitchellh
Contributor

Is there anything on stdout/stderr to show what is going on?

I'm not sure why this would help, since if opening the session fails, it attempts to reconnect anyway. All I can think of is that the SSH server is weird: it accepts the session but isn't actually working?

@sneal
Contributor Author

sneal commented Dec 11, 2013

Here's a small Go program that reproduces the exact problem on my Windows box with OpenSSH installed.

Here is the dism command's stdout/stderr when it fails:

Deployment Image Servicing and Management tool
Version: 6.1.7600.16385


Error: 5

An error occurred while attempting to start the servicing process for the image located at C:\. 
For more information, review the log file.

The DISM log file can be found at C:\Windows\Logs\DISM\dism.log

The log has an error (which isn't useful):

DismHostLib: Failed to set synchronization data in the dismhost.exe process.

For whatever reason, without a new TCP connection the command fails with exit code 5; with a clean TCP connection it runs without error. I ran Process Monitor on the guest for both a successful and an unsuccessful run and compared the two, but nothing stood out. Up until the point of failure, everything looked normal on the Windows side.

The really odd thing is that most commands work just fine on a reused connection. This dism command is the only case I've seen so far that seems to be affected.

@joefitzgerald

@sneal At first glance I thought #200 might be related, but if the SSH session is still active and valid, the fix for #200 won't help here. I know that @dylanmei switched from Cygwin/OpenSSH (http://cygwin.com/cgi-bin2/package-cat.cgi?file=x86_64%2Fopenssh%2Fopenssh-6.4p1-1-src&grep=ssh) to standalone OpenSSH (http://www.mls-software.com/opensshd.html), and I suspect that diffing the configuration of each will reveal some important differences.

I still have an image that was generated using the old method. Should I package it up and send it to you, to save you the time of building one again?

@joefitzgerald

Could this be as simple as SSH v1 (which does not allow multiple commands to be executed over one connection) vs. SSH v2 (which does)?

@sneal
Contributor Author

sneal commented Dec 12, 2013

I tried Cygwin SSH and it fails in exactly the same way with my test Go program. I even built a brand new box using misheska's Cygwin template.

@sneal
Contributor Author

sneal commented Dec 17, 2013

I'm declaring SSH bankruptcy on this one. There's definitely an issue here, but I have no idea what it is. Even if I figure this one out, the SSH environment is just different enough to break other installers and scripts (like installing IIS).

Instead, without making any changes to Packer, I've modified the vagrant-windows PowerShell scripts to execute chef-solo through a Windows scheduled task. It works, it's reliable, and it's communicator-independent.

@sneal sneal closed this as completed Dec 17, 2013
@sneal
Contributor Author

sneal commented Dec 23, 2013

Interestingly, the second command run through the same SSH connection outputs a different $USERNAME:

    runCmd(client, "echo $USER; echo $USERNAME")
    runCmd(client, "echo $USER; echo $USERNAME")

Output:

Administrator
Administrator
Administrator
sshd_server

Definitely a clue.
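A quick way to check whether a given server setup shows this behavior is to run the same identity command several times over one connection and compare the results. The sketch below assumes a hypothetical RunFunc standing in for a runCmd-style helper that returns a command's stdout; it is not the test program from this thread, and the fake runner in main only replays the output shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// RunFunc is a hypothetical stand-in for runCmd: it executes one command on an
// already-open connection and returns its stdout.
type RunFunc func(cmd string) (string, error)

// identityDrift runs cmd n times on the same connection and returns the index
// of the first run whose output differs from the first run, or -1 if all match.
func identityDrift(run RunFunc, cmd string, n int) (int, error) {
	var first string
	for i := 0; i < n; i++ {
		out, err := run(cmd)
		if err != nil {
			return -1, err
		}
		out = strings.TrimSpace(out)
		if i == 0 {
			first = out
		} else if out != first {
			return i, nil
		}
	}
	return -1, nil
}

func main() {
	// Replays the outputs reported in this issue: the second command on the
	// same connection reports sshd_server instead of Administrator.
	outputs := []string{"Administrator\nAdministrator\n", "Administrator\nsshd_server\n"}
	calls := 0
	fake := func(cmd string) (string, error) {
		out := outputs[calls%len(outputs)]
		calls++
		return out, nil
	}
	idx, err := identityDrift(fake, "echo $USER; echo $USERNAME", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println("first differing run:", idx)
}
```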

@sneal
Contributor Author

sneal commented Dec 24, 2013

It turns out this is either a bug in the Windows OpenSSH server or a feature of the Win32 API. I'm not sure which, but ultimately it doesn't matter. To work around this issue, configure the OpenSSH service as follows:

  1. Don't use privilege separation.
  2. After installing, configure opensshd to run as Administrator.
  3. Grant the Administrator account the SeServiceLogonRight ("Log on as a service") privilege.
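Step 1 corresponds to the UsePrivilegeSeparation directive in sshd_config (deprecated in OpenSSH 7.5 and later, but honored by the OpenSSH builds of that era). The sketch below is a minimal check of a config file's text for that setting; the function name is hypothetical, the file's location varies by package, and the parsing is simplified (first occurrence wins, Match blocks are not handled). Steps 2 and 3 are Windows service and account settings and are not covered here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// privSepDisabled reports whether sshd_config text explicitly sets
// UsePrivilegeSeparation to "no". Simplified parsing: blank lines and '#'
// comments are skipped, keywords match case-insensitively, both "key value"
// and "key=value" are accepted, and the first occurrence wins.
func privSepDisabled(config string) bool {
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split the keyword from its value at the first space, tab, or '='.
		i := strings.IndexAny(line, " \t=")
		if i < 0 {
			continue
		}
		key := line[:i]
		val := strings.TrimSpace(strings.TrimLeft(line[i:], " \t="))
		if strings.EqualFold(key, "UsePrivilegeSeparation") {
			return strings.EqualFold(val, "no")
		}
	}
	return false
}

func main() {
	cfg := "# sshd_config excerpt\nPort 22\nUsePrivilegeSeparation no\n"
	fmt.Println("privilege separation disabled:", privSepDisabled(cfg))
}
```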

@kouroshparsa

I tried the above steps without any luck.
I found that when run remotely, the DISM path was wrong. I corrected it, but it still hit other errors, so I decided not to use the Chef IIS recipes at all.

@databus23
Contributor

This hit me hard as well.
I tried to set up WinRM via a shell provisioner and got "Access denied" all over the place.
The same batch file works when I manually upload and execute it via ssh.

The only difference I noticed between Packer and manual SSH was the $USERNAME variable (sshd_server instead of the actual user logging in).

This is very odd.

@ghost ghost locked and limited conversation to collaborators Apr 10, 2020