
vsphere-iso builder "boot_command" causes "invalid fault" in vSphere #8957

Closed
rmetschke opened this issue Mar 26, 2020 · 24 comments · Fixed by #10541

Comments

@rmetschke

rmetschke commented Mar 26, 2020

Overview of the Issue

Almost every time I try to build a new vSphere VM using the Ubuntu Server installer, it fails shortly after entering the boot command. This is similar to what is described in jetbrains-infra/packer-builder-vsphere#63. The error is probably a little different because I'm running vCenter 6.7u3 and vSphere 6.7u3. The Packer version is 1.5.4. Here is the Packer build output.

$ packer build ubuntu_18.04.3_http.json
vsphere-iso: output will be in this color.

==> vsphere-iso: Creating VM...
==> vsphere-iso: Customizing hardware...
==> vsphere-iso: Mounting ISO images...
==> vsphere-iso: Starting HTTP server on port 8185
==> vsphere-iso: Set boot order...
==> vsphere-iso: Power on VM...
==> vsphere-iso: Waiting 10s for boot...
==> vsphere-iso: HTTP server is working at http://x.x.x.x:8185/
==> vsphere-iso: Typing boot command...
==> vsphere-iso: Power off VM...
==> vsphere-iso: The attempted operation cannot be performed in the current state (Powered on).
==> vsphere-iso: Destroying VM...
==> vsphere-iso: The attempted operation cannot be performed in the current state (Powered on).
Build 'vsphere-iso' errored: error typing a boot command: ServerFaultCode: A general system error occurred: Invalid fault

==> Some builds didn't complete successfully and had errors:
--> vsphere-iso: error typing a boot command: ServerFaultCode: A general system error occurred: Invalid fault

==> Builds finished but no artifacts were created.

Simplified Packer Buildfile

{
  "builders": [
  {
    "type": "vsphere-iso",

    "vcenter_server": "{{user `vcenter_host`}}",
    "username": "{{user `vsphere_username`}}",
    "password": "{{user `vsphere_password`}}",
    "insecure_connection": "true",
    "cluster": "{{user `cluster`}}",
    "host": "{{user `vmw-01`}}",

    "vm_name": "packer-test",
    "guest_os_type": "ubuntu64guest",
    "CPUs": 2,
    "cpu_cores": 2,
    "RAM": 4096,

    "disk_controller_type": "pvscsi",
    "disk_size": "102400",
    "disk_thin_provisioned": "true",

    "network_card": "vmxnet3",
    "network": "{{user `network`}}",

    "boot_order": "disk,cdrom",
    "boot_wait": "10s",
    "convert_to_template": "false",

    "ssh_username": "{{user `ssh_username`}}",
    "ssh_password": "{{user `ssh_password`}}",
    "ssh_timeout": "15m",

    "iso_paths": [
    	"[{{user `vmw-01_datastore`}}] Images/Linux/ubuntu-18.04.3-live-server-amd64.iso"
    ],
    "http_directory": "http",
    "boot_command": [
        "<enter><enter><wait3><f6><wait3><esc><wait3>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
        "<bs><bs><bs>",
        "/install/vmlinuz noapic ",
        "boot=casper ",
        "initrd=/casper/initrd ",
        "preseed/url=http://{{.HTTPIP}}:{{.HTTPPort}}/preseed.cfg ",
        "debian-installer=en_US auto locale=en_US kbd-chooser/method=us ",
        "grub-installer/bootdev=/dev/sda<wait> ",
        "fb=false debconf/frontend=noninteractive ",
        "-- <enter>"
    ]
  }]
}

Log Fragments and crash.log files

https://gist.github.com/rmetschke/55bef5ef4b2002238ec32b290318b3ca

@SwampDragons
Contributor

Hi, thanks for opening this. We'll take a look when we can.

@jhawk28
Contributor

jhawk28 commented Mar 28, 2020

It may be an issue with your boot_command. It's powering off the VM for some reason after it types the backspaces. Here is the boot_command that I use for Ubuntu:

      "boot_command": [
        "<esc><wait>",
        "<esc><wait>",
        "<enter><wait>",
        "/install/vmlinuz<wait>",
        " auto<wait>",
        " console-setup/ask_detect=false<wait>",
        " console-setup/layoutcode=us<wait>",
        " console-setup/modelcode=pc105<wait>",
        " debconf/frontend=noninteractive<wait>",
        " debian-installer=en_US.UTF-8<wait>",
        " fb=false<wait>",
        " initrd=/install/initrd.gz<wait>",
        " kbd-chooser/method=us<wait>",
        " keyboard-configuration/layout=USA<wait>",
        " keyboard-configuration/variant=USA<wait>",
        " netcfg/get_domain=vm<wait>",
        " netcfg/get_hostname=vagrant<wait>",
        " locale=en_US.UTF-8<wait>",
        " grub-installer/bootdev=/dev/sda<wait>",
        " noapic<wait>",
        " preseed/file=/media/{{user `preseed_path`}}<wait>",
        " -- <wait>",
        "<enter><wait>"
      ],

@KOConchobhair

Hello @rmetschke
I believe I saw this same behavior on my vCenter 6.7 as well, with Packer v1.5.5.
I don't think it's related to jetbrains-infra/packer-builder-vsphere#63 at all.
I believe it is simply a timeout of Packer waiting for the VM to boot. I am actually using the exact same boot_command as you. If you use the -timestamp-ui command line parameter for Packer, you can see that the default 10 seconds is sometimes not enough, and things get screwy because Packer tries to cancel the process, but the VM has JUST powered on and so vCenter doesn't allow it.
I fixed it in my environment by adding "boot_wait": "15s". Hope this helps!
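In template terms, the workaround is a single extra key in the builder block; a minimal illustrative fragment (the 15s value is just what worked in this environment, tune it for yours):

```json
{
  "builders": [
    {
      "type": "vsphere-iso",
      "boot_wait": "15s"
    }
  ]
}
```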

@SwampDragons
Contributor

Thanks for the note @KOConchobhair. Does setting "boot_wait" solve this for you, @rmetschke ?

@rmetschke
Author

This isn't a solution for me, as waiting longer than 10 seconds misses the window for stopping the boot sequence. I should add that this is Ubuntu Server 18.04.3.

@ghost ghost removed stage/waiting-reply labels Apr 7, 2020
@SwampDragons
Contributor

Here's a knowledge base article I found about that error: https://kb.vmware.com/s/article/1014371. Is it possible that your VM is getting moved somehow after being created, e.g. maybe vMotion is turned on or something?

@rmetschke
Author

This environment is just the basic vSphere + vCenter package, so no vMotion.

@jhawk28
Contributor

jhawk28 commented Jul 6, 2020

@amitbhadra saw this repeatedly with the new boot command implementation from @sylviamoss, so we may see it occur more frequently with 1.6.1. It also appeared to be intermittent (it worked OK after a few tries). It may be that we just need to catch the error and try again.

@EzraBrooks

I'm having this issue constantly while trying to build machines on vSphere - it also appears that it sometimes just drops parts of my command.

@jhawk28
Contributor

jhawk28 commented Jul 27, 2020

@EzraBrooks what versions of VMware and Packer?

@EzraBrooks

vSphere/ESXi 6.7 and Packer 1.6.0.

@jhawk28
Contributor

jhawk28 commented Jul 27, 2020

@EzraBrooks can you try using the 1.6.1 nightly? It changes the way the boot command is entered, but uses the same underlying vSphere APIs. If it still happens, I'm wondering if we can put in some retry logic when it gets an error.

@EzraBrooks

Appears to work without any invalid faults or any dropping of inputs. Thanks!

@laurentandrian

I am running vSphere/ESXi 6.7 and Packer 1.6.2 and I run into the same issue.

==> vsphere-iso: files/CentOS-7-x86_64-Minimal-1908.iso?checksum=sha256%3A9a2c47d97b9975452f7d582264e9fc16d108ed8252ac6816239a3b58cef5c53d => /opt/fs/buildovf/files/CentOS-7-x86_64-Minimal-1908.iso
==> vsphere-iso: Uploading CentOS-7-x86_64-Minimal-1908.iso to packer_cache/CentOS-7-x86_64-Minimal-1908.iso
==> vsphere-iso: File already uploaded; continuing
==> vsphere-iso: Creating VM...
==> vsphere-iso: Customizing hardware...
==> vsphere-iso: Mounting ISO images...
==> vsphere-iso: Adding configuration parameters...
==> vsphere-iso: Creating floppy disk...
vsphere-iso: Copying files flatly from floppy_files
vsphere-iso: Copying file: http/ks.cfg
vsphere-iso: Done copying files from floppy_files
vsphere-iso: Collecting paths from floppy_dirs
vsphere-iso: Resulting paths from floppy_dirs : []
vsphere-iso: Done copying paths from floppy_dirs
==> vsphere-iso: Uploading created floppy image
==> vsphere-iso: Adding generated Floppy...
==> vsphere-iso: Starting HTTP server on port 8345
==> vsphere-iso: Set boot order temporary...
==> vsphere-iso: Power on VM...
==> vsphere-iso: Waiting 20s for boot...
==> vsphere-iso: HTTP server is working at http://10.122.42.222:8345/
==> vsphere-iso: Typing boot command...
==> vsphere-iso: Error running boot command: error typing a boot command (code, down) 41, false: ServerFaultCode: Cannot complete the operation due to an incorrect request to the server.
==> vsphere-iso: Clear boot order...
==> vsphere-iso: Power off VM...
==> vsphere-iso: Deleting Floppy image ...
==> vsphere-iso: Destroying VM...
Build 'vsphere-iso' errored after 45 seconds 947 milliseconds: Error running boot command: error typing a boot command (code, down) 41, false: ServerFaultCode: Cannot complete the operation due to an incorrect request to the server.

==> Wait completed after 45 seconds 948 milliseconds

==> Some builds didn't complete successfully and had errors:
--> vsphere-iso: Error running boot command: error typing a boot command (code, down) 41, false: ServerFaultCode: Cannot complete the operation due to an incorrect request to the server.

==> Builds finished but no artifacts were created.

I added a boot_wait of 20s, but I get the same issue.

@uutest74

I have the same issue with Packer 1.6.2 and vSphere 6.5.

  "boot_command": [
    "<esc><wait>",
    "linux ks=hd:fd0:/ks.cfg<enter>"
  ],

Error running boot command: error typing a boot command (code, down) 41, false: ServerFaultCode: Cannot complete the operation due to an incorrect request to the server.

If I choose the ESXi 6.0 build 13635687 cluster, this error appears.
If I choose the ESXi 6.5.0 build 13932383 cluster, all is OK, no errors.

@jhawk28
Contributor

jhawk28 commented Oct 30, 2020

@uutest74 ESXi 6.0 doesn't support the keyboard API.

@dreibh
Contributor

dreibh commented Nov 27, 2020

Sometimes, I get this error when Packer is typing in the boot_command:

==> Some builds didn't complete successfully and had errors:
--> vsphere-iso: Error running boot command: error typing a boot command (code, down) 18, false: ServerFaultCode: A general system error occurred: Invalid fault

Usually, some characters have already been typed in successfully, i.e. the issue here is not a lack of permission to input key presses. The VM remains running after the failure.

Packer: 1.6.5
vSphere: 6.7.0 Build 16616668

@timblaktu
Contributor

timblaktu commented Jan 29, 2021

As I just posted on the corresponding open issue on the jetbrains repo, I'm seeing this intermittently (5-10% of my CI runs). My setup: a Debian Buster Packer host running Packer v1.6.5 on Jenkins v2.249.2, interfacing with vSphere v6.7.0.44000, Debian Buster guests, default settings in the Packer template for the key and keygroup intervals, and the boot_command below.

Incidentally, my Jenkins/Packer host is also running in the same vSphere cluster (with nested virtualization on, obviously), but from reading the other reports, this doesn't seem to be relevant.

Are people having any success with setting longer key intervals? I'm loath to do that because the "Typing boot command..." stage already takes a seemingly long time. I'd expect my environment to have about the lowest latency possible between the Packer process and the vSphere guest process, since it all goes through the internal networking of our vSphere data center (>=30Gbps).

boot_command = [
  "<esc><wait>",
  "install <wait>",
  "preseed/url=http://{{ .HTTPIP }}:{{ .HTTPPort }}/preseed.cfg <wait>",
  "debian-installer=en_US.UTF-8 <wait>",
  "auto <wait>",
  "locale=en_US.UTF-8 <wait>",
  "kbd-chooser/method=us <wait>",
  "keyboard-configuration/xkb-keymap=us <wait>",
  "netcfg/get_hostname={{ .Name }} <wait>",
  "netcfg/get_domain=biamp.com <wait>",
  "fb=false <wait>",
  "debconf/frontend=noninteractive <wait>",
  "console-setup/ask_detect=false <wait>",
  "console-keymaps-at/keymap=us <wait>",
  "vga=884 <wait>",
  "grub-installer/bootdev=/dev/sda <wait>",
  "<enter><wait>"
]

(BTW, I've played with several things to try to get the install console to be larger than the default, the above vga=884 being the most recent. Anyone know how to get that working for an env like mine?)

EDIT: I have changed my boot_wait from 5s to 10s and will report back if the issue persists.

@jhawk28
Contributor

jhawk28 commented Jan 29, 2021

@timblaktu try to see if this build works: https://app.circleci.com/pipelines/github/hashicorp/packer/8924/workflows/ef5fd9b6-c559-4165-a61a-ac257d675a80/jobs/105663/artifacts

It will attempt to retry sending the keystroke after it gets an error

@timblaktu
Contributor

Thanks @jhawk28. Perhaps someone who is experiencing this frequently can test this build, e.g. @rmetschke?

This issue is not consistently repeatable for me. Perhaps, however, I could force it to happen by changing the key interval or keygroup interval? If you can provide some guidance on how to configure these settings, I am willing to experiment. The vsphere-iso docs show that there is a builder parameter for boot_keygroup_interval, but I can't find one anywhere for setting a boot_key_interval. Yesterday I grepped all over for it and found some old PR references to PACKER_KEY_INTERVAL, but I can't find this anywhere in the current documentation.

@jhawk28
Contributor

jhawk28 commented Jan 29, 2021

boot_keygroup_interval is the parameter that defines how much time passes between each key press. It defaults to 100ms. Here is the code snippet that uses it:

func NewUSBDriver(send SendUsbScanCodes, interval time.Duration) *usbDriver {
	// We delay (default 100ms) between each key event to allow for CPU or
	// network latency. See PackerKeyEnv for tuning.
	keyInterval := PackerKeyDefault
	if delay, err := time.ParseDuration(os.Getenv(PackerKeyEnv)); err == nil {
		keyInterval = delay
	}
	// override interval based on builder-specific override.
	if interval > time.Duration(0) {
		keyInterval = interval
	}

Side note: the jetbrains repo is basically dead. Packer core is the maintainer of the plugin now.

@timblaktu
Contributor

@jhawk28 I trust you, but why then do the docs say that boot_keygroup_interval is the "Time to wait after sending a GROUP of key pressses."? The wording makes it sound like it's something other than what you say it is (time between each key). Perhaps I should file another doc PR to clarify this (and fix the "pressses" typo).

Sounds like you're saying boot_keygroup_interval is now THE ONLY knob users can tweak via their builder templates to change the boot_command timing, correct?

@jhawk28
Contributor

jhawk28 commented Jan 29, 2021

@timblaktu I'm just reading the code. The boot_keygroup_interval is definitely just passed in and used as the key interval.

d := bootcommand.NewUSBDriver(sendCodes, s.Config.BootGroupInterval)
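In template terms, that means the per-builder override looks like this; a minimal illustrative JSON fragment (the 250ms value is arbitrary, not a recommendation):

```json
{
  "type": "vsphere-iso",
  "boot_keygroup_interval": "250ms"
}
```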

@ghost

ghost commented Apr 19, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 19, 2021