bhyve: enhance debug info for memory range clash #463
Explain what the two clashing regions are.
Are we sure that "claimed" is the right wording? I would lean more towards "reserved", as that seems to be a more accurate description of the message.
Reword section for clarity
@@ -109,9 +109,11 @@ mmio_rb_add(struct mmio_rb_tree *rbt, struct mmio_rb_range *new)

 	if (overlap != NULL) {
 #ifdef RB_DEBUG
-		printf("overlap detected: new %lx:%lx, tree %lx:%lx\n",
+		printf("overlap detected: new %lx:%lx, tree %lx:%lx, '%s' "
+		       "claims region already claimed for '%s'\n",
Reword to "claims region already reserved for"
This was approved in https://reviews.freebsd.org/D29696 and I landed it as efec757.
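To make the intent of the change concrete, here is a minimal, self-contained sketch of an overlap check that reports both owners by name. The struct `named_range` and the helpers `ranges_overlap()` and `format_clash()` are hypothetical stand-ins for illustration, not bhyve's actual `mmio_rb_range` code; only the format string is taken from the diff above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a named MMIO range. */
struct named_range {
	const char *name;	/* owner of the region */
	uint64_t base;		/* first address in the range */
	uint64_t end;		/* last address in the range (inclusive) */
};

/* Two inclusive ranges overlap when each starts at or before the other's end. */
static int
ranges_overlap(const struct named_range *a, const struct named_range *b)
{
	return (a->base <= b->end && b->base <= a->end);
}

/*
 * Format the clash report into buf, naming the new claimant and the
 * existing owner. Returns the value of snprintf().
 */
static int
format_clash(char *buf, size_t len, const struct named_range *new_range,
    const struct named_range *old_range)
{
	return (snprintf(buf, len,
	    "overlap detected: new %lx:%lx, tree %lx:%lx, '%s' "
	    "claims region already claimed for '%s'\n",
	    (unsigned long)new_range->base, (unsigned long)new_range->end,
	    (unsigned long)old_range->base, (unsigned long)old_range->end,
	    new_range->name, old_range->name));
}
```

With names in the message, a developer reading the debug output can tell which two device models are fighting over the same guest-physical range without looking up the addresses by hand.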
Explain what the two clashing regions are.
Reviewed by: grehan, jhb
Differential Revision: https://reviews.freebsd.org/D29696
Pull Request: #463
libvmm: clean up vmmapi.h
struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h. Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h.
Reviewed by: kevans, grehan
Differential Revision: https://reviews.freebsd.org/D28410
----
bhyve/snapshot: drop mkdir when creating the unix domain socket
Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname. Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8).
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28783
----
bhyve/snapshot: rename checkpoint_opcodes to be more generic
Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature.
- rename checkpoint_opcodes to ipc_opcode
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28877
----
bhyvectl: reduce code duplication
Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28878
----
bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME
MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28879
----
bhyvectl: print a better error message when vm_open() fails
Use errno to print a more descriptive error message when vm_open() fails.

libvmm: preserve errno when vm_device_open() fails
vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call.
PR: 250671
MFC after: 3 days
Submitted by: marko@apache.org
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D29109
----
bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM
The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28983
----
bhyve: virtio shares definitions between sys/dev/virtio
Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084
----
Refactor configuration management in bhyve.
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values includes wrappers for handling boolean values. Values of other non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations.

All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device.

A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components.

A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines.

Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false.
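As an illustration of the flat 'key=value' file format described above, the following is a minimal sketch of splitting one configuration line. `parse_config_line()` is a hypothetical helper written for this example and is not bhyve's actual parser; how it treats blank lines and lines without '=' is an assumption, since the commit message only specifies 'key=value' pairs and '#' comments.

```c
#include <string.h>

/*
 * Hypothetical helper (not bhyve's parser): split one line of a
 * 'key=value' configuration file in place.
 *
 * Returns  1 if a key/value pair was found,
 *          0 for a comment or empty line (assumed to be skipped),
 *         -1 if the line has no '=' or an empty key (assumed error).
 */
static int
parse_config_line(char *line, char **key, char **value)
{
	char *eq;

	/* Strip the trailing newline, if any. */
	line[strcspn(line, "\r\n")] = '\0';

	/* Lines starting with '#' are comments. */
	if (line[0] == '#' || line[0] == '\0')
		return (0);

	eq = strchr(line, '=');
	if (eq == NULL || eq == line)
		return (-1);

	*eq = '\0';		/* terminate the key */
	*key = line;		/* full dot-separated path, e.g. x86.strictmsr */
	*value = eq + 1;	/* everything after the first '=' */
	return (1);
}
```

Because values are always stored as strings, a sketch like this never needs to interpret the value; converting "false" to a boolean would be left to whichever consumer reads the key.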
A few command line arguments have less obvious effects:
- Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).
- For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type.
  After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables.
  The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable.
- For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization.

One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve.
Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035
----
bhyve: support relocating fbuf and passthru data BARs
We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates.
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates.
[1]: freebsd/uefi-edk2#9
Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
----
bhyve amd: Small cleanups in amdvi_dump_cmds
Bump offset with MOD_INC instead in amdvi_dump_cmds.
Reviewed by: jhb
Approved by: philip (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28862
----
bhyve hostbridge: Rename "device" property to "devid".
"device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot.
Fixes: 621b509
----
bhyve: Enable virtio-scsi legacy config parsing.
The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler.
Reported by: vangyzen
Reviewed by: vangyzen
Fixes: 621b509
Differential Revision: https://reviews.freebsd.org/D29438
----
bhyve: change vq_getchain to return iovecs in both directions
The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs.
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Reviewed by: afedorov
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29433
----
Fix typo in xhci nvlist node name, and also increment device counter.
This allows the xhci tablet device to be recognized and a PCI device instantiated.
Reviewed by: jhb
Fixes: 621b509 Refactor configuration management in bhyve.
MFC after: 3 months.
----
bhyve: fix regression in legacy virtio-9p config parsing
Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back.
Approved by: grehan
----
bhyve: add SMBIOS Baseboard Information
Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation.
Reviewed by: rgrimes, bcran, grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29657
----
bhyve: Move the gdb_active check to gdb_cpu_suspend().
The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex.
Reported by: Darius Mihai, Elena Mihailescu
Reviewed by: elenamihailescu22_gmail.com
Fixes: 621b509
Differential Revision: https://reviews.freebsd.org/D29538
----
bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL
Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash.
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available.
Reviewed by: jhb, grehan (bhyve)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29665
----
bhyve.8: Make synopsis more readable
There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line.
MFC after: 2 weeks
----
bhyve: Fix synopsis in the usage message
In particular:
- Sort short options to align with style(9)
- Add two missing flags: -G and -r
- Drop unnecessary angle brackets for consistency
- Rename the "vm" argument to vmname for consistency with the manual page
MFC after: 2 weeks
----
bhyve: Improve the option description in the usage message
- Sort options as suggested by style(9)
- Capitalize some words like CPU and HLT
- Add a missing description for the -G flag
MFC after: 2 weeks
----
bhyve.8: Sort the options in the OPTIONS section
No content change intended. Just moving the option descriptions around to follow the order suggested by style(9).
MFC after: 2 weeks
----
bhyve.8: Improve the description and synopsis of -l
- Describe "-l help" separately for readability.
- List all the supported comX devices explicitly
- Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command).
- Explain where to get more information about the possible values of the conf argument.
MFC after: 2 weeks
----
bhyve.8: Improve the description of the -m flag
- Stylize the synopsis with proper mdoc macros
- Do some wordsmithing on the description for consistency.
MFC after: 2 weeks
----
bhyve.8: Fix the synopsis of -p
Use appropriate mdoc macros.
MFC after: 2 weeks
----
bhyve.8: Clean up description of -r
There is no need to wrap those flags in Op macros.
MFC after: 2 weeks
----
bhyve.8: Fix indention in the signals table
MFC after: 2 weeks
----
bhyve.8: Clean-up synopsis of -s
- Document "-s help" separately for readability.
- Use appropriate mdoc macros.
MFC after: 2 weeks
----
bhyve.8: Clean up the slot description of -s
Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description.
MFC after: 2 weeks
----
bhyve.8: Improve emulation description of the -s flag
- Set width of the list to the longest key word for readability.
- Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries.
- Use Cm instead of Li for command modifiers.
- Do not stylize AMD with Li, there's no need to do it.
- Mention COM3 and COM4 in the definition of lpc.
- Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive").
MFC after: 2 weeks
----
bhyve.8: Clean up network backends section
- Reformat the format lists, use appropriate mdoc macros for readability.
- Add a missing Oxford comma.
MFC after: 2 weeks
----
bhyve.8: Clean up block storage device backends description
MFC after: 2 weeks
----
bhyve.8: Clean up SCSI device backends section
MFC after: 2 weeks
----
bhyve.8: Clean up 9P device backends section
MFC after: 2 weeks
----
bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions
MFC after: 2 weeks
----
bhyve.8: Clean up virtio console device backends description
MFC after: 2 weeks
----
bhyve.8: Improve framebuffer backends description
- Use appropriate mdoc macros
- Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned)
- Clarify the IP address specification
MFC after: 2 weeks
----
bhyve.8: Improve documentation of NVME backend
- Document the configuration format.
- Document two additional configuration options: eui64 and dsm.
MFC after: 2 weeks
----
bhyve.8: Improve AHCI backends documentation
- Document the backend format.
MFC after: 2 weeks
----
bhyve: Document the format for HD audio backends
- This change is done for consistency with other backend definitions.
MFC after: 2 weeks
----
bhyve.8: Fix mandoc -Tlint issues
While here, keep network backends section consistent with other sections.
MFC after: 2 weeks
----
bhyve: Be explicit that setting config.dump will not start a VM.
Suggested by: rpokala
Reviewed by: bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29738
----
bhyve: Gracefully handle virtio-scsi with no conf
Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support.
Reviewed by: jhb
MFC after: 1 day
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D29926
----
bhyve: Set SO_REUSEADDR on the gdb stub socket
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30037
----
bhyve/snapshot: provide a way to send other messages/data to bhyve
This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D30221
----
bhyve/snapshot: split up mutex/cond initialization from socket creation
Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D30281
----
Add a virtio-input device emulation.
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX
Reviewed by: jhb (bhyve), grehan
Obtained from: Corvin Köhne <C.Koehne@beckhoff.com>
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D30020
----
bhyve: Register new kevents synchronously.
Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch.
Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30502
----
bhyve: Add support for EVFILT_VNODE mevents.
This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents.
Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30503
----
bhyve: Add support for handling disk resize events to block_if.
Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files.
Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30504
----
bhyve: Split out a lower-level helper for VirtIO interrupts.
This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes.
Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30505
----
bhyve vtblk: Inform guests of disk resize events.
Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt.
Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30506
----
bhyve: enhance debug info for memory range clash
Explain what the two clashing regions are.
Reviewed by: grehan, jhb
Differential Revision: https://reviews.freebsd.org/D29696
Pull Request: freebsd/freebsd-src#463
----
Add more GIC and GICv3 registers
These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller.
Sponsored by: Innovate UK
----
bhyve: Fix cli regression with NVMe ram
The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back.
Reported by: andy@omniosce.org
Reviewed by: jhb, andy@omniosce.org
Fixes: 621b509
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30717
----
bhyve: fix NVMe MDTS comment
Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change.
----
bhyve: Fix NVMe iovec construction for large IOs
The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries.
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list.
PR: 256422
Reported by: dave@syix.com
Tested by: jason@tubnor.net
MFC after: 5 days
Relnotes: yes
Reviewed by: imp, grehan
Differential Revision: https://reviews.freebsd.org/D30897
----
Append Keyboard Layout specified option for using VNC.
Part one: supporting QEMU Extended Keyboard Event Message
PR: 246121
Submitted by: koinec@yahoo.co.jp
Differential Revision: https://reviews.freebsd.org/D29430
----
libvmm: explicitly save and restore errno in vm_open()
In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno.
Noted by: jhb
Fixes: 6bb140e
----
vmm: Let guests enable SMEP/SMAP if the host supports it
Reviewed by: kib, grehan, jhb
Tested by: grehan (AMD)
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30462
----
vmm: Fix ivrs_drv device_printf usage
The original %b description string is wrong.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D30805
----
bhyve/vioapic: remove an extra pin masked check
vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended.
Reviewed by: grehan
Differential revision: https://reviews.freebsd.org/D28236
----
bhyve/ioapic: only account for asserted line in level mode
After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself.
Reviewed by: grehan
Differential revision: https://reviews.freebsd.org/D28237
----
bhyve/ioapic: improve the tracking of IRR bit
One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr.
Reviewed by: grehan
Differential revision: https://reviews.freebsd.org/D28238
----
ivrs_drv: Fix IVHDs with duplicated BaseAddress
Reviewed by: jhb
Approved by: philip (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28945
----
AMD-vi: Fix IOMMU device interrupts being overridden
Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0."

After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there is no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX.
The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay at the value 0. When pci_driver_added() tries to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU PCI function, thus explaining the errors in dmesg.

This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.
Sponsored by: The FreeBSD Foundation
Reviewed by: jhb
Approved by: philip (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28984
----
Correct "Fondation" typo (missing "u")
----
AMD-vi: Mixed format IVHD block should replace fixed format IVHD block
This fixes double IVHD_SETUP_INTR calls on the same IOMMU device.
Sponsored by: The FreeBSD Foundation
MFC with: 74ada29
Reported by: Oleg Ginzburg <olevole@olevole.ru>
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29521
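The "Fix NVMe iovec construction for large IOs" entry earlier in this log turns on one rule: the last slot of a list is a pointer to the next list only when more than one page of data remains. The sketch below models just that rule, using the figures from the commit message (4 KiB pages, 512 entries per list). The function `lists_needed()` and both constants are hypothetical names for illustration and are not taken from bhyve's NVMe emulation.

```c
#include <stdint.h>

/* Illustrative values from the example in the commit message. */
#define PAGE_SIZE_BYTES		4096u	/* Memory Page Size of 4 KiB */
#define ENTRIES_PER_LIST	512u	/* default list size */

/*
 * Hypothetical model: count how many entry lists are touched while
 * walking an IO of 'bytes' bytes. The last slot of a list chains to the
 * next list only if more than one page of data remains; otherwise it
 * points to data like every other slot.
 */
static unsigned
lists_needed(uint64_t bytes)
{
	unsigned lists, slot;

	if (bytes == 0)
		return (0);

	lists = 1;
	slot = 0;
	while (bytes > 0) {
		if (slot == ENTRIES_PER_LIST - 1 && bytes > PAGE_SIZE_BYTES) {
			/* Last slot holds the address of the next list. */
			lists++;
			slot = 0;
			continue;
		}
		/* Slot points to data: consume up to one page. */
		bytes -= (bytes < PAGE_SIZE_BYTES) ? bytes : PAGE_SIZE_BYTES;
		slot++;
	}
	return (lists);
}
```

Under this model a 2 MiB IO fits in exactly one list because its 512th entry is treated as data, while one additional page forces a second list. Treating the last slot as a chain pointer unconditionally, as the commit message says the old code did, would misread that 512th data entry.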
libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports heirarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuation values does include wrappers to handling boolean values. Other values use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line argments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates.
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: freebsd/uefi-edk2#9 Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b509 ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs.
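The idea behind the new vq_getchain() return value can be sketched as follows. The types demo_desc and demo_req and the function demo_split_chain() are hypothetical stand-ins, not bhyve's actual structures or prototype; the only value taken from the VirtIO specification is the write flag (2, the value of VRING_DESC_F_WRITE).

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_F_WRITE 2	/* same value as VRING_DESC_F_WRITE */

/* Hypothetical, simplified descriptor: only the flags matter here. */
struct demo_desc {
	uint16_t flags;
};

/* Hypothetical result type: counts of iovecs in each direction. */
struct demo_req {
	size_t readable;	/* host-readable iovecs */
	size_t writable;	/* host-writable iovecs */
};

/*
 * Count host-readable and host-writable descriptors in a chain.
 * Returns 0 on success, or -1 if a readable descriptor follows a
 * writable one, an ordering the VirtIO specification does not allow.
 */
static int
demo_split_chain(const struct demo_desc *chain, size_t n,
    struct demo_req *req)
{
	req->readable = 0;
	req->writable = 0;
	for (size_t i = 0; i < n; i++) {
		if (chain[i].flags & DESC_F_WRITE) {
			req->writable++;
		} else {
			if (req->writable != 0)
				return (-1);
			req->readable++;
		}
	}
	return (0);
}
```

With counts like these, a caller can treat index `readable` as the start of the host-writable iovecs instead of scanning descriptor flags itself.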
Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b509 Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. 
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b509 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries.
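The entry arithmetic in this example, and the test that decides how the final slot of a list is interpreted, can be sketched as below. Both function names are hypothetical and are not taken from the NVMe emulation; the sketch assumes only the page-sized entries described in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of page-sized entries needed for an IO, rounding up. */
static uint64_t
entries_needed(uint64_t io_bytes, uint64_t page_size)
{
	return ((io_bytes + page_size - 1) / page_size);
}

/*
 * Hypothetical helper: decide how to interpret the final slot of a
 * list. bytes_left is the amount of data still to be described when
 * that slot is reached. The slot is a pointer to the next list only
 * when more than one page of data remains; otherwise it points to data.
 */
static bool
last_entry_is_chain(uint64_t bytes_left, uint64_t page_size)
{
	return (bytes_left > page_size);
}
```

For a 4 KiB page size, entries_needed() gives 512 for a 2 MiB IO, and last_entry_is_chain() is false when exactly one page remains at the final slot.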
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode; pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there is no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI interrupts are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX.
The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay at the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU PCI function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada29 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range wrongly assumed there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never worked as intended. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit.
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for wrong %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827
libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports heirarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuation values does include wrappers to handling boolean values. Other values use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line argments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-seperated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additonal '-s' arguments after the device type. After the configuration phase as completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. 
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for addres updates. [1]: freebsd/uefi-edk2#9 Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b509 ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect flags of each descriptors to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. 
Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b509 Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. 
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
MFC after: 2 weeks ---- bhyve.8: Fix indentation in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indentation of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm.
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
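The blockif resize handling described above (register a callback, then invoke it only when the size has actually changed) can be sketched in portable C. All names here (`blk_sketch`, `blk_register_resize()`, `blk_attr_changed()`, `record_size()`) are hypothetical and do not reflect the real blockif API; the kevent registration itself is left out, and the current size is assumed to be obtained by the caller, for example via fstat(2).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives the client's argument and new size. */
typedef void resize_cb_sketch(void *arg, uint64_t new_size);

/* Hypothetical stand-in for the per-disk state kept by the block layer. */
struct blk_sketch {
	uint64_t size;		/* last known size of the backing store */
	resize_cb_sketch *cb;	/* NULL until a client registers */
	void *cb_arg;
};

static void
blk_register_resize(struct blk_sketch *b, resize_cb_sketch *cb, void *arg)
{
	b->cb = cb;
	b->cb_arg = arg;
}

/*
 * Called when the attribute-change event fires; cur_size is the size
 * re-read from the backing store by the caller.
 */
static void
blk_attr_changed(struct blk_sketch *b, uint64_t cur_size)
{
	if (cur_size == b->size)
		return;		/* attributes changed, size did not */
	b->size = cur_size;
	if (b->cb != NULL)
		b->cb(b->cb_arg, cur_size);
}

/* Example client callback: just records the new size. */
static void
record_size(void *arg, uint64_t new_size)
{
	*(uint64_t *)arg = new_size;
}
```

A device model in the role of vtblk would use its callback to update the advertised capacity and then raise a configuration-change interrupt.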
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b509 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries.
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
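The NVMe last-entry rule described above reduces to one comparison. The sketch below follows the commit message's simplified picture (fixed-size lists, every entry covering one memory page) and is not the bhyve code; `last_entry_is_chain()` and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether slot `idx` of a list holding `entries_per_list`
 * entries is a pointer to the next list rather than a data page.
 * `remaining` is the number of bytes still to be described when the
 * slot is reached.
 */
static bool
last_entry_is_chain(size_t idx, size_t entries_per_list, size_t remaining,
    size_t page_size)
{
	if (idx != entries_per_list - 1)
		return (false);		/* only the last slot is special */
	/* More than one page left: the slot must link to another list. */
	return (remaining > page_size);
}
```

For a 2MiB IO with 4KiB pages, slot 511 is reached with exactly 4096 bytes remaining, so it is treated as data; one more page of IO leaves 8192 bytes remaining and the slot becomes a link to the next list.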
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there is no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI interrupts are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX.
The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay at the value 0. When pci_driver_added() tries to loop over the children of a PCI bus and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU PCI function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada29 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong in assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never worked as intended. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit.
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for incorrect %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827
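The rid-range restructuring in the last commit above (per-IOMMU tables of device entries, enumerated to find the governing unit) can be illustrated with a small lookup. This is a sketch under assumptions: `dev_entry_sketch`, `iommu_sketch`, and `find_iommu()` are hypothetical names, and the real driver's softc layout and entry format are not shown in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device entry: an inclusive range of PCI requester IDs. */
struct dev_entry_sketch {
	uint16_t start_rid;
	uint16_t end_rid;
};

/* Hypothetical per-IOMMU state holding a dynamically sized table. */
struct iommu_sketch {
	const struct dev_entry_sketch *entries;
	size_t nentries;
};

/*
 * Return the index of the first IOMMU unit whose table covers `rid`,
 * or -1 if no unit governs it (callers would treat that as a no-op).
 */
static int
find_iommu(const struct iommu_sketch *units, size_t nunits, uint16_t rid)
{
	for (size_t u = 0; u < nunits; u++) {
		for (size_t i = 0; i < units[u].nentries; i++) {
			const struct dev_entry_sketch *e = &units[u].entries[i];

			if (rid >= e->start_rid && rid <= e->end_rid)
				return ((int)u);
		}
	}
	return (-1);
}
```

Keeping several ranges per unit is what removes the earlier assumption of a single start/end pair.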
libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports heirarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuation values does include wrappers to handling boolean values. Other values use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line argments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-seperated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additonal '-s' arguments after the device type. After the configuration phase as completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. 
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for addres updates. [1]: freebsd/uefi-edk2#9 Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b509 ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect flags of each descriptors to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. 
Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b509 Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. 
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to informWindows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros. 
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b509 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed-size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries. 
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode; pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and also did not acknowledge the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction between amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there is no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI interrupts are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So msi_data and msi_addr stay at 0. When pci_driver_added() loops over the children of a PCI bus and calls pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 are written to the MSI capability of the IOMMU PCI function, which explains the errors in dmesg. This change includes an amdiommu driver which currently handles attaching and detaching and provides DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada29 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong in assuming there is only one start/end device ID range. Besides, ivhd_dev_parse() never worked as intended: the start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for incorrect %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In the hw.vmm.create sysctl handler the maximum length of a vm name is VM_MAX_NAMELEN. However, in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow a vm name of length VM_MAX_NAMELEN. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). 
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface, then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for a large DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address available for the gdb debugger. Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve
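The "Fix NVMe iovec construction for large IOs" entry above (D30897) turns on a single boundary condition, which may be easier to follow as code. What follows is only a minimal sketch under stated assumptions, not the code from bhyve's NVMe emulation: the names `PRP_PAGE_SIZE`, `PRP_ENTRIES_PER_LIST`, `prp_entry_is_chain()` and `prp_lists_needed()` are hypothetical, the page size is assumed to be 4KiB, transfers are assumed to be page-aligned, and the entries carried in the command itself are ignored so that the arithmetic matches the commit message (a 2MiB IO needs 512 entries).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration: 4KiB pages, 8-byte entries, 512 per list. */
#define PRP_PAGE_SIZE		((size_t)4096)
#define PRP_ENTRIES_PER_LIST	(PRP_PAGE_SIZE / sizeof(uint64_t))

/*
 * Hypothetical helper: decide whether entry `idx` of a list is a pointer to
 * the next list rather than a data pointer. `bytes_left` is the data still
 * to be described, including the page this entry would otherwise cover.
 * Only the last slot can chain, and only if more than one page remains.
 */
static int
prp_entry_is_chain(size_t idx, size_t bytes_left)
{
	return (idx == PRP_ENTRIES_PER_LIST - 1 && bytes_left > PRP_PAGE_SIZE);
}

/*
 * Hypothetical helper: count how many lists a page-aligned IO of `io_bytes`
 * needs when the rule above is applied to every entry.
 */
static size_t
prp_lists_needed(size_t io_bytes)
{
	size_t lists = 0, idx = 0, left = io_bytes;

	while (left > 0) {
		assert(idx < PRP_ENTRIES_PER_LIST);
		if (idx == 0)
			lists++;		/* starting a new list */
		if (prp_entry_is_chain(idx, left)) {
			idx = 0;		/* last slot points at the next list */
			continue;
		}
		left -= (left < PRP_PAGE_SIZE) ? left : PRP_PAGE_SIZE;
		idx++;				/* this slot described one data page */
	}
	return (lists);
}
```

Under these assumptions, a 2MiB IO fits in one list because the 512th entry is data, while one extra page forces the 512th entry to become a chain pointer and a second list to be used. Treating the last entry as a chain pointer unconditionally, as the commit message says the emulation did before the fix, would misread the final data page of an exactly-512-entry IO.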
libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports heirarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuation values does include wrappers to handling boolean values. Other values use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line argments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-seperated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additonal '-s' arguments after the device type. After the configuration phase as completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. 
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for addres updates. [1]: freebsd/uefi-edk2#9 Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b509 ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect flags of each descriptors to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. 
Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b509 Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. 
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to informWindows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros. 
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463 ---- Add more GIC and GICv3 registers These aren't used by either driver; however, they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b509 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed-size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries. 
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
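The last-entry rule from the NVMe iovec fix above can be sketched in a few lines of C. This is only an illustration of the rule as the commit message states it, not the bhyve code: the helper names `last_entry_is_chain()` and `lists_needed()` are hypothetical, and the sketch ignores details of the real NVMe data pointer layout such as page offsets.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: when the walker reaches the final slot of a full
 * list, that slot is a pointer to the next list only if more than one
 * page of data is still outstanding. Otherwise it points to data.
 */
static bool
last_entry_is_chain(size_t bytes_remaining, size_t page_size)
{
	return (bytes_remaining > page_size);
}

/* Count how many lists an IO of io_bytes needs (0 on invalid geometry). */
static size_t
lists_needed(size_t io_bytes, size_t page_size, size_t entries_per_list)
{
	size_t lists = 0;
	size_t remaining = io_bytes;

	if (page_size == 0 || entries_per_list < 2)
		return (0);

	while (remaining > 0) {
		lists++;
		/* Every slot except the last one always describes a data page. */
		for (size_t i = 0; i + 1 < entries_per_list && remaining > 0; i++)
			remaining -= (remaining < page_size) ? remaining : page_size;
		if (remaining == 0)
			break;
		/* Last slot: data if it can finish the IO, else a chain pointer. */
		if (!last_entry_is_chain(remaining, page_size))
			remaining = 0;
	}
	return (lists);
}
```

With a 4 KiB page and 512-entry lists, a 2 MiB IO fits in one list because the 512th entry is the final data page, while one additional page forces a second list.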
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the intended msi_data and msi_addr, so both stay at 0. When pci_driver_added() loops over the children of a PCI bus and calls pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 are written to the MSI capability of the IOMMU PCI function, which explains the errors in dmesg. This change includes an amdiommu driver which currently handles attach and detach and provides DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The amdiommu driver also handles allocation of an IRQ resource within the IOMMU PCI function, so that dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada29 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around the rid range wrongly assumed that there is only one start/end device id range pair. In addition, ivhd_dev_parse() never worked as intended: the start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for incorrect %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In the hw.vmm.create sysctl handler the maximum length of a vm name is VM_MAX_NAMELEN. However, in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow a vm name of length VM_MAX_NAMELEN. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). 
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communication with the outside world and tell bhyve to create an interface, then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP ranges, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address on which the gdb debug server listens. Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve
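The gdb stub changes listed above (an optional listen address, a localhost default instead of ANY, and SO_REUSEADDR) can be illustrated with standard socket calls. This is a minimal sketch under stated assumptions, not the bhyve implementation: `stub_listen_addr()` and `stub_set_reuseaddr()` are hypothetical names, and the sketch covers IPv4 only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill *sin for a debug stub listener. A NULL ip
 * selects the loopback address rather than INADDR_ANY, mirroring the
 * "ANY to localhost" default change. Returns 0 on success, -1 if ip is
 * not a valid IPv4 address.
 */
static int
stub_listen_addr(const char *ip, uint16_t port, struct sockaddr_in *sin)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(port);
	if (ip == NULL) {
		sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		return (0);
	}
	return (inet_pton(AF_INET, ip, &sin->sin_addr) == 1 ? 0 : -1);
}

/* Hypothetical helper: let the listener rebind quickly after a restart. */
static int
stub_set_reuseaddr(int fd)
{
	int on = 1;

	return (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)));
}
```

A caller would create the socket, call `stub_set_reuseaddr()` before `bind()`, and pass the filled-in `struct sockaddr_in` to `bind()`.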
libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values does include wrappers for handling boolean values. Other values that use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
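The configuration file format described above (flat 'key=value' lines, with '#' starting a comment) lends itself to a very small parser. The following is a sketch of that format only, not bhyve's parser: `parse_config_line()` is a hypothetical name, and the handling of empty lines and of lines without '=' is an assumption, since the commit message does not specify it.

```c
#include <string.h>

/*
 * Hypothetical helper: split one configuration line in place.
 * Returns 1 and sets *key and *value for a 'key=value' line,
 * 0 for a comment or empty line, and -1 for a line with no key or no '='
 * (rejecting such lines is an assumption of this sketch).
 */
static int
parse_config_line(char *line, char **key, char **value)
{
	char *eq;

	line[strcspn(line, "\n")] = '\0';	/* drop the trailing newline */
	if (line[0] == '#' || line[0] == '\0')
		return (0);
	eq = strchr(line, '=');
	if (eq == NULL || eq == line)
		return (-1);
	*eq = '\0';		/* terminate the key at the first '=' */
	*key = line;		/* full dot-separated path, e.g. x86.strictmsr */
	*value = eq + 1;	/* values are kept as strings */
	return (1);
}
```

Splitting at the first '=' keeps any later '=' characters inside the value, which matters for string values that themselves contain 'key=value' fragments.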
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. 
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: freebsd/uefi-edk2#9 Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b509 ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. 
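The idea behind the vq_getchain() change above, returning the counts of host-readable and host-writable iovecs instead of leaving callers to scan descriptor flags, can be sketched as follows. The names `struct req_counts`, `count_chain()` and `DESC_F_WRITE` are hypothetical stand-ins for illustration, not the bhyve definitions; the sketch relies on the virtio rule that device-readable descriptors precede device-writable ones in a chain.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_F_WRITE	2	/* virtio: buffer is writable by the device (host) */

/* Hypothetical stand-in for the request returned alongside the iovecs. */
struct req_counts {
	int readable;	/* number of leading host-readable iovecs */
	int writable;	/* number of trailing host-writable iovecs */
};

/*
 * Classify a descriptor chain by its per-descriptor flags.
 * Returns 0 on success, -1 if a readable descriptor follows a writable
 * one, which violates the ordering rule.
 */
static int
count_chain(const uint16_t *flags, size_t n, struct req_counts *req)
{
	req->readable = 0;
	req->writable = 0;
	for (size_t i = 0; i < n; i++) {
		if (flags[i] & DESC_F_WRITE)
			req->writable++;
		else if (req->writable != 0)
			return (-1);
		else
			req->readable++;
	}
	return (0);
}
```

With the counts in hand, a caller can treat `iov[0 .. readable-1]` as input and start writing at `iov[readable]` without inspecting any flags itself.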
Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b509 Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. 
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros. 
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463 ---- Add more GIC and GICv3 registers These aren't used by either driver; however, they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b509 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed-size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries. 
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the intended msi_data and msi_addr, so msi_data and msi_addr stay at 0. When pci_driver_added() loops over the children of a PCI bus and calls pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 are written to the MSI capability of the IOMMU PCI function, which explains the errors in dmesg. This change includes an amdiommu driver which currently handles attaching and detaching and provides DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The amdiommu driver also handles allocation of an IRQ resource within the IOMMU PCI function, so that dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada29 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around the rid range wrongly assumed there is only one pair of start/end device IDs. In addition, ivhd_dev_parse() never worked as intended: the start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for incorrect %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In the hw.vmm.create sysctl handler the maximum length of a vm name is VM_MAX_NAMELEN. However, in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow a vm name of length VM_MAX_NAMELEN. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). 
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface, then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP ranges, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow user to specify the IP address available for gdb debugger. Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve
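The "Fix NVMe iovec construction for large IOs" entry in the log above describes when the last entry of a fixed-size list is a data pointer and when it is a pointer to the next list. The following is a minimal C sketch of that decision only, not the bhyve source: `last_entry_is_chain()`, `walk_prp()`, and `struct prp_walk` are hypothetical names, and the model follows the simplified description in the commit message (every entry covers one page, nothing outside the lists is modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: the last slot of a list is a chain pointer only
 * when more than one page of data is still outstanding. Treating it as
 * a chain pointer unconditionally is the bug the commit describes.
 */
static bool
last_entry_is_chain(size_t bytes_remaining, size_t page_size)
{
	return (bytes_remaining > page_size);
}

/* Result of walking the lists for one IO (illustrative only). */
struct prp_walk {
	size_t data_entries;	/* slots that point at data pages */
	size_t lists;		/* number of lists visited */
};

static struct prp_walk
walk_prp(size_t io_bytes, size_t page_size, size_t entries_per_list)
{
	struct prp_walk w = { 0, 1 };
	size_t remaining = io_bytes;
	size_t slot = 0;

	assert(page_size > 0 && entries_per_list >= 2);

	while (remaining > 0) {
		if (slot == entries_per_list - 1 &&
		    last_entry_is_chain(remaining, page_size)) {
			/* Last slot holds the address of the next list. */
			w.lists++;
			slot = 0;
			continue;
		}
		/* This slot points at up to one page of data. */
		remaining -= (remaining < page_size) ? remaining : page_size;
		w.data_entries++;
		slot++;
	}
	return (w);
}
```

With the numbers from the commit message (4KiB pages, 512 entries per list), calling `walk_prp(2 * 1024 * 1024, 4096, 512)` stays within a single list, while an IO one page larger chains to a second list.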
Operations on devices not governed (e.g. the IOMMU unit itself) are no-op from now on. There are also a minor fix on wrong %b formatting string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In hw.vmm.create sysctl handler the maximum length of vm name is VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow the length of VM_MAX_NAMELEN for vm name. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check the number of ivhd entries fit into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). 
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP ranges, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address available for the gdb debugger. 
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307
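The NVMe iovec fix described in the commit log above turns on one rule: the last slot of a full list is a pointer to the next list only when more than one page of data remains. The following is a minimal sketch of that rule, not bhyve's actual code. The function name `prp_lists_needed` and the simplified model (every data entry covers exactly one page, every list has the same number of slots) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: count how many fixed-size entry lists an IO of
 * 'bytes' needs. Simplified model (an assumption): each data entry
 * covers exactly page_size bytes and each list has entries_per_list slots.
 */
static unsigned
prp_lists_needed(uint64_t bytes, uint64_t page_size, unsigned entries_per_list)
{
	unsigned lists = 0;
	uint64_t remaining = bytes;

	/* At least two slots per list so every list makes progress. */
	assert(page_size > 0 && entries_per_list > 1);

	while (remaining > 0) {
		lists++;
		for (unsigned i = 0; i < entries_per_list && remaining > 0; i++) {
			int last = (i == entries_per_list - 1);

			/*
			 * The check from the fix: the last slot chains to the
			 * next list only if more than one page is left.
			 * Otherwise it points to data like any other slot.
			 */
			if (last && remaining > page_size)
				break;
			remaining -= (remaining < page_size) ? remaining : page_size;
		}
	}
	return (lists);
}
```

With a 4KiB page and 512 slots per list, a 2MiB IO fits in a single list under this model, while one extra page forces a second list because the last slot of the first list becomes a chain pointer.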
Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463
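To make the intent of the change concrete, here is a small self-contained sketch of an overlap check that names both parties to the clash. It is only an illustration: `struct named_range`, `ranges_overlap()` and `format_overlap()` are hypothetical names, not the structures or functions used in bhyve's mem.c, and the message text follows the wording shown in the diff under review (the review suggests "reserved" instead of "claimed").

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an MMIO range; field names are assumptions. */
struct named_range {
	const char *name;	/* owner of the region, e.g. a device model */
	uint64_t    base;	/* first address in the range */
	uint64_t    end;	/* last address in the range (inclusive) */
};

/* Two inclusive ranges overlap when each starts at or before the other ends. */
static int
ranges_overlap(const struct named_range *a, const struct named_range *b)
{
	assert(a->base <= a->end && b->base <= b->end);
	return (a->base <= b->end && b->base <= a->end);
}

/*
 * Format a clash report that identifies both regions by name.
 * Returns the value of snprintf(3).
 */
static int
format_overlap(char *buf, size_t len, const struct named_range *new_r,
    const struct named_range *tree_r)
{
	return (snprintf(buf, len,
	    "overlap detected: new %" PRIx64 ":%" PRIx64
	    ", tree %" PRIx64 ":%" PRIx64
	    ", '%s' claims region already claimed for '%s'",
	    new_r->base, new_r->end, tree_r->base, tree_r->end,
	    new_r->name, tree_r->name));
}
```

A caller would test `ranges_overlap()` before inserting a new range and, on a clash, print the formatted message so the log shows which device asked for the region and which one already holds it.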
libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values includes wrappers for handling boolean values. Other values that use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
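The flat 'key=value' file format described above is simple enough to sketch. The parser below is an illustration only and is not taken from bhyve: `parse_config_line()` is a hypothetical name, and its treatment of blank lines and of lines without '=' is an assumption, since the commit message only specifies 'key=value' lines and '#' comments.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: split one line of a flat configuration file.
 * Returns 1 and sets *key and *value for a 'key=value' line, 0 for a
 * comment (leading '#') or an empty line, and -1 when no key or no '='
 * is present. The line is modified in place.
 */
static int
parse_config_line(char *line, char **key, char **value)
{
	char *eq;

	assert(line != NULL && key != NULL && value != NULL);

	/* Drop a trailing newline, if any. */
	line[strcspn(line, "\r\n")] = '\0';

	if (line[0] == '#' || line[0] == '\0')
		return (0);

	eq = strchr(line, '=');
	if (eq == NULL || eq == line)
		return (-1);

	*eq = '\0';		/* terminate the key */
	*key = line;		/* full dot-separated path, e.g. x86.strictmsr */
	*value = eq + 1;	/* values are always kept as strings */
	return (1);
}
```

Passing the line `x86.strictmsr=false` yields the key `x86.strictmsr` and the string value `false`, which a consumer would then convert to a boolean itself.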
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. 
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: freebsd/uefi-edk2#9 Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b509 ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. 
Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b509 Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b509 introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b509 Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. 
Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros. 
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reivewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: freebsd/freebsd-src#463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b509 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parenthesis around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovec's. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries. 
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada29 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong on assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never work as supposed. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for wrong %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In the hw.vmm.create sysctl handler the maximum length of a vm name is VM_MAX_NAMELEN. However, in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow a vm name of length VM_MAX_NAMELEN. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy().
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for a large DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address available for the gdb debugger.
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307
Includes all commits up to 2022/02/06 or d21e71efce39. ---- libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
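The merge of the two senders into snapshot_request() can be illustrated with a small sketch. Everything named below (request_sketch, ipc_opcode_sketch, SKETCH_FILENAME_MAX, bounded_copy, build_snapshot_request) is hypothetical and only mirrors the shape described in the commit message; the real definitions live in usr.sbin/bhyve/snapshot.h. Because strlcpy(3) is not available in every libc, the sketch uses a local helper with the same contract to show why it is preferred over strncpy(3): the destination is always NUL-terminated and truncation is detectable from the return value.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_FILENAME_MAX 16	/* hypothetical; small so truncation is easy to see */

/* Hypothetical opcode values standing in for enum ipc_opcode. */
enum ipc_opcode_sketch {
	SKETCH_START_CHECKPOINT = 0,
	SKETCH_START_SUSPEND = 1
};

/* Hypothetical message layout: an opcode plus a file name. */
struct request_sketch {
	unsigned int op;
	char filename[SKETCH_FILENAME_MAX];
};

/* Same contract as strlcpy(3): always NUL-terminates, returns strlen(src). */
size_t
bounded_copy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/*
 * One builder for both request kinds; the opcode argument replaces the
 * two near-identical functions. Returns -1 if the name was truncated.
 */
int
build_snapshot_request(struct request_sketch *req,
    enum ipc_opcode_sketch code, const char *file)
{
	req->op = (unsigned int)code;
	if (bounded_copy(req->filename, file, sizeof(req->filename)) >=
	    sizeof(req->filename))
		return (-1);
	return (0);
}
```

With strncpy(3) the over-long name in the second case would leave the buffer without a terminating NUL; here the copy is cut short and the caller is told about it.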
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values does include wrappers for handling boolean values. Values of other non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false.
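The flat 'key=value' file format read by '-k' can be sketched as a per-line splitter. This is an independent illustration, not bhyve's parser: the function name parse_config_line() and its return convention are hypothetical, and treating blank lines as ignorable and a missing '=' as an error are assumptions the commit message does not state.

```c
#include <string.h>

/*
 * Hypothetical helper: split one "key=value" line in place.
 * Returns 1 for a pair, 0 for a comment or blank line,
 * and -1 if the line has no '='.
 */
int
parse_config_line(char *line, char **key, char **value)
{
	char *eq;

	line[strcspn(line, "\r\n")] = '\0';	/* drop the trailing newline */
	if (line[0] == '#' || line[0] == '\0')
		return (0);			/* comment or empty line */
	eq = strchr(line, '=');
	if (eq == NULL)
		return (-1);
	*eq = '\0';				/* terminate the key */
	*key = line;				/* full dotted path, e.g. x86.strictmsr */
	*value = eq + 1;			/* always kept as a string */
	return (1);
}
```

Splitting at the first '=' keeps any later '=' characters inside the value, which matters for values that themselves carry 'name=value' fragments.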
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates.
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead.
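How a caller benefits from getting both counts back can be sketched as follows. The struct req_sketch type and both helper names are hypothetical stand-ins, not the definitions in usr.sbin/bhyve/virtio.h. The sketch also assumes that host-readable iovecs come first in the array and host-writable ones follow, which matches the descriptor ordering the VirtIO specification requires of drivers.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical request summary: how many iovecs go in each direction. */
struct req_sketch {
	int readable;	/* host-readable iovecs, placed first */
	int writable;	/* host-writable iovecs, placed after them */
};

/* First host-writable iovec, or NULL if the chain has none. */
struct iovec *
first_writable(struct iovec *iov, const struct req_sketch *req)
{
	if (req->writable == 0)
		return (NULL);
	return (iov + req->readable);	/* no per-descriptor flag scan needed */
}

/* Total number of bytes the host may write back to the guest. */
size_t
writable_bytes(const struct iovec *iov, const struct req_sketch *req)
{
	size_t total = 0;

	for (int i = 0; i < req->writable; i++)
		total += iov[req->readable + i].iov_len;
	return (total);
}
```

For a block-style request with a header, a data buffer and a status byte, the device model can index straight to the writable part instead of walking descriptor flags.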
Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b5090487d Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. 
This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reivewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: https://github.com/freebsd/freebsd-src/pull/463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b5090487d MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parenthesis around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovec's. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries. 
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e3ca895a14 ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada297e897 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong on assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never work as supposed. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-op from now on. There are also a minor fix on wrong %b formatting string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In hw.vmm.create sysctl handler the maximum length of vm name is VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow the length of VM_MAX_NAMELEN for vm name. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check the number of ivhd entries fit into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). 
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interfaces then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- byhve: add option to specify IP address for gdb Allow user to specify the IP address available for gdb debugger. 
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307 ---- bhyve: Nuke double-semicolons A distinct number of double-semicolons ended up in bhyve. Take a pass at getting rid of many of these harmless typos. MFC after: 3 days ---- bhyve: Fix pci device node key in bhyve_config.5 PCI device node key in the manual page is wrong. It should be pci.bus.slot.function. MFC after: 3 days ---- bhyve: Support setting the disk serial number for VirtIO block devices. Reviewed by: allanjude Obtained from: illumos Differential Revision: https://reviews.freebsd.org/D31983 ---- bhyve: Update the -G description in the SYNPOSIS. It was missing both the 'w' flag and 'bind_address'. ---- bhyve_config.5: Document gdb.address. ---- bhyve: Add an empty case for event types in mevent_kq_fflags(). This fixes a -Wswitch error raised by GCC 9. Differential Revision: https://reviews.freebsd.org/D31938 ---- bhyve: Map the MSI-X table unconditionally for passthrough It is possible for the PBA to reside in the same page as the MSI-X table. And, while devices are not supposed to do this, at least some Intel wifi devices place registers in a page shared with the MSI-X table. To handle the first case we currently map the PBA page using /dev/mem, and the second case is not handled. 
Kill two birds with one stone: map the MSI-X table BAR using the PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that accesses beyond the bounds of the table can be emulated. Regions of the BAR not containing the table are left unmapped. Reviewed by: bz, grehan, jhb MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32359 ---- bhyve.8: Fix markup of the -G flag ---- bhyve: Update usage and synopsis for the -k flag Let's make it clear to users that -k is for configuration files. Also, point to bhyve_config(5) in the paragraph describing the flag. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32467 ---- bhyve: ignore low bits of CFGADR Bhyve could emulate wrong PCI registers. In the best case, the guest reads wrong registers and the device driver would report some errors. In the worst case, the guest writes to wrong PCI registers and could brick hardware when using PCI passthrough. According to Intels specification, low bits of CFGADR should be ignored. Some OS like linux may rely on it. Otherwise, bhyve could emulate a wrong PCI register. E.g. If linux would like to read 2 bytes from offset 0x02, following would happen. linux: outl 0x80000002 at CFGADR inw at CFGDAT + 2 bhyve: cfgoff = 0x80000002 & 0xFF = 0x02 coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04 Bhyve would emulate the register at offset 0x04 not 0x02. Reviewed By: #bhyve, grehan Differential Revision: https://reviews.freebsd.org/D31819 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix the WITH_BHYVE_SNAPSHOT build Note, this breaks compatibility with snapshots generated by older builds of bhyve(8). 
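The CFGADR arithmetic from the "ignore low bits of CFGADR" message above can be checked with a short sketch. The helper names and the 0xfc mask are assumptions made for illustration; only the 0xFF mask and the worked numbers come from the commit message, and 0x0cfc is the standard PCI configuration data port.

```c
#include <stdint.h>

/* CFGDAT: the standard PCI configuration data port. */
#define CONF1_DATA_PORT	0x0cfc

/*
 * Behaviour described as buggy: all eight low bits of the value
 * written to CFGADR are kept as the register offset.
 * (Hypothetical helper name.)
 */
uint32_t
coff_old(uint32_t cfgadr, uint16_t port)
{
	uint32_t cfgoff = cfgadr & 0xff;

	return (cfgoff + (uint32_t)(port - CONF1_DATA_PORT));
}

/*
 * Assumed shape of the fix: bits 1:0 of CFGADR are ignored, so the
 * byte lane is selected only by the data port that is accessed.
 * (Hypothetical helper name.)
 */
uint32_t
coff_fixed(uint32_t cfgadr, uint16_t port)
{
	uint32_t cfgoff = cfgadr & 0xfc;

	return (cfgoff + (uint32_t)(port - CONF1_DATA_PORT));
}
```

With the values from the message (0x80000002 written to CFGADR, a 2-byte read at CFGDAT + 2), coff_old() yields 0x04 and coff_fixed() yields 0x02.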
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") Reported by: Greg V <greg@unrelenting.technology> Reviewed by: grehan, bz Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32523 ---- bhyve: Bump the SMBIOS firmware version to 14.0 for 14-CURRENT Bump the firmware version to 14.0 and set the firmware release date to today. Reviewed by: jhb, bz, imp Differential Revision: https://reviews.freebsd.org/D32534 ---- bhyve: use physical lobits for BARs of passthru devices Tell the guest whether a BAR uses prefetchable memory or not for passthru devices by using the same lobits as the physical device. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32685 ---- bhyve: do not explicitly map fbuf framebuffer Allocating a BAR will call baraddr which maps the framebuffer. No need to allocate it explicitly on init. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32596 ---- bhyve: move 64 bit BAR location to match OVMF assumptions OVMF will fail if large 64 bit BARs are used. GCD-Map doesn't cover 64 bit addresses of BARs. OVMF assumes that 64 bit addresses of BARs are located on the next 32 GB boundary behind Top of High RAM. This patch moves 64 bit BARs to the next 32 GB boundary behind Top of High RAM to match OVMF assumptions. Differential Revision: https://reviews.freebsd.org/D27970 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: use a fixed 32 bit BAR base address OVMF always uses 0xC0000000 as base address for 32 bit PCI MMIO space. For that reason, we should use that address too. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D31051 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: keep physical and virtual COMMAND reg in sync On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set. After that, the passthru emulation overwrites the virtual cmd register with the physical one. This could lead to a mismatch between registered BARs and the encoding bits in the cmd register. Instead of writing the physical to the virtual cmd register, write the virtual to the physical cmd register to solve this issue. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32687 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: emulate reads of MSI-X capabilities for passthru devices Reads of the MSI-X capabilities aren't emulated by passthru devices yet. The guest will read the host MSI-X capabilities, which could cause issues. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32686 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix compile We need err.h. Fixes: 5cf21e48ccf11 ("bhyve: use a fixed 32 bit BAR base address") Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve blockif: fix blockif_candelete with Capsicum NVMe conformance tests for the Format command failed if the backing storage for the bhyve device was a file instead of a Zvol. The tests (and the specification) expect a Format to destroy all previously written data. The bhyve NVMe emulation implements this by trimming / deallocating all data from the backing storage. The blockif_candelete() function indicated the file did not support deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE) even though the kernel supported file hole punching. This occurs on builds with Capsicum enabled because blockif did not allow the fpathconf(2) right. The fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
PR: 260081 Reviewed by: allanjude, markj, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33203 ---- bhyve: fix -Wunused-but-set-variable warning Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33306 ---- bhyve: Support a _VARS.fd file for bootrom OVMF creates two separate .fd files, a _CODE.fd file containing the UEFI code, and a _VARS.fd file containing a template of an empty UEFI variable store. OVMF decides to write variables to the memory range just below the boot rom code if it detects a CFI flash device. So here we add just the barest facsimile of CFI command handling to bootrom.c that is needed to placate OVMF. Submitted by: D Scott Phillips <d.scott.phillips@intel.com> Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19976 MFC After: 1 week ---- bhyve: set EV_CLEAR for EVFILT_VNODE mevents When an EVFILT_VNODE filter event is triggered, reset it. This fixes the issue where a virtio-blk resize event would cause the mevent thread to consume 100% of the cpu. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33326 ---- bhyve nvme: Add AEN support to NVMe emulation Add Asynchronous Event Notification infrastructure to the NVMe emulation. Reviewed by: imp, grehan MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32952 ---- bhyve nvme: Inform guests of namespace resize Register a "block resize" callback to be notified of changes to the backing storage for the Namespace. Use this to generate an Asynchronous Event Notification, Namespace Attributes Changed when the guest OS provides an Asynchronous Event Request. 
MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32953 ---- bhyve: Only snapshot initialized VirtIO queues If the virtio device is not fully initialized, then suspend fails with: vi_pci_snapshot_queues: invalid address: vq->vq_desc Failed to snapshot virtio-rnd; ret=14 MFC after: 1 week Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D26268 ---- bhyve: passthru: enable BARs before possibly mmap(2)ing them The first time we start bhyve with a passthru device everything is fine as on boot we do enable BARs. If a driver (unload) inside bhyve disables the BAR(s) as some Linux drivers do, we need to make sure we re-enable them on next bhyve start. If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP) the kernel will give us an EBUSY. While we were re-enabling the BAR(s) in the current code loop cfginit() was writing the changes out too late to the real hardware. Move the call to init_msix_table() after the register on the real hardware was updated. That way the kernel will be happy and the mmap will succeed and bhyve will start. Also simplify the code given the last argument to init_msix_table() is unused we do not need to do checks for each bar. [1] MFC after: 3 days PR: 260148 Pointed out by: markj [1] Sponsored by: The FreeBSD Foundation Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33628 ---- bhyve: clean up trailing whitespaces Clean up trailing whitespaces. No functional changes. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33681 ---- bhyve smbios type 3 structure is incorrect If you look at the SMBIOS specification, we'll find something is missing. In particular at offset 0Dh is supposed to be the OEM-defined field. This should go between security and height. It is not legal to actually skip this and will lead to other folks not properly interpreting later parts of the table. 
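The layout problem in the SMBIOS type 3 message above can be illustrated with a packed struct. This is a sketch, not the bhyve definition: the struct and field names are invented here, and the offsets other than 0Dh (the one the message names) are filled in from the System Enclosure layout in the SMBIOS specification as an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of the fixed part of an SMBIOS type 3 structure. */
struct smbios_type3_sketch {
	uint8_t		type;		/* 00h: structure type (3) */
	uint8_t		length;		/* 01h: length of the formatted area */
	uint16_t	handle;		/* 02h: structure handle */
	uint8_t		manufacturer;	/* 04h: string index */
	uint8_t		chassis_type;	/* 05h */
	uint8_t		version;	/* 06h: string index */
	uint8_t		serial;		/* 07h: string index */
	uint8_t		asset_tag;	/* 08h: string index */
	uint8_t		bootup_state;	/* 09h */
	uint8_t		psu_state;	/* 0Ah */
	uint8_t		thermal_state;	/* 0Bh */
	uint8_t		security;	/* 0Ch */
	uint32_t	oem_defined;	/* 0Dh: the field that was missing */
	uint8_t		height;		/* 11h */
	uint8_t		power_cords;	/* 12h */
	uint8_t		elem_count;	/* 13h */
	uint8_t		elem_length;	/* 14h */
} __attribute__((packed));

/* Without oem_defined, height and everything after it would sit 4 bytes early. */
_Static_assert(offsetof(struct smbios_type3_sketch, oem_defined) == 0x0d,
    "OEM-defined field must follow the security status");
_Static_assert(offsetof(struct smbios_type3_sketch, height) == 0x11,
    "height must follow the OEM-defined field");
```

The static assertions make the compiler reject any layout in which the OEM-defined field is skipped, which is the class of mistake the message describes.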
https://www.illumos.org/issues/14312 Reviewed by: jhb Submitted by: Robert Mustacchi <rm@fingolfin.org> Obtained from: illumos MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33682 ---- bhyve: only init MSI-X table if passthru device supports it Some passthru devices only support MSI instead of MSI-X. For those devices the initialization of the MSI-X table will fail. Re-add the check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9. MFC after: 3 days X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9 PR: 260148 Reviewed by: manu, bz Differential Revision: https://reviews.freebsd.org/D33728 ---- bhyve: enumerate BARs by size Framebuffers, for example, can require a large address range, and BARs need to be aligned to their size. If BARs aren't allocated by size, the MMIO space becomes heavily fragmented. Reduce fragmentation by ordering BAR allocation by size, which lowers the risk of OUT_OF_MMIO_SPACE issues. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D28278 ---- bhyve: allow reading of fwctl signature multiple times At the moment, you only have one single chance to read the fwctl signature. At boot bhyve is in the state IDENT_WAIT. It's then possible to switch to IDENT_SEND. After bhyve sends the signature, it switches to REQ. From now on it's impossible to switch back to IDENT_SEND to read the signature. For that reason, only a single driver can read the signature. A guest can't use two drivers to identify that fwctl is present. It gets even worse when using OVMF. OVMF uses a library to access fwctl. Therefore, every single OVMF driver would try to read the signature. Currently, only a single OVMF driver accesses the fwctl. So, there's no issue with it yet. However, no OS driver would have a chance to detect fwctl when using OVMF because its signature was already consumed by OVMF.
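The fragmentation argument in the "enumerate BARs by size" message above can be made concrete with a small sketch. It assumes power-of-two BAR sizes and natural alignment; the function names are invented for this example and are not the bhyve allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t
roundup_pow2(uint64_t addr, uint64_t align)
{
	return ((addr + align - 1) & ~(align - 1));
}

/* qsort(3) comparator: larger sizes sort first. */
static int
cmp_size_desc(const void *a, const void *b)
{
	uint64_t sa = *(const uint64_t *)a;
	uint64_t sb = *(const uint64_t *)b;

	return ((sa < sb) - (sa > sb));
}

/*
 * Place each BAR at the next naturally aligned address, in array order.
 * Returns the first address past the last BAR.
 */
uint64_t
place_bars(uint64_t base, const uint64_t *sizes, size_t n)
{
	uint64_t next = base;

	for (size_t i = 0; i < n; i++)
		next = roundup_pow2(next, sizes[i]) + sizes[i];
	return (next);
}

/* Same as place_bars(), but sorts the sizes (in place) largest-first. */
uint64_t
place_bars_by_size(uint64_t base, uint64_t *sizes, size_t n)
{
	qsort(sizes, n, sizeof(sizes[0]), cmp_size_desc);
	return (place_bars(base, sizes, n));
}
```

For BARs of 4 KB, 16 MB, 4 KB and 16 MB starting at 0xC0000000, placement in the given order ends at 0xC4000000, while largest-first placement ends at 0xC2002000, because no alignment padding is needed between the sorted BARs.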
Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D31981 ---- bhyve: add more slop to 64 bit BARs Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables based on this allocation. If the guest decides to relocate those BARs above 4 GB, it could lead to mismatching ACPI tables. Especially when using OVMF with enabled bus enumeration it could cause issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS may be unable to recover from this situation and disables some PCI devices because their BARs are located outside of the MMIO space reported by ACPI. Avoid this situation by giving the guest more space for relocating BARs. Let's be paranoid. The available space for BARs below 4 GB is 512 MB large. Use a slop of 512 MB. It'll allow the guest to relocate all BARs below 4 GB to an address above 4 GB. We could run into issues when we exceeding the memlimit above 4 GB. However, this space has a size of 32 GB. Even when using many PCI device with large BARs like framebuffer or when using multiple PCI busses, it's very unlikely that we run out of space due to the large slop. Additionally, this situation will occur on startup and not at runtime which is much better. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33118 ---- bhyve: dynamically register FwCtl ports Qemu's FwCfg uses the same ports as Bhyve's FwCtl. Static allocated ports wouldn't allow to switch between Qemu's FwCfg and Bhyve's FwCtl. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33496 ---- bhyve: Map the right BAR in init_msix_table() The PBA and MSI-X table can reside in different BARs. 
Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: Correct unmapping of the MSI-X table BAR The starting address passed to mprotect was wrong, so in the case where the last page containing the table is not the last page of the BAR, the wrong region would be unmapped. Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: add nvlist functions for setting unset nodes If an emulation uses those functions instead of set_config_value_node or set_config_value, it allows the config values to get overwritten. Introducing new functions is much more readable than if else statements in the emulation code. Reviewed by: khng MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33770 ---- bhyve: get mediasize for character devices when resizing virtio-blk Reviewed by: imp, allanjude, jhb Differential Revision: https://reviews.freebsd.org/D33403 ---- bhyve/snapshot: fix pthread_create() error check pthread_create() returns 0 on success or an error number on failure. Reviewed by: khng, markj Differential Revision: https://reviews.freebsd.org/D33930 ---- Append Keyboard Layout specified option for using VNC. Part two: Append bhyve -K option for specified keyboard layout with layout setting files every languages. 
Since the cmd option '-k' was used in the meantime it was changed to '-K' PR: 246121 Submitted by: koinec@yahoo.co.jp Reviewed by: grehan@ Differential Revision: https://reviews.freebsd.org/D29473 MFC after: 4 weeks ---- bhyve: ahci: Fix regression with no ports An AHCI controller may be specified with no connected ports. Avoid dumping core in this case for compatibility with existing VM configs. Reviewed by: khng, jhb Fixes: 621b5090487de Refactor configuration management in bhyve. MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D33969 ---- bhyve/block_if: allow DIOCGMEDIASIZE ioctl This is needed to get mediasize of the device after a resize event. I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which disables capsicum. Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...") Differential Revision: https://reviews.freebsd.org/D34013 ---- pkgbase: bhyve: Tag the kbdlayout file to be in the bhyve package ---- bhyve nvme: Advertise v1.4 support Bump advertised NVMe support from v1.3 to v1.4 Reviewed by: allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33564 ---- bhyve nvme: Fix NVM Format completion status The NVM Format command is unique among the Admin commands in that it needs to finish asynchronously. For this reason, the emulation code invented a synthetic completion status (NVME_NO_STATUS) to indicate that the command was still in progress and the command processing loop should not generate a completion message. The implementation used the value 0xffff for the synthetic value as this set both the Status Code and Status Code Type fields to reserved values. Format initialized the completion status to this value and expected error cases to override it with a status code/type appropriate to the situation. The macros used to set the NVMe status are careful not to modify bit 0 (i.e. 
the phase bit), which with the synthetic completion status, causes the phase bit to get out of sync. When running tests in a guest with illegal NVM Format commands, Admin commands would eventually hang because it appeared there were no completions due to the incorrect phase bit value. Fix is to only set NVME_NO_STATUS if the blockif delete command succeeds. While in the neighborhood, add a missing break statement when NVM Format is not supported. Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33565 ---- bhyve nvme: Fix Namespace Specific Set Features Return an error if the feature specified in Set Features is Namespace specific but the Namespace ID uses the Global Namespace tag. Fixes UNH Test 1.2.7 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33566 ---- bhyve nvme: Implement Log Page Offset Modify the Get Log Page command to parse the Log Page Offset fields to support more recent versions of the NVMe specification. Fixes various tests for UNH Test 1.3.* Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33568 ---- bhyve nvme: Add missing Admin opcodes Don't treat unsupported Admin commands as Invalid Opcode. Instead return the proper Invalid Field in Command. Fixes UNH IOL test 1.17.2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33569 ---- bhyve nvme: Remove redundant AER Limit checks The NVMe emulation checked if the Asynchronous Event Request Limit (a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function is only called from nvme_opc_async_event_req() which also checks for exceeding the AERL. 
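The phase bit problem from the "Fix NVM Format completion status" message above can be reproduced in a few lines. This is a sketch with hypothetical macro and function names; it assumes the 16-bit completion status layout of phase tag in bit 0, Status Code in bits 8:1 and Status Code Type in bits 11:9, and a setter that, as the message says, does not touch bit 0.

```c
#include <stdint.h>

/* Hypothetical names for the fields of the 16-bit completion status word. */
#define ST_SC_SHIFT	1
#define ST_SC_MASK	0xffu
#define ST_SCT_SHIFT	9
#define ST_SCT_MASK	0x7u
#define ST_NO_STATUS	0xffffu	/* synthetic "command still in progress" value */

/*
 * Set the Status Code Type and Status Code while leaving every other
 * bit, including the phase bit (bit 0), untouched.
 */
void
status_set(uint16_t *status, uint16_t sct, uint16_t sc)
{
	uint16_t field_mask = (uint16_t)((ST_SCT_MASK << ST_SCT_SHIFT) |
	    (ST_SC_MASK << ST_SC_SHIFT));

	*status &= (uint16_t)~field_mask;
	*status |= (uint16_t)(((sct & ST_SCT_MASK) << ST_SCT_SHIFT) |
	    ((sc & ST_SC_MASK) << ST_SC_SHIFT));
}
```

Starting from ST_NO_STATUS and then reporting an error leaves bit 0 set, because the setter never clears it; starting from 0 leaves it clear. That leftover bit is what the message describes as the phase bit getting out of sync.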
Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33570 ---- bhyve nvme: Fix Set Features Be more conservative and only support the Features mandatory for an I/O Controller. Avoids a "hang" in UNH test 1.2.10 associated with Predictable Latency Mode Configuration and Host Behavior Support features. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33571 ---- bhyve nvme: Add Temperature Threshold support This adds the ability for a guest OS to send Set / Get Feature, Temperature Threshold commands. The implementation assumes a constant temperature and will generate an Asynchronous Event Notification if the specified threshold is above/below this value. Although the specification allows 9 temperature values, this implementation only implements the Composite Temperature. While in the neighborhood, move the clear of the CSTS register in the reset function after all other cleanup. This avoids a race with the guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe emulation is actually complete with the reset. Fixes UNH IOL 16.0 Test 1.7, cases 1, 2, and 4. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33572 ---- bhyve nvme: Update v1.4 Identify Controller data Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE). Also, do not advertise secure erase functionality in the Format NVM Attributes field of the Identify Controller data structure as the Controller does not implement secure erase. Fixes UNH ILO Test 1.1, Case 2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33573 ---- bhyve nvme: Add Select support to Get Features Implement basic support for the SEL field of Get Features. This returns information about Namespace Specific features. 
Fixes UNH ILO 16.0 Test 1.2, Case 13 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33574 ---- bhyve nvme: Fix LBA out-of-range calculation The function which checks for a valid LBA range mistakenly named an input value as NLB ("Number of Logical Blocks") instead of "number of blocks". The NVMe specification defines NLB as a zero-based value (i.e. NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed parameter is a 1's-based value. Fix is to rename the variable to avoid future confusion. While in the neighborhood, also check that the starting LBA is less than the size of the backing storage to avoid an integer overflow. Reviewed by: imp, allanjude, jhb Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33575 ---- bhyve nvme: Fix reported VWC value v1.4 and later NVMe Controllers report "Flush all Namespaces" support differently. Fixes UNH IOL 16.0 Test 2.6, Case 3 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33576 ---- bhyve nvme: Fix Set Features, AEN NVMe Controllers which do not support Endurance Groups must return an error when the Endurance Group Event Aggregate Log Change Notices bit is set in Set Features, Asynchronous Event Configuration. Fixes UNH IOL Test 3.12, Case 8 Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33577 ---- bhyve nvme: Fix Identify Namespace, NSID=ffffffff If the NVMe Controller doesn't support Namespace Management, it should return "Invalid Namespace or Format" when the Host request Identify Namespace with the global NSID value. 
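The two points in the "Fix LBA out-of-range calculation" message above (NLB is zero-based, and the starting LBA must be checked first to avoid an overflow) can be sketched as follows. The function names are hypothetical and the check works in blocks for simplicity; it is not the bhyve implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert the zero-based NLB field to a count of blocks (NLB 0 means 1 block). */
uint32_t
nlb_to_blocks(uint16_t nlb)
{
	return ((uint32_t)nlb + 1);
}

/*
 * Return true if [slba, slba + nblocks) does not fit in a namespace of
 * ns_blocks blocks. nblocks is a plain (one-based) block count.
 */
bool
lba_out_of_range(uint64_t slba, uint32_t nblocks, uint64_t ns_blocks)
{
	/* Checking slba first keeps the subtraction below from wrapping. */
	if (slba >= ns_blocks)
		return (true);
	return (nblocks > ns_blocks - slba);
}
```

Comparing against `ns_blocks - slba` rather than computing `slba + nblocks` means a very large starting LBA cannot wrap around and pass the check.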
Fixes UNH IOL 16.0 Test 9.1, Case 6 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33578 ---- bhyve/virtio: use correct device id for virtio-scsi Section 4.1.2.1 of the virtio spec states that the transitional PCI device id for a scsi device is 0x1004. Fix suggested by reporter. PR: 259961 Reported by: me@nanaya.pro Reviewed by: imp, jhb Fixes: f9c005a17f4e ("Add bhyve virtio-scsi storage backend support.") Differential Revision: https://reviews.freebsd.org/D34103 ---- Create VM_MEMATTR_DEVICE on all architectures This is intended to be used with memory mapped IO, e.g. from bus_space_map with no flags, or pmap_mapdev. Use this new memory type in the map request configured by resource_init_map_request, and in pciconf. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29692 ---- Use if ... else when printing memory attributes In vmstat there is a switch statement that converts these attributes to a string. As some values can be duplicate we have to hide these from userspace. Replace this switch statement with an if ... else macro that lets us repeat values without a compiler error. Reviewed by: kib MFC after: 2 weeks Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D29703 ---- Remove an always-true check. This fixes a -Wtype-limits error from GCC 9. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D31936 ---- vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. 
Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559 ---- vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts. Win10 does not use LAPIC timer to often and in most cases, and I see it is disabled by writing 0 to Initial Count Register (for Timer). During resume, restart timer only for enabled LAPIC and enabled timer for that LAPIC. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448 ---- bhyve: add support for MTRR Some guests or driver might depend on MTRR to work properly. E.g. the nvidia gpu driver won't work without MTRR. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33333
Includes all commits up to 2022/02/06 or d21e71efce39. ---- libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values includes wrappers for handling boolean values. Other values that use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false.
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates.
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead.
Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b5090487d Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. 
This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reivewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: https://github.com/freebsd/freebsd-src/pull/463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b5090487d MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parenthesis around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovec's. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries. 
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e3ca895a14 ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada297e897 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong on assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never work as supposed. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for wrong %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In hw.vmm.create sysctl handler the maximum length of vm name is VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow the length of VM_MAX_NAMELEN for vm name. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy().
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface, then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP ranges, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address available for the gdb debugger.
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307 ---- bhyve: Nuke double-semicolons A distinct number of double-semicolons ended up in bhyve. Take a pass at getting rid of many of these harmless typos. MFC after: 3 days ---- bhyve: Fix pci device node key in bhyve_config.5 PCI device node key in the manual page is wrong. It should be pci.bus.slot.function. MFC after: 3 days ---- bhyve: Support setting the disk serial number for VirtIO block devices. Reviewed by: allanjude Obtained from: illumos Differential Revision: https://reviews.freebsd.org/D31983 ---- bhyve: Update the -G description in the SYNPOSIS. It was missing both the 'w' flag and 'bind_address'. ---- bhyve_config.5: Document gdb.address. ---- bhyve: Add an empty case for event types in mevent_kq_fflags(). This fixes a -Wswitch error raised by GCC 9. Differential Revision: https://reviews.freebsd.org/D31938 ---- bhyve: Map the MSI-X table unconditionally for passthrough It is possible for the PBA to reside in the same page as the MSI-X table. And, while devices are not supposed to do this, at least some Intel wifi devices place registers in a page shared with the MSI-X table. To handle the first case we currently map the PBA page using /dev/mem, and the second case is not handled. 
Kill two birds with one stone: map the MSI-X table BAR using the PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that accesses beyond the bounds of the table can be emulated. Regions of the BAR not containing the table are left unmapped. Reviewed by: bz, grehan, jhb MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32359 ---- bhyve.8: Fix markup of the -G flag ---- bhyve: Update usage and synopsis for the -k flag Let's make it clear to users that -k is for configuration files. Also, point to bhyve_config(5) in the paragraph describing the flag. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32467 ---- bhyve: ignore low bits of CFGADR Bhyve could emulate wrong PCI registers. In the best case, the guest reads wrong registers and the device driver would report some errors. In the worst case, the guest writes to wrong PCI registers and could brick hardware when using PCI passthrough. According to Intel's specification, the low bits of CFGADR should be ignored. Some OSes like Linux may rely on it. Otherwise, bhyve could emulate a wrong PCI register. E.g. if Linux would like to read 2 bytes from offset 0x02, the following would happen. linux: outl 0x80000002 at CFGADR; inw at CFGDAT + 2. bhyve: cfgoff = 0x80000002 & 0xFF = 0x02; coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04. Bhyve would emulate the register at offset 0x04, not 0x02. Reviewed By: #bhyve, grehan Differential Revision: https://reviews.freebsd.org/D31819 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix the WITH_BHYVE_SNAPSHOT build Note, this breaks compatibility with snapshots generated by older builds of bhyve(8).
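Returning to the "ignore low bits of CFGADR" entry above, its arithmetic can be reproduced in a few lines. The sketch below contrasts the offset computation quoted in that commit message with a variant that masks off the two low bits of CFGADR before adding the port delta. The function names are hypothetical and the masking expression is an assumption about the shape of the fix, not a copy of the bhyve source; 0xCFC is the standard data port of PCI configuration mechanism #1.

```c
#include <assert.h>
#include <stdint.h>

#define CFGDAT	0x0cfc	/* PCI configuration data port (mechanism #1) */

/* Offset computation as quoted in the commit message (buggy). */
static int
cfg_offset_unmasked(uint32_t cfgadr, uint16_t port)
{
	int cfgoff;

	assert(port >= CFGDAT && port < CFGDAT + 4);
	cfgoff = cfgadr & 0xff;
	return (cfgoff + (port - CFGDAT));
}

/* Same computation with the two low bits of CFGADR ignored. */
static int
cfg_offset_masked(uint32_t cfgadr, uint16_t port)
{
	int cfgoff;

	assert(port >= CFGDAT && port < CFGDAT + 4);
	cfgoff = (cfgadr & 0xff) & ~0x03;
	return (cfgoff + (port - CFGDAT));
}
```

For the guest sequence in the commit message (outl 0x80000002 to CFGADR, then inw at CFGDAT + 2), the unmasked version yields offset 0x04 while the masked version yields the intended 0x02.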
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") Reported by: Greg V <greg@unrelenting.technology> Reviewed by: grehan, bz Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32523 ---- bhyve: Bump the SMBIOS firmware version to 14.0 for 14-CURRENT Bump the firmware version to 14.0 and set the firmware release date to today. Reviewed by: jhb, bz, imp Differential Revision: https://reviews.freebsd.org/D32534 ---- bhyve: use physical lobits for BARs of passthru devices Tell the guest whether a BAR uses prefetched memory or not for passthru devices by using the same lobits as the physical device. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32685 ---- bhyve: do not explicitly map fbuf framebuffer Allocating a BAR will call baraddr which maps the framebuffer. No need to allocate it explicitly on init. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32596 ---- bhyve: move 64 bit BAR location to match OVMF assumptions OVMF will fail if large 64 bit BARs are used. GCD-Map doesn't cover 64 bit addresses of BARs. OVMF assumes that 64 bit addresses of BARs are located on the next 32 GB boundary behind Top of High RAM. This patch moves 64 bit BARs to the next 32 GB boundary behind Top of High RAM to match OVMF assumptions. Differential Revision: https://reviews.freebsd.org/D27970 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: use a fixed 32 bit BAR base address OVMF always uses 0xC0000000 as base address for 32 bit PCI MMIO space. For that reason, we should use that address too. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D31051 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: keep physical and virtual COMMAND reg in sync On startup all virtual BARs are registered. 
Additionally, the encoding bit in the virtual cmd register is set. After that, the passthru emulation overwrites the virtual cmd register with the physical one. This could lead to a mismatch between registered BARs and the encoding bits in the cmd register. Instead of writing the physical to the virtual cmd register, write the virtual to the physical cmd register to solve this issue. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32687 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: emulate reads of MSI-X capabilities for passthru devices Reads of the MSI-X capabilities aren't emulated by passthru devices yet. The guest will read the host MSI-X capabilities which could cause issues. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32686 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix compile We need err.h Fixes: 5cf21e48ccf11 ("bhyve: use a fixed 32 bit BAR base address") Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve blockif: fix blockif_candelete with Capsicum NVMe conformance tests for the Format command failed if the backing-storage for the bhyve device was a file instead of a Zvol. The tests (and the specification) expect a Format to destroy all previously written data. The bhyve NVMe emulation implements this by trimming / deallocating all data from the backing-storage. The blockif_candelete() function indicated the file did not support deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE) even though the kernel supported file hole punching. This occurs on builds with Capsicum enabled because blockif did not allow the fpathconf(2) right. Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call. 
PR: 260081 Reviewed by: allanjude, markj, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33203 ---- bhyve: fix -Wunused-but-set-variable warning Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33306 ---- bhyve: Support a _VARS.fd file for bootrom OVMF creates two separate .fd files, a _CODE.fd file containing the UEFI code, and a _VARS.fd file containing a template of an empty UEFI variable store. OVMF decides to write variables to the memory range just below the boot rom code if it detects a CFI flash device. So here we add just the barest facsimile of CFI command handling to bootrom.c that is needed to placate OVMF. Submitted by: D Scott Phillips <d.scott.phillips@intel.com> Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19976 MFC After: 1 week ---- bhyve: set EV_CLEAR for EVFILT_VNODE mevents When an EVFILT_VNODE filter event is triggered, reset it. This fixes the issue where a virtio-blk resize event would cause the mevent thread to consume 100% of the cpu. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33326 ---- bhyve nvme: Add AEN support to NVMe emulation Add Asynchronous Event Notification infrastructure to the NVMe emulation. Reviewed by: imp, grehan MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32952 ---- bhyve nvme: Inform guests of namespace resize Register a "block resize" callback to be notified of changes to the backing storage for the Namespace. Use this to generate an Asynchronous Event Notification, Namespace Attributes Changed when the guest OS provides an Asynchronous Event Request. 
MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32953 ---- bhyve: Only snapshot initialized VirtIO queues If the virtio device is not fully initialized, then suspend fails with: vi_pci_snapshot_queues: invalid address: vq->vq_desc Failed to snapshot virtio-rnd; ret=14 MFC after: 1 week Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D26268 ---- bhyve: passthru: enable BARs before possibly mmap(2)ing them The first time we start bhyve with a passthru device everything is fine as on boot we do enable BARs. If a driver (unload) inside bhyve disables the BAR(s) as some Linux drivers do, we need to make sure we re-enable them on next bhyve start. If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP) the kernel will give us an EBUSY. While we were re-enabling the BAR(s) in the current code loop cfginit() was writing the changes out too late to the real hardware. Move the call to init_msix_table() after the register on the real hardware was updated. That way the kernel will be happy and the mmap will succeed and bhyve will start. Also simplify the code given the last argument to init_msix_table() is unused we do not need to do checks for each bar. [1] MFC after: 3 days PR: 260148 Pointed out by: markj [1] Sponsored by: The FreeBSD Foundation Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33628 ---- bhyve: clean up trailing whitespaces Clean up trailing whitespaces. No functional changes. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33681 ---- bhyve smbios type 3 structure is incorrect If you look at the SMBIOS specification, we'll find something is missing. In particular at offset 0Dh is supposed to be the OEM-defined field. This should go between security and height. It is not legal to actually skip this and will lead to other folks not properly interpreting later parts of the table. 
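The Type 3 layout issue described above can be illustrated with a small sketch. This is not the bhyve structure: the struct and field names are hypothetical, and the fields before the security status are filled in from the SMBIOS specification as an assumption. Only the placement of the OEM-defined field at offset 0Dh, between security and height, comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the leading part of an SMBIOS Type 3
 * (System Enclosure) structure. Names are illustrative only.
 */
struct smbios_type3_sketch {
	uint8_t		type;		/* 00h: structure type (3) */
	uint8_t		length;		/* 01h: length of formatted area */
	uint16_t	handle;		/* 02h: structure handle */
	uint8_t		manufacturer;	/* 04h: string index */
	uint8_t		chassis_type;	/* 05h */
	uint8_t		version;	/* 06h: string index */
	uint8_t		serial;		/* 07h: string index */
	uint8_t		asset_tag;	/* 08h: string index */
	uint8_t		bootup_state;	/* 09h */
	uint8_t		psu_state;	/* 0Ah */
	uint8_t		thermal_state;	/* 0Bh */
	uint8_t		security;	/* 0Ch */
	uint32_t	oem_defined;	/* 0Dh: the field that was missing */
	uint8_t		height;		/* 11h */
} __attribute__((packed));

/* Compile-time checks that the packed layout matches the offsets above. */
_Static_assert(offsetof(struct smbios_type3_sketch, oem_defined) == 0x0D,
    "OEM-defined field must sit at offset 0Dh");
_Static_assert(offsetof(struct smbios_type3_sketch, height) == 0x11,
    "height must follow the 4-byte OEM-defined field");
```

Without the 4-byte OEM-defined member, height and every later field would land four bytes too early, which is why consumers would misread the rest of the table.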
https://www.illumos.org/issues/14312 Reviewed by: jhb Submitted by: Robert Mustacchi <rm@fingolfin.org> Obtained from: illumos MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33682 ---- bhyve: only init MSI-X table if passthru device supports it Some passthru devices only support MSI instead of MSI-X. For those devices the initialization of MSI-X table will fail. Re-add the check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9. MFC after: 3 days X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9 PR: 260148 Reviewed by: manu, bz Differential Revision: https://reviews.freebsd.org/D33728 ---- bhyve: enumerate BARs by size E.g. Framebuffers can require large space and BARs need to be aligned by their size. If BARs aren't allocated by size, it'll cause much fragmentation of the MMIO space. Reduce fragmentation by ordering the BAR allocation on their size to reduce the risk of OUT_OF_MMIO_SPACE issues. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D28278 ---- bhyve: allow reading of fwctl signature multiple times At the moment, you only have one single chance to read the fwctl signature. At boot bhyve is in the state IDENT_WAIT. It's then possible to switch to IDENT_SEND. After bhyve sends the signature, it switches to REQ. From now on it's impossible to switch back to IDENT_SEND to read the signature. For that reason, only a single driver can read the signature. A guest can't use two drivers to identify that fwctl is present. It gets even worse when using OVMF. OVMF uses a library to access fwctl. Therefore, every single OVMF driver would try to read the signature. Currently, only a single OVMF driver accesses the fwctl. So, there's no issue with it yet. However, no OS driver would have a chance to detect fwctl when using OVMF because its signature was already consumed by OVMF. 
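The state handling described above can be sketched as a tiny state machine. This is an illustration under stated assumptions, not the bhyve code: the function names, the struct, and the placeholder signature bytes are all hypothetical, and only the state names IDENT_WAIT, IDENT_SEND, and REQ come from the commit message. The sketch shows one way to permit repeated reads: selecting the signature always moves to IDENT_SEND, regardless of the current state.

```c
#include <assert.h>
#include <stddef.h>

/* State names taken from the commit message above. */
enum fwctl_sketch_state { IDENT_WAIT, IDENT_SEND, REQ };

struct fwctl_sketch {
	enum fwctl_sketch_state state;
	size_t idx;		/* next signature byte to return */
};

/* Placeholder signature; not claimed to match the real fwctl signature. */
static const char fwctl_sketch_sig[] = { 'S', 'I', 'G', '0' };

/*
 * Hypothetical "select signature" request from the guest. Allowing the
 * transition from any state (not only IDENT_WAIT) is what lets a second
 * driver read the signature again.
 */
void
fwctl_sketch_select_ident(struct fwctl_sketch *fw)
{
	fw->state = IDENT_SEND;
	fw->idx = 0;
}

/*
 * Return the next signature byte, or -1 if no signature read is in
 * progress. After the last byte the machine switches to REQ.
 */
int
fwctl_sketch_read(struct fwctl_sketch *fw)
{
	int c;

	if (fw->state != IDENT_SEND)
		return (-1);
	c = fwctl_sketch_sig[fw->idx++];
	if (fw->idx == sizeof(fwctl_sketch_sig))
		fw->state = REQ;
	return (c);
}
```

With this shape, a firmware driver and a later OS driver can each call the select step and read the full signature, which is the behavior the commit title asks for.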
Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D31981 ---- bhyve: add more slop to 64 bit BARs Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables based on this allocation. If the guest decides to relocate those BARs above 4 GB, it could lead to mismatching ACPI tables. Especially when using OVMF with enabled bus enumeration it could cause issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS may be unable to recover from this situation and disables some PCI devices because their BARs are located outside of the MMIO space reported by ACPI. Avoid this situation by giving the guest more space for relocating BARs. Let's be paranoid. The available space for BARs below 4 GB is 512 MB large. Use a slop of 512 MB. It'll allow the guest to relocate all BARs below 4 GB to an address above 4 GB. We could run into issues if we exceed the memlimit above 4 GB. However, this space has a size of 32 GB. Even when using many PCI devices with large BARs like framebuffer or when using multiple PCI busses, it's very unlikely that we run out of space due to the large slop. Additionally, this situation will occur on startup and not at runtime which is much better. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33118 ---- bhyve: dynamically register FwCtl ports Qemu's FwCfg uses the same ports as Bhyve's FwCtl. Statically allocated ports wouldn't allow switching between Qemu's FwCfg and Bhyve's FwCtl. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33496 ---- bhyve: Map the right BAR in init_msix_table() The PBA and MSI-X table can reside in different BARs. 
Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: Correct unmapping of the MSI-X table BAR The starting address passed to mprotect was wrong, so in the case where the last page containing the table is not the last page of the BAR, the wrong region would be unmapped. Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: add nvlist functions for setting unset nodes If an emulation uses those functions instead of set_config_value_node or set_config_value, it allows the config values to get overwritten. Introducing new functions is much more readable than if else statements in the emulation code. Reviewed by: khng MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33770 ---- bhyve: get mediasize for character devices when resizing virtio-blk Reviewed by: imp, allanjude, jhb Differential Revision: https://reviews.freebsd.org/D33403 ---- bhyve/snapshot: fix pthread_create() error check pthread_create() returns 0 on success or an error number on failure. Reviewed by: khng, markj Differential Revision: https://reviews.freebsd.org/D33930 ---- Append Keyboard Layout specified option for using VNC. Part two: Append bhyve -K option for specified keyboard layout with layout setting files every languages. 
Since the cmd option '-k' was used in the meantime it was changed to '-K' PR: 246121 Submitted by: koinec@yahoo.co.jp Reviewed by: grehan@ Differential Revision: https://reviews.freebsd.org/D29473 MFC after: 4 weeks ---- bhyve: ahci: Fix regression with no ports An AHCI controller may be specified with no connected ports. Avoid dumping core in this case for compatibility with existing VM configs. Reviewed by: khng, jhb Fixes: 621b5090487de Refactor configuration management in bhyve. MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D33969 ---- bhyve/block_if: allow DIOCGMEDIASIZE ioctl This is needed to get mediasize of the device after a resize event. I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which disables capsicum. Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...") Differential Revision: https://reviews.freebsd.org/D34013 ---- pkgbase: bhyve: Tag the kbdlayout file to be in the bhyve package ---- bhyve nvme: Advertise v1.4 support Bump advertised NVMe support from v1.3 to v1.4 Reviewed by: allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33564 ---- bhyve nvme: Fix NVM Format completion status The NVM Format command is unique among the Admin commands in that it needs to finish asynchronously. For this reason, the emulation code invented a synthetic completion status (NVME_NO_STATUS) to indicate that the command was still in progress and the command processing loop should not generate a completion message. The implementation used the value 0xffff for the synthetic value as this set both the Status Code and Status Code Type fields to reserved values. Format initialized the completion status to this value and expected error cases to override it with a status code/type appropriate to the situation. The macros used to set the NVMe status are careful not to modify bit 0 (i.e. 
the phase bit), which with the synthetic completion status, causes the phase bit to get out of sync. When running tests in a guest with illegal NVM Format commands, Admin commands would eventually hang because it appeared there were no completions due to the incorrect phase bit value. Fix is to only set NVME_NO_STATUS if the blockif delete command succeeds. While in the neighborhood, add a missing break statement when NVM Format is not supported. Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33565 ---- bhyve nvme: Fix Namespace Specific Set Features Return an error if the feature specified in Set Features is Namespace specific but the Namespace ID uses the Global Namespace tag. Fixes UNH Test 1.2.7 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33566 ---- bhyve nvme: Implement Log Page Offset Modify the Get Log Page command to parse the Log Page Offset fields to support more recent versions of the NVMe specification. Fixes various tests for UNH Test 1.3.* Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33568 ---- bhyve nvme: Add missing Admin opcodes Don't treat unsupported Admin commands as Invalid Opcode. Instead return the proper Invalid Field in Command. Fixes UNH IOL test 1.17.2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33569 ---- bhyve nvme: Remove redundant AER Limit checks The NVMe emulation checked if the Asynchronous Event Request Limit (a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function is only called from nvme_opc_async_event_req() which also checks for exceeding the AERL. 
Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33570 ---- bhyve nvme: Fix Set Features Be more conservative and only support the Features mandatory for an I/O Controller. Avoids a "hang" in UNH test 1.2.10 associated with Predictable Latency Mode Configuration and Host Behavior Support features. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33571 ---- bhyve nvme: Add Temperature Threshold support This adds the ability for a guest OS to send Set / Get Feature, Temperature Threshold commands. The implementation assumes a constant temperature and will generate an Asynchronous Event Notification if the specified threshold is above/below this value. Although the specification allows 9 temperature values, this implementation only implements the Composite Temperature. While in the neighborhood, move the clear of the CSTS register in the reset function after all other cleanup. This avoids a race with the guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe emulation is actually complete with the reset. Fixes UNH IOL 16.0 Test 1.7, cases 1, 2, and 4. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33572 ---- bhyve nvme: Update v1.4 Identify Controller data Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE). Also, do not advertise secure erase functionality in the Format NVM Attributes field of the Identify Controller data structure as the Controller does not implement secure erase. Fixes UNH IOL Test 1.1, Case 2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33573 ---- bhyve nvme: Add Select support to Get Features Implement basic support for the SEL field of Get Features. This returns information about Namespace Specific features. 
Fixes UNH IOL 16.0 Test 1.2, Case 13 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33574 ---- bhyve nvme: Fix LBA out-of-range calculation The function which checks for a valid LBA range mistakenly named an input value as NLB ("Number of Logical Blocks") instead of "number of blocks". The NVMe specification defines NLB as a zero-based value (i.e. NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed parameter is a 1's-based value. Fix is to rename the variable to avoid future confusion. While in the neighborhood, also check that the starting LBA is less than the size of the backing storage to avoid an integer overflow. Reviewed by: imp, allanjude, jhb Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33575 ---- bhyve nvme: Fix reported VWC value v1.4 and later NVMe Controllers report "Flush all Namespaces" support differently. Fixes UNH IOL 16.0 Test 2.6, Case 3 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33576 ---- bhyve nvme: Fix Set Features, AEN NVMe Controllers which do not support Endurance Groups must return an error when the Endurance Group Event Aggregate Log Change Notices bit is set in Set Features, Asynchronous Event Configuration. Fixes UNH IOL Test 3.12, Case 8 Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33577 ---- bhyve nvme: Fix Identify Namespace, NSID=ffffffff If the NVMe Controller doesn't support Namespace Management, it should return "Invalid Namespace or Format" when the Host requests Identify Namespace with the global NSID value. 
Fixes UNH IOL 16.0 Test 9.1, Case 6 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33578 ---- bhyve/virtio: use correct device id for virtio-scsi Section 4.1.2.1 of the virtio spec states that the transitional PCI device id for a scsi device is 0x1004. Fix suggested by reporter. PR: 259961 Reported by: me@nanaya.pro Reviewed by: imp, jhb Fixes: f9c005a17f4e ("Add bhyve virtio-scsi storage backend support.") Differential Revision: https://reviews.freebsd.org/D34103 ---- Create VM_MEMATTR_DEVICE on all architectures This is intended to be used with memory mapped IO, e.g. from bus_space_map with no flags, or pmap_mapdev. Use this new memory type in the map request configured by resource_init_map_request, and in pciconf. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29692 ---- Use if ... else when printing memory attributes In vmstat there is a switch statement that converts these attributes to a string. As some values can be duplicate we have to hide these from userspace. Replace this switch statement with an if ... else macro that lets us repeat values without a compiler error. Reviewed by: kib MFC after: 2 weeks Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D29703 ---- Remove an always-true check. This fixes a -Wtype-limits error from GCC 9. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D31936 ---- vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. 
Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559 ---- vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/resume of Win10 leaves CPU0 busy handling interrupts. Win10 does not use the LAPIC timer too often, and in most cases I see it disabled by writing 0 to the Initial Count Register (for Timer). During resume, restart the timer only for an enabled LAPIC and an enabled timer for that LAPIC. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448 ---- bhyve: add support for MTRR Some guests or drivers might depend on MTRR to work properly. E.g. the nvidia gpu driver won't work without MTRR. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33333
Includes all commits up to 2022/02/06 or d21e71efce39. ---- libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values does include wrappers for handling boolean values. Other values that use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false. 
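The flat 'key=value' file format described above can be sketched with a small line parser. This is a minimal illustration, not bhyve's parser: the function name parse_config_line() and its return convention are hypothetical, treating empty lines as skippable is an assumption, and '%(name)' expansion is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split one configuration line into key and value.
 * Returns 1 if a pair was parsed, 0 for a comment or empty line
 * (skipping empty lines is an assumption), and -1 for a malformed line
 * or one that does not fit the caller's buffers.
 */
int
parse_config_line(const char *line, char *key, size_t keylen,
    char *value, size_t vallen)
{
	const char *eq;
	size_t klen, vlen;

	/* Lines starting with '#' are comments. */
	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return (0);

	/* The key is everything before the first '='. */
	eq = strchr(line, '=');
	if (eq == NULL || eq == line)
		return (-1);
	klen = (size_t)(eq - line);

	/* The value runs to the end of the line, without the newline. */
	vlen = strcspn(eq + 1, "\n");
	if (klen >= keylen || vlen >= vallen)
		return (-1);

	memcpy(key, line, klen);
	key[klen] = '\0';
	memcpy(value, eq + 1, vlen);
	value[vlen] = '\0';
	return (1);
}
```

Feeding it the line "x86.strictmsr=false" yields the key "x86.strictmsr" and the string value "false", which matches the note above that values are always stored as strings and parsed by consumers.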
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. 
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead. 
Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b5090487d Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. 
This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
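Illustration for the "bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL" entry above: a minimal sketch of the value that entry describes, with the Lock bit set and both VMX enable bits clear. The MSR number and bit positions are the architectural ones; the macro names and the `rdmsr_sketch()` helper are hypothetical and are not taken from the bhyve sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR number and bits; the macro names are illustrative. */
#define MSR_IA32_FEATURE_CONTROL	0x3a
#define FC_LOCK				(1ULL << 0)	/* register is locked */
#define FC_VMX_INSIDE_SMX		(1ULL << 1)	/* VMX enabled inside SMX */
#define FC_VMX_OUTSIDE_SMX		(1ULL << 2)	/* VMX enabled outside SMX */

/*
 * Hypothetical helper: returns 0 and fills *val when the MSR is emulated,
 * or -1 when the caller would have to inject #GP into the guest.
 */
static inline int
rdmsr_sketch(uint32_t msr, uint64_t *val)
{
	assert(val != NULL);
	switch (msr) {
	case MSR_IA32_FEATURE_CONTROL:
		/* Locked, with both VMX enable bits clear: VMX is unavailable. */
		*val = FC_LOCK;
		return (0);
	default:
		return (-1);
	}
}
```

Because the register reads back as locked, a guest following the usual protocol would not try to turn the VMX bits on afterwards.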
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: https://github.com/freebsd/freebsd-src/pull/463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b5090487d MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries.
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e3ca895a14 ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
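Illustration for the "bhyve: Fix NVMe iovec construction for large IOs" entry above: a minimal sketch of the last-entry rule as that message states it. It models only the list walk described there (fixed-size lists, page-sized entries) and ignores details such as the pointers carried in the command itself and unaligned offsets. Both function names are hypothetical and do not come from the NVMe emulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * The check described in the fix: the final slot of a list is a pointer to
 * the next list only if more than one page of data is still outstanding.
 */
static inline bool
last_entry_is_list_pointer(size_t bytes_remaining, size_t page_size)
{
	return (bytes_remaining > page_size);
}

/* Count how many lists a transfer of 'bytes' walks through under that rule. */
static inline size_t
prp_lists_needed(size_t bytes, size_t page_size, size_t entries_per_list)
{
	size_t lists = 0;

	assert(page_size > 0 && entries_per_list >= 2);
	while (bytes > 0) {
		lists++;
		for (size_t i = 0; i < entries_per_list && bytes > 0; i++) {
			bool last = (i == entries_per_list - 1);

			if (last && last_entry_is_list_pointer(bytes, page_size))
				break;	/* slot holds the next list's address */
			/* Otherwise the slot describes up to one page of data. */
			bytes -= (bytes < page_size) ? bytes : page_size;
		}
	}
	return (lists);
}
```

With 4KiB pages and 512-entry lists, a 2MiB transfer fits in one list because the 512th slot carries data, while one extra page forces the 512th slot to become a chain pointer and a second list to be used.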
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
The IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay at the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada297e897 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong in assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never worked as intended. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit.
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for wrong %b formatting string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In hw.vmm.create sysctl handler the maximum length of vm name is VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow the length of VM_MAX_NAMELEN for vm name. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy().
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address available for the gdb debugger.
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307 ---- bhyve: Nuke double-semicolons A distinct number of double-semicolons ended up in bhyve. Take a pass at getting rid of many of these harmless typos. MFC after: 3 days ---- bhyve: Fix pci device node key in bhyve_config.5 PCI device node key in the manual page is wrong. It should be pci.bus.slot.function. MFC after: 3 days ---- bhyve: Support setting the disk serial number for VirtIO block devices. Reviewed by: allanjude Obtained from: illumos Differential Revision: https://reviews.freebsd.org/D31983 ---- bhyve: Update the -G description in the SYNOPSIS. It was missing both the 'w' flag and 'bind_address'. ---- bhyve_config.5: Document gdb.address. ---- bhyve: Add an empty case for event types in mevent_kq_fflags(). This fixes a -Wswitch error raised by GCC 9. Differential Revision: https://reviews.freebsd.org/D31938 ---- bhyve: Map the MSI-X table unconditionally for passthrough It is possible for the PBA to reside in the same page as the MSI-X table. And, while devices are not supposed to do this, at least some Intel wifi devices place registers in a page shared with the MSI-X table. To handle the first case we currently map the PBA page using /dev/mem, and the second case is not handled.
Kill two birds with one stone: map the MSI-X table BAR using the PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that accesses beyond the bounds of the table can be emulated. Regions of the BAR not containing the table are left unmapped. Reviewed by: bz, grehan, jhb MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32359 ---- bhyve.8: Fix markup of the -G flag ---- bhyve: Update usage and synopsis for the -k flag Let's make it clear to users that -k is for configuration files. Also, point to bhyve_config(5) in the paragraph describing the flag. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32467 ---- bhyve: ignore low bits of CFGADR Bhyve could emulate wrong PCI registers. In the best case, the guest reads wrong registers and the device driver would report some errors. In the worst case, the guest writes to wrong PCI registers and could brick hardware when using PCI passthrough. According to Intel's specification, low bits of CFGADR should be ignored. Some OSes like Linux may rely on it. Otherwise, bhyve could emulate a wrong PCI register. E.g. if Linux would like to read 2 bytes from offset 0x02, the following would happen:
linux: outl 0x80000002 at CFGADR
       inw at CFGDAT + 2
bhyve: cfgoff = 0x80000002 & 0xFF = 0x02
       coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04
Bhyve would emulate the register at offset 0x04 not 0x02. Reviewed By: #bhyve, grehan Differential Revision: https://reviews.freebsd.org/D31819 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix the WITH_BHYVE_SNAPSHOT build Note, this breaks compatibility with snapshots generated by older builds of bhyve(8).
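Illustration for the "bhyve: ignore low bits of CFGADR" entry above: a minimal sketch that reproduces the arithmetic from that message. It assumes the conventional 0xCFC data port for CFGDAT and assumes that "low bits" means the two least significant bits of the register offset. Both function names are hypothetical and are not taken from the bhyve PCI emulation.

```c
#include <assert.h>
#include <stdint.h>

#define CONF1_DATA_PORT	0x0cfc	/* CFGDAT; data window is 4 bytes wide */

/* Behaviour described as the bug: the low bits of CFGADR leak into the offset. */
static inline uint32_t
cfg_reg_old(uint32_t cfgadr, uint16_t port)
{
	assert(port >= CONF1_DATA_PORT && port < CONF1_DATA_PORT + 4);
	return ((cfgadr & 0xff) + (uint32_t)(port - CONF1_DATA_PORT));
}

/* Sketch of the fix: drop the low two bits before adding the port offset. */
static inline uint32_t
cfg_reg_new(uint32_t cfgadr, uint16_t port)
{
	assert(port >= CONF1_DATA_PORT && port < CONF1_DATA_PORT + 4);
	return ((cfgadr & 0xfc) + (uint32_t)(port - CONF1_DATA_PORT));
}
```

For the guest sequence in the message (0x80000002 written to CFGADR, then a 2-byte read at CFGDAT + 2), the first function yields 0x04 and the second yields the intended 0x02.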
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") Reported by: Greg V <greg@unrelenting.technology> Reviewed by: grehan, bz Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32523 ---- bhyve: Bump the SMBIOS firmware version to 14.0 for 14-CURRENT Bump the firmware version to 14.0 and set the firmware release date to today. Reviewed by: jhb, bz, imp Differential Revision: https://reviews.freebsd.org/D32534 ---- bhyve: use physical lobits for BARs of passthru devices Tell the guest whether a BAR uses prefetched memory or not for passthru devices by using the same lobits as the physical device. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32685 ---- bhyve: do not explicitly map fbuf framebuffer Allocating a BAR will call baraddr which maps the framebuffer. No need to allocate it explicitly on init. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32596 ---- bhyve: move 64 bit BAR location to match OVMF assumptions OVMF will fail if large 64 bit BARs are used. GCD-Map doesn't cover 64 bit addresses of BARs. OVMF assumes that 64 bit addresses of BARs are located on the next 32 GB boundary behind Top of High RAM. This patch moves 64 bit BARs to the next 32 GB boundary behind Top of High RAM to match OVMF assumptions. Differential Revision: https://reviews.freebsd.org/D27970 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: use a fixed 32 bit BAR base address OVMF always uses 0xC0000000 as base address for 32 bit PCI MMIO space. For that reason, we should use that address too. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D31051 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: keep physical and virtual COMMAND reg in sync On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set. After that, the passthru emulation overwrites the virtual cmd register with the physical one. This could lead to a mismatch between registered BARs and the encoding bits in the cmd register. Instead of writing the physical to the virtual cmd register, write the virtual to the physical cmd register to solve this issue. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32687 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: emulate reads of MSI-X capabilities for passthru devices Reads of the MSI-X capabilities aren't emulated by passthru devices yet. The guest will read the host MSI-X capabilities which could cause issues. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32686 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix compile We need err.h Fixes: 5cf21e48ccf11 ("bhyve: use a fixed 32 bit BAR base address") Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve blockif: fix blockif_candelete with Capsicum NVMe conformance tests for the Format command failed if the backing-storage for the bhyve device was a file instead of a Zvol. The tests (and the specification) expect a Format to destroy all previously written data. The bhyve NVMe emulation implements this by trimming / deallocating all data from the backing-storage. The blockif_candelete() function indicated the file did not support deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE) even though the kernel supported file hole punching. This occurs on builds with Capsicum enabled because blockif did not allow the fpathconf(2) right. Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
PR: 260081 Reviewed by: allanjude, markj, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33203 ---- bhyve: fix -Wunused-but-set-variable warning Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33306 ---- bhyve: Support a _VARS.fd file for bootrom OVMF creates two separate .fd files, a _CODE.fd file containing the UEFI code, and a _VARS.fd file containing a template of an empty UEFI variable store. OVMF decides to write variables to the memory range just below the boot rom code if it detects a CFI flash device. So here we add just the barest facsimile of CFI command handling to bootrom.c that is needed to placate OVMF. Submitted by: D Scott Phillips <d.scott.phillips@intel.com> Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19976 MFC After: 1 week ---- bhyve: set EV_CLEAR for EVFILT_VNODE mevents When an EVFILT_VNODE filter event is triggered, reset it. This fixes the issue where a virtio-blk resize event would cause the mevent thread to consume 100% of the cpu. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33326 ---- bhyve nvme: Add AEN support to NVMe emulation Add Asynchronous Event Notification infrastructure to the NVMe emulation. Reviewed by: imp, grehan MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32952 ---- bhyve nvme: Inform guests of namespace resize Register a "block resize" callback to be notified of changes to the backing storage for the Namespace. Use this to generate an Asynchronous Event Notification, Namespace Attributes Changed when the guest OS provides an Asynchronous Event Request. 
MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32953 ---- bhyve: Only snapshot initialized VirtIO queues If the virtio device is not fully initialized, then suspend fails with: vi_pci_snapshot_queues: invalid address: vq->vq_desc Failed to snapshot virtio-rnd; ret=14 MFC after: 1 week Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D26268 ---- bhyve: passthru: enable BARs before possibly mmap(2)ing them The first time we start bhyve with a passthru device everything is fine as on boot we do enable BARs. If a driver (unload) inside bhyve disables the BAR(s) as some Linux drivers do, we need to make sure we re-enable them on next bhyve start. If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP) the kernel will give us an EBUSY. While we were re-enabling the BAR(s) in the current code loop cfginit() was writing the changes out too late to the real hardware. Move the call to init_msix_table() after the register on the real hardware was updated. That way the kernel will be happy and the mmap will succeed and bhyve will start. Also simplify the code given the last argument to init_msix_table() is unused we do not need to do checks for each bar. [1] MFC after: 3 days PR: 260148 Pointed out by: markj [1] Sponsored by: The FreeBSD Foundation Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33628 ---- bhyve: clean up trailing whitespaces Clean up trailing whitespaces. No functional changes. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33681 ---- bhyve smbios type 3 structure is incorrect If you look at the SMBIOS specification, we'll find something is missing. In particular at offset 0Dh is supposed to be the OEM-defined field. This should go between security and height. It is not legal to actually skip this and will lead to other folks not properly interpreting later parts of the table. 
https://www.illumos.org/issues/14312 Reviewed by: jhb Submitted by: Robert Mustacchi <rm@fingolfin.org> Obtained from: illumos MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33682 ---- bhyve: only init MSI-X table if passthru device supports it Some passthru devices only support MSI instead of MSI-X. For those devices the initialization of MSI-X table will fail. Re-add the check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9. MFC after: 3 days X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9 PR: 260148 Reviewed by: manu, bz Differential Revision: https://reviews.freebsd.org/D33728 ---- bhyve: enumerate BARs by size E.g. framebuffers can require large space and BARs need to be aligned by their size. If BARs aren't allocated by size, it'll cause much fragmentation of the MMIO space. Reduce fragmentation by ordering the BAR allocation on their size to reduce the risk of OUT_OF_MMIO_SPACE issues. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D28278 ---- bhyve: allow reading of fwctl signature multiple times At the moment, you only have one single chance to read the fwctl signature. At boot bhyve is in the state IDENT_WAIT. It's then possible to switch to IDENT_SEND. After bhyve sends the signature, it switches to REQ. From now on it's impossible to switch back to IDENT_SEND to read the signature. For that reason, only a single driver can read the signature. A guest can't use two drivers to identify that fwctl is present. It gets even worse when using OVMF. OVMF uses a library to access fwctl. Therefore, every single OVMF driver would try to read the signature. Currently, only a single OVMF driver accesses the fwctl. So, there's no issue with it yet. However, no OS driver would have a chance to detect fwctl when using OVMF because its signature was already consumed by OVMF.
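Illustration for the "bhyve: allow reading of fwctl signature multiple times" entry above: a minimal model of the state handling that message describes. Only the state names IDENT_WAIT, IDENT_SEND and REQ come from the message. The struct, both function names, the placeholder signature bytes and the idea that a single "select" action restarts identification are assumptions made for this sketch, not a copy of bhyve's fwctl code.

```c
#include <assert.h>
#include <stddef.h>

enum fwctl_state { IDENT_WAIT, IDENT_SEND, REQ };

/* Hypothetical model of the device's identification state. */
struct fwctl_model {
	enum fwctl_state state;
	size_t idx;		/* next signature byte to hand out */
};

/* Placeholder signature bytes; the real device uses its own signature. */
static const unsigned char model_sig[4] = { 'S', 'I', 'G', '0' };

/*
 * Guest asks for the signature.  In this model the request is honoured from
 * any state, so a second driver can identify the device after the first.
 */
static inline void
fwctl_model_select(struct fwctl_model *m)
{
	assert(m != NULL);
	m->state = IDENT_SEND;
	m->idx = 0;
}

/* Guest reads one byte; returns -1 when no signature is being sent. */
static inline int
fwctl_model_read(struct fwctl_model *m)
{
	int c;

	assert(m != NULL);
	if (m->state != IDENT_SEND)
		return (-1);
	c = model_sig[m->idx++];
	if (m->idx == sizeof(model_sig))
		m->state = REQ;	/* signature fully sent */
	return (c);
}
```

The earlier behaviour described in the message would correspond to `fwctl_model_select()` acting only while the state is IDENT_WAIT, which is why a second reader could never see the signature.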
Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D31981 ---- bhyve: add more slop to 64 bit BARs Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables based on this allocation. If the guest decides to relocate those BARs above 4 GB, it could lead to mismatching ACPI tables. Especially when using OVMF with enabled bus enumeration it could cause issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS may be unable to recover from this situation and disables some PCI devices because their BARs are located outside of the MMIO space reported by ACPI. Avoid this situation by giving the guest more space for relocating BARs. Let's be paranoid. The available space for BARs below 4 GB is 512 MB large. Use a slop of 512 MB. It'll allow the guest to relocate all BARs below 4 GB to an address above 4 GB. We could run into issues if we exceed the memlimit above 4 GB. However, this space has a size of 32 GB. Even when using many PCI devices with large BARs like framebuffers or when using multiple PCI busses, it's very unlikely that we run out of space due to the large slop. Additionally, this situation will occur on startup and not at runtime which is much better. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33118 ---- bhyve: dynamically register FwCtl ports Qemu's FwCfg uses the same ports as Bhyve's FwCtl. Statically allocated ports wouldn't allow switching between Qemu's FwCfg and Bhyve's FwCtl. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33496 ---- bhyve: Map the right BAR in init_msix_table() The PBA and MSI-X table can reside in different BARs.
Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: Correct unmapping of the MSI-X table BAR The starting address passed to mprotect was wrong, so in the case where the last page containing the table is not the last page of the BAR, the wrong region would be unmapped. Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: add nvlist functions for setting unset nodes If an emulation uses those functions instead of set_config_value_node or set_config_value, it allows the config values to get overwritten. Introducing new functions is much more readable than if else statements in the emulation code. Reviewed by: khng MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33770 ---- bhyve: get mediasize for character devices when resizing virtio-blk Reviewed by: imp, allanjude, jhb Differential Revision: https://reviews.freebsd.org/D33403 ---- bhyve/snapshot: fix pthread_create() error check pthread_create() returns 0 on success or an error number on failure. Reviewed by: khng, markj Differential Revision: https://reviews.freebsd.org/D33930 ---- Append Keyboard Layout specified option for using VNC. Part two: Append bhyve -K option for specified keyboard layout with layout setting files every languages. 
Since the cmd option '-k' was used in the meantime it was changed to '-K' PR: 246121 Submitted by: koinec@yahoo.co.jp Reviewed by: grehan@ Differential Revision: https://reviews.freebsd.org/D29473 MFC after: 4 weeks ---- bhyve: ahci: Fix regression with no ports An AHCI controller may be specified with no connected ports. Avoid dumping core in this case for compatibility with existing VM configs. Reviewed by: khng, jhb Fixes: 621b5090487de Refactor configuration management in bhyve. MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D33969 ---- bhyve/block_if: allow DIOCGMEDIASIZE ioctl This is needed to get mediasize of the device after a resize event. I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which disables capsicum. Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...") Differential Revision: https://reviews.freebsd.org/D34013 ---- pkgbase: bhyve: Tag the kbdlayout file to be in the bhyve package ---- bhyve nvme: Advertise v1.4 support Bump advertised NVMe support from v1.3 to v1.4 Reviewed by: allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33564 ---- bhyve nvme: Fix NVM Format completion status The NVM Format command is unique among the Admin commands in that it needs to finish asynchronously. For this reason, the emulation code invented a synthetic completion status (NVME_NO_STATUS) to indicate that the command was still in progress and the command processing loop should not generate a completion message. The implementation used the value 0xffff for the synthetic value as this set both the Status Code and Status Code Type fields to reserved values. Format initialized the completion status to this value and expected error cases to override it with a status code/type appropriate to the situation. The macros used to set the NVMe status are careful not to modify bit 0 (i.e. 
the phase bit), which with the synthetic completion status, causes the phase bit to get out of sync. When running tests in a guest with illegal NVM Format commands, Admin commands would eventually hang because it appeared there were no completions due to the incorrect phase bit value. Fix is to only set NVME_NO_STATUS if the blockif delete command succeeds. While in the neighborhood, add a missing break statement when NVM Format is not supported. Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33565 ---- bhyve nvme: Fix Namespace Specific Set Features Return an error if the feature specified in Set Features is Namespace specific but the Namespace ID uses the Global Namespace tag. Fixes UNH Test 1.2.7 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33566 ---- bhyve nvme: Implement Log Page Offset Modify the Get Log Page command to parse the Log Page Offset fields to support more recent versions of the NVMe specification. Fixes various tests for UNH Test 1.3.* Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33568 ---- bhyve nvme: Add missing Admin opcodes Don't treat unsupported Admin commands as Invalid Opcode. Instead return the proper Invalid Field in Command. Fixes UNH IOL test 1.17.2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33569 ---- bhyve nvme: Remove redundant AER Limit checks The NVMe emulation checked if the Asynchronous Event Request Limit (a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function is only called from nvme_opc_async_event_req() which also checks for exceeding the AERL. 
Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33570 ---- bhyve nvme: Fix Set Features Be more conservative and only support the Features mandatory for an I/O Controller. Avoids a "hang" in UNH test 1.2.10 associated with Predictable Latency Mode Configuration and Host Behavior Support features. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33571 ---- bhyve nvme: Add Temperature Threshold support This adds the ability for a guest OS to send Set / Get Feature, Temperature Threshold commands. The implementation assumes a constant temperature and will generate an Asynchronous Event Notification if the specified threshold is above/below this value. Although the specification allows 9 temperature values, this implementation only implements the Composite Temperature. While in the neighborhood, move the clear of the CSTS register in the reset function after all other cleanup. This avoids a race with the guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe emulation is actually complete with the reset. Fixes UNH IOL 16.0 Test 1.7, cases 1, 2, and 4. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33572 ---- bhyve nvme: Update v1.4 Identify Controller data Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE). Also, do not advertise secure erase functionality in the Format NVM Attributes field of the Identify Controller data structure as the Controller does not implement secure erase. Fixes UNH IOL Test 1.1, Case 2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33573 ---- bhyve nvme: Add Select support to Get Features Implement basic support for the SEL field of Get Features. This returns information about Namespace Specific features.
Fixes UNH IOL 16.0 Test 1.2, Case 13 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33574 ---- bhyve nvme: Fix LBA out-of-range calculation The function which checks for a valid LBA range mistakenly named an input value as NLB ("Number of Logical Blocks") instead of "number of blocks". The NVMe specification defines NLB as a zero-based value (i.e. NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed parameter is a 1's-based value. Fix is to rename the variable to avoid future confusion. While in the neighborhood, also check that the starting LBA is less than the size of the backing storage to avoid an integer overflow. Reviewed by: imp, allanjude, jhb Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33575 ---- bhyve nvme: Fix reported VWC value v1.4 and later NVMe Controllers report "Flush all Namespaces" support differently. Fixes UNH IOL 16.0 Test 2.6, Case 3 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33576 ---- bhyve nvme: Fix Set Features, AEN NVMe Controllers which do not support Endurance Groups must return an error when the Endurance Group Event Aggregate Log Change Notices bit is set in Set Features, Asynchronous Event Configuration. Fixes UNH IOL Test 3.12, Case 8 Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33577 ---- bhyve nvme: Fix Identify Namespace, NSID=ffffffff If the NVMe Controller doesn't support Namespace Management, it should return "Invalid Namespace or Format" when the Host requests Identify Namespace with the global NSID value.
Fixes UNH IOL 16.0 Test 9.1, Case 6 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33578 ---- bhyve/virtio: use correct device id for virtio-scsi Section 4.1.2.1 of the virtio spec states that the transitional PCI device id for a scsi device is 0x1004. Fix suggested by reporter. PR: 259961 Reported by: me@nanaya.pro Reviewed by: imp, jhb Fixes: f9c005a17f4e ("Add bhyve virtio-scsi storage backend support.") Differential Revision: https://reviews.freebsd.org/D34103 ---- Create VM_MEMATTR_DEVICE on all architectures This is intended to be used with memory mapped IO, e.g. from bus_space_map with no flags, or pmap_mapdev. Use this new memory type in the map request configured by resource_init_map_request, and in pciconf. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29692 ---- Use if ... else when printing memory attributes In vmstat there is a switch statement that converts these attributes to a string. As some values can be duplicate we have to hide these from userspace. Replace this switch statement with an if ... else macro that lets us repeat values without a compiler error. Reviewed by: kib MFC after: 2 weeks Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D29703 ---- Remove an always-true check. This fixes a -Wtype-limits error from GCC 9. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D31936 ---- vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. 
Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559 ---- vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/Resume of Win10 leads to CPU0 being busy handling interrupts. Win10 does not use the LAPIC timer too often, and in most cases I see it is disabled by writing 0 to the Initial Count Register (for Timer). During resume, restart the timer only for an enabled LAPIC and an enabled timer for that LAPIC. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448 ---- bhyve: add support for MTRR Some guests or drivers might depend on MTRR to work properly. E.g. the nvidia gpu driver won't work without MTRR. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33333
Includes all commits up to 2022/02/06 or d21e71efce39. ---- libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values does include wrappers for handling boolean values. Other values that use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false.
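The flat 'key=value' file format described above can be sketched with a small line splitter. This is a minimal sketch, not bhyve's parser: the function name is hypothetical, the treatment of blank lines is an assumption, and '%(name)' expansion is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split one configuration-file line into key and value, in place.
 * Returns 1 for a 'key=value' line, 0 for a '#' comment or a blank
 * line (blank-line handling is an assumption), and -1 when the line
 * has no '='. Hypothetical helper for illustration only.
 */
static int
sketch_parse_line(char *line, char **key, char **value)
{
	char *eq;

	assert(line != NULL && key != NULL && value != NULL);
	line[strcspn(line, "\n")] = '\0';	/* drop the newline, if any */
	if (line[0] == '\0' || line[0] == '#')
		return (0);
	eq = strchr(line, '=');			/* split at the first '=' */
	if (eq == NULL)
		return (-1);
	*eq = '\0';
	*key = line;				/* full dotted path */
	*value = eq + 1;			/* always kept as a string */
	return (1);
}
```

Given the line `x86.strictmsr=false`, the helper yields the key `x86.strictmsr` and the string value `false`; turning that string into a boolean is left to the consumer, matching the description that values are always stored as strings.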
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates.
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead.
Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b5090487d Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. 
This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: https://github.com/freebsd/freebsd-src/pull/463 ---- Add more GIC and GICv3 registers These aren't used by either driver; however, they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b5090487d MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries.
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e3ca895a14 ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada297e897 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong on assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never work as supposed. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-op from now on. There are also a minor fix on wrong %b formatting string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In hw.vmm.create sysctl handler the maximum length of vm name is VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow the length of VM_MAX_NAMELEN for vm name. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check the number of ivhd entries fit into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). 
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interface, then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open, automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP ranges, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- bhyve: add option to specify IP address for gdb Allow the user to specify the IP address available to the gdb debugger.
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307 ---- bhyve: Nuke double-semicolons A distinct number of double-semicolons ended up in bhyve. Take a pass at getting rid of many of these harmless typos. MFC after: 3 days ---- bhyve: Fix pci device node key in bhyve_config.5 The PCI device node key in the manual page is wrong. It should be pci.bus.slot.function. MFC after: 3 days ---- bhyve: Support setting the disk serial number for VirtIO block devices. Reviewed by: allanjude Obtained from: illumos Differential Revision: https://reviews.freebsd.org/D31983 ---- bhyve: Update the -G description in the SYNOPSIS. It was missing both the 'w' flag and 'bind_address'. ---- bhyve_config.5: Document gdb.address. ---- bhyve: Add an empty case for event types in mevent_kq_fflags(). This fixes a -Wswitch error raised by GCC 9. Differential Revision: https://reviews.freebsd.org/D31938 ---- bhyve: Map the MSI-X table unconditionally for passthrough It is possible for the PBA to reside in the same page as the MSI-X table. And, while devices are not supposed to do this, at least some Intel wifi devices place registers in a page shared with the MSI-X table. To handle the first case we currently map the PBA page using /dev/mem, and the second case is not handled.
Kill two birds with one stone: map the MSI-X table BAR using the PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that accesses beyond the bounds of the table can be emulated. Regions of the BAR not containing the table are left unmapped. Reviewed by: bz, grehan, jhb MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32359 ---- bhyve.8: Fix markup of the -G flag ---- bhyve: Update usage and synopsis for the -k flag Let's make it clear to users that -k is for configuration files. Also, point to bhyve_config(5) in the paragraph describing the flag. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32467 ---- bhyve: ignore low bits of CFGADR Bhyve could emulate wrong PCI registers. In the best case, the guest reads wrong registers and the device driver would report some errors. In the worst case, the guest writes to wrong PCI registers and could brick hardware when using PCI passthrough. According to Intel's specification, the low bits of CFGADR should be ignored. Some OSes, such as Linux, may rely on this; otherwise, bhyve could emulate the wrong PCI register. For example, if Linux reads 2 bytes from offset 0x02, the following happens. Linux: outl 0x80000002 at CFGADR; inw at CFGDAT + 2. bhyve: cfgoff = 0x80000002 & 0xFF = 0x02; coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04. Bhyve would emulate the register at offset 0x04, not 0x02. Reviewed By: #bhyve, grehan Differential Revision: https://reviews.freebsd.org/D31819 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix the WITH_BHYVE_SNAPSHOT build Note, this breaks compatibility with snapshots generated by older builds of bhyve(8).
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") Reported by: Greg V <greg@unrelenting.technology> Reviewed by: grehan, bz Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32523 ---- bhyve: Bump the SMBIOS firmware version to 14.0 for 14-CURRENT Bump the firmware version to 14.0 and set the firmware release date to today. Reviewed by: jhb, bz, imp Differential Revision: https://reviews.freebsd.org/D32534 ---- bhyve: use physical lobits for BARs of passthru devices Tell the guest whether a BAR uses prefetchable memory or not for passthru devices by using the same lobits as the physical device. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32685 ---- bhyve: do not explicitly map fbuf framebuffer Allocating a BAR will call baraddr which maps the framebuffer. No need to allocate it explicitly on init. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32596 ---- bhyve: move 64 bit BAR location to match OVMF assumptions OVMF will fail if large 64 bit BARs are used. GCD-Map doesn't cover 64 bit addresses of BARs. OVMF assumes that 64 bit addresses of BARs are located on the next 32 GB boundary behind the Top of High RAM. This patch moves 64 bit BARs to the next 32 GB boundary behind the Top of High RAM to match OVMF assumptions. Differential Revision: https://reviews.freebsd.org/D27970 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: use a fixed 32 bit BAR base address OVMF always uses 0xC0000000 as base address for 32 bit PCI MMIO space. For that reason, we should use that address too. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D31051 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: keep physical and virtual COMMAND reg in sync On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set. After that, the passthru emulation overwrites the virtual cmd register with the physical one. This could lead to a mismatch between registered BARs and the encoding bits in the cmd register. Instead of writing the physical to the virtual cmd register, write the virtual to the physical cmd register to solve this issue. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32687 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: emulate reads of MSI-X capabilities for passthru devices Reads of the MSI-X capabilities aren't emulated by passthru devices yet. The guest will read the host MSI-X capabilities, which could cause issues. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32686 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix compile We need err.h Fixes: 5cf21e48ccf11 ("bhyve: use a fixed 32 bit BAR base address") Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve blockif: fix blockif_candelete with Capsicum NVMe conformance tests for the Format command failed if the backing-storage for the bhyve device was a file instead of a Zvol. The tests (and the specification) expect a Format to destroy all previously written data. The bhyve NVMe emulation implements this by trimming / deallocating all data from the backing-storage. The blockif_candelete() function indicated the file did not support deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE) even though the kernel supported file hole punching. This occurs on builds with Capsicum enabled because blockif did not allow the fpathconf(2) right. Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
PR: 260081 Reviewed by: allanjude, markj, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33203 ---- bhyve: fix -Wunused-but-set-variable warning Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33306 ---- bhyve: Support a _VARS.fd file for bootrom OVMF creates two separate .fd files, a _CODE.fd file containing the UEFI code, and a _VARS.fd file containing a template of an empty UEFI variable store. OVMF decides to write variables to the memory range just below the boot rom code if it detects a CFI flash device. So here we add just the barest facsimile of CFI command handling to bootrom.c that is needed to placate OVMF. Submitted by: D Scott Phillips <d.scott.phillips@intel.com> Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19976 MFC After: 1 week ---- bhyve: set EV_CLEAR for EVFILT_VNODE mevents When an EVFILT_VNODE filter event is triggered, reset it. This fixes the issue where a virtio-blk resize event would cause the mevent thread to consume 100% of the cpu. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33326 ---- bhyve nvme: Add AEN support to NVMe emulation Add Asynchronous Event Notification infrastructure to the NVMe emulation. Reviewed by: imp, grehan MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32952 ---- bhyve nvme: Inform guests of namespace resize Register a "block resize" callback to be notified of changes to the backing storage for the Namespace. Use this to generate an Asynchronous Event Notification, Namespace Attributes Changed when the guest OS provides an Asynchronous Event Request. 
MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32953 ---- bhyve: Only snapshot initialized VirtIO queues If the virtio device is not fully initialized, then suspend fails with: vi_pci_snapshot_queues: invalid address: vq->vq_desc Failed to snapshot virtio-rnd; ret=14 MFC after: 1 week Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D26268 ---- bhyve: passthru: enable BARs before possibly mmap(2)ing them The first time we start bhyve with a passthru device everything is fine as on boot we do enable BARs. If a driver (unload) inside bhyve disables the BAR(s) as some Linux drivers do, we need to make sure we re-enable them on next bhyve start. If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP) the kernel will give us an EBUSY. While we were re-enabling the BAR(s) in the current code loop cfginit() was writing the changes out too late to the real hardware. Move the call to init_msix_table() after the register on the real hardware was updated. That way the kernel will be happy and the mmap will succeed and bhyve will start. Also simplify the code given the last argument to init_msix_table() is unused we do not need to do checks for each bar. [1] MFC after: 3 days PR: 260148 Pointed out by: markj [1] Sponsored by: The FreeBSD Foundation Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33628 ---- bhyve: clean up trailing whitespaces Clean up trailing whitespaces. No functional changes. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33681 ---- bhyve smbios type 3 structure is incorrect If you look at the SMBIOS specification, we'll find something is missing. In particular at offset 0Dh is supposed to be the OEM-defined field. This should go between security and height. It is not legal to actually skip this and will lead to other folks not properly interpreting later parts of the table. 
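The layout problem described in the SMBIOS type 3 commit can be checked mechanically. The following is a minimal sketch, not the structure from bhyve's smbios.c: the struct name and field names are hypothetical, the offsets follow the System Enclosure layout in the SMBIOS specification, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fixed-offset prefix of an SMBIOS type 3 (System Enclosure) structure.
 * Names are illustrative only; they are not taken from bhyve.
 */
struct smbios_type3_prefix {
	uint8_t  type;          /* 00h: structure type, 3 for enclosure */
	uint8_t  length;        /* 01h: length of the formatted area */
	uint16_t handle;        /* 02h: structure handle */
	uint8_t  manufacturer;  /* 04h: string index */
	uint8_t  chassis_type;  /* 05h */
	uint8_t  version;       /* 06h: string index */
	uint8_t  serial;        /* 07h: string index */
	uint8_t  asset_tag;     /* 08h: string index */
	uint8_t  bootup_state;  /* 09h */
	uint8_t  psu_state;     /* 0Ah */
	uint8_t  thermal_state; /* 0Bh */
	uint8_t  security;      /* 0Ch */
	uint32_t oem_defined;   /* 0Dh: the field that must not be skipped */
	uint8_t  height;        /* 11h */
	uint8_t  power_cords;   /* 12h */
} __attribute__((packed));

/* Compile-time checks: dropping oem_defined would shift height to 0Dh. */
_Static_assert(offsetof(struct smbios_type3_prefix, security) == 0x0C,
    "security status must sit at offset 0Ch");
_Static_assert(offsetof(struct smbios_type3_prefix, oem_defined) == 0x0D,
    "OEM-defined field must sit at offset 0Dh");
_Static_assert(offsetof(struct smbios_type3_prefix, height) == 0x11,
    "height must follow the 4-byte OEM-defined field");
```

Static assertions of this kind turn a silently misaligned table into a build failure, which is why they are a common guard for firmware-visible structures.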
https://www.illumos.org/issues/14312 Reviewed by: jhb Submitted by: Robert Mustacchi <rm@fingolfin.org> Obtained from: illumos MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33682 ---- bhyve: only init MSI-X table if passthru device supports it Some passthru devices only support MSI instead of MSI-X. For those devices the initialization of the MSI-X table will fail. Re-add the check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9. MFC after: 3 days X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9 PR: 260148 Reviewed by: manu, bz Differential Revision: https://reviews.freebsd.org/D33728 ---- bhyve: enumerate BARs by size Framebuffers, for example, can require a large amount of space, and BARs need to be aligned to their size. If BARs aren't allocated by size, the MMIO space becomes heavily fragmented. Reduce fragmentation by ordering the BAR allocation by size to reduce the risk of OUT_OF_MMIO_SPACE issues. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D28278 ---- bhyve: allow reading of fwctl signature multiple times At the moment, you only have one single chance to read the fwctl signature. At boot bhyve is in the state IDENT_WAIT. It's then possible to switch to IDENT_SEND. After bhyve sends the signature, it switches to REQ. From now on it's impossible to switch back to IDENT_SEND to read the signature. For that reason, only a single driver can read the signature. A guest can't use two drivers to identify that fwctl is present. It gets even worse when using OVMF. OVMF uses a library to access fwctl. Therefore, every single OVMF driver would try to read the signature. Currently, only a single OVMF driver accesses the fwctl. So, there's no issue with it yet. However, no OS driver would have a chance to detect fwctl when using OVMF because its signature was already consumed by OVMF.
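The state transitions described in the fwctl commit can be modelled in a few lines. This is a sketch under stated assumptions rather than the code from bhyve's fwctl.c: only the state names IDENT_WAIT, IDENT_SEND and REQ come from the commit message; the helper names, the placeholder signature bytes, and the convention that writing 0 selects the signature are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State names from the commit message; everything else is hypothetical. */
enum fwctl_state { IDENT_WAIT, IDENT_SEND, REQ };

struct fwctl_model {
	enum fwctl_state state;
	size_t ident_idx;       /* next signature byte to hand out */
};

/* Placeholder signature bytes, assumed for illustration. */
static const uint8_t fwctl_sig[] = { 'B', 'H', 'Y', 'V' };

/*
 * Guest selects the signature. Accepting the request in every state,
 * not only in IDENT_WAIT, is what allows a second driver to identify
 * the device after the first one has already consumed the signature.
 */
static void
fwctl_model_select(struct fwctl_model *m, uint16_t val)
{
	if (val == 0) {
		m->state = IDENT_SEND;
		m->ident_idx = 0;
	}
}

/* Guest reads one byte; after the last signature byte, move to REQ. */
static uint8_t
fwctl_model_read(struct fwctl_model *m)
{
	uint8_t b = 0xff;       /* value returned outside IDENT_SEND */

	if (m->state == IDENT_SEND) {
		b = fwctl_sig[m->ident_idx++];
		if (m->ident_idx == sizeof(fwctl_sig))
			m->state = REQ;
	}
	return (b);
}
```

A model that accepted the select only in IDENT_WAIT would return 0xff to a second reader, which matches the single-reader limitation the commit message describes.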
Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D31981 ---- bhyve: add more slop to 64 bit BARs Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables based on this allocation. If the guest decides to relocate those BARs above 4 GB, it could lead to mismatching ACPI tables. Especially when using OVMF with enabled bus enumeration it could cause issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS may be unable to recover from this situation and disables some PCI devices because their BARs are located outside of the MMIO space reported by ACPI. Avoid this situation by giving the guest more space for relocating BARs. Let's be paranoid. The available space for BARs below 4 GB is 512 MB large. Use a slop of 512 MB. It'll allow the guest to relocate all BARs below 4 GB to an address above 4 GB. We could run into issues when we exceeding the memlimit above 4 GB. However, this space has a size of 32 GB. Even when using many PCI device with large BARs like framebuffer or when using multiple PCI busses, it's very unlikely that we run out of space due to the large slop. Additionally, this situation will occur on startup and not at runtime which is much better. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33118 ---- bhyve: dynamically register FwCtl ports Qemu's FwCfg uses the same ports as Bhyve's FwCtl. Static allocated ports wouldn't allow to switch between Qemu's FwCfg and Bhyve's FwCtl. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33496 ---- bhyve: Map the right BAR in init_msix_table() The PBA and MSI-X table can reside in different BARs. 
Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: Correct unmapping of the MSI-X table BAR The starting address passed to mprotect was wrong, so in the case where the last page containing the table is not the last page of the BAR, the wrong region would be unmapped. Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: add nvlist functions for setting unset nodes If an emulation uses those functions instead of set_config_value_node or set_config_value, it allows the config values to get overwritten. Introducing new functions is much more readable than if else statements in the emulation code. Reviewed by: khng MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33770 ---- bhyve: get mediasize for character devices when resizing virtio-blk Reviewed by: imp, allanjude, jhb Differential Revision: https://reviews.freebsd.org/D33403 ---- bhyve/snapshot: fix pthread_create() error check pthread_create() returns 0 on success or an error number on failure. Reviewed by: khng, markj Differential Revision: https://reviews.freebsd.org/D33930 ---- Append Keyboard Layout specified option for using VNC. Part two: Append bhyve -K option for specified keyboard layout with layout setting files every languages. 
Since the cmd option '-k' was used in the meantime it was changed to '-K' PR: 246121 Submitted by: koinec@yahoo.co.jp Reviewed by: grehan@ Differential Revision: https://reviews.freebsd.org/D29473 MFC after: 4 weeks ---- bhyve: ahci: Fix regression with no ports An AHCI controller may be specified with no connected ports. Avoid dumping core in this case for compatibility with existing VM configs. Reviewed by: khng, jhb Fixes: 621b5090487de Refactor configuration management in bhyve. MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D33969 ---- bhyve/block_if: allow DIOCGMEDIASIZE ioctl This is needed to get mediasize of the device after a resize event. I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which disables capsicum. Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...") Differential Revision: https://reviews.freebsd.org/D34013 ---- pkgbase: bhyve: Tag the kbdlayout file to be in the bhyve package ---- bhyve nvme: Advertise v1.4 support Bump advertised NVMe support from v1.3 to v1.4 Reviewed by: allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33564 ---- bhyve nvme: Fix NVM Format completion status The NVM Format command is unique among the Admin commands in that it needs to finish asynchronously. For this reason, the emulation code invented a synthetic completion status (NVME_NO_STATUS) to indicate that the command was still in progress and the command processing loop should not generate a completion message. The implementation used the value 0xffff for the synthetic value as this set both the Status Code and Status Code Type fields to reserved values. Format initialized the completion status to this value and expected error cases to override it with a status code/type appropriate to the situation. The macros used to set the NVMe status are careful not to modify bit 0 (i.e. 
the phase bit), which with the synthetic completion status, causes the phase bit to get out of sync. When running tests in a guest with illegal NVM Format commands, Admin commands would eventually hang because it appeared there were no completions due to the incorrect phase bit value. Fix is to only set NVME_NO_STATUS if the blockif delete command succeeds. While in the neighborhood, add a missing break statement when NVM Format is not supported. Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33565 ---- bhyve nvme: Fix Namespace Specific Set Features Return an error if the feature specified in Set Features is Namespace specific but the Namespace ID uses the Global Namespace tag. Fixes UNH Test 1.2.7 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33566 ---- bhyve nvme: Implement Log Page Offset Modify the Get Log Page command to parse the Log Page Offset fields to support more recent versions of the NVMe specification. Fixes various tests for UNH Test 1.3.* Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33568 ---- bhyve nvme: Add missing Admin opcodes Don't treat unsupported Admin commands as Invalid Opcode. Instead return the proper Invalid Field in Command. Fixes UNH IOL test 1.17.2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33569 ---- bhyve nvme: Remove redundant AER Limit checks The NVMe emulation checked if the Asynchronous Event Request Limit (a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function is only called from nvme_opc_async_event_req() which also checks for exceeding the AERL. 
Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33570 ---- bhyve nvme: Fix Set Features Be more conservative and only support the Features mandatory for an I/O Controller. Avoids a "hang" in UNH test 1.2.10 associated with Predictable Latency Mode Configuration and Host Behavior Support features. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33571 ---- bhyve nvme: Add Temperature Threshold support This adds the ability for a guest OS to send Set / Get Feature, Temperature Threshold commands. The implementation assumes a constant temperature and will generate an Asynchronous Event Notification if the specified threshold is above/below this value. Although the specification allows 9 temperature values, this implementation only implements the Composite Temperature. While in the neighborhood, move the clearing of the CSTS register in the reset function after all other cleanup. This avoids a race with the guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe emulation is actually complete with the reset. Fixes UNH IOL 16.0 Test 1.7, cases 1, 2, and 4. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33572 ---- bhyve nvme: Update v1.4 Identify Controller data Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE). Also, do not advertise secure erase functionality in the Format NVM Attributes field of the Identify Controller data structure as the Controller does not implement secure erase. Fixes UNH IOL Test 1.1, Case 2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33573 ---- bhyve nvme: Add Select support to Get Features Implement basic support for the SEL field of Get Features. This returns information about Namespace Specific features.
Fixes UNH ILO 16.0 Test 1.2, Case 13 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33574 ---- bhyve nvme: Fix LBA out-of-range calculation The function which checks for a valid LBA range mistakenly named an input value as NLB ("Number of Logical Blocks") instead of "number of blocks". The NVMe specification defines NLB as a zero-based value (i.e. NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed parameter is a 1's-based value. Fix is to rename the variable to avoid future confusion. While in the neighborhood, also check that the starting LBA is less than the size of the backing storage to avoid an integer overflow. Reviewed by: imp, allanjude, jhb Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33575 ---- bhyve nvme: Fix reported VWC value v1.4 and later NVMe Controllers report "Flush all Namespaces" support differently. Fixes UNH IOL 16.0 Test 2.6, Case 3 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33576 ---- bhyve nvme: Fix Set Features, AEN NVMe Controllers which do not support Endurance Groups must return an error when the Endurance Group Event Aggregate Log Change Notices bit is set in Set Features, Asynchronous Event Configuration. Fixes UNH IOL Test 3.12, Case 8 Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33577 ---- bhyve nvme: Fix Identify Namespace, NSID=ffffffff If the NVMe Controller doesn't support Namespace Management, it should return "Invalid Namespace or Format" when the Host request Identify Namespace with the global NSID value. 
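For the "Fix LBA out-of-range calculation" commit above, the difference between the zero-based NLB field and a one-based block count can be sketched as follows. This is an illustration only: it works in blocks, the function names are hypothetical, and the actual emulation code may be structured differently (for example, by checking byte offsets).

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert the zero-based NLB field of a command into a block count. */
static inline uint64_t
nlb_to_blocks(uint16_t nlb)
{
	return ((uint64_t)nlb + 1);
}

/*
 * Return true if [slba, slba + nblocks) falls outside a namespace that
 * holds ns_blocks blocks. nblocks is a one-based count, not the raw NLB.
 */
static inline bool
lba_out_of_range(uint64_t slba, uint64_t nblocks, uint64_t ns_blocks)
{
	/* Reject the start first so the subtraction below cannot wrap. */
	if (slba >= ns_blocks)
		return (true);
	return (nblocks > ns_blocks - slba);
}
```

Checking the starting LBA before doing any arithmetic is what avoids the integer overflow: a naive `slba + nblocks > ns_blocks` test wraps around for a very large `slba`.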
Fixes UNH IOL 16.0 Test 9.1, Case 6 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33578 ---- bhyve/virtio: use correct device id for virtio-scsi Section 4.1.2.1 of the virtio spec states that the transitional PCI device id for a scsi device is 0x1004. Fix suggested by reporter. PR: 259961 Reported by: me@nanaya.pro Reviewed by: imp, jhb Fixes: f9c005a17f4e ("Add bhyve virtio-scsi storage backend support.") Differential Revision: https://reviews.freebsd.org/D34103 ---- Create VM_MEMATTR_DEVICE on all architectures This is intended to be used with memory mapped IO, e.g. from bus_space_map with no flags, or pmap_mapdev. Use this new memory type in the map request configured by resource_init_map_request, and in pciconf. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29692 ---- Use if ... else when printing memory attributes In vmstat there is a switch statement that converts these attributes to a string. As some values can be duplicate we have to hide these from userspace. Replace this switch statement with an if ... else macro that lets us repeat values without a compiler error. Reviewed by: kib MFC after: 2 weeks Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D29703 ---- Remove an always-true check. This fixes a -Wtype-limits error from GCC 9. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D31936 ---- vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. 
Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559 ---- vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/resume of a Win10 guest leaves CPU0 busy handling interrupts. Win10 does not use the LAPIC timer often; in most cases it is disabled by writing 0 to the Initial Count Register (for Timer). During resume, restart the timer only for an enabled LAPIC whose timer is also enabled. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448 ---- bhyve: add support for MTRR Some guests or drivers might depend on MTRR to work properly. E.g. the nvidia gpu driver won't work without MTRR. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33333
Includes all commits up to 2022/02/06 or d21e71efce39. ---- libvmm: clean up vmmapi.h struct checkpoint_op, enum checkpoint_opcodes, and MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi header. They are used for the save/restore functionality that bhyve(8) provides and are better suited in usr.sbin/bhyve/snapshot.h Since bhyvectl(8) requires these, the Makefile for bhyvectl has been modified to include usr.sbin/bhyve/snapshot.h Reviewed by: kevans, grehan Differential Revision: https://reviews.freebsd.org/D28410 ---- bhyve/snapshot: drop mkdir when creating the unix domain socket Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when creating the unix domain socket for a given bhyve vm. The path to the unix domain socket for a bhyve vm will now be /var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared to bhyvectl(8). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28783 ---- bhyve/snapshot: rename checkpoint_opcodes to be more generic Generalize the naming here since the domain socket that uses these codes might be used for purposes other than the save/restore feature. - rename checkpoint_opcodes to ipc_opcode Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28877 ---- bhyvectl: reduce code duplication Combine send_start_checkpoint() and send_start_suspend() into a single function named snapshot_request(). snapshot_request() is equivalent to send_start_checkpoint() and send_start_suspend() except that it takes an additional argument. The additional argument, enum ipc_opcode, is used to determine the type of snapshot request being performed. Also, switch to using strlcpy instead of strncpy. 
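The idea behind merging the two senders can be sketched as one function that takes the opcode as a parameter. Everything below is a simplified assumption for illustration: the struct layout, the buffer size, the enumerator values, and the name `build_request()` are hypothetical, and the real snapshot_request() also sends the message to bhyve. Because strlcpy is not available in every libc, the sketch uses snprintf for the same truncation-aware copy.

```c
#include <stdio.h>
#include <string.h>

#define REQ_NAME_MAX 64		/* hypothetical buffer size */

/* Enumerator names and values are assumptions for this sketch. */
enum ipc_opcode {
	START_CHECKPOINT = 0,
	START_SUSPEND = 1,
};

struct request {
	enum ipc_opcode	op;
	char		name[REQ_NAME_MAX];
};

/*
 * Fill in a request for either operation. Returns 0 on success and -1
 * if the file name does not fit in the buffer.
 */
static int
build_request(struct request *req, const char *name, enum ipc_opcode op)
{
	int n;

	memset(req, 0, sizeof(*req));
	req->op = op;
	/* snprintf always NUL-terminates and reports the needed length. */
	n = snprintf(req->name, sizeof(req->name), "%s", name);
	if (n < 0 || (size_t)n >= sizeof(req->name))
		return (-1);
	return (0);
}
```

Unlike strncpy, this copy never leaves the buffer unterminated, and the return value makes truncation detectable.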
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28878 ---- bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character buffer that stores a filename or the path to a file - this file is used by the save/restore feature. Since the file doesn't have anything to do with a vm name, rename MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX while here. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28879 ---- bhyvectl: print a better error message when vm_open() fails Use errno to print a more descriptive error message when vm_open() fails libvmm: preserve errno when vm_device_open() fails vm_destroy() squashes errno by making a dive into sysctlbyname() - we can safely skip vm_destroy() here since it's not doing any critical clean up at this point. Replace vm_destroy() with a free() call. PR: 250671 MFC after: 3 days Submitted by: marko@apache.org Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D29109 ---- bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM The save/restore feature uses a unix domain socket to send messages from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice for this. An added benefit of using a datagram socket is simplified code. For bhyve, the listen/accept calls are dropped; and for bhyvectl, the connect() call is dropped. EPRINTLN handles raw mode for bhyve(8), use it to print error messages. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28983 ---- bhyve: virtio shares definitions between sys/dev/virtio Definitions inside usr.sbin/bhyve/virtio.h are thrown away. Definitions in sys/dev/virtio are used instead. This reduces code duplication. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29084 ---- Refactor configuration management in bhyve. 
Replace the existing ad-hoc configuration via various global variables with a small database of key-value pairs. The database supports hierarchical keys using a MIB-like syntax to name the path to a given key. Values are always stored as strings. The API used to manage configuration values includes wrappers for handling boolean values. Other values that use non-string types require parsing by consumers. The configuration values are stored in a tree using nvlists. Leaf nodes hold string values. Configuration values are permitted to reference other configuration values using '%(name)'. This permits constructing template configurations. All existing command line arguments now set configuration values. For devices, the "-s" option parses its option argument to generate a list of key-value pairs for the given device. A new '-o' command line option permits setting an individual configuration variable. The key name is always given as a full path of dot-separated components. A new '-k' command line option parses a simple configuration file. This configuration file holds a flat list of 'key=value' lines where the 'key' is the full path of a configuration variable. Lines starting with a '#' are comments. In general, bhyve starts by parsing command line options in sequence and applying those settings to configuration values. Once this is complete, bhyve then begins initializing its state based on the configuration values. This means that subsequent configuration options or files may override or supplement previously given settings. A special 'config.dump' configuration value can be set to true to help debug configuration issues. When this value is set, bhyve will print out the configuration variables as a flat list of 'key=value' lines. Most command line arguments map to a single configuration variable, e.g. '-w' sets the 'x86.strictmsr' value to false.
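The flat 'key=value' file format described above is simple enough to sketch. The helper below is a hypothetical illustration of that format, not bhyve's actual parser: it splits one line in place and treats '#' lines as comments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split one configuration-file line in place.
 * Returns 1 and sets *key and *value for a 'key=value' line,
 * 0 for a blank line or a '#' comment, and -1 for a malformed line.
 */
static int
parse_config_line(char *line, char **key, char **value)
{
	char *eq;

	/* Strip the trailing newline, if any. */
	line[strcspn(line, "\n")] = '\0';

	if (line[0] == '\0' || line[0] == '#')
		return (0);

	/* The key is everything before the first '='; it must be non-empty. */
	eq = strchr(line, '=');
	if (eq == NULL || eq == line)
		return (-1);

	*eq = '\0';
	*key = line;
	*value = eq + 1;
	return (1);
}
```

Given the line `x86.strictmsr=false`, the helper yields the key `x86.strictmsr` and the string value `false`. Turning that string into a boolean is left to the consumer, in line with values always being stored as strings.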
A few command line arguments have less obvious effects: - Multiple '-p' options append their values (as a comma-separated list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number). - For '-s' options, a pci.<bus>.<slot>.<function> node is created. The first argument to '-s' (the device type) is used as the value of a "device" variable. Additional comma-separated arguments are then parsed into 'key=value' pairs and used to set additional variables under the device node. A PCI device emulation driver can provide its own hook to override the parsing of the additional '-s' arguments after the device type. After the configuration phase has completed, the init_pci hook then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the "device" value to find the device model to use. The device model's init routine is passed a reference to its nvlist node in the configuration tree which it can query for specific variables. The result is that a lot of the string parsing is removed from the device models and centralized. In addition, adding a new variable just requires teaching the model to look for the new variable. - For '-l' options, a similar model is used where the string is parsed into values that are later read during initialization. One key note here is that the serial ports use the commonly used lowercase names from existing documentation and examples (e.g. "lpc.com1") instead of the uppercase names previously used internally in bhyve. Reviewed by: grehan MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D26035 ---- bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates.
Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for address updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066 ---- bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862 ---- bhyve hostbridge: Rename "device" property to "devid". "device" is already used as the generic PCI-level name of the device model to use (e.g. "hostbridge"). The result was that parsing "hostbridge" as an integer failed and the host bridge used a device ID of 0. The EFI ROM asserts that the device ID of the hostbridge is not 0, so booting with the current EFI ROM was failing during the ROM boot. Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb ---- bhyve: Enable virtio-scsi legacy config parsing. The previous commit added the handler to parse the command line options for virtio-scsi devices but forgot to set the correct function pointer to point to the handler. Reported by: vangyzen Reviewed by: vangyzen Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29438 ---- bhyve: change vq_getchain to return iovecs in both directions The old prototype requires callers to inspect the flags of each descriptor to get the starting position of host-writable iovecs. vq_getchain() is changed to return a virtio request with the number of host-readable iovecs and host-writable iovecs instead.
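The readable/writable split can be sketched as a single pass over the descriptor flags of a chain. This is a hedged illustration: `struct chain_counts` and `count_chain()` are hypothetical names, and the ordering check follows the virtio specification's rule that device-writable descriptors come after device-readable ones rather than reproducing bhyve's exact behavior.

```c
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_WRITE 2	/* buffer is device-writable (virtio spec) */

/* Hypothetical result type: how many iovecs fall in each direction. */
struct chain_counts {
	size_t readable;	/* iovecs the host may only read */
	size_t writable;	/* iovecs the host may write */
};

/*
 * Count the readable and writable descriptors of one chain, given the
 * flags of each descriptor in chain order. Returns 0 on success and -1
 * if a readable descriptor follows a writable one.
 */
static int
count_chain(const uint16_t *flags, size_t n, struct chain_counts *out)
{
	out->readable = 0;
	out->writable = 0;
	for (size_t i = 0; i < n; i++) {
		if (flags[i] & VRING_DESC_F_WRITE)
			out->writable++;
		else if (out->writable != 0)
			return (-1);
		else
			out->readable++;
	}
	return (0);
}
```

With both counts available, a device model can treat `iov[0 .. readable-1]` as the request and `iov[readable .. readable+writable-1]` as the response area without looking at any flags itself.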
Callers can avoid boilerplate code of getting the start offset of host-writable iovecs. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Reviewed by: afedorov Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29433 ---- Fix typo in xhci nvlist node name, and also increment device counter. This allows the xhci tablet device to be recognized and a PCI device instantiated. Reviewed by: jhb Fixes: 621b5090487d Refactor configuration management in bhyve. MFC after: 3 months. ---- bhyve: fix regression in legacy virtio-9p config parsing Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression in legacy virtio-9p config parsing by not initializing *sharename to NULL. As a result, "sharename != NULL" check in the first iteration fails and bhyve exits with "virtio-9p: more than one share name given". Fix by adding NULL back. Approved by: grehan ---- bhyve: add SMBIOS Baseboard Information Add the System Management BIOS Baseboard (or Module) Information a.k.a. Type 2 structure to the SMBIOS emulation. Reviewed by: rgrimes, bcran, grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29657 ---- bhyve: Move the gdb_active check to gdb_cpu_suspend(). The check needs to be in the public routine (gdb_cpu_suspend()), not in the internal routine called from various places (_gdb_cpu_suspend()). All the other callers of _gdb_cpu_suspend() already check gdb_active, and this breaks the use of snapshots when the debug server is not enabled since gdb_cpu_suspend() tries to lock an uninitialized mutex. Reported by: Darius Mihai, Elena Mihailescu Reviewed by: elenamihailescu22_gmail.com Fixes: 621b5090487de9fed1b503769702a9a2a27cc7bb Differential Revision: https://reviews.freebsd.org/D29538 ---- bhyve: implement rdmsr for MSR_IA32_FEATURE_CONTROL Without the -w option, Windows guests crash on boot. This is caused by a rdmsr of MSR_IA32_FEATURE_CONTROL. Windows checks this MSR to determine enabled VMX features. 
This MSR isn't emulated in bhyve, so a #GP exception is injected which causes Windows to crash. Fix by returning a rdmsr of MSR_IA32_FEATURE_CONTROL with Lock Bit set and VMX disabled to inform Windows that VMX isn't available. Reviewed by: jhb, grehan (bhyve) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29665 ---- bhyve.8: Make synopsis more readable There is no need to squeeze all the possible options into one synopsis entry. Let "-l help" and "-s help" be listed separately. While here, keep -s and its arguments on the same line. MFC after: 2 weeks ---- bhyve: Fix synopsis in the usage message In particular: - Sort short options to align with style(9) - Add two missing flags: -G and -r - Drop unnecessary angle brackets for consistency - Rename the "vm" argument to vmname for consistency with the manual page MFC after: 2 weeks ---- bhyve: Improve the option description in the usage message - Sort options as suggested by style(9) - Capitalize some words like CPU and HLT - Add a missing description for the -G flag MFC after: 2 weeks ---- bhyve.8: Sort the options in the OPTIONS section No content change intended. Just moving the option descriptions around to follow the order suggested by style(9). MFC after: 2 weeks ---- bhyve.8: Improve the description and synopsis of -l - Describe "-l help" separately for readability. - List all the supported comX devices explicitly - Use Cm instead of Ar for command modifiers (i.e., literal values a user can specify as an argument to the command). - Explain where to get more information about the possible values of the conf argument. MFC after: 2 weeks ---- bhyve.8: Improve the description of the -m flag - Stylize the synopsis with proper mdoc macros - Do some wordsmithing on the description for consistency. MFC after: 2 weeks ---- bhyve.8: Fix the synopsis of -p Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up description of -r There is no need to wrap those flags in Op macros.
MFC after: 2 weeks ---- bhyve.8: Fix indention in the signals table MFC after: 2 weeks ---- bhyve.8: Clean-up synopsis of -s - Document "-s help" separately for readability. - Use appropriate mdoc macros. MFC after: 2 weeks ---- bhyve.8: Clean up the slot description of -s Also, remove the macros of the nested list which contained slot, emulation and conf. This decreases the indention of the -s description. It was necessary to clean up the slot description. MFC after: 2 weeks ---- bhyve.8: Improve emulation description of the -s flag - Set width of the list to the longest key word for readability. - Separate descriptions of amd_hostbridge and hostbridge emulations. Also, wordsmith their descriptions for consistency with other entries. - Use Cm instead of Li for command modifiers. - Do not stylize AMD with Li, there's no need to do it. - Mention COM3 and COM4 in the definition of lpc. - Fix a typo in the definition of ahci-hd ("hard drive" instead of "hard-drive"). MFC after: 2 weeks ---- bhyve.8: Clean up network backends section - Reformat the format lists, use appropriate mdoc macros for readability. - Add a missing Oxford comma. MFC after: 2 weeks ---- bhyve.8: Clean up block storage device backends description MFC after: 2 weeks ---- bhyve.8: Clean up SCSI device backends section MFC after: 2 weeks ---- bhyve.8: Clean up 9P device backends section MFC after: 2 weeks ---- bhyve.8: Clean up TTY, boot ROM, and pass-through descriptions MFC after: 2 weeks ---- bhyve.8: Clean up virtio console device backends description MFC after: 2 weeks ---- bhyve.8: Improve framebuffer backends description - Use appropriate mdoc macros - Document that tcp= is a synonym to rfb= (tcp is used in the examples, but never mentioned) - Clarify the IP address specification MFC after: 2 weeks ---- bhyve.8: Improve documentation of NVME backend - Document the configuration format. - Document two additional configuration options: eui64 and dsm. 
MFC after: 2 weeks ---- bhyve.8: Improve AHCI backends documentation - Document the backend format. MFC after: 2 weeks ---- bhyve: Document the format for HD audio backends - This change is done for consistency with other backend definitions. MFC after: 2 weeks ---- bhyve.8: Fix mandoc -Tlint issues While here, keep network backends section consistent with other sections. MFC after: 2 weeks ---- bhyve: Be explicit that setting config.dump will not start a VM. Suggested by: rpokala Reviewed by: bcr (manpages) Differential Revision: https://reviews.freebsd.org/D29738 ---- bhyve: Gracefully handle virtio-scsi with no conf Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used by some third party software to probe bhyve for virtio-scsi support. Reviewed by: jhb MFC after: 1 day Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29926 ---- bhyve: Set SO_REUSEADDR on the gdb stub socket Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30037 ---- bhyve/snapshot: provide a way to send other messages/data to bhyve This is a step towards sending messages (other than suspend/checkpoint) from bhyvectl to bhyve. Introduce a new struct, ipc_message - this struct stores the type of message and a union containing message specific structures for the type of message being sent. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30221 ---- bhyve/snapshot: split up mutex/cond initialization from socket creation Move initialization of the mutex/condition variables required by the save/restore feature to their own function. The unix domain socket that facilitates communication between bhyvectl and bhyve doesn't rely on these variables in order to be functional. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D30281 ---- Add a virtio-input device emulation. 
This will be used to inject keyboard/mouse input events into a guest. The command line syntax is: -s <slot>,virtio-input,/dev/input/eventX Reviewed by: jhb (bhyve), grehan Obtained from: Corvin Köhne <C.Koehne@beckhoff.com> MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30020 ---- bhyve: Register new kevents synchronously. Change mevent_add*() to synchronously add the new kevent. This permits reporting event registration failures to the caller and avoids failing the registration of other, unrelated events queued up in the same batch. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30502 ---- bhyve: Add support for EVFILT_VNODE mevents. This allows registering an event to watch for changes to a file's attributes. This is a bit imperfect as it would be nice to have a way to determine if an fd can use EVFILT_VNODE successfully. mevent's current structure does not permit that and a failure to register a single kevent impacts several other kevents. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30503 ---- bhyve: Add support for handling disk resize events to block_if. Allow clients of blockif to register a resize callback handler. When a callback is registered, register an EVFILT_VNODE kevent watching the backing store for a change in the file's attributes. If the size has changed when the kevent fires, invoke the clients' callback. Currently resize detection is limited to backing stores that support EVFILT_VNODE kevents such as regular files. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30504 ---- bhyve: Split out a lower-level helper for VirtIO interrupts. This allows device models to assert VirtIO interrupts for reasons other than publishing changes to a VirtIO ring such as configuration changes. 
Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30505 ---- bhyve vtblk: Inform guests of disk resize events. Register a resize callback with the blockif interface. When the callback fires, update the size of the disk and notify the guest via a configuration change interrupt. Reviewed by: grehan, markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D30506 ---- bhyve: enhance debug info for memory range clash Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: https://github.com/freebsd/freebsd-src/pull/463 ---- Add more GIC and GICv3 registers These aren't used by either driver, however they will be needed by bhyve on arm64 to emulate a GICv3 interrupt controller. Sponsored by: Innovate UK ---- bhyve: Fix cli regression with NVMe ram The configuration management refactoring inadvertently removed support for a RAM-backed NVMe Namespace (i.e. -s X,nvme,ram=16384). This adds it back. Reported by: andy@omniosce.org Reviewed by: jhb, andy@omniosce.org Fixes: 621b5090487d MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30717 ---- bhyve: fix NVMe MDTS comment Removes an obsolete comment and adds parentheses around the macro while in the area. No functional change. ---- bhyve: Fix NVMe iovec construction for large IOs The UEFI driver included with Rocky Linux 8.4 uncovered an existing bug in the NVMe emulation's construction of iovecs. By default, NVMe data transfer operations use a scatter-gather list in which all entries point to a fixed size memory region. For example, if the Memory Page Size is 4KiB, a 2MiB IO requires 512 entries. Lists themselves are also fixed size (default is 512 entries). Because the list size is fixed, the last entry is special. If the IO requires more than 512 entries, the last entry in the list contains the address of the next list of entries.
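The entry arithmetic in this message, together with the exactly-full case it goes on to describe, can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, every list has the same number of slots, and details such as the first entry living in the command itself are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of fixed-size data pages that an IO of `bytes` bytes needs. */
static uint64_t
data_entries(uint64_t bytes, uint32_t page_size)
{
	return ((bytes + page_size - 1) / page_size);
}

/*
 * Decision made on reaching the final slot of a list with `remaining`
 * bytes still to map: the slot points to another list only if a single
 * page of data would not finish the transfer.
 */
static bool
last_slot_is_chain(uint64_t remaining, uint32_t page_size)
{
	return (remaining > page_size);
}

/* Lists needed to describe `entries` data pages, `per_list` slots each. */
static uint64_t
lists_needed(uint64_t entries, uint32_t per_list)
{
	uint64_t lists;

	assert(per_list >= 2);
	if (entries == 0)
		return (0);
	lists = 1;
	while (entries > per_list) {
		/* The final slot of a full list is spent on the chain pointer. */
		entries -= per_list - 1;
		lists++;
	}
	return (lists);
}
```

A 2 MiB IO with 4 KiB pages gives `data_entries()` = 512, which fits one 512-slot list with the last slot used for data. One more page (513 entries) needs a second list, and only then does the last slot of the first list become a pointer.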
But if the IO requires exactly 512 entries, the last entry points to data. The NVMe emulation missed this logic and unconditionally treated the last entry as a pointer to the next list. Fix is to check if the remaining data is greater than the page size before using the last entry as a pointer to the next list. PR: 256422 Reported by: dave@syix.com Tested by: jason@tubnor.net MFC after: 5 days Relnotes: yes Reviewed by: imp, grehan Differential Revision: https://reviews.freebsd.org/D30897 ---- Append Keyboard Layout specified option for using VNC. Part one: supporting QEMU Extended Keyboard Event Message PR: 246121 Submitted by: koinec@yahoo.co.jp Differential Revision: https://reviews.freebsd.org/D29430 ---- libvmm: explicitly save and restore errno in vm_open() In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to preserve errno. However, it's possible that free() may change the errno as well. Keep the free() call, but explicitly save and restore errno. Noted by: jhb Fixes: 6bb140e3ca895a14 ---- vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462 ---- vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805 ---- bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. 
Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236 ---- bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237 ---- bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238 ---- ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945 ---- AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. 
the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984 ---- Correct "Fondation" typo (missing "u") ---- AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: 74ada297e897 Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521 ---- vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong on assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never work as supposed. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. 
Operations on devices not governed (e.g. the IOMMU unit itself) are no-ops from now on. There is also a minor fix for incorrect %b format string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827 ---- vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In the hw.vmm.create sysctl handler the maximum length of a vm name is VM_MAX_NAMELEN. However, in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow a vm name of length VM_MAX_NAMELEN. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372 ---- amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466 ---- vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462 ---- vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check that the number of ivhd entries fits into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514 ---- vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy().
This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156 ---- bhyve: net_backends, automatically IFF_UP tap devices If you want communications with the outside world and tell bhyve to create an interfaces then it should be usable as well. Rather than relying on the sysctl net.link.tap.up_on_open automatically try to IFF_UP the opened tap device. MFC after: 10 days Reviewed by: markj, grehan Differential Revision: https://reviews.freebsd.org/D31342 ---- bhyve: Use fspacectl(2) for BOP_DELETE on regular file images bhyve can also make use of fspacectl(2) to implement BOP_DELETE with hole-punching. Since it is not desirable to do zero-filling for large DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates that the underlying file system does not support native VOP_DEALLOCATE(9). Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D28880 ---- bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 ---- byhve: add option to specify IP address for gdb Allow user to specify the IP address available for gdb debugger. 
Reviewed by: jhb, grehan, rgrimes, bcr (man pages) Differential Revision: https://reviews.freebsd.org/D29607 ---- bhyve: change a default address from ANY to localhost Discussed with: grehan, jhb ---- bhyve: Fix vq_getchain() error handling bugs in various device models Reviewed by: grehan, khng Approved by: so Security: CVE-2021-29631 Security: FreeBSD-SA-21:13.bhyve ---- pci: Add an ioctl to perform I/O to BARs This is useful for bhyve, which otherwise has to use /dev/io to handle accesses to I/O port BARs when PCI passthrough is in use. Reviewed by: imp, kib Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31307 ---- bhyve: Nuke double-semicolons A distinct number of double-semicolons ended up in bhyve. Take a pass at getting rid of many of these harmless typos. MFC after: 3 days ---- bhyve: Fix pci device node key in bhyve_config.5 PCI device node key in the manual page is wrong. It should be pci.bus.slot.function. MFC after: 3 days ---- bhyve: Support setting the disk serial number for VirtIO block devices. Reviewed by: allanjude Obtained from: illumos Differential Revision: https://reviews.freebsd.org/D31983 ---- bhyve: Update the -G description in the SYNPOSIS. It was missing both the 'w' flag and 'bind_address'. ---- bhyve_config.5: Document gdb.address. ---- bhyve: Add an empty case for event types in mevent_kq_fflags(). This fixes a -Wswitch error raised by GCC 9. Differential Revision: https://reviews.freebsd.org/D31938 ---- bhyve: Map the MSI-X table unconditionally for passthrough It is possible for the PBA to reside in the same page as the MSI-X table. And, while devices are not supposed to do this, at least some Intel wifi devices place registers in a page shared with the MSI-X table. To handle the first case we currently map the PBA page using /dev/mem, and the second case is not handled. 
Kill two birds with one stone: map the MSI-X table BAR using the PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that accesses beyond the bounds of the table can be emulated. Regions of the BAR not containing the table are left unmapped. Reviewed by: bz, grehan, jhb MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32359 ---- bhyve.8: Fix markup of the -G flag ---- bhyve: Update usage and synopsis for the -k flag Let's make it clear to users that -k is for configuration files. Also, point to bhyve_config(5) in the paragraph describing the flag. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32467 ---- bhyve: ignore low bits of CFGADR Bhyve could emulate wrong PCI registers. In the best case, the guest reads wrong registers and the device driver would report some errors. In the worst case, the guest writes to wrong PCI registers and could brick hardware when using PCI passthrough. According to Intels specification, low bits of CFGADR should be ignored. Some OS like linux may rely on it. Otherwise, bhyve could emulate a wrong PCI register. E.g. If linux would like to read 2 bytes from offset 0x02, following would happen. linux: outl 0x80000002 at CFGADR inw at CFGDAT + 2 bhyve: cfgoff = 0x80000002 & 0xFF = 0x02 coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04 Bhyve would emulate the register at offset 0x04 not 0x02. Reviewed By: #bhyve, grehan Differential Revision: https://reviews.freebsd.org/D31819 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix the WITH_BHYVE_SNAPSHOT build Note, this breaks compatibility with snapshots generated by older builds of bhyve(8). 
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") Reported by: Greg V <greg@unrelenting.technology> Reviewed by: grehan, bz Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32523 ---- bhyve: Bump the SMBIOS firmware version to 14.0 for 14-CURRENT Bump the firmware version to 14.0 and set the firmware release date to today. Reviewed by: jhb, bz, imp Differential Revision: https://reviews.freebsd.org/D32534 ---- bhyve: use physical lobits for BARs of passthru devices Tell the guest whether a BAR uses prefetched memory or not for passthru devices by using the same lobits as the physical device. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32685 ---- bhyve: do not explicitly map fbuf framebuffer Allocating a BAR will call baraddr which maps the framebuffer. No need to allocate it explicitly on init. Reviewed by: grehan Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D32596 ---- bhyve: move 64 bit BAR location to match OVMF assumptions OVMF will fail if large 64 bit BARs are used. GCD-Map doesn't cover 64 bit addresses of BARs. OVMF assumes that 64 bit addresses of BARs are located on the next 32 GB boundary behind Top of High RAM. This patch moves 64 bit BARs to the next 32 GB boundary behind Top of High RAM to match OVMF assumptions. Differential Revision: https://reviews.freebsd.org/D27970 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: use a fixed 32 bit BAR base address OVMF always uses 0xC0000000 as base address for 32 bit PCI MMIO space. For that reason, we should use that address too. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D31051 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: keep physical and virtual COMMAND reg in sync On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set. After that, the passthru emulation overwrites the virtual cmd register with the physical one. This could lead to a mismatch between registered BARs and the encoding bits in the cmd register. Instead of writing the physical to the virtual cmd register, write the virtual to the physical cmd register to solve this issue. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32687 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: emulate reads of MSI-X capabilities for passthru devices Reads of the MSI-X capabilities aren't emulated by passthru devices yet. The guest will read the host MSI-X capabilities, which could cause issues. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D32686 Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve: Fix compile We need err.h Fixes: 5cf21e48ccf11 ("bhyve: use a fixed 32 bit BAR base address") Sponsored by: Beckhoff Automation GmbH & Co. KG ---- bhyve blockif: fix blockif_candelete with Capsicum NVMe conformance tests for the Format command failed if the backing-storage for the bhyve device was a file instead of a Zvol. The tests (and the specification) expect a Format to destroy all previously written data. The bhyve NVMe emulation implements this by trimming / deallocating all data from the backing-storage. The blockif_candelete() function indicated the file did not support deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE) even though the kernel supported file hole punching. This occurs on builds with Capsicum enabled because blockif did not allow the fpathconf(2) right. Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
PR: 260081 Reviewed by: allanjude, markj, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33203 ---- bhyve: fix -Wunused-but-set-variable warning Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33306 ---- bhyve: Support a _VARS.fd file for bootrom OVMF creates two separate .fd files, a _CODE.fd file containing the UEFI code, and a _VARS.fd file containing a template of an empty UEFI variable store. OVMF decides to write variables to the memory range just below the boot rom code if it detects a CFI flash device. So here we add just the barest facsimile of CFI command handling to bootrom.c that is needed to placate OVMF. Submitted by: D Scott Phillips <d.scott.phillips@intel.com> Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D19976 MFC After: 1 week ---- bhyve: set EV_CLEAR for EVFILT_VNODE mevents When an EVFILT_VNODE filter event is triggered, reset it. This fixes the issue where a virtio-blk resize event would cause the mevent thread to consume 100% of the cpu. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33326 ---- bhyve nvme: Add AEN support to NVMe emulation Add Asynchronous Event Notification infrastructure to the NVMe emulation. Reviewed by: imp, grehan MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32952 ---- bhyve nvme: Inform guests of namespace resize Register a "block resize" callback to be notified of changes to the backing storage for the Namespace. Use this to generate an Asynchronous Event Notification, Namespace Attributes Changed when the guest OS provides an Asynchronous Event Request. 
MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32953 ---- bhyve: Only snapshot initialized VirtIO queues If the virtio device is not fully initialized, then suspend fails with: vi_pci_snapshot_queues: invalid address: vq->vq_desc Failed to snapshot virtio-rnd; ret=14 MFC after: 1 week Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D26268 ---- bhyve: passthru: enable BARs before possibly mmap(2)ing them The first time we start bhyve with a passthru device everything is fine as on boot we do enable BARs. If a driver (unload) inside bhyve disables the BAR(s) as some Linux drivers do, we need to make sure we re-enable them on next bhyve start. If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP) the kernel will give us an EBUSY. While we were re-enabling the BAR(s) in the current code loop cfginit() was writing the changes out too late to the real hardware. Move the call to init_msix_table() after the register on the real hardware was updated. That way the kernel will be happy and the mmap will succeed and bhyve will start. Also simplify the code given the last argument to init_msix_table() is unused we do not need to do checks for each bar. [1] MFC after: 3 days PR: 260148 Pointed out by: markj [1] Sponsored by: The FreeBSD Foundation Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D33628 ---- bhyve: clean up trailing whitespaces Clean up trailing whitespaces. No functional changes. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33681 ---- bhyve smbios type 3 structure is incorrect If you look at the SMBIOS specification, we'll find something is missing. In particular at offset 0Dh is supposed to be the OEM-defined field. This should go between security and height. It is not legal to actually skip this and will lead to other folks not properly interpreting later parts of the table. 
https://www.illumos.org/issues/14312 Reviewed by: jhb Submitted by: Robert Mustacchi <rm@fingolfin.org> Obtained from: ilumos MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33682 ---- bhyve: only init MSI-X table if passthru device supports it Some passthru devices only support MSI instead of MSI-X. For those devices the initialization of MSI-X table will fail. Re-add the check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9. MFC after: 3 days X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9 PR: 260148 Reviewed by: manu, bz Differential Revision: https://reviews.freebsd.org/D33728 ---- bhyve: enumerate BARs by size E.g. Framebuffers can require large space and BARs need to be aligned by their size. If BARs aren't allocated by size, it'll cause much fragmentation of the MMIO space. Reduce fragmentation by ordering the BAR allocation on their size to reduce the risk of OUT_OF_MMIO_SPACE issues. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D28278 ---- bhyve: allow reading of fwctl signature multiple times At the moment, you only have one single chance to read the fwctl signature. At boot bhyve is in the state IDENT_WAIT. It's then possible to switch to IDENT_SEND. After bhyve sends the signature, it switches to REQ. From now on it's impossible to switch back to IDENT_SEND to read the signature. For that reason, only a single driver can read the signature. A guest can't use two drivers to identify that fwctl is present. It gets even worse when using OVMF. OVMF uses a library to access fwctl. Therefore, every single OVMF driver would try to read the signature. Currently, only a single OVMF driver accesses the fwctl. So, there's no issue with it yet. However, no OS driver would have a chance to detect fwctl when using OVMF because it's signature was already consumed by OVMF. 
Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D31981 ---- bhyve: add more slop to 64 bit BARs Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables based on this allocation. If the guest decides to relocate those BARs above 4 GB, it could lead to mismatching ACPI tables. Especially when using OVMF with enabled bus enumeration it could cause issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS may be unable to recover from this situation and disables some PCI devices because their BARs are located outside of the MMIO space reported by ACPI. Avoid this situation by giving the guest more space for relocating BARs. Let's be paranoid. The available space for BARs below 4 GB is 512 MB large. Use a slop of 512 MB. It'll allow the guest to relocate all BARs below 4 GB to an address above 4 GB. We could run into issues when we exceeding the memlimit above 4 GB. However, this space has a size of 32 GB. Even when using many PCI device with large BARs like framebuffer or when using multiple PCI busses, it's very unlikely that we run out of space due to the large slop. Additionally, this situation will occur on startup and not at runtime which is much better. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33118 ---- bhyve: dynamically register FwCtl ports Qemu's FwCfg uses the same ports as Bhyve's FwCtl. Static allocated ports wouldn't allow to switch between Qemu's FwCfg and Bhyve's FwCtl. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33496 ---- bhyve: Map the right BAR in init_msix_table() The PBA and MSI-X table can reside in different BARs. 
Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: Correct unmapping of the MSI-X table BAR The starting address passed to mprotect was wrong, so in the case where the last page containing the table is not the last page of the BAR, the wrong region would be unmapped. Reported by: Andy Fiddaman <andy@omniosce.org> Reviewed by: jhb Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33739 ---- bhyve: add nvlist functions for setting unset nodes If an emulation uses those functions instead of set_config_value_node or set_config_value, it allows the config values to get overwritten. Introducing new functions is much more readable than if else statements in the emulation code. Reviewed by: khng MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33770 ---- bhyve: get mediasize for character devices when resizing virtio-blk Reviewed by: imp, allanjude, jhb Differential Revision: https://reviews.freebsd.org/D33403 ---- bhyve/snapshot: fix pthread_create() error check pthread_create() returns 0 on success or an error number on failure. Reviewed by: khng, markj Differential Revision: https://reviews.freebsd.org/D33930 ---- Append Keyboard Layout specified option for using VNC. Part two: Append bhyve -K option for specified keyboard layout with layout setting files every languages. 
Since the cmd option '-k' was used in the meantime it was changed to '-K' PR: 246121 Submitted by: koinec@yahoo.co.jp Reviewed by: grehan@ Differential Revision: https://reviews.freebsd.org/D29473 MFC after: 4 weeks ---- bhyve: ahci: Fix regression with no ports An AHCI controller may be specified with no connected ports. Avoid dumping core in this case for compatibility with existing VM configs. Reviewed by: khng, jhb Fixes: 621b5090487de Refactor configuration management in bhyve. MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D33969 ---- bhyve/block_if: allow DIOCGMEDIASIZE ioctl This is needed to get mediasize of the device after a resize event. I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which disables capsicum. Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...") Differential Revision: https://reviews.freebsd.org/D34013 ---- pkgbase: bhyve: Tag the kbdlayout file to be in the bhyve package ---- bhyve nvme: Advertise v1.4 support Bump advertised NVMe support from v1.3 to v1.4 Reviewed by: allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33564 ---- bhyve nvme: Fix NVM Format completion status The NVM Format command is unique among the Admin commands in that it needs to finish asynchronously. For this reason, the emulation code invented a synthetic completion status (NVME_NO_STATUS) to indicate that the command was still in progress and the command processing loop should not generate a completion message. The implementation used the value 0xffff for the synthetic value as this set both the Status Code and Status Code Type fields to reserved values. Format initialized the completion status to this value and expected error cases to override it with a status code/type appropriate to the situation. The macros used to set the NVMe status are careful not to modify bit 0 (i.e. 
the phase bit), which with the synthetic completion status, causes the phase bit to get out of sync. When running tests in a guest with illegal NVM Format commands, Admin commands would eventually hang because it appeared there were no completions due to the incorrect phase bit value. Fix is to only set NVME_NO_STATUS if the blockif delete command succeeds. While in the neighborhood, add a missing break statement when NVM Format is not supported. Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33565 ---- bhyve nvme: Fix Namespace Specific Set Features Return an error if the feature specified in Set Features is Namespace specific but the Namespace ID uses the Global Namespace tag. Fixes UNH Test 1.2.7 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33566 ---- bhyve nvme: Implement Log Page Offset Modify the Get Log Page command to parse the Log Page Offset fields to support more recent versions of the NVMe specification. Fixes various tests for UNH Test 1.3.* Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33568 ---- bhyve nvme: Add missing Admin opcodes Don't treat unsupported Admin commands as Invalid Opcode. Instead return the proper Invalid Field in Command. Fixes UNH IOL test 1.17.2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33569 ---- bhyve nvme: Remove redundant AER Limit checks The NVMe emulation checked if the Asynchronous Event Request Limit (a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function is only called from nvme_opc_async_event_req() which also checks for exceeding the AERL. 
Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33570 ---- bhyve nvme: Fix Set Features Be more conservative and only support the Features mandatory for an I/O Controller. Avoids a "hang" in UNH test 1.2.10 associated with Predictable Latency Mode Configuration and Host Behavior Support features. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33571 ---- bhyve nvme: Add Temperature Threshold support This adds the ability for a guest OS to send Set / Get Feature, Temperature Threshold commands. The implementation assumes a constant temperature and will generate an Asynchronous Event Notification if the specified threshold is above/below this value. Although the specification allows 9 temperature values, this implementation only implements the Composite Temperature. While in the neighborhood, move the clear of the CSTS register in the reset function after all other cleanup. This avoids a race with the guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe emulation is actually complete with the reset. Fixes UNH IOL 16.0 Test 1.7, cases 1, 2, and 4. Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33572 ---- bhyve nvme: Update v1.4 Identify Controller data Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE). Also, do not advertise secure erase functionality in the Format NVM Attributes field of the Identify Controller data structure as the Controller does not implement secure erase. Fixes UNH ILO Test 1.1, Case 2 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33573 ---- bhyve nvme: Add Select support to Get Features Implement basic support for the SEL field of Get Features. This returns information about Namespace Specific features. 
Fixes UNH ILO 16.0 Test 1.2, Case 13 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33574 ---- bhyve nvme: Fix LBA out-of-range calculation The function which checks for a valid LBA range mistakenly named an input value as NLB ("Number of Logical Blocks") instead of "number of blocks". The NVMe specification defines NLB as a zero-based value (i.e. NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed parameter is a 1's-based value. Fix is to rename the variable to avoid future confusion. While in the neighborhood, also check that the starting LBA is less than the size of the backing storage to avoid an integer overflow. Reviewed by: imp, allanjude, jhb Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33575 ---- bhyve nvme: Fix reported VWC value v1.4 and later NVMe Controllers report "Flush all Namespaces" support differently. Fixes UNH IOL 16.0 Test 2.6, Case 3 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33576 ---- bhyve nvme: Fix Set Features, AEN NVMe Controllers which do not support Endurance Groups must return an error when the Endurance Group Event Aggregate Log Change Notices bit is set in Set Features, Asynchronous Event Configuration. Fixes UNH IOL Test 3.12, Case 8 Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33577 ---- bhyve nvme: Fix Identify Namespace, NSID=ffffffff If the NVMe Controller doesn't support Namespace Management, it should return "Invalid Namespace or Format" when the Host request Identify Namespace with the global NSID value. 
Fixes UNH IOL 16.0 Test 9.1, Case 6 Reviewed by: imp, allanjude Tested by: jason@tubnor.net MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D33578 ---- bhyve/virtio: use correct device id for virtio-scsi Section 4.1.2.1 of the virtio spec states that the transitional PCI device id for a scsi device is 0x1004. Fix suggested by reporter. PR: 259961 Reported by: me@nanaya.pro Reviewed by: imp, jhb Fixes: f9c005a17f4e ("Add bhyve virtio-scsi storage backend support.") Differential Revision: https://reviews.freebsd.org/D34103 ---- Create VM_MEMATTR_DEVICE on all architectures This is intended to be used with memory mapped IO, e.g. from bus_space_map with no flags, or pmap_mapdev. Use this new memory type in the map request configured by resource_init_map_request, and in pciconf. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29692 ---- Use if ... else when printing memory attributes In vmstat there is a switch statement that converts these attributes to a string. As some values can be duplicate we have to hide these from userspace. Replace this switch statement with an if ... else macro that lets us repeat values without a compiler error. Reviewed by: kib MFC after: 2 weeks Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D29703 ---- Remove an always-true check. This fixes a -Wtype-limits error from GCC 9. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D31936 ---- vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. 
Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559 ---- vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/resume of Win10 leaves CPU0 busy handling interrupts. Win10 does not use the LAPIC timer often, and in most cases I see it disabled by writing 0 to the Initial Count Register (for the timer). During resume, restart the timer only for an enabled LAPIC whose timer is also enabled. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448 ---- bhyve: add support for MTRR Some guests or drivers might depend on MTRR to work properly. E.g. the NVIDIA GPU driver won't work without MTRR. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33333
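The offset arithmetic quoted in the "bhyve: ignore low bits of CFGADR" message in the log above can be reproduced with a small C sketch. The function names and the `0xfc` mask are assumptions made for illustration and are not taken from the bhyve source; only the `0xFF` mask, the `cfgoff + (port - CFGDAT)` formula, and the example values come from the commit message. `0xCFC` is the conventional PCI configuration data port.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional PCI configuration mechanism #1 data port (CFGDAT). */
#define CFGDAT_PORT 0x0cfc

/*
 * Behaviour described as buggy in the commit message: the low two bits
 * of the value written to CFGADR leak into the register offset.
 */
static uint32_t
cfg_offset_keep_low_bits(uint32_t cfgadr, uint16_t port)
{
	uint32_t cfgoff = cfgadr & 0xff;

	return (cfgoff + (uint32_t)(port - CFGDAT_PORT));
}

/*
 * Hypothetical fix for illustration: drop the low two bits so that only
 * the data port offset selects the byte within the dword.
 */
static uint32_t
cfg_offset_ignore_low_bits(uint32_t cfgadr, uint16_t port)
{
	uint32_t cfgoff = cfgadr & 0xfc;

	return (cfgoff + (uint32_t)(port - CFGDAT_PORT));
}
```

With the values from the message (`0x80000002` written to CFGADR, then a 2-byte read at CFGDAT + 2), the first function yields offset 0x04 and the second yields the intended 0x02.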
Explain what the two clashing regions are. Reviewed by: grehan, jhb Differential Revision: https://reviews.freebsd.org/D29696 Pull Request: #463 (cherry picked from commit efec757)
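To make the effect of this change concrete, here is a minimal, self-contained C sketch of an overlap check that names both clashing regions. It is not the bhyve code: `struct range_sketch`, `ranges_overlap()` and `describe_overlap()` are hypothetical names, the inclusive `end` field is an assumption, and the real code walks a red-black tree rather than comparing two ranges directly. The message text follows the format string shown in the diff under review.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a registered MMIO range. */
struct range_sketch {
	const char *name;	/* owner of the region */
	uint64_t base;		/* first address */
	uint64_t end;		/* last address, inclusive (assumption) */
};

/* Two inclusive ranges overlap when each starts before the other ends. */
static bool
ranges_overlap(const struct range_sketch *a, const struct range_sketch *b)
{
	return (a->base <= b->end && b->base <= a->end);
}

/*
 * Format the debug message into buf when newr clashes with oldr.
 * Returns true if an overlap was found and the message was written.
 */
static bool
describe_overlap(char *buf, size_t len, const struct range_sketch *newr,
    const struct range_sketch *oldr)
{
	if (!ranges_overlap(newr, oldr))
		return (false);
	snprintf(buf, len,
	    "overlap detected: new %" PRIx64 ":%" PRIx64
	    ", tree %" PRIx64 ":%" PRIx64
	    ", '%s' claims region already claimed for '%s'\n",
	    newr->base, newr->end, oldr->base, oldr->end,
	    newr->name, oldr->name);
	return (true);
}
```

With the two names in the message, a reader can tell which device model tried to register the range and which one already owned it, rather than having to map raw addresses back to devices.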
Correctly set errno on failure and improve the related test. Previously the malloc test would emit an error message but not abort if the errno was not as expected on failure. This was because the return in the null == true case prevented the check for failed == true at the end of check_result from being reached. To resolve this just abort immediately as in the null case. Also add tests of allocations that are expected to fail for calloc and malloc. To make the tests pass we need to set errno in several places, making sure to keep this off the fast path. We must also take care not to attempt to zero nullptr in case of calloc failure. See microsoft/snmalloc#461 and microsoft/snmalloc#463.
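The two points in this message (set errno only on the failure path, and never zero a null pointer when calloc fails) can be illustrated with a toy allocator in C. This is a sketch under stated assumptions, not snmalloc code: `toy_malloc()`, `toy_calloc()`, the 64-byte arena and this `check_result()` are all hypothetical, and the arena ignores alignment because it only hands out bytes for the demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte bump arena; alignment is ignored in this toy. */
static unsigned char toy_arena[64];
static size_t toy_used;

static void *
toy_malloc(size_t size)
{
	if (size > sizeof(toy_arena) - toy_used) {
		errno = ENOMEM;		/* set only on the failure path */
		return (NULL);
	}
	void *p = &toy_arena[toy_used];
	toy_used += size;
	return (p);
}

static void *
toy_calloc(size_t nmemb, size_t size)
{
	/* Reject nmemb * size overflow before multiplying. */
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return (NULL);
	}
	void *p = toy_malloc(nmemb * size);
	if (p == NULL)
		return (NULL);		/* errno already set; do not zero NULL */
	memset(p, 0, nmemb * size);
	return (p);
}

/*
 * Returns true when the result matches the expectation. For an expected
 * failure, both the NULL pointer and errno are checked in the same
 * branch, so an early return cannot skip the errno check.
 */
static bool
check_result(const void *p, bool expect_null, int expected_errno)
{
	if (expect_null)
		return (p == NULL && errno == expected_errno);
	return (p != NULL);
}
```

A real test harness would abort on a mismatch, as the message describes; returning a boolean here just keeps the sketch easy to exercise.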