x/build: run DragonflyBSD VMs on GCE? #23060

Open
bradfitz opened this issue Dec 8, 2017 · 23 comments


@bradfitz bradfitz commented Dec 8, 2017

Looks like Dragonfly now supports virtio:

https://leaf.dragonflybsd.org/cgi/web-man?command=virtio&section=4

So it should run on GCE?

If somebody could prepare make.bash scripts to script the install to prepare bootable images, we could run it on GCE.

See the netbsd, openbsd, and freebsd directories as examples: https://github.com/golang/build/tree/master/env

(The script must run on Linux and use qemu to do the image creation.)

/cc @tdfbsd

@gopherbot gopherbot added this to the Unreleased milestone Dec 8, 2017
@gopherbot gopherbot added the Builders label Dec 8, 2017

@bradfitz bradfitz commented Dec 9, 2017


@bradfitz bradfitz commented Dec 9, 2017

In that thread, @rickard-von-essen says:

> I have a working packer build of DragonFly BSD https://github.com/boxcutter/bsd.
>
> The most interesting parts are the boot_command
> https://github.com/boxcutter/bsd/blob/master/dragonflybsd.json#L5
> and actual installer script https://github.com/boxcutter/bsd/blob/master/http/install.sh.dfly


@bradfitz bradfitz commented Aug 7, 2018

/cc @dmitshur


@bradfitz bradfitz commented Nov 2, 2018

Update: I just ran Dragonfly (5.2.2) at home on QEMU/KVM with virtio-scsi and virtio net and it works fine.

So it should work fine on GCE, of course (which we already heard).
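
For reference, a minimal sketch of the kind of QEMU invocation that exposes the disk through virtio-scsi and the NIC through virtio-net. The helper name, the disk path `dfly.raw`, and the memory size are assumptions for illustration; the exact command used in this test is not given in the thread.

```python
import shlex


def qemu_virtio_argv(disk_path, memory_mb=2048):
    """Build a qemu-system-x86_64 argv with a virtio-scsi disk and virtio-net NIC.

    disk_path and memory_mb are illustrative values, not taken from the thread.
    """
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", str(memory_mb),
        # virtio-scsi controller, with the raw disk attached to it as a SCSI disk
        "-device", "virtio-scsi-pci,id=scsi0",
        "-drive", f"file={disk_path},if=none,format=raw,id=hd0",
        "-device", "scsi-hd,drive=hd0,bus=scsi0.0",
        # user-mode networking behind a virtio-net device
        "-netdev", "user,id=net0",
        "-device", "virtio-net-pci,netdev=net0",
        # serial console on stdio instead of a graphical window
        "-nographic",
    ]


if __name__ == "__main__":
    # Print the command line rather than running it.
    print(shlex.join(qemu_virtio_argv("dfly.raw")))
```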

At this point I'm thinking we should just do this builder "by hand" for now, with a readme file with notes. I'll prepare the image by hand, then shut it down and copy its disk to a GCE image. (uploading it as a sparse tarball)

We can automate it with expect or whatnot later. Perfect is the enemy of good, etc.
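
A rough sketch of the "copy its disk to a GCE image" step, assuming the usual GCE import format: a gzip-compressed, oldgnu-format tarball containing a single file named `disk.raw`. The image name, bucket, and tarball name are hypothetical, and the helper only builds the command lines; it does not run them.

```python
import shlex


def gce_image_commands(image_name, bucket, tarball="dfly-image.tar.gz"):
    """Return argv lists for packaging disk.raw and registering it as a GCE image.

    image_name, bucket and tarball are hypothetical placeholders.
    """
    gcs_uri = f"gs://{bucket}/{tarball}"
    return [
        # -S stores holes sparsely, so a mostly empty disk stays small
        ["tar", "--format=oldgnu", "-Sczf", tarball, "disk.raw"],
        # upload the tarball to a Cloud Storage bucket
        ["gsutil", "cp", tarball, gcs_uri],
        # create the image from the uploaded tarball
        ["gcloud", "compute", "images", "create", image_name,
         "--source-uri", gcs_uri],
    ]


if __name__ == "__main__":
    for argv in gce_image_commands("dragonfly-test", "my-bucket"):
        print(shlex.join(argv))
```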


@bradfitz bradfitz commented Nov 2, 2018

I shut down my KVM/QEMU instance, copied its disk to a new GCE image, and created a GCE VM. It kernel panics on boot (over serial) with:

panic() at panic+0x236 0xffffffff805f8666 
panic() at panic+0x236 0xffffffff805f8666 
vfs_mountroot() at vfs_mountroot+0xfe 0xffffffff80672c7e 
mi_startup() at mi_startup+0x84 0xffffffff805c2a64 
Debugger("panic")
CPU0 stopping CPUs: 0x0000000e
 stopped
Stopped at      Debugger+0x7c:  movb    $0,0xe67a49(%rip)
db> 

So, uh, not as easy as I'd hoped.


@bradfitz bradfitz commented Nov 2, 2018

Perhaps if we already have to do the whole double virtualization thing for Solaris (#15581 (comment)) anyway, we could just reuse that mechanism to run Dragonfly in qemu/kvm under GCE.


@cnst cnst commented Dec 16, 2018

I tried working on this earlier this year (back in 2018-02) and had the image creation scripted, but I hit the same issue: it works just fine on my machines with vanilla QEMU, including with the disk accessible on DFly through DragonFly's vtscsi(4), using the QEMU configuration magic described at http://wiki.netbsd.org/tutorials/how_to_setup_virtio_scsi_with_qemu/, but it still wouldn't work on GCE with GCE's virtio_scsi. Is there any info on how GCE's virtio_scsi differs from QEMU's virtio_scsi?

I also tried running DragonFly BSD side by side with FreeBSD with CAMDEBUG, but it didn't reveal anything obvious. The underlying CAM logic does seem to be quite different, though, so it's probably the one to blame. I didn't run out of ideas, but I did run out of time back in February, and recently my GCE credits ran out as well.

Nested virtualisation sounds interesting. Does it require Linux on GCE, or would FreeBSD also work?


@tuxillo tuxillo commented Feb 14, 2019

@cnst do you have instructions on how you tried DragonFly on GCE?


@gopherbot gopherbot commented Feb 15, 2019

Change https://golang.org/cl/162959 mentions this issue: dashboard, buildlet: add a disabled builder with nested virt, for testing

gopherbot pushed a commit to golang/build that referenced this issue Feb 15, 2019
…ting

This adds a linux-amd64 COS builder that should be just like our
existing linux-amd64 COS builder except that it's using a forked image
that has the VMX license bit enabled for nested virtualization. (GCE
appears to be using the license mechanism as some sort of opt-in
mechanism for features that aren't yet GA; might go away?)

Once this is in, it won't do any new builds as regular+trybot builders
are disabled. But it means I can then use gomote + debugnewvm to work
on preparing the other four image types.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Ic55f17eea17908dba7f58618d8cd162a2ed9b015
Reviewed-on: https://go-review.googlesource.com/c/162959
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

@tuxillo tuxillo commented Feb 17, 2019

I've tried myself and it seems DragonFly is unable to find the disk.
We're working on it already: https://bugs.dragonflybsd.org/issues/3175


@gopherbot gopherbot commented Feb 19, 2019

Change https://golang.org/cl/163057 mentions this issue: buildlet: change image name for COS-with-vmx buildlet

gopherbot pushed a commit to golang/build that referenced this issue Feb 19, 2019
The COS image I'd forked from earlier didn't have CONFIG_KVM or
CONFIG_KVM_INTEL enabled in its kernel, so even though I'd enabled the
VMX license bit for the VM, the kernel was unable to use it.

Now I've instead rebuilt the ChromiumOS "lakitu" board with a modified
kernel config:

   https://cloud.google.com/container-optimized-os/docs/how-to/building-from-open-source

More docs later. Still tinkering. Nothing uses this yet.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Id2839066e67d9ddda939d96c5f4287af3267a769
Reviewed-on: https://go-review.googlesource.com/c/163057
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
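
The commit message above describes a host where the VMX bit was enabled for the VM but the kernel lacked KVM support. A minimal sketch of how one might check both conditions on a Linux host, assuming the standard `/proc/cpuinfo` flags line and the `/dev/kvm` device node; the function names are hypothetical and this is not the check used by the builders.

```python
import os


def cpu_has_vmx(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists vmx."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "vmx" in flags.split():
                return True
    return False


def kvm_usable(cpuinfo_path="/proc/cpuinfo", kvm_dev="/dev/kvm"):
    """True only if the CPU exposes vmx AND the kernel provides /dev/kvm.

    A VM with the vmx flag but a kernel built without CONFIG_KVM or
    CONFIG_KVM_INTEL has the flag but no /dev/kvm.
    """
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return False
    return cpu_has_vmx(text) and os.path.exists(kvm_dev)


if __name__ == "__main__":
    print("nested KVM usable:", kvm_usable())
```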

@gopherbot gopherbot commented Feb 21, 2019

Change https://golang.org/cl/163301 mentions this issue: env/linux-x86-vmx: add new Debian host that's like Container-Optimized OS + vmx

gopherbot pushed a commit to golang/build that referenced this issue Feb 21, 2019
…d OS + vmx

This adds scripts to create a new builder host image that acts like
Container-Optimized OS (has docker, runs konlet on startup) but with a
Debian 9 kernel + userspace that permits KVM for nested
virtualization.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Ib1d3a250556703856083c222be2a70c4e8d91884
Reviewed-on: https://go-review.googlesource.com/c/163301
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

@gopherbot gopherbot commented Oct 21, 2019

Change https://golang.org/cl/202478 mentions this issue: dashboard: update Dragonfly tip policy for ABI change, add release builder

gopherbot pushed a commit to golang/build that referenced this issue Oct 21, 2019
…ilder

From golang/go#34958 (comment) :

> Go's DragonFly support policy is that we support the latest stable
> release primarily, but also try to keep DragonFly master passing, in
> prep for it to become the latest stable release.
>
> But that does mean we need one more builder at the moment.

Updates golang/go#34958
Updates golang/go#23060

Change-Id: I84be7c64eac593dee2252c397f9529deea13605a
Reviewed-on: https://go-review.googlesource.com/c/build/+/202478
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>

@bradfitz bradfitz commented Oct 21, 2019

@tuxillo, looks like no progress on that bug, eh?


@tuxillo tuxillo commented Oct 21, 2019

Thanks for the reminder, I kind of forgot about this one. It's been a tough one anyway. I'll check with the team again next week to see if we can do something.


@cnst cnst commented Oct 21, 2019

@bradfitz I have some time to work on it again, but my credits expired, and trying to sign up for a new account required some sort of extra verification. Is there a way to get credits again to work on this? Also, is there any way to reproduce this bug outside of Google's environment? As per my 2018 comments, our driver works just fine in regular KVM using NetBSD's instructions for activating the codepath.


@bradfitz bradfitz commented Oct 21, 2019

GCP has a Free Tier these days:
https://cloud.google.com/free/

Compute Engine: scalable, high-performance virtual machines.

  • 1 f1-micro instance per month (US regions only, excluding Northern Virginia [us-east4])
  • 30 GB-months HDD
  • 5 GB-months snapshot in select regions
  • 1 GB network egress from North America to all region destinations per month (excluding China and Australia)

There's no way to reproduce it locally. GCP uses KVM but doesn't use QEMU and its implementation of virtio-scsi etc isn't open source.


@cnst cnst commented Oct 21, 2019

@bradfitz How long does it take to recompile the kernel on this free instance? A few hours? It was already taking too long even on non-micro GCP instances compared to 15-year-old hardware.

I think it'd be great if there was a way to reproduce this problem locally, because our virtio-scsi drivers work just fine with anything but the proprietary GCP implementation.

Would it be helpful to provide automation for any other cloud provider?


@bradfitz bradfitz commented Oct 21, 2019

@cnst, I didn't imagine you'd be using the f1-micro installation for compilations. I thought you'd use your normal development environment to build and then use the f1-micro to test boot them on GCE until it worked.


@tuxillo tuxillo commented Oct 23, 2019

@cnst what I did in my tests was to download the latest IMG, mount null it, build kernel with modifications and install it in the mountpoint. Then I used gcloud/gsutil to upload the img and create the disk and the instance. You can retrieve the console output with gcloud iirc.
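
A sketch of the boot-test loop described here, once an image already exists: create a small instance from it, read the serial console, then delete the instance. The instance name, image name, and zone are hypothetical; the helper only builds the `gcloud` command lines.

```python
import shlex


def gce_boot_test_commands(instance, image, zone="us-central1-a",
                           machine_type="f1-micro"):
    """Return argv lists to boot an image on GCE and fetch its serial output.

    instance, image and zone are hypothetical placeholders.
    """
    return [
        # boot a throwaway VM from the uploaded image
        ["gcloud", "compute", "instances", "create", instance,
         "--image", image, "--zone", zone, "--machine-type", machine_type],
        # dump whatever the guest wrote to its serial port (kernel messages, panics)
        ["gcloud", "compute", "instances", "get-serial-port-output", instance,
         "--zone", zone],
        # clean up so the test VM does not keep running
        ["gcloud", "compute", "instances", "delete", instance,
         "--zone", zone, "--quiet"],
    ]


if __name__ == "__main__":
    for argv in gce_boot_test_commands("dfly-boot-test", "dragonfly-test"):
        print(shlex.join(argv))
```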


@bradfitz bradfitz commented Oct 23, 2019

> because our virtio-scsi drivers work just fine with anything but the proprietary GCP implementation.

FWIW, Go runs the following operating systems on GCP that all work with Google's virtio-scsi implementation:

  • Windows
  • Linux
  • OpenBSD
  • FreeBSD
  • NetBSD
  • Plan 9

Either Dragonfly has a bug, or all those operating systems have worked around bugs in Google's implementation. Or both.


@tuxillo tuxillo commented Nov 7, 2019

Just to give a quick update: we've taken some steps in the right direction to fix this. At least the VM now sees the disk, but further changes and testing are needed. I'll update this with more information as soon as we have it.

da0 at vtscsi0 bus 0 target 1 lun 0
da0: <Google PersistentDisk 1> Fixed Direct Access SCSI-6 device 
da0: Serial Number                     
da0: 300.000MB/s transfers
da0: Command Queueing Enabled
da0: 2048MB (4194304 512 byte sectors: 255H 63S/T 261C)
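
As a sanity check on the last line of that output, the reported sector count and geometry can be cross-checked with a little arithmetic. This is only a worked example; the helper names are made up for illustration.

```python
SECTOR_BYTES = 512

# Values copied from the da0 line above.
TOTAL_SECTORS = 4194304
HEADS, SECTORS_PER_TRACK, CYLINDERS = 255, 63, 261


def size_mib(sector_count, sector_bytes=SECTOR_BYTES):
    """Disk size in MiB for a given sector count."""
    return sector_count * sector_bytes // (1024 * 1024)


def chs_sectors(cylinders, heads, sectors_per_track):
    """Number of sectors addressable through a C/H/S geometry."""
    return cylinders * heads * sectors_per_track


if __name__ == "__main__":
    print(size_mib(TOTAL_SECTORS))  # 2048, matching "2048MB"
    addressable = chs_sectors(CYLINDERS, HEADS, SECTORS_PER_TRACK)
    # The geometry covers slightly fewer sectors than the disk holds,
    # because the cylinder count is rounded down.
    print(addressable, TOTAL_SECTORS - addressable)  # 4192965 1339
```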


@bradfitz bradfitz commented Nov 7, 2019

Great!

codebien added a commit to codebien/build that referenced this issue Nov 13, 2019