x/build: find new cloud provider for Solaris, Illumos builders? #15581

Open
bradfitz opened this issue May 6, 2016 · 18 comments

Comments

@bradfitz
Member

commented May 6, 2016

Now that we have SmartOS builders on Joyent in a custom image using the buildlet, let's use the Joyent API and dynamically create the containers as needed. This should be done by creating a new pool type in x/build/cmd/coordinator, similar to the GCE, reverse, and Kubernetes pool types.

This will be cheaper (we run zero containers when we need zero), and it will also let us scale from 0 to dozens as needed, do sharded builds, and make SmartOS a trybot. (Currently we just run 2 containers all the time.)

I see lots of joyent stuff at https://godoc.org/?q=joyent

/cc @davecheney @4ad

@bradfitz bradfitz added the Builders label May 6, 2016
@bradfitz bradfitz added this to the Unreleased milestone May 6, 2016
@bradfitz

Member Author

commented May 6, 2016

Old bug: #9515

@zombiezen zombiezen removed their assignment Mar 17, 2017
@bradfitz

Member Author

commented Aug 2, 2017

I was just about to file this dup bug, forgetting I'd already filed it, so I'll copy the text I was about to post:


Currently the Joyent builders (GOOS=solaris, but really GOOS=illumos once #20603 happens) are statically created and the instances sit there idle most of the time, long polling the build coordinator for work. And when there is a burst of work, we can't process a burst, because we only have N instances.

That is, they use the buildlet's "reverse" mode, where the buildlets connect to farmer.golang.org and register themselves, rather than being dynamically created.

We currently have three implementations of the coordinator's BuildletPool interface:

  • dynamically create GCE VMs
  • dynamically create GKE containers
  • "reverse" (dedicated machines connected to the coordinator)

It's kinda a waste that we're paying for N static Joyent instances just to run in reverse mode, since Joyent can already quickly spin up containers.

We should write a JoyentBuildletPool implementation of BuildletPool that talks to the Joyent API.
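To make the shape of such a pool concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the `Pool` and `Provisioner` interfaces, the `Instance` type, and the in-memory `fakeCloud` are hypothetical and are not the coordinator's real BuildletPool interface or the Joyent API. It only shows the idea of creating an instance when work arrives and destroying it afterwards, so that zero instances run when there is no work.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Instance is a hypothetical handle to one builder instance.
type Instance struct{ ID string }

// Provisioner is a hypothetical stand-in for a cloud API (Joyent, EC2, ...).
type Provisioner interface {
	Create(hostType string) (*Instance, error)
	Destroy(inst *Instance) error
}

// Pool is a hypothetical, simplified pool interface. The real
// BuildletPool interface in x/build/cmd/coordinator differs.
type Pool interface {
	Get(hostType string) (*Instance, error)
	Release(inst *Instance) error
}

// ErrNoCapacity is returned when the pool already runs max instances.
var ErrNoCapacity = errors.New("pool: at capacity")

// DynamicPool creates instances on demand, up to max at a time.
type DynamicPool struct {
	cloud Provisioner
	max   int

	mu     sync.Mutex
	active map[string]*Instance
}

// Compile-time check that DynamicPool satisfies the sketch interface.
var _ Pool = (*DynamicPool)(nil)

func NewDynamicPool(cloud Provisioner, max int) *DynamicPool {
	return &DynamicPool{cloud: cloud, max: max, active: make(map[string]*Instance)}
}

// Get creates a fresh instance. For simplicity the lock is held during
// the cloud call; a real pool would likely avoid that.
func (p *DynamicPool) Get(hostType string) (*Instance, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.active) >= p.max {
		return nil, ErrNoCapacity
	}
	inst, err := p.cloud.Create(hostType)
	if err != nil {
		return nil, err
	}
	p.active[inst.ID] = inst
	return inst, nil
}

// Release destroys the instance so nothing sits idle.
func (p *DynamicPool) Release(inst *Instance) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.active[inst.ID]; !ok {
		return fmt.Errorf("pool: unknown instance %q", inst.ID)
	}
	delete(p.active, inst.ID)
	return p.cloud.Destroy(inst)
}

// Active reports how many instances currently exist.
func (p *DynamicPool) Active() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.active)
}

// fakeCloud is an in-memory Provisioner used only for this example.
type fakeCloud struct{ n int }

func (c *fakeCloud) Create(hostType string) (*Instance, error) {
	c.n++
	return &Instance{ID: fmt.Sprintf("%s-%d", hostType, c.n)}, nil
}

func (c *fakeCloud) Destroy(*Instance) error { return nil }

func main() {
	pool := NewDynamicPool(&fakeCloud{}, 2)
	inst, err := pool.Get("host-illumos")
	if err != nil {
		fmt.Println("get failed:", err)
		return
	}
	fmt.Println("created", inst.ID, "active:", pool.Active())
	if err := pool.Release(inst); err != nil {
		fmt.Println("release failed:", err)
		return
	}
	fmt.Println("active after release:", pool.Active())
}
```

Swapping `fakeCloud` for a type that wraps a real provider client would be the bulk of the work; the capacity limit stands in for whatever quota the provider imposes.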

Of course, if we could run illumos or OmniOS on GCE that would be more ideal from a less-code-to-write angle, but I don't think they run there yet.

I do see references to EC2 AMIs for illumos and OmniOS, so maybe writing an EC2BuildletPool implementation of the BuildletPool interface is a better use of our time and could be used for other OSes that don't run on GCE's KVM.

In any case, the static reverse builder situation is not ideal.

/cc @adams-sarah @cybrcodr

@4ad

Member

commented Aug 2, 2017

> I do see references to EC2 AMIs for illumos and OmniOS

The future of OmniOS is uncertain: https://lists.omniti.com/pipermail/omnios-discuss/2017-April/008699.html

@bradfitz bradfitz added the new-builder label Nov 2, 2018
@bradfitz

Member Author

commented Nov 2, 2018

I just ran OmniOS-CE (the community edition) at home (omniosce-r151026u.iso, 7th of May, 2018) on KVM/QEMU and it works fine and passes all.bash.

It supports running under virtio-net but not virtio-scsi (that driver exists somewhere for illumos, but it's not merged? or not in omnios-ce?). It does, however, support virtio-blk. But GCE doesn't support virtio-blk.

So we can't run OmniOS directly on GCE.

But because GCE now supports nested virtualization, we could do something slightly gross or lovely:

  • boot Linux on GCE that then runs KVM/QEMU to run OmniOS-CE using virtio-blk+virtio-net.

I think that's our best bet for scalable Solaris trybots at this point. It's slightly tedious, but it stays within the GCP ecosystem we're already mostly using and where we have tons of quota, and the network is super fast, not leaving a building.

/cc @dmitshur

@bradfitz bradfitz added the OS-Solaris label Nov 2, 2018
@bradfitz bradfitz changed the title from "x/build: make joyent SmartOS solaris builders elastic, be trybots" to "x/build: make Solaris trybots/gomotes, somehow" Nov 2, 2018
@gopherbot

commented Feb 15, 2019

Change https://golang.org/cl/162959 mentions this issue: dashboard, buildlet: add a disabled builder with nested virt, for testing

gopherbot pushed a commit to golang/build that referenced this issue Feb 15, 2019
…ting

This adds a linux-amd64 COS builder that should be just like our
existing linux-amd64 COS builder except that it's using a forked image
that has the VMX license bit enabled for nested virtualization. (GCE
appears to be using the license mechanism as some sort of opt-in
mechanism for features that aren't yet GA; might go away?)

Once this is in, it won't do any new builds as regular+trybot builders
are disabled. But it means I can then use gomote + debugnewvm to work
on preparing the other four image types.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Ic55f17eea17908dba7f58618d8cd162a2ed9b015
Reviewed-on: https://go-review.googlesource.com/c/162959
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@gopherbot

commented Feb 19, 2019

Change https://golang.org/cl/163057 mentions this issue: buildlet: change image name for COS-with-vmx buildlet

gopherbot pushed a commit to golang/build that referenced this issue Feb 19, 2019
The COS image I'd forked from earlier didn't have CONFIG_KVM or
CONFIG_KVM_INTEL enabled in its kernel, so even though I'd enabled the
VMX license bit for the VM, the kernel was unable to use it.

Now I've instead rebuilt the ChromiumOS "lakitu" board with a modified
kernel config:

   https://cloud.google.com/container-optimized-os/docs/how-to/building-from-open-source

More docs later. Still tinkering. Nothing uses this yet.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Id2839066e67d9ddda939d96c5f4287af3267a769
Reviewed-on: https://go-review.googlesource.com/c/163057
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
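The failure described in the commit message above (VMX exposed to the VM, but a kernel built without CONFIG_KVM) can be detected from inside the host. The Go sketch below is one assumed way to check: it looks for the Intel `vmx` flag in `/proc/cpuinfo` and for the `/dev/kvm` device node. It is not code from the build repo, and it is Linux-specific.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasVMX reports whether text in /proc/cpuinfo format lists the Intel
// "vmx" CPU flag. (AMD hosts would advertise "svm" instead.)
func hasVMX(cpuinfo string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "flags") {
			continue
		}
		_, flags, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		for _, f := range strings.Fields(flags) {
			if f == "vmx" {
				return true
			}
		}
	}
	return false
}

// kvmDevicePresent reports whether the kernel exposes /dev/kvm, which
// is absent when KVM support is not built in or loaded.
func kvmDevicePresent() bool {
	_, err := os.Stat("/dev/kvm")
	return err == nil
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("CPU advertises vmx:", hasVMX(string(data)))
	fmt.Println("/dev/kvm present:", kvmDevicePresent())
}
```

A host where the first check is true and the second is false matches the situation in the commit message: the hardware feature is available, but the kernel cannot use it.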
@gopherbot

commented Feb 21, 2019

Change https://golang.org/cl/163301 mentions this issue: env/linux-x86-vmx: add new Debian host that's like Container-Optimized OS + vmx

gopherbot pushed a commit to golang/build that referenced this issue Feb 21, 2019
…d OS + vmx

This adds scripts to create a new builder host image that acts like
Container-Optimized OS (has docker, runs konlet on startup) but with a
Debian 9 kernel + userspace that permits KVM for nested
virtualization.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Ib1d3a250556703856083c222be2a70c4e8d91884
Reviewed-on: https://go-review.googlesource.com/c/163301
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@andybons

Member

commented Sep 30, 2019

Joyent Public Cloud is closing down November 9, 2019.

@andybons andybons changed the title from "x/build: make Solaris trybots/gomotes, somehow" to "x/build: move Solaris builders off Joyent due to EOL announcement" Sep 30, 2019
@andybons andybons added the Soon label Sep 30, 2019
@bradfitz bradfitz added help wanted and removed Soon labels Oct 10, 2019
@gopherbot

commented Oct 10, 2019

Change https://golang.org/cl/200219 mentions this issue: dashboard, cmd/coordinator: remove Joyent builders

gopherbot pushed a commit to golang/build that referenced this issue Oct 10, 2019
Joyent.com is shutting down their public cloud, so we no longer
have our GOOS=solaris or GOOS=illumos builders there.

Maybe somebody will find a new place to run them. Or maybe the ports
will be abandoned. We'll see.

Updates golang/go#15581

Change-Id: I0590227ce61b6b298b6aa4554e5e3bc9e4c464b5
Reviewed-on: https://go-review.googlesource.com/c/build/+/200219
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@bradfitz bradfitz changed the title from "x/build: move Solaris builders off Joyent due to EOL announcement" to "x/build: find new cloud provider for Solaris, Illumos builders?" Oct 10, 2019
@bcmills

Member

commented Oct 16, 2019

We appear to no longer have any Illumos builders. Should we file an issue to remove/deprecate the port in 1.14?

@bradfitz

Member Author

commented Oct 16, 2019

@jclulow, does it run on GCE yet? (virtio-scsi was WIP last I heard?)

@jclulow

commented Oct 16, 2019

The Virtio SCSI support is still a WIP, but I'm circling back around to look at it. The other critical issue we had with GCE was this bug in the GCE hypervisor itself -- but I received notification that it's been fixed in the last week, so I'm going to try it out!

In the interim, I've seen people asking for some kind of key for a builder on the mailing list. If I can provide a zone similar to the one that was provided by Joyent, is that something I can get configured as a stop gap for this week?

@bradfitz

Member Author

commented Oct 16, 2019

@jclulow, it's a key but also configuration on our side. See the CL in github/golang/build recently where I removed illumos and send one to add it back, modified. Then I'll send you a key.

@jclulow

commented Oct 16, 2019

> @jclulow, it's a key but also configuration on our side. See the CL in github/golang/build recently where I removed illumos and send one to add it back, modified. Then I'll send you a key.

Do you mean this one?

golang/build@b61ecd0

https://go-review.googlesource.com/c/build/+/200219

I'll have a look!

@bradfitz

Member Author

commented Oct 16, 2019

Yup.

@gopherbot

commented Oct 17, 2019

Change https://golang.org/cl/201597 mentions this issue: dashboard: add interim illumos builder

@jclulow

commented Oct 17, 2019

On a Linux machine, I ran:

GOOS=illumos GOARCH=amd64 BOOTSTRAP_FORMAT=mintgz ./bootstrap.bash

I've made this available inside the zone:

[root@gobuild1 ~]# /opt/go/bootstrap/bin/go version
go version devel +dad616375f Wed Oct 16 18:27:16 2019 +0000 illumos/amd64

I also built a stage0 binary from cmd/buildlet/stage0 in the build repo, and I've run that under SMF in the zone with this environment:

"HOME": "/home/gobuild",
"GOROOT_BOOTSTRAP": "/opt/go/bootstrap",
"USER": "gobuild",
"LOGNAME": "gobuild",
"PATH": "/usr/bin:/usr/sbin:/sbin:/opt/local/bin:/opt/local/sbin:/opt/go/bootstrap/bin",
"TMPDIR": "/var/tmp",
"LANG": "en_US.UTF-8",

I was able to use curl to get the buildlet to unpack a tar of the Go source and build it in the work directory. Once I add GO_BUILDER_ENV=host-illumos-amd64-jclulow to the environment, the buildlet then wants the key:

stage0: 2019/10/17 00:53:34 bootstrap binary running
stage0: 2019/10/17 00:53:34 waiting for network.
stage0: 2019/10/17 00:53:34 network up after 300ms
stage0: 2019/10/17 00:53:34 downloading https://storage.googleapis.com/go-builder-data/buildlet.illumos-amd64 to ./buildlet.exe ...
stage0: 2019/10/17 00:53:34 downloaded ./buildlet.exe (14194957 bytes)
stage0: 2019/10/17 00:53:34 downloaded buildlet in 100ms
2019/10/17 00:53:34 buildlet starting.
2019/10/17 00:53:34 failed to find key for host-illumos-amd64-jclulow: cannot read key file "/home/gobuild/.gobuildkey-host-illumos-amd64-jclulow": open /home/gobuild/.gobuildkey-host-illumos-amd64-jclulow: no such file or directory
stage0: 2019/10/17 00:53:34 Error running buildlet: exit status 1
...
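The log shows the buildlet looking for its key at `$HOME/.gobuildkey-<GO_BUILDER_ENV>`. The Go sketch below reproduces only that path construction, inferred from the error message, so an operator can check whether the key is in place before starting the service. It is not the buildlet's own lookup code, and `keyFilePath` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// keyFilePath returns the key file location seen in the log above:
// <home>/.gobuildkey-<builder env>. The name is a hypothetical helper.
func keyFilePath(home, builderEnv string) string {
	return filepath.Join(home, ".gobuildkey-"+builderEnv)
}

func main() {
	// Both variables appear in the service environment described above.
	path := keyFilePath(os.Getenv("HOME"), os.Getenv("GO_BUILDER_ENV"))
	if _, err := os.Stat(path); err != nil {
		fmt.Println("key not installed yet:", err)
		return
	}
	fmt.Println("key found at", path)
}
```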

So I think this is all good to go, with the addition to the dashboard in the CL? I didn't put in a health check entry because it seems like that's just for infrastructure that's currently managed by the Go team.

Please let me know what to do next!

@gopherbot

commented Oct 17, 2019

Change https://golang.org/cl/201740 mentions this issue: doc/go1.14.html: add some TODOs about various ports

gopherbot pushed a commit that referenced this issue Oct 17, 2019
Updates #15581
Updates #34368

Change-Id: Ife3be7ed484cbe87960bf972ac701954d86127d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/201740
Reviewed-by: Bryan C. Mills <bcmills@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Oct 17, 2019
While the work to make illumos a first class GCE guest is completed, use
this interim zone provided by an illumos community member to run illumos
builds.

Updates golang/go#15581

Change-Id: I1784847e5407894d01ce0aadf489b38d7e5c1924
Reviewed-on: https://go-review.googlesource.com/c/build/+/201597
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>