x/build: find new cloud provider for Solaris, Illumos builders? #15581

Open
bradfitz opened this issue May 6, 2016 · 31 comments

Comments

@bradfitz
Contributor

@bradfitz bradfitz commented May 6, 2016

Now that we have SmartOS builders on Joyent in a custom image using the buildlet, let's use the Joyent API and dynamically create the containers as needed. This should be done by creating a new pool type in x/build/cmd/coordinator, similar to the GCE, reverse, and Kubernetes pool types.

This will be cheaper (run zero when we need zero), and it will also let us scale from 0 to dozens as needed, do sharded builds, and let SmartOS be a trybot. (Currently we just run 2 containers all the time.)

I see lots of joyent stuff at https://godoc.org/?q=joyent

/cc @davecheney @4ad

@bradfitz bradfitz added the Builders label May 6, 2016
@bradfitz bradfitz added this to the Unreleased milestone May 6, 2016
@bradfitz
Contributor Author

@bradfitz bradfitz commented May 6, 2016

Old bug: #9515

@zombiezen zombiezen removed their assignment Mar 17, 2017
@bradfitz
Contributor Author

@bradfitz bradfitz commented Aug 2, 2017

I was just about to file this dup bug, forgetting I'd already filed it, so I'll copy the text I was about to post:


Currently the Joyent builders (GOOS=solaris, but really GOOS=illumos once #20603 happens) are statically created and the instances sit there idle most of the time, long polling the build coordinator for work. And when there is a burst of work, we can't process a burst, because we only have N instances.

That is, they use the buildlet's "reverse" mode, where the buildlets connect to farmer.golang.org and register themselves, rather than being dynamically created.

We currently have three implementations of the coordinator's BuildletPool interface:

  • dynamically create GCE VMs
  • dynamically create GKE containers
  • "reverse" (dedicated machines connected to the coordinator)

It's kinda a waste that we're paying for N static Joyent instances just to run in reverse mode, since Joyent can already quickly spin up containers.

We should write a JoyentBuildletPool implementation of BuildletPool that uses the Joyent API.

Of course, if we could run illumos or OmniOS on GCE that would be more ideal from a less-code-to-write angle, but I don't think they run there yet.

I do see references to EC2 AMIs for illumos and OmniOS, so maybe writing an EC2BuildletPool implementation of the BuildletPool interface is a better use of our time and could be used for other OSes that don't run on GCE's KVM.

In any case, the static reverse builder situation is not ideal.
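
A minimal sketch of what an elastic pool could look like, to make the contrast with the static reverse builders concrete. Everything here is an assumption for illustration: `Pool`, `Provisioner`, `ElasticPool` and `fakeProvisioner` are hypothetical names, the `Pool` interface is a simplified stand-in rather than the coordinator's actual BuildletPool, and no real Joyent or EC2 API call is made.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Buildlet is a hypothetical handle to one running build worker.
type Buildlet struct {
	ID       string
	HostType string
}

// Pool is a simplified stand-in for the coordinator's pool interface.
type Pool interface {
	GetBuildlet(hostType string) (*Buildlet, error)
	Release(b *Buildlet)
}

// Provisioner stands in for a cloud API such as Joyent's or EC2's.
// Both methods are placeholders, not real API calls.
type Provisioner interface {
	Create(hostType string) (id string, err error)
	Destroy(id string) error
}

// ErrAtCapacity is returned when the pool already runs max instances.
var ErrAtCapacity = errors.New("pool at capacity")

// ElasticPool creates an instance per request and destroys it on release,
// so nothing runs while there is no work.
type ElasticPool struct {
	mu     sync.Mutex
	prov   Provisioner
	max    int
	active map[string]bool
}

var _ Pool = (*ElasticPool)(nil) // compile-time interface check

func NewElasticPool(p Provisioner, max int) *ElasticPool {
	return &ElasticPool{prov: p, max: max, active: make(map[string]bool)}
}

// GetBuildlet provisions a new instance unless the pool is at capacity.
// The lock is held across Create to keep the sketch short.
func (p *ElasticPool) GetBuildlet(hostType string) (*Buildlet, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.active) >= p.max {
		return nil, ErrAtCapacity
	}
	id, err := p.prov.Create(hostType)
	if err != nil {
		return nil, fmt.Errorf("creating %s instance: %w", hostType, err)
	}
	p.active[id] = true
	return &Buildlet{ID: id, HostType: hostType}, nil
}

// Release destroys the instance behind b and frees its slot.
func (p *ElasticPool) Release(b *Buildlet) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.active[b.ID] {
		return
	}
	delete(p.active, b.ID)
	_ = p.prov.Destroy(b.ID) // best effort in this sketch
}

// Active reports how many instances currently exist.
func (p *ElasticPool) Active() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.active)
}

// fakeProvisioner hands out sequential IDs without touching any cloud.
type fakeProvisioner struct{ n int }

func (f *fakeProvisioner) Create(hostType string) (string, error) {
	f.n++
	return fmt.Sprintf("%s-%d", hostType, f.n), nil
}

func (f *fakeProvisioner) Destroy(id string) error { return nil }

func main() {
	pool := NewElasticPool(&fakeProvisioner{}, 2)
	b, err := pool.GetBuildlet("host-illumos-amd64")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created", b.ID, "active:", pool.Active())
	pool.Release(b)
	fmt.Println("released, active:", pool.Active())
}
```

The point of the shape is that the instance count tracks demand: it is zero when idle and bounded by `max` during a burst, which is what the static reverse builders cannot offer.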

/cc @adams-sarah @cybrcodr

@4ad
Member

@4ad 4ad commented Aug 2, 2017

I do see references to EC2 AMIs for illumos and OmniOS

The future of OmniOS is uncertain: https://lists.omniti.com/pipermail/omnios-discuss/2017-April/008699.html

@bradfitz bradfitz added the new-builder label Nov 2, 2018
@bradfitz
Contributor Author

@bradfitz bradfitz commented Nov 2, 2018

I just ran OmniOS-CE (the community edition) at home (omniosce-r151026u.iso, 7th of May, 2018) on KVM/QEMU and it works fine and passes all.bash.

It supports running under virtio-net but not virtio-scsi (that driver exists somewhere for illumos, but it's not merged? or not in omnios-ce?). It does, however, support virtio-blk. But GCE doesn't support virtio-blk.

So we can't run OmniOS directly on GCE.

But because GCE now supports nested virtualization, we could do something slightly gross or lovely:

  • boot Linux on GCE that then runs KVM/QEMU to run OmniOS-CE using virtio-blk+virtio-net.

I think that's our best bet for scalable Solaris trybots at this point. It's slightly tedious, but it stays within the GCP ecosystem we're already mostly using and where we have tons of quota, and the network is super fast, not leaving a building.
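
As a rough illustration of the nested setup, the sketch below builds a `qemu-system-x86_64` argument list that attaches the guest disk through virtio-blk (`if=virtio`) and the NIC through virtio-net. The `GuestConfig` type, the `qemuArgs` helper, the image name and the memory and CPU sizes are all assumptions made for the example, not what the builders actually use.

```go
package main

import (
	"fmt"
	"strings"
)

// GuestConfig holds hypothetical settings for one nested guest.
type GuestConfig struct {
	DiskImage string // path to a raw disk image (assumed name)
	MemoryMB  int    // guest memory in MiB
	CPUs      int    // number of virtual CPUs
}

// qemuArgs returns arguments for qemu-system-x86_64 that use KVM
// acceleration, a virtio-blk disk and a virtio-net NIC.
func qemuArgs(c GuestConfig) []string {
	return []string{
		"-enable-kvm",
		"-m", fmt.Sprint(c.MemoryMB),
		"-smp", fmt.Sprint(c.CPUs),
		// if=virtio exposes the drive to the guest as a virtio-blk device.
		"-drive", fmt.Sprintf("file=%s,if=virtio,format=raw", c.DiskImage),
		// User-mode networking backed by a virtio-net device.
		"-netdev", "user,id=net0",
		"-device", "virtio-net-pci,netdev=net0",
		"-nographic",
	}
}

func main() {
	args := qemuArgs(GuestConfig{DiskImage: "omnios.img", MemoryMB: 4096, CPUs: 2})
	fmt.Println("qemu-system-x86_64 " + strings.Join(args, " "))
}
```

The program only prints the command line; actually booting a guest would additionally need a Linux host with KVM available and a prepared disk image.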

/cc @dmitshur

@bradfitz bradfitz added the OS-Solaris label Nov 2, 2018
@bradfitz bradfitz changed the title from "x/build: make joyent SmartOS solaris builders elastic, be trybots" to "x/build: make Solaris trybots/gomotes, somehow" Nov 2, 2018
@gopherbot

@gopherbot gopherbot commented Feb 15, 2019

Change https://golang.org/cl/162959 mentions this issue: dashboard, buildlet: add a disabled builder with nested virt, for testing

gopherbot pushed a commit to golang/build that referenced this issue Feb 15, 2019
…ting

This adds a linux-amd64 COS builder that should be just like our
existing linux-amd64 COS builder except that it's using a forked image
that has the VMX license bit enabled for nested virtualization. (GCE
appears to be using the license mechanism as some sort of opt-in
mechanism for features that aren't yet GA; might go away?)

Once this is in, it won't do any new builds as regular+trybot builders
are disabled. But it means I can then use gomote + debugnewvm to work
on preparing the other four image types.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Ic55f17eea17908dba7f58618d8cd162a2ed9b015
Reviewed-on: https://go-review.googlesource.com/c/162959
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@gopherbot

@gopherbot gopherbot commented Feb 19, 2019

Change https://golang.org/cl/163057 mentions this issue: buildlet: change image name for COS-with-vmx buildlet

gopherbot pushed a commit to golang/build that referenced this issue Feb 19, 2019
The COS image I'd forked from earlier didn't have CONFIG_KVM or
CONFIG_KVM_INTEL enabled in its kernel, so even though I'd enabled the
VMX license bit for the VM, the kernel was unable to use it.

Now I've instead rebuilt the ChromiumOS "lakitu" board with a modified
kernel config:

   https://cloud.google.com/container-optimized-os/docs/how-to/building-from-open-source

More docs later. Still tinkering. Nothing uses this yet.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Id2839066e67d9ddda939d96c5f4287af3267a769
Reviewed-on: https://go-review.googlesource.com/c/163057
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
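
The commit message above describes two separate preconditions for nested virtualization: the VM has to expose the VMX CPU feature, and the guest kernel has to be built with KVM support. A small sketch of how a Linux host could check both follows, assuming the usual `/proc/cpuinfo` "flags" line and the `/dev/kvm` device node; the helper names `hasVMX` and `kvmDevicePresent` are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasVMX reports whether any "flags" line in /proc/cpuinfo-style text
// lists the vmx feature.
func hasVMX(cpuinfo string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if !ok || strings.TrimSpace(name) != "flags" {
			continue
		}
		for _, f := range strings.Fields(value) {
			if f == "vmx" {
				return true
			}
		}
	}
	return false
}

// kvmDevicePresent reports whether /dev/kvm exists, which is normally
// only the case when the kernel has KVM support built in or loaded.
func kvmDevicePresent() bool {
	_, err := os.Stat("/dev/kvm")
	return err == nil
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("vmx flag exposed:", hasVMX(string(data)))
	fmt.Println("/dev/kvm present:", kvmDevicePresent())
}
```

A host in the state the commit describes (VMX enabled for the VM, kernel without CONFIG_KVM) would be expected to report the flag but no `/dev/kvm`.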
@gopherbot

@gopherbot gopherbot commented Feb 21, 2019

Change https://golang.org/cl/163301 mentions this issue: env/linux-x86-vmx: add new Debian host that's like Container-Optimized OS + vmx

gopherbot pushed a commit to golang/build that referenced this issue Feb 21, 2019
…d OS + vmx

This adds scripts to create a new builder host image that acts like
Container-Optimized OS (has docker, runs konlet on startup) but with a
Debian 9 kernel + userspace that permits KVM for nested
virtualization.

Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)

Change-Id: Ib1d3a250556703856083c222be2a70c4e8d91884
Reviewed-on: https://go-review.googlesource.com/c/163301
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@andybons
Member

@andybons andybons commented Sep 30, 2019

Joyent Public Cloud is closing down November 9, 2019.

@andybons andybons changed the title from "x/build: make Solaris trybots/gomotes, somehow" to "x/build: move Solaris builders off Joyent due to EOL announcement" Sep 30, 2019
@andybons andybons added the Soon label Sep 30, 2019
@bradfitz bradfitz added help wanted and removed Soon labels Oct 10, 2019
@gopherbot

@gopherbot gopherbot commented Oct 10, 2019

Change https://golang.org/cl/200219 mentions this issue: dashboard, cmd/coordinator: remove Joyent builders

gopherbot pushed a commit to golang/build that referenced this issue Oct 10, 2019
Joyent.com is shutting down their public cloud, so we no longer
have our GOOS=solaris or GOOS=illumos builders there.

Maybe somebody will find a new place to run them. Or maybe the ports
will be abandoned. We'll see.

Updates golang/go#15581

Change-Id: I0590227ce61b6b298b6aa4554e5e3bc9e4c464b5
Reviewed-on: https://go-review.googlesource.com/c/build/+/200219
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@bradfitz bradfitz changed the title from "x/build: move Solaris builders off Joyent due to EOL announcement" to "x/build: find new cloud provider for Solaris, Illumos builders?" Oct 10, 2019
@bcmills
Member

@bcmills bcmills commented Oct 16, 2019

We appear to no longer have any Illumos builders. Should we file an issue to remove/deprecate the port in 1.14?

@bradfitz
Contributor Author

@bradfitz bradfitz commented Oct 16, 2019

@jclulow, does it run on GCE yet? (virtio-scsi was WIP last I heard?)

@jclulow
Contributor

@jclulow jclulow commented Oct 16, 2019

The Virtio SCSI support is still a WIP, but I'm circling back around to look at it. The other critical issue we had with GCE was this bug in the GCE hypervisor itself -- but I received notification that it's been fixed in the last week, so I'm going to try it out!

In the interim, I've seen people asking for some kind of key for a builder on the mailing list. If I can provide a zone similar to the one that was provided by Joyent, is that something I can get configured as a stop gap for this week?

@bradfitz
Contributor Author

@bradfitz bradfitz commented Oct 16, 2019

@jclulow, it's a key but also configuration on our side. See the CL in github/golang/build recently where I removed illumos and send one to add it back, modified. Then I'll send you a key.

@jclulow
Contributor

@jclulow jclulow commented Oct 16, 2019

@jclulow, it's a key but also configuration on our side. See the CL in github/golang/build recently where I removed illumos and send one to add it back, modified. Then I'll send you a key.

Do you mean this one?

golang/build@b61ecd0

https://go-review.googlesource.com/c/build/+/200219

I'll have a look!

@bradfitz
Contributor Author

@bradfitz bradfitz commented Oct 16, 2019

Yup.

@gopherbot

@gopherbot gopherbot commented Oct 17, 2019

Change https://golang.org/cl/201597 mentions this issue: dashboard: add interim illumos builder

@jclulow
Contributor

@jclulow jclulow commented Oct 17, 2019

On a Linux machine, I ran:

GOOS=illumos GOARCH=amd64 BOOTSTRAP_FORMAT=mintgz ./bootstrap.bash

I've made this available inside the zone:

[root@gobuild1 ~]# /opt/go/bootstrap/bin/go version
go version devel +dad616375f Wed Oct 16 18:27:16 2019 +0000 illumos/amd64

I also built a stage0 binary from cmd/buildlet/stage0 in the build repo, and I've run that under SMF in the zone with this environment:

"HOME": "/home/gobuild",
"GOROOT_BOOTSTRAP": "/opt/go/bootstrap",
"USER": "gobuild",
"LOGNAME": "gobuild",
"PATH": "/usr/bin:/usr/sbin:/sbin:/opt/local/bin:/opt/local/sbin:/opt/go/bootstrap/bin",
"TMPDIR": "/var/tmp",
"LANG": "en_US.UTF-8",

I was able to use curl to get the buildlet to unpack a tar of the Go source and build it in the work directory. Once I add GO_BUILDER_ENV=host-illumos-amd64-jclulow to the environment, the buildlet then wants the key:

stage0: 2019/10/17 00:53:34 bootstrap binary running
stage0: 2019/10/17 00:53:34 waiting for network.
stage0: 2019/10/17 00:53:34 network up after 300ms
stage0: 2019/10/17 00:53:34 downloading https://storage.googleapis.com/go-builder-data/buildlet.illumos-amd64 to ./buildlet.exe ...
stage0: 2019/10/17 00:53:34 downloaded ./buildlet.exe (14194957 bytes)
stage0: 2019/10/17 00:53:34 downloaded buildlet in 100ms
2019/10/17 00:53:34 buildlet starting.
2019/10/17 00:53:34 failed to find key for host-illumos-amd64-jclulow: cannot read key file "/home/gobuild/.gobuildkey-host-illumos-amd64-jclulow": open /home/gobuild/.gobuildkey-host-illumos-amd64-jclulow: no such file or directory
stage0: 2019/10/17 00:53:34 Error running buildlet: exit status 1
...

So I think this is all good to go, with the addition to the dashboard in the CL? I didn't put in a health check entry because it seems like that's just for infrastructure that's currently managed by the Go team.

Please let me know what to do next!
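
For anyone reproducing this setup, the key lookup seen in the log can be checked ahead of time. The sketch below only mirrors the path shape that appears in the error message (`$HOME/.gobuildkey-<builder name>`); `keyFilePath` is a hypothetical helper, not the buildlet's own function, and the buildlet may consult other locations that the log does not show.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// keyFilePath returns the key file location in the shape shown by the
// buildlet's error message: <home>/.gobuildkey-<builder>.
func keyFilePath(home, builder string) string {
	return filepath.Join(home, ".gobuildkey-"+builder)
}

func main() {
	home := os.Getenv("HOME")
	builder := os.Getenv("GO_BUILDER_ENV")
	if home == "" || builder == "" {
		fmt.Println("set HOME and GO_BUILDER_ENV first")
		return
	}
	path := keyFilePath(home, builder)
	if _, err := os.Stat(path); err != nil {
		fmt.Println("key file missing:", err)
		return
	}
	fmt.Println("key file present:", path)
}
```

Run with the same environment as the SMF service, this reports whether the file the buildlet asked for is in place.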

@gopherbot

@gopherbot gopherbot commented Oct 17, 2019

Change https://golang.org/cl/201740 mentions this issue: doc/go1.14.html: add some TODOs about various ports

gopherbot pushed a commit that referenced this issue Oct 17, 2019
Updates #15581
Updates #34368

Change-Id: Ife3be7ed484cbe87960bf972ac701954d86127d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/201740
Reviewed-by: Bryan C. Mills <bcmills@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Oct 17, 2019
While the work to make illumos a first class GCE guest is completed, use
this interim zone provided by an illumos community member to run illumos
builds.

Updates golang/go#15581

Change-Id: I1784847e5407894d01ce0aadf489b38d7e5c1924
Reviewed-on: https://go-review.googlesource.com/c/build/+/201597
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@nwilkens

@nwilkens nwilkens commented Oct 21, 2019

I'd be happy to sponsor build hosts at https://mnx.io to offset the JPC EOL issues. Feel free to discuss needs here, or directly via email nick @ mnx io.

@rorth

@rorth rorth commented Nov 6, 2019

I just noticed the Solaris comment here, which is mostly wrong: while Shawn Walker has left Oracle, the builder never ran at an Oracle site, but on a system maintained by me.

Oracle Solaris certainly still is maintained (e.g. I'm running current betas), and I've sort of taken over maintaining the builder.

My primary interest is to get early warning when upstream golang changes break the Solaris support, but I'm also the GCC Solaris maintainer with an interest in keeping gccgo working. Ian has access to several Solaris systems at our site to investigate issues if necessary.

@bradfitz
Contributor Author

@bradfitz bradfitz commented Nov 6, 2019

@rorth, great! I'll update our notes.

Can you provide any more info about the machine/VM specs and its OS version?

@gopherbot

@gopherbot gopherbot commented Nov 6, 2019

Change https://golang.org/cl/205600 mentions this issue: dashboard: update Solaris owner

gopherbot pushed a commit to golang/build that referenced this issue Nov 6, 2019
Updates golang/go#15581

Change-Id: Idae332b234d18f0bd90eb354f611fd5a824feb85
Reviewed-on: https://go-review.googlesource.com/c/build/+/205600
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@rorth

@rorth rorth commented Nov 6, 2019

@dmitshur
Member

@dmitshur dmitshur commented Feb 3, 2020

@rorth One of the outstanding TODOs in the Go 1.14 release notes is:

TODO: announce something about the Go Solaris port? Solaris itself is unmaintained? The builder is still running at Oracle, but the employee who set it up left the company and we have no way to maintain it.

From https://build.golang.org/, I see the solaris-amd64-oraclerel builder is passing for main Go repository (on tip, release-branch.go1.13 and release-branch.go1.12) and golang.org/x repos (also on tip, release-branch.go1.13 and release-branch.go1.12).

Based on your #15581 (comment) above, it sounds to me that we can resolve that TODO by not saying anything about Solaris in the Go 1.14 release notes. Does that sound right to you, or do you think we should say something about Solaris itself not being maintained (I'm not very familiar with its state)? Would you mind sending a CL to doc/go1.14.html to address that TODO? Thank you.

Edit: I've sent CL 217738.

/cc @toothrot @cagedmantis @golang/osp-team

@dmitshur
Member

@dmitshur dmitshur commented Feb 4, 2020

@jclulow Thanks for adding the interim illumos-amd64 builder in CL 201597. An outstanding TODO in the Go 1.14 release notes is:

TODO: is Illumos up with a builder and passing? https://golang.org/issue/15581.

I've checked, and it is up and passing on Go tip and release-branch.go1.13 (with one failure that appears to be flaky due to being out of memory). It's also passing on all golang.org/x repos on tip and release-branch.go1.13.

Would you like to send a CL to update the release notes, resolving that TODO?

Edit: I've sent CL 217737.

/cc @cagedmantis @toothrot @golang/osp-team

@gopherbot

@gopherbot gopherbot commented Feb 4, 2020

Change https://golang.org/cl/217737 mentions this issue: doc/go1.14: remove TODO about Illumos port

@gopherbot

@gopherbot gopherbot commented Feb 4, 2020

Change https://golang.org/cl/217738 mentions this issue: doc/go1.14: remove TODO about Solaris port

@dmitshur
Member

@dmitshur dmitshur commented Feb 4, 2020

@rorth I've sent CL 217738 that implements what I described in #15581 (comment). Please take a look if you can. Also, is there an email we can use to reach you (either directly, or for Gerrit code reviews)? Thank you.

@dmitshur
Member

@dmitshur dmitshur commented Feb 4, 2020

@jclulow I've sent CL 217737 for #15581 (comment) and added you as a reviewer.

gopherbot pushed a commit that referenced this issue Feb 4, 2020
There is an active builder that was added in CL 201597,
and it is passing on Go tip and release-branch.go1.13
(with one failure that appears to be flaky due to being
out of memory). It's also passing on all golang.org/x repos
on tip and release-branch.go1.13. It's not configured to
run on Go 1.12 release branches.

Updates #36878
Updates #15581

Change-Id: I4ed7fc62c11a09743832fca39bd61fa0cf6e7ded
Reviewed-on: https://go-review.googlesource.com/c/go/+/217737
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit that referenced this issue Feb 4, 2020
The solaris-amd64-oraclerel builder is passing for the main Go repo
(on tip and release branches for 1.13 and 1.12), and golang.org/x repos
(also on tip and release branches for 1.13 and 1.12).

The builder is still maintained as described at
https://golang.org/issue/15581#issuecomment-550368581.

Updates #36878
Updates #15581

Change-Id: Icc6f7529ca2e05bb34f09ce4363d9582e80829c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/217738
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@rorth

@rorth rorth commented Feb 6, 2020

@rorth

@rorth rorth commented Feb 6, 2020
