Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build: add LUCI linux-ppc64le-power10 builder #64660

Closed
pmur opened this issue Dec 11, 2023 · 23 comments
Closed

x/build: add LUCI linux-ppc64le-power10 builder #64660

pmur opened this issue Dec 11, 2023 · 23 comments
Assignees
Labels
arch-ppc64x Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done. new-builder
Milestone

Comments

@pmur
Copy link
Contributor

pmur commented Dec 11, 2023

This will be a builder for GOPPC64=power10 GOARCH=ppc64le GOOS=linux.

The hostname should be linux-ppc64le-power10. The csr is attached: linux-ppc64le-power10.csr.txt

@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Dec 11, 2023
@gopherbot gopherbot added this to the Unreleased milestone Dec 11, 2023
@prattmic prattmic self-assigned this Dec 12, 2023
@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 12, 2023
@prattmic
Copy link
Member

prattmic commented Dec 12, 2023

Right now there is a linux-ppc64le builder backed by the linux-ppc64le-power8 bots.

Am I understanding correctly that this is different hardware and we actually want to rename the linux-ppc64le builder to linux-ppc64le-power8 backed only by linux-ppc64le-power8 bots, and add a new linux-ppc64le-power10 builder backed by the linux-ppc64le-power10 bots?

The way things are configured right now, if we simply add the linux-ppc64le-power10 bots they will get mixed together with the power8 bots in the linux-ppc64le builder because both will report themselves as GOOS=linux, GOARCH=ppc64le.

@prattmic
Copy link
Member

If these are same hardware (all POWER 10), then the different would be a matter of setting GOPPC64 env var when building. If they are different hardware, then we'll need to adjust LUCI to be able to determine which version of POWER the current machine is.

@dmitshur dmitshur added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Dec 12, 2023
@pmur
Copy link
Contributor Author

pmur commented Dec 12, 2023

The linux/ppc64{,le} builders are backed by physically different hardware as implied by the hostname. The intent is to suss out ISA violations.

@prattmic I think the answer to your question is yes.

@prattmic
Copy link
Member

prattmic commented Dec 12, 2023

Thanks for the confirmation, that's what I was expecting.

LUCI uses "dimensions" reported by the bot running on the machine to determine which bots are used for which builders. Right now, the linux-ppc64le builder uses os=Linux and cpu=ppc64le as the matching dimensions. There is no preexisting dimension that reports POWER 8 vs 10, so we will need to extend the bot to report on that.

I think the right way to do this is to extend the cpu dimension. The source for this dimension is https://source.chromium.org/chromium/infra/infra/+/main:luci/appengine/swarming/swarming_bot/api/os_utilities.py;l=349. Note that dimensions can have multiple values. So today these machines report ppc64le and ppc64le-64. I think we should also extend them to report something like ppc64le-64-power8 (perhaps leave out the superfluous middle 64).

@pmur could you tell us the best way to determine whether a machine is POWER 8 vs 10 so we can modify the bot? Or if you are willing to navigate the LUCI contribution process and send a CL directly, that would be appreciated.

Given that there are ppc64 ports on Linux, OpenBSD, and AIX, it seems that ideally this detection would be portable across all OSes, though today I believe that only Linux has builders for different ISAs.

@pmur
Copy link
Contributor Author

pmur commented Dec 13, 2023

I will need it to differentiate between power8/9/10, eventually. Unfortunately, there is no common way to do this. It will take me awhile to get internal approval to send patches to LUCI, I'll start the process. It will be quicker if google can patch it. Here are some notes:

On linux, we match the most recent AT_HWCAP2 ISA support bit set (e.g 3.1 is Power10, 3.00 is Power9, 2.07 is Power8).

On the other platforms, I think it is OK to assume power8 as is done for openbsd/aix today. Though, both of those builders are running power9 hardware.

@pmur
Copy link
Contributor Author

pmur commented Dec 13, 2023

@prattmic looking at the repo, I think something like the following might do what we want:

--- a/appengine/swarming/swarming_bot/api/platforms/linux.py
+++ b/appengine/swarming/swarming_bot/api/platforms/linux.py
@@ -206,6 +206,8 @@ def get_cpuinfo():
     # MIPS.
     cpu_info['flags'] = values['isa']
     cpu_info['name'] = values['cpu model']
+  elif "POWER" in values.get('cpu',''):
+    cpu_info['name'] = re.search('POWER[0-9]+', values.get('cpu','')).group(0)
   elif values:
     # CPU implementer == 0x41 means ARM.
     if 'Features' in values:

Reading /proc/cpuinfo isn't always perfect, but probably sufficient. Here are the relevant portions from the PPC64 builders (endian doesn't seem to matter):

processor	: 0
cpu		: POWER10 (architected), altivec supported
clock		: 2750.000000MHz
revision	: 2.0 (pvr 0080 0200)

processor	: 0
cpu		: POWER9 (architected), altivec supported
clock		: 2200.000000MHz
revision	: 2.2 (pvr 004e 1202)

processor	: 0
cpu		: POWER8 (architected), altivec supported
clock		: 3425.000000MHz
revision	: 2.1 (pvr 004b 0201)

@pmur
Copy link
Contributor Author

pmur commented Jan 8, 2024

I think my google CLA from IBM also covers that project too, so I went ahead and opened a CL there .

@prattmic I think this change should populate the cpu field with something like POWER8 which can be translated into passing GOPPC64=power8 to the builder. Though, I am not sure where that should happen.

@pmur
Copy link
Contributor Author

pmur commented Jan 31, 2024

@prattmic do you know when we will be able to verify the change to detect cpu on ppc will roll out to luci?

@prattmic
Copy link
Member

The latest CLs to the swarming bot haven't been released yet, I'll update here when they are.

@pmur
Copy link
Contributor Author

pmur commented Feb 22, 2024

The change seems to be in. I see ppc64le | ppc64le-64 | ppc64le-64-POWER8 being reported. What needs updated to set GOPPC64?

@prattmic
Copy link
Member

I think we want to do two things:

  1. In dimensions_of, have the host_type linux-ppc64le-power8 translate to adding dim["cpu"] = "ppc64le-64-POWER8" in order to match the builder to the right hosts. Similar for power10.
  2. Separately, in define_builder, add GOPPC64 values for the appropriate builders.

(This is assuming linux-ppc64le builder is renamed to linux-ppc64le-power8. If we want to keep the original name, I believe we can use DEFAULT_HOST_SUFFIX to translate linux-ppc64le to linux-ppc64le-power8.)

cc @mknyszek to check that this sounds reasonable.

@pmur
Copy link
Contributor Author

pmur commented Feb 22, 2024

This change should also be made for linux-ppc64-* too. Renaming the builders is OK with me, will that also require regenerating the certs too?

@mknyszek
Copy link
Contributor

The naming convention right now would result in interpreting power8 in linux-ppc64le-power8 as a run mod instead of a property of the host. Elsewhere, we delineate that with an underscore after GOOS/GOARCH (so linux-ppc64le_power8). That small note aside, that makes sense to me.

(However, I feel like we also discussed elsewhere that "power8" is a bit weird here in that you can run different ppc64 versions on different hardware, making it closer to a run mod in spirit. I don't remember where we landed on that. The way you've phrased it here, as being a property of the host, makes me think we should use the underscore.)

@pmur
Copy link
Contributor Author

pmur commented Mar 6, 2024

Ping. I think the linux-ppc64le bot is already named linux-ppc64le-power8. Are we suggesting to change that to linux-ppc64le_power8?

@mknyszek
Copy link
Contributor

mknyszek commented Mar 6, 2024

I just meant the builder, not the bot.

@pmur
Copy link
Contributor Author

pmur commented Apr 1, 2024

Hi, is there any progress on the Go team's tooling? I'd like to continue migrating the remaining ppc64 and ppc64le builders.

@dmitshur dmitshur self-assigned this Apr 2, 2024
@dmitshur
Copy link
Contributor

dmitshur commented Apr 2, 2024

Thanks for the ping @pmur. I have time to help move this forward this week.

I've mailed CL 575856 as the next step here, which should make it possible for you to start connecting bots with the linux-ppc64le-power10 host without them interfering with our LUCI builder definitions for linux-ppc64le (soon to be renamed to linux-ppc64le_power8).

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Apr 2, 2024
@gopherbot
Copy link

Change https://go.dev/cl/575856 mentions this issue: main.star: make power{8,9,10} a part of the host, not runmod

gopherbot pushed a commit to golang/build that referenced this issue Apr 2, 2024
Per discussion in go.dev/issue/64660, the intent is to have distinct
hardware test the GOOS=linux GOARCH=ppc64{,le} GOPPC64=power{8,9,10}
Go ports, instead of using the same hardware with a different value
of the GOPPC64 environment variable set.

By now crrev.com/c/5178250 has propagated and the swarming bot reports
in its 'cpu' dimension whether it's a POWER{8,9,10} type of CPU. Update
our configuration accordingly: make it a property of the host which type
of CPU should be matched, and replace the previous power10 runmod in
favor of setting the GOPPC64 env var explicitly as part of the builder
definition (since a runmod is redundant with the host suffix and would
make the builder name more repetitive).

After this change is in, it will become possible to have hardware for
testing the GOOS=linux GOARCH=ppc64le GOPPC64=power10 port connect to
LUCI and not get selected as a host for the LUCI builder now defined
as 'linux-ppc64le_power8' (previously defined as 'linux-ppc64le').

For golang/go#64660.
For golang/go#64588.

Change-Id: I1bf00231c46a29052c3e3dfce92f6bea20fea708
Reviewed-on: https://go-review.googlesource.com/c/build/+/575856
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@dmitshur
Copy link
Contributor

dmitshur commented Apr 3, 2024

Here's the certificate for the bot hostname linux-ppc64le-power10:

linux-ppc64le-power10-1712180608.cert.txt

@gopherbot
Copy link

Change https://go.dev/cl/576138 mentions this issue: main.star: add linux-ppc64le_power{9,10} builders

gopherbot pushed a commit to golang/build that referenced this issue Apr 4, 2024
For golang/go#66667.
For golang/go#64660.

Change-Id: If36ab865d6b6250b27b2c4d2359603ea839ceadf
Reviewed-on: https://go-review.googlesource.com/c/build/+/576138
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
@dmitshur dmitshur assigned pmur and unassigned prattmic and dmitshur Apr 4, 2024
@pmur
Copy link
Contributor Author

pmur commented Apr 5, 2024

Builder is up and running. Thanks everyone.

@pmur pmur closed this as completed Apr 5, 2024
@dmitshur
Copy link
Contributor

dmitshur commented Apr 5, 2024

Thank you very much @pmur. I've mailed you go.dev/cl/576917 to remove its known issue.

@gopherbot
Copy link

Change https://go.dev/cl/576917 mentions this issue: main.star: remove known issue for linux-ppc64le_power{9,10} builders

gopherbot pushed a commit to golang/build that referenced this issue Apr 5, 2024
They've connected and started running.
See https://ci.chromium.org/b/8751480727881152721
and https://ci.chromium.org/b/8751480728981608321.

For golang/go#66667.
For golang/go#64660.

Change-Id: I96d1a5ebc7513a0aa66bb3467d46f1cb0325b79d
Reviewed-on: https://go-review.googlesource.com/c/build/+/576917
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-ppc64x Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done. new-builder
Projects
Status: Done
Development

No branches or pull requests

6 participants