types: option to disable hyperthreading #1392

Merged

@staebler
Contributor

staebler commented Mar 8, 2019

Add a hyperthreading field to machine pools with options of Enabled or Disabled. The default is Enabled.

When a machine pool has hyperthreading disabled, the Machines asset for that pool creates a MachineConfig manifest for that pool. The ignition config lays down a file at /etc/default/rhcos/karg/nosmt with no contents. The machine-config-daemon is responsible for reading that file and setting the nosmt karg for the machine.
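
As a rough illustration of the field described above, here is a minimal Go sketch. The names `HyperthreadingMode`, `MachinePool`, `effectiveHyperthreading`, and `needsNoSMTConfig` are assumptions made for this example and are not taken from the PR's code; only the `Enabled`/`Disabled` values and the `Enabled` default come from the description.

```go
package main

import "fmt"

// HyperthreadingMode is a hypothetical name for the type of the new field.
type HyperthreadingMode string

const (
	// HyperthreadingEnabled is the default, per the PR description.
	HyperthreadingEnabled HyperthreadingMode = "Enabled"
	// HyperthreadingDisabled requests that hyperthreading be turned off.
	HyperthreadingDisabled HyperthreadingMode = "Disabled"
)

// MachinePool is a trimmed-down, hypothetical stand-in for the real type.
type MachinePool struct {
	Name           string
	Hyperthreading HyperthreadingMode
}

// effectiveHyperthreading applies the default (Enabled) when the field is unset.
func effectiveHyperthreading(p MachinePool) HyperthreadingMode {
	if p.Hyperthreading == "" {
		return HyperthreadingEnabled
	}
	return p.Hyperthreading
}

// needsNoSMTConfig reports whether a pool would need the extra MachineConfig manifest.
func needsNoSMTConfig(p MachinePool) bool {
	return effectiveHyperthreading(p) == HyperthreadingDisabled
}

func main() {
	pools := []MachinePool{
		{Name: "master", Hyperthreading: HyperthreadingDisabled},
		{Name: "worker"}, // field left unset, so the default applies
	}
	for _, p := range pools {
		fmt.Printf("%s: hyperthreading=%s, extra MachineConfig=%t\n",
			p.Name, effectiveHyperthreading(p), needsNoSMTConfig(p))
	}
}
```

Treating an empty value as `Enabled` keeps existing install configs working unchanged, which is one plausible way to implement the stated default.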

@openshift-ci-robot openshift-ci-robot requested review from crawford and russellb Mar 8, 2019

@staebler staebler force-pushed the staebler:hyperthreading_option branch from 2c365c0 to 2847b92 Mar 8, 2019

@runcom

Member

runcom commented Mar 12, 2019

is there a design doc for this and/or has this been discussed somewhere with the MCO team? Could you also CC us for work related to the MCO

@staebler

Contributor Author

staebler commented Mar 12, 2019

> is there a design doc for this and/or has this been discussed somewhere with the MCO team? Could you also CC us for work related to the MCO

There is a diagram that Steven Milner created after the design meeting last week with Ian McLeod and Abhinav Dahiya. You received and responded to the email with that diagram. If there are comments on the design, I suppose that that diagram would be the best place to put them. I will adjust this PR to reflect any changes made to that design.

@runcom

Member

runcom commented Mar 12, 2019

> There is a diagram that Steven Milner created after the design meeting last week with Ian McLeod and Abhinav Dahiya. You received and responded to the email with that diagram. If there are comments on the design, I suppose that that diagram would be the best place to put them. I will adjust this PR to reflect any changes made to that design.

Thanks! I have that in my backlog still...

@cgwalters

Contributor

cgwalters commented Mar 12, 2019

We have an MCD issue here openshift/machine-config-operator#388 - I would have expected any work on this to show up there personally.

)

const (
kargDir = "/etc/default/rhcos/karg/"

@ashcrow

ashcrow Mar 22, 2019

Member

Should be /etc/default/rhcos/kargs ... where kargs is a file with requested changes to operate on.

Out of date.

"machineconfiguration.openshift.io/role": role,
},
},
Spec: *specForKarg("nosmt", ""),

@ashcrow

ashcrow Mar 22, 2019

Member

PTAL https://url.corp.redhat.com/0d85a22 for what's expected in terms of the local service. There's a POC there that's turning into a usable service.

Out of date

abhinavdahiya added a commit to abhinavdahiya/machine-config-operator that referenced this pull request Mar 26, 2019

controller/bootstrap: use files with multiple yaml documents
Users can push manifests during bootstrap that are of the form:

```yaml
---
```

Especially for the installer: setting authorized_keys [1] and setting hyperthreading [2] will push a manifest that includes multiple machineconfig objects for control-plane (master) and compute (worker) roles.

A single file with multiple k8s objects separated by `---` is also a supported structure for `oc create|apply`, i.e. there is a high chance that users trying to push machineconfigs at install time might create such files.

This commit allows the bootstrap controller to read all k8s objects, including ones in files of the form described above, to find all the `machineconfiguration.openshift.io` objects.
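
To make the multi-document case concrete, here is a minimal Go sketch that splits a manifest on `---` separator lines using only the standard library. It is not the bootstrap controller's code; `splitYAMLDocuments` is a hypothetical helper, and a real implementation would more likely rely on a streaming YAML decoder than on line matching.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitYAMLDocuments is a hypothetical helper: it splits a manifest into its
// documents on lines that consist solely of "---" and drops empty documents.
func splitYAMLDocuments(manifest string) []string {
	var docs []string
	var current []string

	// flush stores the accumulated lines as one document if they are non-empty.
	flush := func() {
		joined := strings.Join(current, "\n")
		if strings.TrimSpace(joined) != "" {
			docs = append(docs, joined)
		}
		current = nil
	}

	scanner := bufio.NewScanner(strings.NewReader(manifest))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimRight(line, " \t\r") == "---" {
			flush()
			continue
		}
		current = append(current, line)
	}
	flush()
	return docs
}

func main() {
	// The object names below are made up for the example.
	manifest := "---\nkind: MachineConfig\nmetadata:\n  name: master-example\n" +
		"---\nkind: MachineConfig\nmetadata:\n  name: worker-example\n"

	for i, doc := range splitYAMLDocuments(manifest) {
		fmt.Printf("document %d:\n%s\n\n", i+1, doc)
	}
}
```

Each returned document could then be decoded on its own and filtered by API group.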

[1]: openshift/installer#1150
[2]: openshift/installer#1392

@abhinavdahiya abhinavdahiya force-pushed the staebler:hyperthreading_option branch from 2847b92 to 77099df Mar 26, 2019

@abhinavdahiya

Member

abhinavdahiya commented Mar 26, 2019

@abhinavdahiya abhinavdahiya force-pushed the staebler:hyperthreading_option branch 2 times, most recently from 544bf47 to 15a0beb Mar 26, 2019

@ashcrow

Member

ashcrow commented Mar 26, 2019

@abhinavdahiya abhinavdahiya force-pushed the staebler:hyperthreading_option branch from 15a0beb to c2f360e Apr 3, 2019

@abhinavdahiya abhinavdahiya force-pushed the staebler:hyperthreading_option branch from c2f360e to 95c9d38 Apr 3, 2019

@abhinavdahiya

Member

abhinavdahiya commented Apr 3, 2019

@staebler @wking @ashcrow rebased and ready for review.

@wking
Member

wking left a comment

Two naming nits, otherwise this looks good to me.

pkg/asset/machines/machineconfig/hyperthreading.go (resolved, outdated)
pkg/asset/machines/machineconfig/hyperthreading.go (resolved, outdated)
storage:
  files:
  - contents:
      source: data:text/plain;charset=utf-8;base64,QUREIG5vc210

@ashcrow

ashcrow Apr 3, 2019

Member

Decodes to: ADD nosmt

👍
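
The decoding noted in this review comment can be reproduced with a short Go sketch. `decodeBase64DataURL` is a hypothetical helper written for this example; it handles only the `data:...;base64,<payload>` shape seen in the manifest and is not a general data URL parser.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeBase64DataURL is a hypothetical helper that extracts and decodes the
// payload of a base64-encoded data URL such as an Ignition file source.
func decodeBase64DataURL(source string) (string, error) {
	const marker = ";base64,"
	idx := strings.Index(source, marker)
	if !strings.HasPrefix(source, "data:") || idx < 0 {
		return "", fmt.Errorf("not a base64 data URL: %q", source)
	}
	decoded, err := base64.StdEncoding.DecodeString(source[idx+len(marker):])
	if err != nil {
		return "", err
	}
	return string(decoded), nil
}

func main() {
	contents, err := decodeBase64DataURL("data:text/plain;charset=utf-8;base64,QUREIG5vc210")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%q\n", contents) // prints "ADD nosmt"
}
```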

storage:
  files:
  - contents:
      source: data:text/plain;charset=utf-8;base64,QUREIG5vc210

@ashcrow

ashcrow Apr 3, 2019

Member

Decodes to: ADD nosmt

👍

@wking

Member

wking commented Apr 4, 2019

/lgtm

@abhinavdahiya

Member

abhinavdahiya commented Apr 5, 2019

/hold

need to get approval on the hyperthreading exception.

@eparis

Member

eparis commented Apr 8, 2019

When this PR is ready, @abhinavdahiya you can remove the hold.

@wking

Member

wking commented Apr 8, 2019

I'm still happy with this as it stands. But I'll leave hold-pulling to @abhinavdahiya ;).

@abhinavdahiya

Member

abhinavdahiya commented Apr 8, 2019

Waiting on RHCOS bootimage that contains the pivot changes mentioned here https://github.com/openshift/pivot/blob/master/cmd/root.go#L43-L45

This PR only works if we bump rhcos.json to an RHCOS version that includes the pivot changes mentioned above.

Waiting on OS team
cc @ashcrow @cgwalters

@ashcrow

Member

ashcrow commented Apr 9, 2019

@ashcrow

Member

ashcrow commented Apr 9, 2019

In chat I passed over an updated rhcos.json from ART builds that contain the latest pivot, rpm-ostree, and cri-o.

@cgwalters

Contributor

cgwalters commented Apr 9, 2019

We must have a publicly-accessible location for the images (currently libvirt but really we need all of the openstack, bare metal etc.), i.e. something like this patch:

git diff
diff --git a/data/data/rhcos.json b/data/data/rhcos.json
index 60878c249..04224bb09 100644
--- a/data/data/rhcos.json
+++ b/data/data/rhcos.json
@@ -46,7 +46,7 @@
             "hvm": "ami-0eac581fbaa9fa9c6"
         }
     },
-    "baseURI": "https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/410.8.20190325.0/",
+    "baseURI": "https://download.redhat.com/rhel/coreos/410.8.20190325.0/",
     "buildid": "410.8.20190325.0",
     "images": {
         "metal-bios": {

Or replace download.redhat.com with try.openshift.com/rhcos-bootimages/ or whatever.

@ashcrow

Member

ashcrow commented Apr 9, 2019

@imcleod is helping make this available.

@ashcrow

Member

ashcrow commented Apr 9, 2019

FWIW, AMIs can be used to test now, and the payload has the proper content if one wants to test manually.

@abhinavdahiya abhinavdahiya force-pushed the staebler:hyperthreading_option branch from 71c6f2d to 80e0c51 Apr 11, 2019

staebler and others added some commits Feb 26, 2019

Add the Hyperthreading field to MachinePool which allows the user to
enable or disable hyperthreading for machines. The default is for
hyperthreading to be enabled.

RHCOS ships with pivot.service, which uses `/etc/pivot/kernel-args` to override the kernel arguments for hosts. Adding the `nosmt` kernel argument switches hyperthreading off.

Add a MachineConfig to disable hyperthreading for control plane and compute pools that have the hyperthreading option disabled.

data/data: update rhcos to 410.8.20190410.0
This new version of RHCOS includes the latest pivot, `0.0.4-2.el8`, which contains the changes required to act on the kernel arg that disables hyperthreading at /etc/pivot/kernel-args.

@abhinavdahiya abhinavdahiya force-pushed the staebler:hyperthreading_option branch from 80e0c51 to 5601e6b Apr 11, 2019

@sdodson

Member

sdodson commented Apr 11, 2019

/lgtm
only change was the addition of RHCOS data

@openshift-ci-robot

openshift-ci-robot commented Apr 11, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson, staebler, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya

Member

abhinavdahiya commented Apr 11, 2019

created an AWS cluster with:

apiVersion: v1beta4
baseDomain: devcluster.openshift.com
controlPlane:
  name: master
  hyperthreading: Disabled
  replicas: 3
  platform:
    aws:
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
metadata:
  name: adahiya-1
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  machineCIDR: 10.0.0.0/16
  serviceCIDR: 172.30.0.0/16
  type: OpenShiftSDN
platform:
  aws:
    region: us-east-1

on master machine:

[core@ip-10-0-130-101 ~]$ sudo cat /etc/pivot/kernel-args
ADD nosmt[core@ip-10-0-130-101 ~]$ cat /proc/cmdline
BOOT_IMAGE=/ostree/rhcos-59aadfa277d7d06047d323e53ccf6e937589fc3412e3d1ef7a52c2f2086ac29d/vmlinuz-4.18.0-80.el8.x86_64 no_timer_check console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=0dc164db-5750-41e7-ba19-b9489acc69b6 ostree=/ostree/boot.1/rhcos/59aadfa277d7d06047d323e53ccf6e937589fc3412e3d1ef7a52c2f2086ac29d/0 coreos.oem.id=ec2 ignition.platform.id=ec2 nosmt
[core@ip-10-0-130-101 ~]$

seems to me as working :yay:
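
The manual check above (looking for `nosmt` in `/proc/cmdline`) can be sketched in Go as follows. `hasKernelArg` is a hypothetical helper written for this example; it matches whole space-separated arguments only, so a value form such as `nosmt=...` would not count as a match.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasKernelArg is a hypothetical helper that reports whether the kernel
// command line contains arg as a complete, whitespace-separated argument.
func hasKernelArg(cmdline, arg string) bool {
	for _, field := range strings.Fields(cmdline) {
		if field == arg {
			return true
		}
	}
	return false
}

func main() {
	// /proc/cmdline exists on Linux hosts only; elsewhere this reports an error.
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("could not read /proc/cmdline:", err)
		return
	}
	fmt.Println("nosmt present:", hasKernelArg(string(data), "nosmt"))
}
```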

/hold cancel

@abhinavdahiya

Member

abhinavdahiya commented Apr 11, 2019

e2e-failure

FLAG: --version=\"false\"\nI0411 23:01:04.426031       1 flags.go:33] FLAG: --vmodule=\"\"\nI0411 23:01:04.426038       1 flags.go:33] FLAG: --write-config-to=\"\"\nI0411 23:01:05.621527       1 serving.go:312] Generated self-signed cert (/var/run/kubernetes/kube-scheduler.crt, /var/run/kubernetes/kube-scheduler.key)\nfailed to create listener: failed to listen on 0.0.0.0:10251: listen tcp 0.0.0.0:10251: bind: address already in use\n"

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20190411-230420.xml

error: 2 fail, 569 pass, 548 skip (26m28s)

/retest

@openshift-bot

openshift-bot commented Apr 12, 2019

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 003f145 into openshift:master Apr 12, 2019

11 checks passed

ci/prow/e2e-aws Job succeeded.
ci/prow/gofmt Job succeeded.
ci/prow/golint Job succeeded.
ci/prow/govet Job succeeded.
ci/prow/images Job succeeded.
ci/prow/shellcheck Job succeeded.
ci/prow/tf-fmt Job succeeded.
ci/prow/tf-lint Job succeeded.
ci/prow/unit Job succeeded.
ci/prow/yaml-lint Job succeeded.
tide In merge pool.