
jsonnet: unlock dependencies for 4.9 development cycle #1214

Merged
merged 2 commits on Jun 17, 2021

Conversation

@paulfantom commented Jun 14, 2021

Unlocking all jsonnet deps. All relevant work was done in the jsonnet/ directory; assets/ and manifests/ are generated.
Additional work to allow using the latest libs:

  • thanos ruler and querier need targetGroups during mixin inclusion
  • kube-prometheus components need an explicit parameter for the kube-rbac-proxy image (see the sketch after this list)
  • removed securityContext from thanos querier pods. It wasn't included earlier, and with the one provided by the upstream community the querier cannot start. This is due to OpenShift security context constraints.
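
A minimal sketch of the second point, assuming a kube-prometheus-style component function; the import path, the field name kubeRbacProxyImage, and the image values are illustrative only, not the literal CMO code:

// Pass the kube-rbac-proxy sidecar image explicitly to a component config
// instead of relying on a library default (field name assumed).
local nodeExporter = import 'github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus/components/node-exporter.libsonnet';

nodeExporter({
  namespace: 'openshift-monitoring',
  version: '1.1.2',
  image: 'quay.io/prometheus/node-exporter:v1.1.2',
  kubeRbacProxyImage: 'quay.io/brancz/kube-rbac-proxy:v0.8.0',  // assumed field name and tag
})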

Some jsonnet repositories changed location or default branch (see the example jsonnetfile.json entry after this list):

  • etcd migrated a long time ago to etcd-io/etcd
  • etcd default branch is now main
  • thanos default branch is now main
  • kube-thanos default branch is now main
  • I added a CHANGELOG entry for this change.
  • No user-facing changes, so no CHANGELOG entry was needed.
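
For illustration, the updated jsonnetfile.json entry for the etcd mixin would now point at the etcd-io/etcd repository and its main branch (the subdir shown here is an assumption):

{
  "source": {
    "git": {
      "remote": "https://github.com/etcd-io/etcd.git",
      "subdir": "contrib/mixin"
    }
  },
  "version": "main"
}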

@prashbnair please verify that this also brings in the relevant changes from kubernetes-mixin.

/cc @simonpasquier

@openshift-ci bot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) on Jun 14, 2021
@prashbnair
Contributor

@paulfantom My changes are in.

@paulfantom
Contributor Author

CI was failing due to the securityContext set on thanos querier pods. The event for that was:

2m43s       Warning   FailedCreate             replicaset/thanos-querier-877fd4494                Error creating: pods "thanos-querier-877fd4494-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{65534}: 65534 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000440000, 1000449999], spec.containers[1].securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000440000, 1000449999], spec.containers[2].securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000440000, 1000449999], spec.containers[3].securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000440000, 1000449999], spec.containers[4].securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000440000, 1000449999], provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]  
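
A minimal sketch of the fix described in the PR, assuming a kube-prometheus-style object layout (the thanosQuerier and deployment field names are illustrative): the upstream securityContext is cleared so that OpenShift's security context constraints can assign the UID and fsGroup themselves.

{
  thanosQuerier+:: {
    deployment+: {
      spec+: {
        template+: {
          spec+: {
            // Drop the pod-level securityContext coming from upstream.
            securityContext: {},
            // Drop the per-container securityContext as well.
            containers: [
              c + { securityContext: {} }
              for c in super.containers
            ],
          },
        },
      },
    },
  },
}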

@@ -26,7 +26,7 @@
       "subdir": "jsonnet/prometheus-operator"
     }
   },
-  "version": "release-0.47"
+  "version": "master"
Contributor

given that the jsonnet code generates the operator's CRDs, should we pin to a release branch?

Contributor Author

CRDs and prometheus-operator alerts come from the same place and are locked with this version flag. Having it set to master allows us to test new alerts and to ensure the CRDs are really backward compatible, as they should be.

I agree that this should be set to a particular released version of prometheus-operator when we lock all dependencies before releasing OpenShift.
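
For illustration, when the dependencies are locked again before a release, the same jsonnetfile.json entry would point back at a release branch (the branch name below is hypothetical):

{
  "source": {
    "git": {
      "remote": "https://github.com/prometheus-operator/prometheus-operator.git",
      "subdir": "jsonnet/prometheus-operator"
    }
  },
  "version": "release-0.48"
}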

Contributor

ok thanks for the explanation. I'm fine with this approach :)

@paulfantom
Contributor Author

/retest

Comment on lines +190 to +191
nodeSelector:
beta.kubernetes.io/os: linux
Member

@dgrisonnet Jun 16, 2021

we most likely don't want that since we are platform agnostic

Contributor Author

We are not platform agnostic. Thanos is built only for linux, same for prometheus, alertmanager, ksm, and all other components:

$ grep nodeSelector -A1 -R . 
./alertmanager/alertmanager.yaml:  nodeSelector:
./alertmanager/alertmanager.yaml-    kubernetes.io/os: linux
--
./grafana/deployment.yaml:      nodeSelector:
./grafana/deployment.yaml-        beta.kubernetes.io/os: linux
--
./kube-state-metrics/deployment.yaml:      nodeSelector:
./kube-state-metrics/deployment.yaml-        kubernetes.io/os: linux
--
./node-exporter/daemonset.yaml:      nodeSelector:
./node-exporter/daemonset.yaml-        kubernetes.io/os: linux
--
./openshift-state-metrics/deployment.yaml:      nodeSelector:
./openshift-state-metrics/deployment.yaml-        kubernetes.io/os: linux
--
./prometheus-adapter/deployment.yaml:      nodeSelector:
./prometheus-adapter/deployment.yaml-        kubernetes.io/os: linux
--
./prometheus-k8s/prometheus.yaml:  nodeSelector:
./prometheus-k8s/prometheus.yaml-    kubernetes.io/os: linux
--
./prometheus-operator-user-workload/deployment.yaml:      nodeSelector:
./prometheus-operator-user-workload/deployment.yaml-        kubernetes.io/os: linux
--
./prometheus-operator/deployment.yaml:      nodeSelector:
./prometheus-operator/deployment.yaml-        kubernetes.io/os: linux
--
./prometheus-user-workload/prometheus.yaml:  nodeSelector:
./prometheus-user-workload/prometheus.yaml-    kubernetes.io/os: linux
--
./telemeter-client/deployment.yaml:      nodeSelector:
./telemeter-client/deployment.yaml-        beta.kubernetes.io/os: linux
--
./thanos-querier/deployment.yaml:      nodeSelector:
./thanos-querier/deployment.yaml-        beta.kubernetes.io/os: linux
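
For reference, a minimal sketch (not the exact CMO code) of the kind of jsonnet patch that puts such a selector on a Deployment; apply it with object composition, e.g. someDeployment + this patch:

{
  spec+: {
    template+: {
      spec+: {
        // The component binaries are built for Linux only, so pin the pods to Linux nodes.
        nodeSelector+: { 'kubernetes.io/os': 'linux' },
      },
    },
  },
}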

Member

Oh ok, I thought they could be scheduled on any nodes and the respective runtimes would handle the rest.

@@ -46,13 +46,19 @@ spec:
- --query.replica-label=prometheus_replica
- --query.replica-label=thanos_ruler_replica
- --store=dnssrv+_grpc._tcp.prometheus-operated.openshift-monitoring.svc.cluster.local
- --query.auto-downsampling
Member

any concerns regarding enabling auto-downsampling?

Contributor Author

I don't have any; it should reduce the load in some situations.
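
For context, a minimal sketch (assumed container name and layout, not the exact kube-thanos code) of appending the flag to the querier container args in jsonnet:

{
  spec+: {
    template+: {
      spec+: {
        containers: [
          // 'thanos-query' is an assumed container name for illustration.
          if c.name == 'thanos-query'
          then c + { args+: ['--query.auto-downsampling'] }
          else c
          for c in super.containers
        ],
      },
    },
  },
}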

@dgrisonnet
Member

/lgtm

@openshift-ci bot added the "lgtm" label (indicates that a PR is ready to be merged) on Jun 17, 2021
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Contributor

@simonpasquier left a comment

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Jun 17, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, paulfantom, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dgrisonnet,paulfantom,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Jun 17, 2021

@paulfantom: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/e2e-aws-single-node
Commit: d35eb8d
Details: link
Rerun command: /test e2e-aws-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.
