Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add performance metrics for FederatedHPA #3972

Merged
merged 1 commit into from
Aug 26, 2023

Conversation

zhy76
Copy link
Member

@zhy76 zhy76 commented Aug 22, 2023

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:
Part of #3946

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Instrumentation: Introduced `federatedhpa_process_duration_seconds` and `federatedhpa_pull_metrics_duration_seconds` for FederatedHPA, which will be emitted by `karmada-controller-manager`.

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 22, 2023
@jwcesign
Copy link
Member

/assign

Copy link
Member

@jwcesign jwcesign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @zhy76. Just a few minor issues.

pkg/metrics/resource.go Outdated Show resolved Hide resolved
pkg/metrics/resource.go Outdated Show resolved Hide resolved
Signed-off-by: zhy76 <958474674@qq.com>
@codecov-commenter
Copy link

Codecov Report

Merging #3972 (d64894c) into master (c938785) will decrease coverage by 0.07%.
The diff coverage is 0.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master    #3972      +/-   ##
==========================================
- Coverage   54.76%   54.70%   -0.07%     
==========================================
  Files         232      232              
  Lines       22535    22560      +25     
==========================================
- Hits        12342    12341       -1     
- Misses       9515     9541      +26     
  Partials      678      678              
Flag Coverage Δ
unittests 54.70% <0.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
pkg/controllers/federatedhpa/metrics/client.go 0.00% <0.00%> (ø)
pkg/metrics/resource.go 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

Copy link
Member

@jwcesign jwcesign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT

Can you assist in testing and providing the result here?

@zhy76
Copy link
Member Author

zhy76 commented Aug 24, 2023

LGMT LGMT

Can you assist in testing and providing the result here?

Sure~

@jwcesign
Copy link
Member

You can refer to this document to set up a testing environment: https://karmada.io/docs/next/tutorials/autoscaling-with-federatedhpa

@zhy76
Copy link
Member Author

zhy76 commented Aug 25, 2023

You can refer to this document to set up a testing environment: https://karmada.io/docs/next/tutorials/autoscaling-with-federatedhpa

Thanks,I‘ll test it today.

@zhy76
Copy link
Member Author

zhy76 commented Aug 25, 2023

I had a problem deploying metric-server:
image
image
image
logs:
image
haven't found a solution yet

@jwcesign
Copy link
Member

How you install the metrics server in member clusters? by the local-up-karmada.sh scripts?

@zhy76
Copy link
Member Author

zhy76 commented Aug 26, 2023

Yes, but I modified the image due to cant pull image registry.k8s.io/metrics-server/metrics-server:v0.6.3:
image

root@kindling-master01:/home/lz# kubectl get po metrics-server-559444bc84-l99rk -n kube-system -oyaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-08-25T13:35:46Z"
  generateName: metrics-server-559444bc84-
  labels:
    k8s-app: metrics-server
    pod-template-hash: 559444bc84
  name: metrics-server-559444bc84-l99rk
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: metrics-server-559444bc84
    uid: 954036ec-a028-46f9-aaf2-7960e986e519
  resourceVersion: "3906"
  uid: c8de0965-b7c3-4910-934b-97c6bce2197d
spec:
  containers:
  - args:
    - --kubelet-insecure-tls=true
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    image: registry.aliyuncs.com/google_containers/metrics-server:v0.6.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: metrics-server
    ports:
    - containerPort: 4443
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-z94pw
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: member1-control-plane
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: metrics-server
  serviceAccountName: metrics-server
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: tmp-dir
  - name: kube-api-access-z94pw
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-08-25T13:35:46Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-08-25T13:35:46Z"
    message: 'containers with unready status: [metrics-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-08-25T13:35:46Z"
    message: 'containers with unready status: [metrics-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-08-25T13:35:46Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://80ebf8b78ebaa2331da563e7cc6f6c28b14e366d90ae71f94ee944fbc5050fbf
    image: registry.aliyuncs.com/google_containers/metrics-server:v0.6.3
    imageID: registry.aliyuncs.com/google_containers/metrics-server@sha256:8839d0a5b2cb6f3977f81d3fccc874f05af0763fb4352250f7ce73c35d083d56
    lastState:
      terminated:
        containerID: containerd://80ebf8b78ebaa2331da563e7cc6f6c28b14e366d90ae71f94ee944fbc5050fbf
        exitCode: 2
        finishedAt: "2023-08-26T03:45:13Z"
        reason: Error
        startedAt: "2023-08-26T03:45:12Z"
    name: metrics-server
    ready: false
    restartCount: 15
    started: false
    state:
      waiting:
        message: back-off 1m20s restarting failed container=metrics-server pod=metrics-server-559444bc84-l99rk_kube-system(c8de0965-b7c3-4910-934b-97c6bce2197d)
        reason: CrashLoopBackOff
  hostIP: 192.168.64.5
  phase: Running
  podIP: 10.10.0.4
  podIPs:
  - ip: 10.10.0.4
  qosClass: Burstable
  startTime: "2023-08-25T13:35:46Z"

@jwcesign
Copy link
Member

I tried it in my env, and it works fine:
image

With that error, you can continue to find the root reason. we can go forward.

/lgtm

/cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2023
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
Thanks.

I updated the release-notes(in PR description), by the way.

@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 26, 2023
@karmada-bot karmada-bot merged commit 758a6ce into karmada-io:master Aug 26, 2023
12 checks passed
@jwcesign
Copy link
Member

Hi, @zhy76 , Can you help update the docs: https://karmada.io/docs/next/reference/instrumentation/metrics#metrics?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants