[v2] az-log does not follow controller logs when logs get rotated #1541

Closed
jmclong opened this issue Oct 4, 2022 · 4 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jmclong
jmclong commented Oct 4, 2022

What happened:
Due to a limitation in Kubernetes, a log stream opened through GetLogs stops receiving new entries once the container log is rotated on the node. az-log opens the stream with:

req := clientsetK8s.CoreV1().Pods(GetReleaseNamespace()).GetLogs(podName, &podLogOptions[i])

Related issues:

What you expected to happen:
I expect that az-log continues to report the logs after log rotation.

How to reproduce it:

  1. Create a cluster with 41 Linux nodes (example command):
az aks create --name selfonboarda3hf6f1aks --resource-group SelfOnboardA3HF6F1 --location 'southcentralus' \
  --node-resource-group SelfOnboardA3HF6F1-NodeGroup --dns-name-prefix selfonboarda3hf6f1aksdns \
  --kubernetes-version 1.22.11 --max-pods 50 --network-plugin azure --node-count 41 --node-osdisk-size 128 \
  --node-osdisk-type Ephemeral --node-vm-size Standard_D16s_v3 --nodepool-name agentpool \
  --vm-set-type VirtualMachineScaleSets --uptime-sla --generate-ssh-keys --yes  \
  --windows-admin-username azureuser --windows-admin-password ********** --disable-disk-driver
  2. Create the namespace 'test':
kubectl create ns test
  3. Create a StatefulSet with 1000 replicas, each with a PVC, so that the disk driver generates enough logs (alternatively, lower the log rotation threshold on all nodes to observe this behavior more easily):
---
# Source: statefulset/templates/statefulset-template.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
  namespace: test
  labels:
    app: nginx
spec:
  podManagementPolicy: Parallel  # default is OrderedReady
  serviceName: test
  replicas: 1000
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      schedulerName: csi-azuredisk-scheduler-extender
      containers:
        - name: test
          image: mcr.microsoft.com/oss/nginx/nginx:1.19.5
          command:
            - "/bin/bash"
            - "-c"
            - set -euo pipefail; while true; do echo $(date) >> /mnt/azuredisk/outfile; sleep 1; done
          volumeMounts:
          - name: persistent-storage
            mountPath: /mnt/azuredisk
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 72000
        - key: "node.kubernetes.io/unschedulable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 72000
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 72000
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: azuredisk-standard-ssd-lrs
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
  4. Install the v2 driver using default values.
  5. Execute az-log get controller --follow | tee out.txt
  6. On AKS, once out.txt reaches ~45MiB, az-log stops streaming logs. I believe this is because the logs have been rotated. az-log does not exit because buf.Scan continues to block.
  7. If az-log get controller --follow | tee out.txt is executed again, the logs start streaming again.

Anything else we need to know?:

Environment:

  • CSI Driver version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-triage-robot
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 2, 2023
@k8s-triage-robot
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 1, 2023
@k8s-triage-robot
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned on Mar 3, 2023