[v2] az-log does not follow controller logs when logs get rotated #1541

jmclong · 2022-10-04T17:51:09Z

What happened:
Due to a limitation in Kubernetes, the log stream from GetLogs will not follow logs if logs are rotated on the node:

azuredisk-csi-driver/cmd/az-log/cmd/pod.go

Line 98 in a41ab0e

    
           req := clientsetK8s.CoreV1().Pods(GetReleaseNamespace()).GetLogs(podName, &podLogOptions[i])

Related issues:

Pod logs stop being pulled when container log files are rotated ansible/receptor#446
kubectl logs should support log rotation for CRI container runtime. kubernetes/kubernetes#59902

What you expected to happen:
I expect that az-log continues to report the logs after log rotation.

How to reproduce it:

Create a cluster with 41 Linux nodes (example command):

az aks create --name selfonboarda3hf6f1aks --resource-group SelfOnboardA3HF6F1 --location 'southcentralus' \
  --node-resource-group SelfOnboardA3HF6F1-NodeGroup --dns-name-prefix selfonboarda3hf6f1aksdns \
  --kubernetes-version 1.22.11 --max-pods 50 --network-plugin azure --node-count 41 --node-osdisk-size 128 \
  --node-osdisk-type Ephemeral --node-vm-size Standard_D16s_v3 --nodepool-name agentpool \
  --vm-set-type VirtualMachineScaleSets --uptime-sla --generate-ssh-keys --yes  \
  --windows-admin-username azureuser --windows-admin-password ********** --disable-disk-driver

Create the namespace 'test':

kubectl create ns test

Create a stateful set with 1000 replicas with PVCs. This is to generate enough logs from the disk driver (alternatively, find a way to set the log rotation threshold low on all nodes to observe this behavior more easily):

---
# Source: statefulset/templates/statefulset-template.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
  namespace: test
  labels:
    app: nginx
spec:
  podManagementPolicy: Parallel  # default is OrderedReady
  serviceName: test
  replicas: 1000
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      schedulerName: csi-azuredisk-scheduler-extender
      containers:
        - name: test
          image: mcr.microsoft.com/oss/nginx/nginx:1.19.5
          command:
            - "/bin/bash"
            - "-c"
            - set -euo pipefail; while true; do echo $(date) >> /mnt/azuredisk/outfile; sleep 1; done
          volumeMounts:
          - name: persistent-storage
            mountPath: /mnt/azuredisk
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 72000
        - key: "node.kubernetes.io/unschedulable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 72000
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 72000
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: azuredisk-standard-ssd-lrs
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Install the v2 driver using default values.

Execute az-log get controller --follow | tee out.txt

On AKS, once out.txt reaches ~45 MiB, az-log stops streaming the logs. I believe this is because the logs have been rotated. az-log does not exit because buf.Scan continues to block:

azuredisk-csi-driver/cmd/az-log/cmd/common.go

Line 126 in a41ab0e

    for buf.Scan() {

If az-log get controller --follow | tee out.txt is executed again, the logs start streaming again.
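Since re-running the command resumes streaming, one possible workaround is to have the follower detect a silent stream and reopen it. The following is a minimal sketch of that idea using only the standard library, not the az-log implementation. The names followWithReconnect, pump, openFunc, and demo are hypothetical, and the idle timeout is an assumed heuristic: a quiet controller would also trigger a reopen. The sketch relies on Close unblocking a pending read, which holds for the io.Pipe used in the demo. In az-log, the open hook would presumably wrap the existing GetLogs request, and setting SinceTime in the PodLogOptions on each reopen could limit replayed lines.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// errStalled is returned when every stream stalled and no reopens are left.
var errStalled = errors.New("log stream stalled; reopen budget exhausted")

// openFunc is a hypothetical hook that opens a fresh log stream.
type openFunc func() (io.ReadCloser, error)

// followWithReconnect emits lines from the stream and reopens it whenever no
// line arrives within idle. It returns nil once a stream ends on its own.
func followWithReconnect(open openFunc, idle time.Duration, maxReopens int, emit func(string)) error {
	for attempt := 0; attempt <= maxReopens; attempt++ {
		rc, err := open()
		if err != nil {
			return err
		}
		if !pump(rc, idle, emit) {
			return nil
		}
	}
	return errStalled
}

// pump forwards lines from rc to emit and reports whether the stream stalled.
func pump(rc io.ReadCloser, idle time.Duration, emit func(string)) (stalled bool) {
	lines := make(chan string)
	done := make(chan struct{})
	go func() {
		defer close(lines)
		sc := bufio.NewScanner(rc)
		for sc.Scan() { // blocks while the stream is silent
			select {
			case lines <- sc.Text():
			case <-done:
				return
			}
		}
	}()
	defer rc.Close()  // closing the stream unblocks a pending Scan
	defer close(done) // releases the reader if it is waiting to send a line
	timer := time.NewTimer(idle)
	defer timer.Stop()
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				return false // stream ended (EOF or read error)
			}
			emit(line)
			if !timer.Stop() {
				select { // drain a timer that fired while we were emitting
				case <-timer.C:
				default:
				}
			}
			timer.Reset(idle)
		case <-timer.C:
			return true // nothing arrived within idle: treat as stalled
		}
	}
}

// demo simulates one stream that goes silent after two lines, followed by a
// stream that ends normally, and returns every emitted line.
func demo() []string {
	calls := 0
	open := func() (io.ReadCloser, error) {
		calls++
		if calls == 1 {
			pr, pw := io.Pipe()
			go pw.Write([]byte("a\nb\n")) // writer is never closed, so the reader stalls
			return pr, nil
		}
		return io.NopCloser(strings.NewReader("c\n")), nil
	}
	var got []string
	_ = followWithReconnect(open, 100*time.Millisecond, 3, func(s string) { got = append(got, s) })
	return got
}

func main() {
	for _, l := range demo() {
		fmt.Println(l)
	}
}
```

The reader runs in its own goroutine so that the timeout can fire while Scan is blocked. Without that separation, the loop has no chance to notice that the stream went quiet.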

Anything else we need to know?:

Environment:

CSI Driver version:
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:


k8s-triage-robot · 2023-01-02T19:45:44Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2023-02-01T20:32:50Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2023-03-03T21:04:35Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot · 2023-03-03T21:04:39Z

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the lifecycle/stale label on Jan 2, 2023

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 1, 2023

k8s-ci-robot closed this as not planned on Mar 3, 2023
