
Add cgroupsv2 support #954

Draft · LucaDev wants to merge 3 commits into main from cgroups-v2-support

Conversation

LucaDev
Contributor

@LucaDev LucaDev commented Jul 1, 2023

What this PR does / why we need it:

This PR adds cgroup v2 support to the recently introduced IO throttling feature. It also removes the requirement for a shell to be present by writing directly to the cgroup file system instead of relying on the shell to do so.
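For illustration only, a minimal sketch of what writing directly to the cgroup file system can look like in Go; the helper name and call pattern are assumptions for this example, not the PR's actual code:

package main

import (
	"fmt"
	"os"
)

// writeCgroupFile appends a value to a cgroup control file without spawning
// a shell (the previous approach ran `sh -c "echo <value> >> <file>"`).
func writeCgroupFile(path, value string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_APPEND, 0)
	if err != nil {
		return fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()
	if _, err := f.WriteString(value + "\n"); err != nil {
		return fmt.Errorf("write %s: %w", path, err)
	}
	return nil
}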

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add cgroup v2 support to IO throttling

@codecov-commenter

codecov-commenter commented Jul 1, 2023

Codecov Report

Merging #954 (3564319) into main (b118a6c) will increase coverage by 13.22%.
Report is 42 commits behind head on main.
The diff coverage is n/a.


@@             Coverage Diff             @@
##             main     #954       +/-   ##
===========================================
+ Coverage   29.80%   43.02%   +13.22%     
===========================================
  Files          24       24               
  Lines        1570     1571        +1     
===========================================
+ Hits          468      676      +208     
+ Misses       1040      823      -217     
- Partials       62       72       +10     

see 8 files with indirect coverage changes


@carlory
Member

carlory commented Jul 4, 2023

Hi @LucaDev, thanks for your contributions. It's a good catch for the volume QoS feature. Could you paste the test results in the PR description?

I'll test it in my cluster as well, and I hope it gets merged into the master branch as soon as possible.

@carlory
Member

carlory commented Jul 10, 2023

Hi @LucaDev, upstream has changed; could you rebase it?

@carlory
Member

carlory commented Jul 17, 2023

Hi @LucaDev, are you still working on this? If you need help, please let me know.

@LucaDev
Contributor Author

LucaDev commented Jul 18, 2023

@carlory Yes, thank you! I apologize for the delayed response. I'm currently quite time-limited, but I will prioritize the rebase throughout the day. In the meantime, please feel free to test it out; my test cluster is currently unavailable to me.

@LucaDev LucaDev force-pushed the cgroups-v2-support branch 2 times, most recently from 133e709 to cd4aafe on July 18, 2023 21:49
…tries

Signed-off-by: Luca Kröger <l.kroeger01@gmail.com>
@LucaDev
Contributor Author

LucaDev commented Jul 18, 2023

I have reimplemented the cgroup v2 support on top of the newly created structures. I'll give this a test run tomorrow when my test cluster is back up and running.

)

const (
	blkioPath = "/sys/fs/cgroup/blkio"
	ioMaxFile = "/sys/fs/cgroup/kubepods/io.max"
Member

@carlory carlory Jul 19, 2023


By design, the QoS attributes of a volume are independent of Kubernetes. We use the system cgroup to limit volume QoS. Because the mkfs command is also restricted by the cgroup, we make the QoS attributes take effect after the volume has a filesystem.

See #958 for more details.
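For context on the ioMaxFile constant in the snippet above: cgroup v2 collapses the per-limit blkio files into a single io.max file whose entries have the form "MAJ:MIN rbps=<v> wbps=<v> riops=<v> wiops=<v>", with "max" meaning unlimited. A hedged sketch of building such an entry; the helper name and the example device numbers are made up:

package main

import "fmt"

// formatIOMax renders one io.max entry, e.g.
// formatIOMax(8, 0, 1048576, 100) -> "8:0 rbps=max wbps=1048576 riops=max wiops=100".
func formatIOMax(major, minor uint32, wbps, wiops uint64) string {
	return fmt.Sprintf("%d:%d rbps=max wbps=%d riops=max wiops=%d",
		major, minor, wbps, wiops)
}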

Signed-off-by: Luca Kröger <l.kroeger01@gmail.com>
@carlory
Member

carlory commented Jul 26, 2023

Hi @LucaDev, if it's ready for review and testing, please let me know and I'll test it in my environment.

@carlory
Member

carlory commented Jul 26, 2023

I tested this PR in a cluster whose cgroup version is v1. I found that the volume's QoS settings don't take effect. Steps to reproduce:

Step 1: Install Hwameistor and replace the local-storage image

git clone https://github.com/hwameistor/hwameistor.git
cd hwameistor && gh pr checkout 954
make build_ls_image
kubectl -n hwameistor set image ds/hwameistor-local-storage member=ghcr.io/hwameistor/local-storage:99.9-dev

Step 2: Apply the following yaml

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2023-06-14T02:13:52Z"
  name: hwameistor-storage-lvm-hdd-1
  resourceVersion: "113005"
  uid: 63ecdfce-0264-49f7-b782-16e7c1d61f21
parameters:
  convertible: "false"
  csi.storage.k8s.io/fstype: xfs
  poolClass: HDD
  poolType: REGULAR
  replicaNumber: "1"
  striped: "true"
  volumeKind: LVM
  provision-throughput-on-creation: 1Mi
  provision-iops-on-creation: "100"
provisioner: lvm.hwameistor.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: demo11
  name: demo11
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo11
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo11
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: demo11
      containers:
      - command:
        - sleep
        - "100000"
        image: busybox
        name: busybox
        resources: {}
        volumeMounts:
        - name: data
          mountPath: /data
status: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo11
spec:
  storageClassName: hwameistor-storage-lvm-hdd-1
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Step 3: Test QoS

[root@master hwameistor]# kubectl exec -it demo11-588b6ff65b-56frb sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ #
/ # dd if=/dev/zero of=/data/test bs=4k count=10 oflag=direct
10+0 records in
10+0 records out
40960 bytes (40.0KB) copied, 0.001271 seconds, 30.7MB/s
/ #

CmdArgs: []string{"-c", fmt.Sprintf("echo %s >> %s", value, filename)},
})
return result.Error
func writeFile(filename, value string) error {
Member


I think the root cause is the writeFile func. In this PR, the func writes content into the container's cgroup, not the host OS cgroup.
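A hedged sketch of how one might check which blkio cgroup the running process actually sits in (the helper name is an assumption; the parsing follows the cgroup v1 format shown in /proc output further below):

package main

import (
	"fmt"
	"os"
	"strings"
)

// containerBlkioCgroup returns the blkio cgroup path of the current process
// as reported by /proc/self/cgroup (v1 lines look like "<id>:<controllers>:<path>").
// A path other than "/" means the process is confined to a sub-cgroup such as
// a kubepods slice rather than the root of /sys/fs/cgroup/blkio.
func containerBlkioCgroup() (string, error) {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		return "", err
	}
	for _, line := range strings.Split(string(data), "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) == 3 && strings.Contains(parts[1], "blkio") {
			return parts[2], nil
		}
	}
	return "", fmt.Errorf("blkio controller not found in /proc/self/cgroup")
}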

Contributor Author


I was quite sure cgroup changes would be synced for privileged containers; I haven't had time to try it myself yet. To support distros like Talos, we have to use a method which doesn't involve a shell or programs like echo in the host namespace. I'll have a look into alternative ways.

Member


Process 12519 is a process in a privileged container. The blkio cgroup path on the host is /system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice, which is not the same as /sys/fs/cgroup/blkio.

[root@master 12519]# pwd
/proc/12519
[root@master 12519]# cat cgroup
11:cpuacct,cpu:/system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
10:devices:/system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
9:hugetlb:/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
8:blkio:/system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
7:memory:/system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
6:cpuset:/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
5:pids:/system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
4:net_prio,net_cls:/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
3:freezer:/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
2:perf_event:/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
1:name=systemd:/system.slice/containerd.service/kubepods-besteffort-podd019a581_0a38_4938_81d8_2dacae511559.slice:cri-containerd:be546da0994003735b936cea90735d147b7457a6b6e7cc534cf547977f03f21c
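For comparison with the cgroup v2 io.max format sketched earlier: on cgroup v1 the blkio controller uses one file per limit (for example blkio.throttle.write_bps_device under the cgroup directory shown above), and each entry is simply "MAJ:MIN value". A small illustrative helper; the device numbers in the comment are examples only:

package main

import "fmt"

// v1WriteBPSEntry builds an entry for blkio.throttle.write_bps_device,
// e.g. v1WriteBPSEntry(8, 0, 1048576) -> "8:0 1048576" (cap writes on 8:0 at 1 MiB/s).
func v1WriteBPSEntry(major, minor uint32, bytesPerSec uint64) string {
	return fmt.Sprintf("%d:%d %d", major, minor, bytesPerSec)
}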

Member


Hi @LucaDev, there are already some Linux distributions supported by Hwameistor, and I'd like Hwameistor to support more of them in the future. Until then, let this PR focus on the IO throttling feature. Supporting other distributions like Talos may require some other changes and tests, so it would be better to do that in another PR. What do you think?

Contributor Author


Yeah. You're probably right. We should still try to find a good way to make it compatible soon. I have to change my test setup then :D

Signed-off-by: Luca Kröger <l.kroeger01@gmail.com>