Investigate Increased Memory Usage #17

Open
dejanzelic opened this issue Apr 5, 2023 · 4 comments

@dejanzelic
Contributor

Hi!

Thanks for making this, it's super useful!

I'm running this on a 4-node cluster (3x 2 GB Pi 4B and 1x 8 GB Pi 4B) to expose the sound device and a USB device. I was having a weird issue where everything worked great on 2 of my nodes, but I could only ever get it to work once on the other 2 nodes (with the default DaemonSet config). The 2 nodes that didn't work would never show any logs.

The issue started when I was trying to use the new USB device feature, but even when I went back to the default config, I still had the same issue.

I read issue #11 and decided to also try upping the memory limit. As soon as I did that, everything worked!

So it does sound like the new build uses more memory. Here is my current config that's working:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - '{"name": "audio", "groups": [{"count": 10, "paths": [{"path": "/dev/snd"}]}]}'
        - --device
        - '{"name": "zwave", "groups": [{"usb": [{"vendor": "0658", "product": "0200"}]}]}'
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 20Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate
```

While I'm here, I figured I should give some additional feedback. On the nodes without the USB device, I get this log message constantly:

{"caller":"usb.go:245","level":"info","msg":"no USB devices found attached to system","resource":"squat.ai/zwave","ts":"2023-04-05T01:31:47.627269533Z"}

It's not a problem, but I don't think this should be an "info"-level message.
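In case it helps, here's a rough sketch of what demoting that line might look like. I'm assuming the plugin logs through go-kit's level package (the caller/level/msg JSON fields suggest that), and the surrounding code and variable names below are made up for illustration rather than copied from usb.go:

```go
// Hypothetical sketch only: the real usb.go differs, and the go-kit import
// path and field names here are assumptions based on the JSON log format.
package main

import (
	"os"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

func main() {
	logger := log.NewJSONLogger(os.Stderr)
	logger = log.With(logger, "ts", log.DefaultTimestampUTC, "caller", log.DefaultCaller)
	// With a filter that only allows info and above (a common default),
	// debug-level messages are dropped instead of filling the log.
	logger = level.NewFilter(logger, level.AllowInfo())

	resource := "squat.ai/zwave" // illustrative value

	// Current behavior: logged on every scan, even when no device is attached.
	level.Info(logger).Log("msg", "no USB devices found attached to system", "resource", resource)

	// Proposed behavior: only visible when debug logging is enabled.
	level.Debug(logger).Log("msg", "no USB devices found attached to system", "resource", resource)
}
```

That way the message is still available for troubleshooting without spamming the journal on nodes that simply don't have the device attached.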

@squat
Owner

squat commented Apr 5, 2023

Nice! Those are both really good pieces of information! If you feel up to it, I'd happily merge a PR that changes that line to a debug level message (no pressure, I can get to it later today). Yes, for the time being we should probably bump the memory requested by the plugin in the default manifest included in the repository. In the medium term we should try to look at why memory usage has increased. I'll do some memory profiling to see if we can get it back down. Thanks again @dejanzelic
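For the profiling side, one low-effort option is the standard net/http/pprof handler, which would let us pull heap profiles from a running pod. This is just a sketch under the assumption that the plugin's HTTP listener (port 8080 in the manifest above) could register those handlers; the current binary may not do this:

```go
// Sketch only: assumes pprof handlers could be exposed on the plugin's
// existing HTTP port; this is not a description of today's binary.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Once this is serving inside the pod, a heap profile can be pulled with:
	//   kubectl -n kube-system port-forward pod/<generic-device-plugin-pod> 8080:8080
	//   go tool pprof http://localhost:8080/debug/pprof/heap
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Comparing a couple of heap profiles taken some time apart should make it clear whether the heap is genuinely growing or whether the new build just has a higher steady-state baseline.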

@dejanzelic
Contributor Author

Sweet! I'll submit a PR for the log-level change when I'm in front of my computer.

@squat changed the title from "Not Starting on Certain Nodes" to "Investigate Increased Memory Usage" on Apr 23, 2023
@TakahiroW4047

TakahiroW4047 commented Jul 22, 2023

Hello,

Not sure if this is related, but I seem to be encountering a memory leak where the generic-device-plugin process is killed due to "Memory cgroup out of memory" (see the dmesg log below).

I'm currently running Kubernetes 1.27 with the latest release of generic-device-plugin as of today, on Raspberry Pi 4B 8 GB nodes running Ubuntu Server 22.04 LTS.

```
[145926.626040] generic-device- invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-997
[145926.626083] CPU: 2 PID: 298580 Comm: generic-device- Tainted: G C E 5.15.0-1033-raspi #36-Ubuntu
[145926.626097] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[145926.626104] Call trace:
[145926.626109] dump_backtrace+0x0/0x200
[145926.626126] show_stack+0x20/0x30
[145926.626136] dump_stack_lvl+0x8c/0xb8
[145926.626150] dump_stack+0x18/0x34
[145926.626159] dump_header+0x54/0x21c
[145926.626173] oom_kill_process+0x22c/0x230
[145926.626184] out_of_memory+0xf4/0x370
[145926.626193] mem_cgroup_out_of_memory+0x150/0x184
[145926.626206] try_charge_memcg+0x5c4/0x670
[145926.626218] charge_memcg+0x5c/0x100
[145926.626229] __mem_cgroup_charge+0x40/0x8c
[145926.626242] __add_to_page_cache_locked+0x20c/0x3c0
[145926.626255] add_to_page_cache_lru+0x5c/0x100
[145926.626266] pagecache_get_page+0x1d0/0x624
[145926.626278] filemap_fault+0x588/0x830
[145926.626289] __do_fault+0x44/0xe0
[145926.626301] do_read_fault+0xe4/0x1b0
[145926.626313] do_fault+0xc0/0x360
[145926.626325] handle_pte_fault+0x5c/0x1c0
[145926.626336] __handle_mm_fault+0x1d0/0x350
[145926.626348] handle_mm_fault+0x108/0x294
[145926.626360] do_page_fault+0x160/0x560
[145926.626369] do_translation_fault+0x98/0xf0
[145926.626378] do_mem_abort+0x4c/0xbc
[145926.626387] el0_ia+0x9c/0x204
[145926.626397] el0t_64_sync_handler+0x124/0x12c
[145926.626407] el0t_64_sync+0x1a4/0x1a8
[145926.626417] memory: usage 10160kB, limit 10240kB, failcnt 27595
[145926.626429] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[145926.626440] Memory cgroup stats for /kubepods.slice/kubepods-pod54d889da_b92a_49f8_a602_5ed21cc852e5.slice:
[145926.626505] anon 9568256
file 139264
kernel_stack 196608
pagetables 143360
percpu 22464
sock 0
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 9560064
active_anon 8192
inactive_file 98304
active_file 0
unevictable 0
slab_reclaimable 91808
slab_unreclaimable 123440
slab 215248
workingset_refault_anon 0
workingset_refault_file 29514
workingset_activate_anon 0
workingset_activate_file 92
workingset_restore_anon 0
workingset_restore_file 45
workingset_nodereclaim 0
pgfault 107712
pgmajfault 43
pgrefill 1264
pgscan 825908
pgsteal 29480
pgactivate 863
pgdeactivate 955
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 0
thp_collapse_alloc 0
[145926.626525] Tasks state (memory values in pages):
[145926.626533] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[145926.626550] [ 298513] 65535 298513 198 1 36864 0 -998 pause
[145926.626575] [ 298550] 0 298550 181003 4262 122880 0 -997 generic-device-
[145926.626595] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-e689c4e731d1a743a8861b71ea4d79648cd1cc8b80e1fe83e1faabdb114849ff.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-pod54d889da_b92a_49f8_a602_5ed21cc852e5.slice,task_memcg=/kubepods.slice/kubepods-pod54d889da_b92a_49f8_a602_5ed21cc852e5.slice/cri-containerd-e689c4e731d1a743a8861b71ea4d79648cd1cc8b80e1fe83e1faabdb114849ff.scope,task=generic-device-,pid=298550,uid=0
[145926.626815] Memory cgroup out of memory: Killed process 298550 (generic-device-) total-vm:724012kB, anon-rss:8736kB, file-rss:8312kB, shmem-rss:0kB, UID:0 pgtables:120kB oom_score_adj:-997
```

@squat
Owner

squat commented Sep 16, 2023

xref: #45
