[WIP][DO_NOT_MERGE] KVM implementation of tasks#978
Conversation
| _ = CtrStop(containerID, true) | ||
|
|
||
| stdinDir := path.Join("/run", "containers-stdin", containerID) | ||
| if err := os.RemoveAll(stdinDir); err != nil { |
There was a problem hiding this comment.
Just cleaning it up, since the next CtrCreate expects these directories to be not present. This got exposed when LKTaskLaunch() deletes existing container (if any) before creating new container with the same domainID
There was a problem hiding this comment.
I think we need to dig into this more @cshari-zededa -- since it may expose an issue with how we're using containerd and I'd rather fix the root cause (since containerd becomes a center piece of our design). Is there an easy way to reproduce this?
There was a problem hiding this comment.
It is not reproducible, I think I added this change unnecessarily. Removed it!
There was a problem hiding this comment.
Hi @rvs,
With the #containerd issue fixed,
we no longer need this stdin workaround, so removed that piece of code, and this cleanup
code as well. Please let me know if you think we should retain it.
| config := "/containers/services" + linuxkit + "/config.json" | ||
| rootfs := "/containers/services" + linuxkit + "/lower" | ||
| config := "/containers/services/" + linuxkit + "/config.json" | ||
| rootfs := "/containers/services/" + linuxkit + "/rootfs" |
There was a problem hiding this comment.
If you need a R/W filesystem you need to prepare a volume for it. A better question is: why DO you need a R/W root filesystem? All the R/W bits are given to you via /run and /persist anyway -- so what's missing?
There was a problem hiding this comment.
I had to make rootfs as R/W because of /dev/, /sys, /proc systems, since qemu/kernel is creating /dev/pts/ and /proc/ files during the VM launch.
There was a problem hiding this comment.
If those are the only ones -- we need to dig deeper into this as well. Those are supposed to be R/W mounted from the outside when you ask runc to run the container. Lets start with seeing what gets mounted on those endpoint when you call LKTaskLaunch() and we can proceed from there.
There was a problem hiding this comment.
Hi @rvs, you are right, I unnecessarily marked this as R/W, whereas Readonly rootfs works just fine, as I verified it today.
There was a problem hiding this comment.
Hey @cshari-zededa -- do you plan to update your PR to reflect that going back to a R/O rootfs?
There was a problem hiding this comment.
Hi @rvs, I already reverted the Root.Readonly = false part.
There was a problem hiding this comment.
The only issue with read-only root partition is that, we can't use "screen" with default settings,
since it tries to create /tmp/.screens directory, and it fails. But we can always override that by
explicitly mentioning "SCREENDIR" env variable. So we are all good with the Readonly root fs
|
|
||
| spec.Root.Path = rootfs | ||
| spec.Root.Readonly = true | ||
| if domSettings != nil { |
There was a problem hiding this comment.
I had to make rootfs as R/W because of /dev/, /sys, /proc systems, since qemu/kernel is creating /dev/pts/ and /proc/ files during the VM launch.
There was a problem hiding this comment.
reverted the Readonly change to true.
There was a problem hiding this comment.
The only issue with read-only root partition is that, we can't use "screen" with default settings,
since it tries to create /tmp/.screens directory, and it fails. But we can always override that by
explicitly mentioning "SCREENDIR" env variable. So we are all good with the Readonly root fs.
rvs
left a comment
There was a problem hiding this comment.
Hey @cshari-zededa -- sorry for the delay -- lets make sure we commit it this week. A few things left to do here:
- rebase on the latest EVE
- take care of the feedback I left just now
- lets see if we can reduce overhead to less than 800M (800M really seems excessive)
For #3 I suggest we do the memory pinning and start passing -overcommit mem-lock=on and -overcommit cpu-pm=on to the qemu we're running.
I think that will help us arrive at a more reasonable overhead figure
| config := "/containers/services" + linuxkit + "/config.json" | ||
| rootfs := "/containers/services" + linuxkit + "/lower" | ||
| config := "/containers/services/" + linuxkit + "/config.json" | ||
| rootfs := "/containers/services/" + linuxkit + "/rootfs" |
There was a problem hiding this comment.
Hey @cshari-zededa -- do you plan to update your PR to reflect that going back to a R/O rootfs?
| } | ||
|
|
||
| m := int64(dom.Memory * 1024) | ||
| m := int64(dom.Memory*1024) + int64(qemuOverHead) |
There was a problem hiding this comment.
This is not quite right @cshari-zededa -- since this function is going to be used for much more than just launching qemu tasks -- we need a more generic way to accommodate the overhead. Please move it one level up to at least LKTaskLaunch(). IOW, make LKTaskLaunch do a generic UpdateFromDomain() and later manually tweak the memory setting at that level based on the overhead argument passed to LKTaskLaunch
Done
Done, please let me know if you think I have missed anything.
Arrived at 500MB with some trail-and-error experiments with PCI passthrough settings.
Done
|
rvs
left a comment
There was a problem hiding this comment.
Hey @cshari-zededa -- this is great, but there's a tiny, tiny nit left (see below). I would've pulled as-is but it pushes the size of rootfs over the 250Mb limit on ARM -- hence it breaks the ARM build.
| pixman=0.34.0-r5 \ | ||
| yajl=2.1.0-r0 \ | ||
| keyutils=1.5.10-r0 \ | ||
| screen=4.6.2-r1 \ |
There was a problem hiding this comment.
Lets not pull screen in for now -- anyone debugging the system can always apk add screen (and in fact, we should probably move towards using a socket for stdio anyway -- but that shouldn't be part of this PR).
There was a problem hiding this comment.
Note that you don't have to be in the same R/O rootfs/container for using screen
There was a problem hiding this comment.
Hi @rvs, excluded this change to pull in screen. I've updated the PR. Please take a look.
Signed-off-by: Hariharasubramanian C S <cshari@zededa.com>
|
Thanks @cshari-zededa -- merged! |
Signed-off-by: Hariharasubramanian C S cshari@zededa.com