kube-proxy connection tracking adjustments are crashing smoke tests #724
Comments
Is this change also on the release-1.1 branch? |
i have not ported this to release-1.1 because the conntrack sysctl/sysfs-module behavior hasn't landed there (yet?) |
using a modified jprobe originally obtained at http://www.friedhoff.org/posixfilecapsold.html I've observed that
int cr_capable(const struct cred *cred, struct user_namespace *ns, int cap, int audit)
{
	/*
	 * Mirrors the signature of the probed function:
	 * extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
	 *                        int cap, int audit);
	 */
	/* log which capability is requested and by which process */
	printk(KERN_NOTICE "%s: asking for capability %d for %s\n",
	       __FUNCTION__, cap, current->comm);
	/* a jprobe handler must always end with jprobe_return() */
	jprobe_return();
	return 0; /* not reached; keeps the compiler happy */
} |
@jdef And our privileged dind containers don't have those capabilities? |
@sttts that is the next thing to check :) the mesos-slave is using native mesos containerization to create the container that the executor runs in. so we have a combination effect here to consider:
|
Which means that maybe the slave steals the needed capabilities, right? |
perhaps. it's also possible that the nf_conntrack module isn't loaded into the kernel in our CI environment |
We are root there. We could load it. |
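A quick way to check the "module isn't loaded" theory from inside the CI container is to look for nf_conntrack in /proc/modules. The following is only a minimal sketch: the helper names `module_listed` and `module_loaded` are made up for illustration, and a module compiled into the kernel would not show up in /proc/modules at all, so a negative result is not conclusive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is the first field of some line
 * in text formatted like /proc/modules, else 0. */
int module_listed(const char *modules_text, const char *name)
{
    size_t n;
    const char *line = modules_text;

    assert(modules_text != NULL && name != NULL);
    n = strlen(name);

    while (line != NULL && *line != '\0') {
        /* The module name is the first space-delimited field, so require a
         * space right after it (nf_conntrack_ipv4 must not match). */
        if (strncmp(line, name, n) == 0 && line[n] == ' ')
            return 1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* step past the newline to the next entry */
    }
    return 0;
}

/* Hypothetical helper: reads /proc/modules and reports whether `name` is
 * listed. Returns 1 if listed, 0 if not, -1 if the file cannot be read.
 * Only the first 64 KiB of the file are inspected. */
int module_loaded(const char *name)
{
    static char buf[65536];
    size_t len;
    FILE *f = fopen("/proc/modules", "r");

    if (f == NULL)
        return -1;
    len = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[len] = '\0';
    return module_listed(buf, name);
}
```

Calling `module_loaded("nf_conntrack")` at the start of the smoke test and logging the result would tell us whether loading the module is even the right thing to try.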
also, in |
The test container is the problem? |
Feel free to add |
it could be. i'd like to test 1 change at a time to see exactly where the problem is. so i'm going to create a testing branch where i roll back the conntrack=0 flag lines I added in the hotfix, then start making some changes to our test scripts to see what fixes the prob |
data points:
|
tweaking the /sys/module/nf_conntrack/parameters/hashsize value causes problems. the proposed PR (kubernetes/kubernetes#19303) simply defaults the conntrackMax value to 0, which avoids changing the hashsize. Arguably since mesos is a multi-tenant cluster we shouldn't be mucking with this by default anyway. |
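To make the effect of that default concrete, here is a rough sketch of the decision as described above: a conntrackMax of 0 means "leave the kernel settings alone". This is not the kube-proxy code. The struct, the function name `plan_conntrack`, and the max-to-hashsize ratio of 4 are assumptions chosen for illustration, as is the idea that the max sysctl is skipped together with the hashsize write.

```c
#include <assert.h>

/* Hypothetical description of what a proxy would touch at startup. */
struct conntrack_plan {
    int write_max;      /* 1 if the conntrack max sysctl would be set */
    int write_hashsize; /* 1 if .../nf_conntrack/parameters/hashsize would be written */
    int max;            /* value for the max sysctl, if written */
    int hashsize;       /* value for hashsize, if written */
};

/* ASSUMPTION: illustrative ratio between max entries and hash buckets. */
#define ASSUMED_ENTRIES_PER_BUCKET 4

/* Hypothetical helper: decide what to write for a given conntrackMax. */
struct conntrack_plan plan_conntrack(int conntrack_max)
{
    struct conntrack_plan p = {0, 0, 0, 0};

    assert(conntrack_max >= 0);

    /* 0 means "do not touch": no sysctl write, no hashsize write. */
    if (conntrack_max == 0)
        return p;

    p.write_max = 1;
    p.write_hashsize = 1;
    p.max = conntrack_max;
    p.hashsize = conntrack_max / ASSUMED_ENTRIES_PER_BUCKET;
    return p;
}
```

With a plan like this, a multi-tenant default of 0 never writes to /sys at all, so whatever is failing on the hashsize write cannot be triggered.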
Nice catch regarding nf_conntrack hashsize! I am still wondering about the concrete root cause of the crashes. The upstream PR introduces a higher hashsize value (256k) than the default (64k). Was 256k still too small, so that the kernel's connection tracking dropped packets and caused the smoke test crash? |
I suspect some strange interaction between our 3-level-nested container CI
that said, i can't see the specific error code because our CI env loses the
On Wed, Jan 6, 2016 at 3:27 AM, Sergiusz Urbaniak notifications@github.com
|
@jdef thanks for the explanation and for the links! |
task list:
/sys/module/nf_conntrack/parameters/hashsize
is failing; our CI env loses the logs here so I can't see the error code
current thinking is that it's likely some kind of permissions problem related to kube-proxy attempting to tweak networking sysctls
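Since the logs with the error code are lost, one option is a tiny probe that opens the file for writing and prints the errno itself. This is a minimal sketch with made-up helper names (`probe_writable`, `report_writable`). It only tests open(2), so a successful open does not guarantee that a later write would be accepted by the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: tries to open `path` for writing without modifying it.
 * Returns 0 on success, or the errno value from open(2) on failure
 * (for example EACCES, EROFS on a read-only mount, or ENOENT). */
int probe_writable(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Hypothetical helper: prints the probe result to stderr so the reason for
 * a failure is visible in the task output, and returns the same code. */
int report_writable(const char *path)
{
    int err = probe_writable(path);

    if (err == 0)
        fprintf(stderr, "%s: open for write succeeded\n", path);
    else
        fprintf(stderr, "%s: open for write failed: %s (errno %d)\n",
                path, strerror(err), err);
    return err;
}
```

Running `report_writable("/sys/module/nf_conntrack/parameters/hashsize")` at each nesting level of the CI containers should show where the write stops being possible and whether it is a permissions issue, a read-only /sys, or a missing module.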