kube-proxy connection tracking adjustments are crashing smoke tests #724

jdef · 2016-01-05T02:01:22Z

task list:

- implement temporary workaround: MESOS: disable conntrack behavior mods that are possibly causing smoke tests to bomb kubernetes/kubernetes#19277 (MERGED)
- root cause analysis for correct, long term solution
  - the write to /sys/module/nf_conntrack/parameters/hashsize is failing; our CI env loses the logs here so I can't see the error code
  - xref WIP/MESOS: test harness fixes for 724 kubernetes/kubernetes#19290
- implement long(er) term solution
  - (WIP) MESOS: avoid updating nf_conntrack_max and _hashsize settings by default kubernetes/kubernetes#19303

current thinking is that it's likely some kind of permissions problem related to kube-proxy attempting to tweak networking sysctls
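Since the CI logs are lost, one way to surface the error code would be to repeat the write outside of kube-proxy and report errno directly. A minimal sketch, assuming a hypothetical helper named `write_param` (it is not kube-proxy code, and the name is made up for illustration):

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * write_param is a hypothetical helper, not part of kube-proxy.
 * It writes `value` to `path` and returns 0 on success or a negative
 * errno value on failure, so the caller can see exactly why it failed.
 */
int write_param(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    /* capture errno before close() can overwrite it */
    int err = (n < 0) ? errno : (((size_t)n != len) ? EIO : 0);

    if (close(fd) < 0 && err == 0)
        err = errno;
    return -err;
}
```

Calling it with `/sys/module/nf_conntrack/parameters/hashsize` from inside the executor container and printing `strerror(-rc)` would show the real failure. Note that a successful call changes the kernel setting, so this should only be run somewhere disposable.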


sttts · 2016-01-05T14:37:19Z

Is this change also on the release-1.1 branch?

jdef · 2016-01-05T14:40:13Z

i have not ported this to release-1.1 because the conntrack sysctl/sysfs-module behavior hasn't landed there (yet?)

jdef · 2016-01-05T15:52:24Z

using a modified jprobe originally obtained at http://www.friedhoff.org/posixfilecapsold.html I've observed that CAP_SYS_ADMIN is required to access the sysfs-modules filesystem, and that CAP_NET_ADMIN as well as CAP_SYS_ADMIN are required to access the sysctl file that's tweaked by kube-proxy. modified jprobe code is here:

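/*
 * jprobe handler for cap_capable(). A jprobe handler must take the same
 * arguments as the probed function:
 *
 * extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
 *                        int cap, int audit);
 */
int cr_capable(const struct cred *cred, struct user_namespace *ns, int cap, int audit)
{
        /* log which capability number is being checked, and for which task */
        printk(KERN_NOTICE "%s: asking for capability %d for %s\n",
                __func__, cap, current->comm);
        /* hand control back to the probed function; nothing below this runs */
        jprobe_return();
        return 0;
}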

sttts · 2016-01-05T15:53:22Z

@jdef And our privileged dind containers don't have those capabilities?

jdef · 2016-01-05T15:59:55Z

@sttts that is the next thing to check :) the mesos-slave is using native mesos containerization to create the container that the executor runs in. so we have a combination effect here to consider:

- the mesos slave appears to be running in a privileged docker container (via --privileged)
- the mesos slave creates the kubelet-executor (non-Docker) container to host the minion process
- the minion process launches the kube-proxy process

docker-container[privileged](slave::mesos-container[?caps?](minion::kube-proxy))
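One way to check what the innermost process actually holds would be to read the `CapEff` line of `/proc/<pid>/status` for kube-proxy and test the two capability bits named above. A rough sketch, assuming the status text has already been read into a string; the helper names `parse_cap_eff` and `has_cap` are hypothetical, while the capability numbers (12 and 21) come from `<linux/capability.h>`:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN_NR 12
#define CAP_SYS_ADMIN_NR 21

/*
 * Find the "CapEff:" line in text laid out like /proc/<pid>/status.
 * Returns 0 and stores the hex mask on success, -1 if it is missing.
 */
int parse_cap_eff(const char *status_text, uint64_t *mask)
{
    const char *p = status_text;
    while (p && *p) {
        if (strncmp(p, "CapEff:", 7) == 0) {
            unsigned long long v;
            if (sscanf(p + 7, "%llx", &v) != 1)
                return -1;
            *mask = (uint64_t)v;
            return 0;
        }
        /* advance to the start of the next line */
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/* Returns 1 if capability number `cap` is set in `mask`, else 0. */
int has_cap(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}
```

Comparing the mask for the slave process against the mask for kube-proxy would show whether capabilities are being dropped somewhere between the two container layers.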

sttts · 2016-01-05T16:02:11Z

Which means that maybe the slave steals the needed capabilities, right?

jdef · 2016-01-05T16:03:53Z

perhaps. it's also possible that the nf_conntrack module isn't loaded into the kernel in our CI environment

sttts · 2016-01-05T16:04:23Z

We are root there. We could load it.

jdef · 2016-01-05T16:13:58Z

also, in contrib/mesos/ci/run-with-cluster.sh we're not running the test harness docker container with --privileged. this seems a likely culprit.

sttts · 2016-01-05T16:14:40Z

The test container is the problem?

sttts · 2016-01-05T16:15:53Z

Feel free to add --privileged there too. But it feels strange that the tests need those capabilities. /cc @karlkfi

jdef · 2016-01-05T16:16:30Z

it could be. i'd like to test 1 change at a time to see exactly where the problem is. so i'm going to create a testing branch where i roll back the conntrack=0 flag lines I added in the hotfix, then start making some changes to our test scripts to see what fixes the problem

jdef · 2016-01-05T17:39:04Z

data points:

- adding --privileged to run-with-cluster.sh DOES NOT allow smoke test to pass
- running lsmod in run-with-cluster.sh SHOWS nf_conntrack is LOADED
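The lsmod check can also be done programmatically, since lsmod just formats `/proc/modules`. A small sketch, assuming the file contents are already in memory; `module_listed` is a hypothetical helper name. It matches on the whole first field so that `nf_conntrack_ipv4` is not mistaken for `nf_conntrack`:

```c
#include <string.h>

/*
 * Returns 1 if `name` appears as a module name (the first field of a
 * line) in text laid out like /proc/modules, 0 otherwise.
 */
int module_listed(const char *modules_text, const char *name)
{
    size_t n = strlen(name);
    const char *p = modules_text;
    while (p && *p) {
        /* the name must be followed by a space to be a full-field match */
        if (strncmp(p, name, n) == 0 && p[n] == ' ')
            return 1;
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return 0;
}
```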

jdef · 2016-01-05T23:07:33Z

tweaking the /sys/module/nf_conntrack/parameters/hashsize value causes problems. the proposed PR (kubernetes/kubernetes#19303) simply defaults the conntrackMax value to 0, which avoids changing the hashsize. Arguably since mesos is a multi-tenant cluster we shouldn't be mucking with this by default anyway.
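The effect of defaulting to 0 can be sketched as a simple gate. This is an illustration in C, not the actual kube-proxy code (which is Go); the struct, the function name, and the max/4 ratio used to derive hashsize are all assumptions made for the example:

```c
/* Hypothetical plan of what a proxy would write; names are illustrative. */
struct conntrack_plan {
    int touch_kernel;   /* 0 means leave both settings alone */
    long max;           /* value intended for nf_conntrack_max */
    long hashsize;      /* value intended for the hashsize module parameter */
};

/*
 * A conntrack_max of 0 (or less) means "do not modify the kernel",
 * so neither the sysctl nor the sysfs hashsize file is written.
 */
struct conntrack_plan plan_conntrack(long conntrack_max)
{
    struct conntrack_plan plan = {0, 0, 0};
    if (conntrack_max > 0) {
        plan.touch_kernel = 1;
        plan.max = conntrack_max;
        plan.hashsize = conntrack_max / 4;  /* assumed ratio, for illustration */
    }
    return plan;
}
```

With the default at 0 the failing hashsize write never happens, and operators who do want the tuning can still opt in by passing a positive value.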

s-urbaniak · 2016-01-06T08:27:53Z

Nice catch regarding nf_conntrack hashsize! I am still wondering about the concrete root cause of the crashes. The upstream PR introduces a hashsize value higher than the default (256k vs. 64k). Was 256k still too small, so that the kernel's connection tracking dropped packets and crashed the smoke test?

jdef · 2016-01-06T15:59:27Z

I suspect some strange interaction between our 3-level-nested container CI
environment and the attempts to tweak linux kernel mods at runtime. the
dockerized slave has its own netns. skimming the kernel code, it looks
like the nf_conntrack module isn't very happy when setting hashsize on a
netns that's != init_net (the root-level, initial netns of the OS):

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/net/netfilter/nf_conntrack_core.c?id=refs/tags/v3.18.25#n1555

https://books.google.com/books?id=RpsQAwAAQBAJ&pg=PA423&lpg=PA423&dq=netns+init_net+namespace&source=bl&ots=rAuK1vwKVl&sig=5VCs683iZFEYq-S03ummEJEGF9s&hl=en&sa=X&ved=0ahUKEwi4443WxZXKAhUC8z4KHVwICkoQ6AEISjAH#v=onepage&q=netns%20init_net%20namespace&f=false

that said, i can't see the specific error code because our CI env loses the
executor and proxy logs (we should have a bug for that somewhere - it makes
debugging very hard).
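If the error code does become visible, it should separate the hypotheses raised in this thread. A tentative mapping, assuming my reading of the v3.18 source linked above is right that the non-init_net case returns EOPNOTSUPP; the other cases are general expectations about sysfs rather than anything observed in our CI, and `diagnose_hashsize_errno` is a hypothetical helper:

```c
#include <errno.h>

/*
 * Map an errno from writing the hashsize parameter to the most likely
 * explanation discussed in this issue. The mapping is a guess, not a
 * confirmed diagnosis.
 */
const char *diagnose_hashsize_errno(int err)
{
    switch (err) {
    case 0:
        return "write succeeded";
    case ENOENT:
        return "parameter file missing: nf_conntrack probably not loaded";
    case EACCES:
    case EPERM:
        return "permission or capability problem";
    case EROFS:
        return "sysfs is mounted read-only in this container";
    case EOPNOTSUPP:
        return "kernel refused: caller is not in the initial netns";
    default:
        return "unrecognized error";
    }
}
```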


s-urbaniak · 2016-01-06T16:01:49Z

@jdef thanks for the explanation and for the links!

jdef added class/bug tracking area/networking tests/smoke priority/P1 priority/P0 WIP and removed priority/P1 priority/P0 labels Jan 5, 2016

jdef self-assigned this Jan 5, 2016

jdef mentioned this issue Jan 5, 2016

WIP/MESOS: test harness fixes for 724 kubernetes/kubernetes#19290

Closed

This was referenced Jan 5, 2016

Set conntrack params in kube-proxy kubernetes/kubernetes#19182

Merged

MESOS: avoid updating nf_conntrack_max and _hashsize settings by default kubernetes/kubernetes#19303

Merged

jdef added PTAL and removed WIP labels Jan 5, 2016

jdef added LGTM and removed PTAL labels Jan 6, 2016

jdef closed this as completed Jan 7, 2016

jdef removed the LGTM label Jan 7, 2016

jdef added this to the v0.7.2 milestone Jan 10, 2016

k82cn mentioned this issue Oct 15, 2016

Failed to start kube-proxy in Docker kubernetes/kubernetes#34820

Closed
