Contour 0.14 examples - cannot create log #1279

Closed
so0k opened this issue Jul 29, 2019 · 9 comments · Fixed by #1302

so0k commented Jul 29, 2019

What steps did you take and what happened:
Deployed Contour as ds-hostnet-combined:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: "2019-07-26T10:34:16Z"
  generation: 1
  labels:
    app: contour-internal
    chart: contour-0.4.0
    component: router
    heritage: Tiller
    release: contour-internal
  name: contour-internal
  namespace: default
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: contour-internal
      component: router
      release: contour-internal
  template:
    metadata:
      annotations:
        ad.datadoghq.com/contour.check_names: |
          [
            "prometheus"
          ]
        ad.datadoghq.com/contour.init_configs: |
          [
            {}
          ]
        ad.datadoghq.com/contour.instances: |
          [
            {
              "prometheus_url": "http://%%host%%:8000/metrics",
              "namespace": "contour",
              "metrics": ["contour_*"]
            }
          ]
        ad.datadoghq.com/contour.logs: |
          [
            {
              "source":"contour",
              "service":"ingress"
            }
          ]
        ad.datadoghq.com/envoy.check_names: |
          [
            "envoy"
          ]
        ad.datadoghq.com/envoy.init_configs: |
          [
            {}
          ]
        ad.datadoghq.com/envoy.instances: |
          [
            {
              "stats_url": "http://%%host%%:8002/stats"
            }
          ]
        ad.datadoghq.com/envoy.logs: |
          [
            {
              "source":"envoy",
              "service":"ingress"
            }
          ]
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      creationTimestamp: null
      labels:
        app: contour-internal
        chart: contour-0.4.0
        component: router
        heritage: Tiller
        release: contour-internal
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/spot-edge
                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: Exists
      automountServiceAccountToken: true
      containers:
      - args:
        - -c
        - /config/contour.json
        - --service-cluster
        - envoy-cluster
        - --service-node
        - $(NODE_NAME)
        - --base-id
        - "1"
        command:
        - envoy
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: docker.io/envoyproxy/envoy-alpine:v1.10.0
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - wget
              - -qO-
              - http://localhost:9001/healthcheck/fail
        name: envoy
        ports:
        - containerPort: 8080
          hostPort: 8080
          name: http
          protocol: TCP
        - containerPort: 8443
          hostPort: 8443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8002
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 64Mi
          requests:
            cpu: 100m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: contour-config
      - args:
        - serve
        - --incluster
        - --xds-port
        - "8001"
        - --http-port
        - "8000"
        - --stats-port
        - "8002"
        - --envoy-service-http-port
        - "8080"
        - --envoy-service-https-port
        - "8443"
        - --ingress-class-name=contour-internal
        command:
        - contour
        image: gcr.io/heptio-images/contour:v0.14.0
        imagePullPolicy: IfNotPresent
        name: contour
        ports:
        - containerPort: 8001
          hostPort: 8001
          name: xds
          protocol: TCP
        - containerPort: 8000
          hostPort: 8000
          name: metrics
          protocol: TCP
        resources:
          limits:
            memory: 64Mi
          requests:
            cpu: 100m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      initContainers:
      - args:
        - bootstrap
        - /config/contour.json
        - --xds-port
        - "8001"
        - --admin-port
        - "9001"
        command:
        - contour
        image: gcr.io/heptio-images/contour:v0.14.0
        imagePullPolicy: IfNotPresent
        name: envoy-initconfig
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: contour-config
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: contour-internal
      serviceAccountName: contour-internal
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: spotInstance
      - effect: NoSchedule
        key: edge
      volumes:
      - emptyDir: {}
        name: contour-config
  templateGeneration: 1
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 33%
    type: RollingUpdate

It worked for a while, then Envoy lost its gRPC connection:

[2019-07-26 14:39:34.267][1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
[2019-07-26 14:39:34.413][1][info][upstream] [source/server/lds_api.cc:66] lds: remove listener 'ingress_https'
[2019-07-26 14:39:34.469][1][info][upstream] [source/server/lds_api.cc:74] lds: add/update listener 'ingress_https'

Investigation shows Contour restarted; kubectl logs -c contour -p shows:

time="2019-07-26T13:35:02Z" level=info msg=response connection=6 context=grpc count=1 error_detail=nil resource_names="[ingress_https]" response_nonce=7 type_url=type.googleapis.com/envoy.api.v2.RouteConfiguration version_info=7
time="2019-07-26T13:35:02Z" level=info msg=stream_wait connection=6 context=grpc error_detail=nil resource_names="[ingress_https]" response_nonce=2 type_url=type.googleapis.com/envoy.api.v2.RouteConfiguration version_info=2
W0726 14:39:33.851239       1 reflector.go:289] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: watch of *v1beta1.Ingress ended with: too old resource version: 153445 (182755)
log: exiting because of error: log: cannot create log: open /tmp/contour.ip-10-50-176-240.unknownuser.log.WARNING.20190726-143933.1: no such file or directory

What did you expect to happen:
Contour should not crash

Anything else you would like to add:
Perhaps we should add the /tmp emptyDir mount to the example manifests?
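
For reference, a minimal sketch of what that mount could look like, modeled on the manifest above (the volume name contour-tmp is illustrative, not something from the example manifests):

      containers:
      - name: contour
        # ... image/args as above ...
        volumeMounts:
        - mountPath: /tmp        # gives klog a writable place for its log files
          name: contour-tmp
      volumes:
      - emptyDir: {}
        name: contour-tmp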

Environment:

  • Contour version: 0.14 / Envoy 1.10
  • Kubernetes version (kubectl version): 1.13-eks
  • Kubernetes installer & version: eks
  • Cloud provider or hardware configuration: AWS - latest EKS optimized AMIs
  • OS (e.g. from /etc/os-release): Amazon Linux 2

so0k commented Jul 29, 2019

Additional notes:
This is a fresh cluster provisioned on Friday. As you may notice, taints are used for the edge nodes. Originally the edge nodes were provisioned with the wrong AWS security group; after updating the security group configuration in the ASG launch template, it seems only one node was replaced (allowing things to work on Friday evening, as traffic flowed from the NLB to the single healthy edge node).

Over the weekend, the cluster-autoscaler downscaled the edge nodes, keeping the older node that was running the workload but was actually unhealthy on the NLB, since it still had the old security group configuration ...

The error message regarding /tmp is confusing, but might not have been the cause of my issue. I rotated the old node out (replacing it with new nodes that have the correct security group from the launch template), and after monitoring for two hours everything seems to work (apart from frequent Envoy messages about losing the connection to Contour, even though Envoy and Contour run in the same pod, sharing hostNetwork ...).

@so0k so0k closed this as completed Jul 30, 2019
@mattalberts

Same issue with 0.14.0, on a few instances:

2019-08-08T00:23:15.000 99s5 log: exiting because of error: log: cannot create log: open /tmp/contour.contour-large-58595f77db-8n6z2.unknownuser.log.WARNING.20190808-002315.1: no such file or directory
2019-08-08T00:23:15.000 99s18 W0808 00:23:15.829571       1 reflector.go:289] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: watch of *v1beta1.IngressRoute ended with: too old resource version: 103335177 (103335598)
2019-08-08T00:23:15.000 99s18 log: exiting because of error: log: cannot create log: open /tmp/contour.contour-large-58595f77db-hbpr9.unknownuser.log.WARNING.20190808-002315.1: no such file or directory
2019-08-08T00:23:24.000 99s16 log: exiting because of error: log: cannot create log: open /tmp/contour.contour-medium-bc9f97c44-7qm84.unknownuser.log.WARNING.20190808-002324.1: no such file or directory
2019-08-08T00:23:24.000 99s16 W0808 00:23:24.687019       1 reflector.go:289] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: watch of *v1beta1.IngressRoute ended with: too old resource version: 103335177 (103335600)
2019-08-08T00:23:25.000 99s20 W0808 00:23:25.023187       1 reflector.go:289] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: watch of *v1beta1.IngressRoute ended with: too old resource version: 103335177 (103335600)
2019-08-08T00:23:25.000 99s20 log: exiting because of error: log: cannot create log: open /tmp/contour.contour-medium-bc9f97c44-9twrr.unknownuser.log.WARNING.20190808-002325.1: no such file or directory

@davecheney davecheney reopened this Aug 8, 2019
@davecheney davecheney added this to the 0.15.0 milestone Aug 8, 2019
@mattalberts

I think 0.14.0 made the effort to get rid of glog; I see klog in the vendor directory, which appears to have the same hook into flags. Does this still need to be forced to disable file-based logging?

_ = flag.Lookup("logtostderr").Value.Set("true")
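
For context, a minimal sketch of that workaround (my own illustration, not the Contour source). One difference to be aware of: glog registered its flags in package init, so the lookup worked directly, while klog only registers them once klog.InitFlags is called:

package main

import (
	"flag"

	"k8s.io/klog"
)

func main() {
	// klog, unlike glog, registers its flags only when InitFlags is called.
	klog.InitFlags(nil)

	// Force logging to stderr so klog never tries to open a file under /tmp.
	_ = flag.Lookup("logtostderr").Value.Set("true")

	klog.Warning("written to stderr; no log file is opened")
	klog.Flush()
}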

@davecheney
Contributor

davecheney commented Aug 8, 2019 via email

@mattalberts

mattalberts commented Aug 8, 2019

I'm with you :) At least you just released 0.14.1 (it's probably too soon, I've literally just run into the same thing ... mint version, found the crash).

I suspect a quick klog.InitFlags(nil) will get you where you want to be (it sets logging to stderr by default).

@mattalberts

I believe this would be your minimum patchset:

diff --git a/cmd/contour/contour.go b/cmd/contour/contour.go
index 041cea2..da47ef3 100644
--- a/cmd/contour/contour.go
+++ b/cmd/contour/contour.go
@@ -24,9 +24,11 @@ import (
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/clientcmd"
+       "k8s.io/klog"
 )
 
 func main() {
+       klog.InitFlags(nil)
        log := logrus.StandardLogger()
        app := kingpin.New("contour", "Heptio Contour Kubernetes ingress controller.")
 
diff --git a/go.mod b/go.mod
index 0ff23d8..d4cff7a 100644
--- a/go.mod
+++ b/go.mod
@@ -39,6 +39,7 @@ require (
        k8s.io/client-go v11.0.0+incompatible
        k8s.io/code-generator v0.0.0-20190311093542-50b561225d70
        k8s.io/gengo v0.0.0-20190116091435-f8a0810f38af // indirect
+       k8s.io/klog v0.3.0
        k8s.io/kube-openapi v0.0.0-20190115222348-ced9eb3070a5 // indirect
        k8s.io/utils v0.0.0-20190607212802-c55fbcfc754a // indirect
        mvdan.cc/unparam v0.0.0-20190310220240-1b9ccfa71afe

Given the internals of InitFlags:

func InitFlags(flagset *flag.FlagSet) {

	// Initialize defaults.
	initDefaultsOnce.Do(func() {
		logging.logDir = ""
		logging.logFile = ""
		logging.logFileMaxSizeMB = 1800
		logging.toStderr = true
		logging.alsoToStderr = false
		logging.skipHeaders = false
		logging.skipLogHeaders = false
	})

	if flagset == nil {
		flagset = flag.CommandLine
	}

	flagset.StringVar(&logging.logDir, "log_dir", logging.logDir, "If non-empty, write log files in this directory")
	flagset.StringVar(&logging.logFile, "log_file", logging.logFile, "If non-empty, use this log file")
	flagset.Uint64Var(&logging.logFileMaxSizeMB, "log_file_max_size", logging.logFileMaxSizeMB,
		"Defines the maximum size a log file can grow to. Unit is megabytes. "+
			"If the value is 0, the maximum file size is unlimited.")
	flagset.BoolVar(&logging.toStderr, "logtostderr", logging.toStderr, "log to standard error instead of files")
	flagset.BoolVar(&logging.alsoToStderr, "alsologtostderr", logging.alsoToStderr, "log to standard error as well as files")
	flagset.Var(&logging.verbosity, "v", "number for the log level verbosity")
	flagset.BoolVar(&logging.skipHeaders, "skip_headers", logging.skipHeaders, "If true, avoid header prefixes in the log messages")
	flagset.BoolVar(&logging.skipLogHeaders, "skip_log_headers", logging.skipLogHeaders, "If true, avoid headers when opening log files")
	flagset.Var(&logging.stderrThreshold, "stderrthreshold", "logs at or above this threshold go to stderr")
	flagset.Var(&logging.vmodule, "vmodule", "comma-separated list of pattern=N settings for file-filtered logging")
	flagset.Var(&logging.traceLocation, "log_backtrace_at", "when logging hits line file:N, emit a stack trace")
}

`logging.toStderr` defaults to true, so calling InitFlags on its own is enough to send logs to stderr instead of a file.
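
A quick sanity check of that default (my own sketch, assuming klog v0.3.0 as pinned in the go.mod diff above):

package main

import (
	"flag"
	"fmt"

	"k8s.io/klog"
)

func main() {
	klog.InitFlags(nil)
	flag.Parse()

	// The registered default is already "true"; no explicit Set is needed.
	fmt.Println(flag.Lookup("logtostderr").Value.String()) // prints: true

	klog.Warning("goes to stderr; nothing is written under /tmp")
	klog.Flush()
}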

@stevesloka
Member

@mattalberts would you like to send a PR with your changes?

@mattalberts

mattalberts commented Aug 8, 2019 via email

@mattalberts

@stevesloka I'll have the patch up shortly

mattalberts pushed a commit to mattalberts/contour that referenced this issue Aug 8, 2019
The default behavior of klog will be to attempt to log to a file,
similar to glog, which will result in a container crash because the
filesystem is read-only.

closes projectcontour#1279

Signed-off-by: Matt Alberts <malberts@cloudflare.com>
davecheney pushed a commit to davecheney/contour that referenced this issue Aug 9, 2019
Fixes projectcontour#1279
Fixes projectcontour#1306

The default behavior of klog will be to attempt to log to a file,
similar to glog, which will result in a container crash because the
filesystem is read-only.

Signed-off-by: Matt Alberts <malberts@cloudflare.com>
davecheney pushed a commit that referenced this issue Aug 9, 2019
Fixes #1279
Fixes #1306

The default behavior of klog will be to attempt to log to a file,
similar to glog, which will result in a container crash because the
filesystem is read-only.

Signed-off-by: Matt Alberts <malberts@cloudflare.com>