Add command-line control for logrus log-level #556

Closed
mattalberts opened this issue Jul 18, 2018 · 3 comments

@davecheney

If you're interested, I've added a command-line option to configure the logrus log level. I added the option to the serve command (rather than globally to the app); I'm happy to move it if you prefer a wider scope.

Example

      containers:
        - name: contour
          image: gcr.io/heptio-images/contour:master
          imagePullPolicy: Always
          command: ["contour"]
          args:
            - serve
            - --incluster
            - --log-level=warn
            - --ingress-class-name=contour
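
To sketch the wiring (simplified for illustration; the real change adds the flag to the serve command rather than using the standard flag package, and the fall-back-to-info behaviour here is just my placeholder):

package main

import (
	"flag"

	"github.com/sirupsen/logrus"
)

func main() {
	// Hypothetical wiring; the actual change adds the option to the serve command.
	logLevel := flag.String("log-level", "info", "logrus log level: debug, info, warn, error, fatal, panic")
	flag.Parse()

	log := logrus.StandardLogger()
	level, err := logrus.ParseLevel(*logLevel)
	if err != nil {
		// Placeholder behaviour: fall back to info rather than refusing to start.
		log.WithError(err).Warnf("invalid --log-level %q, using info", *logLevel)
		level = logrus.InfoLevel
	}
	log.SetLevel(level)

	log.Debug("visible only at --log-level=debug")
	log.Info("visible at the default level")
}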

mattalberts commented Jul 23, 2018

@davecheney

Following up from the PR: I thought I'd capture some Contour log information and share it, to help decide what we'd like to do and where. I'll try to share enough information to make the results reproducible.

tl;dr

  • idle log rates at 128 services/ingresses
    • ~ 138 lines/sec
    • ~ 33012 bytes/sec
  • live log rates
    • ~ 151 lines/sec
    • ~ 38692 bytes/sec

That's about 3GB a day per replica; I'm currently running 5 replicas (~15GB a day).

Summary

We can break (contour) logging behavior into four categories:

  • idle - services created, no request traffic, no changes to watched resources
  • request - services created, active request traffic, no changes to watched resources
  • watch - services created, no request traffic, active changes to watched resources
  • live - monitor a live system experiencing all three of the above simultaneously

I started with idle and live but skipped request and watch. If we find value in isolating those variables, I can extend my tooling. The results from idle and live were enough to suggest a log-level configuration.

Manifest / Setup

---
apiVersion: v1
kind: Service
metadata:
  name: $META_NAME
  labels:
    app: $META_NAME
    role: service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: $META_NAME
    role: service
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: $META_NAME
  labels:
    app: $META_NAME
    role: service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $META_NAME
      role: service
  template:
    metadata:
      labels:
        app: $META_NAME
        role: service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: echo
        image: k8s.gcr.io/echoserver:1.10
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: $META_NAME
  labels:
    app: $META_NAME
    role: ingress
  annotations:
    kubernetes.io/ingress.class: contour
spec:
  tls:
  - hosts:
    - $SPEC_HOST
    secretName: xip-io
  rules:
  - host: $SPEC_HOST
    http:
      paths:
      - path: /
        backend:
          serviceName: $META_NAME
          servicePort: 80
---
  • image: gcr.io/heptio-images/contour:master
  • META_NAME and SPEC_HOST are template variables
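
As a sketch of how the n triples are rendered from the manifest above (simplified; the template file name, naming scheme, and xip.io-style host pattern here are placeholders, not my exact tooling):

package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// echo-template.yaml is assumed to contain the manifest above verbatim.
	tmpl, err := os.ReadFile("echo-template.yaml")
	if err != nil {
		panic(err)
	}
	n := 128 // number of service/deployment/ingress triples to render
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("echo-%03d", i)
		host := name + ".192.168.0.1.xip.io" // hypothetical host, matching the xip-io TLS secret
		r := strings.NewReplacer("$META_NAME", name, "$SPEC_HOST", host)
		fmt.Println(r.Replace(string(tmpl)))
		fmt.Println("---")
	}
}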

Idle Log Behavior

Idle log behavior is simply a capture of the logging behavior observed at rest (i.e. the ingress controller is not serving traffic and there are no changes to the watched resources). Here, data collection ranges across n triples built from the manifest above.

  • n=1 ~ 1 service, 1 deployment (with 1 replica), 1 ingress (with TLS)
  n   t0        tf        dt (s)   lines    bytes      dl/dt (lines/s)   db/dt (bytes/s)
  1   2:27:58   2:59:01   1863     2047     482607     1.099             259.048
  2   2:59:10   3:30:47   1897     4126     974288     2.175             513.594
  4   3:30:55   4:01:30   1835     7984     1885548    4.351             1027.547
  8   4:01:42   4:32:29   1847     16412    3878805    8.886             2100.057
 16   4:32:46   5:03:35   1849     32500    7732124    17.577            4181.787
 32   5:03:58   5:34:26   1828     64756    15404704   35.425            8427.081
 64   5:34:57   6:05:48   1851     131124   31232120   70.840            16873.106
128   6:06:51   6:37:54   1863     257556   61500964   138.248           33011.790

[chart: idle log line velocity]

[chart: idle log byte velocity]
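
The dl/dt and db/dt columns above are simply line and byte counts divided by elapsed seconds. A minimal sketch of that calculation (the capture file name and hard-coded duration are placeholders, not my exact tooling):

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	// Hypothetical capture of one contour pod's stdout for the n=128 idle run.
	f, err := os.Open("contour-idle-n128.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var lines, bytes int64
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		lines++
		bytes += int64(len(sc.Bytes())) + 1 // +1 for the newline the scanner strips
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}

	dt := 1863.0 // tf - t0 in seconds for the n=128 row above
	fmt.Printf("lines=%d bytes=%d dl/dt=%.3f lines/s db/dt=%.3f bytes/s\n",
		lines, bytes, float64(lines)/dt, float64(bytes)/dt)
}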

Example Logs where N=0

time="2018-07-24T15:46:59Z" level=info msg=stream_wait connection=2 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:10Z" level=info msg=response connection=1 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:10Z" level=info msg=stream_wait connection=1 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:10Z" level=info msg=response connection=2 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:10Z" level=info msg=stream_wait connection=2 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:27Z" level=info msg=response connection=1 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:27Z" level=info msg=stream_wait connection=1 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:27Z" level=info msg=response connection=2 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:27Z" level=info msg=stream_wait connection=2 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=

Live Log Behavior

Live log behavior is a capture of the observed log behavior on an active Kubernetes cluster. This is the least reproducible measurement, since it is specific to my environment, but I thought it would be useful to share. I suspect the cluster being observed falls somewhere between n=64 and n=128 (based on its ingress and service counts).

Cluster:

  • services ~ 140
  • ingresses ~ 64
  • deployments ~ 120
  • contour replicas - 5

Log Behavior

  • duration - 4698s
  • log lines - 3546640 across 5 replicas ~ 151 lines/sec per replica
  • log bytes - 908890202B across 5 replicas ~ 38692 bytes/sec per replica

@davecheney

@mattalberts I'm going to close this issue; it's been open for 10 months without an update. In the meantime I've made some improvements to log verbosity. Hopefully this is better than what it was back in July last year and we don't need to add additional controls to logrus.

@mattalberts

No worries! Thanks for the update. I'll check it out as part of migrating to v0.10.0.

sunjayBhatia pushed a commit that referenced this issue Jan 30, 2023