Add command-line control for logrus log-level #556

Closed
mattalberts opened this issue Jul 18, 2018 · 3 comments

@davecheney

If you're interested, I've added a command-line option to configure the logrus log level. I added the option to the serve command (rather than globally to the app); I'm happy to move it if you prefer a wider scope.

Example

      containers:
        - name: contour
          image: gcr.io/heptio-images/contour:master
          imagePullPolicy: Always
          command: ["contour"]
          args:
            - serve
            - --incluster
            - --log-level=warn
            - --ingress-class-name=contour
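
To sketch the wiring (simplified for illustration; the real change adds the flag to the serve command rather than using the standard flag package, and the fall-back-to-info behaviour here is just my placeholder):

package main

import (
	"flag"

	"github.com/sirupsen/logrus"
)

func main() {
	// Hypothetical wiring; the actual change adds the option to the serve command.
	logLevel := flag.String("log-level", "info", "logrus log level: debug, info, warn, error, fatal, panic")
	flag.Parse()

	log := logrus.StandardLogger()
	level, err := logrus.ParseLevel(*logLevel)
	if err != nil {
		// Placeholder behaviour: fall back to info rather than refusing to start.
		log.WithError(err).Warnf("invalid --log-level %q, using info", *logLevel)
		level = logrus.InfoLevel
	}
	log.SetLevel(level)

	log.Debug("visible only at --log-level=debug")
	log.Info("visible at the default level")
}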

mattalberts commented Jul 23, 2018

@davecheney

Following up from the PR: I thought I'd capture some Contour log information and share it, to help decide what we'd like to do and where. I'll try to share enough information to make the results reproducible.

tl;dr

  • idle log rates at 128 services/ingresses
    • ~ 138 lines/sec
    • ~ 33012 bytes/sec
  • live log rates
    • ~ 151 lines/sec
    • ~ 38692 bytes/sec

That's about 3GB a day per replica; I'm currently running 5 replicas (~15GB a day).

Summary

We can break (contour) logging behavior into four categories:

  • idle - services created, no request traffic, no changes to watched resources
  • request - services created, active request traffic, no changes to watched resources
  • watch - services created, no request traffic, active changes to watched resources
  • live - monitor a live system experiencing all three of the above simultaneously

I started with idle and live but skipped request and watch. If we find value in isolating those variables, I can extend my tooling. The results from idle and live were enough to suggest a log-level configuration.

Manifest / Setup

---
apiVersion: v1
kind: Service
metadata:
  name: $META_NAME
  labels:
    app: $META_NAME
    role: service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: $META_NAME
    role: service
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: $META_NAME
  labels:
    app: $META_NAME
    role: service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $META_NAME
      role: service
  template:
    metadata:
      labels:
        app: $META_NAME
        role: service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: echo
        image: k8s.gcr.io/echoserver:1.10
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: $META_NAME
  labels:
    app: $META_NAME
    role: ingress
  annotations:
    kubernetes.io/ingress.class: contour
spec:
  tls:
  - hosts:
    - $SPEC_HOST
    secretName: xip-io
  rules:
  - host: $SPEC_HOST
    http:
      paths:
      - path: /
        backend:
          serviceName: $META_NAME
          servicePort: 80
---
  • image: gcr.io/heptio-images/contour:master
  • META_NAME and SPEC_HOST are template variables
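
As a sketch of how the n triples are rendered from the manifest above (simplified; the template file name, naming scheme, and xip.io-style host pattern here are placeholders, not my exact tooling):

package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// echo-template.yaml is assumed to contain the manifest above verbatim.
	tmpl, err := os.ReadFile("echo-template.yaml")
	if err != nil {
		panic(err)
	}
	n := 128 // number of service/deployment/ingress triples to render
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("echo-%03d", i)
		host := name + ".192.168.0.1.xip.io" // hypothetical host, matching the xip-io TLS secret
		r := strings.NewReplacer("$META_NAME", name, "$SPEC_HOST", host)
		fmt.Println(r.Replace(string(tmpl)))
		fmt.Println("---")
	}
}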

Idle Log Behavior

Idle log behavior is simply a capture of the logging behavior observed at rest (i.e. the ingress controller is not serving traffic and there are no changes to the watched resources). Here, data collection ranges across n triples built from the manifest above.

  • n=1 ~ 1 service, 1 deployment (with 1 replica), 1 ingress (with TLS)
  n   t0        tf        dt (s)   lines    bytes      dl/dt (lines/s)   db/dt (bytes/s)
  1   2:27:58   2:59:01   1863     2047     482607     1.099             259.048
  2   2:59:10   3:30:47   1897     4126     974288     2.175             513.594
  4   3:30:55   4:01:30   1835     7984     1885548    4.351             1027.547
  8   4:01:42   4:32:29   1847     16412    3878805    8.886             2100.057
 16   4:32:46   5:03:35   1849     32500    7732124    17.577            4181.787
 32   5:03:58   5:34:26   1828     64756    15404704   35.425            8427.081
 64   5:34:57   6:05:48   1851     131124   31232120   70.840            16873.106
128   6:06:51   6:37:54   1863     257556   61500964   138.248           33011.790

[chart: idle log line velocity]

[chart: idle log byte velocity]
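
The dl/dt and db/dt columns above are simply line and byte counts divided by elapsed seconds. A minimal sketch of that calculation (the capture file name and hard-coded duration are placeholders, not my exact tooling):

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	// Hypothetical capture of one contour pod's stdout for the n=128 idle run.
	f, err := os.Open("contour-idle-n128.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var lines, bytes int64
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		lines++
		bytes += int64(len(sc.Bytes())) + 1 // +1 for the newline the scanner strips
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}

	dt := 1863.0 // tf - t0 in seconds for the n=128 row above
	fmt.Printf("lines=%d bytes=%d dl/dt=%.3f lines/s db/dt=%.3f bytes/s\n",
		lines, bytes, float64(lines)/dt, float64(bytes)/dt)
}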

Example Logs where N=0

time="2018-07-24T15:46:59Z" level=info msg=stream_wait connection=2 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:10Z" level=info msg=response connection=1 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:10Z" level=info msg=stream_wait connection=1 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:10Z" level=info msg=response connection=2 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:10Z" level=info msg=stream_wait connection=2 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:27Z" level=info msg=response connection=1 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:27Z" level=info msg=stream_wait connection=1 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=
time="2018-07-24T15:47:27Z" level=info msg=response connection=2 context=grpc count=0 error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=
time="2018-07-24T15:47:27Z" level=info msg=stream_wait connection=2 context=grpc error_detail=nil resource_names="[]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.Listener version_info=

Live Log Behavior

Live log behavior is a capture of the observed log behavior on an active Kubernetes cluster. This is the least reproducible measurement, since it is specific to my environment, but I thought it would be useful to share. I suspect the cluster being observed falls somewhere between n=64 and n=128 (based on its ingress and service counts).

Cluster:

  • services ~ 140
  • ingresses ~ 64
  • deployments ~ 120
  • contour replicas - 5

Log Behavior

  • duration - 4698s
  • log lines - 3546640 across 5 replicas ~ 151 lines/sec per replica
  • log bytes - 908890202B across 5 replicas ~ 38692 bytes/sec per replica

@davecheney

@mattalberts I'm going to close this issue; it's been open for 10 months without an update. In the meantime I've made some improvements to log verbosity. Hopefully this is better than what it was back in July last year and we don't need to add additional controls to logrus.

@mattalberts

No worries! Thanks for the update. I'll check it out as part of migrating to v0.10.0.

sunjayBhatia pushed a commit that referenced this issue Jan 30, 2023