
Activating Cluster level logging sends lots of backslashes on fluentd target #24545

Closed
izaac opened this issue Dec 12, 2019 · 13 comments
Labels
area/logging, kind/bug-qa, [zube]: Done


@izaac
Contributor

izaac commented Dec 12, 2019

Activating cluster-level logging sends lots of backslashes, making the logs impossible to read.

What kind of request is this (question/bug/enhancement/feature request):
Bug

Steps to reproduce (least amount of steps as possible):
Spin up a new cluster using Rancher 2.3-head (I used Rancher version 59110e9).
Activate cluster logging with fluentd as the target (I used fluentd:v1.7.4-debian-2.0).

fluent.conf:

<source>
  @type forward
  port 9881
  bind 0.0.0.0
</source>
<match **>
  @type stdout
</match>

Result:
Many backslashes are sent in the log events forwarded to fluentd.

Other details that may be helpful:

Environment information

  • Rancher version: 59110e9
  • Installation option (single install/HA): Single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): GKE
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): cloud
  • Kubernetes version (use kubectl version): v1.14.8-gke.17
  • Docker version (use docker version): 18.9.7
@izaac izaac added the area/logging and kind/bug-qa labels Dec 12, 2019
@izaac izaac added this to the v2.3.4 milestone Dec 12, 2019
@izaac izaac self-assigned this Dec 12, 2019
@izaac
Contributor Author

izaac commented Dec 12, 2019

Validated on Rancher version 2.3.3: I couldn't reproduce the issue.

@izaac
Contributor Author

izaac commented Dec 12, 2019

This seems to happen more often with the "enableJSONParsing": bool cluster-level setting. Enabling and disabling it through the API somehow seems to trigger this behavior. @dramich

@dramich dramich modified the milestones: v2.3.4, v2.4 Dec 12, 2019
@dramich
Contributor

dramich commented Dec 12, 2019

#24157

@izaac
Contributor Author

izaac commented Dec 12, 2019

It doesn't seem to be the same scenario, although the result is certainly the same. In 2.3.4-rc3, leaving enableJSONParsing disabled makes the logs behave as expected, but as soon as I set it to true, the backslashes start flowing. @dramich

@loganhz

loganhz commented Dec 24, 2019

Root Cause

Cluster logging collects logs from many components, but not all of these logs are in JSON format.

So fluentd emits a warning-level log whenever it tries to parse a log line that is not in JSON format. This is expected.

However, the Rancher fluentd is picking up and processing its own logs, which causes a feedback loop: every time it reads a log containing JSON-escaped strings, it has to escape them again before printing, which leads to lots of backslashes escaping other backslashes.
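A minimal Python sketch of why the backslashes pile up, assuming each pass through the loop simply re-serializes the previously printed line as a JSON string. The helper names escape_round and backslash_growth are hypothetical, and the sketch ignores fluentd's real warning format; it only models the repeated escaping. Each pass escapes every existing backslash and quote again, so the backslash count roughly doubles per round:

```python
import json


def escape_round(line: str) -> str:
    """One pass of the (hypothetical) loop: the previously printed line is
    embedded as a JSON string, so every quote and backslash is escaped again."""
    return json.dumps(line)


def backslash_growth(seed: str, rounds: int) -> list[int]:
    """Return the backslash count of the seed and after each escaping pass."""
    counts = [seed.count("\\")]
    line = seed
    for _ in range(rounds):
        line = escape_round(line)
        counts.append(line.count("\\"))
    return counts


if __name__ == "__main__":
    print(backslash_growth('say "hi"', 5))
```

Starting from a line containing just two double quotes, five passes give 0, 2, 8, 22, 52 and 114 backslashes, which matches the "backslashes escaping other backslashes" pattern described above.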

Solution

Use the multi_format parser plugin so that fluentd does not log a warning when a log line that is not in JSON format fails JSON parsing.

  • Step 1: Release a new rancher/fluentd image with the multi_format plugin installed
  • Step 2: Add a new version of logging in the system chart to bump the rancher/fluentd version
  • Step 3: Create a Rancher PR to switch to multi_format:
  <parse>
    @type multi_format
    <pattern>
      format json
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
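The fallback behavior of the multi_format config above can be sketched in Python. This is a hypothetical model, not the plugin's code: patterns are tried in order, a JSON object becomes the record, and anything else is kept unchanged under a "message" key (assumed here to mirror the none parser's default message_key). Because the fallback pattern always succeeds, no parse warning is produced, which is what breaks the feedback loop:

```python
import json
from typing import Any, Dict


def parse_multi_format(line: str) -> Dict[str, Any]:
    """Hypothetical model of <parse> @type multi_format with json then none."""
    # Pattern 1 ("format json"): accept the line if it decodes to a JSON object.
    # Treating non-object JSON (e.g. a bare number) as a failure is an assumption.
    try:
        record = json.loads(line)
        if isinstance(record, dict):
            return record
    except json.JSONDecodeError:
        pass
    # Pattern 2 ("format none"): keep the raw line as-is, with no warning
    # and no re-escaping of its contents.
    return {"message": line}


if __name__ == "__main__":
    print(parse_multi_format('{"level": "info", "msg": "started"}'))
    print(parse_multi_format("I1224 10:00:00 plain klog-style line"))
```

The design point is that the non-JSON case is handled as a normal match rather than an error, so fluentd never writes a warning that it would later read back and escape.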

Other

Maybe we should disable it in 2.3.4 by hiding the Enable JSON Parsing option in the UI.

@loganhz

loganhz commented Dec 24, 2019

@izaac Just FYI

If you deploy the target fluentd in the same k8s cluster, please use the file output instead of stdout. Otherwise, you will still hit a similar feedback loop even with the fix.

<source>
  @type forward
  port 9881
  bind 0.0.0.0
</source>
<match **>
  @type file
  path /tmp/file
</match>

@cloudnautique
Contributor

@izaac, I think we should hold this open until we can add the multi_format plugin and log the fluentd pods to a file. The file should be rotated by date and/or size so that it doesn't overrun the underlying filesystem.

@loganhz
Copy link

loganhz commented Dec 26, 2019

shanewxy added a commit to shanewxy/rancher that referenced this issue Dec 27, 2019
2. Fixed fluentd non-json parsing bug by multi_format plugin rancher#24545
deniseschannon pushed a commit that referenced this issue Dec 27, 2019
2. Fixed fluentd non-json parsing bug by multi_format plugin #24545
@izaac
Contributor Author

izaac commented Dec 27, 2019

These issues were validated as working after these fixes:

#24367
#23646

2.3-head
2019/12/27 16:05:06 [INFO] Rancher version e9a549e59 is starting

Master-head
2019/12/27 16:33:53 [INFO] Rancher version 189e4337c is starting

I'm still going to do a couple of tests on Windows nodes, as that was also updated as part of this set of changes.

@izaac
Contributor Author

izaac commented Dec 27, 2019

Rancher version: 2.3-head (12/27/2019) e9a549e
EC2 setup with

  • 1 master node (etcd/control plane) Ubuntu 18.04.3
  • 1 Linux worker - Ubuntu 18.04.3
  • 1 Windows worker - Windows_Server-2019-English-Full-ContainersLatest-2019.12.16 (ami-0ff0566e781481cd9)

I activated fluentd logging and it seems normal: the logs no longer show the backslashes. However, the rancher-logging-fluentd-windows pod keeps crashing. It stays in the Available state for a little while, but then it shows:

CrashLoopBackOff: back-off 5m0s restarting failed container=rancher-logging-fluentd pod=rancher-logging-fluentd-windows-gkpmm_cattle-logging(5169351f-9fac-4d53-b077-cd4b58a8f732)

@loganhz

loganhz commented Dec 28, 2019

Logging only works in host-gw (host gateway) mode. It's a known issue. I guess you are using VXLAN.

@izaac
Contributor Author

izaac commented Dec 28, 2019

@loganhz oh yes, I was using vxlan

@sowmyav27
Contributor

sowmyav27 commented Dec 30, 2019

Verified on 2.3-head latest - commit id: ed6537bfa

Windows setup - AWS nodes

1 all-roles node (etcd/control plane/worker) Ubuntu 18.04.3
1 Windows worker - Windows_Server-2019 container
Gateway mode: followed the steps from https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/windows-clusters/host-gateway-requirements/#disabling-private-ip-address-checks
k8s - 1.16.4

Fluentd logging was enabled on the cluster
The rancher-logging-fluentd logs no longer show the backslashes
[Screenshot: 2019-12-30, 12:36:05 PM]

Verified on master-head latest - commit id: 7578b3c76

Windows setup - AWS nodes

1 all-roles node (etcd/control plane/worker) Ubuntu 18.04.3
1 Windows worker - Windows_Server-2019 container
Gateway mode: followed the steps from https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/windows-clusters/host-gateway-requirements/#disabling-private-ip-address-checks
k8s - 1.16.4

8 participants