New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates fluentd in fluentd-es-image to fluentd 1.1.0 #58525

Merged
merged 7 commits into from Jan 23, 2018

Conversation

@monotek
Contributor

monotek commented Jan 19, 2018

What this PR does / why we need it: Updates fluentd in fluentd-es-image to fluentd 1.1.0.

Its also needed to be able to use the new fluentd-elasticsearch helm chart from: helm/charts#3379

Release note:

updates fluentd in fluentd-es-image to fluentd 1.1.0

monotek added some commits Jan 10, 2018

added fluent-plugin-detect-exceptions plugin to fluentd-es-image
added configmap changes

raised fluentd-es-configmap version

fixed missing version match

@k8s-ci-robot k8s-ci-robot added size/S and removed size/M labels Jan 19, 2018

@coffeepac

This comment has been minimized.

Member

coffeepac commented Jan 19, 2018

/ok-to-test

@coffeepac

This comment has been minimized.

Member

coffeepac commented Jan 19, 2018

@monotek you need to format your relases note comment correctly. The release note should be in the grey box when previewing the PR description

@coffeepac

This comment has been minimized.

Member

coffeepac commented Jan 19, 2018

@monotek keeping up with new versions is always good but between this and the linked chart PR you are implying there is a concrete reason for this upgrade. Can you share the reason?

@k8s-ci-robot k8s-ci-robot added size/L and removed size/S labels Jan 19, 2018

@monotek

This comment has been minimized.

Contributor

monotek commented Jan 19, 2018

@coffepac
With the new fluentd versions its possible to use the "@type record_transformer" in fluentd.conf without needing to install the "record reformer" plugin as before.

That the reason we like to use the newer fluentd version in our setup and therefore its used for this pull too.

path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
pos_file /var/log/fluentd-containers.log.pos

This comment has been minimized.

@coffeepac

coffeepac Jan 19, 2018

Member

is there a reason to change this? could have adverse effects if people are upgrading.

format syslog
path /var/log/startupscript.log
pos_file /var/log/es-startupscript.log.pos
pos_file /var/log/startupscript.log.pos

This comment has been minimized.

@coffeepac

coffeepac Jan 19, 2018

Member

same as above, is there a reason to change this file name?

format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
path /var/log/docker.log
pos_file /var/log/es-docker.log.pos
pos_file /var/log/docker.log.pos

This comment has been minimized.

@coffeepac

coffeepac Jan 19, 2018

Member

file name change, same as before

filters [{ "_SYSTEMD_UNIT": "docker.service" }]
pos_file /var/log/gcp-journald-docker.pos
#pos_file /var/log/journald-docker.pos

This comment has been minimized.

@coffeepac

coffeepac Jan 19, 2018

Member

do we not need the position file here? What happens if the fluentd container gets restarted

EDIT: I'm guessing the block takes care of it. if so, best to delete the line then leave it commented out

This comment has been minimized.

@dannyk81

dannyk81 Jan 19, 2018

Sorry for barging in 😄 but since you have read_from_head true, I don't think it's wise to remove the pos_file, this combination will result in fluentd reading all the events in the journal when it is restarted as there's no reference where it last stopped, this could lead to huge bursts of re-ingested logs.

This comment has been minimized.

@monotek

monotek Jan 20, 2018

Contributor

pos_file is deprecated in the new plugin version. see: https://github.com/reevoo/fluent-plugin-systemd/blob/1e14ace403891b9aa745dc8a2a920103e5fda21e/README.md

pos_file

This parameter is deprecated and will be removed in favour of storage in v1.0.

Path to pos file, stores the journald cursor. File is created if does not exist.

storage

Configuration for a storage plugin used to store the journald cursor.

Upgrading from pos_file

If pos_file is specified in addition to a storage plugin with persistent set to true, the cursor will be copied from the pos_file on startup, and the old pos_file removed.

This comment has been minimized.

@dannyk81

dannyk81 Jan 20, 2018

@monotek you're absolutely right, but should you consider the "upgrade" scenario? keeping the exisiting pos_file configuration will ensure the cursor is copied to the new storage.

Also, shouldn't there be a path in the <storage> section?

This comment has been minimized.

@monotek

monotek Jan 20, 2018

Contributor

@dannyk81
Not necessarily. I've used the root_dir option. See: https://docs.fluentd.org/v1.0/articles/storage_local

This is the reason for the new system.conf file: https://github.com/monotek/kubernetes/blob/0aff9bcf0402179c2dd5d11da31fccf3b228b5a1/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml#L9

Beside that you'll get deprecation warnings when you still use the pos files. See:
https://github.com/reevoo/fluent-plugin-systemd/blob/dbd7359c6a8ce7e3b7ad9cfa447c35a5ec2fde04/lib/fluent/plugin/in_systemd.rb#L18

So i thought its better to have clean log when fluentd is started.

disable_retry_limit
# Use multiple threads for processing.
num_threads 2
@id elasticsearch

This comment has been minimized.

@coffeepac

coffeepac Jan 19, 2018

Member

thanks for fixing this spacing issue.

@coffeepac

This comment has been minimized.

Member

coffeepac commented Jan 19, 2018

/approve no-issue

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Jan 19, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coffeepac, monotek

Associated issue requirement bypassed by: coffeepac

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

fejta-bot commented Jan 23, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Jan 23, 2018

@monotek: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kubeadm-gce-canary ec187f7 link /test pull-kubernetes-e2e-kubeadm-gce-canary

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot

This comment has been minimized.

fejta-bot commented Jan 23, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@StevenACoffman

This comment has been minimized.

Contributor

StevenACoffman commented Jan 23, 2018

@fejta that was a lot of recurrent failures before it actually succeeded. You're probably aware, but, in case you weren't. whew.

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Jan 23, 2018

Automatic merge from submit-queue (batch tested with PRs 56206, 58525). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 5b9ba2f into kubernetes:master Jan 23, 2018

13 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation monotek authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke-gci Jenkins job succeeded.
Details
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-unit Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@piosz piosz removed their assignment Jan 25, 2018

@Random-Liu

This comment has been minimized.

Member

Random-Liu commented Jan 25, 2018

@StevenACoffman

This comment has been minimized.

Contributor

StevenACoffman commented Jan 26, 2018

For reference, the equivalent daemonset version is being updated in this branch.

@monotek

This comment has been minimized.

Contributor

monotek commented Jan 26, 2018

@Random-Liu Pretty sure this is because the needed 2.0.4 image is not yet available in gcr.io. See: https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/fluentd-elasticsearch

@monotek

This comment has been minimized.

Contributor

monotek commented Jan 31, 2018

Could somebody please trigger the 2.0.4 container image build?
@coffeepac ?

@StevenACoffman

This comment has been minimized.

Contributor

StevenACoffman commented Jan 31, 2018

@piosz or @crassirostris can one of you two build and push the new 2.0.4 version?

@monotek

This comment has been minimized.

Contributor

monotek commented Jan 31, 2018

By the way... Why is this not build automatically?

@coffeepac

This comment has been minimized.

Member

coffeepac commented Feb 2, 2018

needs to be triggered by a googler, I am not a googler.

I've got a sig-instrumentation task to make this not-google specific. but then we get a bit tied up with the general issue of getting k8s images built without requiring a google eng to push the button.

@monotek

This comment has been minimized.

Contributor

monotek commented Feb 4, 2018

@Random-Liu

Can you trigger the build?

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018

Merge pull request kubernetes#58525 from monotek/fluentd-1.1.0
Automatic merge from submit-queue (batch tested with PRs 56206, 58525). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

updates fluentd in fluentd-es-image to fluentd 1.1.0

**What this PR does / why we need it**: Updates fluentd in fluentd-es-image to fluentd 1.1.0.

Its also needed to be able to use the new fluentd-elasticsearch helm chart from: helm/charts#3379


**Release note**:
```release-note
updates fluentd in fluentd-es-image to fluentd 1.1.0
```
@vendrov

This comment has been minimized.

Contributor

vendrov commented on cbfcc3f Feb 12, 2018

@monotek the new image v2.0.4 can't be downloaded, seems like the image is been missing from gcr

This comment has been minimized.

Contributor

monotek replied Feb 12, 2018

Known issue. We have to wait for a googler to trigger the build. Should happen soon.

@monotek

This comment has been minimized.

Contributor

monotek commented Feb 13, 2018

Maybe we also have to wait for: #59665

@monotek

This comment has been minimized.

Contributor

monotek commented Feb 14, 2018

The dockerfile was fixed in #59665.
Could somebody trigger the build again please?

@crassirostris

This comment has been minimized.

Member

crassirostris commented Feb 14, 2018

@monotek done already, 2.0.4 is built and pushed

@monotek

This comment has been minimized.

Contributor

monotek commented Feb 14, 2018

Thanks! :-)

path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag raw.kubernetes.*
format json
read_from_head true

This comment has been minimized.

@Random-Liu

Random-Liu Mar 28, 2018

Member

Why do we remove the multi_format? Without this elasticsearch-fluentd won't work with containerd or cri-o.

@monotek

This comment has been minimized.

Contributor

monotek commented Mar 28, 2018

Sorry. Wasn't aware of that.
I'm also likely not able to review / fix / test before beginning / mid of April.

@Random-Liu

This comment has been minimized.

Member

Random-Liu commented Mar 28, 2018

@monotek I sent out a fix #61818.

k8s-merge-robot added a commit that referenced this pull request Apr 3, 2018

Merge pull request #61818 from Random-Liu/fix-elastic-search
Automatic merge from submit-queue (batch tested with PRs 61818, 61800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add CRI container log format support back for elastic search.

The CRI container log format support was removed accidentally in #58525. This PR adds that back.

I've tested it, and it works:
```
SSSSS
------------------------------
[sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch] 
  should check that logs from containers are ingested into Elasticsearch
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
[BeforeEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:141
STEP: Creating a kubernetes client
Mar 28 08:09:01.724: INFO: >>> kubeConfig: /home/lantaol/.kube/config
STEP: Building a namespace api object
Mar 28 08:09:02.952: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:32
[It] should check that logs from containers are ingested into Elasticsearch
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
Mar 28 08:09:02.988: INFO: Checking the Elasticsearch service exists.
Mar 28 08:09:03.025: INFO: Checking to make sure the Elasticsearch pods are running
Mar 28 08:09:03.066: INFO: Checking to make sure we are talking to an Elasticsearch service.
Mar 28 08:09:03.176: INFO: Checking health of Elasticsearch service.
Mar 28 08:09:03.299: INFO: Starting repeating logging pod synthlogger
STEP: Waiting for logs to ingest
Mar 28 08:09:17.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:27.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:37.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:47.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:57.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:10:07.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
[AfterEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:142
Mar 28 08:10:07.607: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-es-logging-pqlx7" for this suite.
Mar 28 08:10:57.758: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Mar 28 08:11:00.046: INFO: namespace: e2e-tests-es-logging-pqlx7, resource: bindings, ignored listing per whitelist
Mar 28 08:11:00.338: INFO: namespace e2e-tests-es-logging-pqlx7 deletion completed in 52.693713026s

• [SLOW TEST:118.614 seconds]
[sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/common/framework.go:23
  should check that logs from containers are ingested into Elasticsearch
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSMar 28 08:11:00.346: INFO: Running AfterSuite actions on all node
Mar 28 08:11:00.346: INFO: Running AfterSuite actions on node 1

Ran 1 of 845 Specs in 123.981 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 844 Skipped PASS

Ginkgo ran 1 suite in 2m4.323020647s
Test Suite Passed
2018/03/28 08:11:00 process.go:152: Step './hack/ginkgo-e2e.sh --ginkgo.focus=Cluster\slevel\slogging\susing\sElasticsearch' finished in 2m5.943972428s
2018/03/28 08:11:00 e2e.go:83: Done
```

Mark 1.10, because this is a regression for CRI container runtimes in 1.10.

The original support was added in 1.9. #54777

**Release note**:

```release-note
none
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment