Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates fluentd in fluentd-es-image to fluentd 1.1.0 #58525

Merged
merged 7 commits into from
Jan 23, 2018

Conversation

monotek
Copy link
Member

@monotek monotek commented Jan 19, 2018

What this PR does / why we need it: Updates fluentd in fluentd-es-image to fluentd 1.1.0.

Its also needed to be able to use the new fluentd-elasticsearch helm chart from: helm/charts#3379

Release note:

updates fluentd in fluentd-es-image to fluentd 1.1.0

André Bauer added 2 commits January 10, 2018 15:20
added configmap changes

raised fluentd-es-configmap version

fixed missing version match
@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 19, 2018
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 19, 2018
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 19, 2018
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 19, 2018
@coffeepac
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 19, 2018
@coffeepac
Copy link
Contributor

@monotek you need to format your relases note comment correctly. The release note should be in the grey box when previewing the PR description

@coffeepac
Copy link
Contributor

@monotek keeping up with new versions is always good but between this and the linked chart PR you are implying there is a concrete reason for this upgrade. Can you share the reason?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 19, 2018
@monotek
Copy link
Member Author

monotek commented Jan 19, 2018

@coffepac
With the new fluentd versions its possible to use the "@type record_transformer" in fluentd.conf without needing to install the "record reformer" plugin as before.

That the reason we like to use the newer fluentd version in our setup and therefore its used for this pull too.

path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
pos_file /var/log/fluentd-containers.log.pos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to change this? could have adverse effects if people are upgrading.

format syslog
path /var/log/startupscript.log
pos_file /var/log/es-startupscript.log.pos
pos_file /var/log/startupscript.log.pos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, is there a reason to change this file name?

format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
path /var/log/docker.log
pos_file /var/log/es-docker.log.pos
pos_file /var/log/docker.log.pos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file name change, same as before

filters [{ "_SYSTEMD_UNIT": "docker.service" }]
pos_file /var/log/gcp-journald-docker.pos
#pos_file /var/log/journald-docker.pos
Copy link
Contributor

@coffeepac coffeepac Jan 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not need the position file here? What happens if the fluentd container gets restarted

EDIT: I'm guessing the block takes care of it. if so, best to delete the line then leave it commented out

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for barging in 😄 but since you have read_from_head true, I don't think it's wise to remove the pos_file, this combination will result in fluentd reading all the events in the journal when it is restarted as there's no reference where it last stopped, this could lead to huge bursts of re-ingested logs.

Copy link
Member Author

@monotek monotek Jan 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pos_file is deprecated in the new plugin version. see: https://github.com/reevoo/fluent-plugin-systemd/blob/1e14ace403891b9aa745dc8a2a920103e5fda21e/README.md

pos_file

This parameter is deprecated and will be removed in favour of storage in v1.0.

Path to pos file, stores the journald cursor. File is created if does not exist.

storage

Configuration for a storage plugin used to store the journald cursor.

Upgrading from pos_file

If pos_file is specified in addition to a storage plugin with persistent set to true, the cursor will be copied from the pos_file on startup, and the old pos_file removed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@monotek you're absolutely right, but should you consider the "upgrade" scenario? keeping the exisiting pos_file configuration will ensure the cursor is copied to the new storage.

Also, shouldn't there be a path in the <storage> section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dannyk81
Not necessarily. I've used the root_dir option. See: https://docs.fluentd.org/v1.0/articles/storage_local

This is the reason for the new system.conf file: https://github.com/monotek/kubernetes/blob/0aff9bcf0402179c2dd5d11da31fccf3b228b5a1/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml#L9

Beside that you'll get deprecation warnings when you still use the pos files. See:
https://github.com/reevoo/fluent-plugin-systemd/blob/dbd7359c6a8ce7e3b7ad9cfa447c35a5ec2fde04/lib/fluent/plugin/in_systemd.rb#L18

So i thought its better to have clean log when fluentd is started.

disable_retry_limit
# Use multiple threads for processing.
num_threads 2
@id elasticsearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for fixing this spacing issue.

@coffeepac
Copy link
Contributor

/approve no-issue

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coffeepac, monotek

Associated issue requirement bypassed by: coffeepac

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jan 23, 2018

@monotek: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kubeadm-gce-canary ec187f7 link /test pull-kubernetes-e2e-kubeadm-gce-canary

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@StevenACoffman
Copy link
Contributor

@fejta that was a lot of recurrent failures before it actually succeeded. You're probably aware, but, in case you weren't. whew.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 56206, 58525). If you want to cherry-pick this change to another branch, please follow the instructions here.

@Random-Liu
Copy link
Member

Random-Liu commented Jan 25, 2018

I believe the this PR broke the elastic logging test https://k8s-testgrid.appspot.com/google-gce#gci-gce-es-logging.

@kubernetes/sig-instrumentation-bugs

@StevenACoffman
Copy link
Contributor

For reference, the equivalent daemonset version is being updated in this branch.

@monotek
Copy link
Member Author

monotek commented Jan 26, 2018

@Random-Liu Pretty sure this is because the needed 2.0.4 image is not yet available in gcr.io. See: https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/fluentd-elasticsearch

@monotek
Copy link
Member Author

monotek commented Jan 31, 2018

Could somebody please trigger the 2.0.4 container image build?
@coffeepac ?

@StevenACoffman
Copy link
Contributor

@piosz or @crassirostris can one of you two build and push the new 2.0.4 version?

@monotek
Copy link
Member Author

monotek commented Jan 31, 2018

By the way... Why is this not build automatically?

@coffeepac
Copy link
Contributor

needs to be triggered by a googler, I am not a googler.

I've got a sig-instrumentation task to make this not-google specific. but then we get a bit tied up with the general issue of getting k8s images built without requiring a google eng to push the button.

@monotek
Copy link
Member Author

monotek commented Feb 4, 2018

@Random-Liu

Can you trigger the build?

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Automatic merge from submit-queue (batch tested with PRs 56206, 58525). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

updates fluentd in fluentd-es-image to fluentd 1.1.0

**What this PR does / why we need it**: Updates fluentd in fluentd-es-image to fluentd 1.1.0.

Its also needed to be able to use the new fluentd-elasticsearch helm chart from: helm/charts#3379


**Release note**:
```release-note
updates fluentd in fluentd-es-image to fluentd 1.1.0
```
@monotek
Copy link
Member Author

monotek commented Feb 13, 2018

Maybe we also have to wait for: #59665

@monotek
Copy link
Member Author

monotek commented Feb 14, 2018

The dockerfile was fixed in #59665.
Could somebody trigger the build again please?

@crassirostris
Copy link

@monotek done already, 2.0.4 is built and pushed

@monotek
Copy link
Member Author

monotek commented Feb 14, 2018

Thanks! :-)

path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag raw.kubernetes.*
format json
read_from_head true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we remove the multi_format? Without this elasticsearch-fluentd won't work with containerd or cri-o.

@monotek
Copy link
Member Author

monotek commented Mar 28, 2018

Sorry. Wasn't aware of that.
I'm also likely not able to review / fix / test before beginning / mid of April.

@Random-Liu
Copy link
Member

@monotek I sent out a fix #61818.

k8s-github-robot pushed a commit that referenced this pull request Apr 3, 2018
Automatic merge from submit-queue (batch tested with PRs 61818, 61800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add CRI container log format support back for elastic search.

The CRI container log format support was removed accidentally in #58525. This PR adds that back.

I've tested it, and it works:
```
SSSSS
------------------------------
[sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch] 
  should check that logs from containers are ingested into Elasticsearch
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
[BeforeEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:141
STEP: Creating a kubernetes client
Mar 28 08:09:01.724: INFO: >>> kubeConfig: /home/lantaol/.kube/config
STEP: Building a namespace api object
Mar 28 08:09:02.952: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:32
[It] should check that logs from containers are ingested into Elasticsearch
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
Mar 28 08:09:02.988: INFO: Checking the Elasticsearch service exists.
Mar 28 08:09:03.025: INFO: Checking to make sure the Elasticsearch pods are running
Mar 28 08:09:03.066: INFO: Checking to make sure we are talking to an Elasticsearch service.
Mar 28 08:09:03.176: INFO: Checking health of Elasticsearch service.
Mar 28 08:09:03.299: INFO: Starting repeating logging pod synthlogger
STEP: Waiting for logs to ingest
Mar 28 08:09:17.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:27.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:37.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:47.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:57.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:10:07.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
[AfterEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:142
Mar 28 08:10:07.607: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-es-logging-pqlx7" for this suite.
Mar 28 08:10:57.758: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Mar 28 08:11:00.046: INFO: namespace: e2e-tests-es-logging-pqlx7, resource: bindings, ignored listing per whitelist
Mar 28 08:11:00.338: INFO: namespace e2e-tests-es-logging-pqlx7 deletion completed in 52.693713026s

• [SLOW TEST:118.614 seconds]
[sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/common/framework.go:23
  should check that logs from containers are ingested into Elasticsearch
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSMar 28 08:11:00.346: INFO: Running AfterSuite actions on all node
Mar 28 08:11:00.346: INFO: Running AfterSuite actions on node 1

Ran 1 of 845 Specs in 123.981 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 844 Skipped PASS

Ginkgo ran 1 suite in 2m4.323020647s
Test Suite Passed
2018/03/28 08:11:00 process.go:152: Step './hack/ginkgo-e2e.sh --ginkgo.focus=Cluster\slevel\slogging\susing\sElasticsearch' finished in 2m5.943972428s
2018/03/28 08:11:00 e2e.go:83: Done
```

Mark 1.10, because this is a regression for CRI container runtimes in 1.10.

The original support was added in 1.9. #54777

**Release note**:

```release-note
none
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet