out_es: ensure integrity of already recorded logs #2026

thotypous · 2020-03-17T14:05:10Z

The create_doc index privilege was introduced in ElasticSearch 7.5 to ensure a role can only add new logs, but never modify or delete previously recorded ones.

If the role has wider privileges than create_doc and the system running fluent-bit is compromised, one cannot ensure the integrity of logs previously stored in ElasticSearch, since the past logs could be modified by the adversary after the breach.
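
As a rough illustration of the restriction described above, the sketch below builds a role body that grants only create_doc, in the shape accepted by Elasticsearch's role management API (PUT /_security/role/<name>). The index pattern `myindex-*` and the helper name `create_doc_role` are assumptions made for the example and are not part of this PR.

```python
import json


def create_doc_role(index_patterns):
    """Build a role body that grants only the create_doc index privilege."""
    return {
        "indices": [
            {
                # Index patterns the role applies to (hypothetical here).
                "names": list(index_patterns),
                # create_doc allows adding documents, not updating or deleting.
                "privileges": ["create_doc"],
            }
        ]
    }


role_body = create_doc_role(["myindex-*"])
print(json.dumps(role_body, indent=2))
```

A user mapped to such a role is the kind of account the fluent-bit output would authenticate as.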

However, fluent-bit currently does not support running with the create_doc privilege, since it uses the index op_type, which has the semantics of changing a document if it already exists with the same _id. Therefore, any requests with the index op_type are denied for a role whose only privilege is create_doc, even if they would create a new document, e.g.:

{
   "took":464,
   "errors":true,
   "items":[
      {
         "index":{
            "_index":"myindex-test",
            "_type":"flb_type",
            "_id":"dOq6HAB5BvOnZv3fWiu0",
            "status":403,
            "error":{
               "type":"security_exception",
               "reason":"action [indices:data/write/index:op_type/index] is unauthorized for user [myuser]",
               "suppressed":[
                  {
                     "type":"security_exception",
                     "reason":"action [indices:data/write/index:op_type/index] is unauthorized for user [myuser]"
                  },
                  {
                     "type":"security_exception",
                     "reason":"action [indices:data/write/index:op_type/index] is unauthorized for user [myuser]"
                  }
               ]
            }
         }
      },
      {
         "index":{
            "_index":"myindex-test",
            "_type":"flb_type",
            "_id":"dOq6HAB5BvOnZv3fWiu1",
            "status":403,
            "error":{
               "type":"security_exception",
               "reason":"action [indices:data/write/index:op_type/index] is unauthorized for user [myuser]",
               "suppressed":[
                  {
                     "type":"security_exception",
                     "reason":"action [indices:data/write/index:op_type/index] is unauthorized for user [myuser]"
                  },
                  {
                     "type":"security_exception",
                     "reason":"action [indices:data/write/index:op_type/index] is unauthorized for user [myuser]"
                  }
               ]
            }
         }
      }
   ]
}

We solve this by replacing all index operations with create operations, which are authorized for roles that have only the create_doc privilege.

Please note this change is backwards compatible even with very old versions of ElasticSearch.
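
To make the change concrete, here is a minimal Python sketch of a _bulk payload that uses the create action instead of index. It is not the plugin's C code: the function name `build_bulk_body` and the sample record are assumptions, and the `_type` field seen in the responses above is omitted for brevity.

```python
import json


def build_bulk_body(index, records, op_type="create"):
    """Serialize (doc_id, document) pairs into a _bulk NDJSON payload."""
    lines = []
    for doc_id, doc in records:
        action = {op_type: {"_index": index}}
        if doc_id is not None:
            # An explicit _id is what Generate_ID would supply.
            action[op_type]["_id"] = doc_id
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    # The bulk API expects every line, including the last, to end in "\n".
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "myindex-test",
    [("dOq6HAB5BvOnZv3fWiu0", {"log": "example message"})],
)
print(body)
```

Passing `op_type="index"` reproduces the previous behaviour, which is the request shape a create_doc-only role rejects with status 403.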

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

[N/A] Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

thotypous · 2020-03-17T14:14:47Z

[Edit: Solved below] I need some guidance regarding the following issue:

If Generate_ID is set to On in the es plugin config and a document with the same _id already exists at the ElasticSearch server (due to a retry of a previously successful operation deemed as failed by fluent-bit), it will return something like this:

{
   "took":9,
   "errors":true,
   "items":[
      {
         "create":{
            "_index":"myindex-test",
            "_type":"flb_type",
            "_id":"dOq6HAB5BvOnZv3fWiu0",
            "status":409,
            "error":{
               "type":"version_conflict_engine_exception",
               "reason":"[dOq6HAB5BvOnZv3fWiu0]: version conflict, document already exists (current version [1])",
               "index_uuid":"MHbN2sP7T8mdTy55MHTlew",
               "shard":"0",
               "index":"myindex-test"
            }
         }
      },
      {
         "create":{
            "_index":"myindex-test",
            "_type":"flb_type",
            "_id":"dOq6HAB5BvOnZv3fWiu1",
            "status":409,
            "error":{
               "type":"version_conflict_engine_exception",
               "reason":"[dOq6HAB5BvOnZv3fWiu1]: version conflict, document already exists (current version [1])",
               "index_uuid":"MHbN2sP7T8mdTy55MHTlew",
               "shard":"0",
               "index":"myindex-test"
            }
         }
      }
   ]
}

This in turn will cause the es plugin to issue a retry here. This would cause some retries in a row until fluent-bit gives up.

Currently, this does not happen because the "index" op_type is interpreted as "replace the existing document".
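
The difference between the two operations can be sketched with a toy in-memory model. This is only an illustration of the semantics described above, not Elasticsearch itself; the class name `ToyIndex` and its methods are hypothetical.

```python
class ToyIndex:
    """In-memory stand-in for an index, keyed by document _id."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def index(self, doc_id, source):
        """'index' semantics: create the document or replace it."""
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, source)
        return 201 if version == 1 else 200

    def create(self, doc_id, source):
        """'create' semantics: refuse to touch an existing document."""
        if doc_id in self.docs:
            return 409  # version conflict, document already exists
        self.docs[doc_id] = (1, source)
        return 201


store = ToyIndex()
first = store.create("dOq6HAB5BvOnZv3fWiu0", {"log": "a"})
retry = store.create("dOq6HAB5BvOnZv3fWiu0", {"log": "a"})
print(first, retry)
```

With a deterministic _id, a retried create hits the conflict branch, whereas a retried index silently overwrites the stored document.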

Do we need to handle this issue, or just let fluent-bit naturally give up retrying? It seems to be a waste of bandwidth, so I would like to handle this. Any suggestions? Should I modify elasticsearch_error_check to ignore this kind of error?

thotypous · 2020-03-18T22:29:30Z

I just updated this pull request. Now it changes the elasticsearch_error_check function to ignore errors with status 409, in order to address the concerns of my previous comment.
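
The idea can be sketched in Python as follows. This is an assumption-laden illustration rather than the actual C code of elasticsearch_error_check: the function name `bulk_needs_retry` is hypothetical, and it simply treats any item with status 409 as already delivered.

```python
import json


def bulk_needs_retry(response_text):
    """Return True if any bulk item failed for a reason other than a 409."""
    response = json.loads(response_text)
    if not response.get("errors"):
        return False  # every item succeeded
    for item in response.get("items", []):
        # Each item has a single key naming the operation ("create", "index").
        for result in item.values():
            status = result.get("status", 0)
            if status == 409:
                continue  # document already exists: nothing to resend
            if status >= 400 or "error" in result:
                return True
    return False


conflict = '{"errors": true, "items": [{"create": {"status": 409, "error": {}}}]}'
denied = '{"errors": true, "items": [{"index": {"status": 403, "error": {}}}]}'
print(bulk_needs_retry(conflict), bulk_needs_retry(denied))
```

Under this rule, a response made up only of version conflicts no longer triggers a retry, while other failures such as the 403 above still do.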

Please review and share your opinion on this approach.

thotypous · 2020-03-19T12:42:25Z

Example config:

[SERVICE]
    Flush            10
    Daemon           Off
    Log_Level        debug
    HTTP_Monitor     Off

[INPUT]
    Name    tail
    Path    /path/messages
    DB      /path/messages.db
    Parser  syslog-busybox

[OUTPUT]
    Name         es
    Host         mydomain.ufscar.br
    Port         443
    HTTP_User    myuser
    HTTP_Passwd  mypassword
    Index        myindex-test
    Generate_ID  On
    tls          On
    tls.verify   On

Valgrind and debug log, testing the version conflict behavior:

==9982== Memcheck, a memory error detector
==9982== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9982== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==9982== Command: /git/fluent-bit/build/bin/fluent-bit -v -s 24576 -c /path/fluent-bit.conf -R /path/parsers.conf
==9982==
Fluent Bit v1.4.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/03/19 09:29:14] [ info] Configuration:
[2020/03/19 09:29:14] [ info]  flush time     | 10.000000 seconds
[2020/03/19 09:29:14] [ info]  grace          | 5 seconds
[2020/03/19 09:29:14] [ info]  daemon         | 0
[2020/03/19 09:29:14] [ info] ___________
[2020/03/19 09:29:14] [ info]  inputs:
[2020/03/19 09:29:14] [ info]      tail
[2020/03/19 09:29:14] [ info] ___________
[2020/03/19 09:29:14] [ info]  filters:
[2020/03/19 09:29:14] [ info] ___________
[2020/03/19 09:29:14] [ info]  outputs:
[2020/03/19 09:29:14] [ info]      es.0
[2020/03/19 09:29:14] [ info] ___________
[2020/03/19 09:29:14] [ info]  collectors:
[2020/03/19 09:29:14] [debug] [storage] [cio stream] new stream registered: tail.0
[2020/03/19 09:29:14] [ info] [storage] version=1.0.2, initializing...
[2020/03/19 09:29:14] [ info] [storage] in-memory
[2020/03/19 09:29:15] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/03/19 09:29:15] [ info] [engine] started (pid=9982)
[2020/03/19 09:29:15] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/03/19 09:29:15] [debug] [input:tail:tail.0] inotify watch fd=20
[2020/03/19 09:29:15] [debug] [input:tail:tail.0] scanning path /path/messages
[2020/03/19 09:29:15] [debug] [input:tail:tail.0] add to scan queue /path/messages, offset=0
[2020/03/19 09:29:17] [debug] [output:es:es.0] host=mydomain.ufscar.br port=443 uri=/_bulk index=myindex-test type=flb_type
[2020/03/19 09:29:17] [debug] [router] default match rule tail.0:es.0
[2020/03/19 09:29:17] [ info] [sp] stream processor started
[2020/03/19 09:29:17] [debug] [input:tail:tail.0] file=/path/messages read=53 lines=1
[2020/03/19 09:29:17] [debug] [input:tail:tail.0] file=/path/messages promote to TAIL_EVENT
==9982== Warning: client switching stacks?  SP change: 0x1fff0001f8 --> 0x6333240
==9982==          to suppress, use: --max-stackframe=137318158264 or greater
==9982== Warning: client switching stacks?  SP change: 0x63331b8 --> 0x1fff0001f8
==9982==          to suppress, use: --max-stackframe=137318158400 or greater
==9982== Warning: client switching stacks?  SP change: 0x1fff0001f8 --> 0x63331b8
==9982==          to suppress, use: --max-stackframe=137318158400 or greater
==9982==          further instances of this message will not be shown.
[2020/03/19 09:29:27] [debug] [task] created task=0x632ce20 id=0 OK
[2020/03/19 09:29:29] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk
[2020/03/19 09:29:29] [debug] [output:es:es.0] Elasticsearch response
{"took":3,"errors":true,"items":[{"create":{"_index":"myindex-test","_type":"flb_type","_id":"b1104f3e-6de8-6f6b-5574-defe0d86449b","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[b1104f3e-6de8-6f6b-5574-defe0d86449b]: version conflict, document already exists (current version [1])","index_uuid":"MHbN2sP7T8mdTy55MHTlew","shard":"0","index":"myindex-test"}}}]}
[2020/03/19 09:29:29] [debug] [task] destroy task=0x632ce20 (task_id=0)
^C[engine] caught signal (SIGINT)
[2020/03/19 09:29:50] [ info] [input] pausing tail.0
==9982==
==9982== HEAP SUMMARY:
==9982==     in use at exit: 5,156 bytes in 13 blocks
==9982==   total heap usage: 35,443 allocs, 35,430 frees, 13,460,535 bytes allocated
==9982==
==9982== LEAK SUMMARY:
==9982==    definitely lost: 0 bytes in 0 blocks
==9982==    indirectly lost: 0 bytes in 0 blocks
==9982==      possibly lost: 0 bytes in 0 blocks
==9982==    still reachable: 5,156 bytes in 13 blocks
==9982==         suppressed: 0 bytes in 0 blocks
==9982== Reachable blocks (those to which a pointer was found) are not shown.
==9982== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==9982==
==9982== For lists of detected and suppressed errors, rerun with: -s
==9982== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Valgrind and debug log, testing ordinary behavior:

==12456== Memcheck, a memory error detector
==12456== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12456== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12456== Command: /git/fluent-bit/build/bin/fluent-bit -v -s 24576 -c /path/fluent-bit.conf -R /path/parsers.conf
==12456==
Fluent Bit v1.4.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/03/19 09:39:12] [ info] Configuration:
[2020/03/19 09:39:12] [ info]  flush time     | 10.000000 seconds
[2020/03/19 09:39:12] [ info]  grace          | 5 seconds
[2020/03/19 09:39:12] [ info]  daemon         | 0
[2020/03/19 09:39:12] [ info] ___________
[2020/03/19 09:39:12] [ info]  inputs:
[2020/03/19 09:39:12] [ info]      tail
[2020/03/19 09:39:12] [ info] ___________
[2020/03/19 09:39:12] [ info]  filters:
[2020/03/19 09:39:12] [ info] ___________
[2020/03/19 09:39:12] [ info]  outputs:
[2020/03/19 09:39:12] [ info]      es.0
[2020/03/19 09:39:12] [ info] ___________
[2020/03/19 09:39:12] [ info]  collectors:
[2020/03/19 09:39:12] [debug] [storage] [cio stream] new stream registered: tail.0
[2020/03/19 09:39:12] [ info] [storage] version=1.0.2, initializing...
[2020/03/19 09:39:12] [ info] [storage] in-memory
[2020/03/19 09:39:12] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/03/19 09:39:12] [ info] [engine] started (pid=12456)
[2020/03/19 09:39:12] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/03/19 09:39:13] [debug] [input:tail:tail.0] inotify watch fd=20
[2020/03/19 09:39:13] [debug] [input:tail:tail.0] scanning path /path/messages
[2020/03/19 09:39:13] [debug] [input:tail:tail.0] add to scan queue /path/messages, offset=0
[2020/03/19 09:39:15] [debug] [output:es:es.0] host=elasticsearch.sin.ufscar.br port=443 uri=/_bulk index=myindex-test type=flb_type
[2020/03/19 09:39:15] [debug] [router] default match rule tail.0:es.0
[2020/03/19 09:39:15] [ info] [sp] stream processor started
[2020/03/19 09:39:15] [debug] [input:tail:tail.0] file=/path/messages read=53 lines=1
[2020/03/19 09:39:15] [debug] [input:tail:tail.0] file=/path/messages promote to TAIL_EVENT
==12456== Warning: client switching stacks?  SP change: 0x1fff0001f8 --> 0x6333240
==12456==          to suppress, use: --max-stackframe=137318158264 or greater
==12456== Warning: client switching stacks?  SP change: 0x63331b8 --> 0x1fff0001f8
==12456==          to suppress, use: --max-stackframe=137318158400 or greater
==12456== Warning: client switching stacks?  SP change: 0x1fff0001f8 --> 0x63331b8
==12456==          to suppress, use: --max-stackframe=137318158400 or greater
==12456==          further instances of this message will not be shown.
[2020/03/19 09:39:25] [debug] [task] created task=0x632ce20 id=0 OK
[2020/03/19 09:39:27] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk
[2020/03/19 09:39:27] [debug] [output:es:es.0] Elasticsearch response
{"took":8,"errors":false,"items":[{"create":{"_index":"myindex-test","_type":"flb_type","_id":"fb14eda3-e02b-ee57-047d-5d6aa4f3136f","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":35,"_primary_term":2,"status":201}}]}
[2020/03/19 09:39:27] [debug] [task] destroy task=0x632ce20 (task_id=0)
^C[engine] caught signal (SIGINT)
[2020/03/19 09:39:30] [ info] [input] pausing tail.0
==12456==
==12456== HEAP SUMMARY:
==12456==     in use at exit: 5,156 bytes in 13 blocks
==12456==   total heap usage: 35,587 allocs, 35,574 frees, 13,465,080 bytes allocated
==12456==
==12456== LEAK SUMMARY:
==12456==    definitely lost: 0 bytes in 0 blocks
==12456==    indirectly lost: 0 bytes in 0 blocks
==12456==      possibly lost: 0 bytes in 0 blocks
==12456==    still reachable: 5,156 bytes in 13 blocks
==12456==         suppressed: 0 bytes in 0 blocks
==12456== Reachable blocks (those to which a pointer was found) are not shown.
==12456== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12456==
==12456== For lists of detected and suppressed errors, rerun with: -s
==12456== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

thotypous · 2020-03-19T12:45:58Z

I will now force push exactly the same diff just to get the CI to run again. It timed out installing clang packages.

Since ElasticSearch 7.5, the "create_doc" index privilege was introduced, which ensures a role can only add new logs, but never modify or delete previously recorded ones.

However, the "index" op_type has the semantic of changing a document if it already exists with the same "_id". Therefore, any requests with the "index" op_type are denied for a role whose only privilege is "create_doc".

We solve this by replacing all "index" operations by the "create" operation. However, this has the side effect of producing status 409 errors whenever a previously successful operation is retried and the Generate_ID option is turned on. Therefore, we change the "elasticsearch_error_check" function to ignore this kind of error.

Signed-off-by: Paulo Matias <matias@ufscar.br>

thotypous · 2020-03-19T13:43:48Z

Well, now the clang tests passed and the gcc tests timed out installing packages 😆. At least, now we know the change passes all tests 👍

edsiper · 2020-06-30T18:32:22Z

review deferred after v1.5 release

kaay-it · 2020-11-13T15:07:34Z

Hi @fujimotos
What is the status of this PR?
We are really waiting for it, because it additionally fixes issue #2664.

kaay-it · 2020-11-13T15:11:58Z

@edsiper, @PettitWesley - we need your review

fujimotos · 2020-11-13T15:16:51Z

@thotypous @AlekseyKalinin Sorry for delay. I'm fine with this patch.

kaay-it · 2020-11-16T11:34:30Z

@edsiper, @PettitWesley - we need your review

@edsiper, @PettitWesley - kindly asking you to review this

farcop · 2020-11-16T11:44:03Z

@edsiper @PettitWesley We need your reviews. Thanks in advance!

big-dima66 · 2020-11-18T12:11:23Z

@edsiper @PettitWesley We need your reviews. Thanks in advance!

BlackAlphaS · 2020-12-15T09:03:56Z

@edsiper @PettitWesley can you please approve this pull-request? Thank you!

PettitWesley · 2020-12-17T02:50:29Z

This would fall under @edsiper scope more than mine... I can take a look if needed...

However, I think normally Fujimotos's approval should be sufficient for a merge....

Either way, Eduardo is the only one who manages releases and merges PRs in general. I am primarily the maintainer for the AWS plugins.

marco-claudino · 2021-04-30T16:40:03Z

Hi @edsiper and @PettitWesley,

Is there any chance of working this out?
Data streams, ILM and rollover are pretty stable now in Elasticsearch; it's really bad that we can't use these features.

Thank you

PettitWesley · 2021-04-30T16:43:52Z

@edsiper With Fujimotos's approval, can we merge this?

PettitWesley · 2021-04-30T16:44:32Z

@thotypous @marco-claudino Looks like there are conflicts in the PR that need to be fixed.

edsiper · 2021-04-30T18:11:06Z

I trust @fujimotos's review. My only requirement is to rebase this PR on top of Git master so we can get full CI coverage (we recently moved to GitHub Actions).

fujimotos · 2021-05-02T02:07:55Z

> With Fujimotos's approval, can we merge this?

@edsiper @PettitWesley I noticed the discussion in this thread, so I decided to take
a couple of hours this morning to check this PR (again) to make sure.

I can confirm that it works. Using Elasticsearch v7.12.1, Fluent Bit can send records
without any issues. I also verified that the issue of out_es corrupting existing data is
resolved by this patch.

Attached is a screenshot from my testing. The test was done on the master HEAD
with this patch manually applied.

fujimotos · 2021-05-02T02:31:13Z

I kicked GitHub CI to check this PR, and it seems all green now too.

@edsiper @PettitWesley I'm going to step forward and merge this PR.

I'm planning to do a rebase merge for a clean commit history (instead of
a plain merge). Here is a candidate branch created for a clean merge:

https://github.com/fluent/fluent-bit/commits/PR2026-for-merge

I'll push this PR to master tonight (around 7:00 EST) after I get back home.
If you have any concerns about this, please just let me know.

fujimotos · 2021-05-03T04:20:52Z

Merged via 7f0db9e.

paulden · 2021-05-26T12:41:24Z

Hello @fujimotos, do you know when this commit will be released? We ran into the same issue as #2664 and the fix does not appear to be available in the latest Fluent Bit version (1.7.6).

Thanks a lot for your work!

fujimotos · 2021-06-01T08:53:26Z

@paulden This patch is on track to be included in the next major release (v1.8.0).

Ask Eduardo about the exact release date of Fluent Bit v1.8.

thotypous changed the title ~~es: ensure integrity of already recorded logs~~ out_es: ensure integrity of already recorded logs Mar 17, 2020

edsiper assigned fujimotos May 5, 2020

edsiper added the work-in-process label May 5, 2020

thotypous requested review from edsiper and PettitWesley as code owners July 16, 2020 00:01

fujimotos previously approved these changes Nov 13, 2020

Merge branch 'master' into es-integrity

6ccda27

thotypous dismissed fujimotos’s stale review via 6ccda27 May 1, 2021 22:42

github-actions bot added the docs-required label May 1, 2021

fujimotos closed this May 3, 2021

thotypous deleted the es-integrity branch May 3, 2021 12:11

nokute78 mentioned this pull request May 29, 2021

Fluent-bit fails to send logs to an elasticsearch datastream #2664

Closed

nokute78 mentioned this pull request Aug 27, 2021

Es plugin as output for Humio stopped working after updating fluent bit version 1.8.4 #3986

Closed
