Filter Parser Regex seems works improperly #2250

Closed
Antiarchitect opened this issue Jun 11, 2020 · 7 comments · Fixed by #2255

@Antiarchitect

Antiarchitect commented Jun 11, 2020

Fluent Bit version: fluent/fluent-bit:1.4.6 (official Docker image)
I have this parser section:

    [PARSER]
        Name   replicaset
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)-(?<k8s_pods_generation>[0-9a-f]+)-(?<k8s_pod_id>[[:alnum:]]{5}))_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

And this is how it's used:

    [FILTER]
        Name         parser
        Match        k8s
        Key_Name     source
        Preserve_Key On
        Reserve_Data On
        Parser       replicaset
        Parser       statefulset
        Parser       selfmanaged

And this value is in the source field:

/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log

According to these rules, it should match the first parser (replicaset).

https://rubular.com/ gives me the expected results:

k8s_namespace | staging-ci-app-enapter
k8s_pod_fullname | apps-services-provisioning-7858c7f7f9-55tsr
k8s_pod_name | apps-services-provisioning
k8s_pods_generation | 7858c7f7f9
k8s_pod_id | 55tsr
k8s_container_name | service

Other online regex testers agree, e.g. https://regex101.com/r/3Gt1UE/2.
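To double-check the replicaset regex outside Fluent Bit, here is a minimal Python sketch. Fluent Bit's regex parser is built on the Onigmo engine, while Python's `re` module spells named groups `(?P<name>...)` and does not support POSIX classes such as `[[:alnum:]]`, so the pattern below is a translation: `[A-Za-z0-9]` stands in for `[[:alnum:]]` (an assumption that only holds for ASCII input), and the `\/` escapes are dropped because `/` needs no escaping in Python.

```python
import re

# Python translation of the "replicaset" parser regex (see assumptions above).
REPLICASET = re.compile(
    r"^/var/log/pods/(?P<k8s_namespace>.+?)_"
    r"(?P<k8s_pod_fullname>(?P<k8s_pod_name>.+?)-(?P<k8s_pods_generation>[0-9a-f]+)-(?P<k8s_pod_id>[A-Za-z0-9]{5}))"
    r"_.+?/(?P<k8s_container_name>.+?)/\d+\.log$"
)

# The value of the "source" field from the report.
SOURCE = (
    "/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-"
    "7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log"
)

match = REPLICASET.match(SOURCE)
# Print the named groups in the same "name | value" layout rubular uses.
for name, value in match.groupdict().items():
    print(f"{name} | {value}")
```

Run on its own, this pattern yields the same groups as rubular, including `k8s_pod_name = apps-services-provisioning`, so the regex itself is not the problem.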

But this is what ends up in my Elasticsearch:

    {
      "_source": {
        "k8s_namespace": "staging-ci-app-enapter",
        "k8s_pod_fullname": "apps-services-provisioning-7858c7f7f9-55tsr",
        "k8s_pod_name": "apps-services-provisioning-7858c7f7f9",
        "k8s_pod_id": "55tsr",
        "k8s_container_name": "service",
        "source": "/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log",
        ...
      },
    }

As you can see, k8s_pod_name is incorrect (it still contains the 7858c7f7f9 hash), and k8s_pods_generation is missing entirely.

@nokute78
Collaborator

I tested this configuration and it seems to work correctly: "k8s_pod_name" was apps-services-provisioning.

Can you share the other parser configurations (statefulset and selfmanaged)?
They may affect the output.

parsers.conf:

    [PARSER]
        Name   replicaset
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)-(?<k8s_pods_generation>[0-9a-f]+)-(?<k8s_pod_id>[[:alnum:]]{5}))_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

a.conf:

    [SERVICE]
        Parsers_File parsers.conf

    [INPUT]
        Name dummy
        Tag k8s
        Dummy {"source":"/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log"}

    [FILTER]
        Name         parser
        Match        k8s
        Key_Name     source
        Preserve_Key On
        Reserve_Data On
        Parser       replicaset

    [OUTPUT]
        Name stdout

Here is the output; "k8s_pod_name" is apps-services-provisioning:

    $ sudo docker run -it --rm -v ~/git/fluent-bit/build/2250:/2250 fluent/fluent-bit:1.4.6 /fluent-bit/bin/fluent-bit -c /2250/a.conf
    .
    [0] k8s: [1591968129.376953070, {"k8s_namespace"=>"staging-ci-app-enapter", "k8s_pod_fullname"=>"apps-services-provisioning-7858c7f7f9-55tsr", "k8s_pod_name"=>"apps-services-provisioning", "k8s_pods_generation"=>"7858c7f7f9", "k8s_pod_id"=>"55tsr", "k8s_container_name"=>"service", "source"=>"/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log"}]

@Antiarchitect
Author

Antiarchitect commented Jun 12, 2020

@nokute78
Yeah, sure:

    [PARSER]
        Name   replicaset
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)-(?<k8s_pods_generation>[0-9a-f]+)-(?<k8s_pod_id>[[:alnum:]]{5}))_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

    [PARSER]
        Name   statefulset
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)-(?<k8s_pod_ordinal_index>\d+))_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

    [PARSER]
        Name   selfmanaged
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)(-(?<k8s_pod_id>[[:alnum:]]{5}))?)_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

I took all of them from my Fluentd config, where they work as expected. Have I understood correctly that the parsers are applied one by one, in the order they are declared in the FILTER section, and that the first one to match wins and the rest are skipped?

@nokute78
Collaborator

@Antiarchitect Thank you.
I can reproduce your issue. The selfmanaged parser is what changes the output:
https://rubular.com/r/gtZf584MD71WLp
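The same effect can be checked with a small Python sketch that runs all three parsers from this thread against the reported path. The patterns are translations from Fluent Bit's Onigmo syntax into Python's `re` syntax: `(?<name>...)` becomes `(?P<name>...)`, and `[A-Za-z0-9]` stands in for `[[:alnum:]]`, which assumes ASCII input. The shared `PREFIX`/`SUFFIX` split is only for readability.

```python
import re

SOURCE = (
    "/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-"
    "7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log"
)

# Parts common to all three parser regexes.
PREFIX = r"^/var/log/pods/(?P<k8s_namespace>.+?)_"
SUFFIX = r"_.+?/(?P<k8s_container_name>.+?)/\d+\.log$"

# The three parsers, in the order they are listed in the [FILTER] section.
PARSERS = {
    "replicaset": re.compile(
        PREFIX
        + r"(?P<k8s_pod_fullname>(?P<k8s_pod_name>.+?)-(?P<k8s_pods_generation>[0-9a-f]+)-(?P<k8s_pod_id>[A-Za-z0-9]{5}))"
        + SUFFIX
    ),
    "statefulset": re.compile(
        PREFIX
        + r"(?P<k8s_pod_fullname>(?P<k8s_pod_name>.+?)-(?P<k8s_pod_ordinal_index>\d+))"
        + SUFFIX
    ),
    "selfmanaged": re.compile(
        PREFIX
        + r"(?P<k8s_pod_fullname>(?P<k8s_pod_name>.+?)(-(?P<k8s_pod_id>[A-Za-z0-9]{5}))?)"
        + SUFFIX
    ),
}

for name, pattern in PARSERS.items():
    m = pattern.match(SOURCE)
    if m is None:
        print(f"{name}: no match")
    else:
        # groupdict() returns only named groups; skip any that did not participate.
        fields = {k: v for k, v in m.groupdict().items() if v is not None}
        print(f"{name}: {fields}")
```

Both replicaset and selfmanaged match, while statefulset does not. Because of the lazy `.+?` and the optional `-id` group, selfmanaged produces `k8s_pod_name = apps-services-provisioning-7858c7f7f9` and no `k8s_pods_generation`. That is exactly the record seen in Elasticsearch, which is consistent with the output coming from the last parser rather than the first.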

They are applied one by one in the order they declared in FILTER and the one that matches first wins and successors are canceled?

Hmm, when Reserve_Data is enabled, fluent-bit does not stop after a parser matches.
So the resulting record is the one produced by the last parser.
I think this is a bug.
(If Reserve_Data is disabled, fluent-bit does stop at the first matching parser.)

@Antiarchitect
Author

So you mean it's a real bug, and the behavior shouldn't depend on Reserve_Data? That sounds reasonable to me.

@nokute78
Collaborator

@Antiarchitect Yes.

Note:
Here is a minimal configuration that reproduces the issue.

parsers.conf:

    [PARSER]
        Name   one
        Format regex
        Regex  ^(?<one>.+?)$

    [PARSER]
        Name   two
        Format regex
        Regex  ^(?<two>.+?)$

a.conf:

    [SERVICE]
        Parsers_File parsers.conf

    [INPUT]
        Name dummy
        Dummy {"original":"hoge"}

    [FILTER]
        Name         parser
        Match        *
        Key_Name     original
        Reserve_Data On
        Preserve_key on
        Parser       one
        Parser       two

    [OUTPUT]
        Name stdout

Reserve_Data | Output | Note
On | {"two"=>"hoge", "original"=>"hoge"} | Parser two matched. It should be {"one"=>"hoge", "original"=>"hoge"}
Off | {"one"=>"hoge"} | Parser one matched
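The selection logic behind the table above can be modeled in a few lines of Python. This is a simplified, hypothetical model of the behavior described in this thread, not Fluent Bit's actual C implementation: the function `apply_parsers` and its `fixed` flag are made up for illustration, the pre-fix case assumes the last matching parser's result replaces earlier ones (which fits both outputs reported here), and it ignores how Preserve_Key and Reserve_Data merge the remaining fields into the record.

```python
import re


def apply_parsers(value, parsers, reserve_data, fixed):
    """Hypothetical model of which parser's result the parser filter keeps.

    parsers: list of (name, compiled regex) in [FILTER] declaration order.
    fixed:   False models the reported pre-fix behavior, True the intended one.
    Returns (parser_name, fields) for the kept result, or None if nothing matched.
    """
    result = None
    for name, pattern in parsers:
        m = pattern.match(value)
        if m is None:
            continue
        result = (name, m.groupdict())
        # Intended behavior, and pre-fix behavior with Reserve_Data Off:
        # stop at the first parser that matches.
        if fixed or not reserve_data:
            break
        # Pre-fix behavior with Reserve_Data On: keep looping, so a later
        # matching parser overwrites the earlier result.
    return result


# The "one" and "two" parsers from parsers.conf, in Python regex syntax.
PARSERS = [
    ("one", re.compile(r"^(?P<one>.+?)$")),
    ("two", re.compile(r"^(?P<two>.+?)$")),
]

for reserve_data in (True, False):
    for fixed in (False, True):
        name, fields = apply_parsers("hoge", PARSERS, reserve_data, fixed)
        state = "On" if reserve_data else "Off"
        print(f"Reserve_Data={state} fixed={fixed}: {name} {fields}")
```

In this model only the combination Reserve_Data On without the fix keeps parser two's result; every other combination keeps parser one's, which is the behavior the thread agrees is correct.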

@Antiarchitect
Author

Great!

nokute78 added a commit to nokute78/fluent-bit that referenced this issue Jun 13, 2020
Signed-off-by: Takahiro YAMASHITA <nokute78@gmail.com>
nokute78 added a commit to nokute78/fluent-bit that referenced this issue Jun 13, 2020
fluent#2250)

Signed-off-by: Takahiro YAMASHITA <nokute78@gmail.com>
@nokute78
Collaborator

I sent a patch, #2255, which should fix the issue.

parsers.conf:

    [PARSER]
        Name   replicaset
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)-(?<k8s_pods_generation>[0-9a-f]+)-(?<k8s_pod_id>[[:alnum:]]{5}))_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

    [PARSER]
        Name   statefulset
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)-(?<k8s_pod_ordinal_index>\d+))_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

    [PARSER]
        Name   selfmanaged
        Format regex
        Regex  ^\/var\/log\/pods\/(?<k8s_namespace>.+?)_(?<k8s_pod_fullname>(?<k8s_pod_name>.+?)(-(?<k8s_pod_id>[[:alnum:]]{5}))?)_.+?\/(?<k8s_container_name>.+?)\/\d+\.log$

a.conf:

    [SERVICE]
        Parsers_File parsers.conf

    [INPUT]
        Name dummy
        Tag k8s
        Dummy {"source":"/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log"}

    [FILTER]
        Name         parser
        Match        k8s
        Key_Name     source
        Preserve_Key On
        Reserve_Data On
        Parser       replicaset
        Parser       statefulset
        Parser       selfmanaged

    [OUTPUT]
        Name stdout

The output is:

    {"k8s_namespace"=>"staging-ci-app-enapter", "k8s_pod_fullname"=>"apps-services-provisioning-7858c7f7f9-55tsr", "k8s_pod_name"=>"apps-services-provisioning", "k8s_pods_generation"=>"7858c7f7f9", "k8s_pod_id"=>"55tsr", "k8s_container_name"=>"service", "source"=>"/var/log/pods/staging-ci-app-enapter_apps-services-provisioning-7858c7f7f9-55tsr_0a84a257-92c8-4197-8fbc-ff5fdc9990c0/service/0.log"}

"k8s_pod_name" will be "apps-services-provisioning" with the patch.

nokute78 added a commit to nokute78/fluent-bit that referenced this issue Jun 13, 2020
Signed-off-by: Takahiro YAMASHITA <nokute78@gmail.com>
edsiper pushed a commit that referenced this issue Jun 25, 2020
Signed-off-by: Takahiro YAMASHITA <nokute78@gmail.com>
edsiper pushed a commit that referenced this issue Jun 25, 2020
Signed-off-by: Takahiro YAMASHITA <nokute78@gmail.com>
2 participants