grep filter does not seem to work #213

Closed
sudeep-quantelai opened this issue Sep 24, 2019 · 9 comments
Labels
waiting-for-user Waiting for user/contributors feedback or requested changes

Comments

@sudeep-quantelai

sudeep-quantelai commented Sep 24, 2019

I have included the following in the conf file

[FILTER]
    Name         grep
    Match        *
    Exclude      log [0-9]*\(space)
    Exclude      log 2019

The filter does not seem to do anything; it passes everything through. What is the right way to use the grep filter? Does it support standard regex patterns or a specific set of custom regex?

@edsiper
Member

edsiper commented Sep 24, 2019

Please attach the example log file you are using for testing.

@edsiper edsiper added the waiting-for-user Waiting for user/contributors feedback or requested changes label Sep 24, 2019
@sudeep-quantelai
Author

2019-06-20 14:57:33.029611 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:34.026141 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:34.035226 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:35.030910 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:35.041038 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:36.035636 19858   2019-06-20 14:57:35.629177
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:36.046407 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:37.040407 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:37.050977 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:38.045821 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:38.055950 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:39.051183 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:39.060831 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:40.056906 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:40.066180 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:41.061395 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:41.070936 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:42.066399 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:42.076984 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:43.072016 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:43.081874 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:44.079633 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:44.087403 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:45.084975 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:45.092852 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:46.089922 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:46.097518 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:47.094991 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:47.102149 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:48.101007 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:48.107430 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:49.105742 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:49.111561 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:50.111089 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:50.116145 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:51.121500 19859   2019-06-20 14:57:50.879765
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:51.122078 19859   2019-06-20 14:57:50.879765
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:52.129122 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:52.136576 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:53.142138 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:53.142728 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:54.152644 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:54.158221 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:55.164345 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:55.164856 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:56.179539 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:56.184263 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:57.184391 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:57.189761 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:58.189408 19860   2019-06-20 14:57:57.728267
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:58.199332 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:59.200131 19861   2019-06-20 14:57:58.926505
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:59.205966 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:00.206112 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:00.211526 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:01.211646 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:01.219618 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:02.219734 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:02.227956 Series([], Name: transacttime, dtype: datetime64[ns])

@edsiper
Member

edsiper commented Sep 24, 2019

I could not make the regex match your logs. Would you please create the test case here:

then share the permalink here.

@sudeep-quantelai
Author

sudeep-quantelai commented Sep 24, 2019

OK. If I want to exclude every row that has the word 'Series' in it, how should my 'Exclude' clause be constructed? The tool above matches the regex 'Series' (without wildcards), but the grep filter does not.

Here is the permalink
https://rubular.com/r/1mjxL5t1uFAxBw

I tried the following

[FILTER]
    Name         grep
    Exclude      log Series
[FILTER]
    Name         grep
    Exclude      log .*Series

and

[FILTER]
    Name         grep
    Regex        log .*Series

None of these filter out the rows with 'Series' in them.

Also my input section in the config file looks like this

[INPUT]
    Name         tail
    Path         /mnt/volume_nyc3_03/genfix*.out
    Tag          genfix

Can I request the documentation for the 'grep' filter be enhanced to include more examples?

@sudeep-quantelai
Author

@edsiper Anything on this?

@edsiper
Member

edsiper commented Sep 27, 2019

It actually works:

$ bin/fluent-bit -i tail -p path=series.log -F grep -m '*' -p "exclude=log .*Series" -o stdout -f 1
[0] tail.0: [1569599252.379180976, {"log"=>"2019-06-20 14:57:58.189408 19860   2019-06-20 14:57:57.728267"}]
[1] tail.0: [1569599252.379181993, {"log"=>"Name: transacttime, dtype: datetime64[ns]"}]
[2] tail.0: [1569599252.379184123, {"log"=>"2019-06-20 14:57:59.200131 19861   2019-06-20 14:57:58.926505"}]
[3] tail.0: [1569599252.379185191, {"log"=>"Name: transacttime, dtype: datetime64[ns]"}]

The series.log file has the following content:

$ cat series.log 
2019-06-20 14:57:53.142138 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:53.142728 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:54.152644 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:54.158221 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:55.164345 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:55.164856 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:56.179539 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:56.184263 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:57.184391 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:57.189761 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:58.189408 19860   2019-06-20 14:57:57.728267
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:58.199332 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:57:59.200131 19861   2019-06-20 14:57:58.926505
Name: transacttime, dtype: datetime64[ns]
2019-06-20 14:57:59.205966 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:00.206112 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:00.211526 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:01.211646 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:01.219618 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:02.219734 Series([], Name: transacttime, dtype: datetime64[ns])
2019-06-20 14:58:02.227956 Series([], Name: transacttime, dtype: datetime64[ns])

Your filters don't have a Match rule.
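
For reference, a sketch of how the command above maps onto a config file, assuming the same tail input and the 'log' key shown in the output: -m '*' corresponds to the Match rule and -p "exclude=log .*Series" to the Exclude rule.

[FILTER]
    Name         grep
    Match        *
    Exclude      log .*Series

A filter section without a Match rule should not be applied to any records, which would be consistent with the earlier sections passing everything through.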

@sudeep-quantelai
Author

OK, so I found out what the issue was. I was using -o on the command line along with a config file, because my config file had an output to Apache Pulsar that was not working. When you specify -o on the command line, it somehow ignores the filter.

However, I still cannot get the following filter to match:

[FILTER]
    Name         grep
    Match        *
    Exclude      "log ^[0-9]* "

I have also tried

[FILTER]
    Name         grep
    Match        *
    Exclude      log ^[0-9]* 

Interestingly... this works from the command line.

Here is the rubular permalink. This has the data with which I am testing it.
https://rubular.com/r/a6YD4p8zpxrHmQ

Data I am using to test

26842   2019-08-02 19:29:40.455850
Name: transacttime, dtype: datetime64[ns]
publishing om log for fdom3
publishing om log for fdom1
0 2
2019-08-02 19:29:41.723811 26843   2019-08-02 19:29:41.368836
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom4
0 1
2019-08-02 19:29:42.728986 26844   2019-08-02 19:29:42.552601
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom5
0 1
2019-08-02 19:29:43.735457 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:44.740316 26845   2019-08-02 19:29:43.870181
26846   2019-08-02 19:29:44.197273
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom4
publishing om log for eqom1
0 2
2019-08-02 19:29:45.745448 26847   2019-08-02 19:29:45.345420
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom4
0 1
2019-08-02 19:29:46.750463 26848   2019-08-02 19:29:45.870497
26849   2019-08-02 19:29:46.655934
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom1
publishing om log for eqom3
0 2
2019-08-02 19:29:47.761779 26850   2019-08-02 19:29:47.610425
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom4
0 1
2019-08-02 19:29:48.774348 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:49.781097 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:50.786818 26851   2019-08-02 19:29:49.800049
26852   2019-08-02 19:29:50.574346
Name: transacttime, dtype: datetime64[ns]
publishing om log for fdom3
publishing om log for eqom2
0 2
2019-08-02 19:29:51.793154 26853   2019-08-02 19:29:51.295029
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom1
0 1
2019-08-02 19:29:52.797845 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:53.801979 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:54.806289 26854   2019-08-02 19:29:54.089877
26855   2019-08-02 19:29:54.311441
Name: transacttime, dtype: datetime64[ns]
publishing om log for fdom2
publishing om log for eqom3
0 2
2019-08-02 19:29:55.811229 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:56.816648 26856   2019-08-02 19:29:56.778773
Name: transacttime, dtype: datetime64[ns]
publishing om log for eqom2
0 1
2019-08-02 19:29:57.821888 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:58.826702 Series([], Name: transacttime, dtype: datetime64[ns])
2019-08-02 19:29:59.831224 Series([], Name: transacttime, dtype: datetime64[ns])

@edsiper
Member

edsiper commented Sep 27, 2019

This config works:

[SERVICE]
    Log_Level info
    Flush        1

[INPUT]
    Name   tail
    Path   test.log

[FILTER]
    Name     grep
    Match    *
    Regex    log ^([0-9]).*$

[OUTPUT]
    Name     stdout
    Match    *
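
If the goal is to drop those lines rather than keep them, a sketch of the inverse filter (an assumption, not tested in this thread) swaps Regex for Exclude. Note that [0-9]* can match zero digits, so a pattern such as ^[0-9]* matches every line; requiring at least one digit avoids that.

[FILTER]
    Name     grep
    Match    *
    Exclude  log ^[0-9]

This would remove every line that starts with a digit, including the timestamped ones.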

@edsiper
Member

edsiper commented Oct 8, 2019

Closing as fixed. Above config works.

Note: if you find any issue, please open a new ticket and reference this one.

@edsiper edsiper closed this as completed Oct 8, 2019