Question Apache log parser #306

Open
greg-FR13 opened this issue Aug 1, 2018 · 2 comments

Comments

@greg-FR13

Hi All,

I am a little bit lost. I am using the following rule:

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

Apache log 1:
...
XX.XX.XX.XX - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
XX.XX.XX.XX - "" [29/Jul/2018:07:09:05 +0000] "GET /robots.txt HTTP/1.1" 404 985
XX.XX.XX.XX - "" [29/Jul/2018:08:20:39 +0000] "GET / HTTP/1.1" 200 4050

#head -1 /var/log/httpd/my.access_log | /usr/bin/lognormalizer -r apache_access_log.rule -e json
{ "originalmsg": "XX.XXX.XXX.XXX - "" [29/Jul/2018:03:53:53 +0000] "GET /robots.txt HTTP/1.1" 404 985", "unparsed-data": "" }

The rule works for my other Apache logs; the problem appears only when the log contains "".

How can I deal with %auth:word% and ""?

Thank you for your help and support,

Regards,

@manios

manios commented Sep 11, 2018

Hello @greg-FR13,

Your rule does not match the logs you posted because those log messages contain no referrer or user agent part, and the rule requires both.

For your logs:

192.168.1.1 - "Tester" [29/Jul/2018:05:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:07:09:05 +0000] "GET /robots.txt HTTP/1.1" 404 985
192.168.1.1 - "" [29/Jul/2018:08:20:39 +0000] "GET / HTTP/1.1" 200 4050

this rule matches:

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %blob:rest%

and when you run:

lognormalizer  -H -p -r apache.rule  < apache.log

it produces the following results:

{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:05:15:47 +0000", "auth": "\"Tester\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:06:15:47 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "985", "response": 404, "httpversion": 1.1, "request": "\/robots.txt", "verb": "GET", "timestamp": "29\/Jul\/2018:07:09:05 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:08:20:39 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }

To include the user agent and referrer parts, you have two options:

  1. Provide another rule with a higher priority than the one above, set on the %response% field of the rule.
  2. Enhance the existing rule with an alternative parser.
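
As a rough, untested sketch of the first option, the rulebase could hold two rules that share everything up to the byte count: one that ends there, and the original rule from the question with the referrer and user agent parts. Both lines reuse the field names and parser syntax from the question. Whether an explicit priority is also needed has not been verified here, so treat that as an assumption to check against the liblognorm documentation.

```
# Assumed layout, not verified against a live lognormalizer run.
# Short form: the line ends after the byte count.
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number%
# Long form (the rule from the question): referrer and user agent follow the byte count.
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%
```

The two rules are identical up to %bytes:number%, so they should only diverge at the point where the short log lines end and the long ones continue with the quoted referrer.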

Keep in mind that liblognorm rules are not regular expressions. They are compiled into a Directed Acyclic Graph (DAG), so the parser may handle them differently than you expect. For more information, please refer to the official documentation.

Best regards,
Christos

@greg-FR13
Author

Hi @manios,
Thank you for your complete answer; I will have a look.

Best,
