web_log reports unmatched lines #2295

Closed
alibo opened this issue Jun 7, 2017 · 12 comments · Fixed by #4757
Labels: area/collectors (Everything related to data collection), collectors/python.d, feature request (New features), priority/high (Super important issue)

Comments

@alibo
Contributor

alibo commented Jun 7, 2017

web_log plugin reports unmatched lines.

This is my log:

162.***.***.45 - - [07/Jun/2017:11:02:16 +0430] "\x15\x03\x01\x00 N\x87\x98\x04l5\xAF\x89\x92\xF7\xDB\xB9 \xD1\xF3\xFF\xBAa" 400 166 0 0.000 "-" "-"

86.**.***.178 - - [07/Jun/2017:11:02:37 +0430] "\x17\x03\x01\x020\xE1\xA8\x1C=\xC3 \xF5\xD95V\x9C\x0C\xB6~\xC4]\x95\xD8d,\xC0\x5C\xB6;cG\xCD\xB6Edh\xC5\xD2\xDBtcZ\xD0\x0B\xC4@~\xC4\x83\xE0\xDA\x84\xB6\xAA\xF9G\x0B`\xEA\xBE\x91\xB6\xED\xAFGd\x11}Sex'7j\xFBY\x05{B\x15\xC8+/bps\x5C^\xC4\xD4\xEA\xF5_8ISq(J\xDF\xE9\x16\x95\x1F\xC4\x1C\xE6\xF6c\xEB\xB1W&\xD1\xE6\x19\x03\x1DT\x05\xF1l\xE5\xEC\x0C\xC1\x16\xE0\x9Ds\xFC~>\x02\x7F\x1B\x22\x84\xE16\x0CJ\xA6\xDB\x98\xDA\xAB\xA0\x82\x7F\xA21\x82\xA7\xE1\x08\x02emm\xB5\xD4\x8A\x9C\xBB\x95m\xE8\x8B\xDB\xAC\xC6\x81\x8E\x8Ef+\xA8\x96\xB5-\x96\xD5\x03\xFCR\xB3\xE3\xA4C|\xCAC\xDC\x0CW\xFF\xA5p\xA9\x83(\xE3\xA6\x03\xF6.W,\x1F\xE5h!U\x11dL\x95\x8F'\xFFK\x1F\xF6^\xD9(\x94+\x7F\x91\xB2\x03\xA4\xCFR\xCB\xD8\x22\xD2\xE5\xEA\x87KLUc\xD4a\xA3u(t\xE0\x1A\xE1z\x04\xF2\xDE\xD1\x02\x95s9\xFBY\xE4\xE88TM8\x11\xC2\xEA\xEC?\xAB5^I\xE1\xB2G\xB0\xAC\xAB\x5C\x88\x9A\xB0\xA3\x99`\xD0)\x17\x12kdX\xE7\xD5\x0B\xCE\xBFm\xB7\xFC!\xFA\xCES\x0B(\xD0^\xC7\xC8\xE3R\xC5\xABExL\xD3\x8B\xBF,1\xCB\x93j\x8A%\xDC\x97\xF1\xA5s\x04\xCC}\xA5\x88'\xA4\x11\x85\x15Y~1" 400 166 0 0.081 "-" "-"

5.***.**.230 - - [07/Jun/2017:11:03:00 +0430] "\x17\x03\x01\x020\x15=\xE3v\xB4\xAD\x85\xE1^\x11\x06\xFE\x1F\x83\xA42[\x8D\xB2\x85\xB6P\xAE\x12\xAC6Cg\xBDX\xC7S=*\xDB\x0F\xC8\xAF\xFBJ*\xAEE\xA1 \xFA\xE0\xBDw\xBBw\x8D~\xC1\x18\xAA\xD9\x01\xF0\x81i\xCA)\xA0\xB0\xCF\xE2IO\xC8\xBC\xFC\xD3k\x8F\xC3\xD2Wb\x22V\xA6\xFA\xC8\xB6\x91\xC0P>M\xA6\xA8@l\xB3f\x10\x8A\x22\xDCm\xC6\xFB\xC2\xBF\xEB\x1F\x88m\xAB\x81\x10\x92\xF6g\xA1t}\xA0\xC4\xB4\x96q\x5C\x10\xDDb\x9C[[\x08\x14D\x82\xF8c\xB1\xE5|\xF4w\xB37B\xF6s.6]\xC7Q\x1F\xAB\xAF\x00\xBC\xDB0\x90\xED)\xFD\x1C\x15\xCDW\xD4\x1B\xB2\x90\xF0\x93\xA6SQ\xB6,\xCE\xCA\xBF^F\xC8\x04\xE8\xAC;\xCB5\x1B\x94r~\xEA\xB6\x8F\xA4'k\x18\x9C\xD4_=M\xA3\x9D\x95\xB9\x93\x99H\x1D\xEC\x8FW\x1F\xEAa\x8A>\xC7\xA0\xF9D0\xCD\x8A\xF3(\xBE\x106\x9B!K\x8F\xD2\xB39\xBE\xA3\x8F\xA4\xA8\xE7\x0E\xC2Z\xAF\xC1\xD6\xE7\x8C\x976\x193{R\x05\x0F" 400 166 0 0.001 "-" "-"

It seems someone is trying to scan our web servers to find vulnerabilities!!

Nginx responded to these requests with 400, but Netdata counts them as unmatched.

@ktsaou
Member

ktsaou commented Jun 9, 2017

@l2isbad have a look...

@ktsaou ktsaou added the bug label Jun 9, 2017
@ilyam8
Member

ilyam8 commented Jun 9, 2017

Well, how should they be parsed?

@ktsaou
Member

ktsaou commented Jun 10, 2017

hm... the request URL is funny for sure, but the rest of the data seems OK.
Have you understood which part is not matched by the regex?

Generally speaking, we should try to match 100% of the log. A vulnerability scan, or attack that is not matched, will prevent an alarm from triggering. If we decide to ignore funny, but legit, log lines, we should add an alarm for the unmatched entries. I can do this. But I think we should first attempt to match the lines. If this is impossible or unreasonably hard, I will add the alarm.
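The fallback described above (an alarm on unmatched entries) can be sketched as a simple ratio check. This is only an illustration: the function name and the 5% threshold are assumptions, and it is not written in Netdata's health configuration format.

```python
def unmatched_alarm(matched, unmatched, threshold_pct=5.0):
    """Return True when unmatched lines exceed threshold_pct of all lines.

    `threshold_pct` is a hypothetical default chosen for illustration.
    """
    total = matched + unmatched
    if total == 0:
        # No traffic in this interval, so nothing to alarm on.
        return False
    return 100.0 * unmatched / total > threshold_pct
```

With such a check, a burst of unparseable scan traffic would still surface even though the individual lines are never matched.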

@ilyam8
Member

ilyam8 commented Jun 10, 2017

import re

nginx_ext_insert = re.compile(r'(?P<address>[\da-f.:]+)'
                              r' -.*?"(?P<method>[A-Z]+)'           # <- does not match
                              r' (?P<url>[^ ]+)'
                              r' [A-Z]+/(?P<http_version>\d\.\d)"'  # <- does not match
                              r' (?P<code>[1-9]\d{2})'
                              r' (?P<bytes_sent>\d+)'
                              r' (?P<resp_length>\d+)'
                              r' (?P<resp_time>\d+\.\d+) ')

"\x15\x03\x01\x00 N\x87\x98\x04l5\xAF\x89\x92\xF7\xDB\xB9 \xD1\xF3\xFF\xBAa"
"method url http_version"
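To reproduce the mismatch, here is a minimal sketch that runs the pattern from this comment against the first log line of the report. The client addresses are placeholders (the report masks the real ones), the "normal" line is a made-up example, and applying the pattern with `search` is an assumption about how the plugin uses it.

```python
import re

# Pattern copied from the comment above.
nginx_ext_insert = re.compile(r'(?P<address>[\da-f.:]+)'
                              r' -.*?"(?P<method>[A-Z]+)'
                              r' (?P<url>[^ ]+)'
                              r' [A-Z]+/(?P<http_version>\d\.\d)"'
                              r' (?P<code>[1-9]\d{2})'
                              r' (?P<bytes_sent>\d+)'
                              r' (?P<resp_length>\d+)'
                              r' (?P<resp_time>\d+\.\d+) ')

# Hypothetical well-formed request, for comparison.
normal = r'203.0.113.45 - - [07/Jun/2017:11:02:16 +0430] "GET /index.html HTTP/1.1" 200 166 0 0.000 "-" "-"'
# First line from the report, with a placeholder address.
scan = r'203.0.113.45 - - [07/Jun/2017:11:02:16 +0430] "\x15\x03\x01\x00 N\x87\x98\x04l5\xAF\x89\x92\xF7\xDB\xB9 \xD1\xF3\xFF\xBAa" 400 166 0 0.000 "-" "-"'


def is_matched(line):
    """True if the pattern finds a match anywhere in the line."""
    return nginx_ext_insert.search(line) is not None


print(is_matched(normal))
print(is_matched(scan))
```

The second line fails because the text after the opening quote does not start with an uppercase method, and the request does not end in something shaped like `HTTP/1.1`.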

@ktsaou
Member

ktsaou commented Jun 10, 2017

hm... if we match them, the charts that show these will become full of garbage dimensions.
I guess the solution is the alarm...

@ktsaou
Member

ktsaou commented Jun 11, 2017

@l2isbad before adding the alarm, one last thought:

What if we turn method and http_version into [^ ]+ (so that they match anything except a space) and then check them against their proper patterns? If they don't match, we count them under an INVALID dimension.

We could then add an alarm for INVALID on method and http_version.

What do you think?
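A rough sketch of this two-stage idea, assuming the rest of the pattern stays unchanged: the loose pattern accepts any non-space token for method and http_version, and a second check decides whether each token is valid or counted as INVALID. The names `loose`, `METHOD_OK`, `VERSION_OK` and `count_line` are made up for illustration, and the addresses are placeholders.

```python
import re
from collections import Counter

# Same shape as the original pattern, but method and http_version accept any non-space token.
loose = re.compile(r'(?P<address>[\da-f.:]+)'
                   r' -.*?"(?P<method>[^ ]+)'
                   r' (?P<url>[^ ]+)'
                   r' (?P<http_version>[^ ]+)"'
                   r' (?P<code>[1-9]\d{2})'
                   r' (?P<bytes_sent>\d+)'
                   r' (?P<resp_length>\d+)'
                   r' (?P<resp_time>\d+\.\d+) ')

# Strict patterns applied after the loose match.
METHOD_OK = re.compile(r'[A-Z]+')
VERSION_OK = re.compile(r'[A-Z]+/(\d\.\d)')


def count_line(line, methods, versions):
    """Update the counters for one line; return False if the line is unmatched."""
    m = loose.search(line)
    if m is None:
        return False
    method = m.group('method')
    methods[method if METHOD_OK.fullmatch(method) else 'INVALID'] += 1
    version = VERSION_OK.fullmatch(m.group('http_version'))
    versions[version.group(1) if version else 'INVALID'] += 1
    return True


# Hypothetical well-formed request, for comparison.
normal = r'203.0.113.45 - - [07/Jun/2017:11:02:16 +0430] "GET /index.html HTTP/1.1" 200 166 0 0.000 "-" "-"'
# First line from the report, with a placeholder address.
scan = r'203.0.113.45 - - [07/Jun/2017:11:02:16 +0430] "\x15\x03\x01\x00 N\x87\x98\x04l5\xAF\x89\x92\xF7\xDB\xB9 \xD1\xF3\xFF\xBAa" 400 166 0 0.000 "-" "-"'

methods, versions = Counter(), Counter()
for line in (normal, scan):
    count_line(line, methods, versions)
```

Garbage requests then land in a single INVALID dimension instead of creating one dimension per random token.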

@ilyam8
Member

ilyam8 commented Jun 11, 2017

I thought about it, but "$request" (which is parsed as method, url and http_version) on the 2nd and 3rd lines consists of only 2 space-separated parts.
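The objection can be checked by splitting the quoted request field on spaces. In this sketch `request_parts` is a hypothetical helper, the addresses are placeholders, and the second sample is the 2nd log line from the report truncated after its first few bytes to keep it short.

```python
def request_parts(line):
    """Split the first double-quoted field ($request in this log format) on spaces."""
    start = line.index('"') + 1
    end = line.index('"', start)
    return line[start:end].split(' ')


# First line from the report: three space-separated parts.
three_part = r'203.0.113.45 - - [07/Jun/2017:11:02:16 +0430] "\x15\x03\x01\x00 N\x87\x98\x04l5\xAF\x89\x92\xF7\xDB\xB9 \xD1\xF3\xFF\xBAa" 400 166 0 0.000 "-" "-"'
# Second line from the report, truncated: only two parts.
two_part = r'198.51.100.178 - - [07/Jun/2017:11:02:37 +0430] "\x17\x03\x01\x020\xE1\xA8\x1C=\xC3 \xF5\xD95V\x9C" 400 166 0 0.081 "-" "-"'

print(len(request_parts(three_part)))
print(len(request_parts(two_part)))
```

A pattern that expects three tokens inside the quotes, however loose each token is, still cannot match a request with only two.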

@ilyam8
Member

ilyam8 commented Jun 11, 2017

But yes, you can mark the issue as 'enhancement'.
Still, an alarm for unmatched lines is a good idea.

@ktsaou ktsaou added enhancement and removed bug labels Jun 11, 2017
@ktsaou
Member

ktsaou commented Jun 11, 2017

ok

@stale

stale bot commented Nov 23, 2018

Currently, the netdata team doesn't have enough capacity to work on this issue. We will be more than glad to accept a pull request with a solution to the problem described here. This issue will be closed after another 60 days of inactivity.

@stale stale bot added the stale label Nov 23, 2018
@ktsaou
Member

ktsaou commented Nov 23, 2018

So, we don't have an alarm for unmatched entries?
@l2isbad can you add this?

@stale stale bot removed the stale label Nov 23, 2018
@cakrit cakrit added the feature request New features label Nov 23, 2018
@ilyam8
Member

ilyam8 commented Nov 26, 2018

Yes

@ilyam8 ilyam8 added the priority/high Super important issue label Nov 26, 2018
@cakrit cakrit added this to the v1.12-rc0 milestone Nov 26, 2018
@ilyam8 ilyam8 added collectors/python.d area/collectors Everything related to data collection and removed area/external/python labels Apr 14, 2022