Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix grok expression for traefik #6136

Closed
wants to merge 4 commits into from
Closed

fix grok expression for traefik #6136

wants to merge 4 commits into from

Conversation

e8kor
Copy link

@e8kor e8kor commented Jan 22, 2018

This pull request fixes Grok expression for parsing access logs from traefik.
Related issue: #6111
Grok expressions will work for traefik version 1.4.+ and 1.5.+ and it respects options fields in logs according to logger format.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@ruflin
Copy link
Member

ruflin commented Jan 22, 2018

@e8kor Thanks for the PR. Could you add a log file for the 1.4 events for the tests: https://github.com/elastic/beats/tree/master/filebeat/module/traefik/access/tests Perhaps we should name the files accordingly. I don't remember if there is an easy way to generated the expected outcome. @kvch might know?

@ruflin
Copy link
Member

ruflin commented Jan 22, 2018

@e8kor Please also add an entry to the CHANGELOG file.

@e8kor
Copy link
Author

e8kor commented Jan 23, 2018

@ruflin and expected log output, current test.log is completely outdated

@e8kor
Copy link
Author

e8kor commented Jan 23, 2018

Here is example of proper log entries of traefik, minimal and maximal:

85.181.35.98 - "user" [02/Oct/2017:20:22:08 +0000] "GET /ui/favicons/favicon.ico HTTP/1.1" 304 0 "http://example.com/login" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" 271 "Host-host1" "http://172.19.0.3:5601" 3ms
85.181.35.98 - - [02/Oct/2017:20:22:08 +0000] "GET /ui/favicons/favicon.ico HTTP/1.1" - - "-" "-" - - - 0ms

and examples of parsed output

maximal:

{
  "traefik": {
    "access": {
      "response_code": 304,
      "agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
      "method": "GET",
      "user_name": "user",
      "http_version": "1.1",
      "backend_url": "http://172.19.0.3:5601",
      "url": "/ui/favicons/favicon.ico",
      "request_count": 271,
      "duration": 3,
      "referrer": "http://example.com/login",
      "remote_ip": "85.181.35.98",
      "frontend_name": "Host-host1",
      "body_sent": {
        "bytes": 0
      },
      "time": "02/Oct/2017:20:22:08 +0000"
    }
  }
}

minimal:

{
  "traefik": {
    "access": {
      "duration": 0,
      "remote_ip": "85.181.35.98",
      "method": "GET",
      "http_version": "1.1",
      "time": "02/Oct/2017:20:22:08 +0000",
      "url": "/ui/favicons/favicon.ico"
    }
  }
}

I have refactored grok expression to have single one instead of two patterns.
Changelog entry added

@kvch
Copy link
Contributor

kvch commented Jan 26, 2018

@ruflin Unfortunately, I am not aware of anything which could help. I use the search API of ES to get the events i fed it.
@e8kor I am not sure if it's only my preference, but I think having multiple patterns is better than having one complex with multiple conditionals. WDYT?

@e8kor
Copy link
Author

e8kor commented Jan 26, 2018

@kvch revert then ? I'm using new one on production and so far so good, I don't know how pipelines behaves in this case, is pipeline choose first matched or its doing processing for each, if you can guide me throw this cases I can come with better case. Have you compered previous expressions with new one ?

@ruflin
Copy link
Member

ruflin commented Jan 28, 2018

I think one expression will probably more efficient on the ES side in case the second pattern is hit. But for readability 2 patterns are probably better. Like this we have one for each "version" of traefik.

@e8kor Did you try to add an additional log entry with expected output to the tests directory?

@kvch
Copy link
Contributor

kvch commented Jan 29, 2018

@e8kor No, definitely not. I agree with @ruflin.

@Infusible
Copy link

maybe this information will help you
I'm using a filebeat with a traefik module, I write logos directly to the ES
and in the kibana I see such mistakes
screen shot 2018-02-20 at 11 12 04

@e8kor
Copy link
Author

e8kor commented Mar 5, 2018

@ruflin what is expected from me now to finish pull request? Sorry for long delay

@ruflin
Copy link
Member

ruflin commented Mar 12, 2018

It would be great to have some tests that validate the change. In #6488 @rhooper added a test example but left out left out other changes in this PR. So marrying the two PR's would be great :-)

@iwex
Copy link

iwex commented Aug 19, 2018

@e8kor any updates?

@iwex
Copy link

iwex commented Aug 19, 2018

Also strange, but I don't have frontend name in Kibana

@e8kor
Copy link
Author

e8kor commented Aug 20, 2018

It was hard to understand how tests should look like and because of lack of explanations I haven't completed pullrequest.

@iwex
Copy link

iwex commented Aug 20, 2018

@ruflin Could you please help him with tests :)?

@kvch
Copy link
Contributor

kvch commented Aug 21, 2018

@e8kor You need to add example logs to filebeat/module/traefik/access/test/test.log. To generate example output you need to run GENERATE=1 INTEGRATION_TESTS=1 TESTING_FILEBEAT_MODULES=traefik nosetests tests/system/test_modules.py. Commit both filebeat/module/traefik/access/test/test.log and filebeat/module/traefik/access/test/test.log-expected.json.

Please, rebase your branch. If you need further help, let me know. :)

@dennybaa
Copy link

it's time of 1.7.x traefik, filebeat still does not have this fixed :(

@ruflin
Copy link
Member

ruflin commented Oct 29, 2018

@dennybaa We started our own PR to get the 1.5 changes in here: #8768

Would you be available to provide some example logs for 1.7 so we could test them against the current patterns? Or do you already know which things don't work?

@iwex
Copy link

iwex commented Oct 29, 2018

@ruflin
11.111.99.25 - - [29/Oct/2018:12:37:16 +0000] "GET /foo HTTP/1.1" 301 460 "-" "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Plus Build/NPNS25.137-93-14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36" 187614 "Host-foo-com-xxx-info-xxx-xx-com" "http://172.22.0.2:80" 312ms

<remote_IP_address> - <client_user_name_if_available> [<timestamp>] "<request_method> <request_path> <request_protocol>" <origin_server_HTTP_status> <origin_server_content_size> "<request_referrer>" "<request_user_agent>" <number_of_requests_received_since_Traefik_started> "<Traefik_frontend_name>" "<Traefik_backend_URL>" <request_duration_in_ms>ms

https://docs.traefik.io/configuration/logs/#clf-common-log-format

@dennybaa
Copy link

dennybaa commented Oct 29, 2018

@ruflin Howdy, yup they look like this

128.199.134.13 - - [29/Oct/2018:12:23:58 +0000] "GET /logos/180x180.png HTTP/1.1" 304 0 "https://app.jojomojo.org/id/sign-up" "Mozilla/5.0 (Linux; Android 7.0; TECNO CX Air Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36" 6 "app.jojomojo.org/id/" "http://111.12.13.4:80" 0ms
202.80.212.30 - - [29/Oct/2018:12:24:32 +0000] "POST /api/console/proxy?path=_mapping&method=GET HTTP/1.1" 200 9797 "https://xxx.jojomojo.org/app/kibana" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/70.0.3538.67 Chrome/70.0.3538.67 Safari/537.36" 6 "xxx.jojomojo.org/" "http://111.12.4.5:5601" 78ms

It's also said in traefik docs that CLF is used: <remote_IP_address> - <client_user_name_if_available> [<timestamp>] "<request_method> <request_path> <request_protocol>" <origin_server_HTTP_status> <origin_server_content_size> "<request_referrer>" "<request_user_agent>" <number_of_requests_received_since_Traefik_started> "<Traefik_frontend_name>" "<Traefik_backend_URL>" <request_duration_in_ms>ms

But personally I would prefer to see the filebeat kibana dashboard with native traefik json format. Since using filebeat modules in clusters such as k8s is pain. Passing the volumes there and back, unneeded complexity and security flaws...
So imo traefik module is not required and all the filebased modules too, it's better to think about neater docker stdout/stderr integration possibly picking up containers by labels or generally smth better what suites orchestrated environments.

@ruflin
Copy link
Member

ruflin commented Oct 29, 2018

@iwex @dennybaa Thanks for the logs. @sayden FYI ^

@dennybaa I would expect the modules to work very well in combination with autodiscovery: https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-autodiscover.html

When you write JSON output you mean the message packed into JSON by docker or actually structured logs as JSON?

@iwex
Copy link

iwex commented Oct 29, 2018

@ruflin traefik also has an option to write logs in json format. I'll find example and attach somewhere here :)

@ruflin
Copy link
Member

ruflin commented Oct 29, 2018

@iwex Perfect thanks. We should probably support both formats.

@ph One more example where we have (at least) two formats ^

@iwex
Copy link

iwex commented Oct 29, 2018

@ruflin Huge amount of data:

{"BackendAddr":"172.19.0.3:80","BackendName":"backend-whoami-root","BackendURL":{"Scheme":"http","Opaque":"","User":null,"Host":"172.19.0.3:80","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"ClientAddr":"172.18.0.1:55576","ClientHost":"172.18.0.1","ClientPort":"55576","ClientUsername":"-","DownstreamContentSize":884,"DownstreamStatus":200,"DownstreamStatusLine":"200 OK","Duration":2217704,"FrontendName":"Path-products-0","OriginContentSize":884,"OriginDuration":2143904,"OriginStatus":200,"OriginStatusLine":"200 OK","Overhead":73800,"RequestAddr":"ip172-18-0-19-bfbmvbs9cs9g00as19gg-80.direct.labs.play-with-docker.com","RequestContentSize":0,"RequestCount":2,"RequestHost":"ip172-18-0-19-bfbmvbs9cs9g00as19gg-80.direct.labs.play-with-docker.com","RequestLine":"GET/products/ HTTP/1.1","RequestMethod":"GET","RequestPath":"/products/","RequestPort":"-","RequestProtocol":"HTTP/1.1","RetryAttempts":0,"StartLocal":"2018-10-29T20:46:31.589247187Z","StartUTC":"2018-10-29T20:46:31.589247187Z","downstream_Content-Length":"884","downstream_Content-Type":"text/plain; charset=utf-8","downstream_Date":"Mon, 29 Oct 2018 20:46:31 GMT","level":"info","msg":"","origin_Content-Length":"884","origin_Content-Type":"text/plain; charset=utf-8","origin_Date":"Mon, 29 Oct 2018 20:46:31 GMT","request_Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8","request_Accept-Encoding":"gzip, deflate","request_Accept-Language":"en-US,en;q=0.9,uk-UA;q=0.8,uk;q=0.7,ru-RU;q=0.6,ru;q=0.5","request_Connection":"keep-alive","request_Cookie":"ajs_group_id=null; ajs_anonymous_id=%22f653bc45-9e8b-4692-8bf2-d37445ecfdc1%22; ajs_user_id=%22b9656ead-7a57-49db-a691-bcb831c83bec%22","request_Upgrade-Insecure-Requests":"1","request_User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36","time":"2018-10-29T20:46:31Z"}

@dennybaa
Copy link

@ruflin thank you for the information.

I would expect the modules to work very well in combination with autodiscovery: https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-autodiscover.html

Please any working configuration would be preferable not just the link (and specifically for the metrics beat which is not the one we are talking about, right ?). While https://www.elastic.co/guide/en/beats/filebeat/master/configuration-autodiscover.html#_kubernetes docs are hardly sufficient for a successful setup.

What filebeat can successfully do so far is just grabbing docker stdout/stderr logs without autodiscover module templates kicking in :( Unfortunately it does not feel even like beta now (filebeat 6.4.2).

@ruflin
Copy link
Member

ruflin commented Oct 30, 2018

@dennybaa We probably better take this discussion to discuss instead of here in the issue. Could you open a topic there? https://discuss.elastic.co/c/beats/filebeat

@dennybaa
Copy link

@ruflin Well yup my bad, sorry for ranting.

@sayden
Copy link
Contributor

sayden commented Dec 7, 2018

Finally fixed in #8768

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants