Enhance traefik integration to also handle JSON-formatted access logs #770

Merged
ycombinator merged 36 commits into elastic:master from migrate-traefik-format-json on Mar 23, 2021

Contributor

@ycombinator ycombinator commented Mar 9, 2021

What does this PR do?

This PR enhances the traefik integration to parse JSON-formatted access logs. Prior to this PR, the traefik integration could only parse commonlog-formatted access logs.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.

Related issues

@ycombinator ycombinator added the enhancement New feature or request label Mar 9, 2021
@ycombinator ycombinator changed the title Migrate traefik format json Enhance traefik integration to also handle JSON-formatted access logs Mar 9, 2021
@ycombinator ycombinator mentioned this pull request Mar 9, 2021
17 tasks
@elasticmachine

elasticmachine commented Mar 9, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #770 updated

  • Start Time: 2021-03-23T13:35:51.537+0000

  • Duration: 17 min 48 sec

  • Commit: deb8179

Test stats 🧪

Test Results
Failed 0
Passed 18
Skipped 0
Total 18


@ycombinator ycombinator force-pushed the migrate-traefik-format-json branch 2 times, most recently from 9d567da to 4538ac7 Compare March 16, 2021 18:18
@ycombinator ycombinator marked this pull request as ready for review March 17, 2021 23:26
@ycombinator
Contributor Author

ycombinator commented Mar 17, 2021

@mtojek I could use a second pair of eyes on why the system tests are failing for this PR. The symptom is that no documents can be found in the data stream:

[2021-03-17T23:20:47.909Z] 2021/03/17 23:20:47 DEBUG checking for expected data in data stream...
[2021-03-17T23:20:47.909Z] 2021/03/17 23:20:47 DEBUG found 0 hits in logs-traefik.access-ep data stream
[2021-03-17T23:20:48.852Z] 2021/03/17 23:20:48 DEBUG found 0 hits in logs-traefik.access-ep data stream
[2021-03-17T23:20:49.796Z] 2021/03/17 23:20:49 DEBUG found 0 hits in logs-traefik.access-ep data stream
[2021-03-17T23:20:51.187Z] 2021/03/17 23:20:50 DEBUG found 0 hits in logs-traefik.access-ep data stream
...

I am able to reproduce this locally.

While the system test is still running, I checked that the Elastic Agent container has the log files mounted as expected:

docker exec elastic-package-stack_elastic-agent_1 ls -al /tmp/service_logs
total 372
drwxr-xr-x 4 elastic-agent elastic-agent    128 Mar 17 23:20 .
drwxrwxrwt 1 root          root            4096 Mar 17 23:13 ..
-rw-r--r-- 1 root          root           41193 Mar 17 23:24 access-common.log
-rw-r--r-- 1 root          root          326844 Mar 17 23:24 access-json.log

I also checked the policy in the Fleet UI and it looks right too:

id: 351e41a0-8779-11eb-b6f2-5b7029a9751c
revision: 2
outputs:
  default:
    type: elasticsearch
    hosts:
      - 'http://elasticsearch:9200'
agent:
  monitoring:
    enabled: false
    logs: false
    metrics: false
inputs:
  - id: 72c2e917-0ad1-437d-b158-3dc216ee9e64
    name: traefik-access
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: traefik
        version: 0.0.1
    data_stream:
      namespace: ep
    streams:
      - id: logfile-traefik.access-72c2e917-0ad1-437d-b158-3dc216ee9e64
        data_stream:
          dataset: traefik.access
          type: logs
        paths:
          - /tmp/service_logs/access-json.log
        exclude_files:
          - .gz$
        processors:
          - add_fields:
              target: ''
              fields:
                ecs.version: 1.8.0
fleet:
  kibana:
    protocol: http
    hosts:
      - 'kibana:5601'

I also checked that the ingest pipelines are loaded as expected:

GET _ingest/pipeline/*traefik*
{
  "logs-traefik.access-0.0.1" : { ... },
  "logs-traefik.access-0.0.1-format-json" : { ... },
  "logs-traefik.access-0.0.1-format-common" : { ... }
}

Note that pipeline tests are passing.
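
The three pipeline names listed above suggest a top-level pipeline that hands each event to a format-specific one. A minimal sketch of how such a dispatch could look follows. Two things here are assumptions not shown in this thread: that the format is detected by checking whether `message` starts with `{`, and that `format-common` is the commonlog pipeline. The `IngestPipeline` template helper is how integration packages usually resolve versioned pipeline names, but treat its use here as illustrative.

```yaml
# Hypothetical dispatcher sketch; the detection condition is an assumption.
processors:
  # JSON access logs: one JSON object per line.
  - pipeline:
      name: '{{ IngestPipeline "format-json" }}'
      if: "ctx.message != null && ctx.message.startsWith('{')"
  # Anything else is treated as commonlog.
  - pipeline:
      name: '{{ IngestPipeline "format-common" }}'
      if: "ctx.message != null && !ctx.message.startsWith('{')"
```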

@ycombinator ycombinator marked this pull request as draft March 17, 2021 23:43
@mtojek
Contributor

mtojek commented Mar 18, 2021

I looked into Filebeat's logs and found this entry (file: /usr/share/elastic-agent/data/elastic-agent-6eac40/logs/default/filebeat-json.log):

{"log.level":"warn","@timestamp":"2021-03-18T07:33:57.349Z","log.logger":"elasticsearch","log.origin":{"file.name":"elasticsearch/client.go","file.line":408},"message":"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc00cde7914066120, ext:83591463901, loc:(*time.Location)(0x637db40)}, Meta:{\"raw_index\":\"logs-traefik.access-ep\"}, Fields:{\"agent\":{\"ephemeral_id\":\"bfc8eb74-4ccb-48db-a0a3-798ec6c4751f\",\"hostname\":\"docker-fleet-agent\",\"id\":\"54d36dcf-c5da-4bbc-b565-6b76eccf5a6c\",\"name\":\"docker-fleet-agent\",\"type\":\"filebeat\",\"version\":\"7.13.0\"},\"data_stream\":{\"dataset\":\"traefik.access\",\"namespace\":\"ep\",\"type\":\"logs\"},\"ecs\":{\"version\":\"1.8.0\"},\"elastic_agent\":{\"id\":\"f5ccd53c-16eb-4897-bc3d-8f36e082bae0\",\"snapshot\":true,\"version\":\"7.13.0\"},\"event\":{\"dataset\":\"traefik.access\"},\"host\":{\"architecture\":\"x86_64\",\"containerized\":true,\"hostname\":\"docker-fleet-agent\",\"id\":\"4fb18e315b1f9797f4b6be5c35a3d150\",\"ip\":[\"192.168.96.6\"],\"mac\":[\"02:42:c0:a8:60:06\"],\"name\":\"docker-fleet-agent\",\"os\":{\"codename\":\"Core\",\"family\":\"redhat\",\"kernel\":\"4.9.184-linuxkit\",\"name\":\"CentOS Linux\",\"platform\":\"centos\",\"type\":\"linux\",\"version\":\"7 
(Core)\"}},\"input\":{\"type\":\"log\"},\"log\":{\"file\":{\"path\":\"/tmp/service_logs/access-json.log\"},\"offset\":108959},\"message\":\"{\\\"BackendAddr\\\":\\\"192.168.112.2:80\\\",\\\"BackendName\\\":\\\"backend-backend-elastic-package-service\\\",\\\"BackendURL\\\":{\\\"Scheme\\\":\\\"http\\\",\\\"Opaque\\\":\\\"\\\",\\\"User\\\":null,\\\"Host\\\":\\\"192.168.112.2:80\\\",\\\"Path\\\":\\\"\\\",\\\"RawPath\\\":\\\"\\\",\\\"ForceQuery\\\":false,\\\"RawQuery\\\":\\\"\\\",\\\"Fragment\\\":\\\"\\\"},\\\"ClientAddr\\\":\\\"127.0.0.1:47172\\\",\\\"ClientHost\\\":\\\"127.0.0.1\\\",\\\"ClientPort\\\":\\\"47172\\\",\\\"ClientUsername\\\":\\\"-\\\",\\\"DownstreamContentSize\\\":421,\\\"DownstreamStatus\\\":200,\\\"DownstreamStatusLine\\\":\\\"200 OK\\\",\\\"Duration\\\":2427500,\\\"FrontendName\\\":\\\"Host-backend-elastic-package-service-docker-localhost-2\\\",\\\"OriginContentSize\\\":421,\\\"OriginDuration\\\":2337300,\\\"OriginStatus\\\":200,\\\"OriginStatusLine\\\":\\\"200 OK\\\",\\\"Overhead\\\":90200,\\\"RequestAddr\\\":\\\"backend.elastic-package-service.docker.localhost\\\",\\\"RequestContentSize\\\":0,\\\"RequestCount\\\":79,\\\"RequestHost\\\":\\\"backend.elastic-package-service.docker.localhost\\\",\\\"RequestLine\\\":\\\"GET / HTTP/1.1\\\",\\\"RequestMethod\\\":\\\"GET\\\",\\\"RequestPath\\\":\\\"/\\\",\\\"RequestPort\\\":\\\"-\\\",\\\"RequestProtocol\\\":\\\"HTTP/1.1\\\",\\\"RetryAttempts\\\":0,\\\"StartLocal\\\":\\\"2021-03-18T07:33:55.8656331Z\\\",\\\"StartUTC\\\":\\\"2021-03-18T07:33:55.8656331Z\\\",\\\"downstream_Content-Length\\\":\\\"421\\\",\\\"downstream_Content-Type\\\":\\\"text/plain; charset=utf-8\\\",\\\"downstream_Date\\\":\\\"Thu, 18 Mar 2021 07:33:55 GMT\\\",\\\"level\\\":\\\"info\\\",\\\"msg\\\":\\\"\\\",\\\"origin_Content-Length\\\":\\\"421\\\",\\\"origin_Content-Type\\\":\\\"text/plain; charset=utf-8\\\",\\\"origin_Date\\\":\\\"Thu, 18 Mar 2021 07:33:55 
GMT\\\",\\\"request_Accept\\\":\\\"*/*\\\",\\\"request_User-Agent\\\":\\\"curl/7.67.0\\\",\\\"time\\\":\\\"2021-03-18T07:33:55Z\\\"}\"}, Private:file.State{Id:\"native::41512644-75\", PrevId:\"\", Finished:false, Fileinfo:(*os.fileStat)(0xc00056dc70), Source:\"/tmp/service_logs/access-json.log\", Offset:110356, Timestamp:time.Time{wall:0xc00cde645d613934, ext:644084001, loc:(*time.Location)(0x637db40)}, TTL:-1, Type:\"log\", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x2796ec4, Device:0x4b}, IdentifierName:\"native\"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"data stream timestamp field [@timestamp] is missing\"}}","ecs.version":"1.6.0"}
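
The Elasticsearch response at the end of that log entry is the key part: a data stream rejects any document that has no `@timestamp`. The commit list for this PR includes "Add @timestamp field", but the change itself is not shown in this conversation. A minimal sketch of one way to set it, assuming the raw JSON line is first parsed into a scratch object (the `traefik.access.json` target name is hypothetical) and that the `time` key visible in the logged event is used as the source:

```yaml
# Sketch only; field names under traefik.access.json are hypothetical.
processors:
  # Parse the raw JSON access-log line held in `message`.
  - json:
      field: message
      target_field: traefik.access.json
  # Populate @timestamp from the event's own time, e.g. "2021-03-18T07:33:55Z".
  - date:
      field: traefik.access.json.time
      target_field: "@timestamp"
      formats:
        - ISO8601
```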

@ycombinator
Contributor Author

Thanks for looking into it, @mtojek, appreciate it!

@ycombinator ycombinator marked this pull request as ready for review March 18, 2021 20:42
@ycombinator ycombinator requested review from a team and andrewkroh March 18, 2021 20:42
Contributor

@mtojek mtojek left a comment


LGTM!

Member

@andrewkroh andrewkroh left a comment


Thanks for adding the JSON support.

  - append:
      field: related.ip
      value: "{{source.ip}}"
      if: "ctx?.source?.ip != null"
Member


I recommend adding allow_duplicates: false to the append processors for "related" fields.

Contributor Author


Just curious, why not do this for all append processors, e.g. the ones for event.category or event.type?

Member


Yes, in most cases it probably makes sense to deduplicate the append.
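
Applied to the `related.ip` processor quoted at the start of this review thread, the suggestion would look roughly like this. It is a sketch: only the `allow_duplicates` line is new, and the rest is copied from the quoted diff.

```yaml
- append:
    field: related.ip
    value: "{{source.ip}}"
    # Skip the append when the value is already present in related.ip.
    allow_duplicates: false
    if: "ctx?.source?.ip != null"
```

The same option would apply unchanged to the `event.category` and `event.type` appends raised in the follow-up question.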

  - append:
      field: related.user
      value: "{{user.name}}"
      if: "ctx?.user?.name != null && ctx.user.name != '-'"
on_failure:
Member

@andrewkroh andrewkroh Mar 22, 2021


It appears we have all the fields populated to support the community_id processor. Can you add that in?

Edit: we'd need to set a static network.transport: tcp to make it work, but I think this makes sense for an HTTP proxy.
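
A sketch of what this could look like, assuming `source.ip`, `source.port`, `destination.ip`, and `destination.port` are already populated earlier in the pipeline so that the processor's default field names apply:

```yaml
# Static transport: traffic through an HTTP proxy is assumed to be TCP.
- set:
    field: network.transport
    value: tcp
# With default settings the processor reads the source/destination fields
# plus network.transport and writes the hash to network.community_id.
- community_id: {}
```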

      field: destination.address
      copy_from: destination.ip
      if: "ctx?.destination?.ip != null"
  - rename:
Member


RequestAddr looks like it could be used to populate url.domain.
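
One way this might be done is a guarded copy, sketched below. The source path `traefik.access.json.RequestAddr` is hypothetical, since the thread does not show where the parsed JSON keys live. The sample event carries a bare host name in `RequestAddr`; if the value can also include a port, it would need to be split off first.

```yaml
- set:
    field: url.domain
    # Hypothetical location of the parsed RequestAddr key.
    copy_from: traefik.access.json.RequestAddr
    if: "ctx?.traefik?.access?.json?.RequestAddr != null"
```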

@ycombinator ycombinator merged commit c83a14c into elastic:master Mar 23, 2021
@ycombinator ycombinator deleted the migrate-traefik-format-json branch March 23, 2021 13:55
@elasticmachine

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #770 updated

  • Start Time: 2021-03-23T13:20:33.742+0000

  • Duration: 40 min 6 sec

  • Commit: 1e2f800876beecf0fea57d809e448f753e460ba3

Test stats 🧪

Test Results
Failed 0
Passed 1922
Skipped 3
Total 1925


@ycombinator ycombinator restored the migrate-traefik-format-json branch March 26, 2021 16:12
@ycombinator ycombinator deleted the migrate-traefik-format-json branch May 18, 2021 21:45
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
…elastic#770)

* Migrating traefik module

* Formatting package files

* Removing invalid path field

* Adding categories

* Formatting tweaks

* Adding pipeline test files

* Adding YAML header

* Adding system tests

* Renaming pipeline test case files

* Fixing pipeline tests

* Adding sample event for health data set

* Adding system test for access data stream

* Adding README

* Starting to handle JSON formatted logs

* Adding ARG to Dockerfile for log format

* Adding sample JSON logs

* Running elastic-package format

* Removing host field from sample event

* Fix docker compose file

* Splitting in commonlog and json format pipelines

* Making pipeline test pass

* Updating README.md

* Address TODOs in pipeline

* Specify services in system test configs

* Refactoring out common processors into common pipeline

* Add @timestamp field

* Adding service to health data stream system test

* Adding CHANGELOG entries

* Parsing out event.duration

* Regenerating sample events

* Updating README

* Add allow_duplicates: false for related.* fields' append processors

* Adding community_id processor

* Populating url.domain

* Set allow_duplicates: false for other append processors

* Regenerating README
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Migrate traefik integration
4 participants