Second proposal for JSON support #1143

Merged
monicasarbu merged 1 commit into elastic:master from tsg:json_support_take_two on Mar 22, 2016

@tsg
Collaborator

tsg commented Mar 12, 2016

I tried another option for #1069. The main change is that JSON processing now happens before multiline, so the order is:

  • Encoding decoding
  • JSON decoding
  • Multiline
  • Line/file filtering
  • Add custom fields
  • Generic filtering

The main advantage of this over #1069 is that it supports use cases like Docker, where normal log lines are wrapped in JSON. It should also work fine for most of the structured logging use cases.
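
To make the Docker case concrete, here is a minimal Python sketch (not Filebeat's Go code) of decoding one JSON object per line and pulling out the text under a message key. The sample lines follow the layout written by Docker's json-file log driver (`log`, `stream`, `time`); the values and the function name `decode_line` are invented for illustration.

```python
import json


def decode_line(raw_line, message_key):
    """Decode one JSON object and return (text, fields).

    `text` is the value stored under `message_key`; later stages such as
    multiline and line filtering would operate on it.
    """
    fields = json.loads(raw_line)
    text = fields.get(message_key, "")
    return text, fields


# Two lines shaped like Docker's json-file output (values are made up).
docker_lines = [
    '{"log":"Exception in thread main\\n","stream":"stderr","time":"2016-03-12T10:00:00.000000000Z"}',
    '{"log":"    at Foo.bar(Foo.java:42)\\n","stream":"stderr","time":"2016-03-12T10:00:00.000000001Z"}',
]

# The plain log text that the multiline stage would see.
texts = [decode_line(line, "log")[0] for line in docker_lines]
```

With `message_key: log`, the second line's text starts with whitespace, which is the kind of continuation line a multiline pattern can then attach to the first.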

Here is a sample config:

      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
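
The thread does not spell out what `keys_under_root` and `overwrite_keys` do. The Python sketch below shows one reading of them, stated as an assumption: decoded keys are either grouped under a single `json` key or copied to the top level, and on a name clash the decoded value wins only when overwriting is enabled. `build_event`, the base fields, and the file path are hypothetical.

```python
def build_event(base_fields, decoded, keys_under_root=False, overwrite_keys=False):
    """Combine fields added by the shipper with keys decoded from a JSON line."""
    event = dict(base_fields)
    if not keys_under_root:
        event["json"] = decoded  # keep the decoded keys grouped under one key
        return event
    for key, value in decoded.items():
        if key in event and not overwrite_keys:
            continue  # on conflict, keep the shipper's own value
        event[key] = value
    return event


base = {"type": "log", "source": "/var/log/app.json"}  # invented example fields
decoded = {"type": "access", "status": 204}

nested = build_event(base, decoded)
flat = build_event(base, decoded, keys_under_root=True)
flat_overwrite = build_event(base, decoded, keys_under_root=True, overwrite_keys=True)
```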

The idea is that when configuring the JSON decoder, you can select a "message" key that will be used in the next stages (multiline and line filtering). If you don't choose a "message" key but still try to configure line filtering or multiline, you will get a configuration error.
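
The configuration check described above could look roughly like the following Python sketch. It is not the PR's Go code: `validate_json_settings` is a made-up name, the config is modelled as a plain dict, `include_lines` and `exclude_lines` stand in for "line filtering", and the error text is invented.

```python
def validate_json_settings(config):
    """Reject configs that need a text line but name no message key.

    `config` is a plain dict standing in for one prospector's settings.
    """
    json_cfg = config.get("json")
    if json_cfg is None:
        return  # JSON decoding is off, so there is nothing to check
    # Settings that operate on the text of a line rather than on fields.
    text_based = [k for k in ("multiline", "include_lines", "exclude_lines") if k in config]
    if text_based and not json_cfg.get("message_key"):
        raise ValueError("json.message_key is required when using " + ", ".join(text_based))


sample = {
    "json": {"message_key": "log", "keys_under_root": True, "overwrite_keys": True},
    "multiline": {"pattern": r"^\s", "match": "after"},
}
validate_json_settings(sample)  # passes: a message key is set
```

Dropping `message_key` from `sample` while keeping `multiline` would raise `ValueError` in this sketch.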

Compared to #1069, this is more complex and has a few more corner cases (e.g. what happens if the text key is not a string), but I think the code is still simple enough.
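
The thread leaves open what should happen in that corner case. One possible policy, sketched in Python purely as an assumption, is to fall back to an empty text and report an error string alongside the decoded fields. `extract_text` and the error messages are hypothetical and do not describe what the PR implements.

```python
import json


def extract_text(raw_line, message_key):
    """Return (text, fields, error) for one JSON line.

    `error` is None on success; otherwise `text` is empty.
    """
    try:
        fields = json.loads(raw_line)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return "", {}, "invalid JSON: %s" % exc
    if not isinstance(fields, dict):
        return "", {}, "top-level JSON value is not an object"
    value = fields.get(message_key)
    if not isinstance(value, str):
        # Covers both a missing key and a non-string value such as a number.
        return "", fields, "value of key %r is not a string" % message_key
    return value, fields, None
```

Returning the fields even when the text is unusable means the event can still be shipped, just without multiline or line filtering applied to it.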

This still requires the JSON objects to be one per line, but I think that's the safer assumption to make anyway (see comment from #1069).

@tsg tsg added the Filebeat label Mar 12, 2016

@tsg
Collaborator

tsg commented Mar 12, 2016

This is in PoC phase, so don't merge it yet, but I'd like your feedback on it, @elastic/beats.

@ruflin
Collaborator

ruflin commented Mar 14, 2016

As far as I understand, this is the more powerful option of #1069. It has the same features and more. If no text_key is defined, will it behave like #1069?

@tsg
Collaborator

tsg commented Mar 16, 2016

Yes, it's more powerful and not a lot more complex. For sure even more powerful options can be imagined, but those would move us too much in the direction of "generic processing". So, if I don't hear any objections, I'll move ahead and add tests and docs to this PR.

@andrewkroh
Member

andrewkroh commented Mar 18, 2016

Nice code, it's very readable, easy to follow, and has documentation. 😄 I think this approach will serve us well for most use cases.

Some of the methods and variables could be changed (e.g. Json becomes JSON) to conform to golint naming.

@tsg tsg added the review label Mar 18, 2016

@tsg
Collaborator

tsg commented Mar 18, 2016

This should be ready for reviews now. I want to squash before merging, so let me know when it looks good.

@ruflin
Collaborator

ruflin commented Mar 21, 2016

There seems to be an error in the OS build: https://travis-ci.org/elastic/beats/jobs/116909985#L1527

. /Users/travis/gopath/src/github.com/elastic/beats/filebeat/build/python-env/bin/activate; nosetests -w tests/system --process-timeout=90 --with-timer
.....................E........................
======================================================================
ERROR: Should be able to interpret docker logs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/test_json.py", line 28, in test_docker_logs
    max_timeout=10)
  File "../../../libbeat/tests/system/beat/beat.py", line 277, in wait_until
    "Waited {} seconds.".format(max_timeout))
Exception: Timeout waiting for 'cond' to be true. Waited 10 seconds.
@ruflin
Collaborator

ruflin commented Mar 21, 2016

LGTM. I added a late thought about the config naming (sorry for not bringing that up earlier), but we can also move this to a later stage. Please also update the CHANGELOG file.

Should we add a flag to the event when it was JSON decoded? Similar to what was requested for multiline?

@tsg
Collaborator

tsg commented Mar 21, 2016

I think the test failure was due to a misplaced ignore_older setting. I addressed the comments and squashed the whole thing into 1 commit. Let's wait for green.

@ruflin
Collaborator

ruflin commented Mar 21, 2016

LGTM. Waiting for green.

@urso
Collaborator

urso commented Mar 21, 2016

Can we add some more JSON multiline tests?

Kinda looks like multiline is still done before merging. Here the reader pipeline is configured. I can find JSON decoding only after having read the file.

@Painyjames

Painyjames commented Mar 21, 2016

Any news about this being merged to master?

@tsg
Collaborator

tsg commented Mar 22, 2016

@Painyjames: @urso found a pretty major flaw, in that this doesn't combine with multiline the way I was expecting it to. I'm looking for a solution now. I still expect this to be merged into master this week or the next.

    return retLine
}
func (mlr *MultiLine) pushLine() Line {
    content := mlr.content
    sz := mlr.readBytes
    fields := mlr.fields

@urso
Collaborator

urso commented Mar 22, 2016

When merging multiple JSON events, which fields do we want to report? What if the first one contains a timestamp?

@urso
Collaborator

urso commented Mar 22, 2016

What if in 'addLine' the next line adds some fields not seen in the first one?

@tsg
Collaborator

tsg commented Mar 22, 2016

For simplicity I was thinking that all fields besides the message_key are taken from the first event. This should be good enough for use cases similar to the Docker one. I should probably put this somewhere in the docs.
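
The rule "all fields besides the message_key are taken from the first event" can be sketched in Python as follows. This models the behaviour only and is not the Go `MultiLine` reader from the diff: `merge_multiline` is a hypothetical name, events are plain dicts, every message value is assumed to be a string, and a line is treated as a continuation when it matches the given pattern.

```python
import re


def merge_multiline(events, message_key, continuation):
    """Group decoded events; lines matching `continuation` join the previous event.

    All fields except `message_key` come from the first event of each group.
    """
    merged = []
    for fields in events:
        text = fields.get(message_key, "")
        if merged and continuation.match(text):
            # Only the text is appended; other fields of this event are dropped.
            merged[-1][message_key] += text
        else:
            merged.append(dict(fields))  # copy, so the inputs stay untouched
    return merged


events = [
    {"log": "Traceback:\n", "stream": "stderr", "time": "t1"},
    {"log": "  File x.py\n", "stream": "stderr", "time": "t2"},
    {"log": "next message\n", "stream": "stdout", "time": "t3"},
]
result = merge_multiline(events, "log", re.compile(r"^\s"))
```

In this sketch the merged event keeps `time: t1` from the first line, and any field that appears only in a later line is lost, which is the trade-off raised in the two questions above.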

@urso
Collaborator

urso commented Mar 22, 2016

LGTM.

The limitation right now is one JSON object per line, but with the interface changes we're flexible enough to enhance reading/parsing in the future.

JSON support
JSON decoding happens before multiline, so the order of processing
is:

* Encoding decoding
* JSON decoding
* Multiline
* Line/file filtering
* Add custom fields
* Generic filtering

Here is a sample config:
```
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message"
key that will be used in the next stages (multiline and line filtering). If you
don't choose a "message" key but still try to configure line filtering or
multiline, you will get a configuration error.
@tsg
Collaborator

tsg commented Mar 22, 2016

I moved the JSON decoding part into a processor, so the issue referenced above is solved. We now also have a system test for JSON + multiline. I rebased already, so this is ready to be reviewed / merged if green.

monicasarbu added a commit that referenced this pull request Mar 22, 2016

Merge pull request #1143 from tsg/json_support_take_two
Second proposal for JSON support

@monicasarbu monicasarbu merged commit 6a66cc6 into elastic:master Mar 22, 2016

3 checks passed

CLA Commit author is a member of Elasticsearch
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@devinrsmith

devinrsmith commented Mar 22, 2016

Are there any proposals for multiline json support?

I see in #1069 there are some comments about it.

IMO a new input_type is the best course of action.

I think one of the primary use cases for logs is human readability. The first thing I usually do when an issue arises is to open up a console and scroll through the log(s). Filebeat provides multiline support, but it has to be configured on a log-by-log basis.

Using pretty printed JSON objects as log "lines" is nice because they are human readable.

Limiting the input to single line JSON objects limits the human usefulness of the log.

For example, here is a real-ish log line that I just grabbed:

{
    "primaryType": "ACTION",
    "diagnosticType": "com.example.server.endpoints.MyEndpoint",
    "requestTimestamp": "2016-03-22T20:18:25.281Z",
    "path": "actions/FD0IjHbzKoAkCz_NHr9bB___/messages",
    "method": "POST",
    "queryParams": {},
    "requestHeaders": {
        "Accept": [
            "application/json"
        ],
        "X-Forwarded-Proto": [
            "https",
            "https"
        ],
        "User-Agent": [
            "MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"
        ],
        "Host": [
            "v3-test.example.com",
            "v3-test.example.com"
        ],
        "Accept-Language": [
            "en-CA;q=1"
        ],
        "Content-Length": [
            "17"
        ],
        "Content-Type": [
            "application/json; charset=UTF-8"
        ]
    },
    "userId": "FDxnF4enX8EV1mIxwujCSv__",
    "profileId": "FDxnF4ezX8DV1mIxwujCS___",
    "actions": [],
    "responseTimestamp": "2016-03-22T20:18:25.287Z",
    "status": 204,
    "responseHeaders": {}
}

vs

{"primaryType":"ACTION","diagnosticType":"com.example.server.endpoints.MyEndpoint","requestTimestamp":"2016-03-22T20:18:25.281Z","path":"actions/FD0IjHbzKoAkCz_NHr9bB___/messages","method":"POST","queryParams":{},"requestHeaders":{"Accept":["application/json"],"X-Forwarded-Proto":["https","https"],"User-Agent":["MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"],"Host":["v3-test.example.com","v3-test.example.com"],"Accept-Language":["en-CA;q=1"],"Content-Length":["17"],"Content-Type":["application/json; charset=UTF-8"]},"userId":"FDxnF4enX8EV1mIxwujCSv__","profileId":"FDxnF4ezX8DV1mIxwujCS___","actions":[],"responseTimestamp":"2016-03-22T20:18:25.287Z","status": 204,"responseHeaders":{}}

The pretty printed JSON is much more human readable than the single line format :)

I understand it might be out of scope for this pull request, but I'm hoping Filebeat can eventually support it.
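
To illustrate the limitation being discussed, the Python sketch below counts how many individual lines of a log text parse as a JSON object. It uses only the standard `json` module; `decodable_lines` and the small `record` are invented for the example and stand in for a one-object-per-line decoder.

```python
import json


def decodable_lines(text):
    """Count how many individual lines of `text` parse as a JSON object."""
    count = 0
    for line in text.splitlines():
        try:
            if isinstance(json.loads(line), dict):
                count += 1
        except ValueError:  # raised for lines that are not complete JSON
            pass
    return count


record = {"method": "POST", "status": 204, "actions": []}
pretty = json.dumps(record, indent=4)                # spread over several lines
compact = json.dumps(record, separators=(",", ":"))  # one object on one line

pretty_hits = decodable_lines(pretty)
compact_hits = decodable_lines(compact)
```

No single line of the pretty-printed form is a complete object, so a per-line decoder finds nothing in it, while the compact form decodes as one event.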

@devinrsmith

devinrsmith commented Mar 22, 2016

Created a new issue since I see this request has been merged :)

@asldevi

asldevi commented May 3, 2016

Any idea when this is going to be released?

@ruflin
Collaborator

ruflin commented May 3, 2016

This is already released as part of the 5.0.0-alpha1 release: https://www.elastic.co/downloads/beats/filebeat

@asldevi

asldevi commented May 4, 2016

Thank you so much for the info, @ruflin

@tsg tsg deleted the tsg:json_support_take_two branch Aug 25, 2016
