Second proposal for JSON support #1143

Merged
merged 1 commit into from Mar 22, 2016

8 participants
@tsg
Collaborator

commented Mar 12, 2016

I tried another option for #1069. The main change is that JSON processing now happens before multiline, so the order is:

  • Encoding decoding
  • JSON decoding
  • Multiline
  • Line/file filtering
  • Add custom fields
  • Generic filtering

The main advantage of this over #1069 is that it supports use cases like Docker, where normal log lines are wrapped in JSON. It should also work fine for most structured logging use cases.

Here is a sample config:

      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true

The idea is that when configuring the JSON decoder, you can select a "message" key that will be used in the next stages (multiline and line filtering). If you don't choose a "message" key but still try to configure line filtering or multiline, you will get a configuration error.
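
The configuration check described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `ProspectorConfig`, its `Multiline` and `LineFiltering` flags, and `validate` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// JSONConfig mirrors the options from the sample config above.
type JSONConfig struct {
	MessageKey    string
	KeysUnderRoot bool
	OverwriteKeys bool
}

// ProspectorConfig is a hypothetical container for the settings involved;
// the real Filebeat config struct is not shown in this description.
type ProspectorConfig struct {
	JSON          *JSONConfig
	Multiline     bool // true if multiline is configured
	LineFiltering bool // true if line filtering is configured
}

var errMissingMessageKey = errors.New(
	"json.message_key is required when multiline or line filtering is configured")

// validate rejects configs that enable line-based stages on top of JSON
// decoding without naming a message key for those stages to operate on.
func validate(c ProspectorConfig) error {
	if c.JSON == nil {
		return nil // JSON decoding is off; nothing to check
	}
	if c.JSON.MessageKey == "" && (c.Multiline || c.LineFiltering) {
		return errMissingMessageKey
	}
	return nil
}

func main() {
	cfg := ProspectorConfig{JSON: &JSONConfig{KeysUnderRoot: true}, Multiline: true}
	fmt.Println(validate(cfg))
}
```

Failing at configuration time keeps the later stages simple: multiline and line filtering can assume there is always a text value to work on.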

Compared to #1069, this is more complex and has a few more corner cases (e.g. what happens if the text key is not a string), but I think the code is still simple enough.
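
To illustrate the corner case mentioned above, here is a small sketch of decoding one JSON object per line and pulling out the message key, assuming the standard `encoding/json` package. The helper `decodeLine` is a hypothetical name, and how the actual PR reacts to a non-string value is not shown here; this sketch simply reports it through `ok`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeLine parses one JSON object per line and returns the decoded fields
// plus the text stored under messageKey. If the key is missing or its value
// is not a string, ok is false and text is empty.
func decodeLine(line []byte, messageKey string) (fields map[string]interface{}, text string, ok bool, err error) {
	if err = json.Unmarshal(line, &fields); err != nil {
		return nil, "", false, err
	}
	text, ok = fields[messageKey].(string)
	return fields, text, ok, nil
}

func main() {
	// A line in the shape written by Docker's json-file logging driver.
	line := []byte(`{"log":"hello world\n","stream":"stdout","time":"2016-03-12T10:00:00Z"}`)
	_, text, ok, err := decodeLine(line, "log")
	fmt.Printf("%q %v %v\n", text, ok, err)
}
```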

This still requires the JSON objects to be one per line, but I think that's the safer assumption to make anyway (see comment from #1069).

@tsg tsg added the Filebeat label Mar 12, 2016

@tsg

Collaborator Author

commented Mar 12, 2016

This is in PoC phase, so don't merge it yet, but I'd like your feedback on it, @elastic/beats.

@ruflin

Collaborator

commented Mar 14, 2016

As far as I understand, this is the more powerful option of #1069. It has the same features but more. If no text_key is defined, will it behave like #1069?

@@ -140,6 +140,10 @@ filebeat:
# file is skipped, as the reading starts at the end. We recommend to leave this option on false
# but lower the ignore_older value to release files faster.
#force_close_files: false
json_decoder:

@ruflin

ruflin Mar 14, 2016

Collaborator

Not sure if we should perhaps just call it json instead of json_decoder. It is shorter and will not get us into the discussion of adding further "decoders" :-)

@tsg

Collaborator Author

commented Mar 16, 2016

Yes, it's more powerful and not a lot more complex. Even more powerful options can certainly be imagined, but those would move us too much in the direction of "generic processing". If I don't hear any objections, I'll move ahead and add tests and docs to this PR.

@tsg tsg force-pushed the tsg:json_support_take_two branch from f02867b to 59b5910 Mar 17, 2016

@andrewkroh

Member

commented Mar 18, 2016

Nice code, it's very readable, easy to follow, and has documentation. 😄 I think this approach will serve us well for most use cases.

Some of the method and variable names could be changed (e.g. Json becomes JSON) to conform to golint naming.

@tsg tsg added the review label Mar 18, 2016

@tsg

Collaborator Author

commented Mar 18, 2016

This should be ready for reviews now. I want to squash before merging, so let me know when it looks good.

@ruflin

Collaborator

commented Mar 21, 2016

There seems to be an error in the OS X build: https://travis-ci.org/elastic/beats/jobs/116909985#L1527

. /Users/travis/gopath/src/github.com/elastic/beats/filebeat/build/python-env/bin/activate; nosetests -w tests/system --process-timeout=90 --with-timer
.....................E........................
======================================================================
ERROR: Should be able to interpret docker logs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/test_json.py", line 28, in test_docker_logs
    max_timeout=10)
  File "../../../libbeat/tests/system/beat/beat.py", line 277, in wait_until
    "Waited {} seconds.".format(max_timeout))
Exception: Timeout waiting for 'cond' to be true. Waited 10 seconds.
TextKey string `config:"text_key"`
KeysUnderRoot bool `config:"keys_under_root"`
OverwriteKeys bool `config:"overwrite_keys"`
AddErrorKey bool `config:"add_error_key"`

@ruflin

ruflin Mar 21, 2016

Collaborator

Not sure if we should shorten the config names and just call them add_error and overwrite.

}

type JSONConfig struct {
TextKey string `config:"text_key"`

@ruflin

ruflin Mar 21, 2016

Collaborator

Should we call it MessageKey? Or Field Name? JSON Key? Not sure.

@tsg

tsg Mar 21, 2016

Author Collaborator

I was also not sure about text_key. It's called "text" in the code, but now I realize that in the final document it uses "message", so probably message_key is better.

@tsg

tsg Mar 21, 2016

Author Collaborator

Replaced with message_key.

// respecting the KeysUnderRoot and OverwriteKeys configuration options.
func mergeJSONFields(f *FileEvent, event common.MapStr) {
if f.JSONConfig.KeysUnderRoot {
for k, v := range f.JSONFields {

@ruflin

ruflin Mar 21, 2016

Collaborator

At a later stage we could probably reuse the method from the meta fields here.

@tsg

tsg Mar 21, 2016

Author Collaborator

I thought about it, but as it's currently implemented it wouldn't have worked (assumes the container name is fields) and I didn't want to complicate that code.
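
For readers following the `mergeJSONFields` discussion, the interaction of `keys_under_root` and `overwrite_keys` can be sketched as below. This is an illustrative stand-alone version using plain maps rather than `FileEvent` and `common.MapStr`; the function name `mergeFields` and the `json` nesting key used when `keys_under_root` is off are assumptions of the sketch, not taken from the PR.

```go
package main

import "fmt"

// mergeFields copies decoded JSON fields into event. With keysUnderRoot the
// fields go to the top level, and an existing key is only replaced when
// overwriteKeys is set. Otherwise everything is nested under a "json" key
// (the nesting key is an assumption of this sketch).
func mergeFields(event, jsonFields map[string]interface{}, keysUnderRoot, overwriteKeys bool) {
	if !keysUnderRoot {
		event["json"] = jsonFields
		return
	}
	for k, v := range jsonFields {
		if _, exists := event[k]; exists && !overwriteKeys {
			continue // keep the value Filebeat already set
		}
		event[k] = v
	}
}

func main() {
	event := map[string]interface{}{"source": "/var/log/app.log", "type": "log"}
	decoded := map[string]interface{}{"type": "docker", "stream": "stdout"}
	mergeFields(event, decoded, true, false)
	fmt.Println(event["type"], event["stream"])
}
```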

@ruflin

Collaborator

commented Mar 21, 2016

LGTM. I added some late thoughts about the config naming (sorry for not bringing that up earlier), but we can also move this to a later stage. Please also update the CHANGELOG file.

Should we add a flag to the event when it was json decoded? Similar to what was requested for multiline?

@tsg tsg force-pushed the tsg:json_support_take_two branch 2 times, most recently from 4932b15 to abf4ef0 Mar 21, 2016

@tsg

Collaborator Author

commented Mar 21, 2016

I think the test failure was due to a misplaced ignore_older setting. I addressed the comments and squashed the whole thing into one commit. Let's wait for green.

@ruflin

Collaborator

commented Mar 21, 2016

LGTM. Waiting for green.

@urso

Collaborator

commented Mar 21, 2016

Can we add some more JSON multiline tests?

kinda looks like multiline is still done before merging. Here the reader pipeline is configured. I can find json decoding only after having read the file.

@Painyjames


commented Mar 21, 2016

Any news about this being merged to master?

@tsg

Collaborator Author

commented Mar 22, 2016

@Painyjames: @urso found a pretty major flaw, in that this doesn't combine with multiline the way I was expecting it to. I'm looking for a solution now; I still expect this to be merged into master this week or the next.

return retLine
}

func (mlr *MultiLine) pushLine() Line {
content := mlr.content
sz := mlr.readBytes
fields := mlr.fields

@urso

urso Mar 22, 2016

Collaborator

When merging multiple JSON events, which fields do we want to report? What if the first one contains a timestamp?

@urso

urso Mar 22, 2016

Collaborator

What if in 'addLine' the next line adds some fields not seen in the first one?

@tsg

tsg Mar 22, 2016

Author Collaborator

For simplicity, I was thinking that all fields besides the message_key are taken from the first event. This should be good enough for use cases similar to the Docker one. I should probably put this somewhere in the docs.
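
The "fields from the first event" rule described here can be sketched as follows. This is an illustration under stated assumptions, not the `MultiLine` reader from the PR: `decodedLine` and `mergeLines` are hypothetical names, and joining the texts with a newline is a choice made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// decodedLine is a hypothetical stand-in for one JSON-decoded log line:
// the text taken from message_key plus the remaining fields.
type decodedLine struct {
	Text   string
	Fields map[string]interface{}
}

// mergeLines joins the text of all lines in a multiline group and keeps the
// fields of the first line only; fields of later lines are dropped.
func mergeLines(lines []decodedLine) decodedLine {
	if len(lines) == 0 {
		return decodedLine{}
	}
	texts := make([]string, len(lines))
	for i, l := range lines {
		texts[i] = l.Text
	}
	return decodedLine{Text: strings.Join(texts, "\n"), Fields: lines[0].Fields}
}

func main() {
	merged := mergeLines([]decodedLine{
		{Text: "Exception in thread main", Fields: map[string]interface{}{"time": "t1", "stream": "stderr"}},
		{Text: "    at Foo.bar(Foo.java:1)", Fields: map[string]interface{}{"time": "t2", "stream": "stderr"}},
	})
	fmt.Printf("%q %v\n", merged.Text, merged.Fields["time"])
}
```

With this rule, a timestamp on the first line wins, and a field that only appears on a continuation line does not reach the merged event.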

@urso

Collaborator

commented Mar 22, 2016

LGTM.

Limitation right now is 1 json object per line, but with interface changes we're very flexible to enhance reading/parsing in future.

JSON support
JSON decoding happens before multiline, so the order of processing
is:

* Encoding decoding
* JSON decoding
* Multiline
* Line/file filtering
* Add custom fields
* Generic filtering

Here is a sample config:
```
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message"
key that will be used in the next stages (multiline and line filtering). If you
don't choose a "message" key but still try to configure line filtering or
multiline, you will get a configuration error.

@tsg tsg force-pushed the tsg:json_support_take_two branch from 4472ac4 to ceb25bd Mar 22, 2016

@tsg

Collaborator Author

commented Mar 22, 2016

Moved the JSON decoding part into a processor, so the issue referenced above is solved. We now also have a system test for JSON + multiline. I rebased already, so this is ready to be reviewed / merged if green.

monicasarbu added a commit that referenced this pull request Mar 22, 2016

Merge pull request #1143 from tsg/json_support_take_two
Second proposal for JSON support

@monicasarbu monicasarbu merged commit 6a66cc6 into elastic:master Mar 22, 2016

3 checks passed

CLA Commit author is a member of Elasticsearch
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@devinrsmith


commented Mar 22, 2016

Are there any proposals for multiline json support?

I see in #1069 there are some comments about it.

IMO a new input_type is the best course of action.

I think one of the primary use cases for logs is that they are human readable. The first thing I usually do when an issue arises is to open up a console and scroll through the log(s). Filebeat provides multiline support, but it has to be configured on a log-by-log basis.

Using pretty printed JSON objects as log "lines" is nice because they are human readable.

Limiting the input to single line JSON objects limits the human usefulness of the log.

For example, here is a real-ish log line that I just grabbed:

{
    "primaryType": "ACTION",
    "diagnosticType": "com.example.server.endpoints.MyEndpoint",
    "requestTimestamp": "2016-03-22T20:18:25.281Z",
    "path": "actions/FD0IjHbzKoAkCz_NHr9bB___/messages",
    "method": "POST",
    "queryParams": {},
    "requestHeaders": {
        "Accept": [
            "application/json"
        ],
        "X-Forwarded-Proto": [
            "https",
            "https"
        ],
        "User-Agent": [
            "MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"
        ],
        "Host": [
            "v3-test.example.com",
            "v3-test.example.com"
        ],
        "Accept-Language": [
            "en-CA;q=1"
        ],
        "Content-Length": [
            "17"
        ],
        "Content-Type": [
            "application/json; charset=UTF-8"
        ]
    },
    "userId": "FDxnF4enX8EV1mIxwujCSv__",
    "profileId": "FDxnF4ezX8DV1mIxwujCS___",
    "actions": [],
    "responseTimestamp": "2016-03-22T20:18:25.287Z",
    "status": 204,
    "responseHeaders": {}
}

vs

{"primaryType":"ACTION","diagnosticType":"com.example.server.endpoints.MyEndpoint","requestTimestamp":"2016-03-22T20:18:25.281Z","path":"actions/FD0IjHbzKoAkCz_NHr9bB___/messages","method":"POST","queryParams":{},"requestHeaders":{"Accept":["application/json"],"X-Forwarded-Proto":["https","https"],"User-Agent":["MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"],"Host":["v3-test.example.com","v3-test.example.com"],"Accept-Language":["en-CA;q=1"],"Content-Length":["17"],"Content-Type":["application/json; charset=UTF-8"]},"userId":"FDxnF4enX8EV1mIxwujCSv__","profileId":"FDxnF4ezX8DV1mIxwujCS___","actions":[],"responseTimestamp":"2016-03-22T20:18:25.287Z","status": 204,"responseHeaders":{}}

The pretty printed JSON is much more human readable than the single line format :)

I understand it might be out of scope for this pull request, but I'm hoping Filebeat can eventually support it.

@devinrsmith


commented Mar 22, 2016

Created a new issue since I see this request has been merged :)

@asldevi


commented May 3, 2016

Any idea when this is going to be released?

@ruflin

Collaborator

commented May 3, 2016

This is already released as part of the 5.0.0-alpha1 release: https://www.elastic.co/downloads/beats/filebeat

@asldevi


commented May 4, 2016

thank you so much for the info, @ruflin

@tsg tsg deleted the tsg:json_support_take_two branch Aug 25, 2016
