Standard way for nested record support #1578

Closed
repeatedly opened this issue May 19, 2017 · 11 comments

@repeatedly
Member

repeatedly commented May 19, 2017

This is for fixing #649, #902 and related issues.

Currently, many plugins can't handle nested records because there is no standard way to access them.
Many users want to access nested records in plugins such as grep, rewrite-tag-filter, parser and more.
Fluentd core should provide a way to handle these cases.

Here is a starting point. We need feedback and suggestions!

Features

  • Standard syntax for configuration
  • Plugin helper for nested records

Configuration

What is the best syntax for configuration? There are several approaches.
Using the grep filter plugin as an example:

jsonpath-like syntax

regexp1 key1 cool # record['key1']
regexp2 $.key1.key2[0] hot # record['key1']['key2'][0]
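A minimal Ruby sketch of how a jsonpath-like parameter could be turned into a key path and resolved with Hash#dig. The helper name parse_path is hypothetical; it only handles dot segments and numeric [n] indexes, and keeps a plain key such as key1 as a single top-level key for backward compatibility:

```ruby
# Hypothetical helper: "$.key1.key2[0]" -> ["key1", "key2", 0].
# A parameter without the "$." prefix is treated as one top-level key.
def parse_path(param)
  return [param] unless param.start_with?('$.')

  param[2..-1].split('.').flat_map do |segment|
    # "key2[0]" -> name "key2", indexes ["0]"]; "key1" -> name "key1", no indexes
    name, *indexes = segment.split('[')
    [name] + indexes.map { |idx| Integer(idx.chomp(']')) }
  end
end

record = { 'key1' => { 'key2' => ['hot', 'warm'] } }
path = parse_path('$.key1.key2[0]')
p path              # => ["key1", "key2", 0]
p record.dig(*path) # => "hot"
```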

dot-separated key

regexp1 key1 cool # record['key1']
regexp2 key1.key2.0 hot # record['key1']['key2'][0]

How do we distinguish the record["key1"]["key2"] and record["key1.key2"] patterns?
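A small Ruby example of the ambiguity, using an illustrative record in which both interpretations of key1.key2 exist at once:

```ruby
# Illustrative record: a literal "key1.key2" key AND a nested key1 -> key2.
record = {
  'key1.key2' => 'flat value',
  'key1' => { 'key2' => 'nested value' }
}

param = 'key1.key2'
p record[param]                 # => "flat value"   (treated as a literal key)
p record.dig(*param.split('.')) # => "nested value" (split on dots)
```

With this syntax the 0 in key1.key2.0 also arrives as the string "0", which Array#dig rejects with a TypeError, so numeric segments would have to be converted to integers first.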

array syntax

regexp1 key1 cool # record['key1']
regexp2 ['key1', 'key2', 0] hot # record['key1']['key2'][0]
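With the array form the parameter already is the key path, so it can be passed straight to dig. A sketch assuming the value is written as a JSON array (double-quoted strings) and parsed with Ruby's json library; how Fluentd itself would parse such a parameter is not decided here:

```ruby
require 'json'

# Assumption: the parameter is written as a JSON array.
param = '["key1", "key2", 0]'
path = JSON.parse(param) # => ["key1", "key2", 0]; the index stays an Integer

record = { 'key1' => { 'key2' => ['hot'] } }
value = record.dig(*path)
puts value
```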

record_transformer with enable_ruby syntax

regexp1 record["key1"] cool # record['key1']
regexp2 record["key1"]["key2"][0] hot # record['key1']['key2'][0]

key1 is the same as record["key1"] for backward compatibility.

Or is there a better syntax?

Plugin helper

For the internal implementation, we can use the dig method (Hash#dig / Array#dig) to access nested values since Ruby 2.3. For Ruby 2.2 or earlier, we can use the ruby_dig or backport_dig gems.
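A quick Ruby illustration of both options. dig_path is a hypothetical, simplified stand-in for what the ruby_dig/backport_dig gems provide, not their actual code; unlike the built-in dig, it returns nil instead of raising when it meets a non-container value:

```ruby
record = { 'key1' => { 'key2' => ['hot'] } }

# Ruby 2.3+: built-in dig returns nil when an intermediate key is missing.
p record.dig('key1', 'key2', 0)    # => "hot"
p record.dig('key1', 'missing', 0) # => nil

# Hypothetical fallback for older Rubies: walk the path one key at a time.
def dig_path(obj, path)
  path.reduce(obj) do |current, key|
    case current
    when Hash, Array then current[key]
    else return nil # missing or non-container value: stop early
    end
  end
end

p dig_path(record, ['key1', 'key2', 0])
```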

For the plugin API, there are several approaches:

Return an accessor object

accessor = create_record_accessor(conf_param) # conf_param is $.key1.key2[0]
value = accessor.call(record) # access record['key1']['key2'][0]
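A minimal Ruby sketch of the accessor-object style, assuming the parameter is parsed once at configure time and the returned lambda is reused for every record. The body of create_record_accessor here is hypothetical; only dot segments and [n] indexes are handled:

```ruby
# Hypothetical implementation: parse once, return a callable.
def create_record_accessor(param)
  keys =
    if param.start_with?('$.')
      # "key1.key2[0]" -> ["key1", "key2", "[0]"] -> ["key1", "key2", 0]
      param[2..-1].scan(/[^.\[\]]+|\[\d+\]/).map do |token|
        token.start_with?('[') ? token[1..-2].to_i : token
      end
    else
      [param] # plain key: same as record[param]
    end
  ->(record) { record.dig(*keys) }
end

accessor = create_record_accessor('$.key1.key2[0]')
record = { 'key1' => { 'key2' => ['hot'] } }
puts accessor.call(record)
```

Returning a closure keeps the parsing cost out of the per-record path and needs no bookkeeping inside the helper.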

Provide an access method

This is similar to the inject helper.

create_record_accessor(:name, conf_param) # conf_param is $.key1.key2[0]
value = access_record(:name, record) # access record['key1']['key2'][0]

The plugin helper needs to maintain an accessor list internally.
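A Ruby sketch of that internal list, assuming a name => key-path Hash kept on the plugin instance. RecordAccessorRegistry and ExampleFilter are hypothetical names, not Fluentd classes, and the parsing is simplified to dot segments only:

```ruby
# Hypothetical helper module: keeps a name => key path registry.
module RecordAccessorRegistry
  def create_record_accessor(name, param)
    @record_accessors ||= {}
    # Simplified parsing: "$.a.b" -> ["a", "b"], plain "a" -> ["a"]; no [n] indexes.
    @record_accessors[name] = param.start_with?('$.') ? param[2..-1].split('.') : [param]
  end

  def access_record(name, record)
    # Raises KeyError if nothing was registered under this name.
    keys = (@record_accessors || {}).fetch(name)
    record.dig(*keys)
  end
end

# Hypothetical plugin class standing in for a filter plugin.
class ExampleFilter
  include RecordAccessorRegistry

  def configure
    create_record_accessor(:name, '$.key1.key2')
  end

  def filter(record)
    access_record(:name, record)
  end
end

plugin = ExampleFilter.new
plugin.configure
puts plugin.filter({ 'key1' => { 'key2' => 'hot' } })
```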

Or is there a better API?

@sonots
Member

sonots commented May 19, 2017

+1 for jsonpath-like syntax, but we need to add bracket notation to support keys containing a '.' character, e.g.:

regexp1 key1 cool # record['key1']
regexp2 $.key1.key2[0] hot # record['key1']['key2'][0]
regexp3 $['key1']['this.is.key3'] warm # record['key1']['this.is.key3']
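A Ruby sketch of a tokenizer that accepts both notations. The regex and parse_path are hypothetical; escaped quotes inside bracket keys are not supported, and characters that match no token are silently skipped, so there is no real validation:

```ruby
# .name  |  ['quoted.key']  |  [0]
TOKEN = /\.([^.\[\]]+)|\['([^']*)'\]|\[(\d+)\]/

def parse_path(param)
  raise ArgumentError, "path must start with $: #{param}" unless param.start_with?('$')

  # scan returns one [dotted, quoted, index] capture triple per token
  param[1..-1].scan(TOKEN).map do |dotted, quoted, index|
    index ? Integer(index) : (dotted || quoted)
  end
end

record = { 'key1' => { 'key2' => ['hot'], 'this.is.key3' => 'warm' } }
p parse_path('$.key1.key2[0]')                      # => ["key1", "key2", 0]
p record.dig(*parse_path("$['key1']['this.is.key3']")) # => "warm"
```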

@chancez

chancez commented May 19, 2017

I'm a big fan of the enable_ruby approach because it generally can help solve this, but also offers a lot of other niceties I'd like to see in other plugins.

That said, I think it's probably a bit of a bigger scope, and I'd still like to see this handled without the enable_ruby approach, which can potentially have much higher overhead if all you want to do is access nested keys.

Overall, I think jsonpath looks very reasonable, although it feels somewhat inconsistent with how you access fields that aren't nested, which I dislike. But I guess there aren't many options unless you throw out backwards compatibility.

@repeatedly
Member Author

No more comments, so I wrote a patch for it: #1637

@chancez

chancez commented Jul 19, 2017

Awesome, I like that it has both notations to handle keys with dots and such.

@jhuenges

@repeatedly how can this be used with remove_keys in the record_transformer filter?

@repeatedly
Member Author

@jhuenges No way for now, because this helper doesn't provide nested deletion.

@jhuenges

Is there any way to delete something that is not on the top level of a json?
Example:

{
  "request": {
    "headers": {
      ...
      "cookie": "abcdf",
      ...
    }
  }
}

I want to remove request.headers.cookie but keep all other elements of request.headers.
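In plain Ruby, removing only the last key of a path while leaving its siblings alone could look like this sketch. delete_nested is a hypothetical helper (not a Fluentd API), the 'host' header is illustrative sample data, and intermediate values are assumed to be Hashes or missing:

```ruby
# Hypothetical helper: delete the last key of a path, keep sibling keys.
def delete_nested(record, path)
  *parents, last = path
  container = parents.empty? ? record : record.dig(*parents)
  # Only delete when the parent exists and is a Hash; otherwise do nothing.
  container.delete(last) if container.is_a?(Hash)
end

record = {
  'request' => {
    'headers' => { 'cookie' => 'abcdf', 'host' => 'example.com' }
  }
}
delete_nested(record, %w[request headers cookie])
p record
```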

@dannyk81

dannyk81 commented Dec 19, 2017

@jhuenges Were you able to find a solution for this?

@repeatedly I have a similar issue; we have this structure:

{
  "timestamp": "1513650893625",
  "level": "FATAL",
  "http": {
    "user_agent": "Apache-HttpClient/4.3.3 (java 1.5)",
    "method": "POST",
    "env": {
      "REQUEST_START_TIME": 1513650858,
      "HTTP_ACCEPT_ENCODING": "gzip,deflate",
      "HTTP_CONNECTION": "close",
      "HTTP_CONTENT_LENGTH": "231",
      "HTTP_CONTENT_TYPE": "application/json; charset=UTF-8"
    }
  }
}

From the above, I would like to remove record["http"]["env"] entirely.

So the result should be:

{
  "timestamp": "1513650893625",
  "level": "FATAL",
  "http": {
    "user_agent": "Apache-HttpClient/4.3.3 (java 1.5)",
    "method": "POST"
  }
}

@dannyk81

dannyk81 commented Dec 19, 2017

Ok, I figured it out :)

It seems it was failing because not all records have the http key, so I used the trick mentioned by @repeatedly to check whether it exists and, if so, delete the nested http.env key:

<filter udp.php>
  @type record_modifier
  <record>
    _dummy_ ${if record.has_key?('http'); record['http'].delete('env') ; end; nil}
  </record>
  remove_keys _dummy_
</filter>
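The Ruby expression embedded in that config can be rehearsed outside Fluentd. drop_http_env is a hypothetical wrapper around the same expression, and the records are trimmed-down samples of the structure above:

```ruby
# Same logic as the ${...} expression in the record_modifier config.
def drop_http_env(record)
  # Guard: only touch 'http' when the key exists, so records without it don't fail.
  if record.has_key?('http'); record['http'].delete('env'); end
  nil # in the config this result goes into _dummy_, which remove_keys then drops
end

with_http = {
  'level' => 'FATAL',
  'http' => { 'method' => 'POST', 'env' => { 'HTTP_CONNECTION' => 'close' } }
}
without_http = { 'level' => 'INFO' }

drop_http_env(with_http)
drop_http_env(without_http) # no NoMethodError thanks to the has_key? guard
p with_http
```

A stricter guard would also check record['http'].is_a?(Hash), since the expression assumes http holds a Hash whenever it is present.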

@repeatedly
Member Author

Nested delete support: #1800

@vikranth06

Can we update nested fields of a JSON record?
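For reference, updating a nested field in plain Ruby could look like this sketch. set_nested is a hypothetical helper, not something the thread says Fluentd provides; array indexes are not supported, and any non-Hash value found on the way is replaced by an empty Hash:

```ruby
# Hypothetical helper: set a value at a nested path, creating missing levels.
def set_nested(record, path, value)
  *parents, last = path
  container = parents.reduce(record) do |current, key|
    # Create (or overwrite a non-Hash value with) an empty Hash at this level.
    current[key] = {} unless current[key].is_a?(Hash)
    current[key]
  end
  container[last] = value
  record
end

record = { 'http' => { 'method' => 'POST' } }
set_nested(record, %w[http method], 'GET')                 # update an existing field
set_nested(record, %w[http env HTTP_CONNECTION], 'close') # create missing levels
p record
```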


6 participants