Split filter: ability to split arrays within a JSON structure #787

Closed
wants to merge 4 commits into base: master

8 participants
20goto10 commented Nov 19, 2013

An update to the split filter which allows the user to divide an array within the JSON hash. The contents are split into multiple entries and yielded back for further processing. (Note: LogStash is not successfully refiltering new items yet, so they cannot be filtered a second time, but the results of the split do get pushed into the output stack.)

This is useful when the log data contains multiple event entries as an array.

The "reuse_element" config option determines whether the data is kept within the hash under its original key name (e.g. "Records") or is converted into a base JSON structure. If "reuse_element" is true, the array field within the hash is overwritten with the value of one of its elements, e.g. "Records" => [1,2,3,4] will lead to 4 different events, with "Records" => 1, "Records" => 2, and so on. If false, each individual array element is returned as a single hash with no key (this was my purpose in making the change, as my events are all together in one big array within the JSON).
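To make the two modes concrete, here is a minimal Ruby sketch of the behaviour described above. It is not the code from this pull request: plain hashes stand in for Logstash events, `split_array_field` is a hypothetical helper name, and it assumes that with "reuse_element" false each array element is itself a hash that becomes the whole new event.

```ruby
# Hypothetical helper; plain Hashes stand in for Logstash events.
def split_array_field(event, field, reuse_element:)
  values = event[field]
  # Nothing to split unless the field holds an array with several entries.
  return [event] unless values.is_a?(Array) && values.length > 1

  values.map do |value|
    if reuse_element
      # Keep the rest of the event and overwrite the field with one element.
      event.merge(field => value)
    else
      # Assumption: each element is itself a hash and becomes the whole event.
      value.dup
    end
  end
end

event = { "source" => "app", "Records" => [1, 2, 3, 4] }
kept = split_array_field(event, "Records", reuse_element: true)
# kept holds 4 events; the first is {"source"=>"app", "Records"=>1}

nested = { "Records" => [{ "id" => 1 }, { "id" => 2 }] }
flat = split_array_field(nested, "Records", reuse_element: false)
# flat holds the two inner hashes, without the "Records" key
```

The "source" and "id" fields are made up for the example; only "Records" comes from the description.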

Sample config section: in this scenario, log data was returned as a single row of JSON data under the hash key "message". "message" contained only the "Records" field, pointing to more JSON presented as an array of hashes. This configuration will allow the user to separate those records back into individual events. Since Logstash does not currently re-filter "created events" correctly (new events spawned by a filter, e.g. split), the "date" filter section of the configuration below is never reached. I will be addressing this in a separate Pull Request.

filter {
  if !("splitted" in [tags]) {
    json {
      source => "message"
    }
    split {
      field => "Records"
      reuse_element => false
      add_tag => ["splitted"]
    }
  }
  if ("splitted" in [tags]) {
    date {
      add_tag => ["dated"]
      match => ["eventTime", "ISO8601"]
    }
  }
}
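
As a rough illustration of what the configuration above is meant to do end to end, the following Ruby sketch walks one made-up log line through the three steps (json, split, date). It only imitates the intended data flow with the standard library; it is not Logstash code, and the sample payload and the "eventName" field are invented for the example.

```ruby
require "json"
require "time"

# Made-up input: a single event whose "message" is JSON with a "Records" array.
raw = {
  "message" => '{"Records":[' \
               '{"eventTime":"2013-11-19T10:00:00Z","eventName":"A"},' \
               '{"eventTime":"2013-11-19T10:05:00Z","eventName":"B"}]}'
}

# json step: parse the "message" string into fields.
parsed = JSON.parse(raw["message"])

# split step with reuse_element => false: one event per record, tagged "splitted".
events = parsed["Records"].map { |record| record.merge("tags" => ["splitted"]) }

# date step (the part the description says is not reached yet):
# read "eventTime" as ISO8601 and tag the event "dated".
events.each do |e|
  next unless e["tags"].include?("splitted")
  e["@timestamp"] = Time.iso8601(e["eventTime"])
  e["tags"] << "dated"
end
```

After this runs, `events` holds two separate hashes, each with its own timestamp taken from "eventTime".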

Member

untergeek commented Nov 20, 2013

Hi @20goto10,

This is a great idea. :)

That said, we feel that this is a better fit for the split filter. We would be willing to help refactor the split filter to include this functionality. Please let us know how we can help you! :)

Thanks for making Logstash awesome! :)

20goto10 commented Nov 20, 2013

To me it seems logical to handle multi-event arrays within JSON when the data is designed that way and perfectly well-formed. The internal JSON parser was seeing the array data correctly before this plugin, but lumped the results together into one giant event entry instead of splitting it (which makes sense since it was never explicitly told to do otherwise). Feeding those arrays into the split filter as text just makes things more complicated... Keeping the array split feature within the JSON plugin makes things vastly simpler for users. I say this after days of wrestling with the configuration.

20goto10 commented Nov 20, 2013

On second consideration I'm taking a stab at incorporating this into the split filter.

20goto10 commented Nov 20, 2013

Updated this to use the split plugin instead of the json plugin. It seems to work fine, although this will change the behavior of split when it encounters data that is already an array-- which is probably fine.

I added the "reuse_element" flag. If true, the array element within the hash is overwritten with the value of one of its elements. E.g. "Records" => [1,2,3,4] will lead to 4 different events, each one with "Records" => 1 (etc.)... If false, each individual array element would be returned into the results as a single hash with no key (this was my purpose in creating the plugin).

Events created from the split are not being re-filtered, which is a problem for getting the time stamps from the array elements (for example). I am working on another modification to LogStash that will allow events to be re-filtered after a split event... this will be a separate pull request.

lib/logstash/filters/split.rb (outdated diff)
return if splits.length == 1
#or splits[1].empty?
# Skip filtering if splitting this event resulted in only one thing found
return event if splits.length <= 1

jordansissel Nov 22, 2013

Contributor

I might be going nutty, but I'm not sure I understand why you return event here?

20goto10 Nov 22, 2013

Er... because of a careless mistake!
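
For readers following the exchange above, here is a small Ruby sketch of the guard clause in question. It assumes a filter-style method that hands new events to its caller through a block and whose return value the caller ignores; that calling convention is an assumption for this sketch, and `split_field` is a hypothetical stand-in, not the plugin's method.

```ruby
# Hypothetical stand-in for a filter method: new events go to the caller via the block.
def split_field(event, field)
  splits = event[field]
  # Guard clause: a bare return is enough when there is nothing to split.
  # Returning the event itself adds nothing if the caller ignores the return value.
  return unless splits.is_a?(Array) && splits.length > 1

  splits.each do |value|
    yield event.merge(field => value)
  end
  nil
end

produced = []
split_field({ "Records" => [1] }, "Records") { |e| produced << e }
after_single = produced.dup # still empty: one element means no split

split_field({ "Records" => [1, 2] }, "Records") { |e| produced << e }
```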

lib/logstash/filters/split.rb (outdated diff)
event_split = event.clone
@logger.debug("Split event", :value => value, :field => @field)
event_split[@field] = value
event_split = nil

jordansissel Nov 22, 2013

Contributor

curious, why set this to nil when you set it below?

20goto10 Nov 22, 2013

Actually, it was a Ruby mistake on my part... I was thinking event_split needed to be declared outside the block.
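
The Ruby scoping rule behind this can be shown in a few lines. This is a standalone sketch, not code from the pull request: a variable first assigned inside a block is local to that block, so it does not need to be declared as nil beforehand unless its value has to be read after the block ends.

```ruby
events = []
[1, 2, 3].each do |value|
  # event_split is first assigned here, so it is local to the block.
  event_split = { "Records" => value }
  events << event_split
end

# Outside the block the name is not defined at all.
visible_outside = defined?(event_split) ? true : false
```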

Contributor

jordansissel commented Nov 22, 2013

Thank you for updating this for the split filter. The code looks good to me with exception to the minor comments I made (which are super minor, just looking for clarification)

Get back to me and we can merge this! :)

20goto10 commented Nov 22, 2013

Should be good to go now.

20goto10 commented Dec 10, 2013

Hi Jordan, I was curious if you've had a chance to look into merging this...

Contributor

electrical commented Dec 13, 2013

@20goto10 we will be having a Merge Party again soon and will take a look at this PR then.
Thanks for making logstash more awesome! :-)

gposton commented Jan 22, 2014

+1 on merging this. I'd like to use it to parse cloudtrail logs.

wiibaa referenced this pull request Apr 30, 2014: add json array process #1185 (Closed)

jordansissel added the O(1) label Aug 19, 2014

elasticsearch-release commented Aug 26, 2014

Can one of the admins verify this patch?

suyograo closed this Feb 3, 2015

Member

suyograo commented Feb 3, 2015

@20goto10 thanks for your contribution. it has been folded into logstash-plugins/logstash-filter-split#1

20goto10 commented Feb 4, 2015

Thanks Suyog & all the other people who worked on this. I'm glad to see this change made it into Logstash.
