Split filter: ability to split arrays within a JSON structure #787
An update to the split filter that lets the user divide an array within the JSON hash. The contents are split into multiple entries and yielded back for further processing. (Note: Logstash does not yet re-filter the new items, so they cannot be filtered a second time, but the results of the split do get pushed into the output stack.)
This is useful when the log data contains multiple event entries as an array.
The "reuse_element" config option determines whether the data is kept within the hash under its original key name (e.g. "Records") or is converted into a base JSON structure. If "reuse_element" is true, the array element within the hash is overwritten with the value of one of its elements. E.g. "Records" => [1,2,3,4] will lead to 4 different events, the first with "Records" => 1, the second with "Records" => 2, and so on. If false, each individual array element is returned in the results as a single hash with no key (this was my purpose in making the change, as my events are all together in one big array within the JSON).
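The two modes can be sketched in plain Ruby. This is an illustration only, not the code in this pull request: `split_array_field` is a hypothetical helper, events are modeled as plain hashes rather than Logstash event objects, and the "source" key and sample values are invented.

```ruby
# Hypothetical helper illustrating the two "reuse_element" modes.
# Events are modeled as plain Ruby hashes, not Logstash event objects.
def split_array_field(event, field, reuse_element)
  values = event[field]
  # Leave the event untouched when the field is not an array.
  return [event] unless values.is_a?(Array)

  values.map do |value|
    if reuse_element
      # Keep the rest of the event and overwrite the array with one element.
      event.merge(field => value)
    else
      # Promote the element itself to be the whole event
      # (assumption: each element is already a hash).
      value
    end
  end
end

# reuse_element = true: four events, each keeping the "Records" key.
event = { "source" => "app.log", "Records" => [1, 2, 3, 4] }
kept = split_array_field(event, "Records", true)

# reuse_element = false: each array element becomes its own event.
nested = { "Records" => [{ "id" => "a" }, { "id" => "b" }] }
promoted = split_array_field(nested, "Records", false)
```

With "reuse_element" true, fields outside the array are copied onto every resulting event; with false they are dropped, which suits input where the array is the only useful content.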
Sample config section: in this scenario, log data was returned as a single row of JSON data under the hash key "message". "message" contained only the "Records" field, pointing to more JSON presented as an array of hashes. This configuration will allow the user to separate those records back into individual events. Since Logstash does not currently re-filter "created events" correctly (new events spawned by a filter, e.g. split), the configuration's "date" filter section is never reached. I will be addressing this in a separate Pull Request.
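To make the scenario concrete, here is a minimal Ruby sketch of the data shape described above. It is not the plugin code and not the sample config: `records_to_events` is a hypothetical helper, and the "time" and "name" fields and their values are invented for illustration.

```ruby
require "json"

# Hypothetical helper: parse the JSON held under "message" and return
# each entry of its "Records" array as a separate event (a plain hash).
def records_to_events(raw_event)
  parsed = JSON.parse(raw_event["message"])
  records = parsed["Records"]
  # If there is no array to split, pass the original event through.
  return [raw_event] unless records.is_a?(Array)

  records
end

# Invented sample input: one row of JSON whose only field is "Records".
raw = {
  "message" => '{"Records":[{"time":"2015-01-29T10:00:00Z","name":"first"},' \
               '{"time":"2015-01-29T10:05:00Z","name":"second"}]}'
}

events = records_to_events(raw)
events.each { |e| puts e["name"] }
```

Each resulting event carries its own "time" value, which is why a later "date" filter would need to run on the split events for their timestamps to be picked up.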
This is a great idea. :)
That said, we feel that this is a better fit for the split filter. We would be willing to help refactor the split filter to include this functionality. Please let us know how we can help you! :)
Thanks for making Logstash awesome! :)
To me it seems logical to handle multi-event arrays within JSON when the data is designed that way and perfectly well-formed. The internal JSON parser was seeing the array data correctly before this plugin, but lumped the results together into one giant event entry instead of splitting it (which makes sense since it was never explicitly told to do otherwise). Feeding those arrays into the split filter as text just makes things more complicated... Keeping the array split feature within the JSON plugin makes things vastly simpler for users. I say this after days of wrestling with the configuration.
Updated this to use the split plugin instead of the json plugin. It seems to work fine, although this will change the behavior of split when it encounters data that is already an array, which is probably fine.
I added the "reuse_element" flag. If true, the array element within the hash is overwritten with the value of one of its elements. E.g. "Records" => [1,2,3,4] will lead to 4 different events, each one with "Records" => 1 (etc.)... If false, each individual array element would be returned into the results as a single hash with no key (this was my purpose in creating the plugin).
Events created from the split are not being re-filtered, which is a problem when, for example, the timestamps need to be taken from the array elements. I am working on another modification to Logstash that will allow events to be re-filtered after a split. This will be a separate pull request.
referenced this pull request
Jan 29, 2015
Thanks Suyog & all the other people who worked on this. I'm glad to see
On Mon, Feb 2, 2015 at 9:22 PM, Suyog Rao email@example.com wrote: