Add Elasticsearch output plugin (#374)#923
Conversation
max-vogler
left a comment
There was a problem hiding this comment.
Thank you for this amazing pull request! I'm pre-approving, because there are only minor code health changes to be made and I am out of office next week. One of my teammates can merge next week or I can merge the week after when I'm back.
|
Ack on all accounts - debug print is an obvious oopsie. I’ll try to figure
out a... less bad way to decompose the query data. I contemplated making a
custom object just to represent this relationship, but that felt like it
would be taking away from the readability of the test.
I see various python2 interop stuff throughout the code base, so I’ll use
`format` in case there’s someone still relying on that.
…On Fri, Apr 9, 2021 at 4:47 AM Max Vogler ***@***.***> wrote:
***@***.**** approved this pull request.
Thank you for this amazing pull request! I'm pre-approving, because there
are only minor code health changes to be made and I am out of office next
week. One of my teammates can merge next week or I can merge the week after
when I'm back.
------------------------------
In
grr/server/grr_response_server/output_plugins/elasticsearch_plugin_test.py
<#923 (comment)>:
> + patcher = mock.patch.object(requests, 'post')
+
+ with patcher as patched:
+ plugin.ProcessResponses(plugin_state, messages)
+ plugin.Flush(plugin_state)
+ plugin.UpdateState(plugin_state)
+
+ return patched
+
+ def _ParseEvents(self, patched):
+ request = patched.call_args[KWARGS]['data']
+
+ # Elasticsearch bulk requests are line-deliminated pairs, where the first
+ # line is the index command and the second is the actual document to index
+ split_request = request.split('\n')
+ print(split_request)
Please remove the print() debugging statement.
------------------------------
In
grr/server/grr_response_server/output_plugins/elasticsearch_plugin_test.py
<#923 (comment)>:
> + plugin.ProcessResponses(plugin_state, messages)
+ plugin.Flush(plugin_state)
+ plugin.UpdateState(plugin_state)
+
+ return patched
+
+ def _ParseEvents(self, patched):
+ request = patched.call_args[KWARGS]['data']
+
+ # Elasticsearch bulk requests are line-deliminated pairs, where the first
+ # line is the index command and the second is the actual document to index
+ split_request = request.split('\n')
+ print(split_request)
+ update_pairs = [(split_request[i], split_request[i + 1])
+ for i in range(0, len(split_request), 2)]
+ parsed_pairs = [(json.Parse(i[0]), json.Parse(i[1])) for i in update_pairs]
Instead of deconstructing the tuple, parsing the elements, and
reconstructing a tuple, could you call json.Parse on the whole list in
split_requests? This might simplify the code a little, e.g.:
split_request = [json.Parse(line) for line in request.split("\n")]
------------------------------
In grr/server/grr_response_server/output_plugins/elasticsearch_plugin.py
<#923 (comment)>:
> + def _SendEvents(self, events: List[JsonDict]) -> None:
+ """Uses the Elasticsearch bulk API to index all the events in a single
+ request.
+ https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docs-bulk.html
+ """
+ if self._token:
+ headers = {"Authorization": "Basic {}".format(self._token)}
+ else:
+ headers = {}
+
+ index_command = json.Dump({"index": {"_index": self._index}}, indent=None)
+
+ # Each index operation is two lines, the first defining the index settings,
+ # the second is the actual document to be indexed
+ data = "\n".join(
+ index_command + "\n" + json.Dump(event, indent=None)
Please use string formatting (f-strings or format()) instead of manual
string concatenation, e.g.:
data = "\n".join(
f"{index_command}\n{json.Dump(event, indent=None)}"
for event in events)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#923 (review)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABTCAGLTDMXI32KB3P2S36DTH3SM5ANCNFSM42UGAN2Q>
.
|
panhania
left a comment
There was a problem hiding this comment.
Thanks for the pull request! I left some comments in addition to the @max-vogler's review.
| if patcher is None: | ||
| patcher = mock.patch.object(requests, 'post') |
There was a problem hiding this comment.
You can look into the responses package (used for example here).
There was a problem hiding this comment.
Probably going to leave this alone for now, unless you think it should be blocking.
|
Pushed in the updates discussed above - mostly removing python2 compatibility stuff, adding docs for the new As noted above, I didn't port the tests to use |
|
I suggest merging without the |
|
Sounds reasonable to me. |
|
Thanks a lot for your contribution @micrictor, well done! If you like, send me a PR that adds your name and email to ACKNOWLEDGEMENTS. |
Creates an Elasticsearch output plugin using the
_bulkAPI.In the config file, the GRR Admin sets:
When executing the flow, the operator can specify:
Note that, as a collateral change, I had to add an optional
indentparameter to thejson.Dumpmethod to avoid messing up the line-delimited nature of the Elasticsearch API.This resolves #374