
Adding results.JSONReader #435

Closed
wants to merge 6 commits into from

Conversation

@yaleman commented Feb 12, 2022

results.ResultsReader is slow because it iterates byte-by-byte through the stream, massaging the XML into a form the chosen parser will accept. I've added JSONResultsReader to provide a much more performant option.

Benefits:

  • Less data transferred over the wire.
  • ~99% faster on most workloads I've tested.
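
The speed-up comes from the JSON export stream being newline-delimited: each line is a complete JSON object, so the reader can hand whole lines to `json.loads` instead of walking the stream byte-by-byte. A minimal stdlib-only sketch of that approach, using invented sample data rather than a real Splunk response:

```python
import io
import json

# Hypothetical sample of a newline-delimited JSON export stream.
sample = io.BytesIO(
    b'{"preview": true, "result": {"host": "web-1", "count": "3"}}\n'
    b'{"preview": false, "result": {"host": "web-2", "count": "7"}}\n'
)

results = []
for line in sample:
    event = json.loads(line)      # one parse call per line, not per byte
    if not event.get("preview"):  # keep final (non-preview) results only
        results.append(event["result"])

print(results)  # [{'host': 'web-2', 'count': '7'}]
```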

Other changes in this commit

  • Removed unnecessary imports.
  • Fixed some places where indenting was 2 spaces, not 4.
  • Renamed some input variables that aren't referenced by name, to match Python naming conventions.

Test Data

Running the tests from https://github.com/yaleman/splunk-sdk-games/

(you can just clone the repo, configure it and run ./run_tests.sh)

Generating the local files, so that we're not testing the response of my Splunk instance:

➜ ./run_tests.sh
Generating local JSON data

real	0m25.155s
user	0m0.532s
sys	0m0.661s

Generating local XML data

real	0m29.173s
user	0m0.991s
sys	0m1.462s

File sizes

-rw-r--r--  1 yaleman  staff   161M 12 Feb 13:17 outputfile.json
-rw-r--r--  1 yaleman  staff   374M 12 Feb 13:17 outputfile.xml

Using results.ResultsReader and the job in XML output format:

Running test_file_resultsreader.py
message:  "INFO: Your timerange was substituted based on your search string"
Results: 113625
Preview results: 9963

real	2m2.416s
user	2m0.997s
sys	0m0.405s

Using results.JSONResultsReader and the job in JSON output format:

Running test_file_jsonreader.py
Results: 113624
Preview results: 12500

real	0m1.023s
user	0m0.856s
sys	0m0.090s

The 1 "missing" result is the Message that the JSON export endpoint doesn't return.

@ashah-splunk ashah-splunk changed the base branch from master to develop February 15, 2022 16:13
@AndyXan commented Feb 18, 2022

Hmm. Just for clarification:

If I'm just issuing regular (non-export) queries via the Splunk SDK and getting a job back:

job = self.service.jobs.create(search_query, **search_params)
current_results = job.results(output_mode='json', count=50000, offset=0)
reader = results.JSONResultsReader(current_results)

The API returns a dict containing messages = [...], results = [...], is_preview, and so on.

Your implementation of

    def _parse_results(self, stream: BufferedReader):
        """Parse results and messages out of *stream*."""
        for line in stream.readlines():

            event = json_loads(line)
            if "preview" in event:
                self.is_preview = event["preview"]
            if "msg" in event:
                msg_type = event.get("type", "Unknown Message Type")
                text = event.get("text")
                yield Message(msg_type, text)
            yield event

would not return single events. The "single" event would consist of arrays (results) containing the 50,000 events. This is rather different from

current_results = job.results(output_mode='xml', count=50000, offset=0)
reader = results.ResultsReader(io.BufferedReader(ResponseReaderWrapper(current_results)))

for result in reader:
    if isinstance(result, dict):
        log.info("Result: %s" % result)
    elif isinstance(result, results.Message):
        log.info("Message: %s" % result)
log.info("is_preview = %s " % reader.is_preview)

which would print 50,000 log lines. The solution above would log once: one giant dict whose results key contains the 50,000 events.

@AndyXan commented Feb 18, 2022

This is more of what I would expect:

    def _parse_results(self, stream: BufferedReader):
        """Parse results and messages out of *stream*."""
        for line in stream.readlines():

            bulk = json_loads(line)
            if "preview" in bulk:
                self.is_preview = bulk["preview"]

            for message in bulk.get('messages', []):
                msg_type = message.get("type", "Unknown Message Type")
                text = message.get("text")
                yield Message(msg_type, text)

            for event in bulk.get('results', []):
                yield event
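
A standalone, runnable version of the parser above (the stand-in Message type and sample data are mine, for illustration) shows it yielding individual messages and results instead of one bulk dict:

```python
import io
import json
from collections import namedtuple

# Stand-in for splunklib's Message type.
Message = namedtuple("Message", ["type", "message"])

def parse_results(stream):
    """Yield Message objects and individual result dicts from *stream*."""
    for line in stream:
        bulk = json.loads(line)
        for message in bulk.get("messages", []):
            yield Message(message.get("type", "Unknown Message Type"),
                          message.get("text"))
        for event in bulk.get("results", []):
            yield event

# Invented sample mimicking the non-export JSON response shape.
stream = io.BytesIO(
    b'{"preview": false, "messages": [{"type": "INFO", "text": "ok"}], '
    b'"results": [{"a": "1"}, {"a": "2"}]}\n'
)

items = list(parse_results(stream))
for item in items:
    print(item)  # one message, then each result on its own
```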

@yaleman (Author) commented Feb 19, 2022

I think 78079ae updates the code to something that should work.

I've updated my test code over in https://github.com/yaleman/splunk-sdk-games so that seems to work too.

Running test_file_jsonreader.py
RESULT_COUNT=113624
MESSAGE_COUNT=0
PREVIEW_COUNT=12500

real	0m1.034s
user	0m0.873s
sys	0m0.089s

Running test_file_jsonreader_create.py
RESULT_COUNT=113624
MESSAGE_COUNT=12
PREVIEW_COUNT=0

real	0m1.035s
user	0m0.837s
sys	0m0.140s

Running test_file_resultsreader.py
RESULT_COUNT=113624
MESSAGE_COUNT=1
PREVIEW_COUNT=25000

real	2m17.480s
user	2m16.443s
sys	0m0.706s

If someone can provide a way of testing it in docker or something that'd be cool but I think it works.

I can't run the built-in tests because Python 3.7 on an M1 MacBook doesn't have _ctypes, and I really don't want to spend more hours building VMs to test a library I don't even use.

@ashah-splunk ashah-splunk changed the base branch from develop to json-resultreader March 4, 2022 09:05
@ashah-splunk (Contributor)

@yaleman thanks for the PR. We are considering the suggested JSONResultsReader approach and exploring it further.

@ashah-splunk (Contributor)

@yaleman we have incorporated your JSONResultsReader changes, with some modifications, and they are now available in the latest Python SDK release, 1.6.19.

@yaleman yaleman deleted the jsonreader branch April 10, 2022 05:27