Python ResultsReader class is incredibly slow

_It's possible that I don't fully understand what the `ResultsReader` class is doing, so take this issue with a grain of salt._

When using the `ResultsReader` class to get results from a Splunk search, as shown here: https://github.com/splunk/splunk-sdk-python/blob/master/splunklib/results.py#L173-L181, it takes an incredibly long time to retrieve all the results. Calling the job's `results` method with `output_mode=json` and parsing the response myself is orders of magnitude faster.

For example, on a search with 175k results, it takes 4+ minutes to get the results with `ResultsReader` objects, and 3.7 seconds with the `results` function. The following snippet shows what I'm talking about:

    import splunklib.results as results
    import splunklib.client as client
    from datetime import datetime
    import json
    
    
    splunk_object = client.connect(
        host="host",
        port="port",
        username="username",
        password="password",
        app="app",
        verify=True,
        autologin=True)
    
    spl = '| makeresults count=175000'
    
    splunk_search_kwargs = {"exec_mode": "blocking",
                            "earliest_time": "-48h",
                            "latest_time": "now",
                            "enable_lookups": "true"}
    
    splunk_search_job = splunk_object.jobs.create(spl, **splunk_search_kwargs)
    
    
    start_time_json = datetime.now()
    # Get the results from the Splunk search
    search_results_json = []
    # log_general.debug("Getting Splunk search results.")
    get_offset = 0
    max_get = 49000
    result_count = int(splunk_search_job['resultCount'])
    while (get_offset < result_count):
        r = splunk_search_job.results(**{"count": max_get, "offset": get_offset, "output_mode": "json"})
        obj = json.loads(r.read())
        search_results_json.extend(obj['results'])
        get_offset += max_get
    # log_general.debug("Found %d results" % len(search_results))
    
    end_time_json = datetime.now()
    
    
    start_time = datetime.now()
    # Get the results from the Splunk search
    search_results = []
    # log_general.debug("Getting Splunk search results.")
    get_offset = 0
    max_get = 49000
    result_count = int(splunk_search_job['resultCount'])
    while (get_offset < result_count):
        rr = results.ResultsReader(splunk_search_job.results(**{"count": max_get, "offset": get_offset}))
        for result in rr:
            if isinstance(result, results.Message):
                # Diagnostic messages may be returned in the results
                print('%s: %s' % (result.type, result.message))
            elif isinstance(result, dict):
                # Normal events are returned as dicts
                search_results.append(result)
        get_offset += max_get
    # log_general.debug("Found %d results" % len(search_results))
    
    end_time = datetime.now()
    
    # total_seconds() keeps the fractional part; .seconds would truncate it
    print("ResultsReader time: %s" % (end_time - start_time).total_seconds())
    print("json_results time: %s" % (end_time_json - start_time_json).total_seconds())
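
A sub-second comparison is easier to trust if both code paths go through the same wall-clock helper. The sketch below is only an illustration: `timed` is a made-up name, not part of splunklib, and it assumes each fetch loop has been wrapped in a function of its own.

```python
from timeit import default_timer


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for this comparison; elapsed_seconds is a float.
    """
    start = default_timer()
    result = fn(*args, **kwargs)
    return result, default_timer() - start
```

Each loop from the snippet could then be moved into a function and measured with something like `rows, elapsed = timed(fetch_rows, splunk_search_job)`, where `fetch_rows` stands for whichever loop is being timed.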

Is `ResultsReader` doing anything special that I would miss out on by just getting the results in JSON mode directly? I know that `ResultsReader` uses XML under the hood, but that doesn't really matter to me; at the end of the day, I just need the results in a Python object.
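
For reference, the JSON path I'm comparing against can be written as a small generator. This is a sketch under assumptions, not SDK code: `iter_json_results` and `page_size` are names I made up, and it assumes only what the snippet above already relies on, namely that `job["resultCount"]` holds the total and that `job.results(...)` returns a file-like object whose `read()` yields a JSON document with a `results` list.

```python
import json


def iter_json_results(job, page_size=49000):
    """Yield result dicts from a finished search job, one page at a time.

    Hypothetical helper, not part of splunklib. Assumes job["resultCount"]
    is the total row count and job.results(**kwargs) returns a file-like
    object whose read() gives a JSON document with a "results" list.
    """
    total = int(job["resultCount"])
    offset = 0
    while offset < total:
        response = job.results(count=page_size, offset=offset, output_mode="json")
        page = json.loads(response.read()).get("results", [])
        if not page:
            # Stop instead of looping forever if a page comes back empty.
            break
        for row in page:
            yield row
        # Advance by what was actually returned, in case the server caps a page.
        offset += len(page)
```

With that in place, `search_results_json = list(iter_json_results(splunk_search_job))` replaces the first loop. Unlike the `ResultsReader` loop, this sketch does not surface the diagnostic `Message` objects, so that is one thing I already know I'd be giving up.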

Python ResultsReader class is incredibly slow #223
