Support bytearray for final processor output #344

blthayer · 2021-08-06T17:38:21Z

Use case: For performance in concatenating bytes, one might want their final processor to output a bytearray instead of bytes. This blog post illustrates that using a bytearray is the most performant way to concatenate bytes in Python (with all the caveats of sharing results from a blog post that haven't been verified locally, and the experiment can't exactly be considered comprehensive or rigorous).

Use of bytearray output is currently unsupported, as can be seen in _base.py:

...
        if isinstance(event_dict, (str, bytes)):
            return (event_dict,), {}
        elif isinstance(event_dict, tuple):
            # In this case we assume that the last processor returned a tuple
            # of ``(args, kwargs)`` and pass it right through.
            return event_dict  # type: ignore
        elif isinstance(event_dict, dict):
            return (), event_dict
        else:
            raise ValueError(
                "Last processor didn't return an appropriate value.  Valid "
                "return values are a dict, a tuple of (args, kwargs), bytes, "
                "or a str."
            )
...

I'll be honest in that I'm not an expert in byte handling in Python, but it seems like bytearray quacks like bytes according to the docs:

In a very crude and naive local test, I modified this line in structlog from:

if isinstance(event_dict, (str, bytes)):

to:

if isinstance(event_dict, (str, bytes, bytearray)):

and my stdout bytes logging still worked (configured identically to this example in the Performance section of the structlog docs).

The text was updated successfully, but these errors were encountered:

hynek · 2021-08-08T14:18:13Z

Yes, the change should be as simple as you say. May I ask how you'd intend to use it? I've added bytes specifically to support orjson's performance gains. Is there something out there spewing out bytearrays? 🤔

blthayer · 2021-08-09T19:52:26Z

Thanks for the reply, and I'm happy to share my use case, although it may be a little half-baked at this point.

In short, I created a custom structlog processor (specifically, a renderer to use as the last processor) in order to log the HTTP request data for a Flask app. The data comes in as bytes (flask.request.data), so, for performance reasons, my custom processor avoids decoding flask.request.data to string. Since orjson won't serialize a dictionary with bytes as a value, the processor can't simply add the request data to the EventDict.

Here's what the processor looks like, with the hard-coded assumption that flask.request.data is valid JSON:

import flask
import orjson
from structlog.types import EventDict

def _render_request(_, __, event_dict: EventDict) -> bytes:
    """Structlog processor for injecting the Flask request.

    The key will be "request_data"
    """
    bytes_ = orjson.dumps(event_dict)
    # Create bytes that includes the request data.
    return (
        # Everything but the last character (assumed to be b"}")
        bytes_[0:-1]
        # Call the field "request_data"
        + b',"request_data":'
        # Add in the data.
        + flask.request.data
        # Close it out.
        + b"}"
    )

Simple enough, and nice and hard-coded 😉.

I was initially thinking (but have since proved myself wrong) that using a bytearray might prevent creating a copy of the request data since bytearrays are mutable, thus save memory compared to the concatenation using the + operator above. Like I said, I was mistaken, and one does not actually save memory by re-implementing _render_request above using bytearray instead of plain old bytes. I suppose technically you could gain some performance in the actual concatenation, but I doubt it's any sort of meaningful gain.

hynek · 2021-08-28T13:56:19Z

OK, now I'm confused. 😅 Did you come to the conclusion that bytearrays are helpful or not? :D

hynek · 2021-08-29T08:01:07Z

Ah whatever, does #350 fix your needs?

blthayer · 2021-08-31T22:52:48Z

It does, thank you! 😄

hynek mentioned this issue Aug 29, 2021

Support bytearrays as last return value #350

Merged

hynek closed this as completed in #350 Sep 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support bytearray for final processor output #344

Support bytearray for final processor output #344

blthayer commented Aug 6, 2021 •

edited

Loading

hynek commented Aug 8, 2021

blthayer commented Aug 9, 2021 •

edited

Loading

hynek commented Aug 28, 2021

hynek commented Aug 29, 2021

blthayer commented Aug 31, 2021

Support bytearray for final processor output #344

Support bytearray for final processor output #344

Comments

blthayer commented Aug 6, 2021 • edited Loading

hynek commented Aug 8, 2021

blthayer commented Aug 9, 2021 • edited Loading

hynek commented Aug 28, 2021

hynek commented Aug 29, 2021

blthayer commented Aug 31, 2021

blthayer commented Aug 6, 2021 •

edited

Loading

blthayer commented Aug 9, 2021 •

edited

Loading