Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support bytearray for final processor output #344

Closed
blthayer opened this issue Aug 6, 2021 · 5 comments · Fixed by #350
Closed

Support bytearray for final processor output #344

blthayer opened this issue Aug 6, 2021 · 5 comments · Fixed by #350

Comments

@blthayer
Copy link

blthayer commented Aug 6, 2021

Use case: For performance in concatenating bytes, one might want their final processor to output a bytearray instead of bytes. This blog post illustrates that using a bytearray is the most performant way to concatenate bytes in Python (with all the caveats of sharing results from a blog post that haven't been verified locally, and the experiment can't exactly be considered comprehensive or rigorous).

Use of bytearray output is currently unsupported, as can be seen in _base.py:

...
        if isinstance(event_dict, (str, bytes)):
            return (event_dict,), {}
        elif isinstance(event_dict, tuple):
            # In this case we assume that the last processor returned a tuple
            # of ``(args, kwargs)`` and pass it right through.
            return event_dict  # type: ignore
        elif isinstance(event_dict, dict):
            return (), event_dict
        else:
            raise ValueError(
                "Last processor didn't return an appropriate value.  Valid "
                "return values are a dict, a tuple of (args, kwargs), bytes, "
                "or a str."
            )
...

I'll be honest in that I'm not an expert in byte handling in Python, but it seems like bytearray quacks like bytes according to the docs:

In a very crude and naive local test, I modified this line in structlog from:

if isinstance(event_dict, (str, bytes)):

to:

if isinstance(event_dict, (str, bytes, bytearray)):

and my stdout bytes logging still worked (configured identically to this example in the Performance section of the structlog docs).

@hynek
Copy link
Owner

hynek commented Aug 8, 2021

Yes, the change should be as simple as you say. May I ask how you'd intend to use it? I've added bytes specifically to support orjson's performance gains. Is there something out there spewing out bytearrays? 🤔

@blthayer
Copy link
Author

blthayer commented Aug 9, 2021

Thanks for the reply, and I'm happy to share my use case, although it may be a little half-baked at this point.

In short, I created a custom structlog processor (specifically, a renderer to use as the last processor) in order to log the HTTP request data for a Flask app. The data comes in as bytes (flask.request.data), so, for performance reasons, my custom processor avoids decoding flask.request.data to string. Since orjson won't serialize a dictionary with bytes as a value, the processor can't simply add the request data to the EventDict.

Here's what the processor looks like, with the hard-coded assumption that flask.request.data is valid JSON:

import flask
import orjson
from structlog.types import EventDict

def _render_request(_, __, event_dict: EventDict) -> bytes:
    """Structlog processor for injecting the Flask request.

    The key will be "request_data"
    """
    bytes_ = orjson.dumps(event_dict)
    # Create bytes that includes the request data.
    return (
        # Everything but the last character (assumed to be b"}")
        bytes_[0:-1]
        # Call the field "request_data"
        + b',"request_data":'
        # Add in the data.
        + flask.request.data
        # Close it out.
        + b"}"
    )

Simple enough, and nice and hard-coded 😉.

I was initially thinking (but have since proved myself wrong) that using a bytearray might prevent creating a copy of the request data since bytearrays are mutable, thus save memory compared to the concatenation using the + operator above. Like I said, I was mistaken, and one does not actually save memory by re-implementing _render_request above using bytearray instead of plain old bytes. I suppose technically you could gain some performance in the actual concatenation, but I doubt it's any sort of meaningful gain.

@hynek
Copy link
Owner

hynek commented Aug 28, 2021

OK, now I'm confused. 😅 Did you come to the conclusion that bytearrays are helpful or not? :D

@hynek
Copy link
Owner

hynek commented Aug 29, 2021

Ah whatever, does #350 fix your needs?

@blthayer
Copy link
Author

It does, thank you! 😄

@hynek hynek closed this as completed in #350 Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants