Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Top-level byte array isn't destructured, but other byte sequences are #1954

Open
cmeeren opened this issue Sep 13, 2023 · 9 comments
Open

Top-level byte array isn't destructured, but other byte sequences are #1954

cmeeren opened this issue Sep 13, 2023 · 9 comments

Comments

@cmeeren
Copy link

cmeeren commented Sep 13, 2023

Repro:

open System.Collections.Generic
open Serilog
open Serilog.Core

type TestDestructuringPolicy() =
    interface IDestructuringPolicy with
        member this.TryDestructure(_value, propertyValueFactory, result) =
            result <- propertyValueFactory.CreatePropertyValue("TEST")
            true

Log.Logger <-
    LoggerConfiguration()
        .Destructure.With(TestDestructuringPolicy())
        .WriteTo.Console()
        .CreateLogger()

Log.Information("Byte list: {@Data}", List([ 1uy ]))
Log.Information("Byte array: {@Data}", [| 1uy |])

Log.CloseAndFlush()

Expected output:

[10:49:50 INF] Byte list: TEST
[10:49:50 INF] Byte array: TEST

Actual output:

[10:49:50 INF] Byte list: TEST
[10:49:50 INF] Byte array: 01

Top-level byte arrays aren't passed to my destructuring policy. This seems inconsistent. Could this inconsistency be fixed, so that byte arrays are passed to the destructuring policy?

@cmeeren cmeeren added the bug label Sep 13, 2023
@nblumhardt
Copy link
Member

I think your second statement:

Log.Information("Byte array: {@Data}", [| 1uy |])

isn't calling the overload of Information() that you're expecting - its first arg is a params array.

Let me know if this helps :-)

@cmeeren
Copy link
Author

cmeeren commented Sep 18, 2023

No, that's not it. Here is a modification to the repro that uses multiple values, thereby avoiding that ambiguity. The output is the same as the original example.

Log.Information("Byte {Kind}: {@Data}", "list", List([ 1uy ]))
Log.Information("Byte {Kind}: {@Data}", "array", [| 1uy |])

@nblumhardt
Copy link
Member

🤔 hmmmmmm... I think then it will be that byte arrays are processed as scalars (converted to hex strings):

https://github.com/serilog/serilog/blob/dev/src/Serilog/Capturing/PropertyValueConverter.cs#L68

but other enumerable types (FSharp.Core.List?) don't get that special treatment and pass through:

https://github.com/serilog/serilog/blob/dev/src/Serilog/Capturing/PropertyValueConverter.cs#L174

I don't think it's likely to be a bug per se, just the outcome of different types being processed differently - not sure whether/where we'd make any changes to enable a different approach, but if you're able to share some more info about your scenario/preferences it'd be good data to have. Thanks!

@cmeeren
Copy link
Author

cmeeren commented Sep 18, 2023

FSharp.Core.List?

The repro uses System.Collections.Generic.List (typically aliased as ResizeArray in F#), though that's irrelevant. The only requirement is that it is not an array.

if you're able to share some more info about your scenario/preferences it'd be good data to have.

Certainly!

Destructuring per se doesn't help me, so I don't use that. In fact, it would actively hurt me, at least in the first case below.

  • A lot of my logging goes to Application Insights, and I really don't want to duplicate big objects between log messages and properties (which significantly increases ingestion costs). In fact, I have a custom wrapper sink for Application Insights that (while preserving the message template itself) replaces the tokens in the message with the rendered values, thereby only sending the final rendered message and the message template to AI, and not any of the message tokens as separate properties.
  • A lot of my other logging is done on edge/IoT devices where I log to syslog/logcat but (for data usage reasons) do not transfer the logs out of the device at all (except when there's a problem and I sign into the device to check the logs). Destructuring is pointless here; only the rendered message is shown.

In both of these cases, the only thing that matter to me is the log message. I therefore simply want the rendered log message to be as nicely formatted as possible.

To achieve this, I am serializing most values using custom settings, so they are easy to read. (I used JSON first, but recently switched to YAML because it's even easier to read.)

I use this not only for complex objects, but also single-case discriminated unions ("wrapped primitives"), which are unwrapped using my serialization settings. In general, no matter what I serialize using my "log JSON setting", the output is either better or the same, never worse.

Of course, relative to other performance costs of logging, serialization is a heavy hitter. I used to serialize synchronously and insert the string directly into the log message (Log.Information("Foo {Bar}", logSerialize myObject)), but this is wasteful if the log level is not enabled (and in many cases, I logged complex objects at a normally disabled log level). I found out that I could make use of a custom destructuring policy and instead use the destructure operator @, which meant that the objects would not be serialized if the log level was not enabled. Also, it's less verbose. So (Log.Information("Foo {@Bar}", myObject) would be equivalent to the previous statement, but only serialize if the log level was enabled.

I have not found any other way to accomplish 1) nice custom-formatted complex objects in log messages, and 2) not serializing for disabled log levels.

In terms of this issue, the point is that this destructuring-based approach works wonderfully for everything except byte arrays. Specifically, I'd like to represent them with spaces between the bytes, like FA 2E C9, because that makes it easier for me to know "what is byte 5", and easier to paste into other tools we use that accept this format. For byte lists and other byte sequences, it works fine, but the different handling of byte arrays means it didn't work there. This was surprising to me, because there is not just one accepted way to render byte arrays.

@nblumhardt
Copy link
Member

Thanks for taking the time to write that! 👍

I've never been entirely comfortable with our opinionated byte array formatting - the current system came about though because treating byte arrays as lists of bytes was ...... problematic to say the least 😅 ... so something (useful) needed to be done.

Unfortunately, byte arrays are something of an anomaly; most/all "scalar" types Serilog cares about are structs/value objects, which have copy semantics and so can be passed as-is through to the asynchronous side of the logging pipeline, or in the case of string, are immutable. Because of this there isn't a great extension point for dealing with byte arrays.

In the past we've considered making IScalarConversionPolicy a public/pluggable extension point, along the same lines as IDestructuringPolicy, which would potentially sort out your scenario, but it's a nontrival bit of infrastructure to design and maintain without a broadly-relevant set of use cases. This could be something to explore more deeply but probably requires quite a lot of effort.

Depending on how much control you have on the application side, constructing a proxy type such as:

record struct ByteArrayProxy(byte[] Bytes);

Log.Information("Some {Data}", new ByteArrayProxy(myBytes));

and then targeting it with an IDestructuringPolicy would be a possible workaround. (Forgive the C# syntax :-))

Hope this helps.

@cmeeren
Copy link
Author

cmeeren commented Sep 18, 2023

Thanks for the clarification!

I am indeed wrapping the byte array as a workaround (in the Some case of F#'s option type, which using my serialization settings is unwrapped).

In any case, what I really want is to be able to control how Serilog formats values of arbitrary types. The fact that I'm using destructuring is just because that's the only way I found. If it would be possible to add a way to let users specify formatting of arbitrary types (e.g., pass obj and let the user decide how to transform/stringify it), then that would solve my problem in a less hacky way, would allow me to use destructuring for actual destructuring (if I ever need that), and would also mean I don't have to remember to use @ everywhere (easy to forget, leading to bad output, and I don't see that before I actually see an example of the relevant log message, which is in some ways the worst kind of error).

@nblumhardt nblumhardt added discussion and removed bug labels Sep 18, 2023
@nblumhardt
Copy link
Member

The reason to use destructuring or enrichment is that Serilog does capturing on the foreground thread (at the point the event is logged), but formatting on a (potentially) background thread. By the time the data reaches the formatting stage it's already been snapshotted; this is how mutable things like arrays end up logged with the right contents (and race conditions/threading bugs are avoided). HTH!

@cmeeren
Copy link
Author

cmeeren commented Sep 19, 2023

Thanks, though I'm not sure what you intend for me to do with that information?

@nblumhardt
Copy link
Member

Just filling in the picture, unfortunately it doesn't get us closer to a solution, just narrows down the possible places to investigate :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants