Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support indented JSON formatting in C# #9391

Merged

Conversation

boukeversteegh
Copy link
Contributor

@boukeversteegh boukeversteegh commented Jan 8, 2022

This adds indented formatting (pretty printing) support for c#.

Notes

  • Added a method JsonFormatter.Settings.WithIndentation(string indentation = " ").
  • Added overloads to JsonFormatter.Format that specify the initial indentation level, for embedding the json inside a nested json string.
  • 100% compatible with previous implementation. No difference in output unless WithIndentation is called.
  • Supports WellKnownTypes.

Usage

var formatter = new JsonFormatter(JsonFormatter.Settings.Default.WithIndentation())
var json = formatter.Format(new SomeTestMessage() {
  Field1 = "value",
  Field2 = "value2",
});

Output:

{
  "field1": "value",
  "field2": "value2"
}

Feedback welcome!

@boukeversteegh
Copy link
Contributor Author

boukeversteegh commented Jan 11, 2022

Tests failed on Windows build server, due to string constants containing New Lines, which are checked out as CRLF when using git autocrlf=true (the default on windows afaik).

Pushed a fix. The workflow may be run again, if it's ok @elharo.

@boukeversteegh
Copy link
Contributor Author

I've run a benchmark to measure performance differences compared to the previous implementation:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=6.0.101
  [Host]     : .NET Core 5.0.9 (CoreCLR 5.0.921.35908, CoreFX 5.0.921.35908), X64 RyuJIT
  DefaultJob : .NET Core 5.0.9 (CoreCLR 5.0.921.35908, CoreFX 5.0.921.35908), X64 RyuJIT
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
JsonFormatter (previous) 11.32 us
+0%
0.078 us 0.073 us 0.5341 - - 8.85 KB
JsonFormatter (new) 11.92 us
+5.3%
0.083 us 0.078 us 0.5341 - - 8.85 KB
JsonFormatter (new) WithIndentation 11.57 us
+2.2%
0.065 us 0.058 us 0.5493 - - 9.17 KB
Benchmark code

I copied the previous implementation into the project and renamed the class JsonFormatterBefore

JsonFormatterBefore
#region Copyright notice and license
// Protocol Buffers - Google's data interchange format
// Copyright 2015 Google Inc.  All rights reserved.
// https://developers.google.com/protocol-buffers/
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//     * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
//     * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#endregion

using System;
using System.Collections;
using System.Globalization;
using System.Text;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Reflection;
using System.Diagnostics.CodeAnalysis;

namespace Google.Protobuf
{
    /// <summary>
    /// Reflection-based converter from messages to JSON.
    /// </summary>
    /// <remarks>
    /// <para>
    /// Instances of this class are thread-safe, with no mutable state.
    /// </para>
    /// <para>
    /// This is a simple start to get JSON formatting working. As it's reflection-based,
    /// it's not as quick as baking calls into generated messages - but is a simpler implementation.
    /// (This code is generally not heavily optimized.)
    /// </para>
    /// </remarks>
    public sealed class JsonFormatterBefore
    {
        internal const string AnyTypeUrlField = "@type";
        internal const string AnyDiagnosticValueField = "@value";
        internal const string AnyWellKnownTypeValueField = "value";
        private const string TypeUrlPrefix = "type.googleapis.com";
        private const string NameValueSeparator = ": ";
        private const string PropertySeparator = ", ";

        /// <summary>
        /// Returns a formatter using the default settings.
        /// </summary>
        public static JsonFormatterBefore Default { get; } = new JsonFormatterBefore(Settings.Default);

        // A JSON formatter which *only* exists
        private static readonly JsonFormatterBefore diagnosticFormatter = new JsonFormatterBefore(Settings.Default);

        /// <summary>
        /// The JSON representation of the first 160 characters of Unicode.
        /// Empty strings are replaced by the static constructor.
        /// </summary>
        private static readonly string[] CommonRepresentations = {
            // C0 (ASCII and derivatives) control characters
            "\\u0000", "\\u0001", "\\u0002", "\\u0003",  // 0x00
          "\\u0004", "\\u0005", "\\u0006", "\\u0007",
          "\\b",     "\\t",     "\\n",     "\\u000b",
          "\\f",     "\\r",     "\\u000e", "\\u000f",
          "\\u0010", "\\u0011", "\\u0012", "\\u0013",  // 0x10
          "\\u0014", "\\u0015", "\\u0016", "\\u0017",
          "\\u0018", "\\u0019", "\\u001a", "\\u001b",
          "\\u001c", "\\u001d", "\\u001e", "\\u001f",
            // Escaping of " and \ are required by www.json.org string definition.
            // Escaping of < and > are required for HTML security.
            "", "", "\\\"", "", "",        "", "",        "",  // 0x20
          "", "", "",     "", "",        "", "",        "",
          "", "", "",     "", "",        "", "",        "",  // 0x30
          "", "", "",     "", "\\u003c", "", "\\u003e", "",
          "", "", "",     "", "",        "", "",        "",  // 0x40
          "", "", "",     "", "",        "", "",        "",
          "", "", "",     "", "",        "", "",        "",  // 0x50
          "", "", "",     "", "\\\\",    "", "",        "",
          "", "", "",     "", "",        "", "",        "",  // 0x60
          "", "", "",     "", "",        "", "",        "",
          "", "", "",     "", "",        "", "",        "",  // 0x70
          "", "", "",     "", "",        "", "",        "\\u007f",
            // C1 (ISO 8859 and Unicode) extended control characters
            "\\u0080", "\\u0081", "\\u0082", "\\u0083",  // 0x80
          "\\u0084", "\\u0085", "\\u0086", "\\u0087",
          "\\u0088", "\\u0089", "\\u008a", "\\u008b",
          "\\u008c", "\\u008d", "\\u008e", "\\u008f",
          "\\u0090", "\\u0091", "\\u0092", "\\u0093",  // 0x90
          "\\u0094", "\\u0095", "\\u0096", "\\u0097",
          "\\u0098", "\\u0099", "\\u009a", "\\u009b",
          "\\u009c", "\\u009d", "\\u009e", "\\u009f"
        };

        static JsonFormatterBefore()
        {
            for (int i = 0; i < CommonRepresentations.Length; i++)
            {
                if (CommonRepresentations[i] == "")
                {
                    CommonRepresentations[i] = ((char) i).ToString();
                }
            }
        }

        private readonly Settings settings;

        private bool DiagnosticOnly => ReferenceEquals(this, diagnosticFormatter);

        /// <summary>
        /// Creates a new formatted with the given settings.
        /// </summary>
        /// <param name="settings">The settings.</param>
        public JsonFormatterBefore(Settings settings)
        {
            this.settings = ProtoPreconditions.CheckNotNull(settings, nameof(settings));
        }

        /// <summary>
        /// Formats the specified message as JSON.
        /// </summary>
        /// <param name="message">The message to format.</param>
        /// <returns>The formatted message.</returns>
        public string Format(IMessage message)
        {
            var writer = new StringWriter();
            Format(message, writer);
            return writer.ToString();
        }

        /// <summary>
        /// Formats the specified message as JSON.
        /// </summary>
        /// <param name="message">The message to format.</param>
        /// <param name="writer">The TextWriter to write the formatted message to.</param>
        /// <returns>The formatted message.</returns>
        public void Format(IMessage message, TextWriter writer)
        {
            ProtoPreconditions.CheckNotNull(message, nameof(message));
            ProtoPreconditions.CheckNotNull(writer, nameof(writer));

            if (message.Descriptor.IsWellKnownType)
            {
                WriteWellKnownTypeValue(writer, message.Descriptor, message);
            }
            else
            {
                WriteMessage(writer, message);
            }
        }

        /// <summary>
        /// Converts a message to JSON for diagnostic purposes with no extra context.
        /// </summary>
        /// <remarks>
        /// <para>
        /// This differs from calling <see cref="Format(IMessage)"/> on the default JSON
        /// formatter in its handling of <see cref="Any"/>. As no type registry is available
        /// in <see cref="object.ToString"/> calls, the normal way of resolving the type of
        /// an <c>Any</c> message cannot be applied. Instead, a JSON property named <c>@value</c>
        /// is included with the base64 data from the <see cref="Any.Value"/> property of the message.
        /// </para>
        /// <para>The value returned by this method is only designed to be used for diagnostic
        /// purposes. It may not be parsable by <see cref="JsonParser"/>, and may not be parsable
        /// by other Protocol Buffer implementations.</para>
        /// </remarks>
        /// <param name="message">The message to format for diagnostic purposes.</param>
        /// <returns>The diagnostic-only JSON representation of the message</returns>
        public static string ToDiagnosticString(IMessage message)
        {
            ProtoPreconditions.CheckNotNull(message, nameof(message));
            return diagnosticFormatter.Format(message);
        }

        private void WriteMessage(TextWriter writer, IMessage message)
        {
            if (message == null)
            {
                WriteNull(writer);
                return;
            }
            if (DiagnosticOnly)
            {
                ICustomDiagnosticMessage customDiagnosticMessage = message as ICustomDiagnosticMessage;
                if (customDiagnosticMessage != null)
                {
                    writer.Write(customDiagnosticMessage.ToDiagnosticString());
                    return;
                }
            }
            writer.Write("{ ");
            bool writtenFields = WriteMessageFields(writer, message, false);
            writer.Write(writtenFields ? " }" : "}");
        }

        private bool WriteMessageFields(TextWriter writer, IMessage message, bool assumeFirstFieldWritten)
        {
            var fields = message.Descriptor.Fields;
            bool first = !assumeFirstFieldWritten;
            // First non-oneof fields
            foreach (var field in fields.InFieldNumberOrder())
            {
                var accessor = field.Accessor;
                var value = accessor.GetValue(message);
                if (!ShouldFormatFieldValue(message, field, value))
                {
                    continue;
                }

                if (!first)
                {
                    writer.Write(PropertySeparator);
                }

                WriteString(writer, accessor.Descriptor.JsonName);
                writer.Write(NameValueSeparator);
                WriteValue(writer, value);

                first = false;
            }
            return !first;
        }

        /// <summary>
        /// Determines whether or not a field value should be serialized according to the field,
        /// its value in the message, and the settings of this formatter.
        /// </summary>
        private bool ShouldFormatFieldValue(IMessage message, FieldDescriptor field, object value) =>
            field.HasPresence
            // Fields that support presence *just* use that
            ? field.Accessor.HasValue(message)
            // Otherwise, format if either we've been asked to format default values, or if it's
            // not a default value anyway.
            : settings.FormatDefaultValues || !IsDefaultValue(field, value);

        // Converted from java/core/src/main/java/com/google/protobuf/Descriptors.java
        internal static string ToJsonName(string name)
        {
            StringBuilder result = new StringBuilder(name.Length);
            bool isNextUpperCase = false;
            foreach (char ch in name)
            {
                if (ch == '_')
                {
                    isNextUpperCase = true;
                }
                else if (isNextUpperCase)
                {
                    result.Append(char.ToUpperInvariant(ch));
                    isNextUpperCase = false;
                }
                else
                {
                    result.Append(ch);
                }
            }
            return result.ToString();
        }

        internal static string FromJsonName(string name)
        {
            StringBuilder result = new StringBuilder(name.Length);
            foreach (char ch in name)
            {
                if (char.IsUpper(ch))
                {
                    result.Append('_');
                    result.Append(char.ToLowerInvariant(ch));
                }
                else
                {
                    result.Append(ch);
                }
            }
            return result.ToString();
        }

        private static void WriteNull(TextWriter writer)
        {
            writer.Write("null");
        }

        private static bool IsDefaultValue(FieldDescriptor descriptor, object value)
        {
            if (descriptor.IsMap)
            {
                IDictionary dictionary = (IDictionary) value;
                return dictionary.Count == 0;
            }
            if (descriptor.IsRepeated)
            {
                IList list = (IList) value;
                return list.Count == 0;
            }
            switch (descriptor.FieldType)
            {
                case FieldType.Bool:
                    return (bool) value == false;
                case FieldType.Bytes:
                    return (ByteString) value == ByteString.Empty;
                case FieldType.String:
                    return (string) value == "";
                case FieldType.Double:
                    return (double) value == 0.0;
                case FieldType.SInt32:
                case FieldType.Int32:
                case FieldType.SFixed32:
                case FieldType.Enum:
                    return (int) value == 0;
                case FieldType.Fixed32:
                case FieldType.UInt32:
                    return (uint) value == 0;
                case FieldType.Fixed64:
                case FieldType.UInt64:
                    return (ulong) value == 0;
                case FieldType.SFixed64:
                case FieldType.Int64:
                case FieldType.SInt64:
                    return (long) value == 0;
                case FieldType.Float:
                    return (float) value == 0f;
                case FieldType.Message:
                case FieldType.Group: // Never expect to get this, but...
                    return value == null;
                default:
                    throw new ArgumentException("Invalid field type");
            }
        }

        /// <summary>
        /// Writes a single value to the given writer as JSON. Only types understood by
        /// Protocol Buffers can be written in this way. This method is only exposed for
        /// advanced use cases; most users should be using <see cref="Format(IMessage)"/>
        /// or <see cref="Format(IMessage, TextWriter)"/>.
        /// </summary>
        /// <param name="writer">The writer to write the value to. Must not be null.</param>
        /// <param name="value">The value to write. May be null.</param>
        public void WriteValue(TextWriter writer, object value)
        {
            if (value == null || value is NullValue)
            {
                WriteNull(writer);
            }
            else if (value is bool)
            {
                writer.Write((bool)value ? "true" : "false");
            }
            else if (value is ByteString)
            {
                // Nothing in Base64 needs escaping
                writer.Write('"');
                writer.Write(((ByteString)value).ToBase64());
                writer.Write('"');
            }
            else if (value is string)
            {
                WriteString(writer, (string)value);
            }
            else if (value is IDictionary)
            {
                WriteDictionary(writer, (IDictionary)value);
            }
            else if (value is IList)
            {
                WriteList(writer, (IList)value);
            }
            else if (value is int || value is uint)
            {
                IFormattable formattable = (IFormattable) value;
                writer.Write(formattable.ToString("d", CultureInfo.InvariantCulture));
            }
            else if (value is long || value is ulong)
            {
                writer.Write('"');
                IFormattable formattable = (IFormattable) value;
                writer.Write(formattable.ToString("d", CultureInfo.InvariantCulture));
                writer.Write('"');
            }
            else if (value is System.Enum)
            {
                if (settings.FormatEnumsAsIntegers)
                {
                    WriteValue(writer, (int)value);
                }
                else
                {
                    string name = OriginalEnumValueHelper.GetOriginalName(value);
                    if (name != null)
                    {
                        WriteString(writer, name);
                    }
                    else
                    {
                        WriteValue(writer, (int)value);
                    }
                }
            }
            else if (value is float || value is double)
            {
                string text = ((IFormattable) value).ToString("r", CultureInfo.InvariantCulture);
                if (text == "NaN" || text == "Infinity" || text == "-Infinity")
                {
                    writer.Write('"');
                    writer.Write(text);
                    writer.Write('"');
                }
                else
                {
                    writer.Write(text);
                }
            }
            else if (value is IMessage)
            {
                Format((IMessage)value, writer);
            }
            else
            {
                throw new ArgumentException("Unable to format value of type " + value.GetType());
            }
        }

        /// <summary>
        /// Central interception point for well-known type formatting. Any well-known types which
        /// don't need special handling can fall back to WriteMessage. We avoid assuming that the
        /// values are using the embedded well-known types, in order to allow for dynamic messages
        /// in the future.
        /// </summary>
        private void WriteWellKnownTypeValue(TextWriter writer, MessageDescriptor descriptor, object value)
        {
            // Currently, we can never actually get here, because null values are always handled by the caller. But if we *could*,
            // this would do the right thing.
            if (value == null)
            {
                WriteNull(writer);
                return;
            }
            // For wrapper types, the value will either be the (possibly boxed) "native" value,
            // or the message itself if we're formatting it at the top level (e.g. just calling ToString on the object itself).
            // If it's the message form, we can extract the value first, which *will* be the (possibly boxed) native value,
            // and then proceed, writing it as if we were definitely in a field. (We never need to wrap it in an extra string...
            // WriteValue will do the right thing.)
            if (descriptor.IsWrapperType)
            {
                if (value is IMessage)
                {
                    var message = (IMessage) value;
                    value = message.Descriptor.Fields[WrappersReflection.WrapperValueFieldNumber].Accessor.GetValue(message);
                }
                WriteValue(writer, value);
                return;
            }
            if (descriptor.FullName == Timestamp.Descriptor.FullName)
            {
                WriteTimestamp(writer, (IMessage)value);
                return;
            }
            if (descriptor.FullName == Duration.Descriptor.FullName)
            {
                WriteDuration(writer, (IMessage)value);
                return;
            }
            if (descriptor.FullName == FieldMask.Descriptor.FullName)
            {
                WriteFieldMask(writer, (IMessage)value);
                return;
            }
            if (descriptor.FullName == Struct.Descriptor.FullName)
            {
                WriteStruct(writer, (IMessage)value);
                return;
            }
            if (descriptor.FullName == ListValue.Descriptor.FullName)
            {
                var fieldAccessor = descriptor.Fields[ListValue.ValuesFieldNumber].Accessor;
                WriteList(writer, (IList)fieldAccessor.GetValue((IMessage)value));
                return;
            }
            if (descriptor.FullName == Value.Descriptor.FullName)
            {
                WriteStructFieldValue(writer, (IMessage)value);
                return;
            }
            if (descriptor.FullName == Any.Descriptor.FullName)
            {
                WriteAny(writer, (IMessage)value);
                return;
            }
            WriteMessage(writer, (IMessage)value);
        }

        private void WriteTimestamp(TextWriter writer, IMessage value)
        {
            // TODO: In the common case where this *is* using the built-in Timestamp type, we could
            // avoid all the reflection at this point, by casting to Timestamp. In the interests of
            // avoiding subtle bugs, don't do that until we've implemented DynamicMessage so that we can prove
            // it still works in that case.
            int nanos = (int) value.Descriptor.Fields[Timestamp.NanosFieldNumber].Accessor.GetValue(value);
            long seconds = (long) value.Descriptor.Fields[Timestamp.SecondsFieldNumber].Accessor.GetValue(value);
            writer.Write(Timestamp.ToJson(seconds, nanos, DiagnosticOnly));
        }

        private void WriteDuration(TextWriter writer, IMessage value)
        {
            // TODO: Same as for WriteTimestamp
            int nanos = (int) value.Descriptor.Fields[Duration.NanosFieldNumber].Accessor.GetValue(value);
            long seconds = (long) value.Descriptor.Fields[Duration.SecondsFieldNumber].Accessor.GetValue(value);
            writer.Write(Duration.ToJson(seconds, nanos, DiagnosticOnly));
        }

        private void WriteFieldMask(TextWriter writer, IMessage value)
        {
            var paths = (IList<string>) value.Descriptor.Fields[FieldMask.PathsFieldNumber].Accessor.GetValue(value);
            writer.Write(FieldMask.ToJson(paths, DiagnosticOnly));
        }

        private void WriteAny(TextWriter writer, IMessage value)
        {
            if (DiagnosticOnly)
            {
                WriteDiagnosticOnlyAny(writer, value);
                return;
            }

            string typeUrl = (string) value.Descriptor.Fields[Any.TypeUrlFieldNumber].Accessor.GetValue(value);
            ByteString data = (ByteString) value.Descriptor.Fields[Any.ValueFieldNumber].Accessor.GetValue(value);
            string typeName = Any.GetTypeName(typeUrl);
            MessageDescriptor descriptor = settings.TypeRegistry.Find(typeName);
            if (descriptor == null)
            {
                throw new InvalidOperationException($"Type registry has no descriptor for type name '{typeName}'");
            }
            IMessage message = descriptor.Parser.ParseFrom(data);
            writer.Write("{ ");
            WriteString(writer, AnyTypeUrlField);
            writer.Write(NameValueSeparator);
            WriteString(writer, typeUrl);

            if (descriptor.IsWellKnownType)
            {
                writer.Write(PropertySeparator);
                WriteString(writer, AnyWellKnownTypeValueField);
                writer.Write(NameValueSeparator);
                WriteWellKnownTypeValue(writer, descriptor, message);
            }
            else
            {
                WriteMessageFields(writer, message, true);
            }
            writer.Write(" }");
        }

        private void WriteDiagnosticOnlyAny(TextWriter writer, IMessage value)
        {
            string typeUrl = (string) value.Descriptor.Fields[Any.TypeUrlFieldNumber].Accessor.GetValue(value);
            ByteString data = (ByteString) value.Descriptor.Fields[Any.ValueFieldNumber].Accessor.GetValue(value);
            writer.Write("{ ");
            WriteString(writer, AnyTypeUrlField);
            writer.Write(NameValueSeparator);
            WriteString(writer, typeUrl);
            writer.Write(PropertySeparator);
            WriteString(writer, AnyDiagnosticValueField);
            writer.Write(NameValueSeparator);
            writer.Write('"');
            writer.Write(data.ToBase64());
            writer.Write('"');
            writer.Write(" }");
        }

        private void WriteStruct(TextWriter writer, IMessage message)
        {
            writer.Write("{ ");
            IDictionary fields = (IDictionary) message.Descriptor.Fields[Struct.FieldsFieldNumber].Accessor.GetValue(message);
            bool first = true;
            foreach (DictionaryEntry entry in fields)
            {
                string key = (string) entry.Key;
                IMessage value = (IMessage) entry.Value;
                if (string.IsNullOrEmpty(key) || value == null)
                {
                    throw new InvalidOperationException("Struct fields cannot have an empty key or a null value.");
                }

                if (!first)
                {
                    writer.Write(PropertySeparator);
                }
                WriteString(writer, key);
                writer.Write(NameValueSeparator);
                WriteStructFieldValue(writer, value);
                first = false;
            }
            writer.Write(first ? "}" : " }");
        }

        private void WriteStructFieldValue(TextWriter writer, IMessage message)
        {
            var specifiedField = message.Descriptor.Oneofs[0].Accessor.GetCaseFieldDescriptor(message);
            if (specifiedField == null)
            {
                throw new InvalidOperationException("Value message must contain a value for the oneof.");
            }

            object value = specifiedField.Accessor.GetValue(message);

            switch (specifiedField.FieldNumber)
            {
                case Value.BoolValueFieldNumber:
                case Value.StringValueFieldNumber:
                case Value.NumberValueFieldNumber:
                    WriteValue(writer, value);
                    return;
                case Value.StructValueFieldNumber:
                case Value.ListValueFieldNumber:
                    // Structs and ListValues are nested messages, and already well-known types.
                    var nestedMessage = (IMessage) specifiedField.Accessor.GetValue(message);
                    WriteWellKnownTypeValue(writer, nestedMessage.Descriptor, nestedMessage);
                    return;
                case Value.NullValueFieldNumber:
                    WriteNull(writer);
                    return;
                default:
                    throw new InvalidOperationException("Unexpected case in struct field: " + specifiedField.FieldNumber);
            }
        }

        internal void WriteList(TextWriter writer, IList list)
        {
            writer.Write("[ ");
            bool first = true;
            foreach (var value in list)
            {
                if (!first)
                {
                    writer.Write(PropertySeparator);
                }
                WriteValue(writer, value);
                first = false;
            }
            writer.Write(first ? "]" : " ]");
        }

        internal void WriteDictionary(TextWriter writer, IDictionary dictionary)
        {
            writer.Write("{ ");
            bool first = true;
            // This will box each pair. Could use IDictionaryEnumerator, but that's ugly in terms of disposal.
            foreach (DictionaryEntry pair in dictionary)
            {
                if (!first)
                {
                    writer.Write(PropertySeparator);
                }
                string keyText;
                if (pair.Key is string)
                {
                    keyText = (string) pair.Key;
                }
                else if (pair.Key is bool)
                {
                    keyText = (bool) pair.Key ? "true" : "false";
                }
                else if (pair.Key is int || pair.Key is uint | pair.Key is long || pair.Key is ulong)
                {
                    keyText = ((IFormattable) pair.Key).ToString("d", CultureInfo.InvariantCulture);
                }
                else
                {
                    if (pair.Key == null)
                    {
                        throw new ArgumentException("Dictionary has entry with null key");
                    }
                    throw new ArgumentException("Unhandled dictionary key type: " + pair.Key.GetType());
                }
                WriteString(writer, keyText);
                writer.Write(NameValueSeparator);
                WriteValue(writer, pair.Value);
                first = false;
            }
            writer.Write(first ? "}" : " }");
        }

        /// <summary>
        /// Writes a string (including leading and trailing double quotes) to a builder, escaping as required.
        /// </summary>
        /// <remarks>
        /// Other than surrogate pair handling, this code is mostly taken from src/google/protobuf/util/internal/json_escaping.cc.
        /// </remarks>
        internal static void WriteString(TextWriter writer, string text)
        {
            writer.Write('"');
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (c < 0xa0)
                {
                    writer.Write(CommonRepresentations[c]);
                    continue;
                }
                if (char.IsHighSurrogate(c))
                {
                    // Encountered first part of a surrogate pair.
                    // Check that we have the whole pair, and encode both parts as hex.
                    i++;
                    if (i == text.Length || !char.IsLowSurrogate(text[i]))
                    {
                        throw new ArgumentException("String contains low surrogate not followed by high surrogate");
                    }
                    HexEncodeUtf16CodeUnit(writer, c);
                    HexEncodeUtf16CodeUnit(writer, text[i]);
                    continue;
                }
                else if (char.IsLowSurrogate(c))
                {
                    throw new ArgumentException("String contains high surrogate not preceded by low surrogate");
                }
                switch ((uint) c)
                {
                    // These are not required by json spec
                    // but used to prevent security bugs in javascript.
                    case 0xfeff:  // Zero width no-break space
                    case 0xfff9:  // Interlinear annotation anchor
                    case 0xfffa:  // Interlinear annotation separator
                    case 0xfffb:  // Interlinear annotation terminator

                    case 0x00ad:  // Soft-hyphen
                    case 0x06dd:  // Arabic end of ayah
                    case 0x070f:  // Syriac abbreviation mark
                    case 0x17b4:  // Khmer vowel inherent Aq
                    case 0x17b5:  // Khmer vowel inherent Aa
                        HexEncodeUtf16CodeUnit(writer, c);
                        break;

                    default:
                        if ((c >= 0x0600 && c <= 0x0603) ||  // Arabic signs
                            (c >= 0x200b && c <= 0x200f) ||  // Zero width etc.
                            (c >= 0x2028 && c <= 0x202e) ||  // Separators etc.
                            (c >= 0x2060 && c <= 0x2064) ||  // Invisible etc.
                            (c >= 0x206a && c <= 0x206f))
                        {
                            HexEncodeUtf16CodeUnit(writer, c);
                        }
                        else
                        {
                            // No handling of surrogates here - that's done earlier
                            writer.Write(c);
                        }
                        break;
                }
            }
            writer.Write('"');
        }

        private const string Hex = "0123456789abcdef";
        private static void HexEncodeUtf16CodeUnit(TextWriter writer, char c)
        {
            writer.Write("\\u");
            writer.Write(Hex[(c >> 12) & 0xf]);
            writer.Write(Hex[(c >> 8) & 0xf]);
            writer.Write(Hex[(c >> 4) & 0xf]);
            writer.Write(Hex[(c >> 0) & 0xf]);
        }

        /// <summary>
        /// Settings controlling JSON formatting.
        /// </summary>
        public sealed class Settings
        {
            /// <summary>
            /// Default settings, as used by <see cref="JsonFormatterBefore.Default"/>
            /// </summary>
            public static Settings Default { get; }

            // Workaround for the Mono compiler complaining about XML comments not being on
            // valid language elements.
            static Settings()
            {
                Default = new Settings(false);
            }

            /// <summary>
            /// Whether fields which would otherwise not be included in the formatted data
            /// should be formatted even when the value is not present, or has the default value.
            /// This option only affects fields which don't support "presence" (e.g.
            /// singular non-optional proto3 primitive fields).
            /// </summary>
            public bool FormatDefaultValues { get; }

            /// <summary>
            /// The type registry used to format <see cref="Any"/> messages.
            /// </summary>
            public TypeRegistry TypeRegistry { get; }

            /// <summary>
            /// Whether to format enums as ints. Defaults to false.
            /// </summary>
            public bool FormatEnumsAsIntegers { get; }


            /// <summary>
            /// Creates a new <see cref="Settings"/> object with the specified formatting of default values
            /// and an empty type registry.
            /// </summary>
            /// <param name="formatDefaultValues"><c>true</c> if default values (0, empty strings etc) should be formatted; <c>false</c> otherwise.</param>
            public Settings(bool formatDefaultValues) : this(formatDefaultValues, TypeRegistry.Empty)
            {
            }

            /// <summary>
            /// Creates a new <see cref="Settings"/> object with the specified formatting of default values
            /// and type registry.
            /// </summary>
            /// <param name="formatDefaultValues"><c>true</c> if default values (0, empty strings etc) should be formatted; <c>false</c> otherwise.</param>
            /// <param name="typeRegistry">The <see cref="TypeRegistry"/> to use when formatting <see cref="Any"/> messages.</param>
            public Settings(bool formatDefaultValues, TypeRegistry typeRegistry) : this(formatDefaultValues, typeRegistry, false)
            {
            }

            /// <summary>
            /// Creates a new <see cref="Settings"/> object with the specified parameters.
            /// </summary>
            /// <param name="formatDefaultValues"><c>true</c> if default values (0, empty strings etc) should be formatted; <c>false</c> otherwise.</param>
            /// <param name="typeRegistry">The <see cref="TypeRegistry"/> to use when formatting <see cref="Any"/> messages. TypeRegistry.Empty will be used if it is null.</param>
            /// <param name="formatEnumsAsIntegers"><c>true</c> to format the enums as integers; <c>false</c> to format enums as enum names.</param>
            private Settings(bool formatDefaultValues,
                            TypeRegistry typeRegistry,
                            bool formatEnumsAsIntegers)
            {
                FormatDefaultValues = formatDefaultValues;
                TypeRegistry = typeRegistry ?? TypeRegistry.Empty;
                FormatEnumsAsIntegers = formatEnumsAsIntegers;
            }

            /// <summary>
            /// Creates a new <see cref="Settings"/> object with the specified formatting of default values and the current settings.
            /// </summary>
            /// <param name="formatDefaultValues"><c>true</c> if default values (0, empty strings etc) should be formatted; <c>false</c> otherwise.</param>
            public Settings WithFormatDefaultValues(bool formatDefaultValues) => new Settings(formatDefaultValues, TypeRegistry, FormatEnumsAsIntegers);

            /// <summary>
            /// Creates a new <see cref="Settings"/> object with the specified type registry and the current settings.
            /// </summary>
            /// <param name="typeRegistry">The <see cref="TypeRegistry"/> to use when formatting <see cref="Any"/> messages.</param>
            public Settings WithTypeRegistry(TypeRegistry typeRegistry) => new Settings(FormatDefaultValues, typeRegistry, FormatEnumsAsIntegers);

            /// <summary>
            /// Creates a new <see cref="Settings"/> object with the specified enums formatting option and the current settings.
            /// </summary>
            /// <param name="formatEnumsAsIntegers"><c>true</c> to format the enums as integers; <c>false</c> to format enums as enum names.</param>
            public Settings WithFormatEnumsAsIntegers(bool formatEnumsAsIntegers) => new Settings(FormatDefaultValues, TypeRegistry, formatEnumsAsIntegers);
        }

        // Effectively a cache of mapping from enum values to the original name as specified in the proto file,
        // fetched by reflection.
        // The need for this is unfortunate, as is its unbounded size, but realistically it shouldn't cause issues.
        private static class OriginalEnumValueHelper
        {
            // TODO: In the future we might want to use ConcurrentDictionary, at the point where all
            // the platforms we target have it.
            private static readonly Dictionary<System.Type, Dictionary<object, string>> dictionaries
                = new Dictionary<System.Type, Dictionary<object, string>>();

            [UnconditionalSuppressMessage("Trimming", "IL2072",
                Justification = "The field for the value must still be present. It will be returned by reflection, will be in this collection, and its name can be resolved.")]
            internal static string GetOriginalName(object value)
            {
                var enumType = value.GetType();
                Dictionary<object, string> nameMapping;
                lock (dictionaries)
                {
                    if (!dictionaries.TryGetValue(enumType, out nameMapping))
                    {
                        nameMapping = GetNameMapping(enumType);
                        dictionaries[enumType] = nameMapping;
                    }
                }

                string originalName;
                // If this returns false, originalName will be null, which is what we want.
                nameMapping.TryGetValue(value, out originalName);
                return originalName;
            }

            private static Dictionary<object, string> GetNameMapping(
                [DynamicallyAccessedMembers(
                    DynamicallyAccessedMemberTypes.PublicFields |
                    DynamicallyAccessedMemberTypes.NonPublicFields)]
                System.Type enumType)
            {
                return enumType.GetTypeInfo().DeclaredFields
                    .Where(f => f.IsStatic)
                    .Where(f => f.GetCustomAttributes<OriginalNameAttribute>()
                                 .FirstOrDefault()?.PreferredAlias ?? true)
                    .ToDictionary(f => f.GetValue(null),
                                  f => f.GetCustomAttributes<OriginalNameAttribute>()
                                        .FirstOrDefault()
                                        // If the attribute hasn't been applied, fall back to the name of the field.
                                        ?.Name ?? f.Name);
            }
        }
    }
}
using BenchmarkDotNet.Attributes;
using Google.Protobuf.TestProtos;

namespace Google.Protobuf.Benchmarks
{
    [MemoryDiagnoser]
    public class JsonFormatterBenchmarks
    {
        private static readonly JsonFormatter jsonFormatterPretty = new JsonFormatter(JsonFormatter.Settings.Default.WithIndentation());

        [Benchmark]
        public string JsonFormatterTest()
        {
            var value = GetMessage();
            return JsonFormatter.Default.Format(value);
        }

        [Benchmark]
        public string JsonFormatterPrettyTest()
        {
            var value = GetMessage();
            return jsonFormatterPretty.Format(value);
        }

        [Benchmark]
        public string JsonFormatterBeforeTest()
        {
            var value = GetMessage();
            return JsonFormatterBefore.Default.Format(value);
        }

        private static NestedTestAllTypes GetMessage()
        {
            return new NestedTestAllTypes
            {
                Payload = new TestAllTypes
                {
                    SingleBool = true,
                    SingleInt32 = 100,
                    SingleString = "multiple fields",
                    RepeatedString = { "string1", "string2" },
                },
                Child = new NestedTestAllTypes
                {
                    Payload = new TestAllTypes
                    {
                        SingleString = "single field",
                    },
                },
                RepeatedChild =
                {
                    new NestedTestAllTypes { Payload = new TestAllTypes { SingleString = "child 1", RepeatedString = { "string" } } },
                    new NestedTestAllTypes { Payload = new TestAllTypes { SingleString = "child 2" } },
                },
            };
        }
    }
}

@boukeversteegh
Copy link
Contributor Author

Hi @elharo, thank you for running the pipelines. Is there anything I should still do, or will a reviewer be assigned at some point from now?

@boukeversteegh
Copy link
Contributor Author

Hey @jskeet!

Would you be interested in reviewing or merging this PR?
It should be rather straightforward.

Cheers~

@jskeet
Copy link
Contributor

jskeet commented Jan 31, 2022

@boukeversteegh: This is something for the protobuf team to consider. In particular, I would look at whether it's supported by other languages (and particularly Java as the closest cousin).

@boukeversteegh
Copy link
Contributor Author

@boukeversteegh: This is something for the protobuf team to consider. In particular, I would look at whether it's supported by other languages (and particularly Java as the closest cousin).

Alright! Here are the languages that seem to support indentation at the moment:

  • Java
    /**
    * A TextGenerator adds indentation when writing formatted text.
    */
    private static final class PrettyTextGenerator implements TextGenerator {
  • C
    // Whether to add spaces, line breaks and indentation to make the JSON output
    // easy to read.
    bool add_whitespace;
  • C++
    // An ObjectWriter implementation that outputs JSON. This ObjectWriter
    // supports writing a compact form or a pretty printed form.
    //
    // Sample usage:
    // string output;
    // StringOutputStream* str_stream = new StringOutputStream(&output);
    // CodedOutputStream* out_stream = new CodedOutputStream(str_stream);
    // JsonObjectWriter* ow = new JsonObjectWriter(" ", out_stream);
    // ow->StartObject("")
    // ->RenderString("name", "value")
    // ->RenderString("emptystring", string())
    // ->StartObject("nested")
    // ->RenderInt64("light", 299792458);
    // ->RenderDouble("pi", 3.141592653589793);
    // ->EndObject()
    // ->StartList("empty")
    // ->EndList()
    // ->EndObject();
    //
    // And then the output string would become:
    // {
    // "name": "value",
    // "emptystring": "",
    // "nested": {
    // "light": "299792458",
    // "pi": 3.141592653589793
    // },
    // "empty": []
    // }
    //
    // JsonObjectWriter does not validate if calls actually result in valid JSON.
    // For example, passing an empty name when one would be required won't result
    // in an error, just an invalid output.
    //
    // Note that all int64 and uint64 are rendered as strings instead of numbers.
    // This is because JavaScript parses numbers as 64-bit float thus int64 and
    // uint64 would lose precision if rendered as numbers.
    //
    // JsonObjectWriter is thread-unsafe.
    class PROTOBUF_EXPORT JsonObjectWriter : public StructuredObjectWriter {
  • Python
    def MessageToJson(
    message,
    including_default_value_fields=False,
    preserving_proto_field_name=False,
    indent=2,
    sort_keys=False,
    use_integers_for_enums=False,
    descriptor_pool=None,
    float_precision=None):
    """Converts protobuf message to JSON format.

@jskeet
Copy link
Contributor

jskeet commented Feb 1, 2022

Okay, that's good to know. I'll have a look at the implementation when I can, but I can't make any promises. (This isn't a library I work on day-to-day.)

FormatInternal(message, writer, depth: 0);
}

private void FormatInternal(IMessage message, TextWriter writer, int depth)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To calculate indentation, Format needs to know the current depth.
I added a private overload to make sure the public API is not changed.

However, it would be useful to make it public, to allow embedding the json inside another json string (which is what I need personally). If the reviewer sees no objection I would prefer to add a public overload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be okay, yes - the docs for depth will just need to be really clear. I think it's also entirely reasonable for the docs for the existing method to say (probably in a remarks element) that it delegates to this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the overload public, with <remark> explaining the delegation.
Renamed depth to indentationLevel for clarity

@@ -356,7 +362,8 @@ private static bool IsDefaultValue(FieldDescriptor descriptor, object value)
/// </summary>
/// <param name="writer">The writer to write the value to. Must not be null.</param>
/// <param name="value">The value to write. May be null.</param>
public void WriteValue(TextWriter writer, object value)
/// <param name="depth">The current indentation depth. Optional when writing non-indented values.</param>
public void WriteValue(TextWriter writer, object value, int depth = 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time of writing, I had not realized this method was public, so the public API is changed unintentionally.
This could be breaking when WriteValue is called through reflection.

Let me know if this should be avoided, and a new private overload is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please introduce a new private overload - or even a public one (without the parameter being optional). We definitely, definitely don't want to break binary compatibility. (If it were only a problem calling via reflection, I wouldn't be too bothered - but code using the new version uses a library which in turn uses the old version, this could fail without any reflection being involved.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary compatibility is a great point that I hadn't even thought of. Fixed it!

@boukeversteegh
Copy link
Contributor Author

Hi @jskeet, thank you for remembering this!

  • Squashed and rebased everything (without squashing I had to merge conflicts at every commit)
  • Added the string Format(message, indentationLevel) overload
  • Added remarks, the string overload includes a code example. For the TextWriter overload, an example would be somewhat verbose.
  • Added a test case for it

@jskeet
Copy link
Contributor

jskeet commented Jul 22, 2022

Apologies for not responding earlier - I didn't get notified of your comment for some reason. I'll try to get this finalized ASAP.

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits - looks great otherwise. (Apologies if they're nits I should have caught before.) Given that they're all formatting etc, let me know if you'd prefer me to get this merged then create a new PR that just fixes the nits.

csharp/src/Google.Protobuf/JsonFormatter.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/JsonFormatter.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/JsonFormatter.cs Outdated Show resolved Hide resolved
@jskeet
Copy link
Contributor

jskeet commented Jul 22, 2022

Argh, I failed to notice before - as you've added a file (JsonFormatterSettingsTest.cs) you'll need to modify Makefile.am in the root directory to list that file, otherwise it'll break various CI tests (that's what's causing the Distcheck failures).

@boukeversteegh
Copy link
Contributor Author

Fixed the nits and the missing file. I don't mind to keep iterating until it's good, but if there's anything trivial left, and you find it easier to apply the change yourself, please don't refrain out of courtesy. I'd be glad if this branch can get cleared out one way or another :-)

@jskeet
Copy link
Contributor

jskeet commented Jul 25, 2022

Rerunning Kokoro now - fingers crossed. And thanks for the go-ahead to add fixes myself - will do that if there's anything more. I've worked out why I wasn't being notified, too - I'd accidentally muted the thread in gmail. Now unmuted, so you shouldn't get radio silence from me any more :)

@jskeet
Copy link
Contributor

jskeet commented Jul 25, 2022

Hmm... I suspect some of the failures may be due to the branch being out of date and hard to rebase.

My plan is to wait until tomorrow (as I've only got an hour now, and this might take longer than that), reset and recreate the change as a single commit, then create a new PR against my own repo just so that I don't cause any problems on your branch first. If that works, I'll push the "single commit rebased against main" commit on your branch (hopefully so it shows us both as authors) and we can both double check it in this PR. Once we're both happy and it's green, we can get it in.

Does that sound about right?

@boukeversteegh
Copy link
Contributor Author

Strange, I had already squashed and rebased everything from the previous review round, as it became hard to preserve every commit.

I'm fine with squashing everything once again and then rebasing, if you think it helps.

Tomorrow evening I'll have time, but no need to wait for me to do it. If you proceed though, please do make sure to preserve the commit author, as for me it's a great honor to contribute to a google repository and I would like the event recorded in its history.

@jskeet jskeet mentioned this pull request Jul 26, 2022
@jskeet
Copy link
Contributor

jskeet commented Jul 26, 2022

Strange, I had already squashed and rebased everything from the previous review round, as it became hard to preserve every commit.

Indeed - my local copy of your branch was out of date. I thought I'd fetched it, but a conflict kept it from being updated.

It looks like it's still 75 commits behind main - but it does rebase cleanly. Unfortunately I don't have permission to push to your branch, but I've created #10315 which is literally just "rebase main, git push" and which I'm using to run Kokoro test. I'll try to get that green (it may already go green from the start). We'll get there :)

@jskeet
Copy link
Contributor

jskeet commented Jul 26, 2022

Okay, the good news is that my other PR is green - so hopefully if you just fetch main, rebase to it, then force push again, then I should be able to rerun kokoro and get green lights. Sorry for all the administrivia involved, but we're getting there!

@boukeversteegh
Copy link
Contributor Author

Alright! Squashed and rebased again. Still as optimistic as you are!

@jskeet
Copy link
Contributor

jskeet commented Jul 26, 2022

Kokoro is green - hooray! @fowles please could you run the maintainer workflow and merge this if everything passes?

@fowles fowles merged commit b43e8cf into protocolbuffers:main Jul 26, 2022
@boukeversteegh
Copy link
Contributor Author

Yay! Thank you so much for your help getting this through!

@kwaclaw
Copy link

kwaclaw commented Oct 16, 2022

Any ideas when this will arrive in a published Nuget package? I am currently on 3.21.7 and it is not there.

@jskeet
Copy link
Contributor

jskeet commented Oct 16, 2022

@kwaclaw: I wouldn't expect it until 3.22.0. I don't have a timeframe for when that will come out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants