User controllable multiline string serialization for YAML#657

Merged
liuzicheng1987 merged 6 commits into getml:main from microdee:feature/multiline-strings
May 3, 2026

@microdee
Contributor

@microdee microdee commented Apr 27, 2026

This PR introduces:

  • an opt-in feature that serializes string values as | multiline literal blocks if they contain newline characters, and
  • handling of trailing newlines when reading from | multiline literal blocks.

Writing multiline blocks

This feature is disabled by default, both to preserve the previous behaviour and because it costs a call to std::basic_string::find on every string field.

To enable the feature, the user needs to call rfl::yaml::write with the respective flag:

  #include <string>
  #include <rfl/yaml.hpp>

  struct MultilineTestStruct {
    std::string normal_string;
    std::string multiline_string;
  };
  const auto test = MultilineTestStruct{.normal_string = "The normal string",
                                        .multiline_string =
R"(Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo)"
  };
  auto yaml = rfl::yaml::write(test, rfl::yaml::Writer::string_multiline_literal);

This produces

normal_string: The normal string
multiline_string: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
  quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo

in contrast to what it would produce without the flag:

normal_string: The normal string
multiline_string: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo"
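For readers less familiar with YAML block scalars, the | layout in the first output above can be sketched by hand. The helper `to_literal_block` below is hypothetical and written only for illustration; in the PR the layout is produced by yaml-cpp when the writer requests YAML::Literal, and a real emitter also has to deal with escaping, nesting and chomping.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: emits `key: |` followed by each
// line of `value`, indented by `indent` spaces. Assumes `value` has no
// trailing newline; keeping those would need a chomping indicator like `|+`.
std::string to_literal_block(const std::string& key, const std::string& value,
                             std::size_t indent = 2) {
  const std::string pad(indent, ' ');
  std::string out = key + ": |\n";
  std::size_t start = 0;
  while (true) {
    const std::size_t end = value.find('\n', start);
    // Append one line (without its '\n') after the indentation, then end it.
    out += pad + value.substr(start, end - start) + '\n';
    if (end == std::string::npos) break;
    start = end + 1;
  }
  return out;
}
```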

I've also added a string_all_literal flag, which forces every string value into a multiline literal block whether or not it contains newlines. This is probably less commonly useful, but it skips the std::basic_string::find call, which may pay off if most of the string fields are long. With the same example, the output looks like this:

normal_string: |
  The normal string
multiline_string: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
  quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
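The per-field decision the writer makes under the three modes (default, string_multiline_literal, string_all_literal) can be sketched as follows. `StringStyleFlag`, `ScalarStyle` and `choose_style` are hypothetical names for illustration, not the actual rfl::yaml::Writer API; the sketch only shows where the std::basic_string::find cost appears and how string_all_literal avoids it.

```cpp
#include <string>

// Hypothetical flags mirroring the names used in this PR; the real
// rfl::yaml::Writer enum may differ in shape and values.
enum class StringStyleFlag { none, string_multiline_literal, string_all_literal };

enum class ScalarStyle { plain_or_quoted, literal };

// Decide how a single string field should be emitted.
ScalarStyle choose_style(const std::string& value, StringStyleFlag flag) {
  switch (flag) {
    case StringStyleFlag::string_all_literal:
      // No scan at all: every string becomes a `|` block.
      return ScalarStyle::literal;
    case StringStyleFlag::string_multiline_literal:
      // One linear scan per string field (the std::basic_string::find cost).
      return value.find('\n') != std::string::npos
                 ? ScalarStyle::literal
                 : ScalarStyle::plain_or_quoted;
    case StringStyleFlag::none:
    default:
      // Previous behaviour: leave the style choice to the emitter.
      return ScalarStyle::plain_or_quoted;
  }
}
```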

Reading multiline blocks

While testing the latter case, I noticed that this style of multiline string in YAML leaves a potentially unintended newline at the end (the default "clip" chomping keeps the final line break of a | block), which breaks re-serialization consistency checks. For this reason, rfl::yaml::Reader::to_basic_type now trims trailing newlines from | literal block scalars. Scalars that are not | literal blocks are unaffected.
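A minimal sketch of the trimming step, assuming the parser has already returned the block's value with its clipped final newline (e.g. "line 1\nline 2\n"). `trim_trailing_newlines` is a hypothetical helper, not the code in rfl::yaml::Reader::to_basic_type.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drop every trailing '\n' from a `|` block's value.
// Newlines inside the value are left alone.
std::string trim_trailing_newlines(std::string value) {
  const std::size_t last = value.find_last_not_of('\n');
  // npos means the string is empty or consists only of newlines.
  value.erase(last == std::string::npos ? 0 : last + 1);
  return value;
}
```

One consequence of this design is that trailing newlines inside a | block can no longer be recovered on read; a quoted scalar remains the way to keep them.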

To do this, rfl::yaml::Reader (and, in turn, rfl::yaml::read) now needs access to the entire YAML input, so that the reader can use the offset data provided by YAML::Node::Mark(). yaml-cpp doesn't expose the scalar style directly on the node, so without the original input we couldn't tell whether it is safe to trim trailing newlines. This may be a breaking change for users who were working directly with YAML::Node inputs.

To illustrate:

foo: "String with important trailing new lines\n\n" # those two new-lines are preserved
bar: |
  Text with a bit of space afterwards
  (with a second line)

# the trailing new lines above are trimmed from the very end
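The selective behaviour in this illustration can be sketched as below. Both functions are hypothetical; `pos` stands in for the offset yaml-cpp reports via YAML::Node::Mark(), and, like the check quoted in the review thread, the sketch assumes that offset points at the | indicator. Anchors and tags are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: is the scalar starting at `pos` in the raw input a `|` block?
// The bounds check guards against an offset past the end of the input.
bool is_literal_block(const std::string& yaml, std::size_t pos) {
  return pos < yaml.size() && yaml[pos] == '|';
}

// Hypothetical: trim trailing newlines only for `|` literal blocks;
// quoted and plain scalars keep their parsed value untouched.
std::string read_string(const std::string& yaml, std::size_t pos,
                        std::string value) {
  if (is_literal_block(yaml, pos)) {
    const std::size_t last = value.find_last_not_of('\n');
    value.erase(last == std::string::npos ? 0 : last + 1);
  }
  return value;
}
```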

Misc

I'd welcome input from the library author and/or the community on the following topic:

  • Controlling these flags on a per-field or per-struct basis instead of for the entire serialized object.
    • I'm not yet familiar enough with the internals of this library to see where that would be feasible.

I've added a new test for this feature and run all YAML tests, plus all JSON tests for safety, although I didn't modify anything that should affect other serialization formats or reflection. All tests passed on my side.

Thanks for your consideration; I'll be happy to address your feedback.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for multiline YAML literals by adding a Flags enum to the Writer class and updating the serialization logic to optionally use YAML::Literal. It also includes a change to the Reader to trim trailing newlines from strings to mitigate issues with yaml-cpp's handling of multiline blocks. Feedback focuses on the potentially destructive and inconsistent nature of the unconditional string trimming in the Reader, as well as code duplication in the Writer's value insertion methods that should be refactored into a helper function.

Comment thread include/rfl/yaml/Reader.hpp Outdated
Comment thread include/rfl/yaml/Writer.hpp Outdated
Comment thread include/rfl/yaml/Reader.hpp Outdated
Comment on lines +88 to +89
// This is only done for literal blocks which doesn't have tags or anchors
if (_var.node_.Tag() == "!" && yaml_str[_var.node_.Mark().pos] == '|') {
Contributor Author

Does reflect-cpp aim to support YAML anchors and/or tags? If they're needed, I could handle them here with an extra private function; I just didn't want to bloat this PR further.

@microdee
Contributor Author

The check failures seem to be caused by vcpkg, or by CMake being unable to find Ninja. Could my edits in this PR have caused this? The only structural change to the project is the addition of a new test .cpp file.

@liuzicheng1987
Contributor

@microdee, sorry it took me a couple of days to get to this. Unfortunately, the GitHub Actions pipeline has been a bit unstable lately; it has nothing to do with your PR.

Your PR looks great, I will merge it. Thank you for your contribution.

@liuzicheng1987 liuzicheng1987 merged commit ee0dbc6 into getml:main May 3, 2026
168 of 182 checks passed

2 participants