[clang-format] Break long string literals in C#, etc.
Now strings that are too long for one line in C#, Java, JavaScript, and
Verilog get broken into several lines.  C# and JavaScript interpolated
strings are not broken.
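
The check that decides which literals are eligible can be sketched as below. This is a minimal, self-contained illustration of the prefix and suffix tests that createBreakableToken performs in this commit, not the actual clang-format code: the Lang enum, the helper names, and the bool-plus-out-parameter signature are assumptions made for the example. Anything that does not match one of the three quote styles, such as a C# `$"..."` interpolated string or a JavaScript template literal, is rejected and therefore left unbroken.

```cpp
#include <string>

// Hypothetical stand-ins for the languages and quote styles involved.
enum class Lang { CSharp, Java, JavaScript, Verilog };
enum class QuoteStyle { Double, Single, AtDouble };

static bool startsWith(const std::string &S, const std::string &P) {
  return S.size() >= P.size() && S.compare(0, P.size(), P) == 0;
}

static bool endsWith(const std::string &S, const std::string &P) {
  return S.size() >= P.size() &&
         S.compare(S.size() - P.size(), P.size(), P) == 0;
}

// Returns true and sets Out if Text is a literal that may be broken.
// Returns false for everything else, e.g. interpolated strings.
bool classifyLiteral(Lang L, const std::string &Text, QuoteStyle &Out) {
  if (L == Lang::JavaScript && startsWith(Text, "'") && endsWith(Text, "'")) {
    Out = QuoteStyle::Single;
    return true;
  }
  if (L == Lang::CSharp && startsWith(Text, "@\"") && endsWith(Text, "\"")) {
    Out = QuoteStyle::AtDouble;
    return true;
  }
  if (startsWith(Text, "\"") && endsWith(Text, "\"")) {
    Out = QuoteStyle::Double;
    return true;
  }
  return false;
}
```

The order of the tests matters: the language-specific styles are tried first, and the plain double-quote case is the fallback shared by all four languages.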

A new subclass, BreakableStringLiteralUsingOperators, handles the logic
for adding plus signs and commas.  The updateAfterBroken method was
added because parentheses or braces may now be required after the plus
signs or commas are added.  To decide whether the added plus sign
should be unindented in the BreakableToken object, the logic for that
is moved into a separate function, shouldUnindentNextOperator.
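
The way the break text is chosen can be sketched as follows. This is a simplified, self-contained model of the Prefix and Postfix selection in the new constructor plus the wrapping done by updateAfterBroken. It is not the clang-format implementation: the function names are hypothetical, parts are cut at a fixed width, indentation is left out, and the spacing options (SpacesInParensOptions, Cpp11BracedListStyle) are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the three quote styles named in the commit.
enum class QuoteStyle { Double, Single, AtDouble };

static std::string openQuote(QuoteStyle QS) {
  if (QS == QuoteStyle::Single)
    return "'";
  return QS == QuoteStyle::AtDouble ? "@\"" : "\"";
}

static std::string closeQuote(QuoteStyle QS) {
  return QS == QuoteStyle::Single ? "'" : "\"";
}

// Text inserted at a break: Postfix ends the current line, Prefix starts
// the next one.
struct BreakText {
  std::string Postfix;
  std::string Prefix;
};

BreakText chooseBreakText(QuoteStyle QS, bool IsVerilog, bool SignOnNewLine) {
  // Verilog parts are joined by commas; the comma stays before the newline.
  if (IsVerilog)
    return {"\",", "\""};
  // The plus sign goes on whichever side of the break the style asks for.
  if (SignOnNewLine)
    return {closeQuote(QS), "+ " + openQuote(QS)};
  return {closeQuote(QS) + " +", openQuote(QS)};
}

// Splits Content (the literal without its quotes) into parts of at most
// Width characters and returns one output line per part. If Wrap is set
// and a break happened, the result is wrapped in parentheses or braces.
std::vector<std::string> breakLiteral(const std::string &Content,
                                      QuoteStyle QS, bool IsVerilog,
                                      bool SignOnNewLine, bool Wrap,
                                      std::size_t Width) {
  const BreakText BT = chooseBreakText(QS, IsVerilog, SignOnNewLine);
  if (Width == 0)
    Width = 1; // Guard so the loop below always advances.
  std::vector<std::string> Lines;
  for (std::size_t Pos = 0; Pos < Content.size(); Pos += Width) {
    const bool First = Pos == 0;
    const bool Last = Pos + Width >= Content.size();
    std::string Line = First ? openQuote(QS) : BT.Prefix;
    Line += Content.substr(Pos, Width);
    Line += Last ? closeQuote(QS) : BT.Postfix;
    Lines.push_back(Line);
  }
  if (Wrap && Lines.size() > 1) {
    Lines.front().insert(0, IsVerilog ? "{" : "(");
    Lines.back() += IsVerilog ? "}" : ")";
  }
  return Lines;
}
```

The wrapper is added only after a break has actually been inserted, which mirrors why updateAfterBroken is a separate step: an unbroken string needs no parentheses or braces.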

The logic for finding the continuation indentation when the option
AlignAfterOpenBracket is set to DontAlign is not implemented yet.  In
that case the new line may have the wrong indentation.  The parts may
also have the wrong length if the string needs to be broken more than
once, because where to break the string depends on where the string
starts.
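
Why a wrong start column leads to wrong part lengths can be seen in a small model. The sketch below is an assumption-laden simplification, not clang-format's algorithm: partLengths and its parameters are hypothetical, every line is charged the same fixed Decoration (quotes plus the joining operator), and content is measured in plain characters.

```cpp
#include <cstddef>
#include <vector>

// Returns how many content characters land on each line when a literal of
// ContentLength characters is broken to fit within ColumnLimit. The first
// part starts at FirstColumn, later parts at ContinuationColumn.
std::vector<std::size_t> partLengths(std::size_t ContentLength,
                                     std::size_t FirstColumn,
                                     std::size_t ContinuationColumn,
                                     std::size_t Decoration,
                                     std::size_t ColumnLimit) {
  std::vector<std::size_t> Lengths;
  std::size_t Column = FirstColumn;
  while (ContentLength > 0) {
    // Room left for content on this line; always take at least one
    // character so the loop terminates.
    const std::size_t Used = Column + Decoration;
    const std::size_t Budget = ColumnLimit > Used ? ColumnLimit - Used : 1;
    const std::size_t Take = ContentLength < Budget ? ContentLength : Budget;
    Lengths.push_back(Take);
    ContentLength -= Take;
    Column = ContinuationColumn;
  }
  return Lengths;
}
```

With a 20-column limit and 4 columns of decoration, a 20-character literal whose continuation lines start at column 8 splits into parts of 8, 8, and 4 characters. If the continuation lines really start at column 4, the split is 8 and 12. A formatter that assumes one column while the line ends up at the other therefore picks the wrong break points from the second break onward.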

The preambles for the C# and Java unit tests are changed to the newer
style in order to allow the 3-argument verifyFormat macro.  Some cases
are changed from verifyFormat to verifyIncompleteFormat because they
use incomplete code and the new verifyFormat function checks that the
code is complete.

The continuation line in the doc example is now indented by 4 spaces,
that is, the default continuation indentation.  The formatter has
always behaved this way; the 2 spaces shown in the doc previously were
probably a mistake.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D154093
sstwcw committed Aug 24, 2023
1 parent 825cec2 commit 16ccba5
Showing 13 changed files with 609 additions and 76 deletions.
31 changes: 30 additions & 1 deletion clang/docs/ClangFormatStyleOptions.rst
@@ -2722,6 +2722,8 @@ the configuration (without a prefix: ``Auto``).
**BreakStringLiterals** (``Boolean``) :versionbadge:`clang-format 3.9` :ref:`<BreakStringLiterals>`
Allow breaking string literals when formatting.

In C, C++, and Objective-C:

.. code-block:: c++

true:
@@ -2731,7 +2733,34 @@ the configuration (without a prefix: ``Auto``).

false:
const char* x =
"veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
"veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";

In C#, Java, and JavaScript:

.. code-block:: c++

true:
var x = "veryVeryVeryVeryVeryVe" +
"ryVeryVeryVeryVeryVery" +
"VeryLongString";

false:
var x =
"veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
C# and JavaScript interpolated strings are not broken.

In Verilog:

.. code-block:: c++

true:
string x = {"veryVeryVeryVeryVeryVe",
"ryVeryVeryVeryVeryVery",
"VeryLongString"};

false:
string x =
"veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";

.. _ColumnLimit:

30 changes: 29 additions & 1 deletion clang/include/clang/Format/Format.h
@@ -2008,6 +2008,8 @@ struct FormatStyle {
bool BreakAfterJavaFieldAnnotations;

/// Allow breaking string literals when formatting.
///
/// In C, C++, and Objective-C:
/// \code
/// true:
/// const char* x = "veryVeryVeryVeryVeryVe"
@@ -2016,8 +2018,34 @@ struct FormatStyle {
///
/// false:
/// const char* x =
/// "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
/// "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
/// \endcode
///
/// In C#, Java, and JavaScript:
/// \code
/// true:
/// var x = "veryVeryVeryVeryVeryVe" +
/// "ryVeryVeryVeryVeryVery" +
/// "VeryLongString";
///
/// false:
/// var x =
/// "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
/// \endcode
/// C# and JavaScript interpolated strings are not broken.
///
/// In Verilog:
/// \code
/// true:
/// string x = {"veryVeryVeryVeryVeryVe",
/// "ryVeryVeryVeryVeryVery",
/// "VeryLongString"};
///
/// false:
/// string x =
/// "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
/// \endcode
///
/// \version 3.9
bool BreakStringLiterals;

114 changes: 114 additions & 0 deletions clang/lib/Format/BreakableToken.cpp
@@ -292,6 +292,120 @@ void BreakableStringLiteral::insertBreak(unsigned LineIndex,
Prefix, InPPDirective, 1, StartColumn);
}

BreakableStringLiteralUsingOperators::BreakableStringLiteralUsingOperators(
const FormatToken &Tok, QuoteStyleType QuoteStyle, bool UnindentPlus,
unsigned StartColumn, unsigned UnbreakableTailLength, bool InPPDirective,
encoding::Encoding Encoding, const FormatStyle &Style)
: BreakableStringLiteral(
Tok, StartColumn, /*Prefix=*/QuoteStyle == SingleQuotes ? "'"
: QuoteStyle == AtDoubleQuotes ? "@\""
: "\"",
/*Postfix=*/QuoteStyle == SingleQuotes ? "'" : "\"",
UnbreakableTailLength, InPPDirective, Encoding, Style),
BracesNeeded(Tok.isNot(TT_StringInConcatenation)),
QuoteStyle(QuoteStyle) {
// Find the replacement text for inserting braces and quotes and line breaks.
// We don't create an allocated string concatenated from parts here because it
// has to outlive the BreakableStringLiteral object. The brace replacements
// include a quote so that WhitespaceManager can tell it apart from whitespace
// replacements between the string and surrounding tokens.

// The option is not implemented in JavaScript.
bool SignOnNewLine =
!Style.isJavaScript() &&
Style.BreakBeforeBinaryOperators != FormatStyle::BOS_None;

if (Style.isVerilog()) {
// In Verilog, all strings are quoted by double quotes, joined by commas,
// and wrapped in braces. The comma is always before the newline.
assert(QuoteStyle == DoubleQuotes);
LeftBraceQuote = Style.Cpp11BracedListStyle ? "{\"" : "{ \"";
RightBraceQuote = Style.Cpp11BracedListStyle ? "\"}" : "\" }";
Postfix = "\",";
Prefix = "\"";
} else {
// The plus sign may be on either line. And also C# and JavaScript have
// different quoting styles.
if (QuoteStyle == SingleQuotes) {
LeftBraceQuote = Style.SpacesInParensOptions.Other ? "( '" : "('";
RightBraceQuote = Style.SpacesInParensOptions.Other ? "' )" : "')";
Postfix = SignOnNewLine ? "'" : "' +";
Prefix = SignOnNewLine ? "+ '" : "'";
} else {
if (QuoteStyle == AtDoubleQuotes) {
LeftBraceQuote = Style.SpacesInParensOptions.Other ? "( @" : "(@";
Prefix = SignOnNewLine ? "+ @\"" : "@\"";
} else {
LeftBraceQuote = Style.SpacesInParensOptions.Other ? "( \"" : "(\"";
Prefix = SignOnNewLine ? "+ \"" : "\"";
}
RightBraceQuote = Style.SpacesInParensOptions.Other ? "\" )" : "\")";
Postfix = SignOnNewLine ? "\"" : "\" +";
}
}

// Following lines are indented by the width of the brace and space if any.
ContinuationIndent = BracesNeeded ? LeftBraceQuote.size() - 1 : 0;
// The plus sign may need to be unindented depending on the style.
// FIXME: Add support for DontAlign.
if (!Style.isVerilog() && SignOnNewLine && !BracesNeeded && UnindentPlus &&
Style.AlignOperands == FormatStyle::OAS_AlignAfterOperator) {
ContinuationIndent -= 2;
}
}

unsigned BreakableStringLiteralUsingOperators::getRemainingLength(
unsigned LineIndex, unsigned Offset, unsigned StartColumn) const {
return UnbreakableTailLength + (BracesNeeded ? RightBraceQuote.size() : 1) +
encoding::columnWidthWithTabs(Line.substr(Offset), StartColumn,
Style.TabWidth, Encoding);
}

unsigned
BreakableStringLiteralUsingOperators::getContentStartColumn(unsigned LineIndex,
bool Break) const {
return std::max(
0,
static_cast<int>(StartColumn) +
(Break ? ContinuationIndent + static_cast<int>(Prefix.size())
: (BracesNeeded ? static_cast<int>(LeftBraceQuote.size()) - 1
: 0) +
(QuoteStyle == AtDoubleQuotes ? 2 : 1)));
}

void BreakableStringLiteralUsingOperators::insertBreak(
unsigned LineIndex, unsigned TailOffset, Split Split,
unsigned ContentIndent, WhitespaceManager &Whitespaces) const {
Whitespaces.replaceWhitespaceInToken(
Tok, /*Offset=*/(QuoteStyle == AtDoubleQuotes ? 2 : 1) + TailOffset +
Split.first,
/*ReplaceChars=*/Split.second, /*PreviousPostfix=*/Postfix,
/*CurrentPrefix=*/Prefix, InPPDirective, /*NewLines=*/1,
/*Spaces=*/
std::max(0, static_cast<int>(StartColumn) + ContinuationIndent));
}

void BreakableStringLiteralUsingOperators::updateAfterBroken(
WhitespaceManager &Whitespaces) const {
// Add the braces required for breaking the token if they are needed.
if (!BracesNeeded)
return;

// To add a brace or parenthesis, we replace the quote (or the at sign) with a
// brace and another quote. This is because the rest of the program requires
// one replacement for each source range. If we replace the empty strings
// around the string, it may conflict with whitespace replacements between the
// string and adjacent tokens.
Whitespaces.replaceWhitespaceInToken(
Tok, /*Offset=*/0, /*ReplaceChars=*/1, /*PreviousPostfix=*/"",
/*CurrentPrefix=*/LeftBraceQuote, InPPDirective, /*NewLines=*/0,
/*Spaces=*/0);
Whitespaces.replaceWhitespaceInToken(
Tok, /*Offset=*/Tok.TokenText.size() - 1, /*ReplaceChars=*/1,
/*PreviousPostfix=*/RightBraceQuote,
/*CurrentPrefix=*/"", InPPDirective, /*NewLines=*/0, /*Spaces=*/0);
}

BreakableComment::BreakableComment(const FormatToken &Token,
unsigned StartColumn, bool InPPDirective,
encoding::Encoding Encoding,
43 changes: 43 additions & 0 deletions clang/lib/Format/BreakableToken.h
@@ -230,6 +230,11 @@ class BreakableToken {
/// as a unit and is responsible for the formatting of them.
virtual void updateNextToken(LineState &State) const {}

/// Adds replacements that are needed when the token is broken, such as
/// wrapping a JavaScript string in parentheses after it gets broken with plus
/// signs.
virtual void updateAfterBroken(WhitespaceManager &Whitespaces) const {}

protected:
BreakableToken(const FormatToken &Tok, bool InPPDirective,
encoding::Encoding Encoding, const FormatStyle &Style)
@@ -283,6 +288,44 @@ class BreakableStringLiteral : public BreakableToken {
unsigned UnbreakableTailLength;
};

class BreakableStringLiteralUsingOperators : public BreakableStringLiteral {
public:
enum QuoteStyleType {
DoubleQuotes, // The string is quoted with double quotes.
SingleQuotes, // The JavaScript string is quoted with single quotes.
AtDoubleQuotes, // The C# verbatim string is quoted with the at sign and
// double quotes.
};
/// Creates a breakable token for a single line string literal for C#, Java,
/// JavaScript, or Verilog.
///
/// \p StartColumn specifies the column in which the token will start
/// after formatting.
BreakableStringLiteralUsingOperators(
const FormatToken &Tok, QuoteStyleType QuoteStyle, bool UnindentPlus,
unsigned StartColumn, unsigned UnbreakableTailLength, bool InPPDirective,
encoding::Encoding Encoding, const FormatStyle &Style);
unsigned getRemainingLength(unsigned LineIndex, unsigned Offset,
unsigned StartColumn) const override;
unsigned getContentStartColumn(unsigned LineIndex, bool Break) const override;
void insertBreak(unsigned LineIndex, unsigned TailOffset, Split Split,
unsigned ContentIndent,
WhitespaceManager &Whitespaces) const override;
void updateAfterBroken(WhitespaceManager &Whitespaces) const override;

protected:
// Whether braces or parentheses should be inserted around the string to form
// a concatenation.
bool BracesNeeded;
QuoteStyleType QuoteStyle;
// The braces or parentheses along with the first character which they
// replace, either a quote or at sign.
StringRef LeftBraceQuote;
StringRef RightBraceQuote;
// Width added to the left due to the added brace or parenthesis. Does not
// apply to the first line.
int ContinuationIndent;
};

class BreakableComment : public BreakableToken {
protected:
/// Creates a breakable token for a comment.
63 changes: 44 additions & 19 deletions clang/lib/Format/ContinuationIndenter.cpp
@@ -36,6 +36,14 @@ static bool shouldIndentWrappedSelectorName(const FormatStyle &Style,
return Style.IndentWrappedFunctionNames || LineType == LT_ObjCMethodDecl;
}

// Returns true if a binary operator following \p Tok should be unindented when
// the style permits it.
static bool shouldUnindentNextOperator(const FormatToken &Tok) {
const FormatToken *Previous = Tok.getPreviousNonComment();
return Previous && (Previous->getPrecedence() == prec::Assignment ||
Previous->isOneOf(tok::kw_return, TT_RequiresClause));
}

// Returns the length of everything up to the first possible line break after
// the ), ], } or > matching \c Tok.
static unsigned getLengthToMatchingParen(const FormatToken &Tok,
@@ -1616,11 +1624,10 @@ void ContinuationIndenter::moveStatePastFakeLParens(LineState &State,
if (Previous && Previous->endsSequence(tok::l_paren, tok::kw__Generic))
NewParenState.Indent = CurrentState.LastSpace;

if (Previous &&
(Previous->getPrecedence() == prec::Assignment ||
Previous->isOneOf(tok::kw_return, TT_RequiresClause) ||
(PrecedenceLevel == prec::Conditional && Previous->is(tok::question) &&
Previous->is(TT_ConditionalExpr))) &&
if ((shouldUnindentNextOperator(Current) ||
(Previous &&
(PrecedenceLevel == prec::Conditional &&
Previous->is(tok::question) && Previous->is(TT_ConditionalExpr)))) &&
!Newline) {
// If BreakBeforeBinaryOperators is set, un-indent a bit to account for
// the operator and keep the operands aligned.
@@ -2183,14 +2190,9 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
LineState &State, bool AllowBreak) {
unsigned StartColumn = State.Column - Current.ColumnWidth;
if (Current.isStringLiteral()) {
// FIXME: String literal breaking is currently disabled for C#, Java, Json
// and JavaScript, as it requires strings to be merged using "+" which we
// don't support.
if (Style.Language == FormatStyle::LK_Java || Style.isJavaScript() ||
Style.isCSharp() || Style.isJson() || !Style.BreakStringLiterals ||
!AllowBreak) {
// Strings in JSON cannot be broken.
if (Style.isJson() || !Style.BreakStringLiterals || !AllowBreak)
return nullptr;
}

// Don't break string literals inside preprocessor directives (except for
// #define directives, as their contents are stored in separate lines and
@@ -2209,6 +2211,33 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
return nullptr;

StringRef Text = Current.TokenText;
// We need this to address the case where there is an unbreakable tail only
// if certain other formatting decisions have been taken. The
// UnbreakableTailLength of Current is an overapproximation in that case and
// we need to be correct here.
unsigned UnbreakableTailLength = (State.NextToken && canBreak(State))
? 0
: Current.UnbreakableTailLength;

if (Style.isVerilog() || Style.Language == FormatStyle::LK_Java ||
Style.isJavaScript() || Style.isCSharp()) {
BreakableStringLiteralUsingOperators::QuoteStyleType QuoteStyle;
if (Style.isJavaScript() && Text.startswith("'") && Text.endswith("'")) {
QuoteStyle = BreakableStringLiteralUsingOperators::SingleQuotes;
} else if (Style.isCSharp() && Text.startswith("@\"") &&
Text.endswith("\"")) {
QuoteStyle = BreakableStringLiteralUsingOperators::AtDoubleQuotes;
} else if (Text.startswith("\"") && Text.endswith("\"")) {
QuoteStyle = BreakableStringLiteralUsingOperators::DoubleQuotes;
} else {
return nullptr;
}
return std::make_unique<BreakableStringLiteralUsingOperators>(
Current, QuoteStyle,
/*UnindentPlus=*/shouldUnindentNextOperator(Current), StartColumn,
UnbreakableTailLength, State.Line->InPPDirective, Encoding, Style);
}

StringRef Prefix;
StringRef Postfix;
// FIXME: Handle whitespace between '_T', '(', '"..."', and ')'.
@@ -2221,13 +2250,6 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
Text.startswith(Prefix = "u8\"") ||
Text.startswith(Prefix = "L\""))) ||
(Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))) {
// We need this to address the case where there is an unbreakable tail
// only if certain other formatting decisions have been taken. The
// UnbreakableTailLength of Current is an overapproximation is that case
// and we need to be correct here.
unsigned UnbreakableTailLength = (State.NextToken && canBreak(State))
? 0
: Current.UnbreakableTailLength;
return std::make_unique<BreakableStringLiteral>(
Current, StartColumn, Prefix, Postfix, UnbreakableTailLength,
State.Line->InPPDirective, Encoding, Style);
@@ -2628,6 +2650,9 @@ ContinuationIndenter::breakProtrudingToken(const FormatToken &Current,
Current.UnbreakableTailLength;

if (BreakInserted) {
if (!DryRun)
Token->updateAfterBroken(Whitespaces);

// If we break the token inside a parameter list, we need to break before
// the next parameter on all levels, so that the next parameter is clearly
// visible. Line comments already introduce a break.
5 changes: 5 additions & 0 deletions clang/lib/Format/FormatToken.h
@@ -134,6 +134,11 @@ namespace format {
TYPE(StartOfName) \
TYPE(StatementAttributeLikeMacro) \
TYPE(StatementMacro) \
/* A string that is part of a string concatenation. For C#, JavaScript, and \
* Java, it is used for marking whether a string needs parentheses around it \
* if it is to be split into parts joined by `+`. For Verilog, whether \
* braces need to be added to split it. Not used for other languages. */ \
TYPE(StringInConcatenation) \
TYPE(StructLBrace) \
TYPE(StructuredBindingLSquare) \
TYPE(TemplateCloser) \