Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement trailing_semicolon = "Preserve" #6149

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

compiler-errors
Copy link
Member

Implement trailing_semicolon = "Preserve" to preserve existing semicolons but also don't add new ones even if, e.g., we break a block.

I discovered some bugs in the formatting of return REALLY_LONG_EXPR; where the expression itself would fit onto one line, but due to double counting (subtracting the size of the semicolon in both format_stmt AND in format_expr) we end up breaking up the statement at LIMIT - 1 rather than LIMIT.

See the comment on semicolon_for_expr_extra_hacky_double_counted_spacing. Bikeshed that name if you want.

src/expr.rs Outdated
Comment on lines 67 to 69
let shape = if expr_type == ExprType::Statement
&& semicolon_for_expr_extra_hacky_double_counted_spacing(context, expr)
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot remember what the bug policy here is, but ideally we'd just remove this adjustment to shape since it's already accounted for in format_stmt. That causes self-formatting test to fail because some expressions actually fit onto one line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can fix the shape adjustment if the fix is version gated to version=Two. You could updated the logic so that we only sub 1 when version=One is set,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it in Version=Two, added a test for one&two, and fixed the cases that changed rustfmt's own formatting (since rustfmt apparently uses Two! 😆)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responded to this in #6149 (comment)

Configurations.md Outdated Show resolved Hide resolved
src/expr.rs Outdated
Comment on lines 67 to 69
let shape = if expr_type == ExprType::Statement
&& semicolon_for_expr_extra_hacky_double_counted_spacing(context, expr)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can fix the shape adjustment if the fix is version gated to version=Two. You could updated the logic so that we only sub 1 when version=One is set,

Configurations.md Outdated Show resolved Hide resolved
src/config/options.rs Outdated Show resolved Hide resolved
ast::StmtKind::Expr(ref ex) | ast::StmtKind::Semi(ref ex) => {
ast::StmtKind::Semi(ref ex) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this the cause of the double semicolon counting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment below; I pointed out the reason for the the double semicolon counting in more detail.

I split these two branches because we need to preserve semicolons only if we have a StmtKind::Semi, and not if we have a StmtKind::Expr.

src/utils.rs Outdated
Comment on lines 293 to 301
/// Previously, we used to have `trailing_semicolon = always` enabled, and due to
/// a bug between `format_stmt` and `format_expr`, we used to subtract the size of
/// `;` *TWICE* from the shape. This means that an expr that would fit onto a line
/// of, e.g. 99 (limit 100) after subtracting one for the semicolon would still be
/// wrapped.
///
/// This function reimplements the old heuristic of double counting the "phantom"
/// semicolon that should have already been accounted for, to not break existing
/// formatting with the `trailing_semicolon = preserve` behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind showing an example where things are wrapping early, maybe even adding a version=One test case if we go the route of fixing things for version=Two.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn main() {
    return hellooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo(arg);
}

The return statement line is exactly 100 chars long. When formatting the expression, we first subtract one from the shape here:

rustfmt/src/stmt.rs

Lines 119 to 128 in 7289391

ast::StmtKind::Expr(ref ex) | ast::StmtKind::Semi(ref ex) => {
let suffix = if semicolon_for_stmt(context, stmt, is_last_expr) {
";"
} else {
""
};
let shape = shape.sub_width(suffix.len())?;
format_expr(ex, expr_type, context, shape).map(|s| s + suffix)
}

(See let shape = shape.sub_width(suffix.len())?; )

And then once again here:

rustfmt/src/expr.rs

Lines 67 to 71 in 7289391

let shape = if expr_type == ExprType::Statement && semicolon_for_expr(context, expr) {
shape.sub_width(1)?
} else {
shape
};

(in format_expr).

In practice, for example, we need to format return helloo...o(arg) which is an expression that is 95 characters long (100 - 4 identation - 1 semicolon) but after subtracting from the shape twice, we only have have a budget of 94 characters, even though the whole expression should fit in the line.

I can add it as a version=One test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, i discovered this because it happens twice in practice in the rustfmt codebase, so the self-formatting test failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we use version=Two within rustfmt as a way to dogfood unstable formatting (at least that's my assumption)

/// (`return`/`break`/`continue`).
pub enum TrailingSemicolon {
/// Always rewrite `return;` and `break;` expressions to have a trailing semicolon,
/// unless the block is a single-line block, e.g. `let PAT = e else { return }`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of weird, but preexisting behavior, I believe.

I don't exactly understand what rustfmt does to avoid adding a semicolon for single-line-style blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are different kinds of blocks. What kinds of blocks are you referencing? Just let-else blocks?

To the best of my knowledge all let-else blocks are handled when rewriting ast::Local nodes. The code removes 1 from the shape to account for the trailing semicolon (line 127). Maybe it shouldn't?

rustfmt/src/items.rs

Lines 49 to 188 in 4b1596f

impl Rewrite for ast::Local {
fn rewrite(&self, context: &RewriteContext<'_>, shape: Shape) -> Option<String> {
debug!(
"Local::rewrite {:?} {} {:?}",
self, shape.width, shape.indent
);
skip_out_of_file_lines_range!(context, self.span);
if contains_skip(&self.attrs) {
return None;
}
let attrs_str = self.attrs.rewrite(context, shape)?;
let mut result = if attrs_str.is_empty() {
"let ".to_owned()
} else {
combine_strs_with_missing_comments(
context,
&attrs_str,
"let ",
mk_sp(
self.attrs.last().map(|a| a.span.hi()).unwrap(),
self.span.lo(),
),
shape,
false,
)?
};
let let_kw_offset = result.len() - "let ".len();
// 4 = "let ".len()
let pat_shape = shape.offset_left(4)?;
// 1 = ;
let pat_shape = pat_shape.sub_width(1)?;
let pat_str = self.pat.rewrite(context, pat_shape)?;
result.push_str(&pat_str);
// String that is placed within the assignment pattern and expression.
let infix = {
let mut infix = String::with_capacity(32);
if let Some(ref ty) = self.ty {
let separator = type_annotation_separator(context.config);
let ty_shape = if pat_str.contains('\n') {
shape.with_max_width(context.config)
} else {
shape
}
.offset_left(last_line_width(&result) + separator.len())?
// 2 = ` =`
.sub_width(2)?;
let rewrite = ty.rewrite(context, ty_shape)?;
infix.push_str(separator);
infix.push_str(&rewrite);
}
if self.kind.init().is_some() {
infix.push_str(" =");
}
infix
};
result.push_str(&infix);
if let Some((init, else_block)) = self.kind.init_else_opt() {
// 1 = trailing semicolon;
let nested_shape = shape.sub_width(1)?;
result = rewrite_assign_rhs(
context,
result,
init,
&RhsAssignKind::Expr(&init.kind, init.span),
nested_shape,
)?;
if let Some(block) = else_block {
let else_kw_span = init.span.between(block.span);
// Strip attributes and comments to check if newline is needed before the else
// keyword from the initializer part. (#5901)
let init_str = if context.config.version() == Version::Two {
&result[let_kw_offset..]
} else {
result.as_str()
};
let force_newline_else = pat_str.contains('\n')
|| !same_line_else_kw_and_brace(init_str, context, else_kw_span, nested_shape);
let else_kw = rewrite_else_kw_with_comments(
force_newline_else,
true,
context,
else_kw_span,
shape,
);
result.push_str(&else_kw);
// At this point we've written `let {pat} = {expr} else' into the buffer, and we
// want to calculate up front if there's room to write the divergent block on the
// same line. The available space varies based on indentation so we clamp the width
// on the smaller of `shape.width` and `single_line_let_else_max_width`.
let max_width =
std::cmp::min(shape.width, context.config.single_line_let_else_max_width());
// If available_space hits zero we know for sure this will be a multi-lined block
let assign_str_with_else_kw = if context.config.version() == Version::Two {
&result[let_kw_offset..]
} else {
result.as_str()
};
let available_space = max_width.saturating_sub(assign_str_with_else_kw.len());
let allow_single_line = !force_newline_else
&& available_space > 0
&& allow_single_line_let_else_block(assign_str_with_else_kw, block);
let mut rw_else_block =
rewrite_let_else_block(block, allow_single_line, context, shape)?;
let single_line_else = !rw_else_block.contains('\n');
// +1 for the trailing `;`
let else_block_exceeds_width = rw_else_block.len() + 1 > available_space;
if allow_single_line && single_line_else && else_block_exceeds_width {
// writing this on one line would exceed the available width
// so rewrite the else block over multiple lines.
rw_else_block = rewrite_let_else_block(block, false, context, shape)?;
}
result.push_str(&rw_else_block);
};
}
result.push(';');
Some(result)
}
}

Copy link
Contributor

@ytmimi ytmimi Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure all block rewriting eventually calls into rewrite_block_inner, which eventually delegates to a few other block rewrite functions, e.g rewrite_empty_block or rewrite_single_line_block, and I'm fairly certain none of the block rewriting code handles trailing semicolons.

/// Always rewrite `return;` and `break;` expressions to have a trailing semicolon,
/// unless the block is a single-line block, e.g. `let PAT = e else { return }`.
Always,
/// Always return `return` and `break` expressions to remove the trailing semicolon.
Copy link
Contributor

@ytmimi ytmimi Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to avoid any confusion with the docs. Since this is the Never variant it might be better to not use the word "Always" in the description. One potential rewording:

/// Never add a trailing semicolon. This removes trailing semicolon's from `return` and `break` expressions.

Also, this won't remove them from continue;?

Comment on lines +2788 to +2796
```rust
fn foo() -> usize {
return 0;
}

fn bar() -> usize {
return 0
}
```
Copy link
Contributor

@ytmimi ytmimi Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't changes we necessarily need to make, but it might also be nice to include examples with break; and continue; to further demonstrate how this will work. Possibly adding a let-else example.

/// semicolon that should have already been accounted for, to not break existing
/// formatting with the `trailing_semicolon = Preserve` behavior for Version = One.
#[inline]
pub(crate) fn semicolon_for_expr_extra_hacky_double_counted_spacing(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this function if we're version gating the fix and only applying the old behavior for version=One?

@@ -0,0 +1,49 @@
// rustfmt-trailing_semicolon: Never
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value should be Preserve here, or filename should be different if not

@calebcartwright
Copy link
Member

Overall lgtm, my impression is that there's only a couple outstanding nits so this is definitely a change we can and will make for 2024.

From a sequencing pov, I'd like to make sure we get the style guide text tweaked and (once merged) this PR in a subtree sync so that the style and formatter changes are as together as possible. Doesn't have to literally be the same day but close proximity would be ideal. AFAIK we're still blocked on the next sync (upstream issue in a dependency) so this will be on-hold for a short period.

The off-by-one bug fix is something we should capture on #5577 as well to ensure we're tracking and can announce it as part of 2024 style edition changes, but I'd also like to see how this changes other codebases on the diff check to get a feel for the impact.

Related comment albeit not salient to merging this PR.. the original off by one issue looks like another case where one part of the rustfmt codebase was doing something it probably shouldn't have had the responsibility for doing in the first place, and then another part of the codebase was trying to reuse it and needed some internal logic accounting for that extra step. There's too many cases where functions are doing things adding/removing spaces or other tokens that other functions are then trying to account for or peel back, so let's use this as a reminder to ourselves to try to avoid taking routes that appear more expedient in the moment. If it feels/smells wrong, then it probably is and some broader refactoring is likely warranted

@ytmimi
Copy link
Contributor

ytmimi commented May 1, 2024

AFAIK we're still blocked on the next sync (upstream issue in a dependency) so this will be on-hold for a short period.

@calebcartwright I think you might have missed this #6140 (comment), but we're not blocked on dependencies anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants