Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve encoding on truncate_bytes #51313

Merged
merged 1 commit into from Mar 12, 2024

Conversation

rosa
Copy link
Member

@rosa rosa commented Mar 12, 2024

Motivation / Background

Under some circumstances, String#truncate_bytes can return a string with a different encoding than the one being truncated. This is because String.new with no arguments returns the empty string with ASCII-8BIT encoding. Then, depending on each grapheme cluster of the string and on the omission string, the resulting string might keep the ASCII-8BIT encoding. With this change, we preserve the encoding of the original string instead.

Note that String.new accepts an encoding keyword argument, so we could do something like:

String.new(encoding: Encoding::UTF_8)

However, instead of using that, we rely on force_encoding to set the original encoding. This is so that String subclasses don't need to preserve this keyword argument. For example, SafeBuffer doesn't. Thanks to @jeremy for catching this!

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

String.new with no arguments returns the empty string with ASCII-8BIT
encoding. Then, depending on each grapheme cluster of the string and
on the omission string, the resulting string might keep the ASCII-8BIT
encoding. With this change, we preserve the encoding of the original
string instead.

Note that String.new accepts an `encoding` keyword argument, like
```
String.new(encoding: Encoding::UTF_8)
```
However, instead of using that, we rely on `force_encoding` to set the
original encoding. This is so that String subclasses don't need to
preserve this keyword argument. For example, SafeBuffer doesn't.
Thanks to @jeremy for catching this!
@rosa rosa force-pushed the preserve-encoding-truncate-bytes branch from 74d898b to 1095415 Compare March 12, 2024 20:42
@jeremy jeremy added this to the 7.2.0 milestone Mar 12, 2024
@jeremy jeremy merged commit 71a74ad into rails:main Mar 12, 2024
4 checks passed
@rosa rosa deleted the preserve-encoding-truncate-bytes branch March 13, 2024 10:21
viralpraxis pushed a commit to viralpraxis/rails that referenced this pull request Mar 24, 2024
String.new with no arguments returns the empty string with ASCII-8BIT
encoding. Then, depending on each grapheme cluster of the string and
on the omission string, the resulting string might keep the ASCII-8BIT
encoding. With this change, we preserve the encoding of the original
string instead.

Note that String.new accepts an `encoding` keyword argument, like
```
String.new(encoding: Encoding::UTF_8)
```
However, instead of using that, we rely on `force_encoding` to set the
original encoding. This is so that String subclasses don't need to
preserve this keyword argument. For example, SafeBuffer doesn't.
Thanks to @jeremy for catching this!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants