Fix mruby-io #4712

Merged
merged 4 commits into from Sep 15, 2019

dearblue
Contributor

The commit "Fix broken UTF-8 characters by IO#getc" (992ba47) also increases the object code size when `MRB_UTF8_STRING` is not defined, so I would like to improve it if there is a better solution (I have none in mind).

`IO#readline` and `IO#readchar` process input in character units.
A multi-byte UTF-8 character is broken into individual bytes when it spans
an `IO::BUF_SIZE` (4096 bytes) buffer boundary.
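
To make the failure mode concrete, here is a minimal sketch in plain Ruby (CRuby with `StringIO`, not mruby-io's actual code). It compares a reader that decodes each buffer on its own with one that carries an incomplete trailing character over to the next read. The names `utf8_char_len`, `naive_chars`, and `boundary_aware_chars` are hypothetical helpers made up for this illustration, and the approach is only one possible way to handle the boundary, not necessarily what the patch does.

```ruby
require "stringio"

# Hypothetical helper: expected byte length of a UTF-8 character,
# derived from its lead byte. Invalid lead bytes count as one byte.
def utf8_char_len(lead)
  if    lead < 0x80         then 1
  elsif lead & 0xE0 == 0xC0 then 2
  elsif lead & 0xF0 == 0xE0 then 3
  elsif lead & 0xF8 == 0xF0 then 4
  else 1
  end
end

# Decodes every buffer independently, so a character that straddles
# two reads is seen as separate invalid bytes.
def naive_chars(io, buf_size)
  chars = []
  while (chunk = io.read(buf_size))
    chars.concat(chunk.force_encoding(Encoding::UTF_8).chars)
  end
  chars
end

# Keeps an incomplete trailing character in the buffer until the next
# read supplies the remaining bytes.
def boundary_aware_chars(io, buf_size)
  chars = []
  buf = "".b
  loop do
    chunk = io.read(buf_size)
    buf << chunk if chunk
    break if buf.empty?
    until buf.empty?
      len = utf8_char_len(buf.getbyte(0))
      if buf.bytesize < len
        break if chunk        # more input may follow: wait for it
        len = buf.bytesize    # end of input: emit the truncated bytes
      end
      chars << buf.byteslice(0, len).force_encoding(Encoding::UTF_8)
      buf = buf.byteslice(len, buf.bytesize)
    end
    break unless chunk
  end
  chars
end

data  = "●" * 1370   # 3 bytes per character, 4110 bytes in total
naive = naive_chars(StringIO.new(data), 4096)
fixed = boundary_aware_chars(StringIO.new(data), 4096)
p naive.size
p fixed.size
```

Since "●" takes 3 bytes, the first 4096-byte read ends one byte into the 1366th character, which is why the naive reader should report two extra elements while the boundary-aware reader returns all 1370 characters intact.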

- Prepare the file:

  ```ruby
  File.open("sample", "wb") { |f| f << "●" * 1370 }
  ```

- Before the patch:

  ```ruby
  File.open("sample") { |f| a = []; while ch = f.getc; a << ch; end; p a }
  # => ["●", "●", ..., "●", "\xe2", "\x97", "\x8f", "●", "●", "●", "●"]
  ```

- After the patch:

  ```ruby
  File.open("sample") { |f| a = []; while ch = f.getc; a << ch; end; p a }
  # => ["●", "●", ..., "●", "●", "●", "●", "●", "●"]
  ```

@matz matz merged commit 992ba47 into mruby:master Sep 15, 2019
matz added a commit that referenced this pull request Sep 15, 2019
@dearblue dearblue deleted the mruby-io branch September 16, 2019 10:02
matz added a commit that referenced this pull request Apr 28, 2020
The bug was introduced by #4712. The `getc' problem resurrected.
It should be addressed soon.
matz added a commit that referenced this pull request Apr 28, 2020
- mrb_utf8len() - returns the size of a UTF-8 char (in bytes)
- mrb_utf8_strlen() - returns the length of a UTF-8 string (in char)
matz added a commit that referenced this pull request Apr 28, 2020
This fix only effective when `MRB_UTF8_STRING` is set.
mimaki pushed a commit to mruby-Forum/mruby that referenced this pull request May 7, 2020
The bug was introduced by mruby#4712. The `getc' problem resurrected.
It should be addressed soon.
mimaki pushed a commit to mruby-Forum/mruby that referenced this pull request May 7, 2020
- mrb_utf8len() - returns the size of a UTF-8 char (in bytes)
- mrb_utf8_strlen() - returns the length of a UTF-8 string (in char)
mimaki pushed a commit to mruby-Forum/mruby that referenced this pull request May 7, 2020
This fix only effective when `MRB_UTF8_STRING` is set.