
Convert source file encoding to UTF-8 #865

Closed
wants to merge 1 commit into from

Conversation

Tietew
Contributor

Tietew commented Feb 14, 2020

Fixes simplecov-ruby/simplecov-html#91

Summary

  • Open the source file with UTF-8 encoding instead of ASCII-8BIT.
  • Check for a magic (encoding) comment and call IO#set_encoding so the content is transcoded to UTF-8.
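The two steps above could look roughly like the sketch below. It is not the code from this PR: `read_source_lines` and `MAGIC_COMMENT` are hypothetical names, and the regex is a simplification of Ruby's own magic-comment rules (it only inspects the first two lines and accepts anything of the form `coding: x` / `coding=x`). The header is peeked at as raw bytes so that a non-UTF-8 file cannot trip up the regex match.

```ruby
require "tempfile"

# Hypothetical, simplified pattern for magic comments such as
# "# encoding: Shift_JIS" or "# -*- coding: euc-jp -*-".
MAGIC_COMMENT = /\A#.*coding[:=]\s*([\w.-]+)/

# Hypothetical helper (not SimpleCov's actual code): returns the file's lines
# as UTF-8 strings, transcoding when a magic comment declares another encoding.
def read_source_lines(path)
  # Peek at the first two lines as raw bytes; Ruby honours a magic comment
  # on line 1, or on line 2 after a shebang.
  header = File.open(path, "rb") { |raw| [raw.gets, raw.gets].compact }
  declared = header.map { |line| line[MAGIC_COMMENT, 1] }.compact.first

  source_encoding =
    begin
      declared ? Encoding.find(declared) : Encoding::UTF_8
    rescue ArgumentError
      Encoding::UTF_8 # unknown encoding name: fall back to UTF-8
    end

  File.open(path, "r") do |file|
    if source_encoding == Encoding::UTF_8
      file.set_encoding(Encoding::UTF_8)
    else
      # Read bytes as the declared encoding and transcode them to UTF-8,
      # replacing anything unconvertible instead of raising.
      file.set_encoding(source_encoding, Encoding::UTF_8,
                        invalid: :replace, undef: :replace)
    end
    file.readlines
  end
end
```

For example, a Shift_JIS file that starts with `# encoding: Shift_JIS` comes back as UTF-8 strings, while a file without a magic comment is simply tagged UTF-8. Falling back to UTF-8 on an unknown encoding name is one way to avoid crashing on unusual files.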

Background

File.open(..., "rb") sets the file's external encoding to ASCII-8BIT instead of the script encoding (UTF-8).
As a result, every line read from the file is tagged ASCII-8BIT, even though the bytes are UTF-8.
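A minimal illustration of that behaviour, using a throwaway Tempfile (the file contents are illustrative, not taken from the issue): the same bytes come back with a different encoding tag depending on the open mode.

```ruby
require "tempfile"

# Illustrative source file with one non-ASCII character ("é" is 2 bytes in UTF-8).
tmp = Tempfile.new(["example", ".rb"])
tmp.close
File.binwrite(tmp.path, "# comment\nputs \"h\u00e9llo\"\n")

# Binary mode: every line is tagged ASCII-8BIT, whatever the bytes are.
binary_lines = File.open(tmp.path, "rb", &:readlines)
# Explicit external encoding: the same bytes are tagged UTF-8.
utf8_lines = File.open(tmp.path, "r:UTF-8", &:readlines)

# Identical bytes, different tags: length counts bytes vs. characters.
puts binary_lines[1].bytes == utf8_lines[1].bytes
puts binary_lines[1].length
puts utf8_lines[1].length
```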

In simplecov-html, the following line
https://github.com/colszowka/simplecov-html/blob/7540373ed44ccd43d1347b775a69673297cc8f90/views/source_file.erb#L48
tries to convert src from ASCII-8BIT to UTF-8.
Because bytes above 0x7F have no defined mapping from ASCII-8BIT, every non-ASCII character is replaced by the Unicode replacement character (U+FFFD).
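The effect can be reproduced in isolation. This sketch assumes the template performs a conversion equivalent to `String#encode` with `invalid: :replace, undef: :replace`; the exact call in `source_file.erb` may differ.

```ruby
# A UTF-8 source line as it arrives after a binary read: same bytes, wrong tag.
utf8_src = "puts \"h\u00e9llo\"\n"
binary_src = utf8_src.dup.force_encoding(Encoding::ASCII_8BIT)

# ASCII-8BIT -> UTF-8 treats every byte >= 0x80 as undefined, so each of the
# two bytes of "é" becomes its own U+FFFD.
garbled = binary_src.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

# When the line is already tagged UTF-8, the same call leaves it intact.
intact = utf8_src.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

puts garbled.count("\uFFFD")
puts intact == utf8_src
```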

@PragTob
Collaborator

PragTob commented Feb 14, 2020

👋

Hi there, thanks for your fix! 💚

I'll have to mull the best approach over in my head a bit. I think this is pretty great and sophisticated, but I need some time to think about it and play with it.

Thanks!

PragTob mentioned this pull request Feb 16, 2020
@PragTob
Collaborator

PragTob commented Feb 16, 2020

Thanks a lot, I think this is the right approach! I created #866 to build on this and maybe add some more specs (files can take many forms, so I'm worried about something crashing while we try to read them 😅).

The goal is to write the cuke this evening and package up a release for early next week, or maybe even this evening, depending on how long I stay out today ;)

@PragTob
Collaborator

PragTob commented Feb 16, 2020

Closing in favor of #866

PragTob closed this Feb 16, 2020
Tietew deleted the convert-utf8 branch February 27, 2020 05:12
Development

Successfully merging this pull request may close these issues.

Regression: non-ASCII characters are broken
3 participants