Encoding.name() vs. WHATWG encoding name #13

Closed
SimonSapin opened this issue Aug 20, 2013 · 4 comments

Comments

@SimonSapin
Collaborator

whatwg::encoding_from_label returns a tuple of an Encoding object and the encoding name as a string, while the object also has a .name() method that returns the same name as a string. This seems redundant.
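To make the redundancy concrete, here is a minimal Rust sketch. The Encoding trait, the Utf8 and Windows1252 types, and both lookup functions are hypothetical stand-ins rather than the crate's actual API; the label table holds only a few labels. The tuple-returning variant just repeats what the object can already report through name().

```rust
/// Hypothetical stand-in for the crate's `Encoding` trait.
trait Encoding {
    fn name(&self) -> &'static str;
}

struct Utf8;
struct Windows1252;

impl Encoding for Utf8 {
    fn name(&self) -> &'static str {
        "utf-8"
    }
}

impl Encoding for Windows1252 {
    fn name(&self) -> &'static str {
        "windows-1252"
    }
}

static UTF_8: Utf8 = Utf8;
static WINDOWS_1252: Windows1252 = Windows1252;

/// Object-only shape (hypothetical): callers ask the object for its name.
fn encoding_from_label(label: &str) -> Option<&'static dyn Encoding> {
    let enc: &'static dyn Encoding = match label {
        "utf-8" | "utf8" => &UTF_8,
        "windows-1252" | "latin1" => &WINDOWS_1252,
        _ => return None,
    };
    Some(enc)
}

/// Tuple shape (hypothetical): the name travels twice, once in the tuple
/// and once behind `name()`.
fn encoding_from_label_tuple(label: &str) -> Option<(&'static dyn Encoding, &'static str)> {
    encoding_from_label(label).map(|enc| (enc, enc.name()))
}

fn main() {
    let (enc, name) = encoding_from_label_tuple("latin1").unwrap();
    // The tuple's string adds nothing the object cannot already report.
    assert_eq!(enc.name(), name);
    println!("{}", encoding_from_label("latin1").unwrap().name());
}
```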

I would like to remove the name from the tuple and keep only .name(), which should use names from the spec. The required changes are:

  1. Rename shift-jis to shift_jis.
  2. Add iso-8859-8-i, identical to iso-8859-8 but with a different name.
  3. Rename windows-949 to euc-kr.

1 and 2 are harmless, but 3 seems to have been deliberate. Is there a difference between windows-949 and euc-kr, or a reason to prefer the first name?
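Item 2 amounts to registering one implementation under two names. A minimal sketch, assuming a hypothetical SingleByteEncoding struct in which the name is plain data and the decoding function is shared; the table is a deliberately partial ISO-8859-8 fragment (ASCII plus the Hebrew letters at 0xE0 to 0xFA, i.e. U+05D0 to U+05EA), not the full index.

```rust
/// Hypothetical single-byte codec: the name is data, the decoding logic is shared.
struct SingleByteEncoding {
    name: &'static str,
    decode_byte: fn(u8) -> Option<char>,
}

/// Partial ISO-8859-8 table for illustration only: ASCII plus the Hebrew
/// letters 0xE0..=0xFA (U+05D0..=U+05EA). The real table has more entries.
fn iso_8859_8_partial(b: u8) -> Option<char> {
    match b {
        0x00..=0x7F => Some(b as char),
        0xE0..=0xFA => char::from_u32(0x05D0 + u32::from(b - 0xE0)),
        _ => None,
    }
}

impl SingleByteEncoding {
    fn name(&self) -> &'static str {
        self.name
    }

    /// Returns None if any byte is outside this toy table.
    fn decode(&self, input: &[u8]) -> Option<String> {
        input.iter().map(|&b| (self.decode_byte)(b)).collect()
    }
}

// Two encodings that differ only in their name.
static ISO_8859_8: SingleByteEncoding = SingleByteEncoding {
    name: "iso-8859-8",
    decode_byte: iso_8859_8_partial,
};
static ISO_8859_8_I: SingleByteEncoding = SingleByteEncoding {
    name: "iso-8859-8-i",
    decode_byte: iso_8859_8_partial,
};

fn main() {
    let bytes = [0xE0, 0x20, 0x41];
    assert_eq!(ISO_8859_8.decode(&bytes), ISO_8859_8_I.decode(&bytes));
    assert_ne!(ISO_8859_8.name(), ISO_8859_8_I.name());
    println!("{} {:?}", ISO_8859_8_I.name(), ISO_8859_8_I.decode(&bytes));
}
```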

@lifthrasiir
Owner

Encoding is not intended to be a direct interface to the WHATWG-compatible encoding. That's why Text{En,De}coder has a separate encoding method. Maybe we will need a whatwg_name method in the future, once #4 is complete.
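One way a WHATWG-specific name could sit alongside the crate's own name is a default trait method that is overridden only where the two differ. This is a hypothetical sketch: the Encoding trait, the Windows949 and Utf8 types, and the whatwg_name method are illustrative names, not the crate's current API; the "euc-kr" value simply follows the rename proposed in item 3.

```rust
/// Hypothetical trait: `name()` is the crate's own name, `whatwg_name()` the
/// name used by the WHATWG Encoding Standard.
trait Encoding {
    fn name(&self) -> &'static str;

    /// Defaults to `name()`; override only where the two names differ.
    fn whatwg_name(&self) -> &'static str {
        self.name()
    }
}

struct Windows949;
struct Utf8;

impl Encoding for Windows949 {
    fn name(&self) -> &'static str {
        "windows-949"
    }

    // Follows the rename proposed in item 3 of the issue.
    fn whatwg_name(&self) -> &'static str {
        "euc-kr"
    }
}

impl Encoding for Utf8 {
    fn name(&self) -> &'static str {
        "utf-8"
    }
}

fn main() {
    let encodings: [&dyn Encoding; 2] = [&Windows949, &Utf8];
    for enc in encodings.iter() {
        println!("{} -> {}", enc.name(), enc.whatwg_name());
    }
}
```

Keeping name() untouched means existing callers are unaffected, while code that deals with browser-facing labels can opt into whatwg_name().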

Regarding your examples, 1 is my mistake (shift_jis is correct; it is a rare case of underscores in a standardized encoding name), 2 is implemented separately in the whatwg module, and 3 is intentional: windows-949 is indeed a different encoding from EUC-KR. In fact, Shift_JIS should really have been windows-932 or windows-31j (JIS X 0208 alone is not enough to implement it), but I haven't yet decided which standard the names should be based on, as no single standard covers all major encodings. For example, Windows code page 949 is missing from the IANA Character Sets registry, which is why many browsers use x-windows-949 instead of the more typical windows-949. I'm also worried about encodings with asymmetric encoders and decoders, which seem very common in WHATWG encodings for compatibility purposes.
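To illustrate what "asymmetric encoder and decoder" means, here is a toy codec invented for this purpose and not taken from any real table: the decoder accepts a legacy alias byte (0xA1) for '!', but the encoder only ever emits 0x21, so decoding and re-encoding does not return the original bytes.

```rust
/// Toy decoder (hypothetical): ASCII plus one decode-only alias byte.
fn decode(bytes: &[u8]) -> Option<String> {
    bytes
        .iter()
        .map(|&b| match b {
            0x00..=0x7F => Some(b as char),
            0xA1 => Some('!'), // legacy alias: accepted on input only
            _ => None,
        })
        .collect()
}

/// Toy encoder (hypothetical): emits plain ASCII and never the alias byte.
fn encode(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if c.is_ascii() { Some(c as u8) } else { None })
        .collect()
}

fn main() {
    let legacy = [0x48, 0x69, 0xA1]; // "Hi" followed by the alias byte
    let text = decode(&legacy).unwrap();
    let reencoded = encode(&text).unwrap();
    // Same text, different bytes: the mapping is not a bijection.
    assert_ne!(reencoded, legacy.to_vec());
    println!("{:?} -> {:?} -> {:?}", legacy, text, reencoded);
}
```

A name alone does not say which direction's table is meant, which is one reason a single naming scheme is hard to settle on.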

@SimonSapin
Collaborator Author

Other than "get an encoding from a label" and its quirks, such as mapping latin1 to windows-1252, I don’t think that "WHATWG-compatible" should be a different API for the actual decoding/encoding. Or is there a reason I’m missing that would prevent the APIs from being unified?
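The label quirks live entirely in the lookup step. A minimal sketch of a WHATWG-style "get an encoding" lookup: strip leading and trailing ASCII whitespace, compare ASCII case-insensitively, then consult a label table. The function name get_encoding_name is hypothetical, and only a handful of the standard's labels are included.

```rust
/// Hypothetical lookup: label -> WHATWG encoding name (tiny subset of labels).
fn get_encoding_name(label: &str) -> Option<&'static str> {
    // char::is_ascii_whitespace covers TAB, LF, FF, CR and SPACE.
    let normalized = label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase();
    match normalized.as_str() {
        "unicode-1-1-utf-8" | "utf-8" | "utf8" => Some("utf-8"),
        // The quirk mentioned above: latin1 and friends map to windows-1252.
        "ascii" | "iso-8859-1" | "latin1" | "us-ascii" | "windows-1252" => {
            Some("windows-1252")
        }
        _ => None,
    }
}

fn main() {
    for label in ["  Latin1\n", "UTF8", "not-a-label"] {
        println!("{:?} -> {:?}", label, get_encoding_name(label));
    }
}
```

Once the name is resolved, the decoder and encoder behind it need not know which label was used.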

@lifthrasiir
Owner

@SimonSapin It is not a matter of the actual decoding/encoding, and those APIs won't change at all. It is rather a matter of mapping the string label to the actual decoder/encoder, and in this respect WHATWG only considers Web browsers. So I think it is better to rename the name method to make it clear that the name is specific to WHATWG's quirks.

@lifthrasiir
Owner

ae41ef4 and 97b4005 directly fixed this issue.
