Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/ianaindex: add ASCII, document Index.Encoding #10

Open
wants to merge 2 commits into
base: master
from

Conversation

@emersion
Copy link

emersion commented Dec 19, 2019

Index.Encoding returns a nil Encoding in case the charset is valid but
unsupported by the library. Document this behavior.

Because of this, US-ASCII is seen as unsupported.
Register it as a regular encoding. The decoder replaces non-ASCII bytes
with the unicode replacement character. The encoder returns a
RepertoireError when a non-ASCII rune is encountered.

Fixes golang/go#19421

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 19, 2019

This PR (HEAD: d9070f1) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/text/+/212077 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 19, 2019

Message from Daniel Martí:

Patch Set 1:

(2 comments)

I also assume we want an added test or test case, unless there's a magic test that already covers the entire global "encodings" table.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

Index.Encoding returns a nil Encoding in case the charset is valid but
unsupported by the library.

Fixes golang/go#19421

Change-Id: I4c24ba2114a5012be88488e63aa6e57df955eb96
@emersion emersion force-pushed the emersion:ianaindex-fixes branch from d9070f1 to 8b5d375 Dec 19, 2019
@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 19, 2019

This PR (HEAD: 8b5d375) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/text/+/212077 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 19, 2019

Message from Simon Ser:

Patch Set 2:

(1 comment)

Thanks for the review!


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@emersion emersion force-pushed the emersion:ianaindex-fixes branch from 8b5d375 to c8cfe3a Dec 20, 2019
@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

This PR (HEAD: c8cfe3a) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/text/+/212077 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

Message from Simon Ser:

Patch Set 2:

I updated this change to use a proper ASCII encoding instead of encoding.Nop, per 1.

I wasn't sure where to add the new encoding, so I just put it unexported in ianaindex. We can't put it in the encoding package, because the ASCII encoding depends on encoding/internal. So we can either leave it here or put it in a separate encoding/ascii package.

Gerrit has merge the two commits in a single one. See the GitHub pull request for the clean split. Let me know if you want me to open two CLs, or if you want me to expand the description on Gerrit.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

Index.Encoding returns a nil Encoding when the charset is unsupported.
Because of this, US-ASCII is seen as unsupported.

Register it as a regular encoding. The decoder replaces non-ASCII bytes
with the unicode replacement character. The encoder replaces non-ASCII
runes with the ASCII substitution character.

Change-Id: Ie9ff316d7f9c744441bf08722e720b99217939a5
@emersion emersion force-pushed the emersion:ianaindex-fixes branch from c8cfe3a to 5534c55 Dec 20, 2019
@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

This PR (HEAD: 5534c55) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/text/+/212077 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

Message from Simon Ser:

Patch Set 4:

Updated again to make the ASCII charset return an internal.RepertoireError as expected.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

Message from Daniel Martí:

Patch Set 4: Run-TryBot+1

(1 comment)

It's fine that the bot is squashing your commits. In gerrit, each CL is exactly one commit.

You could do a chain of CLs for multiple commits if you really wanted, but I think one commit here is okay.

Kicking the tests too.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

Message from Gobot Gobot:

Patch Set 4:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=5561cfac


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 20, 2019

Message from Gobot Gobot:

Patch Set 4: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 16, 2020

Message from Simon Ser:

Patch Set 5:

CL message fixed, sorry about that.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 16, 2020

Message from Daniel Martí:

Patch Set 5: Run-TryBot+1 Code-Review+1

LGTM. I'm not familiar with this package, so I'll wait another week or two to see if anyone gives a +2. If not, I'll just give it myself, given that the change doesn't seem risky.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 16, 2020

Message from Gobot Gobot:

Patch Set 5:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=9e0dcfd9


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 16, 2020

Message from Gobot Gobot:

Patch Set 5: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 17, 2020

Message from Marcel van Lohuizen:

Patch Set 5: Code-Review+2

For completeness, I looked at whether WhatWG (implemented in htmlindex) supports ASCII. It turns out they support an "ascii" identifier, but subsequently map it to Windows1252 (Latin1), a superset of ASCII. This is not appropriate for the ianaindex; the implementation provided in this CL is the proper one.

One question that remains is whether ASCII should be included as an exported encoding. I believe it has come up before. I would be okay to keep the current implementation, move it to charmap and export it as ASCII. Alternatively there can be a TODO with this suggestion. The disadvantage is that it may be used when people are better off with Latin1. But a comment could be to suggest use Latin1 for common cases.

Regarding the error reporting, the proper thing to do would probably have been to return a sentinel error for Unsupported encodings. This would be a backwards incompatible change, but here is a suggestion:

htmlindex defines a function called Get which always returns non-nil for any known encoding supported by this index (as it is fully defined). One could add a Get method to ianaindex, similar to Encoding, but returning an Unsupported error if non-existing and deprecate Encoding. This would cast two birds with one stone: improve the API and harmonize the API between htmlindex and ianaindex. Just a thought.


Please don’t reply on this GitHub thread. Visit golang.org/cl/212077.
After addressing review feedback, remember to publish your drafts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.