-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes, strings: should have minimal dependency on unicode #54098
Comments
Perhaps we should just fix map initialization deadcode elimination issue... (We have thoughts, and there were some ongoing work.) |
Change https://go.dev/cl/419756 mentions this issue: |
Unless map deadcode elimination is going to happen in the near future, CL/419756 is perhaps a palatable solution for the interim period. It would actually be a simplification of the code base, since we could delete sub-copies of "unicode" functionality such as in "strconv" and in "fmt". Switching from "internal/unicode" back to "unicode" would be a trivial search and replace after #2559 is fixed. |
Change https://go.dev/cl/423376 mentions this issue: |
Another place were we could potentially avoid pulling in all of unicode tables would be "reflect". |
@cherrymui, I haven't seen any mentions about DCE for unused maps in the runtime/compiler meeting notes (#43930). Is it fair to say that this is not going to be something that's going to be worked on in the near future? If so, what are our thoughts on doing something like https://go.dev/cl/419756 where we move the essential parts of "unicode" to "internal/unicode"? I've noticed quite a bit of technical debt throughout the Go source tree because stdlib packages are avoiding "unicode" and so it resorts to duplicating Unicode functionality or re-implementing functionality since it can't depend on a package (e.g., "bytes" or "strings") that transitively depends on "unicode". In my opinion, the technical debt of splitting out part of "unicode" into "internal/unicode" is less than the technical debt of avoiding "unicode" all-together. |
It's not something we're working on, or planning on working on, at the moment. But see #2559, there are some external contributors that made some progress in this area, but it was never completed. |
I can give it a try this cycle, if I have a chance. I'd think it shouldn't be too hard to do, but I may be wrong. |
Change https://go.dev/cl/528799 mentions this issue: |
Importing "unicode" immediately bloats a binary by ~100k. This is unfortunately unavoidable since the
unicode.Categories
map contains a reference to every Unicode category in existence (see #7600 or #2559).We should make it such that only referencing
bytes
functions (e.g.,bytes.HasPrefix
) that do not depend onunicode
should not result inunicode
being linked into the binary.Here's a list of functions that depend on
unicode
:Fields
->unicode.IsSpace
ToUpper
->unicode.ToUpper
ToLower
->unicode.ToLower
ToTitle
->unicode.ToTitle
ToUpperSpecial
->unicode.SpecialCase
ToLowerSpecial
->unicode.SpecialCase
ToTitleSpecial
->unicode.SpecialCase
Title
->unicode.{ToTitle,IsLetter,IsDigit,IsSpace}
TrimSpace
->unicode.IsSpace
EqualFold
->unicode.SimpleFold
Of all of these, only
Fields
andTrimSpace
are used to any significant degree. Even still, the implementation ofunicode.IsSpace
is fairly small and references a relatively small table.Perhaps we should create a
internal/unicodetables
package that contains every table. Theunicode
package can depend onunicodetables
, and other stdlib packages can depend onunicodetables
directly.The text was updated successfully, but these errors were encountered: