Description
Importing "unicode" immediately bloats a binary by ~100k. This is unfortunately unavoidable since the unicode.Categories map contains a reference to every Unicode category in existence (see #7600 or #2559).
We should make it so that referencing only bytes functions that do not depend on unicode (e.g., bytes.HasPrefix) does not cause unicode to be linked into the binary.
Here's a list of functions that depend on unicode:
Fields -> unicode.IsSpace
ToUpper -> unicode.ToUpper
ToLower -> unicode.ToLower
ToTitle -> unicode.ToTitle
ToUpperSpecial -> unicode.SpecialCase
ToLowerSpecial -> unicode.SpecialCase
ToTitleSpecial -> unicode.SpecialCase
Title -> unicode.{ToTitle,IsLetter,IsDigit,IsSpace}
TrimSpace -> unicode.IsSpace
EqualFold -> unicode.SimpleFold
Of all of these, only Fields and TrimSpace are used to any significant degree. Even still, the implementation of unicode.IsSpace is fairly small and references a relatively small table.
Perhaps we should create an internal/unicodetables package that contains every table. The unicode package can depend on unicodetables, and other stdlib packages can depend on unicodetables directly, referencing only the tables they need.