proposal: unicode: add emoji properties #45264

Closed
gudvinr opened this issue Mar 27, 2021 · 34 comments

Comments

@gudvinr

gudvinr commented Mar 27, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.2 linux/amd64

What do you propose?

There are a number of range tables in the unicode package of the stdlib that define some of the character properties from the Unicode Character Database.

Unicode also has additional sets of properties besides the ones defined in the core standard. These properties are described in technical reports.
Notably, UTS#51 defines a set of properties that determine which Unicode characters are emoji:

  • Emoji property - These characters are recommended for use as emoji
  • Extended_Pictographic property - These characters are pictographic
  • Emoji_Component property - These characters are used in emoji sequences
  • Emoji_Presentation property - A character that, by default, should appear with an emoji presentation
  • Emoji_Modifier - A character that can be used to modify the appearance of a preceding emoji
  • Emoji_Modifier_Base - A character whose appearance can be modified by a subsequent emoji modifier

The character property Regional_Indicator is already present in the unicode package.

Data source

At the time of writing, go1.16 contains range tables from Unicode 13.0.0.
Thus, the emoji properties should also be taken from the UCD emoji data for 13.0.0.

Package changes

New RangeTable variables (order follows emoji-data.txt):

  • Emoji = _Emoji
  • Emoji_Presentation = _Emoji_Presentation
  • Emoji_Modifier = _Emoji_Modifier
  • Emoji_Modifier_Base = _Emoji_Modifier_Base
  • Emoji_Component = _Emoji_Component
  • Extended_Pictographic = _Extended_Pictographic

Adding functions such as IsEmoji, IsEmojiModifier, IsEmojiModifierBase, etc. to check individual properties doesn't make much sense, since there is already the unicode.In function.
However, a function along the lines of IsEmojiData that checks the range tables for all emoji-related properties might be useful, e.g. to filter all emoji components out of text.
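
To illustrate the point about unicode.In, here is a minimal sketch. The tables emojiSubset and modifierSubset are hand-written, deliberately incomplete stand-ins (they are not the generated tables this proposal asks for), and stripEmoji is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// emojiSubset is an illustrative, incomplete table: it only covers the
// Emoticons block (U+1F600..U+1F64F), not the full Emoji property.
var emojiSubset = &unicode.RangeTable{
	R32: []unicode.Range32{{Lo: 0x1F600, Hi: 0x1F64F, Stride: 1}},
}

// modifierSubset covers the skin tone modifiers U+1F3FB..U+1F3FF.
var modifierSubset = &unicode.RangeTable{
	R32: []unicode.Range32{{Lo: 0x1F3FB, Hi: 0x1F3FF, Stride: 1}},
}

// stripEmoji drops every rune that is in any of the tables above.
// unicode.In accepts several tables, so no per-property helper is needed.
func stripEmoji(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.In(r, emojiSubset, modifierSubset) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", stripEmoji("hi \U0001F600 there"))
}
```

With real generated tables, the same call would simply list them: unicode.In(r, Emoji, Emoji_Component, ...), assuming those variable names were adopted.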

To make these properties usable in the regexp package, their names (or corresponding abbreviations) should be included in Categories or Scripts.

Additional notes

Although UTS#51 defines emoji sequences, this issue does not cover them, since an emoji sequence consists of multiple characters and the unicode package has no concept of a "character sequence".

Examples in other languages

@seankhliao seankhliao changed the title unicode: add emoji properties proposal: unicode: add emoji properties Mar 27, 2021
@gopherbot gopherbot added this to the Proposal milestone Mar 27, 2021
@seankhliao
Member

cc @robpike

@ianlancetaylor
Contributor

CC @mpvl

@robpike
Contributor

robpike commented Mar 29, 2021

I suspect this would be better done in a separate package, probably not in the standard library, as I believe that the data set will grow substantial over time and most programs won't need it.

@Kimwing222

Duplicate of #40724

@robpike
Contributor

robpike commented Mar 29, 2021

That's not the right issue number for the duplicate, or else not the right issue for that duplicate number.

@smasher164
Member

I agree that this should be done outside the stdlib. The Unicode technical reports (UAX, UTS, UTR) make it clear that they are independent specifications, and that conformance to the Unicode standard does not imply conformance to the technical report. It's also unclear why emoji would be all we add, and not any of the other reports, like identifiers, script properties, etc.

I will say however, that the current API for constructing range tables is not very ergonomic. I ended up forgoing efficiency and used functions just so I had access to set union and set difference when combining range tables: https://github.com/smasher164/xid/blob/560c18f776900eb8c8b061d155309097f2f68545/xid.go.

@gudvinr
Author

gudvinr commented Mar 29, 2021

smasher164: The Unicode technical reports (UAX, UTS, UTR) make it clear that they are independent specifications, and that conformance to the Unicode standard does not imply conformance to the technical report

This isn't exactly true. While UTS and UTR documents indeed don't have to be implemented, a Unicode Standard Annex might be required by the Standard. There's a list of these in Chapter 3 of the Standard.

robpike: I suspect this would be better done in a separate package, probably not in the standard library
smasher164: I agree that this should be done outside the stdlib

As pointed out, UTS isn't part of the Unicode Standard, so I think it is quite reasonable not to put emoji stuff in unicode.
While natural languages do not usually grow over time, emoji data will grow, and forcing people to import data they don't want doesn't look good.

Although I feel that at least the range tables (without emoji sequences and other trickery) should live somewhere close to unicode, to keep them in sync with the rest of the unicode tables. There are a couple of reasons for this:

  • While UTS#51 isn't part of the Unicode Standard, it mentions that a number of Technical Standards (including UTS#51) "are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire".
    At the same time, x/text/unicode uses a different versioning scheme and has a separate release schedule. Not to mention it isn't v1 yet.
    Right now it has packages with UnicodeVersion = "12.0.0" while unicode has Version = "13.0.0". I think at some point an external package might just have a different Unicode version, and this is kind of confusing.
  • golang.org/x/text/unicode mostly contains tools to work with Unicode (except maybe /runenames which contains, well, rune names), and I only mentioned data for range tables.
    Although I think it is the right place for emoji-related tools, such as methods to work with emoji sequences if there's a need for them, as they do not depend on any version of the standard.

@smasher164
Member

I agree, I've often wanted unicode to be versioned separately from the standard library, especially when it came to keeping these properties in sync with newer versions of unicode. I ended up using build tags to hack around that:

unicodeTestVersion_114.go:

// +build go1.14,!go1.16

package xid

const unicodeTestVersion = "12.1.0"

unicodeTestVersion_116.go:

// +build go1.16

package xid

const unicodeTestVersion = "13.0.0"

@mpvl
Contributor

mpvl commented Apr 1, 2021

A natural place to put these properties would be x/text/unicode/emoji.

This repo already has a runenames package, which could naturally hold emoji names as well (although I'm unsure how this fits with emoji sequences).

The argument for putting it in core so it can be used in regexp is valid. However, this would mean including more tables in core that are currently missing. A more reasonable solution to support emoji in regexp would be to allow user-defined character classes, allowing users to add classes from x/text, for instance.

It should also be mentioned what the goal is of these tables. Depending on the application, rangetables may not be the best representation. Judging from UTS #51, for instance, a UTF-8 trie, which allows associating a set of related properties with a single rune, seems more appropriate. The x/text repo has all the infrastructure in place to generate such tries conveniently.
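
The idea of associating a set of related properties with a single rune can be sketched with a bitmask per code point. This is not a UTF-8 trie like the ones x/text generates; a sorted slice with binary search stands in for it, and the table is a tiny hand-picked sample rather than generated data. All names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// props is a bit set of UTS #51 properties for one rune.
type props uint8

const (
	Emoji props = 1 << iota
	EmojiPresentation
	EmojiModifier
	EmojiModifierBase
	EmojiComponent
	ExtendedPictographic
)

// entry assigns one property set to an inclusive range of runes.
type entry struct {
	lo, hi rune
	p      props
}

// sample is sorted by lo and non-overlapping. It is a small illustrative
// excerpt, not the full emoji-data.txt.
var sample = []entry{
	{0x0023, 0x0023, Emoji | EmojiComponent},                                                 // '#'
	{0x200D, 0x200D, EmojiComponent},                                                         // zero width joiner
	{0x1F3FB, 0x1F3FF, Emoji | EmojiPresentation | EmojiModifier | EmojiComponent},           // skin tone modifiers
	{0x1F44D, 0x1F44D, Emoji | EmojiPresentation | EmojiModifierBase | ExtendedPictographic}, // thumbs up
	{0x1F600, 0x1F600, Emoji | EmojiPresentation | ExtendedPictographic},                     // grinning face
}

// lookup returns all properties of r with a single binary search.
func lookup(r rune) props {
	i := sort.Search(len(sample), func(i int) bool { return sample[i].hi >= r })
	if i < len(sample) && sample[i].lo <= r {
		return sample[i].p
	}
	return 0
}

func main() {
	p := lookup(0x1F44D)
	fmt.Println("modifier base:", p&EmojiModifierBase != 0)
	fmt.Println("modifier:", p&EmojiModifier != 0)
}
```

One lookup answers several property questions at once, which is the advantage over consulting six separate range tables.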

@mpvl
Contributor

mpvl commented Apr 1, 2021

@smasher164: the x/text repo uses a similar trick. The generators are multi-version aware and will automatically add to/modify build tags for generated tables.

It uses it, however, to ensure that the versions align with the latest Go. Your comment, however, suggests that you would want the other way around: have Go adopt a later version. This gives rise to the idea that core could use tables from x/text directly. Core already uses x/text for various packages and x/text already generates the tables for core. So if instead core would use the tables from x/text, x/text could advance the unicode version ahead of core, while ensuring consistency between packages.

This obviously would require a separate proposal. There are some serious implications for this. Also, there are packages with hardcoded range tables. But all this could be worked around.

@gudvinr
Author

gudvinr commented Apr 1, 2021

I don't want to be "that guy", but which policies determine what gets included where? Earlier it was said that "conformance to the Unicode standard does not imply conformance to the technical report".
Does it mean that something that conforms to the Unicode Standard should be included in the stdlib (or at least considered worthy of inclusion)?

@mpvl I see that x/text/runenames already contains names for everything from the UCD. That includes emojis. The UCD essentially is UAX#44. From the document I mentioned earlier it is clear that UAX#44 "is considered part of Version 13.0 of the Unicode Standard", yet it is not included in the stdlib. The same goes for UAX#15 (normalization) and UAX#9 (bidirectional). But UAX#38 (unihan) is a part of the stdlib unicode package.

While x/text/unicode contains a number of annexes included in the Standard, it uses a different Unicode version, which makes it somewhat incompatible with the stdlib.

Despite UTS being an "independent specification" by definition, the Standard itself clearly mentions that only a few UTS are synchronized with its version: UTS#10 (collation), UTS#39 (security), UTS#46 (idna) and UTS#51 (emoji).
Being compatible with an earlier version may be fine for tool packages, but it is very frustrating for packages that contain databases. That includes emojis and runenames too.

For example, conformance to Unicode 13 means that a package should contain "Khitan Small Script". And unicode indeed does. But runenames does not. Also, in the documentation of runenames the link to the UCD points to the "latest" version, but the tables are from 12.0.0.
I can't say that Khitan is a popular language, but that made me think that unicode-related packages are now in a somewhat messy state.

However, this would mean including more tables in core that are currently missing

Since the other tables are probably covered by other Technical Reports, adding a single one doesn't imply that the others should be included too. These are still independent specs.

A more reasonable solution to support emoji in regexp would be to allow user-defined character classes, allowing users to add classes from x/text, for instance.

I agree that there should be a more flexible way to do so. Although I don't think I would be able to write a proper proposal for that.

It should also be mentioned what the goal is of these tables.

I can't speak for others, but I ended up in a situation where I need to be aware of emojis in text. First, to correctly remove such characters from text or replace them with non-graphical representations (e.g. using names from the UCD). Second, to count the number of characters when a single emoji or emoji sequence represents a single "character".
It seemed that range tables and regexp support would be sufficient, and they are already used by the Go library to represent language scripts.
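
The second use case, counting an emoji sequence as one "character", can be approximated as below. This is a rough sketch and explicitly not an implementation of UTS #51 sequences or UAX #29 grapheme clusters: it only glues skin tone modifiers, VARIATION SELECTOR-16 and zero-width-joiner chains onto the preceding rune. countApprox is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countApprox counts runes, but treats emoji modifiers, U+FE0F and
// anything joined by U+200D as part of the preceding element.
func countApprox(s string) int {
	n := 0
	joined := false // true if the previous rune was a zero width joiner
	for _, r := range s {
		switch {
		case r == 0x200D:
			joined = true
			continue // keep the flag set for the next rune
		case r == 0xFE0F || (r >= 0x1F3FB && r <= 0x1F3FF):
			// Variation selector or skin tone modifier: extends the
			// previous element, so it is not counted.
		case joined:
			// Rune glued to the previous element by a joiner.
		default:
			n++
		}
		joined = false
	}
	return n
}

func main() {
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467" // man, ZWJ, woman, ZWJ, girl
	fmt.Println(utf8.RuneCountInString(family), countApprox(family))
}
```

For production use, a proper grapheme cluster segmenter would be the safer choice; the sketch only shows why per-rune properties are the building block for such counting.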

@mpvl
Contributor

mpvl commented Apr 1, 2021

While x/text/unicode contains a number of annexes included in the Standard, it uses a different Unicode version, which makes it somewhat incompatible with the stdlib.

The stdlib tables are generated from x/text. Core even depends on x/text and build tags in x/text ensure that the Unicode version of x/text is matched to that of core. So tip of x/text is ahead in Unicode version compared to core.

For example, conformance to Unicode 13 means that a package should contain "Khitan Small Script". And unicode indeed does. But runenames does not. Also, in the documentation of runenames the link to the UCD points to the "latest" version, but the tables are from 12.0.0.

That seems like a bug in runenames' generate script if true. It should update automatically with a Unicode upgrade. @nigeltao.

@smasher164
Member

smasher164 commented Apr 3, 2021

@mpvl

A more reasonable solution to support emoji in regexp would be to allow user-defined character classes, allowing users to add classes from x/text, for instance.

I could imagine an API like

func RegisterClass(name string, table *unicode.RangeTable)

in either regexp or regexp/syntax. Or if it needed to be scoped per *regexp.Regexp, an alternative constructor like

func WithClass(name string, table *unicode.RangeTable, expr string) (*Regexp, error)

Either way, this would be a separate proposal.

Your comment, however, suggests that you would want the other way around: have Go adopt a later version.

I could imagine the stdlib being behind the supported version in x/text. That way, for example, someone who wanted to use unicode 13 functionality on Go 1.15 could simply import x/text/unicode.


@gudvinr

Maybe the way forward here is to either define these properties in x/text, and file a proposal for regexp?

@mpvl
Contributor

mpvl commented Apr 3, 2021

@gudvinr

At the same time, x/text/unicode uses a different versioning scheme and has a separate release schedule.

Core Unicode tables are generated from x/text and core even imports x/text for various use cases, like normalization. Also, the x/text tables use build tags to keep these tables in sync.
It's a bug for core Unicode packages in x/text to not be updated to the right version.

Theoretically, core could refer to x/text for all its tables, which would allow getting rid of the build tag trick and would allow using newer Unicode versions independently from the Go version. That needs some serious thought and some adjustment to existing packages like strconv IIRC.

@mpvl
Contributor

mpvl commented Apr 3, 2021

@smasher164

I could imagine an API like

Something like that. Passing a function with a signature func(rune) bool instead of a range table makes more sense to me, though. It doesn't always make sense to represent rune properties as a range table (for instance for bidi and, I suspect, emoji).

@gudvinr
Author

gudvinr commented Apr 3, 2021

It's a bug for core Unicode packages in x/text to not be updated to the right version.

I figured out what's wrong. It is not a bug in x/text and not a package issue per se. I suppose the build environment for pkgsite uses some older Go release and picks the older table, which has // +build go1.14,!go1.16. There's no indication of that on pkgsite, and it pulls the latest stable release for Go itself.
And since browsing the local package cache isn't very convenient, I never tried to look there. But after you mentioned the build tag trick I dug through the commit history and that became clear.

Theoretically, core could refer to x/text for all its tables

I personally do not like the idea of pulling v0 packages into somewhat stable releases of Go.
However, is it possible to use emojis and their properties as an experimental playground first, and based on the results of this experiment make changes to the rest of the tables later?

Whether or not you plan on using range tables for these kinds of characters, I suppose it is now decided to put them in a separate package within the x/text repository. This is a good thing in the sense that it makes it possible to also add other emoji-related properties and functionality described in UTS#51 in the future.

@gudvinr
Author

gudvinr commented Apr 3, 2021

Maybe the way forward here is to either define these properties in x/text, and file a proposal for regexp?

I think that makes sense, yes. An API for pluggable character classes for regexp is fine for me and probably covers other use cases too. It would be wise to file a separate proposal and discuss the details of the implementation there.

@rsc
Contributor

rsc commented Apr 7, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Apr 8, 2021

Even if we added these to unicode.Properties, regexp only does Categories and Scripts.
And these emoji properties are properties, not categories or scripts.

Do you need emoji things in regexp, or was that just brought up for completeness?

@gudvinr
Author

gudvinr commented Apr 8, 2021

Do you need emoji things in regexp

I think I do not need regexp support. At least for me regexp isn't a top priority.
If emoji support eventually lands in either unicode or x/text and is at least as convenient to use as properties in regexp would be, then I can live with it.

But it's not a simple question, to be honest. In a short time span I had to solve multiple unrelated problems with emojis. I feel that some of them could be solved more easily using some sort of property handles in regexp.

regexp only does Categories and Scripts

Is there any reason for that? I found that out when I looked through the regexp sources, but it's not clear why \p{Dash}, \p{Hyphen} and such are ignored. If I'm not wrong, the ICU library doesn't have such limitations, for example.
I do not imply that "if some other %thing% does it then Go should too", though.

@rsc
Contributor

rsc commented Apr 8, 2021

I don't remember why I left Property out. Possibly it just seemed like too much for too little benefit.
Category and Script are more clearly useful.

@smasher164
Member

As an anecdote, the python regex package I used to test my identifier validating library does support properties. I suppose if the regexp package supports user-definable properties, it wouldn't have the burden of adding them all.

@rsc
Contributor

rsc commented Apr 14, 2021

As long as regexp is not a requirement, then adding these to unicode.Properties probably makes sense.
The thing I don't know is what else is missing from unicode.Properties.
Can someone cross-check against the full Unicode property list and see what else is missing besides these emoji properties?

@gudvinr
Author

gudvinr commented Apr 14, 2021

Can someone cross-check against the full Unicode property list and see what else is missing besides these emoji properties?

I took a look at UAX#44 and marked with + what's in Properties since it's not much:

General

Name
Name_Alias
Block
Age
General_Category
Script
Script_Extensions
+White_Space (binary)
Alphabetic (binary)
Hangul_Syllable_Type
+Noncharacter_Code_Point (binary)
Default_Ignorable_Code_Point (binary)
+Deprecated (binary)
+Logical_Order_Exception (binary)
+Variation_Selector (binary)


Case

Uppercase (binary)
Lowercase (binary)
Lowercase_Mapping
Titlecase_Mapping
Uppercase_Mapping
Case_Folding
Simple_Lowercase_Mapping
Simple_Titlecase_Mapping
Simple_Uppercase_Mapping
Simple_Case_Folding
+Soft_Dotted (binary)
Cased (binary)
Case_Ignorable (binary)
Changes_When_Lowercased (binary)
Changes_When_Uppercased (binary)
Changes_When_Titlecased (binary)
Changes_When_Casefolded (binary)
Changes_When_Casemapped (binary)


Emoji (all binary)

Emoji
Emoji_Presentation
Emoji_Modifier
Emoji_Modifier_Base
Emoji_Component
Extended_Pictographic


Numeric

Numeric_Value
Numeric_Type
+Hex_Digit (binary)
+ASCII_Hex_Digit (binary)


Normalization

Canonical_Combining_Class
Decomposition_Mapping (not recommended)
Composition_Exclusion (binary) (not recommended)
Full_Composition_Exclusion (binary) (not recommended)
Decomposition_Type
FC_NFKC_Closure (deprecated)
NFC_Quick_Check
NFKC_Quick_Check
NFD_Quick_Check
NFKD_Quick_Check
Expands_On_NFC (binary) (deprecated)
Expands_On_NFD (binary) (deprecated)
Expands_On_NFKC (binary) (deprecated)
Expands_On_NFKD (binary) (deprecated)
NFKC_Casefold
Changes_When_NFKC_Casefolded (binary)


Shaping and Rendering

+Join_Control (binary)
Joining_Group
Joining_Type
Vertical_Orientation
East_Asian_Width
+Prepended_Concatenation_Mark (binary)


Bidirectional

Bidi_Class
+Bidi_Control (binary)
Bidi_Mirrored (binary)
Bidi_Mirroring_Glyph
Bidi_Paired_Bracket
Bidi_Paired_Bracket_Type


Identifiers (all binary)

ID_Continue
ID_Start
XID_Continue
XID_Start
+Pattern_Syntax
+Pattern_White_Space


Segmentation

Line_Break
Grapheme_Cluster_Break
Sentence_Break
Word_Break


CJK

+Ideographic (binary)
+Unified_Ideograph (binary)
+Radical (binary)
+IDS_Binary_Operator (binary)
+IDS_Trinary_Operator (binary)
Unicode_Radical_Stroke
Equivalent_Unified_Ideograph


Miscellaneous

Math (binary)
+Quotation_Mark (binary)
+Dash (binary)
+Hyphen (binary) (deprecated, stabilized)
+Sentence_Terminal (binary)
+Terminal_Punctuation (binary)
+Diacritic (binary)
+Extender (binary)
Grapheme_Base (binary)
Grapheme_Extend (binary)
Grapheme_Link (binary) (deprecated)
Unicode_1_Name
ISO_Comment (deprecated, stabilized)
+Regional_Indicator (binary)
Indic_Positional_Category
Indic_Syllabic_Category


Contributory Properties (not recommended)

+Other_Alphabetic (binary)
+Other_Default_Ignorable_Code_Point (binary)
+Other_Grapheme_Extend (binary)
+Other_ID_Start (binary)
+Other_ID_Continue (binary)
+Other_Lowercase (binary)
+Other_Math (binary)
+Other_Uppercase (binary)
Jamo_Short_Name

@ZekeLu
Contributor

ZekeLu commented Apr 15, 2021

This is a copy of @gudvinr's answer above, with missing properties highlighted.

 General
 
 Name
 Name_Alias
 Block
 Age
 General_Category
 Script
 Script_Extensions
+White_Space
 Alphabetic
 Hangul_Syllable_Type
+Noncharacter_Code_Point
 Default_Ignorable_Code_Point
+Deprecated
+Logical_Order_Exception
+Variation_Selector
 
 
 Case
 
 Uppercase
 Lowercase
 Lowercase_Mapping
 Titlecase_Mapping
 Uppercase_Mapping
 Case_Folding
 Simple_Lowercase_Mapping
 Simple_Titlecase_Mapping
 Simple_Uppercase_Mapping
 Simple_Case_Folding
+Soft_Dotted
 Cased
 Case_Ignorable
 Changes_When_Lowercased
 Changes_When_Uppercased
 Changes_When_Titlecased
 Changes_When_Casefolded
 Changes_When_Casemapped
 
 
 Emoji
 
 Emoji
 Emoji_Presentation
 Emoji_Modifier
 Emoji_Modifier_Base
 Emoji_Component
 Extended_Pictographic
 
 
 Numeric
 
 Numeric_Value
 Numeric_Type
+Hex_Digit
+ASCII_Hex_Digit
 
 
 Normalization
 
 Canonical_Combining_Class
 Decomposition_Mapping (not recommended)
 Composition_Exclusion (not recommended)
 Full_Composition_Exclusion (not recommended)
 Decomposition_Type
 FC_NFKC_Closure (deprecated)
 NFC_Quick_Check
 NFKC_Quick_Check
 NFD_Quick_Check
 NFKD_Quick_Check
 Expands_On_NFC (deprecated)
 Expands_On_NFD (deprecated)
 Expands_On_NFKC (deprecated)
 Expands_On_NFKD (deprecated)
 NFKC_Casefold
 Changes_When_NFKC_Casefolded
 
 
 Shaping and Rendering
 
+Join_Control
 Joining_Group
 Joining_Type
 Vertical_Orientation
 East_Asian_Width
+Prepended_Concatenation_Mark
 
 
 Bidirectional
 
 Bidi_Class
+Bidi_Control
 Bidi_Mirrored
 Bidi_Mirroring_Glyph
 Bidi_Paired_Bracket
 Bidi_Paired_Bracket_Type
 
 
 Identifiers
 
 ID_Continue
 ID_Start
 XID_Continue
 XID_Start
+Pattern_Syntax
+Pattern_White_Space
 
 
 Segmentation
 
 Line_Break
 Grapheme_Cluster_Break
 Sentence_Break
 Word_Break
 
 
 CJK
 
+Ideographic
+Unified_Ideograph
+Radical
+IDS_Binary_Operator
+IDS_Trinary_Operator
 Unicode_Radical_Stroke
 Equivalent_Unified_Ideograph
 
 
 Miscellaneous
 
 Math
+Quotation_Mark
+Dash
+Hyphen (deprecated, stabilized)
+Sentence_Terminal
+Terminal_Punctuation
+Diacritic
+Extender
 Grapheme_Base
 Grapheme_Extend
 Grapheme_Link (deprecated)
 Unicode_1_Name
 ISO_Comment (deprecated, stabilized)
+Regional_Indicator
 Indic_Positional_Category
 Indic_Syllabic_Category
 
 
 Contributory Properties (not recommended)
 
+Other_Alphabetic
+Other_Default_Ignorable_Code_Point
+Other_Grapheme_Extend
+Other_ID_Start
+Other_ID_Continue
+Other_Lowercase
+Other_Math
+Other_Uppercase
 Jamo_Short_Name

@beoran

beoran commented Apr 15, 2021

Just to chip in: the missing properties would be useful for Go GUI libraries, in particular for implementing bi-directional and complex script rendering. But x/text might be just as good a place to keep them as unicode/.

@mpvl
Contributor

mpvl commented Apr 15, 2021

To add my view: especially if there is not going to be regexp support for properties, it doesn't make sense to add these properties to the set of properties for package unicode.

Many of the "unsupported" properties are already supported in x/text, just not as RangeTables. Some of these tables, such as the Normalization and Bidi related tables, are even included in core. Adding these to Properties would just bloat the unicode package.

The reason why x/text didn't use RangeTables for many of these properties is that such properties are often not useful in isolation. This holds true for Case-, Normalization-, Bidi-, Grapheme-, Identifier-, and I suspect also Emoji-related properties. Folding these properties into a single per-rune/per-topic trie data structure has proven to give significant performance benefits. The packages cases, norm, bidi, precis, and idna, for instance, all follow this pattern.

I could imagine that a selection of these properties would be useful for regexp, though.

@mpvl
Contributor

mpvl commented Apr 17, 2021

Note, btw, that the list of unsupported properties includes non-boolean properties (such as EastAsianWidth, included in x/text/unicode/width). These are not conveniently represented as range tables.

@gudvinr
Author

gudvinr commented Apr 18, 2021

Note, btw, that the list of unsupported properties includes non-boolean properties

Good point. Here's the list of only unsupported boolean properties:

Alphabetic
Default_Ignorable_Code_Point
Uppercase
Lowercase
Cased
Case_Ignorable
Changes_When_Lowercased
Changes_When_Uppercased
Changes_When_Titlecased
Changes_When_Casefolded 
Changes_When_Casemapped 
Emoji
Emoji_Presentation
Emoji_Modifier
Emoji_Modifier_Base
Emoji_Component
Extended_Pictographic
Changes_When_NFKC_Casefolded
Bidi_Mirrored
ID_Continue
ID_Start
XID_Continue
XID_Start
Math
Grapheme_Base
Grapheme_Extend
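
Some of the properties in this list are derived ones, so they can be assembled from tables that the unicode package already exports. The sketch below follows the derivations given in UAX #44 as I understand them (Alphabetic from the letter categories, Nl and the Other_* contributory properties; Math from Sm and Other_Math); isAlphabetic and isMath are hypothetical helpers, not part of the standard library.

```go
package main

import (
	"fmt"
	"unicode"
)

// isAlphabetic approximates the derived Alphabetic property:
// Lowercase + Uppercase + Lt + Lm + Lo + Nl + Other_Alphabetic, where
// Lowercase = Ll + Other_Lowercase and Uppercase = Lu + Other_Uppercase.
// unicode.L already covers Lu, Ll, Lt, Lm and Lo.
func isAlphabetic(r rune) bool {
	return unicode.In(r,
		unicode.L,
		unicode.Nl,
		unicode.Other_Alphabetic,
		unicode.Other_Lowercase,
		unicode.Other_Uppercase,
	)
}

// isMath approximates the derived Math property: Sm + Other_Math.
func isMath(r rune) bool {
	return unicode.In(r, unicode.Sm, unicode.Other_Math)
}

func main() {
	for _, r := range "a1+\u2160" {
		fmt.Printf("%q alphabetic=%v math=%v\n", r, isAlphabetic(r), isMath(r))
	}
}
```

The emoji properties are not derivable this way, since their source data (emoji-data.txt) is not part of the tables the unicode package ships.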

@rsc
Contributor

rsc commented Apr 21, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@gudvinr
Author

gudvinr commented Apr 22, 2021

So, properties can't be added to the Properties list.
You can't match properties using character classes in regexp and can't add custom character classes either.

What is the recommended way to go then?

@beoran

beoran commented Apr 22, 2021

Perhaps we could still include these properties in x/text? But I suppose that should be a new issue?

@rsc
Contributor

rsc commented Apr 28, 2021

@mpvl has some ideas about how to provide some info in x/text, but that would be a separate package.
It would probably still not hook up to regexp.

@rsc
Contributor

rsc commented Apr 28, 2021

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Apr 28, 2021
@golang golang locked and limited conversation to collaborators Apr 28, 2022