x/text: add grapheme cluster iteration #14820
Hi, I'm in the middle of implementing support for iterating over grapheme clusters in a project that I am working on and it seems like something that would be a good fit for the
package grapheme

// Decode reads the first grapheme cluster out of s and returns it. To get the length of the
// grapheme cluster, simply take the len() of the return value.
func Decode(s string) string
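To show how a caller might loop with an API of this shape, here is a minimal sketch using only the standard library. The names decodeApprox and clusters are hypothetical, and the segmentation rule is a deliberate simplification (one rune plus any combining marks that follow it), not the full UAX #29 grapheme cluster algorithm, so it will get emoji sequences, Hangul jamo and CRLF wrong.

```go
package main

import (
	"unicode"
	"unicode/utf8"
)

// decodeApprox is a hypothetical, simplified stand-in for the proposed Decode.
// It returns the first rune of s together with any combining marks (Unicode
// category M) that directly follow it. ASSUMPTION: this is only a rough
// approximation of a grapheme cluster, not the UAX #29 rules.
func decodeApprox(s string) string {
	if s == "" {
		return ""
	}
	// Always consume the first rune (invalid UTF-8 is consumed one byte at a time).
	_, n := utf8.DecodeRuneInString(s)
	// Then extend the cluster over any following combining marks.
	for n < len(s) {
		r, size := utf8.DecodeRuneInString(s[n:])
		if !unicode.Is(unicode.M, r) {
			break
		}
		n += size
	}
	return s[:n]
}

// clusters splits s by calling decodeApprox repeatedly and advancing by the
// len() of each result, as the doc comment on Decode suggests.
func clusters(s string) []string {
	var out []string
	for len(s) > 0 {
		c := decodeApprox(s)
		out = append(out, c)
		s = s[len(c):]
	}
	return out
}
```

Calling clusters("e\u0301x") yields two elements: the "e" with its combining acute accent, followed by "x". Returning a substring rather than a length keeps the call site to a single value, at the cost of one extra len() call when advancing.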
I didn't want to go through the whole proposal process until I get an idea of whether there might be interest for this. I hope this is the right forum for this, if not, I'd appreciate being pointed to the right place.
I have a segment package planned that would provide an API for defining any kind of segmentation. The advantage of a single API for grapheme, word, line, and sentence breaking is that it promotes reuse of sometimes complicated code.
It may be a while before this is done. In the meantime, however, you can already approximate grapheme cluster iteration using "golang.org/x/text/unicode/norm".Iter. Normalization segments are not exactly the same as grapheme clusters, but they are sufficiently close for many applications.
Because the normalization package didn't do the trick in many cases, I went ahead and implemented grapheme cluster segmentation in the following package:
It passes the grapheme cluster break test cases so I'm fairly confident that it works as expected. But since it's a new project, I appreciate any bug reports.
I might add Word Boundaries and Sentence Boundaries, too, at some point. But for now, it's not my main focus.
I don't know if there's any interest in moving this to