x/text: add grapheme cluster iteration #14820

Open
cgilling opened this issue Mar 14, 2016 · 3 comments

@cgilling

commented Mar 14, 2016

Hi, I'm in the middle of implementing support for iterating over grapheme clusters in a project I am working on, and it seems like it would be a good fit for golang.org/x/text. I wanted to reach out to see how much interest there is in this and whether I should work on something that would fit into that project. I was thinking the interface could look somewhat like this (the naming is just a stand-in for now; I'm not a big fan of the name Decode):

package grapheme

// Decode reads the first grapheme cluster out of s and returns it. To get the length of the
// grapheme cluster in bytes, simply take the len() of the return value.
func Decode(s string) string
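
To make the proposed signature concrete, here is a minimal, standard-library-only sketch of how a caller might iterate with it. The body of Decode below is an assumption made purely for illustration: it treats a cluster as one rune plus any combining marks that directly follow, which is much simpler than the Unicode grapheme cluster rules and is not the implementation described in this issue.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Decode returns the first approximate grapheme cluster of s: one rune plus
// any combining marks (Unicode category M) that directly follow it.
// ASSUMPTION: this is a simplified stand-in, not the full segmentation rules.
// It does not handle Hangul jamo sequences, emoji ZWJ sequences, regional
// indicator pairs, or CR LF.
func Decode(s string) string {
	if s == "" {
		return ""
	}
	// Always consume at least one rune (or one byte of invalid UTF-8).
	_, size := utf8.DecodeRuneInString(s)
	for size < len(s) {
		r, n := utf8.DecodeRuneInString(s[size:])
		if !unicode.IsMark(r) {
			break
		}
		size += n
	}
	return s[:size]
}

func main() {
	s := "e\u0301a" // "e" followed by a combining acute accent, then "a"
	for len(s) > 0 {
		g := Decode(s)
		fmt.Printf("%q (%d bytes)\n", g, len(g))
		s = s[len(g):] // advance by the byte length of the cluster
	}
}
```

Because Decode returns a prefix of its input, the caller advances by slicing off len(g) bytes, which is the usage the doc comment above hints at.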

I didn't want to go through the whole proposal process until I get an idea of whether there might be interest for this. I hope this is the right forum for this, if not, I'd appreciate being pointed to the right place.

Thanks

@bradfitz bradfitz changed the title adding grapheme cluster iteration to golang.org/x/text x/text: add grapheme cluster iteration Apr 9, 2016
@bradfitz bradfitz added this to the Unreleased milestone Apr 9, 2016
@mpvl

Member

commented Apr 10, 2016

I have a segment package planned that would provide an API for defining any kind of segmentation. The advantage of a single API for grapheme, word, line, sentence, etc. breaking and segmentation is that it promotes reuse of sometimes complicated code.

It may be a while before this is done. However, in the meantime, you can already approximate grapheme cluster iteration using "golang.org/x/text/unicode/norm".Iter. Normalization segments are not exactly the same as grapheme clusters, but they are sufficiently close for many applications.

@SamWhited

Member

commented Oct 4, 2016

See also #17256

@rivo


commented Mar 13, 2019

Because the normalization package didn't do the trick in many cases, I went ahead and implemented grapheme cluster segmentation in the following package:

https://github.com/rivo/uniseg

It passes the grapheme cluster break test cases so I'm fairly confident that it works as expected. But since it's a new project, I appreciate any bug reports.

I might add Word Boundaries and Sentence Boundaries, too, at some point. But for now, it's not my main focus.

I don't know if there's any interest in moving this to x/text at some point. I'm open to that, but I'd like to know what effort and responsibilities would come with it. Get in touch if you want to push this forward.
