Hi, I'm in the middle of implementing support for iterating over grapheme clusters in a project I'm working on, and it seems like something that would be a good fit for golang.org/x/text. I wanted to reach out to see how much interest there would be in this and whether I should work on making something that would fit into this project. I was thinking the interface could look somewhat like this (the naming is just a stand-in for now; I'm not a big fan of the name Decode):
// Decode reads the first grapheme cluster out of s and returns it. To get the length of the
// grapheme, simply take the len() of the return value.
func Decode(s string) string
I didn't want to go through the whole proposal process until I get an idea of whether there might be interest for this. I hope this is the right forum for this, if not, I'd appreciate being pointed to the right place.
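To make the proposed shape concrete, here is a minimal sketch of how a caller might iterate with a `Decode`-style function. The `Decode` body below is only a stand-in so the example runs: it groups a rune with any combining marks (Unicode category M) that follow it, which is an assumption made for illustration and not the full UAX #29 grapheme cluster algorithm (it ignores Hangul syllables, emoji ZWJ sequences, CRLF, and so on). `Clusters` is a hypothetical helper name, not part of any existing package.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Decode is a simplified stand-in for the proposed function. It returns the
// first rune of s together with any combining marks (category M) that
// directly follow it. This is NOT a complete UAX #29 implementation.
func Decode(s string) string {
	if s == "" {
		return ""
	}
	// Always consume the first rune (1 byte for invalid UTF-8).
	_, n := utf8.DecodeRuneInString(s)
	// Extend the cluster while the next rune is a combining mark.
	for n < len(s) {
		r, size := utf8.DecodeRuneInString(s[n:])
		if !unicode.Is(unicode.M, r) {
			break
		}
		n += size
	}
	return s[:n]
}

// Clusters (hypothetical helper) calls Decode repeatedly and advances the
// input by len() of each returned cluster, as the doc comment suggests.
func Clusters(s string) []string {
	var out []string
	for len(s) > 0 {
		g := Decode(s)
		out = append(out, g)
		s = s[len(g):]
	}
	return out
}

func main() {
	// "e" followed by U+0301 (combining acute accent), then "a".
	for _, g := range Clusters("e\u0301a") {
		fmt.Printf("%q (%d bytes)\n", g, len(g))
	}
}
```

Returning a substring keeps the API allocation-free, and the caller gets the byte length from `len()` without a second return value.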
I have a segment package planned, that would provide an API for defining any kind of segmentation. The advantage of a single API for grapheme, word, line, sentence, etc. breaking and segmentation is that it promotes reuse of sometimes complicated code.
It may be a while before this is done. In the meantime, however, you can already approximate grapheme cluster iteration using "golang.org/x/text/unicode/norm".Iter. Normalization segments are not exactly the same as grapheme clusters, but they are sufficiently close for many applications.
I might add Word Boundaries and Sentence Boundaries, too, at some point. But for now, it's not my main focus.
I don't know if there's any interest in moving this to x/text at some point. I'm open to that but I'd like to know the efforts and responsibilities that would come with that. Get in touch if you want to push this forward.
@mpvl I've been needing an implementation of this for a project recently and have been considering writing up a design document for it. However, it sounds like you've got a more general purpose API in mind already. Would you have the time to write that up and post it somewhere? If you aren't planning an implementation in the immediate future it's possible that I'll be writing one anyways, and I'd much rather write something that stands a chance of eventually being upstreamed. Thanks!