The results of seg.Segment and seg.ModeSegment are the same. Is this a bug?
I thought the result of ModeSegment should look like that of seg.CutSearch.
Test code:
package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

var (
	seg  gse.Segmenter
	text = "《复仇者联盟3:无限战争》是全片使用IMAX摄影机拍摄制作的的科幻片."
)

func main() {
	seg.LoadDict()
	addToken()
	cut()
}

func addToken() {
	seg.AddToken("《复仇者联盟3:无限战争》", 100, "n")
}

// Segment the text in DAG or HMM mode.
func cut() {
	// "《复仇者联盟3:无限战争》是全片使用IMAX摄影机拍摄制作的的科幻片."
	// use DAG and HMM
	hmm := seg.Cut(text, true)
	fmt.Println("cut use hmm: ", hmm)
	// cut use hmm: [《复仇者联盟3:无限战争》 是 全片 使用 imax 摄影机 拍摄 制作 的 的 科幻片 .]

	cut := seg.Cut(text)
	fmt.Println("cut: ", cut)
	// cut: [《 复仇者 联盟 3 : 无限 战争 》 是 全片 使用 imax 摄影机 拍摄 制作 的 的 科幻片 .]

	hmm = seg.CutSearch(text, true)
	fmt.Println("cut search use hmm: ", hmm)
	// cut search use hmm: [复仇 仇者 联盟 无限 战争 复仇者 《复仇者联盟3:无限战争》 是 全片 使用 imax 摄影 摄影机 拍摄 制作 的 的 科幻 科幻片 .]
	fmt.Println("analyze: ", seg.Analyze(hmm, text))

	cut = seg.CutSearch(text)
	fmt.Println("cut search: ", cut)
	// cut search: [《 复仇 者 复仇者 联盟 3 : 无限 战争 》 是 全片 使用 imax 摄影 机 摄影机 拍摄 制作 的 的 科幻 片 科幻片 .]

	// Compare the tokens returned by Segment and ModeSegment.
	segment1 := seg.Segment([]byte(text))
	for i, token := range segment1 {
		fmt.Println(i, token.Token().Text())
	}

	segment2 := seg.ModeSegment([]byte(text), true)
	for i, token := range segment2 {
		fmt.Println(i, token.Token().Text())
	}
}
That is because seg.Segment() only returns the tokens on the shortest word path. You should use seg.Cut() or seg.ToSlice(token) instead.
OK, I will try the other methods. What is ModeSegment designed for?
It is only used for shortest-path text segmentation. In search mode, some texts yield more tokens.