Is there any bug of seg.ModeSegment? #143

hengfeiyang · 2022-02-28T04:47:44Z

seg.Segment and seg.ModeSegment return the same result. Is this a bug?

I expected the result of ModeSegment to look like that of seg.CutSearch.

Test code:

package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

var (
	seg  gse.Segmenter
	text = "《复仇者联盟3：无限战争》是全片使用IMAX摄影机拍摄制作的的科幻片."
)

func main() {
	seg.LoadDict()
	addToken()
	cut()
}

func addToken() {
	seg.AddToken("《复仇者联盟3：无限战争》", 100, "n")
}

// Segment text using DAG or HMM mode
func cut() {
	// "《复仇者联盟3：无限战争》是全片使用IMAX摄影机拍摄制作的的科幻片."

	// use DAG and HMM
	hmm := seg.Cut(text, true)
	fmt.Println("cut use hmm: ", hmm)
	// cut use hmm:  [《复仇者联盟3：无限战争》 是 全片 使用 imax 摄影机 拍摄 制作 的 的 科幻片 .]

	cut := seg.Cut(text)
	fmt.Println("cut: ", cut)
	// cut:  [《 复仇者 联盟 3 ： 无限 战争 》 是 全片 使用 imax 摄影机 拍摄 制作 的 的 科幻片 .]

	hmm = seg.CutSearch(text, true)
	fmt.Println("cut search use hmm: ", hmm)
	// cut search use hmm:  [复仇 仇者 联盟 无限 战争 复仇者 《复仇者联盟3：无限战争》 是 全片 使用 imax 摄影 摄影机 拍摄 制作 的 的 科幻 科幻片 .]
	fmt.Println("analyze: ", seg.Analyze(hmm, text))

	cut = seg.CutSearch(text)
	fmt.Println("cut search: ", cut)
	// cut search:  [《 复仇 者 复仇者 联盟 3 ： 无限 战争 》 是 全片 使用 imax 摄影 机 摄影机 拍摄 制作 的 的 科幻 片 科幻片 .]

	segment1 := seg.Segment([]byte(text))
	for i, token := range segment1 {
		fmt.Println(i, token.Token().Text())
	}
	segment2 := seg.ModeSegment([]byte(text), true)
	for i, token := range segment2 {
		fmt.Println(i, token.Token().Text())
	}
}

vcaesar · 2022-03-04T00:37:23Z

That is because seg.Segment() only returns the tokens on the shortest word path. You should use seg.Cut() or seg.ToSlice(token) instead.

hengfeiyang · 2022-03-04T00:40:25Z

OK, I will try other methods. What is ModeSegment designed for?

vcaesar · 2022-03-04T00:57:39Z

It is only used for text segmentation with the shortest-path algorithm. In search mode, some text produces more tokens.

hengfeiyang changed the title ~~is there any bug of seg.ModeSegment?~~ Is there any bug of seg.ModeSegment? Feb 28, 2022

vcaesar added the question label Mar 4, 2022

hengfeiyang closed this as completed Mar 4, 2022
