Working with strings that contain Chinese characters #11

Open
kevinyan815 opened this issue Oct 22, 2019 · 0 comments

kevinyan815 commented Oct 22, 2019

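In Go, if a string contains Chinese characters, you cannot call len on it directly to count its characters. Go stores strings as UTF-8, so calling len on a string returns the number of bytes it contains, not the number of characters.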
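To make the difference concrete, here is a minimal sketch that compares the byte length with the character count of the same string. The helper name byteAndRuneLen is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// byteAndRuneLen is a hypothetical helper for illustration only.
// It returns the UTF-8 byte length of s and its number of runes (characters).
func byteAndRuneLen(s string) (byteLen, runeLen int) {
	return len(s), len([]rune(s))
}

func main() {
	// "Go" takes 1 byte per letter; each of the two Chinese characters takes 3 bytes.
	b, r := byteAndRuneLen("Go语言")
	fmt.Println(b, r) // 8 4
}
```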

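The correct approach is to convert the string to []rune and take the length of that slice. Truncating a string that contains Chinese characters works the same way: convert it to []rune, slice it, then convert the result back to string.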

// Truncate a name to its first four characters
package main

import (
	"fmt"
)

// NormalizeRealName truncates name to its first four characters.
func NormalizeRealName(name string) (realName string) {
	realNameRune := []rune(name)
	if len(realNameRune) <= 4 {
		realName = name
		return
	}

	realName = string(realNameRune[0:4])
	return
}

func main() {
	name := "欧阳正熊戊辰"
	
	fmt.Println(NormalizeRealName(name))
}

For counting the characters in a string, the standard library package unicode/utf8 provides the function utf8.RuneCountInString(s), which is more convenient.
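A short sketch of that call, reusing the name from the example above. The wrapper charCount is a hypothetical name added only so the behavior can be checked in isolation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charCount (hypothetical wrapper) counts characters without
// first converting the string to a []rune slice.
func charCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	name := "欧阳正熊戊辰"
	fmt.Println(len(name))       // 18: six characters at 3 bytes each
	fmt.Println(charCount(name)) // 6
}
```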

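A rune holds a Unicode code point in a fixed 4 bytes (it is an alias for int32). Most commonly used characters, however, need only 1 to 3 bytes when encoded as UTF-8: ASCII takes 1 byte and most Chinese characters take 3. Storing every character in 4 bytes would waste a lot of space, which is why Go strings use UTF-8, a variable-length encoding.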
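The variable width can be observed with utf8.RuneLen, which reports how many bytes a rune occupies once encoded as UTF-8. This is a small sketch; utf8Widths is a hypothetical helper, and the sample characters are chosen only to cover each width from 1 to 4 bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Widths (hypothetical) returns the UTF-8 encoded size, in bytes,
// of every character in s.
func utf8Widths(s string) []int {
	widths := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s { // ranging over a string yields runes, not bytes
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	// U+0041 'A', U+00E9 'é', U+4E2D '中', U+1F600 (an emoji).
	s := "A\u00e9\u4e2d\U0001F600"
	fmt.Println(utf8Widths(s)) // [1 2 3 4]

	// As []rune, each of the four characters would occupy 4 bytes (16 in total);
	// as a UTF-8 string they occupy 10.
	fmt.Println(len(s)) // 10
}
```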

Playground URL
