x/text/encoding/traditionalchinese: wrong coding mapping #21910

Open
beikege opened this issue Sep 16, 2017 · 8 comments

@beikege beikege commented Sep 16, 2017

What version of Go are you using (go version)?

1.9
go get -u golang.org/x/text/

What did you do?

package main

import (
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/traditionalchinese"
)

func main() {
	str := "包"
	b, err := traditionalchinese.Big5.NewEncoder().Bytes([]byte(str))
	if err != nil {
		log.Fatalln(err)
	}
	r, _ := utf8.DecodeRuneInString(str)
	fmt.Printf("unicode:0x%X big5:0x%X\n", r, b) // prints big5:0xFABD; expected big5:0xA55D
}

What did you expect to see?

unicode:0x5305 big5:0xA55D

What did you see instead?

unicode:0x5305 big5:0xFABD

References:
http://moztw.org/docs/big5/table/cp950-u2b.txt
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
https://encoding.spec.whatwg.org/index-big5.txt

@mpvl

@gopherbot gopherbot added this to the Unreleased milestone Sep 16, 2017

@ghost ghost commented Sep 20, 2017

Does a grep of the following files clarify anything?

encoding/simplifiedchinese/tables.go

encoding/traditionalchinese/tables.go

@ghost ghost commented Sep 20, 2017

% grep A55D golang.org/x/text/encoding/traditionalchinese/tables.go
% grep B0A8 golang.org/x/text/encoding/traditionalchinese/tables.go
39340 - 11904: 0xB0A8,
%

Try substituting the character "馬".

@beikege beikege commented Sep 20, 2017

package main

import (
	"bytes"
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/traditionalchinese"
)

func main() {

	src := []byte{165, 93} //big5 : 包

	// big5 to utf8
	b1, err := traditionalchinese.Big5.NewDecoder().Bytes(src)
	if err != nil {
		log.Fatalln(err)
	}

	r, _ := utf8.DecodeRune(b1)
	fmt.Printf("包 unicode:0x%X big5:0x%X\n", r, src)

	// utf8 to big5
	b2, err := traditionalchinese.Big5.NewEncoder().Bytes(b1)
	if err != nil {
		log.Fatalln(err)
	}

	// not equal: the encoder returns [250 189] (0xFABD) instead of [165 93] (0xA55D)
	fmt.Println(src, b2, bytes.Equal(src, b2))

	fmt.Println("--------------------------")

	src = []byte{176, 168} //big5 : 馬

	// big5 to utf8
	b1, err = traditionalchinese.Big5.NewDecoder().Bytes(src)
	if err != nil {
		log.Fatalln(err)
	}

	r, _ = utf8.DecodeRune(b1)
	fmt.Printf("馬 unicode:0x%X big5:0x%X\n", r, src)

	// utf8 to big5
	b2, err = traditionalchinese.Big5.NewEncoder().Bytes(b1)
	if err != nil {
		log.Fatalln(err)
	}

	// equal: 馬 round-trips to the same bytes, [176 168] (0xB0A8)
	fmt.Println(src, b2, bytes.Equal(src, b2))

}

Output:

包 unicode:0x5305 big5:0xA55D
[165 93] [250 189] false
--------------------------
馬 unicode:0x99AC big5:0xB0A8
[176 168] [176 168] true
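
The byte slices in this output are printed in decimal, while the original report uses hex. A small sketch to connect the two notations (the helper name `hexCode` is hypothetical and uses only the standard library):

```go
package main

import "fmt"

// hexCode renders a byte sequence in the hex notation used in the
// report, for example 0xA55D. The name is illustrative only.
func hexCode(b []byte) string {
	return fmt.Sprintf("0x%X", b)
}

func main() {
	expected := []byte{165, 93} // Big5 bytes fed to the decoder for 包
	got := []byte{250, 189}     // bytes returned by the encoder for 包
	fmt.Println(hexCode(expected), hexCode(got)) // prints "0xA55D 0xFABD"
}
```

So `[165 93]` is the expected 0xA55D and `[250 189]` is the 0xFABD seen in the first report.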

@ghost ghost commented Sep 22, 2017

@ianlancetaylor ianlancetaylor commented Sep 22, 2017

CC @mpvl

@ghost ghost commented Oct 4, 2017

包 bag
馬 horse
