x/text/encoding/traditionalchinese: wrong coding mapping #21910

Open
beikege opened this issue Sep 16, 2017 · 8 comments

@beikege

commented Sep 16, 2017

What version of Go are you using (go version)?

1.9
go get -u golang.org/x/text/

What did you do?

package main

import (
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/traditionalchinese"
)

func main() {
	str := "包" // U+5305
	b, err := traditionalchinese.Big5.NewEncoder().Bytes([]byte(str))
	if err != nil {
		log.Fatalln(err)
	}
	r, _ := utf8.DecodeRuneInString(str)
	fmt.Printf("unicode:0x%X big5:0x%X\n", r, b) // prints big5:0xFABD; expected 0xA55D
}

What did you expect to see?

unicode:0x5305 big5:0xA55D

What did you see instead?

unicode:0x5305 big5:0xFABD

reference:
http://moztw.org/docs/big5/table/cp950-u2b.txt
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
https://encoding.spec.whatwg.org/index-big5.txt
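
A note on reading the last reference: index-big5.txt is keyed by "pointer" rather than by Big5 byte value, so the two codes above have to be converted before they can be looked up there. A minimal sketch of that conversion, assuming the pointer arithmetic described in the WHATWG Encoding Standard (157 trail positions per lead byte, lead bytes starting at 0x81); `pointerFromBytes` and `bytesFromPointer` are hypothetical helper names, not part of golang.org/x/text:

```go
package main

import "fmt"

// pointerFromBytes converts a two-byte Big5 sequence to an index pointer.
// Assumption: pointer = (lead-0x81)*157 + (trail-offset), with offset 0x40
// for trail bytes 0x40..0x7E and 0x62 for trail bytes 0xA1..0xFE.
func pointerFromBytes(lead, trail byte) (int, bool) {
	if lead < 0x81 || lead > 0xFE {
		return 0, false
	}
	var offset byte
	switch {
	case trail >= 0x40 && trail <= 0x7E:
		offset = 0x40
	case trail >= 0xA1 && trail <= 0xFE:
		offset = 0x62
	default:
		return 0, false
	}
	return int(lead-0x81)*157 + int(trail-offset), true
}

// bytesFromPointer is the inverse mapping for a non-negative pointer.
func bytesFromPointer(p int) (lead, trail byte) {
	lead = byte(p/157 + 0x81)
	t := p % 157
	if t < 0x3F {
		return lead, byte(t + 0x40)
	}
	return lead, byte(t + 0x62)
}

func main() {
	for _, code := range [][2]byte{{0xA5, 0x5D}, {0xFA, 0xBD}} {
		p, ok := pointerFromBytes(code[0], code[1])
		fmt.Printf("big5:0x%02X%02X pointer:%d ok:%v\n", code[0], code[1], p, ok)
	}
}
```

Under these assumptions 0xA55D corresponds to pointer 5681 and 0xFABD to pointer 19088, which are the rows to compare in the index file.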

@mpvl

@gopherbot added this to the Unreleased milestone Sep 16, 2017

@forskning

commented Sep 20, 2017

Does a grep of the following files clarify anything?

encoding/simplifiedchinese/tables.go

encoding/traditionalchinese/tables.go

@forskning

commented Sep 20, 2017

% grep A55D golang.org/x/text/encoding/traditionalchinese/tables.go
% grep B0A8 golang.org/x/text/encoding/traditionalchinese/tables.go
39340 - 11904: 0xB0A8,
%

Try substituting the character "馬".
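
For reference, the key in the grep hit above is the code point in decimal minus a fixed offset: 39340 is 0x99AC (馬) and 11904 is 0x2E80. A minimal sketch for building the same search key for another rune, assuming (not verified against tables.go) that other entries use the same `<rune> - 11904` form; `grepKey` and `tableOffset` are hypothetical names:

```go
package main

import "fmt"

// tableOffset is the constant subtracted in the grep hit "39340 - 11904".
// Assumption: other entries in the same table use the same offset.
const tableOffset = 11904

// grepKey returns the "<rune> - <offset>" text to search for in tables.go.
func grepKey(r rune) string {
	return fmt.Sprintf("%d - %d", r, tableOffset)
}

func main() {
	for _, r := range []rune{'馬', '包'} {
		fmt.Printf("%c U+%04X grep key: %q\n", r, r, grepKey(r))
	}
}
```

For 包 (U+5305) this gives `21253 - 11904`, which can be grepped for instead of the expected Big5 value A55D.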

@beikege

Author

commented Sep 20, 2017

package main

import (
	"bytes"
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/traditionalchinese"
)

func main() {

	src := []byte{165, 93} //big5 : 包

	// big5 to utf8
	b1, err := traditionalchinese.Big5.NewDecoder().Bytes(src)
	if err != nil {
		log.Fatalln(err)
	}

	r, _ := utf8.DecodeRune(b1)
	fmt.Printf("包 unicode:0x%X big5:0x%X\n", r, src)

	// utf8 to big5
	b2, err := traditionalchinese.Big5.NewEncoder().Bytes(b1)
	if err != nil {
		log.Fatalln(err)
	}

	// not equal
	fmt.Println(src, b2, bytes.Equal(src, b2))

	fmt.Println("--------------------------")

	src = []byte{176, 168} //big5 : 馬

	// big5 to utf8
	b1, err = traditionalchinese.Big5.NewDecoder().Bytes(src)
	if err != nil {
		log.Fatalln(err)
	}

	r, _ = utf8.DecodeRune(b1)
	fmt.Printf("馬 unicode:0x%X big5:0x%X\n", r, src)

	// utf8 to big5
	b2, err = traditionalchinese.Big5.NewEncoder().Bytes(b1)
	if err != nil {
		log.Fatalln(err)
	}

	// equal
	fmt.Println(src, b2, bytes.Equal(src, b2))

}

Output:

包 unicode:0x5305 big5:0xA55D
[165 93] [250 189] false
--------------------------
馬 unicode:0x99AC big5:0xB0A8
[176 168] [176 168] true
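
The byte slices in this output are printed in decimal, while the codes in the report are in hex. A small sketch for reading them side by side, using only the standard library; `big5Code` is a hypothetical helper and does no validation beyond the length check:

```go
package main

import "fmt"

// big5Code packs a two-byte sequence into a single 16-bit code value.
// It returns false if b is not exactly two bytes long.
func big5Code(b []byte) (uint16, bool) {
	if len(b) != 2 {
		return 0, false
	}
	return uint16(b[0])<<8 | uint16(b[1]), true
}

func main() {
	for _, b := range [][]byte{{165, 93}, {250, 189}, {176, 168}} {
		code, ok := big5Code(b)
		fmt.Printf("%v -> 0x%04X (ok=%v)\n", b, code, ok)
	}
}
```

So [165 93] is 0xA55D (the expected code for 包), [250 189] is 0xFABD (what the encoder returned), and [176 168] is 0xB0A8.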

@forskning

commented Sep 22, 2017

@forskning

commented Sep 22, 2017

@ianlancetaylor

Contributor

commented Sep 22, 2017

CC @mpvl

@forskning

commented Oct 4, 2017

包 bag
馬 horse
