Support CP949 (Windows-949) #10

puzzlet · 2013-01-24T08:55:02Z

CP949 is a superset of EUC-KR (Korean) that defines extra characters. Almost all webpages that declare themselves as EUC-KR can safely be assumed to be in CP949, since CP949 has been the default locale of the Korean version of MS Windows.
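To illustrate the superset relationship, here is a minimal sketch using Python's built-in `euc_kr` and `cp949` codecs rather than charade itself. The sample characters are my own choice: "한" is in the shared repertoire, and "똠" is assumed to be one of the Hangul syllables that only the CP949 extension covers.

```python
# Illustrative only: relies on Python's standard codecs, not on charade.
common = "한"     # Hangul syllable in the shared EUC-KR / CP949 repertoire
extended = "똠"   # assumed to exist only in the CP949 extension

# Characters in the shared repertoire get identical bytes in both encodings.
same_bytes = common.encode("euc_kr") == common.encode("cp949")

# The extended syllable encodes to a two-byte CP949 sequence.
cp949_bytes = extended.encode("cp949")

# A strict EUC-KR decoder rejects that sequence.
try:
    cp949_bytes.decode("euc_kr")
    decodes_as_euckr = True
except UnicodeDecodeError:
    decodes_as_euckr = False

print(same_bytes, len(cp949_bytes), decodes_as_euckr)
```

A page containing even one such character is valid CP949 but not valid EUC-KR, which is why the declared label alone is not reliable.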

Here are the usage stats on the web, according to Google: http://googleblog.blogspot.kr/2010/01/unicode-nearing-50-of-web.html

We can support this by:

renaming EUCKR-related classes and constants to CP949
and patching the byte-sequence state machine in mbcssm.py

The frequency table can stay the same, since the added characters are among the least frequent.

sigmavirus24 · 2013-01-24T14:11:59Z

@puzzlet nowhere in that Google blog post do I see anything about this. They specifically use EUC-KR as a label for their graph, not CP949.

puzzlet · 2013-01-24T14:24:23Z

@sigmavirus24 Web programmers who work with Korean encodings often use the names EUC-KR and CP949 interchangeably. Only EUC-KR is recognized by some standards, while CP949 is the de facto encoding in which most pages are written, even when they say they're encoded in EUC-KR.

sigmavirus24 · 2013-01-24T14:29:08Z

CP949 (https://en.wikipedia.org/wiki/Code_page_949) isn't a standard recognized by IANA. I don't see a reason to rename anything. Creating a set of docs noting that a page encoded with CP949 will be detected as EUC-KR is fine with me.

puzzlet · 2013-01-24T14:38:53Z

I just said that CP949 has extra characters defined. There are tons of live webpages containing those characters, and as a result they fail to be detected as EUC-KR. The test case at ceecb4a is one example.

$ charade tests/CP949/ricanet.com.xml
tests/CP949/ricanet.com.xml: ISO-8859-2 with confidence 0.2285323490602884

To successfully detect CP949, we need a new state machine that accepts the newly introduced byte sequences.
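As a rough sketch of what "newly introduced byte sequences" means, the check below compares the two-byte structure of the two encodings. It is not charade's state machine from mbcssm.py. The helper names are hypothetical, and the byte ranges are my understanding of the encodings (EUC-KR: lead and trail bytes 0xA1-0xFE; CP949: lead bytes 0x81-0xFE with trail bytes 0x41-0x5A, 0x61-0x7A, or 0x81-0xFE), so treat them as assumptions.

```python
def _is_valid(data: bytes, lead_ok, trail_ok) -> bool:
    """Walk the bytes, accepting ASCII and well-formed two-byte sequences."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:          # ASCII byte stands alone
            i += 1
            continue
        # A non-ASCII byte must be a valid lead followed by a valid trail.
        if not lead_ok(data[i]) or i + 1 >= len(data) or not trail_ok(data[i + 1]):
            return False
        i += 2
    return True


def looks_like_euckr(data: bytes) -> bool:
    # Assumed EUC-KR structure: both bytes in 0xA1-0xFE.
    in_range = lambda b: 0xA1 <= b <= 0xFE
    return _is_valid(data, in_range, in_range)


def looks_like_cp949(data: bytes) -> bool:
    # Assumed CP949 structure: wider lead range plus extra trail ranges.
    lead = lambda b: 0x81 <= b <= 0xFE
    trail = lambda b: 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A or 0x81 <= b <= 0xFE
    return _is_valid(data, lead, trail)


sample = "똠방각하".encode("cp949")   # "똠" is assumed to be CP949-only
print(looks_like_euckr(sample), looks_like_cp949(sample))
```

This only checks byte structure. A real detector would also need the frequency analysis to assign a confidence.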

sigmavirus24 · 2013-01-24T17:24:23Z

In the interest of being entirely transparent, I'm not just going to move EUC-KR. I'm going to keep EUC-KR and add an entirely separate set of tools for CP949 since it is fairly prominent. I'm swamped right now, so if you want to submit a pull request that'd be great, otherwise I'll be hacking away at this slowly and methodically.

dalguji · 2015-01-11T12:29:52Z

Side note: if anyone drops by here looking for CP949 encoding detection, please note that it was implemented in chardet/chardet in Dec 2013.

ghost assigned puzzlet Jan 24, 2013

puzzlet mentioned this issue Jan 24, 2013

Support GB18030 #11

Closed

puzzlet added a commit that referenced this issue Jan 24, 2013

add CP949 test case for #10

ceecb4a

puzzlet added a commit that referenced this issue Jan 24, 2013

move EUC-KR tests under CP949 (for #10)

95bf484

puzzlet closed this as completed in 5590a5b Jan 24, 2013

sigmavirus24 added a commit that referenced this issue Jan 24, 2013

Revert "replace EUC-KR with its superset CP949, closes #10"

8ec0444

This reverts commit 5590a5b.

sigmavirus24 added a commit that referenced this issue Jan 24, 2013

Revert "move EUC-KR tests under CP949 (for #10)"

5f93c58

This reverts commit 95bf484.

sigmavirus24 added a commit that referenced this issue Jan 24, 2013

Revert "add CP949 test case for #10"

ed9f750

This reverts commit ceecb4a.

sigmavirus24 mentioned this issue Jan 24, 2013

Create some documentation #12

Closed

puzzlet reopened this Jan 24, 2013

sigmavirus24 closed this as completed in ae6dd25 Jan 25, 2013

sigmavirus24 added a commit that referenced this issue Jan 25, 2013

Merge pull request #13 from puzzlet/charade/feature/cp949

af28182

Support CP949, fixes #10

khwon mentioned this issue Nov 9, 2014

support for cp949 gnachman/iTerm2#188

Closed

sv24-archive locked and limited conversation to collaborators Jan 11, 2015
