Suggestion: improve CheckForHardcodedByName function #2078

Closed
spnethw opened this issue Mar 16, 2024 · 0 comments

spnethw (Contributor) commented Mar 16, 2024

Currently, not all of the encoding-name string constants that `CheckForHardcodedByName` compares against match the names uchardet actually outputs.

I suggest updating the function roughly as follows:
```cpp
#include <cstring>
#include <map>
...

// Maps an encoding name reported by uchardet to a code page number.
// Returns -1 if the name has no hardcoded mapping.
static int CheckForHardcodedByName(const char *cs)
{
    // Orders C strings by their contents rather than by pointer value,
    // so a lookup succeeds for any buffer holding the same name.
    struct cmp_str
    {
        bool operator()(char const *a, char const *b) const
        {
            return std::strcmp(a, b) < 0;
        }
    };

    // Built once, on the first call. Keys must match uchardet's output
    // exactly: the comparison is case-sensitive.
    static const std::map<const char*, int, cmp_str> encodings
        {
            {"UTF-16",CP_UTF16LE},
            {"UTF-32",CP_UTF32LE},
            {"UTF-8",CP_UTF8},
            {"ISO-8859-1",28591},          // Latin 1; Western European
            {"ISO-8859-2",28592},          // Latin 2; Central European
            {"ISO-8859-3",28593},          // Latin 3; South European
            {"ISO-8859-4",28594},          // Latin 4; Baltic
            {"ISO-8859-5",28595},          // Cyrillic
            {"ISO-8859-6",28596},          // Arabic
            {"ISO-8859-7",28597},          // Greek
            {"ISO-8859-8",28598},          // Hebrew
            {"ISO-8859-9",28599},          // Latin-5; Turkish
            {"ISO-8859-10",28600},         // Latin-6; Nordic
            {"ISO-8859-11",28601},         // Thai
            {"ISO-8859-13",28603},         // Latin-7; Baltic Rim (Estonian)
            {"ISO-8859-15",28605},         // Latin-9; Western European
            {"ISO-8859-16",28606},         // Latin-10; South-Eastern European
            {"TIS-620",28601},             // Thai
            {"MAC-CYRILLIC",10007},        // Cyrillic (Mac)
            {"MAC-CENTRALEUROPE",10029},   // Mac OS Central European
            {"KOI8-R",20866},              // Cyrillic
            {"EUC-JP",20932},              // Japanese
            {"ISO-2022-JP",50220},         // Japanese
            {"Johab",1361},                // Korean
            {"SHIFT_JIS",932},             // Japanese
            {"EUC-KR",51949},              // Korean
            {"UHC",949},                   // Korean
            {"ISO-2022-KR",50225},         // Korean
            {"BIG5",950},                  // Traditional Chinese
            {"GB18030",54936}              // Chinese Simplified
        };

    // The rest (not mapped here):
    // ASCII, EUC-TW, GEORGIAN-ACADEMY, GEORGIAN-PS, HZ-GB-2312, ISO-2022-CN, VISCII

    // std::strcmp must not be called with a null pointer.
    if (!cs)
        return -1;

    auto r = encodings.find(cs);
    return r == encodings.end() ? -1 : r->second;
}
```
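To check the lookup in isolation, here is a minimal, self-contained sketch: a trimmed copy of the proposed function with only a few table entries, plus a null-pointer guard as a defensive choice. The `CP_UTF16LE` and `CP_UTF8` values below are assumptions (the standard Windows code page identifiers 1200 and 65001); in the project these macros come from its own headers.

```cpp
#include <cstring>
#include <map>

// Assumed values: standard Windows code page identifiers. The real project
// defines these macros in its own headers, so only define them if missing.
#ifndef CP_UTF16LE
#define CP_UTF16LE 1200
#endif
#ifndef CP_UTF8
#define CP_UTF8 65001
#endif

// Trimmed copy of the proposed lookup; a handful of entries is enough to
// exercise the exact, case-sensitive matching done by std::strcmp.
static int CheckForHardcodedByName(const char *cs)
{
    // Compares string contents, not pointer addresses.
    struct cmp_str
    {
        bool operator()(const char *a, const char *b) const
        {
            return std::strcmp(a, b) < 0;
        }
    };

    // Built once on first use; keys are string literals with static storage.
    static const std::map<const char *, int, cmp_str> encodings{
        {"UTF-16", CP_UTF16LE},
        {"UTF-8", CP_UTF8},
        {"KOI8-R", 20866},
        {"Johab", 1361},
        {"SHIFT_JIS", 932},
    };

    // Guard against a null name before any strcmp call.
    if (!cs)
        return -1;

    auto r = encodings.find(cs);
    return r == encodings.end() ? -1 : r->second;
}
```

Because `cmp_str` compares contents, a name held in a separate buffer (for example `char buf[] = "KOI8-R";`) still resolves to 20866; with the default `std::less<const char*>` the map would compare addresses and such a lookup would fail. The match is also byte-exact: `"Johab"` resolves to 1361, while `"JOHAB"` or `"koi8-r"` return -1, as do names with no entry such as `"VISCII"`. This is why each constant has to use exactly the spelling uchardet reports.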