Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COMMON: Encoding conversion #1809

Merged
merged 37 commits into from Aug 24, 2019

Conversation

@vyzigold
Copy link
Contributor

commented Aug 16, 2019

This pull request adds a class Common::Encoding, that allows conversion between different character encodings and Cyrillic to ASCII transliteration. To support as many encodings as possible, it also adds an option to compile ScummVM with iconv.

In the Common::Encoding class I tried to use everything, that is available to convert the encoding. So the class does the following:

  1. Try to use iconv (if iconv is available).
  2. Try to use backend specific conversion.
  3. Try to use TransMan mapping to either convert to the desired encoding, or to convert to UTF-32 and then try iconv and backend conversion again.

2 commits (94ef429, 3929c8c) are cherry-picked from the TTS pull request (#1808), because I needed to implement encoding conversion there too.

vyzigold added 27 commits Jul 25, 2019
WIN32: Implement conversion to and from UTF-32
UTF-32 is used in transliteration in Common::Encoding, so it is
pretty important encoding and Windows should be the only thing,
that cannot convert it.
TEST: Remove tests for ascii transliteration
This can be handled differently by each conversion method.
The "Šáleček" could be transliterated as "Salecek" or as
"S'alecek" or maybe even differently.
TESTBED: Move encoding conversion tests to testbed
This way it is possible to test the backend conversions too.
@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

In case anybody is wondering, why I used the Testbed engine to test the conversion instead of the CxxTest in test/. It is because the conversion uses backend code and needs the OSystem to be initialized (and it isn't with the CxxTest).

engines/testbed/encoding.h Outdated Show resolved Hide resolved
engines/testbed/encoding.h Outdated Show resolved Hide resolved
configure Outdated Show resolved Hide resolved
backends/platform/sdl/sdl.cpp Outdated Show resolved Hide resolved
#include "backends/platform/sdl/win32/codepage.h"
namespace Win32 {

int getCodePageId(Common::String codePageName) {

This comment has been minimized.

Copy link
@ccawley2011

ccawley2011 Aug 16, 2019

Member

Is it necessary for this one function to be in a separate file?

This comment has been minimized.

Copy link
@vyzigold

vyzigold Aug 19, 2019

Author Contributor

I moved it inte the codepage.h, so it's not in a separate file

common/encoding.cpp Outdated Show resolved Hide resolved
common/encoding.h Show resolved Hide resolved
common/encoding.cpp Outdated Show resolved Hide resolved
vyzigold and others added 4 commits Aug 19, 2019
TESTBED: Comment correction
Co-Authored-By: Cameron Cawley <ccawley2011@gmail.com>
SDL: Remove check for SDL2 in convertEncoding()
SDL_iconv_string() is available even with SDL1
@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 19, 2019

@lephilousophe @ccawley2011 I hope I correctly addressed all of the issues you pointed out.

@lephilousophe

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

For my comment it's OK :)

vyzigold added 2 commits Aug 20, 2019
WIN32: Fix infinite loop when converting from utf32
Because of how cyrilic transliteration and UTF-32 is handled on
Windows, it was unfortunately possible to get into an infinite
loop of conversions. The string would get converted to UTF-32
when transliterating, but because windows backend conversion
cannot convert from UTF-32, it would use Common::Ustr to convert
it to UTF-8, which would again get converted to UTF-32 when
transliterating and so on.
backends/platform/sdl/win32/win32.cpp Show resolved Hide resolved
common/encoding.cpp Outdated Show resolved Hide resolved
common/encoding.cpp Outdated Show resolved Hide resolved
common/encoding.cpp Show resolved Hide resolved
common/encoding.h Show resolved Hide resolved
common/encoding.h Show resolved Hide resolved

@vyzigold vyzigold force-pushed the vyzigold:encoding2 branch 4 times, most recently from 0f81c43 to 6d14bea Aug 21, 2019

@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 24, 2019

Thanks for your work! Merging

@bluegr bluegr merged commit 3cb57e2 into scummvm:master Aug 24, 2019

1 of 2 checks passed

Codacy/PR Quality Review Not up to standards. This pull request quality could be better.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@tsoliman

This comment has been minimized.

Copy link
Member

commented Aug 25, 2019

commit 1346dcc broke TESTBED compilation on macOS for me. This is because the system iconv.h has a lot of forbidden symbols and is included from TESTBED via encoding.h.

Here's the include chain:

In file included from ../engines/testbed/testbed.cpp:41:
In file included from ../engines/testbed/encoding.h:27:
In file included from ../common/encoding.h:31:
In file included from /opt/local/include/iconv.h:110:

A hack/workaround I did in the meantime is to #define FORBIDDEN_SYMBOL_ALLOW_ALL in engines/testbed/testbed.cpp which seems wrong because engines shouldn't have forbidden symbols. Then again TESTBED is now using BACKEND code to test encoding so it is no longer a pure engine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.