ofTrueTypeFont support for iso8859-1 on utf-8 encoded files #1539

Merged
merged 8 commits into from Jun 14, 2013

Owner

arturoc commented Sep 3, 2012

ofTrueTypeFont as it is right now supports all the characters in iso8859-1 if the flag bFullCharacterSet is set to true when loading, so it can render all the extended characters of the latin alphabet like ñàäáâç...

the problem is that most IDEs use utf-8 encoding by default, so when these characters are written in strings in the code or in text files they fail to render, because the encoding freetype is expecting is iso8859-1.

this PR converts utf-8 to iso8859-1, which solves the problem of rendering special characters of the latin alphabet. it might be good to test with code from other IDEs in case their default encoding is not utf-8
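To make the conversion concrete, here is a minimal sketch of a lenient utf-8 to iso8859-1 converter. It is not the function added by this pull request: the name `utf8ToLatin1` is hypothetical, and passing unrecognised bytes through unchanged is an assumption of the sketch. Every character in the range U+0080 to U+00FF is encoded in utf-8 as a lead byte 0xC2 or 0xC3 followed by one continuation byte, so only that pattern needs decoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration, not the code from this pull request.
// Decodes two-byte utf-8 sequences that map into iso8859-1 (U+0080..U+00FF)
// and copies every other byte through unchanged.
std::string utf8ToLatin1(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        const bool hasNext = i + 1 < utf8.size();
        const unsigned char next =
            hasNext ? static_cast<unsigned char>(utf8[i + 1]) : 0;
        if ((c == 0xC2 || c == 0xC3) && hasNext && (next & 0xC0) == 0x80) {
            // Lead byte carries the top 2 bits, continuation byte the low 6.
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
            ++i;  // consume the continuation byte
        } else {
            // ASCII, bytes that are already iso8859-1, or anything this
            // sketch does not understand: copy as-is.
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

With this approach a string that is already iso8859-1 mostly survives, but longer utf-8 sequences (for example €) are copied through as raw bytes rather than rejected.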

Contributor

damian0815 commented Sep 3, 2012

i have an app where i force latin-1 encoding in a source file (Xcode) using \xyyy format. i suspect this is going to break that...

Contributor

damian0815 commented Sep 3, 2012

although i could be wrong. it's very welcome to have oF be more aware of character encoding in any case! nice work.

Owner

arturoc commented Sep 3, 2012

it shouldn't break except for Â and Ã (A with a ~ on top), but still this is kind of a hack. perhaps we should be able to specify the default encoding for strings and set it to utf-8? having a method like:

font.setEncoding(OF_ENCODING_UTF8)

for now we can support utf-8 and iso-8859-1 (latin 1)
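The reason an explicit encoding setting helps can be sketched as follows: the same bytes are valid under both encodings, so the bytes alone cannot say which one was meant. The enum and function names below are hypothetical and only illustrate the idea; they are not the openFrameworks API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical enum for this sketch; not taken from openFrameworks.
enum class TextEncoding { Utf8, Latin1 };

// Counts the characters in `bytes` under the encoding the caller states.
std::size_t countCharacters(const std::string& bytes, TextEncoding encoding) {
    if (encoding == TextEncoding::Latin1) {
        return bytes.size();  // iso-8859-1: exactly one byte per character
    }
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        // utf-8 continuation bytes look like 10xxxxxx and do not start
        // a new character, so only the other bytes are counted.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

The two bytes 0xC3 0xB1 are one character (ñ) when read as utf-8 and two characters (Ã followed by ±) when read as iso-8859-1, which is why text beginning with Â or Ã is the case a guessing converter can get wrong.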

Contributor

damian0815 commented Sep 3, 2012

+1 being able to properly set encoding.

actually when i was working with ofTrueTypeFont earlier this year i thought that there should be ofTrueTypeFont::drawLatin1String and ofTrueTypeFont::drawUTF8String

@arturoc arturoc ofTrueTypeFont: add getEncoding/setEncoding
ofConstants: add encoding enum
ofUTF8ToISO88591: faster
154874f

@bakercp bakercp commented on an outdated diff Sep 3, 2012

libs/openFrameworks/utils/ofUtils.cpp
@@ -467,6 +467,22 @@ string ofBinaryToString(const string& value) {
}
//--------------------------------------------------
+string ofUTF8ToISO8859_1(const string & utf8){
Member

bakercp commented Sep 3, 2012

I did a bunch of work on this in ofxUnicode / ofxFont to wrap Poco's text converters. Snippets below. I ended up moving away from it in favor of ICU-based Unicode support, mainly because Poco lacks bidirectional UTF-8 iterators and support for more complex encodings. You can find more on the ICU branch here: https://github.com/bakercp/ofxUnicode/

Also you might check out the ofCharacterSet integration into ofxFont. Anyway, all of this is good progress in the right direction!

enum ofTextEncoding {
    OF_TEXT_ENCODING_UTF8 = 0        , // variable width encoding (http://en.wikipedia.org/wiki/UTF-8) backward compatible  w/ ASCII
    OF_TEXT_ENCODING_UTF16           , // 16-bit multi-byte (http://en.wikipedia.org/wiki/UTF-16)
    OF_TEXT_ENCODING_ASCII           , // 7-bit ASCII text encoding (http://en.wikipedia.org/wiki/ASCII)
    OF_TEXT_ENCODING_LATIN_1         , // 8-bit single-byte - (http://en.wikipedia.org/wiki/ISO/IEC_8859-1)
    OF_TEXT_ENCODING_LATIN_9         , // 8-bit single-byte - (http://en.wikipedia.org/wiki/ISO/IEC_8859-15), western chars + €
    OF_TEXT_ENCODING_WINDOWS_1252      // Superset of Latin 1 (ISO 8859-1) http://en.wikipedia.org/wiki/Windows-1252
};
//------------------------------------------------------------------
string ofTextConverter::convert(const ofBuffer& buffer, ofTextEncoding inputEncoding, ofTextEncoding outputEncoding) {
    return convert(buffer.getBinaryBuffer(), buffer.size(), inputEncoding, outputEncoding);
}

//------------------------------------------------------------------
string ofTextConverter::convert(const void* source, int length,  ofTextEncoding inputEncoding, ofTextEncoding outputEncoding) {
    string output;
    Poco::TextEncoding& ie = getTextEncoding(inputEncoding);
    Poco::TextEncoding& oe = getTextEncoding(outputEncoding);
    Poco::TextConverter converter(ie,oe);
    converter.convert(source, length, output);
    return output;
}

//------------------------------------------------------------------
string ofTextConverter::convert(const string& input, ofTextEncoding inputEncoding, ofTextEncoding outputEncoding) {
    // this returns a std::string -- which is just a sequence of bytes.
    // the way those bytes is interpreted is up to the user.

    // pass through
    if(inputEncoding == outputEncoding) return input;

    string output;
    Poco::TextEncoding& ie = getTextEncoding(inputEncoding);
    Poco::TextEncoding& oe = getTextEncoding(outputEncoding);
    Poco::TextConverter converter(ie,oe);
    converter.convert(input, output);
    return output;
}

//------------------------------------------------------------------
Poco::TextEncoding& ofTextConverter::getTextEncoding(ofTextEncoding enc) {
    if(enc == OF_TEXT_ENCODING_UTF8) {
        static Poco::UTF8Encoding utf8_enc;
        return utf8_enc;
    } else if(enc == OF_TEXT_ENCODING_UTF16) {
        static Poco::UTF16Encoding utf16_enc;
        return utf16_enc;
    } else if(enc == OF_TEXT_ENCODING_ASCII) {
        static Poco::ASCIIEncoding ascii_enc;
        return ascii_enc;
    } else if(enc == OF_TEXT_ENCODING_LATIN_1) {
        static Poco::Latin1Encoding latin1_enc;
        return latin1_enc;
    } else if(enc == OF_TEXT_ENCODING_LATIN_9) {
        static Poco::Latin9Encoding latin9_enc;
        return latin9_enc;
    } else if(enc == OF_TEXT_ENCODING_WINDOWS_1252) {
        static Poco::Windows1252Encoding windows1252_enc;
        return windows1252_enc;
    } else {
        ofLogWarning("ofTextConverter") << "getTextEncoding - unknown encoding, returning utf8.";
        static Poco::UTF8Encoding utf8_enc;
        return utf8_enc;
    }
}
Owner

arturoc commented Sep 3, 2012

wow, thanks. i've moved it to use poco internally in ofTrueTypeFont, but those functions seem really useful. actually freetype uses utf-32, which poco lacks, so if we want to support other alphabets we will need to find some other library or add that conversion to poco
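As an illustration of the missing piece, a minimal sketch of decoding utf-8 into utf-32 code points is shown below. The function name `utf8ToCodePoints` is hypothetical; it is not part of Poco or openFrameworks. The sketch replaces malformed sequences with U+FFFD and, as a simplification, does not reject overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not part of Poco or openFrameworks.
// Decodes utf-8 into a sequence of utf-32 code points.
std::vector<std::uint32_t> utf8ToCodePoints(const std::string& utf8) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            out.push_back(0xFFFDu);  // stray continuation or invalid lead
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or malformed sequence
                break;
            }
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3Fu);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFDu);
    }
    return out;
}
```

Each resulting value is a Unicode code point, which is the kind of character code a font's Unicode character map is indexed by.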

Member

bakercp commented Sep 3, 2012

Yes, my mod of ofFont is utf-32 aware (it iterates through UTF8 strings) but it is not currently a drop-in replacement for ofTrueTypeFont. I'm currently separating out text formatting / layout from the ofFont class and building from [ofBaseFont](https://github.com/bakercp/ofxFont/blob/master/src/ofBaseFont.h). There are other, more flexible functions that do layout. That UTF8 iteration process is possible with Poco::TextIterator, but as I mentioned earlier, it is limited to one direction w/o random access, reverse iteration, etc. Anyway, I'm continuing work on ofFont -- it's kind of a ground-up reengineer. Keep an eye on it! It's coming soon.
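For context on what reverse iteration over UTF8 involves, here is a minimal sketch. The function name `previousCodePointStart` is hypothetical and is not taken from Poco, ICU, or ofxFont. It assumes well-formed input and a `pos` no larger than the string length.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not from Poco, ICU, or ofxFont.
// Returns the byte index at which the code point ending just before `pos`
// begins. Assumes well-formed UTF8 and pos <= utf8.size().
std::size_t previousCodePointStart(const std::string& utf8, std::size_t pos) {
    if (pos == 0) {
        return 0;  // already at the beginning
    }
    --pos;
    // Continuation bytes look like 10xxxxxx; walk back past them until
    // the lead byte (or the start of the string) is reached.
    while (pos > 0 && (static_cast<unsigned char>(utf8[pos]) & 0xC0) == 0x80) {
        --pos;
    }
    return pos;
}
```

Calling it repeatedly, starting from the string length, visits the start of every code point from last to first.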

bakercp commented on bec3c0a Sep 3, 2012

Nice mods! Eventually I think oF should have a suite of charset conversions in some sort of ofTextUtilities file (it's happening here currently - https://github.com/bakercp/ofxUnicode/blob/ICU/src/ofTextUtilities.cpp), so this seems like a really good move for the time being.

Contributor

thiagohersan commented Jan 26, 2013

fwiw, I'm using this in a project to render tweets in spanish.

so far so good.

Member

bakercp commented Jan 26, 2013

Unless there are any disputes, I'll merge this in the next day or two. As mentioned previously in the thread, many of these features will be brought into the upcoming ofFont updates, but for now this seems to be a solid, pragmatic solution.

Contributor

kylemcdonald commented Jun 14, 2013

i'm going to check out the conflicts here and try to merge this.

kylemcdonald merged commit d9caec4 into openframeworks:develop Jun 14, 2013

yty commented Aug 16, 2013

Can it support Asian encoded text? I tested and it does not.

Member

bakercp commented Aug 16, 2013

Hey there, what do you mean by Asian encoded? Do you mean showing CJK glyphs, or a specific CJK encoding scheme (see http://en.wikipedia.org/wiki/CJK_characters#Encoding)?

After the ycam dev conference we now have a roadmap for typography / internationalization that supports CJK etc. I believe you'll have access to translation from CJK encodings to UTF8/Unicode via libICU, which will soon be included in the core.

While we are transitioning, you might look into one of the addons that offers better unicode support (check out ofxAddons.com for FTGL, fontstash, etc).

Christopher


http://christopherbaker.net
