base64 encoded images #90

Merged
merged 1 commit into from Apr 2, 2012

Contributor

simonwiles commented Mar 18, 2012

Hi :)

GoldenDict doesn't seem to render base64-encoded images, which is a shame, because that's an extremely convenient way to work with images, and WebKit ought to handle it just fine, right?

Here is an example that I was trying to use (a rare Chinese character that is not in Unicode):

<img style="height:1em;" alt="[這-言+囉]" src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAADIAAAArCAMAAAANOCvQAAAABlBMVEUAAAAAAAClZ7nPAAAAAXRS
TlMAQObYZgAAAM9JREFUeNrdlVEOgDAIQ+H+lzauS+qABvwzEtENeUZYnfYxcx/kJMu33yKZWZHl
JDJ0vPiab8esYvg6I8ROxDHHAOG2TxxNELOMGE0jzHORGBFGd6R8tkYQ7pGwlGUJHIS2aUGpjgnF
ZGTCpPJDvg/WxVgF5nOElwFyHx3CAtYQ1PZe/TBZS5T5md0j/DwYgIsCQscoMS0znOFAmC7Lpxwo
e40EBYV+e4sgwva1SFCnRLxE1LZRyuhE9Gpq01vyK8K41oXJbrmw5lf4VPWf7QLgqQLWfz5bOAAA
AABJRU5ErkJggg==
" />

It might even come out here:

[這-言+囉]

Thanks for considering this!
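
For readers who want to produce such an image themselves, here is a minimal sketch of building a `data:` URI from raw image bytes with standard base64 encoding. The names `base64Encode` and `makeDataUri` are made up for illustration and are not part of GoldenDict or any dictionary tool; the output is written on a single line, without line breaks.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: standard base64 (RFC 4648 alphabet, '=' padding),
// emitted on a single line with no line breaks.
std::string base64Encode(const std::vector<std::uint8_t>& bytes) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters each.
    for (; i + 2 < bytes.size(); i += 3) {
        std::uint32_t n = (static_cast<std::uint32_t>(bytes[i]) << 16) |
                          (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
                          static_cast<std::uint32_t>(bytes[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    if (i + 1 == bytes.size()) {
        // One byte left over: two characters plus "==" padding.
        std::uint32_t n = static_cast<std::uint32_t>(bytes[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (i + 2 == bytes.size()) {
        // Two bytes left over: three characters plus "=" padding.
        std::uint32_t n = (static_cast<std::uint32_t>(bytes[i]) << 16) |
                          (static_cast<std::uint32_t>(bytes[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: wraps the encoded bytes in a data URI suitable for
// an <img src="..."> attribute.
std::string makeDataUri(const std::string& mimeType,
                        const std::vector<std::uint8_t>& bytes) {
    return "data:" + mimeType + ";base64," + base64Encode(bytes);
}
```

Calling `makeDataUri("image/png", pngBytes)` with the bytes of a PNG file gives a string that can be pasted directly into the `src` attribute of an `<img>` tag.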

Edit: is it just a case of modifying the regex in stardict.cc (~ ln. 338) so that these don't get processed?

Edit: it seems it is! If I do a proper fork and pull request, would you be likely to accept it?

Member

chulai commented Mar 17, 2012

Yes, contributions are always welcome. It will need to go through a review process, though. To attach your pull request to this issue you can use the following script: http://goldendict.org/forum/viewtopic.php?f=6&t=1153&sid=b1f0f1ef145d6bf247fc41138396e40f

Thanks

Member

chulai commented Mar 19, 2012

simonwiles, do you have a dictionary that uses data URIs and can be shared?

Contributor

simonwiles commented Mar 19, 2012

Sure -- I've just made the simplest one I have to hand :) Here are two versions of the same dictionary, one with images in a res/ folder, and one with them encoded as data URIs. Search for "覽字" for a good example entry, with Brāhmī akṣaras (== graphemes) as images inline with Chinese text. https://www.dropbox.com/sh/lgy5soml35q3mtq/dFF7XFEn4g

You won't find any others, I'm sure, since without this commit in GoldenDict, no dictionary program that reads the Stardict format supports this. However, if you're willing to accept this trivial pull request, I (and perhaps others) can start using this as a new feature.

Some background:
I've been making dictionaries and trying to make sure that they work nicely in both Stardict and GoldenDict (e.g., I've recoded all my old ones to use HTML instead of Pango markup, which GoldenDict doesn't support). But with Stardict being so crappy on Windows and OSX, it's been largely for my own benefit, and perhaps a very small handful of other Gnome users. GoldenDict now offers so many advantages over Stardict, even under Gnome, that I'm inclined to take advantage of them and just give up making the dictionaries "backwards-compatible" with Stardict. Now, if we just had full-text search, and if something could be done about #70 ... I'm sorry to say I haven't written C++ for years and years, and my Qt experience is limited, though -- I just don't have the time to get up to speed with it all for the foreseeable future, or I'd start having a go.

Anyway, cheers for this, and thanks for everything!

@@ -336,7 +336,7 @@ string StardictDictionary::handleResource( char type, char const * resource, siz
string articleText = "<div class=\"sdct_h\">" + string( resource, size ) + "</div>";
return ( QString::fromUtf8( articleText.c_str() )
- .replace( QRegExp( "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']*)([^\"']*)", Qt::CaseInsensitive ),
+ .replace( QRegExp( "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']+)((?!data:)[^\"']*)", Qt::CaseInsensitive ),
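
To illustrate what the changed pattern does, here is a minimal sketch that applies the same regular expression with `std::regex` instead of `QRegExp`. It is an approximation under stated assumptions, not the GoldenDict code: the function name `rewriteImgSrc` and the `bres://dict/` prefix used in the usage note are hypothetical, since the replacement string is not shown in the hunk above. The negative lookahead `(?!data:)` stops the match when the `src` value starts with `data:`. The switch from `["']*` to `["']+` appears to matter as well: with `*`, the engine could backtrack to zero quotes, satisfy the lookahead at the quote character, and still match with an empty second group.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the rewriting step: prepends `prefix` to the src
// value of every <img> tag, except when the value is a data: URI.
std::string rewriteImgSrc(const std::string& html, const std::string& prefix) {
    // Same pattern as the "+" line of the diff, written as a raw string.
    static const std::regex imgSrc(
        R"re((<\s*img\s+[^>]*src\s*=\s*["']+)((?!data:)[^"']*))re",
        std::regex::icase);

    std::string out;
    std::size_t last = 0;
    for (auto it = std::sregex_iterator(html.begin(), html.end(), imgSrc);
         it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        const std::size_t start = static_cast<std::size_t>(m.position(0));
        out.append(html, last, start - last);  // text before the match
        out += m.str(1);                       // "<img ... src=" plus quote
        out += prefix;                         // inserted resource prefix
        out += m.str(2);                       // original src value
        last = start + static_cast<std::size_t>(m.length(0));
    }
    out.append(html, last, std::string::npos);  // remainder after last match
    return out;
}
```

With this sketch, `rewriteImgSrc("<img src=\"pic.png\" />", "bres://dict/")` rewrites the path, while an `<img>` whose `src` starts with `data:` is returned unchanged. Note that an unquoted `src` attribute no longer matches at all, because at least one quote is now required.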

@chulai

chulai Mar 26, 2012

Member

This change doesn't work as expected. The base64 string is output with some <br> tags in it:

data:image/png;base64,<br>iVBORw0KGgoAAAANSUhEUgAAABgAAAAYEAYAAACw5+G7AAAACXZwQWcAAAAYAAAAGAB4TKWmAAAA<br>BmJLR0QAAAAAAAD5Q7t/AAAACXBIWXMAAABIAAAASABGyWs+AAAAWklEQVRYw+3XQQoAIAhE0eb+<br>hy6wdWRphPDnAMVjdKG6pZWNsgCy7F+a/0kAsgFeUBlAuR2IjthtM2GAd/ZfjRI7cAoBAAAAAAAA<br>AAAAwEn5r4Ey98AKkt3AAO4DP5jjTlC3AAAAAElFTkSuQmCC<br>

I saved the article in GoldenDict (File > Save Article), edited the HTML file to strip out those line breaks, and it worked.

@simonwiles

simonwiles Mar 27, 2012

Contributor

I think that's an invalid data URI, isn't it? So I'm not surprised it doesn't work... It wouldn't work with any kind of invalid base64 data...

@chulai

chulai Mar 27, 2012

Member

Yes, I know that's invalid base64 data too. Apparently at some stage before outputting the data URI, the code is encoding line breaks as <br>. As you generated the sample dictionary, is it possible that the base64 string already includes line breaks in the dictionary itself? Line breaks, like <br> tags, are not valid in data URIs (source: Wikipedia):

In Mozilla Firefox 5, Google Chrome 17, and IE 9 (released June, 2011), encoded data must not contain newlines. 

So this might be the case for QtWebkit too.
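
For dictionary authors who want to sidestep the question, one option is to normalize data URIs before they go into the dictionary. The sketch below is a hypothetical preprocessing helper (the name `cleanDataUri` is an assumption, and nothing like it is part of GoldenDict): it removes literal `<br>` tags and all whitespace, so the base64 payload ends up on a single line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for dictionary preprocessing: strips literal <br>,
// <br/> or <br /> tags and any whitespace (including newlines) from a data
// URI. Neither is meaningful inside a base64 payload.
std::string cleanDataUri(const std::string& uri) {
    static const std::regex junk(R"(<br\s*/?>|\s+)", std::regex::icase);
    return std::regex_replace(uri, junk, "");
}
```

This assumes the input is only the URI itself (the value of the `src` attribute), not a whole HTML fragment, since whitespace elsewhere in the markup is significant.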

@simonwiles

simonwiles Mar 27, 2012

Contributor

Yes, the dictionary I created has linefeed characters in the base64-encoded data (I always understood they were appropriate, but I'll take them out now, as per your quote from Wikipedia -- thanks!). However, when I compiled GoldenDict from my fork and fed it that dictionary, it worked just fine (i.e., at no stage were the linefeeds converted to <br> tags). Strange.

Member

chulai commented Mar 27, 2012

dragonroot, I couldn't find any reference to Stardict supporting the data URI scheme. On the other hand, data URIs have been part of the HTML specification for more than 10 years and are supported by almost all web browsers. And as we are outputting HTML in Stardict type 'h' dictionaries and QtWebKit supports the data URI scheme, I don't see any reason why we wouldn't want this feature. What are your thoughts?

Contributor

simonwiles commented Mar 27, 2012

For the record -- Stardict certainly doesn't support data URIs. Neither does it support much of the HTML markup that becomes possible because GoldenDict hands the rendering task off to QtWebKit. All kinds of things are possible in Stardict type 'h' dictionaries on GoldenDict that are not possible on Stardict, which is why I'm planning to drop what I called "backwards compatibility" in the stuff I'm working on. Given that it's WebKit doing the lifting here, however, I expected that data URIs would work, and indeed they do, as long as they're not munged by the GoldenDict Stardict parser first. That's all :)

@chulai chulai merged commit 85250bb into goldendict:master Apr 2, 2012

Member

chulai commented Apr 2, 2012

In spite of what Wikipedia says about data URIs with line breaks, all modern browsers (IE 9, Opera 11, Chrome 18, Safari 5, Firefox 11) and QtWebKit render them just fine.
On the other hand, the dictionary included some data URIs with <br> and others with line breaks. It was not GoldenDict that was encoding line breaks as <br>, as I first supposed.

Contributor

simonwiles commented Apr 3, 2012

right -- I've never had any problems using data URIs like this. Sorry if there was a problem with my example dictionary -- I'll go back and check again, but I haven't found it yet.

Thanks very much for merging the branch. I'll start using it right away, and update my dictionaries once there's a release that I can point end-users to.
