
base64 encoded images #90

Merged: 1 commit, 2 participants

@simonwiles

Hi :)

GoldenDict doesn't seem to render base64-encoded images, which is a shame, because that's an extremely convenient way to work with images, and WebKit ought to handle them just fine, right?

Here is an example I was trying to use (a rare Chinese character that isn't in Unicode):

<img style="height:1em;" alt="[這-言+囉]" src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAADIAAAArCAMAAAANOCvQAAAABlBMVEUAAAAAAAClZ7nPAAAAAXRS
TlMAQObYZgAAAM9JREFUeNrdlVEOgDAIQ+H+lzauS+qABvwzEtENeUZYnfYxcx/kJMu33yKZWZHl
JDJ0vPiab8esYvg6I8ROxDHHAOG2TxxNELOMGE0jzHORGBFGd6R8tkYQ7pGwlGUJHIS2aUGpjgnF
ZGTCpPJDvg/WxVgF5nOElwFyHx3CAtYQ1PZe/TBZS5T5md0j/DwYgIsCQscoMS0znOFAmC7Lpxwo
e40EBYV+e4sgwva1SFCnRLxE1LZRyuhE9Gpq01vyK8K41oXJbrmw5lf4VPWf7QLgqQLWfz5bOAAA
AABJRU5ErkJggg==
" />

It might even come out here:

[這-言+囉]

Thanks for considering this!

Edit: is it just a case of modifying the regex in stardict.cc (around line 338) so that these don't get processed?

Edit: it seems it is! If I do a proper fork and pull request, would you be likely to accept it?

@chulai
Collaborator

Yes, contributions are always welcome. It will need to go through a review process, though. To attach your pull request to this issue, you can use the following script: http://goldendict.org/forum/viewtopic.php?f=6&t=1153&sid=b1f0f1ef145d6bf247fc41138396e40f

Thanks

@chulai
Collaborator

simonwiles, do you have a dictionary that can be shared and has data URIs?

@simonwiles

Sure -- I've just made the simplest one I have to hand :) Here are two versions of the same dictionary: one with images in a res/ folder, and one with them encoded as data URIs. Search for "覽字" for a good example entry, with Brāhmī akṣaras (== graphemes) as images inline with Chinese text. https://www.dropbox.com/sh/lgy5soml35q3mtq/dFF7XFEn4g

You won't find any others, I'm sure, since without this change to GoldenDict there aren't any dictionary programs that read Stardict format and support this. However, if you're willing to accept this trivial pull request, I (and perhaps others) can start using it as a new feature.

Some background:
I've been making dictionaries and trying to make sure that they work nicely in both Stardict and GoldenDict (e.g., I've recoded all my old ones to use HTML instead of Pango markup, which GoldenDict doesn't support). But with Stardict being so crappy on Windows and OS X, it's been largely for my own benefit, and perhaps that of a very small handful of other GNOME users. GoldenDict now offers so many advantages over Stardict, even under GNOME, that I'm inclined to take advantage of them and just give up making the dictionaries "backwards-compatible" with Stardict. Now, if we just had full-text search, and if something could be done about #70 ... I'm sorry to say I haven't written C++ for years and years, and my Qt experience is limited, though -- I just don't have the time to get up to speed with it all for the foreseeable future, or I'd have a go myself.

Anyway, cheers for this, and thanks for everything!

@chulai chulai commented on the diff
stardict.cc
@@ -336,7 +336,7 @@ string StardictDictionary::handleResource( char type, char const * resource, siz
string articleText = "<div class=\"sdct_h\">" + string( resource, size ) + "</div>";
return ( QString::fromUtf8( articleText.c_str() )
- .replace( QRegExp( "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']*)([^\"']*)", Qt::CaseInsensitive ),
+ .replace( QRegExp( "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']+)((?!data:)[^\"']*)", Qt::CaseInsensitive ),
@chulai Collaborator
chulai added a note

This change doesn't work as expected. The base64 string is output with some <br> tags in it:

data:image/png;base64,<br>iVBORw0KGgoAAAANSUhEUgAAABgAAAAYEAYAAACw5+G7AAAACXZwQWcAAAAYAAAAGAB4TKWmAAAA<br>BmJLR0QAAAAAAAD5Q7t/AAAACXBIWXMAAABIAAAASABGyWs+AAAAWklEQVRYw+3XQQoAIAhE0eb+<br>hy6wdWRphPDnAMVjdKG6pZWNsgCy7F+a/0kAsgFeUBlAuR2IjthtM2GAd/ZfjRI7cAoBAAAAAAAA<br>AAAAwEn5r4Ey98AKkt3AAO4DP5jjTlC3AAAAAElFTkSuQmCC<br>

I saved the article in GoldenDict (File > Save Article), edited the HTML file to strip out those line breaks, and it worked.

@simonwiles

I think that's an invalid data URI, isn't it? So I'm not surprised it doesn't work; it wouldn't work with any kind of invalid base64 data.

@chulai Collaborator
chulai added a note

yes, I know that's invalid base64 data too. Apparently, at some stage before outputting the data URI, the code is encoding line breaks as <br>. Since you generated the sample dictionary, is it possible that the base64 string in the dictionary includes line breaks? Line breaks, like <br>, are not valid in data URIs (source: Wikipedia):

In Mozilla Firefox 5, Google Chrome 17, and IE 9 (released June, 2011), encoded data must not contain newlines. 

So this might be the case for QtWebKit too.

@simonwiles

yes, the dictionary I created has linefeed characters in the base64-encoded data (I always understood they were appropriate, but I'll take them out now, as per your quote from Wikipedia -- thanks!). However, when I compiled GoldenDict from my fork and fed it that dictionary, it worked just fine (i.e., at no stage were the linefeeds converted to <br> tags). Strange.
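For dictionary authors taking the line breaks out, here is a minimal sketch of producing a single-line data URI from raw image bytes, in C++ to match stardict.cc. `toDataUri` is a hypothetical helper, not part of GoldenDict. It assumes the standard RFC 4648 base64 alphabet with '=' padding and, unlike MIME-style encoders, never inserts line breaks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encode bytes as standard base64 (RFC 4648 alphabet,
// '=' padding) and wrap them in a data URI, all on a single line.
std::string toDataUri( std::vector< unsigned char > const & bytes, std::string const & mimeType )
{
  static char const alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string out = "data:" + mimeType + ";base64,";
  std::size_t i = 0;

  // Each full group of 3 bytes (24 bits) becomes 4 characters of 6 bits each.
  for ( ; i + 2 < bytes.size(); i += 3 )
  {
    unsigned n = ( unsigned( bytes[ i ] ) << 16 ) | ( unsigned( bytes[ i + 1 ] ) << 8 ) | bytes[ i + 2 ];
    out += alphabet[ ( n >> 18 ) & 63 ];
    out += alphabet[ ( n >> 12 ) & 63 ];
    out += alphabet[ ( n >> 6 ) & 63 ];
    out += alphabet[ n & 63 ];
  }

  // One or two leftover bytes are zero-padded and marked with '='.
  std::size_t const rest = bytes.size() - i;
  if ( rest == 1 )
  {
    unsigned n = unsigned( bytes[ i ] ) << 16;
    out += alphabet[ ( n >> 18 ) & 63 ];
    out += alphabet[ ( n >> 12 ) & 63 ];
    out += "==";
  }
  else if ( rest == 2 )
  {
    unsigned n = ( unsigned( bytes[ i ] ) << 16 ) | ( unsigned( bytes[ i + 1 ] ) << 8 );
    out += alphabet[ ( n >> 18 ) & 63 ];
    out += alphabet[ ( n >> 12 ) & 63 ];
    out += alphabet[ ( n >> 6 ) & 63 ];
    out += '=';
  }
  return out;
}
```

For example, the 8-byte PNG signature on its own encodes to `iVBORw0KGgo=`, which matches the start of the payload in the example at the top of this thread.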

@chulai
Collaborator

dragonroot, I couldn't find any reference to Stardict supporting the data URI scheme. On the other hand, data URIs have been standardized (RFC 2397) for more than 10 years and are supported by almost all web browsers. And since we output HTML in Stardict type 'h' dictionaries and QtWebKit supports the data URI scheme, I don't see any reason why we wouldn't want this feature. What are your thoughts?

@simonwiles

For the record -- Stardict certainly doesn't support data URIs. Neither does it support much of the HTML markup that is possible because GoldenDict hands the rendering off to QtWebKit. All kinds of things are possible in Stardict type 'h' dictionaries on GoldenDict that are not possible on Stardict, which is why I'm planning to drop what I called "backwards compatibility" in the stuff I'm working on. Given that it's WebKit doing the heavy lifting here, however, I expected that data URIs would work, and indeed they do, as long as they're not munged by the GoldenDict Stardict parser first. That's all :)

@chulai chulai merged commit 85250bb into goldendict:master
@chulai
Collaborator

In spite of what Wikipedia says about data URIs with line breaks, all modern browsers (IE 9, Opera 11, Chrome 18, Safari 5, Firefox 11) and QtWebKit render them just fine.
On the other hand, the dictionary included some data URIs with <br> and others with line breaks. It was not GoldenDict that was encoding line breaks as <br>, as I first supposed.
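One plausible explanation for why plain linefeeds are harmless while `<br>` breaks the image: current browsers decode base64 data URLs with the WHATWG Infra "forgiving-base64 decode" algorithm, which first strips ASCII whitespace and then rejects any character outside `A-Z`, `a-z`, `0-9`, `+` and `/`. The `<` and `>` of an embedded `<br>` fail that check. The following sketch implements only those validity checks (not the decoding itself); `isForgivingBase64` is a hypothetical name, and whether 2012-era QtWebKit followed exactly these rules is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical check modeled on the validity steps of WHATWG
// "forgiving-base64 decode". Returns true if `payload` would decode.
bool isForgivingBase64( std::string const & payload )
{
  // Step 1: drop ASCII whitespace (TAB, LF, FF, CR, SPACE), so line breaks are harmless.
  std::string data;
  for ( char c : payload )
    if ( c != '\t' && c != '\n' && c != '\f' && c != '\r' && c != ' ' )
      data += c;

  // Step 2: if the length is a multiple of 4, strip one or two trailing '='.
  if ( data.size() % 4 == 0 )
  {
    if ( data.size() >= 2 && data.compare( data.size() - 2, 2, "==" ) == 0 )
      data.erase( data.size() - 2 );
    else if ( !data.empty() && data.back() == '=' )
      data.pop_back();
  }

  // Step 3: a remainder of 1 can never come out of a base64 encoder.
  if ( data.size() % 4 == 1 )
    return false;

  // Step 4: only ASCII alphanumerics, '+' and '/' may remain; the '<' and '>'
  // of an embedded <br> fail here.
  for ( char c : data )
  {
    bool const ok = ( c >= 'A' && c <= 'Z' ) || ( c >= 'a' && c <= 'z' ) ||
                    ( c >= '0' && c <= '9' ) || c == '+' || c == '/';
    if ( !ok )
      return false;
  }
  return true;
}
```

Under these rules, the payload quoted in the review note fails at its very first character (it starts with `<br>`), while payloads that merely contain linefeeds pass once the whitespace is stripped.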

@simonwiles

right -- I've never had any problems using data URIs like this. Sorry if there was a problem with my example dictionary -- I'll go back and check again, but I haven't found it yet.

Thanks very much for merging the branch. I'll start using it right away, and update my dictionaries once there's a release that I can point end-users to.

Showing 1 changed file with 1 addition and 1 deletion.

stardict.cc
@@ -336,7 +336,7 @@ string StardictDictionary::handleResource( char type, char const * resource, siz
string articleText = "<div class=\"sdct_h\">" + string( resource, size ) + "</div>";
return ( QString::fromUtf8( articleText.c_str() )
- .replace( QRegExp( "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']*)([^\"']*)", Qt::CaseInsensitive ),
+ .replace( QRegExp( "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']+)((?!data:)[^\"']*)", Qt::CaseInsensitive ),
"\\1bres://" + QString::fromStdString( getId() ) + "/\\2" )
.toUtf8().data() );
}
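The patch boils down to one regex change: a `(?!data:)` negative lookahead so that `data:` sources are no longer prefixed with a `bres://` resource URL. As a rough illustration only, here is the same rewrite expressed with C++11 `std::regex` (ECMAScript grammar) instead of Qt's `QRegExp`; both accept negative lookahead. `rewriteImgSources` and its `dictId` parameter are hypothetical stand-ins for `handleResource()` and `getId()`, not GoldenDict code.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for the rewrite in StardictDictionary::handleResource():
// prefix each <img src> value with "bres://<dictId>/" unless it is a data: URI.
std::string rewriteImgSources( std::string const & html, std::string const & dictId )
{
  // Group 1: "<img ... src=" plus the opening quote(s).
  // Group 2: the attribute value, matched only if it does not start with "data:".
  static std::regex const imgSrc(
    "(<\\s*img\\s+[^>]*src\\s*=\\s*[\"']+)((?!data:)[^\"']*)",
    std::regex::ECMAScript | std::regex::icase );

  // "$1"/"$2" play the role of Qt's "\\1"/"\\2". Assumes dictId contains no '$',
  // which regex_replace would otherwise treat as a format specifier.
  return std::regex_replace( html, imgSrc, "$1bres://" + dictId + "/$2" );
}
```

For instance, with a dictionary id `abc`, `<img src="pic.png">` becomes `<img src="bres://abc/pic.png">`, while `<img src="data:image/png;base64,...">` comes back unchanged because the lookahead rejects the only possible match.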