Dict: OPTION MIME Support #947

Closed
Michahel opened this issue Jan 19, 2018 · 7 comments · Fixed by #1339

Comments

@Michahel

A DICT server can return MIME content, that is, content of any type, labelled with its type. If a DICT client is able to handle MIME content, it should send the OPTION MIME command to the server (see dict -M). The client then takes over processing of the content. dictd, in turn, returns either plain text or MIME content depending on whether the client sent OPTION MIME. See A Dictionary Server Protocol (RFC 2229). A DICT dictionary can therefore consist of two modules: one with plain-text content and one with MIME content. The file /usr/share/doc/dictd/examples/dictd_mime.conf shows an example of how to set up /etc/dictd/dictd.conf so that the server handles the OPTION MIME command properly.
GoldenDict doesn't support OPTION MIME for dictionaries configured under "Edit > Dictionaries > Sources > DICT servers", so every dictionary in the "DICT servers" tab is limited to plain-text content.
I would like to request such support. For testing MIME headers I can suggest the DICT server dict.bibleonline.ru with these dictionaries:

  • heb-rus_strong
  • ell-rus_strong

These dictionaries send the following MIME header:

Content-type: text/html; charset=utf-8
Content-transfer-encoding: 8bit

In response to the OPTION MIME command, a DICT server may return content of any type, as long as it states that type. The DICT client may therefore receive content it does not expect. Taking GoldenDict as the DICT client here: GoldenDict should show an error message when it receives a MIME header that it cannot yet handle. Sample message text: "The dictionary entry contains a MIME heading that is not supported in GoldenDict." The list of supported MIME types can then be extended as the need arises.
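The check described above could look roughly like the following Python sketch. It is only an illustration, not GoldenDict code: the names `SUPPORTED_TYPES`, `split_mime`, and `render` are hypothetical, and the set of supported types is an assumption. It relies on the header block being separated from the body by a single blank line.

```python
from email.parser import HeaderParser

# Hypothetical list of types the client can display; not taken from GoldenDict.
SUPPORTED_TYPES = {"text/plain", "text/html"}

UNSUPPORTED_MESSAGE = (
    "The dictionary entry contains a MIME heading "
    "that is not supported in GoldenDict."
)


def split_mime(definition):
    """Split a definition received after OPTION MIME into (headers, body)."""
    normalized = definition.replace("\r\n", "\n")
    if normalized.startswith("\n"):
        # Empty header block: the definition starts with the blank separator.
        head, body = "", normalized[1:]
    else:
        head, _, body = normalized.partition("\n\n")
    headers = HeaderParser().parsestr(head + "\n\n")
    return headers, body


def render(definition):
    """Return the body if its content type is supported, else the error text."""
    headers, body = split_mime(definition)
    # get_content_type() falls back to text/plain when no Content-type is given.
    if headers.get_content_type() not in SUPPORTED_TYPES:
        return UNSUPPORTED_MESSAGE
    return body
```

With the header quoted earlier, `render` returns the HTML body unchanged; with an unlisted type such as application/pdf it returns the error text instead.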

@wzyboy

wzyboy commented Jun 1, 2018

This is an excellent idea. I am building my own dictd(1) server on localhost. I converted a Kindle dictionary to DICT format (MOBI -> HTML -> TSV -> DICT) and loaded it successfully with dictd(1). It works well, I can telnet localhost or dict -h localhost or use GoldenDict to get definitions. However, the returned definitions are in HTML as the source files use HTML.

My current workaround is to add a "Program" entry in GoldenDict, using dict %GDWORD% to send the query, and choose "HTML" type. This way, GoldenDict renders the HTML definitions perfectly.

@Michahel
Author

Michahel commented Jun 4, 2018

I am very glad to have met someone like-minded. I went the same way. Now I want to point out the flaws that show up when dict is used directly as an external program.

How to reproduce the problem

Start up GoldenDict.
In the menu, go to "Edit > Dictionaries > Sources", open the "Programs" tab, click the "Add" button, and create a new entry. Set the type of the program to Html. In the "Program name" field enter "Dict dictionary". In the "Command Line" field enter the command shown below:

dict -M -h dict.bible.ru -s word "%GDWORD%"

Check the "Enabled" option and click OK.
In the search box, type the word "אדם" and wait a moment until the definitions from the newly created dictionary are displayed.
See screen capture 1
See screen capture 2

The core of the problem:

  1. The MIME header follows the title but is not on a separate line. The title (the line beginning with the word From), the MIME header, and the first line of the definition appear as one paragraph.
  2. The last line of a definition is not separated from the definition that follows it. More precisely, the last paragraph of the first definition and the first paragraph of the next definition are joined into one paragraph.
  3. Definitions received from plain-text dictionaries are displayed without any paragraph formatting.

To solve all of these problems, an intermediate script is needed that runs dict and formats its output as required. I wrote such a script in JScript. Look at the screen captures of the same word definitions:

See screen capture 3
See screen capture 4
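For readers who are not on Windows, the idea behind such an intermediate script can be sketched in Python. This is not the JScript file from the archive, only an illustration under assumptions: the function names are hypothetical, and the layout of the dict -M output (a "From ... [db]:" title line, then definition lines indented by two spaces) is inferred from sample output rather than from a specification.

```python
import html
import re

# Title line printed by dict before each definition, e.g. "From Name [db]:".
FROM_LINE = re.compile(r"^From (?P<title>.*) \[(?P<db>[^\]]+)\]:$")


def split_entries(output):
    """Split dict output into one record per definition."""
    entries = []
    current = None
    for line in output.splitlines():
        match = FROM_LINE.match(line)
        if match:
            current = {"title": match["title"], "db": match["db"], "lines": []}
            entries.append(current)
        elif current is not None:
            # Drop the two-space indent that dict adds to definition lines.
            current["lines"].append(line[2:] if line.startswith("  ") else line)
    return entries


def format_entry(entry):
    """Render one definition as HTML with the title in its own block."""
    lines = entry["lines"]
    while lines and not lines[0].strip():
        lines = lines[1:]
    content_type = "text/plain"
    if lines and lines[0].lower().startswith("content-type:"):
        content_type = lines[0].split(":", 1)[1].split(";")[0].strip().lower()
        # Skip the header block up to and including the blank separator line.
        blank = next((i for i, l in enumerate(lines) if not l.strip()), len(lines))
        lines = lines[blank + 1:]
    body = "\n".join(lines).strip()
    if content_type == "text/html":
        rendered = body
    else:
        # Plain text: blank lines separate paragraphs.
        paragraphs = re.split(r"\n\s*\n", body)
        rendered = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs if p.strip())
    title = html.escape(f"From {entry['title']} [{entry['db']}]")
    return f"<h3>{title}</h3>\n<div>{rendered}</div>"
```

Each definition ends up in its own block, so the title, the MIME header, and neighbouring definitions no longer run together, and plain-text definitions keep their paragraphs.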

How to install

Run the self-extracting archive dict1.12.1.exe.
When you launch the archive, it extracts its files and looks for GoldenDict on your computer. Once GoldenDict is found, it copies the three files contained in the archive (dict.js, dict.exe, cygwin1.dll) into the sub-directory dict1.12.1 inside the directory where GoldenDict is located.
Note:
You have to be in an administrator account under Vista or later to carry out this installation.
Start up GoldenDict.
In the menu, go to "Edit > Dictionaries > Sources", open the "Programs" tab, click the "Add" button, and create a new entry. Set the type of the program to Html. In the "Program name" field enter "JScript dictionary". In the "Command Line" field enter:

cscript //Nologo //U dict1.12.1\dict.js /h:dict.bible.ru /p:2628 /s:word "%GDWORD%"

Check the "Enabled" option and click OK.

The drawbacks are that this solution works only on Windows, the installer is intended only for Russian-speaking users, and the setup process itself has, in my opinion, turned out to be rather difficult.
I hope this discussion will help bring OPTION MIME support to GoldenDict.

@Michahel
Author

Since a lot of time has passed and some things have changed, I want to add more information to the issue.
Note that the protocol is simple: the client sends

OPTION MIME

And the server replies with

250 ok - using MIME headers

Subsequent responses to DEFINE commands include the content-type/content-encoding headers.
Try heb-rus_strong, grk-rus_strong, heb-eng_strong, or grk-eng_strong on dict.bible.ru; they have MIME content that you can use for testing. You can simply search for a Strong's number. Type 5485, the number for the Greek word charis, "grace". To look up a Hebrew word, add a 0 before the number, as in 02580, the number for the Hebrew chen, "grace".
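The exchange can be sketched as a small Python client. This is a rough illustration, not GoldenDict's implementation: the function names are hypothetical, the status codes (220 banner, 150 and 151 before each definition, closing 250) follow RFC 2229 as I understand it, and the word is assumed to contain no double quotes.

```python
import socket


def read_status(reader):
    """Read one status line and return (code, text)."""
    raw = reader.readline()
    if not raw:
        raise ConnectionError("server closed the connection")
    code, _, text = raw.decode("utf-8").rstrip("\r\n").partition(" ")
    return int(code), text


def read_text(reader):
    """Read a text block terminated by a lone '.', undoing dot-stuffing."""
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("server closed the connection")
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == ".":
            return "\n".join(lines)
        lines.append(line[1:] if line.startswith("..") else line)


def send(writer, command):
    writer.write(command.encode("utf-8") + b"\r\n")
    writer.flush()


def define_with_mime(reader, writer, database, word):
    """Send OPTION MIME, then DEFINE; return the raw definitions with headers."""
    code, text = read_status(reader)          # 220: connection banner
    if code != 220:
        raise RuntimeError(f"unexpected banner: {code} {text}")
    send(writer, "OPTION MIME")
    code, text = read_status(reader)          # 250 ok - using MIME headers
    if code != 250:
        raise RuntimeError(f"OPTION MIME refused: {code} {text}")
    send(writer, f'DEFINE {database} "{word}"')
    code, text = read_status(reader)
    definitions = []
    if code == 150:                           # 150 n definitions retrieved
        for _ in range(int(text.split()[0])):
            read_status(reader)               # 151 "word" database "description"
            definitions.append(read_text(reader))
        read_status(reader)                   # closing 250 ok
    send(writer, "QUIT")
    return definitions


def lookup(host, database, word, port=2628):
    """Open a connection and fetch definitions with MIME headers."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rb") as reader, sock.makefile("wb") as writer:
            return define_with_mime(reader, writer, database, word)
```

Something like `lookup("dict.bible.ru", "grk-rus_strong", "5485")` should then return the definition text with its Content-type header still attached, provided the server is reachable.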

@Michahel
Author

I accidentally found that this problem has been resolved starting with version 1.5.0-RC2-359-g9bae6d2. The issue could be closed, but while testing I found one very serious problem, which I want to report now.

In dictserver.cc, at lines 840-921, we see the solution to the problem raised in this topic.

However, now that OPTION MIME support is implemented, this part of the code should be executed only when there is no MIME header such as this:

Content-type: text/html; charset=utf-8
Content-transfer-encoding: 8bit

If a similar header is received from a dictionary, then this part of the code should not be executed.

Use the dict.bible.ru server for the test. For example, a search for the word "Давид" opens the definition from V.P. Vikhlyantsev's Bible Dictionary with the translation of this name, a short history of King David, and passage references. In the screen capture, the four Bible references look like hyperlinks. However, they should appear enclosed in curly brackets ({ }). {See Давид}
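The condition I have in mind can be illustrated with a short Python sketch. It is not the code from dictserver.cc: the `REFS` pattern is an assumption about what a {..} reference looks like, the function names are hypothetical, and the header test only looks for a leading Content-type line.

```python
import re

# Assumed pattern for DICT cross-references such as {word}; the actual
# expression used in dictserver.cc may differ.
REFS = re.compile(r"\{([^{}]+)\}")


def has_mime_header(article):
    """Return True if the article starts with a Content-type header line."""
    first_line = article.lstrip().split("\n", 1)[0]
    return first_line.lower().startswith("content-type:")


def link_refs(article):
    """Turn {refs} into lookup links, but only for articles without a MIME header."""
    if has_mime_header(article):
        # MIME content is already formatted by the dictionary author; leave it alone.
        return article
    return REFS.sub(r'<a href="gdlookup://localhost/\1">\1</a>', article)
```

A plain-text article such as "see {David}" gets its reference turned into a link, while an article that begins with the header above is passed through untouched, so its curly brackets stay visible.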

@sikmir
Contributor

sikmir commented Jan 15, 2021

Just for reference:

$ dict -h dict.bible.ru -d VIH -M Давид | sed -n '1,7p;16p'
1 definition found

From Библейский Словарь для избранных  В.П.Вихлянцев [VIH]:

  Content-type: text/html; charset=utf-8
  Content-transfer-encoding: 8bit
  
   <small>{1&#1062;&#1072;&#1088; 16:13}</small> &#1087;&#1088;&#1080;&#1084;&#1077;&#1088;&#1085;&#1086; &#1074;

@sikmir
Contributor

sikmir commented Jan 15, 2021

If a similar header is received from a dictionary, then this part of the code should not be executed.

Actually, the replacement of refs ({..}) happens here:

goldendict/dictserver.cc

Lines 836 to 837 in c6f8d29

articleText = QString::fromUtf8( articleStr.c_str(), articleStr.size() )
.replace(refs, "<a href=\"gdlookup://localhost/\\1\">\\1</a>" );

@markvdvelde

This is an excellent idea. I am building my own dictd(1) server on localhost. I converted a Kindle dictionary to DICT format (MOBI -> HTML -> TSV -> DICT) and loaded it successfully with dictd(1). It works well, I can telnet localhost or dict -h localhost or use GoldenDict to get definitions. However, the returned definitions are in HTML as the source files use HTML.

My current workaround is to add a "Program" entry in GoldenDict, using dict %GDWORD% to send the query, and choose "HTML" type. This way, GoldenDict renders the HTML definitions perfectly.

Hi, I am using Mobidict to consult Kindle dictionaries on my desktop. I believe that is much easier than converting the mobi files. However, I would like to load the Mobidict search results into GoldenDict. Would you know how to do this?
