Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to index a dictionary in UTF-8 format on Windows #23

Merged
merged 1 commit into from Jul 1, 2011

Conversation

Projects
None yet
2 participants
@Tvangeste
Copy link
Member

commented Jul 1, 2011

Here's an example of such dictionary: http://www.multiupload.com/OT6632HYRQ

An attempt to index it leads to "Encoding error" on Windows, while works just fine on Linux.

Patch to follow.

@ghost ghost assigned Tvangeste Jul 1, 2011

Fix for #23: Failure to index a dictionary in UTF-8 format on Windows.
Here's what happens. At some moment during dictionary conversion,
in DslScanner::readNextLine(), whe call iconv(), when both buffers
are of size 4 (the conversion is from UTF-8 to UTF-16).

Now, the dictionary contains two em-dash symbols at that position,
one after another, each is encoded in 3 bytes in UTF-8. So, the
input buffer of size 4 contains entire first em-dash (3 bytes)
and the first byte from the second em-dash.

Calling iconv() on Linux leads to Iconv::NeedMoreOut (E2BIG),
which makes sense, since we converted the first char and there is
no more space in the output buffer.

Calling iconv() on Windows leads to Iconv::NeedMoreIn (EINVAL),
which *also* makes sense, since we converted the first char, started
to look at the second one and noticed that it is incomplete.

The difference is only what iconv() checks first, the state
of the input or the state of the output. And it seems that it
does different things on Windows and Linux.

The patch takes this into account and resolves the conversion
problem on Windows: the only error condition that requires
to throw an encoding error is when outBytesLeft is non-empty,
that means that iconv didn't convert anything.
@Tvangeste

This comment has been minimized.

Copy link
Member Author

commented Jul 1, 2011

Take a look at the commit message for the explanation.

@dragonroot

This comment has been minimized.

Copy link
Member

commented Jul 1, 2011

The fix seems right.

Btw, Linux would typically use a newer version of the iconv library than the one shipped in the Windows version of GD (that one is quite prehistoric), so I could also suggest updating the Windows dll to the latest version.

dragonroot added a commit that referenced this pull request Jul 1, 2011

Merge pull request #23 from VVSiz/review/iconv
Failure to index a dictionary in UTF-8 format on Windows

@dragonroot dragonroot merged commit d75a3cb into goldendict:master Jul 1, 2011

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.