4.3 broke synonym parsing on dictunformat -> ifo conversion #367

Closed
doozan opened this issue Mar 4, 2022 · 1 comment

doozan commented Mar 4, 2022

This may actually have been a bug, but it was a useful one: prior to 4.3.0, a |-separated list of headwords in a dictunformat file was split into synonyms when converting to .ifo. Is there a supported way to list synonyms in dictunformat?

I started writing this issue as a feature request to add support for synonyms when converting from dictunformat to slob, but when I upgraded from 4.2 to the latest version to make a test case, I found that the upgrade broke synonym handling for the StarDict format. Previously I was converting dictunformat -> stardict -> slob as a way to get dictunformat+synonyms into slob format.

cat <<EOF>test.dictunformat
_____
foo|foo1|foo2|foo3
This is a test
_____
bar|bar1|bar2|bar3
Another test
EOF

pyglossary-4.2.1 test.dictunformat test-4.2.1.ifo
pyglossary-4.3 test.dictunformat test-4.3.ifo

cat test-4.2.1.ifo
StarDict's dict ifo file
version=3.0.0
bookname=test.dictunformat
wordcount=2
idxfilesize=24
sametypesequence=m
synwordcount=6
description=

cat test-4.3.ifo
StarDict's dict ifo file
version=3.0.0
bookname=test.dictunformat
wordcount=2
idxfilesize=54
sametypesequence=m
description=
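
The idxfilesize values are consistent with the whole |-separated line being stored as a single headword in 4.3. A rough check, assuming the standard StarDict .idx layout with 32-bit offsets (UTF-8 headword, a NUL terminator, a 4-byte offset and a 4-byte size per entry); the helper name `idx_entry_size` is made up for this illustration:

```python
def idx_entry_size(headword: str) -> int:
    # Assumed layout: UTF-8 headword + NUL terminator + 4-byte offset + 4-byte size
    return len(headword.encode("utf-8")) + 1 + 4 + 4


# 4.2.1: only the main headwords go into the .idx (synonyms live in the .syn file)
old_total = sum(idx_entry_size(w) for w in ["foo", "bar"])

# 4.3: the unsplit lines are treated as the headwords
new_total = sum(
    idx_entry_size(w) for w in ["foo|foo1|foo2|foo3", "bar|bar1|bar2|bar3"]
)

print(old_total, new_total)  # 24 54
```

These match idxfilesize=24 and idxfilesize=54 in the two .ifo files above, and the missing synwordcount line in the 4.3 output points the same way.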


ilius commented Mar 4, 2022

The dictunformat tool uses a semicolon followed by three spaces by default to separate the headword from its alternates (you can change this with --headword-separator).

_____

foo;   foo1;   foo2;   foo3
This is a test
_____

bar;   bar1;   bar2;   bar3
Another test

I pushed a commit and made it the default behavior of PyGlossary to split with this separator.
I suggest you fix your files.
But if you still want to use |, use a command like this:

pyglossary test.dictunformat test.ifo --read-options 'headword_separator=|'
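
For anyone who would rather fix the files than pass the read option, here is a minimal sketch of the rewrite. It is not part of PyGlossary; the function name `convert_headwords` is hypothetical, and it assumes that each record starts with a line of five or more underscores and that the first non-empty line after it is the headword line, as in the examples in this thread:

```python
def convert_headwords(text: str, old_sep: str = "|", new_sep: str = ";   ") -> str:
    """Rewrite headword lines from old_sep to new_sep, leaving definitions alone."""
    out = []
    expect_headword = False
    for line in text.split("\n"):
        if line.startswith("_____"):
            # Record marker (assumption: five or more underscores)
            expect_headword = True
        elif expect_headword and line.strip():
            # First non-empty line after the marker is the headword line
            parts = [part.strip() for part in line.split(old_sep)]
            line = new_sep.join(parts)
            expect_headword = False
        out.append(line)
    return "\n".join(out)


sample = "_____\nfoo|foo1|foo2|foo3\nThis is a test\n"
print(convert_headwords(sample))
```

Only the headword line of each record is touched, so a | inside a definition is left as it is. Run it on a copy of the file first, since a definition that itself starts with five underscores would be misread as a record marker.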

ilius added the Feature label Mar 4, 2022
ilius closed this as completed Mar 4, 2022
ilius added a commit that referenced this issue Mar 4, 2022