Feature Request: Combine dictionaries #31

Open
Heineken opened this issue Mar 15, 2016 · 11 comments

@Heineken
Contributor

I have created dictionaries from dict.cc data for English, Russian, French and Spanish as they offer some words and some grammatical information which cannot be found in the standard dictionaries.

Also, they don't have the annoying "linked entries" where you have to click on a blue link and scroll a while to find the translation.

Looking up a rare word can take a while, because I have to switch between dictionaries, without knowing which one is active, before I try Leo, Google Translate or whatever.

I propose to implement an option for merging dictionaries / combined lookup. That could be automatic for all dictionaries of the same language and would thus require only one additional checkbox.

@rdoeffinger
Owner

(Leaving aside that I suspect this is not easy to implement: a scrollable list needs a global index, and combining several indices into one at runtime sounds expensive.)
I'm not really sure how you think the results of such a combined lookup should be displayed.
Just dumping all results would probably look a bit messy. Multiple columns would not fit on most screens. Indicating the source of each translation in-line might waste a good bit of space.
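
To make the "global index" concern concrete, here is a minimal sketch of a lazily merged view over several already-sorted indices. It is not how the app stores or reads its index; the names `merged_view` and `_tag`, the `(headword, translation)` tuple layout, and the sample entries are all assumptions made up for illustration.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[str, str]  # hypothetical layout: (headword, translation)


def _tag(name: str, entries: List[Entry]) -> Iterator[Tuple[str, str, str, str]]:
    """Attach a case-insensitive sort key and the source name to each entry."""
    for head, trans in entries:
        yield (head.casefold(), head, trans, name)


def merged_view(indices: Dict[str, List[Entry]]) -> Iterator[Tuple[str, str, str]]:
    """Lazily yield (headword, translation, source) in headword order.

    Assumes every input list is already sorted by casefolded headword.
    heapq.merge only looks at the head of each stream, so no combined
    index is materialised in memory.
    """
    streams = [_tag(name, entries) for name, entries in indices.items()]
    for _, head, trans, name in heapq.merge(*streams):
        yield head, trans, name


# Made-up sample data standing in for two dictionaries of the same language.
stock = [("cat", "Katze"), ("dog", "Hund")]
extra = [("cat", "Kater"), ("eel", "Aal")]

for row in merged_view({"stock": stock, "dictcc": extra}):
    print(row)
```

Iterating forward this way is cheap, but it does not by itself give random access (jumping to row N of the combined list), which is what a scrollable list would need.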

@Heineken
Contributor Author

Well, I think the lookup should look like the lookup in a single dictionary. Identifying the origin of each entry is optional.

If there is no simple way of looking up in two lists at once, and doing so is too expensive at runtime, how about merging the dictionaries once into a new file? Maybe as a command line tool in a first step. Maybe DictionaryBuilder.jar can be modified for that.
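
As a rough illustration of "looking up in two lists at once" without building a combined index, the sketch below searches each sorted list separately and concatenates the hits. The function name `lookup_all`, the `(headword, translation)` tuple layout, and the sample entries are hypothetical and not taken from the app or from DictionaryBuilder.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # hypothetical layout: (headword, translation)


def lookup_all(word: str, dictionaries: Dict[str, List[Entry]]) -> List[Tuple[str, str, str]]:
    """Return (headword, translation, source) for every exact match of `word`.

    Assumes each list is sorted by casefolded headword. The key list is
    rebuilt on every call to keep the sketch short; a real implementation
    would keep it precomputed so that each lookup stays logarithmic.
    """
    key = word.casefold()
    hits = []
    for name, entries in dictionaries.items():
        keys = [head.casefold() for head, _ in entries]
        i = bisect_left(keys, key)  # first position whose key is >= the query
        while i < len(entries) and keys[i] == key:
            hits.append((entries[i][0], entries[i][1], name))
            i += 1
    return hits


# Made-up sample data standing in for two dictionaries of the same language.
stock = [("cat", "Katze"), ("dog", "Hund")]
extra = [("cat", "Kater"), ("eel", "Aal")]

print(lookup_all("Cat", {"stock": stock, "dictcc": extra}))
```

The source name in each result is what would let the UI show, optionally, where an entry came from.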

@rdoeffinger
Owner

How do you create the dictionaries? If you use the DictionaryBuilder you should simply specify the sources for one as --input0 and the other as --input1 and you get both in a single dictionary.

@Heineken
Contributor Author

I have one text file with both languages and pipe that through the jar, along with an ignore list.

I can't specify an existing dictionary as a source, can I?

I need text files as input, right?

I don't have the input files for the "stock" dictionaries, do I?

@rdoeffinger
Owner

Using existing dictionaries as a source is not yet possible, no. I guess it would be kind of nice, but I doubt I'll have time for that.
The process for generating the stock dictionaries is described in the readme.txt (in this repository, not in DictionaryPC; don't ask me why, I probably wasn't thinking).
All data is public, though it's quite a large download from Wiktionary (and sometimes they delete old database versions, in which case you need to use newer ones).
It would probably be easiest to modify generate_dictionaries.sh to include your additional sources once you get to that step.
The first 10 or so lines of that script can be used to generate only some of the dictionaries.

@Heineken
Contributor Author

OK, thanks, I'll look into that.

@Heineken
Contributor Author

Hmm, I use a single jar named "DictionaryBuilder.jar" to compile my dictionaries. It works fine on Windows (except for ? instead of Cyrillic letters in the success message).

Now I looked into DictionaryPC and can't find any jar of that name. The command in the shell scripts also looks different from what I use: I call my DictionaryBuilder.jar with the same arguments the readme.txt (last line) gives for run.sh.

The stuff in DictionaryPC isn't meant to be run "natively" in Windows as I do it, or is it? If so, where do I find the current version of DictionaryBuilder.jar?

@rdoeffinger
Owner

DictionaryBuilder.jar is just a compiled and bundled up version of DictionaryPC.
The scripts should work fine on Windows as long as you have a POSIX shell and a JDK available (which usually means installing Cygwin, I guess).
The run.sh could also just call a DictionaryBuilder.jar. Or you can work out the command lines from the scripts.
But if you use an old DictionaryBuilder.jar you will run into some issues: mostly small bugs, especially with French, but also no support for the more efficient v7 dictionary format.

@Heineken
Contributor Author

The 1st sentence doesn't make sense.

Wouldn't it make sense to provide such a bundled jar to lower the threshold for non-Linux users who don't want to install additional software for a small script? I guess more people have Java installed than Cygwin or Linux.

@rdoeffinger
Owner

Fixed that first sentence.

Even with the jar, you'd still need Cygwin, Linux or similar to use the scripts.
So a jar currently only really helps users who can figure out how to use it from scratch with no real documentation, and who can't or don't want to use Cygwin or Linux, compile manually, or use an IDE like Eclipse to build a jar themselves.
So yes, a release of DictionaryPC with a compiled version would be nice, but it is very, very far down the priority list.

@Heineken
Contributor Author

OK, I see. I just want to make sure you got my use case:

I don't see the necessity to build the dictionaries on my own, as they can be downloaded conveniently from within the app. I do, however, want to create alternative dictionaries from other sources. For this I don't need any scripts, only the command to run the jar and examples of the input file formats. I don't have to be a developer to do this, just a command line user.

The question is: if someone were to add a compiled version, could the compilation process be automated to include later updates?
