Converting large slob files to StarDict #422

Closed
polo2137 opened this issue Jan 29, 2023 · 25 comments

@polo2137

Hi, I've encountered a problem when converting .slob to StarDict. The files I want to convert are 291.9 MB and 3.5 GB respectively, and the problem occurs with both of them. I use a Mac with 8 GB RAM and an M1 chip. The following error message is displayed:

[INFO] Automatically switching to SQLite mode for writing Stardict
[INFO] Using sortKeyName = 'stardict'
[INFO] Failed to detect sourceLang and targetLang from glossary name 'Wikisłownik (pl)'
[INFO] Writing to Stardict file '/Users/hejhejka/Desktop/slownik/PL1'
[INFO] Sorting took 17.1 seconds
[INFO] Auto-selecting sametypesequence=h
[ERROR] Exception while calling plugin's write function
Traceback (most recent call last):
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/glossary.py", line 859, in _write
    gen.send(entry)
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/plugins/stardict.py", line 746, in write
    yield from self.writeCompact(self._sametypesequence)
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/plugins/stardict.py", line 833, in writeCompact
    uint32ToBytes(dictMark) +
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/text_utils.py", line 165, in uint32ToBytes
    return struct.pack('>I', n)
           ^^^^^^^^^^^^^^^^^^^^
struct.error: 'I' format requires 0 <= number <= 4294967295

[CRITICAL] Writing file '../slownik/PL1' failed.

@ilius changed the title from "'I' format requires 0 error when converting .slob to StarDict" to "Error converting large slob files to StarDict" Jan 29, 2023
@ilius changed the title from "Error converting large slob files to StarDict" to "Converting large slob files to StarDict" Jan 29, 2023
@ilius
Owner

ilius commented Jan 29, 2023

Same as #392

@ilius
Owner

ilius commented Jan 29, 2023

I created a branch stardict-large-file.
Please use this branch and add --write-options=large_file=true flag to your command and try again.
Please also test that the dictionary is working in your application (GoldenDict?).
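
For readers wondering why a flag is needed at all: the traceback shows `struct.pack('>I', n)` rejecting a value above 4294967295, which is what one would expect once an offset into the .dict data no longer fits in 32 bits. The sketch below reproduces that failure in isolation and shows that a 64-bit format accepts the same value. `pack_offset` is a hypothetical helper written for illustration, not PyGlossary's code, and the idea that `large_file` switches to 64-bit offsets is an assumption based on the `idxoffsetbits` remark later in this thread.

```python
import struct

UINT32_MAX = 0xFFFFFFFF  # 4294967295, the limit quoted in the traceback


def pack_offset(offset: int, bits: int = 32) -> bytes:
    """Pack an offset as a big-endian unsigned integer (hypothetical helper)."""
    fmt = {32: ">I", 64: ">Q"}[bits]
    return struct.pack(fmt, offset)


# The largest 32-bit value still fits in 4 bytes.
small = pack_offset(UINT32_MAX)

# One past the limit raises struct.error, as in the report above.
try:
    pack_offset(UINT32_MAX + 1)
    overflowed = False
except struct.error:
    overflowed = True

# The same value fits once 8 bytes are used per offset.
large = pack_offset(UINT32_MAX + 1, bits=64)
```

Wider offsets only help if the reading application also expects them, which is why testing the result in a dictionary application matters.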

@polo2137
Author

Thanks for your answer and work! I'm really sorry, but I'm a total newbie when it comes to commands etc.; I use the Gtk3 interface. Can you explain to me how to use this branch?

@polo2137
Author

And how do I update pyGlossary to add this new branch?

@ilius
Owner

ilius commented Jan 29, 2023

Click on the link above, click the green Code button, then Download ZIP. Extract the zip, cd to the directory in the command line, and run ./main.py

Then set the input and output files, click on the Options button to the right of the output file and format, and enable the option large_file.

@polo2137
Author

This error pops up when I check large_file and then click OK: [ERROR] invalid option value large_file =

@polo2137
Author

This happens with every other option in the options menu.

@ilius
Owner

ilius commented Jan 29, 2023

Sorry, click on the right empty cell so that it changes to True.

@polo2137
Author

Thanks so much! It works like a charm. The only problem is the size of the files it creates: 6.3 GB from the 292 MB file and 33.6 GB from the 3.5 GB one. Also, it doesn't create a zip file, but that's not a problem for me: [WARNING] dictzip command was not found. Make sure it's in your $PATH

@polo2137
Author

Is there any way to make these files smaller? I want to upload them to KOReader on my Kobo Libra 2 and it doesn't let me do that:
The item “WiktionaryPL2.dict” can’t be copied because it is too large for the volume’s format.

@ilius
Owner

ilius commented Jan 29, 2023

Great.

You can install dictzip in most Linux distros, and run it giving your .dict file as argument.

@ilius
Owner

ilius commented Jan 29, 2023

Can you also test them on GoldenDict or StarDict?

@polo2137
Author

Sorry, I've tried both and I can't get them to work. There is no GoldenDict for Mac, and StarDict doesn't seem to work with the M1 chip (I downloaded all 3 versions). So I can't test it unless I somehow get these files light enough to upload to KOReader.

@ilius
Owner

ilius commented Jan 29, 2023

You use Gtk so I thought you are using Linux!

I'm not sure about Mac.

But instead of dictzip, you can use gzip to compress .dict

gzip WiktionaryPL2.dict
mv WiktionaryPL2.dict.gz WiktionaryPL2.dict.dz
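
If the gzip command is not at hand, the same two steps can be sketched with Python's standard library. `gzip_as_dz` is a hypothetical helper name. Like the commands above, it produces a plain gzip stream with a .dz extension rather than a true dictzip file, and unlike the gzip command it leaves the original .dict in place.

```python
import gzip
import shutil
from pathlib import Path


def gzip_as_dz(dict_path: str) -> Path:
    """Gzip-compress a .dict file next to itself as <name>.dict.dz."""
    src = Path(dict_path)
    dst = src.with_name(src.name + ".dz")
    # Stream the data so a multi-gigabyte file is never held in memory.
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `gzip_as_dz("WiktionaryPL2.dict")` would write WiktionaryPL2.dict.dz in the same directory.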

KOReader uses sdcv, and I think you need to set merge_syns=True option during conversion to make it work with either one.

You can try to install sdcv and use it directly.
https://github.com/Dushistov/sdcv

@polo2137
Author

OK! So I've managed to compress WiktionaryPL2.dict to WiktionaryPL2.dict.dz; the file size is 398.1, so that's great! I transferred WiktionaryPL2.dict.dz, WiktionaryPL2.idx and WiktionaryPL2.ifo to KOReader. The dictionary is visible in the KOReader menu and I've set KOReader to use it, but it doesn't work. It simply says "no definition found" when looking up a word. I didn't manage to use sdcv because I don't know how, and the same goes for merge_syns=True.

@polo2137
Author

I tried deleting the .idx and .ifo files, leaving only .dict.dz, and then KOReader can't even see this dictionary. Then I tried deleting .dict.dz and leaving only .idx and .ifo, and KOReader can see it. So the .dict.dz is not even noticed by KOReader.

@ilius
Owner

ilius commented Jan 30, 2023

It doesn't look like sdcv supports idxoffsetbits, which I think means you can't use these glossaries on sdcv or KOReader, unless you split them into multiple glossaries.
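
To check whether a given glossary is affected, one can look for an `idxoffsetbits=64` line in its .ifo file, which is a small text file of key=value lines after a header line. The following is a minimal sketch; `read_ifo` and `needs_64bit_reader` are hypothetical helper names, and the sample values are made up for illustration.

```python
def read_ifo(text: str) -> dict[str, str]:
    """Parse the key=value lines of a StarDict .ifo file into a dict."""
    info = {}
    for line in text.splitlines()[1:]:  # skip the header line
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def needs_64bit_reader(text: str) -> bool:
    """True if the glossary declares 64-bit offsets (32 when the key is absent)."""
    return read_ifo(text).get("idxoffsetbits", "32") == "64"


# Made-up sample contents, for illustration only.
sample = """StarDict's dict ifo file
version=3.0.0
bookname=Example
wordcount=3
idxfilesize=60
idxoffsetbits=64
sametypesequence=h
"""

uses_64bit = needs_64bit_reader(sample)
```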

I suggest you keep trying to get GoldenDict working.

Can you upload your smaller slob file?

BTW why do you want to have them in StarDict format?
You want to use them on Mac?

@polo2137
Author

I need them for my Kobo Libra 2 with KOReader installed.
Here is a WeTransfer link to a smaller slob file.
I've tried splitting them using sdcv (the split -b 500 command), but it spat out some random files.
Btw, when I check the format of the .dict file with the 'file' command, it says it's HTML, but it may be that the 'file' command doesn't recognize the .dict format.

@polo2137
Author

Why do I need GoldenDict in this process?

@polo2137
Author

OK, I installed GoldenDict.

@polo2137
Author

OK, these files are working fine in GoldenDict but not in KOReader.

@ilius
Owner

ilius commented Jan 30, 2023

Okay so the StarDict files you have are useless for KOReader.

You have to split the glossary up.

You have Bash, right?
Make sure your slob file is named WiktionaryPL2.slob and you cd to its directory, then run:

# Convert to tab-separated text, split into parts of roughly 1000 MB each
pyglossary WiktionaryPL2.slob WiktionaryPL2.tmp.txt --write-options=file_size_approx=1000mb

# Give the first part a numeric suffix like the others
mv WiktionaryPL2.tmp.txt WiktionaryPL2.tmp.txt.0

# Convert each part into its own StarDict glossary
for F in WiktionaryPL2.tmp.txt.* ; do
    base=${F%%.*}
    num=${F##*.}
    znum=$(printf "%02d" "$num")
    pyglossary "$F" "$base-stardict-${znum}.ifo" --read-format=Tabfile
done

rm WiktionaryPL2.tmp.txt.*

All .dict files should be around 1 GB, so you don't have to compress them.
You can copy all files starting with WiktionaryPL2-stardict- into your Kobo and test them.
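
For anyone unsure what the two parameter expansions in the loop do, the naming step can be sketched in Python. `stardict_name` is a hypothetical helper that only mirrors the string handling; it does not run pyglossary.

```python
def stardict_name(part: str) -> str:
    """Map 'WiktionaryPL2.tmp.txt.3' to 'WiktionaryPL2-stardict-03.ifo'."""
    base = part.split(".", 1)[0]   # like ${F%%.*}: everything before the first dot
    num = part.rsplit(".", 1)[1]   # like ${F##*.}: everything after the last dot
    return f"{base}-stardict-{int(num):02d}.ifo"  # like printf "%02d"


first = stardict_name("WiktionaryPL2.tmp.txt.0")
```

The zero padding keeps the resulting glossaries in numeric order when listed by name.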

@polo2137
Author

OK, it works! It is not usable because of its size and loading time in KOReader on the Kobo device, but it does work! Thanks so much!

@ilius
Owner

ilius commented Jan 30, 2023

No worries.

You can make them smaller by changing the number 1000mb at the end of the first line.

@ilius
Owner

ilius commented Jan 30, 2023

This issue is getting pretty long.
I'm closing it.
Feel free to open a new issue if you need any help.

@ilius closed this as completed Jan 30, 2023