Converting large slob files to StarDict #422

polo2137 · 2023-01-29T12:27:01Z

Hi, I've encountered a problem when converting .slob to StarDict. The files I want to convert are 291.9 MB and 3.5 GB respectively, and the problem occurs with both of them. I use a Mac with 8 GB RAM and an M1 chip. The following error message is displayed:

[INFO] Automatically switching to SQLite mode for writing Stardict
[INFO] Using sortKeyName = 'stardict'
[INFO] Failed to detect sourceLang and targetLang from glossary name 'Wikisłownik (pl)'
[INFO] Writing to Stardict file '/Users/hejhejka/Desktop/slownik/PL1'
[INFO] Sorting took 17.1 seconds
[INFO] Auto-selecting sametypesequence=h
[ERROR] Exception while calling plugin's write function
Traceback (most recent call last):
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/glossary.py", line 859, in _write
    gen.send(entry)
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/plugins/stardict.py", line 746, in write
    yield from self.writeCompact(self._sametypesequence)
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/plugins/stardict.py", line 833, in writeCompact
    uint32ToBytes(dictMark) +
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hejhejka/Desktop/pyglossary-master/pyglossary/text_utils.py", line 165, in uint32ToBytes
    return struct.pack('>I', n)
           ^^^^^^^^^^^^^^^^^^^^
struct.error: 'I' format requires 0 <= number <= 4294967295

[CRITICAL] Writing file '../slownik/PL1' failed.
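The traceback boils down to Python's `struct` module refusing to pack a value that does not fit in an unsigned 32-bit field: `'>I'` is a big-endian unsigned 32-bit integer, so the largest value it accepts is 4294967295 (just under 4 GiB). The sketch below reproduces the failure; `uint32_to_bytes` and `fits_in_uint32` are hypothetical stand-ins written for illustration, mirroring the one-line `uint32ToBytes` shown in the traceback rather than importing PyGlossary.

```python
import struct

UINT32_MAX = 2**32 - 1  # 4294967295, the bound quoted in the error message


def uint32_to_bytes(n: int) -> bytes:
    """Pack n as a big-endian unsigned 32-bit integer (hypothetical stand-in)."""
    return struct.pack(">I", n)


def fits_in_uint32(offset: int) -> bool:
    """True if offset can be stored in a 32-bit .idx offset field."""
    return 0 <= offset <= UINT32_MAX


# An offset right at the limit still packs into 4 bytes...
print(uint32_to_bytes(UINT32_MAX).hex())

# ...but one byte further does not, and struct raises the same kind of error.
too_far = UINT32_MAX + 1
try:
    uint32_to_bytes(too_far)
except struct.error as exc:
    print(f"offset {too_far} rejected: {exc}")
```

`dictMark` in the traceback is presumably the running byte offset into the `.dict` file being written, so once the definitions written so far pass about 4 GiB, the next index entry can no longer be encoded in 32 bits.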

ilius · 2023-01-29T13:43:54Z

Same as #392

ilius · 2023-01-29T15:09:38Z

I created a branch stardict-large-file.
Please use this branch, add the --write-options=large_file=true flag to your command, and try again.
Please also test that the dictionary works in your application (GoldenDict?).
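In the StarDict format, each `.idx` record is the headword in UTF-8, a NUL byte, the offset of its definition in `.dict`, and the definition's size, all big-endian. The offset is 32 bits unless the `.ifo` declares `idxoffsetbits=64`, in which case it is 64 bits; the size field stays 32 bits. The following is a minimal sketch of that layout, assuming (as the later remark about sdcv and idxoffsetbits suggests) that `large_file` works by switching to 64-bit offsets; `encode_idx_entry` is a hypothetical helper, not PyGlossary code.

```python
import struct


def encode_idx_entry(word: str, offset: int, size: int, offset_bits: int = 32) -> bytes:
    """Build one StarDict .idx record: word, NUL, offset, size (big-endian)."""
    if offset_bits == 64:
        offset_fmt = ">Q"  # unsigned 64-bit, used when idxoffsetbits=64
    elif offset_bits == 32:
        offset_fmt = ">I"  # unsigned 32-bit, the default layout
    else:
        raise ValueError("idxoffsetbits must be 32 or 64")
    return (
        word.encode("utf-8")
        + b"\x00"
        + struct.pack(offset_fmt, offset)
        + struct.pack(">I", size)  # the size field is always 32 bits
    )


# 5 GiB into the .dict file: only representable with 64-bit offsets.
big_offset = 5 * 2**30
entry = encode_idx_entry("słowo", big_offset, 120, offset_bits=64)
```

A reader that assumes 32-bit offsets will misread every record in such an index, since each record is 4 bytes longer than it expects.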

polo2137 · 2023-01-29T15:57:33Z

Thanks for your answer and your work! I'm really sorry, but I'm a total newbie when it comes to commands etc.; I use the Gtk3 interface. Can you explain how to use this branch?

polo2137 · 2023-01-29T16:07:16Z

And how do I update PyGlossary to get this new branch?

ilius · 2023-01-29T17:04:18Z

Click on the link above, click the green Code button, then Download ZIP. Extract the zip, cd to the directory on the command line, and run ./main.py

Then set the input and output files, click the Options button to the right of the output file and format, and enable the large_file option.

polo2137 · 2023-01-29T17:11:07Z

This error pops up when I check large_file and then click OK: [ERROR] invalid option value large_file =

polo2137 · 2023-01-29T17:12:36Z

This happens with every other option in the options menu.

ilius · 2023-01-29T17:29:58Z

Sorry, click on the empty cell on the right so that it changes to True.

polo2137 · 2023-01-29T18:28:57Z

Thanks so much! It works like a charm; the only problem is the size of the files it creates: 292 MB becomes 6.3 GB, and 3.5 GB becomes 33.6 GB. Also, it doesn't create a zip file, but that's not a problem for me: [WARNING] dictzip command was not found. Make sure it's in your $PATH

polo2137 · 2023-01-29T18:30:30Z

Is there any way to make these files smaller? I want to upload them to KOReader on my Kobo Libra 2, and macOS doesn't let me copy them:
The item “WiktionaryPL2.dict” can’t be copied because it is too large for the volume’s format.

ilius · 2023-01-29T19:05:42Z

Great.

You can install dictzip on most Linux distros and run it with your .dict file as the argument.

ilius · 2023-01-29T19:06:55Z

Can you also test them on GoldenDict or StarDict?

polo2137 · 2023-01-29T19:35:04Z

Sorry, I've tried both and I can't get them to work. There is no GoldenDict for Mac, and StarDict doesn't seem to work with the M1 chip (I downloaded all 3 versions). So I can't test it unless I somehow get these files small enough to upload to KOReader.

ilius · 2023-01-29T20:14:27Z

You use Gtk, so I thought you were using Linux!

I'm not sure about Mac.

But instead of dictzip, you can use gzip to compress .dict

gzip WiktionaryPL2.dict
mv WiktionaryPL2.dict.gz WiktionaryPL2.dict.dz
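These two commands compress the `.dict` file and rename the result to the `.dict.dz` name that StarDict readers look for. The same can be done with Python's standard `gzip` module, which may help on a system without a `gzip` binary; `compress_dict` and the file names are illustrative. Note that this produces a plain gzip stream: `dictzip` output is also gzip-compatible but additionally stores a random-access chunk table in the gzip header's extra field, which some readers may rely on for efficient lookups.

```python
import gzip
import shutil
from pathlib import Path


def compress_dict(dict_path: Path) -> Path:
    """Gzip-compress a StarDict .dict file to .dict.dz and delete the original,
    mirroring `gzip X.dict && mv X.dict.gz X.dict.dz` (hypothetical helper)."""
    dz_path = dict_path.with_name(dict_path.name + ".dz")
    with dict_path.open("rb") as src, gzip.open(dz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so multi-GB files are fine
    dict_path.unlink()  # gzip also removes its input by default
    return dz_path
```

For example, `compress_dict(Path("WiktionaryPL2.dict"))` would leave `WiktionaryPL2.dict.dz` next to the `.idx` and `.ifo` files.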

KOReader uses sdcv, and I think you need to set the merge_syns=True option during conversion to make it work with either one.

You can try to install sdcv and use it directly.
https://github.com/Dushistov/sdcv


polo2137 · 2023-01-29T22:24:34Z

OK! So I've managed to compress WiktionaryPL2.dict to WiktionaryPL2.dict.dz; the file size is 398.1, so that's great! I transferred WiktionaryPL2.dict.dz, WiktionaryPL2.idx and WiktionaryPL2.ifo to KOReader. The dictionary is visible in the KOReader menu and I've enabled it, but it doesn't work: it simply says "no definition found" when I look up a word. I didn't manage to use sdcv because I don't know how; the same goes for merge_syns=True.

polo2137 · 2023-01-29T22:33:39Z

I tried deleting the .idx and .ifo files, leaving only .dict.dz, and then KOReader can't even see the dictionary. Then I tried deleting .dict.dz and leaving only .idx and .ifo, and KOReader can see it. So KOReader doesn't even notice the .dict.dz.

ilius · 2023-01-30T06:36:52Z

It doesn't look like sdcv supports idxoffsetbits, which I think means you can't use these glossaries with sdcv or KOReader unless you split them into multiple glossaries.
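To check whether a converted dictionary uses 64-bit offsets, and so is likely to hit this limitation, you can inspect its `.ifo` file: per the StarDict format it is plain text whose first line is `StarDict's dict ifo file`, followed by `key=value` lines, and `idxoffsetbits` is treated as 32 when absent. A minimal sketch, where `read_ifo`, `offset_bits` and the sample contents are hypothetical:

```python
IFO_MAGIC = "StarDict's dict ifo file"


def read_ifo(text: str) -> dict:
    """Parse the key=value lines of a StarDict .ifo file into a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != IFO_MAGIC:
        raise ValueError("not a StarDict .ifo file")
    info = {}
    for line in lines[1:]:
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            info[key.strip()] = value.strip()
    return info


def offset_bits(info: dict) -> int:
    """Width of .idx offsets; a missing key means the classic 32 bits."""
    return int(info.get("idxoffsetbits", "32"))


# Hypothetical .ifo contents for illustration only.
SAMPLE_IFO = (
    "StarDict's dict ifo file\n"
    "version=3.0.0\n"
    "bookname=Wikisłownik (pl)\n"
    "idxoffsetbits=64\n"
    "sametypesequence=h\n"
)
print(offset_bits(read_ifo(SAMPLE_IFO)))
```

On a real file, `read_ifo(open("WiktionaryPL2.ifo", encoding="utf-8").read())` would do the same; a result of 64 suggests the dictionary needs a reader with idxoffsetbits support.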

I suggest you keep trying to get GoldenDict working.

Can you upload your smaller slob file?

BTW, why do you want them in StarDict format?
Do you want to use them on Mac?

polo2137 · 2023-01-30T08:00:10Z

I need them for my Kobo Libra 2 with KOReader installed.
Here is a WeTransfer link to the smaller slob file.
I've tried splitting them using sdcv (the split -b 500 command), but it spat out some random files.
BTW, when I check the .dict file with the 'file' command, it says it's HTML, but maybe the 'file' command just doesn't recognize the .dict format.
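The `file` result is expected: a StarDict `.dict` file has no header of its own, it is just the definitions written back to back, and with `sametypesequence=h` each definition is raw HTML. All the structure lives in `.idx`, which records where each definition starts and how long it is. That is also why a byte-level `split -b` yields unusable pieces: it cuts records at arbitrary points and leaves the offsets pointing into the wrong file. The sketch below walks an in-memory `.idx` and slices definitions out of `.dict`; `iter_idx` and `lookup` are hypothetical helpers that ignore `.dict.dz` compression and `.syn` files, and use a linear scan where real readers binary-search the sorted index.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_idx(idx: bytes, offset_bits: int = 32) -> Iterator[Tuple[str, int, int]]:
    """Yield (word, offset, size) for each record of a StarDict .idx file."""
    offset_fmt = ">Q" if offset_bits == 64 else ">I"
    offset_len = struct.calcsize(offset_fmt)
    pos = 0
    while pos < len(idx):
        end = idx.index(b"\x00", pos)  # headword is NUL-terminated UTF-8
        word = idx[pos:end].decode("utf-8")
        pos = end + 1
        (offset,) = struct.unpack_from(offset_fmt, idx, pos)
        pos += offset_len
        (size,) = struct.unpack_from(">I", idx, pos)  # size is always 32-bit
        pos += 4
        yield word, offset, size


def lookup(idx: bytes, dict_data: bytes, word: str, offset_bits: int = 32) -> Optional[str]:
    """Return the definition of word, sliced straight out of the .dict bytes."""
    for headword, offset, size in iter_idx(idx, offset_bits):
        if headword == word:
            return dict_data[offset:offset + size].decode("utf-8")
    return None
```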

polo2137 · 2023-01-30T08:36:30Z

Why do I need GoldenDict in this process?

polo2137 · 2023-01-30T15:02:04Z

OK, I installed GoldenDict.

polo2137 · 2023-01-30T15:47:39Z

OK, these files work fine in GoldenDict but not in KOReader.

ilius · 2023-01-30T17:45:44Z

Okay so the StarDict files you have are useless for KOReader.

You have to split the glossary up.

You have Bash, right?
Make sure your slob file is named WiktionaryPL2.slob and you cd to its directory, then run:

pyglossary WiktionaryPL2.slob WiktionaryPL2.tmp.txt --write-options=file_size_approx=1000mb

mv WiktionaryPL2.tmp.txt WiktionaryPL2.tmp.txt.0

for F in WiktionaryPL2.tmp.txt.* ; do
    base=${F%%.*}
    num=${F##*.}
    znum=$(printf "%02d" "$num")
    pyglossary "$F" "$base-stardict-${znum}.ifo" --read-format=Tabfile
done

rm WiktionaryPL2.tmp.txt.*

All .dict files should be around 1 GB, so you don't have to compress them.
You can copy all files starting with WiktionaryPL2-stardict- into your Kobo and test them.
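The loop relies on two shell parameter expansions: `${F%%.*}` strips everything from the first dot onward, leaving the base name, and `${F##*.}` keeps only what follows the last dot, which (judging by the script) is the part number of each split Tabfile. The sketch below reproduces that naming logic in Python, e.g. to preview output names before converting; `stardict_name` is a hypothetical helper and does not call PyGlossary.

```python
def stardict_name(part_file: str) -> str:
    """Map 'WiktionaryPL2.tmp.txt.3' to 'WiktionaryPL2-stardict-03.ifo',
    mirroring base=${F%%.*}, num=${F##*.} and printf "%02d"."""
    base = part_file.split(".", 1)[0]        # ${F%%.*}: drop from the first dot
    num = int(part_file.rsplit(".", 1)[1])   # ${F##*.}: keep after the last dot
    return f"{base}-stardict-{num:02d}.ifo"  # zero-pad to two digits


names = [stardict_name(f"WiktionaryPL2.tmp.txt.{i}") for i in range(3)]
print(names)
```

Zero-padding keeps the parts in order when the files are listed alphabetically, as long as there are fewer than 100 of them.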

polo2137 · 2023-01-30T20:01:06Z

OK, it works! It's not usable because of its size and loading time in KOReader on the Kobo, but it does work! Thanks so much!

ilius · 2023-01-30T20:07:02Z

No worries.

You can make them smaller by lowering the 1000mb value at the end of the first command.

ilius · 2023-01-30T21:01:26Z

This issue is getting pretty long.
I'm closing it.
Feel free to open a new issue if you need any help.


ilius changed the title ~~'I' format requires 0 error when converting .slob to StarDict~~ Error converting large slob files to StarDict Jan 29, 2023

ilius changed the title ~~Error converting large slob files to StarDict~~ Converting large slob files to StarDict Jan 29, 2023

ilius added a commit that referenced this issue Jan 29, 2023

StarDict writer: add option large_file, #392 #422

00b5bd9

ilius added a commit that referenced this issue Jan 29, 2023

StarDict: add write option large_file, support idxoffsetbits=64 on re…

9829130

…ad, #392 #422

ilius closed this as completed Jan 30, 2023

ilius added Feature Q&A labels Jan 30, 2023

ilius added a commit that referenced this issue Feb 9, 2023

StarDict: add write option large_file, support idxoffsetbits=64 on re…

cae8411

…ad, #392 #422

ilius added a commit that referenced this issue Feb 21, 2023

StarDict: add write option large_file, support idxoffsetbits=64 on re…

65a39c1

…ad, #392 #422

ilius added a commit that referenced this issue Feb 24, 2023

StarDict: add write option large_file, support idxoffsetbits=64 on re…

c3825b0

…ad, #392 #422
