Training dictionary failed: Src size is incorrect #11

Open
leoplusx opened this issue Aug 4, 2022 · 3 comments

Comments

@leoplusx

leoplusx commented Aug 4, 2022

I'm getting the error in the title when running:

SELECT zstd_incremental_maintenance(null, 1);

Input:

SELECT zstd_enable_transparent('{"table": "absatz","column": "text", "compression_level": 19, "dict_chooser": "''a''"}');
SELECT zstd_incremental_maintenance(null, 1);

Output:

[2022-08-04T11:36:24Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma auto_vacuum=full;`
[2022-08-04T11:36:24Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma busy_timeout=2000;` or higher
[2022-08-04T11:44:13Z INFO  sqlite_zstd::transparent] absatz.text: Total 36006063 rows (16.46GB) to potentially compress (split in 1 groups).
Error: getting dict

Caused by:
    0: Training dictionary failed
       
       Caused by:
           Src size is incorrect
    1: Error code 1: SQL error or missing database

What causes this and how do I fix it?

Also, how can I change the settings for zstd_enable_transparent afterwards? Running the command again on the same column (even with different settings) gives me Error: Column text is already enabled for compression.

Thanks! 🙏

@phiresky
Owner

phiresky commented Aug 5, 2022

I've not seen that error before. Could you set the environment variable SQLITE_ZSTD_LOG=debug and run it again, so we can better see what the dictionary training parameters are? Maybe the target dictionary size is too large for zstd or something. If you could find a way to send me your file, or find a more minimal example, that would also help.

Right now it's not easily possible to change the settings. What do you want to change exactly? For many settings you can simply edit the JSON in _zstd_config directly, but there's no integrated functionality to tell you whether the new config will work.

@phiresky
Owner

phiresky commented Aug 5, 2022

Ah you're probably hitting this error case: https://github.com/facebook/zstd/blob/eadb6c874f9d0c9e90c835f8b0181da802361e4c/lib/dictBuilder/fastcover.c#L328

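That check fails when the total size of the training samples exceeds zstd's maximum, which is 1GB or 4GB depending on the build (4GB on 64-bit). Try setting "train_dict_samples_ratio": 5 in the config JSON.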
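To see why a 16.46GB column could trip that check, here is a rough estimate of the training sample size. It is a sketch, assuming the sqlite-zstd defaults dict_size_ratio = 0.01 and train_dict_samples_ratio = 100 (so samples ≈ total × dict_size_ratio × train_dict_samples_ratio) and taking the cap as zstd's 64-bit limit of 2^32 - 1 bytes. The helper names are hypothetical, and the real code may sample or cap sizes differently.

```python
# Rough sketch only. Assumed: sqlite-zstd defaults dict_size_ratio = 0.01 and
# train_dict_samples_ratio = 100; cap = zstd's 64-bit total-samples limit.
ZSTD_MAX_SAMPLES_SIZE = 2**32 - 1  # bytes on 64-bit builds (1 GB on 32-bit)


def training_sample_bytes(total_bytes: float,
                          dict_size_ratio: float = 0.01,
                          train_dict_samples_ratio: float = 100) -> float:
    """Hypothetical helper: estimate bytes of samples fed to the trainer."""
    dict_size = total_bytes * dict_size_ratio
    return dict_size * train_dict_samples_ratio


def fits_zstd_limit(sample_bytes: float) -> bool:
    """Hypothetical helper: True if the samples stay under zstd's cap."""
    return sample_bytes < ZSTD_MAX_SAMPLES_SIZE


total = 16.46e9  # the 16.46GB reported in the log output
for ratio in (100, 5):
    samples = training_sample_bytes(total, train_dict_samples_ratio=ratio)
    print(f"train_dict_samples_ratio={ratio}: {samples / 1e9:.2f} GB, "
          f"fits={fits_zstd_limit(samples)}")
```

With the assumed defaults the samples amount to the whole 16.46GB, far over the cap, while a ratio of 5 brings them down to roughly 0.82GB.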

@anacrolix
Contributor

I think this worked for me: I used json_set to add the necessary field.
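A minimal sketch of the json_set approach, using Python's built-in sqlite3 module against an in-memory stand-in table. The table name _zstd_config follows the owner's comment, but the id and config columns are assumptions: inspect the real schema in your database before running an UPDATE like this. It needs an SQLite build with the JSON functions (built in by default since SQLite 3.38.0).

```python
import json
import sqlite3

# Hypothetical stand-in for sqlite-zstd's config table; the real table name
# and schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _zstd_config (id INTEGER PRIMARY KEY, config TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO _zstd_config (config) VALUES (?)",
    (json.dumps({"table": "absatz", "column": "text",
                 "compression_level": 19, "dict_chooser": "'a'"}),),
)

# json_set adds the key if it is missing and overwrites it if present,
# leaving every other key in the JSON object unchanged.
conn.execute(
    """UPDATE _zstd_config
       SET config = json_set(config, '$.train_dict_samples_ratio', 5)
       WHERE json_extract(config, '$.table') = 'absatz'
         AND json_extract(config, '$.column') = 'text'"""
)
conn.commit()

(updated,) = conn.execute("SELECT config FROM _zstd_config").fetchone()
print(json.loads(updated))
```

Because json_set only touches the named path, compression_level and dict_chooser survive the update; as the owner notes, nothing validates the edited config, so run zstd_incremental_maintenance afterwards to check it.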
