lzhs_scanner: missing call to lzhs_init_lookup() #13

Closed
boom1 opened this issue Oct 31, 2018 · 2 comments

boom1 commented Oct 31, 2018

In the lzhs_scanner tool, the function scan_lzhs() calls lzhs_decode() without first calling lzhs_init_lookup().
(The lookup tables should probably become part of struct lzhs_ctx.)

diff -ru epk2extract/src/tools/lzhs_scanner.c epk2extract-master/src/tools/lzhs_scanner.c
--- epk2extract/src/tools/lzhs_scanner.c        2018-10-31 15:07:27.754211489 +0100
+++ epk2extract-master/src/tools/lzhs_scanner.c 2018-10-31 12:06:27.649465940 +0100
@@ -65,6 +65,7 @@
 
                                uint8_t out_checksum;
                                asprintf(&outdecode, "%s/%s_file%d.unlzhs", dirn, filen, count);
+                               lzhs_init_lookup();
                                lzhs_decode(out, 0, outdecode, &out_checksum);
                                if(extract == 2 && out_checksum != header->checksum){
                                        printf("Checksum Mismatch, Skipping\n");

smx-smx (Member) commented Nov 7, 2018

Hi, thanks for your contribution.
Yes, that could be a problem, and I'll refactor the lzhs initialization code to fix it (perhaps with a quick-and-dirty singleton initializer guard).
However, I am not sure the lookup table should be moved to struct lzhs_ctx, because the tables are fixed/hardcoded and the lookup table doesn't depend on the input data.
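
To illustrate what such an initializer guard could look like, here is a minimal sketch. The names lzhs_ensure_lookup(), lzhs_init_lookup_impl() and init_calls are hypothetical and exist only for this example; the stand-in initializer just counts how often it runs, it is not the real lzhs_init_lookup().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real initializer: it only counts its calls. */
static int init_calls = 0;

static void lzhs_init_lookup_impl(void) {
    init_calls++;
}

/* Hypothetical guarded wrapper: runs the initializer on the first call only,
 * so every caller (including a decode entry point) can invoke it safely. */
static void lzhs_ensure_lookup(void) {
    static bool initialized = false;

    if (!initialized) {
        lzhs_init_lookup_impl();
        initialized = true;
    }
}
```

With a guard like this at the top of the decode path, a tool such as lzhs_scanner would no longer need to remember the explicit init call. The plain static flag is not synchronized; if strict once-only behaviour across threads were required, something like POSIX pthread_once() would be one option.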

Let's pick the charlen table for example. A match is obtained by looping (incrementing ctx->i) until

huff_charlen[ctx->i]->len == ctx->len &&
huff_charlen[ctx->i]->code == ctx->code

This means we're looking for an element (inside huff_charlen, which is the hardcoded table) whose len and code match the ones we are looking for.
The search will always produce the same index, because there can be only one element with a given len/code pair; no duplicates are allowed.

As such, it shouldn't be necessary to store multiple copies of the same lookup table, because they would be filled with the same data.
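
A rough sketch of that lazy scheme follows, using a tiny made-up prefix code rather than the real LZHS tables. Everything here (huff_entry, demo_table, cache, find_index, the scans counter) is an assumption for illustration and does not mirror the actual epk2extract code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t code; /* code bits */
    int len;       /* code length in bits */
} huff_entry;

/* Hypothetical 4-entry prefix code, NOT the real LZHS table. */
static const huff_entry demo_table[] = {
    {0x0, 1}, {0x2, 2}, {0x6, 3}, {0x7, 3},
};
#define DEMO_COUNT (sizeof(demo_table) / sizeof(demo_table[0]))
#define MAX_LEN 3

/* cache[len][code] holds a table index, or -1 if not resolved yet. */
static int cache[MAX_LEN + 1][1 << MAX_LEN];
static int cache_ready = 0;
static int scans = 0; /* counts linear searches, for demonstration only */

static int find_index(uint32_t code, int len) {
    if (len < 1 || len > MAX_LEN || code >= (1u << len))
        return -1;

    if (!cache_ready) {
        /* Mark every slot as "not cached" before first use. */
        for (int l = 0; l <= MAX_LEN; l++)
            for (int c = 0; c < (1 << MAX_LEN); c++)
                cache[l][c] = -1;
        cache_ready = 1;
    }

    if (cache[len][code] >= 0)
        return cache[len][code]; /* cache hit: no search needed */

    scans++;
    for (size_t i = 0; i < DEMO_COUNT; i++) {
        if (demo_table[i].len == len && demo_table[i].code == code) {
            cache[len][code] = (int)i; /* same result for any input data */
            return (int)i;
        }
    }
    return -1; /* no entry with this len/code pair */
}
```

Because the cached index depends only on the fixed table, a second copy of the cache per decoding context would end up holding exactly the same values.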

Multithreading shouldn't be an issue either: in the worst case, thread 1 reads a lookup table entry while thread 2 is writing an index to it, so thread 1 finds no cached index (value -1) and simply does an extra search loop.

Going a step further, one could fill the lookup table ahead of time, by:

  • looping for all elements of the table
  • converting each element (code/len pairs) to lookup keys
  • filling the table so that each key maps to the index of its element

As you can see, this process is not dependent on the input data we're decoding. I did it this way so that the lookup table is built lazily, as needed.
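
For comparison, the ahead-of-time variant described in the list above might look roughly like this. Again, the table contents, the key encoding in make_key() and all names are assumptions made for the example, not the project's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t code; /* code bits */
    int len;       /* code length in bits */
} huff_entry;

/* Hypothetical 4-entry prefix code, NOT the real LZHS table. */
static const huff_entry demo_table[] = {
    {0x0, 1}, {0x2, 2}, {0x6, 3}, {0x7, 3},
};
#define DEMO_COUNT (sizeof(demo_table) / sizeof(demo_table[0]))
#define MAX_LEN 3
#define LOOKUP_SIZE (1u << (MAX_LEN + 1))

static int lookup[LOOKUP_SIZE];

/* Assumed key encoding: a marker bit above the code makes each
 * (len, code) pair map to a distinct key. */
static unsigned make_key(uint32_t code, int len) {
    return (1u << len) | code;
}

/* Fill the whole table once, independent of any input data. */
static void build_lookup(void) {
    for (unsigned k = 0; k < LOOKUP_SIZE; k++)
        lookup[k] = -1; /* -1 = no table element has this key */

    for (size_t i = 0; i < DEMO_COUNT; i++)
        lookup[make_key(demo_table[i].code, demo_table[i].len)] = (int)i;
}

static int lookup_index(uint32_t code, int len) {
    if (len < 1 || len > MAX_LEN || code >= (1u << len))
        return -1;
    return lookup[make_key(code, len)];
}
```

After a single build_lookup() call the table is read-only, so decoding never has to fall back to a linear search.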

boom1 (Author) commented Nov 7, 2018

Sorry, it was just a guess that the lookup table would depend on the input data. I must admit that I don't understand the Huffman algorithm. I thought that this lookup table was the tree of symbols or something.
Thanks for fixing it.

smx-smx closed this as completed Nov 9, 2018