lzhs_scanner: missing call to lzhs_init_lookup() #13

boom1 · 2018-10-31T14:12:01Z

In the lzhs_scanner tool, the function scan_lzhs() calls lzhs_decode() without first calling lzhs_init_lookup().
(The lookup tables should probably become part of struct lzhs_ctx.)

diff -ru epk2extract/src/tools/lzhs_scanner.c epk2extract-master/src/tools/lzhs_scanner.c
--- epk2extract/src/tools/lzhs_scanner.c        2018-10-31 15:07:27.754211489 +0100
+++ epk2extract-master/src/tools/lzhs_scanner.c 2018-10-31 12:06:27.649465940 +0100
@@ -65,6 +65,7 @@
 
                                uint8_t out_checksum;
                                asprintf(&outdecode, "%s/%s_file%d.unlzhs", dirn, filen, count);
+                               lzhs_init_lookup();
                                lzhs_decode(out, 0, outdecode, &out_checksum);
                                if(extract == 2 && out_checksum != header->checksum){
                                        printf("Checksum Mismatch, Skipping\n");

smx-smx · 2018-11-07T14:45:49Z

Hi, thanks for your contribution.
Yes, that could be a problem, and I'll refactor the lzhs initialization code to fix it (perhaps with a quick-and-dirty singleton initializer guard).
However, I am not sure the lookup table should be moved into struct lzhs_ctx, because the tables are fixed/hardcoded and the lookup table doesn't depend on the input data.
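
A singleton initializer guard of the kind mentioned above could look roughly like this. It is only a sketch: init_lookup_once(), decode_stub() and the counter are hypothetical names for illustration, not the actual epk2extract code, and the flag is a plain bool with no locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: set once the shared lookup tables are prepared. */
static bool lookup_ready = false;
/* Counts how many times the real setup work ran (illustration only). */
static int init_runs = 0;

/* Guarded initializer: callers may invoke it freely, the work runs once. */
void init_lookup_once(void) {
    if (lookup_ready)
        return;
    init_runs++;          /* stand-in for the real table setup */
    lookup_ready = true;
}

/* Stand-in for a decoder entry point that initializes on demand. */
int decode_stub(void) {
    init_lookup_once();
    assert(lookup_ready); /* decoding must never see unprepared tables */
    return init_runs;
}
```

With a guard like this inside the decoder, a caller such as scan_lzhs() would no longer need to remember the explicit init call.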

Let's pick the charlen table for example. A match is obtained by looping (incrementing ctx->i) until

huff_charlen[ctx->i]->len == ctx->len &&
huff_charlen[ctx->i]->code == ctx->code

This means we're looking for an element inside huff_charlen (the hardcoded table) whose len and code match the ones we want.
The search will always produce the same index, because there can be only one element with a given len/code pair; duplicates are not allowed.

As such, it shouldn't be necessary to store multiple copies of the same lookup table, because they would be filled with the same data.
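
To illustrate the lazy lookup described above, here is a minimal sketch. It uses a tiny made-up prefix-code table (demo_table) rather than the real huff_charlen contents, and find_index(), cache and scans are hypothetical names. A slot holding -1 means "not cached yet".

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t code; uint8_t len; } huff_entry;

/* Made-up demo table (codes 0, 10, 110, 111), NOT the real LZHS table. */
static const huff_entry demo_table[] = {
    { 0x0, 1 }, { 0x2, 2 }, { 0x6, 3 }, { 0x7, 3 }
};
#define DEMO_COUNT ((int)(sizeof demo_table / sizeof demo_table[0]))
#define MAX_LEN 3

/* cache[len][code] = index into demo_table, or -1 if not cached yet. */
static int cache[MAX_LEN + 1][1 << MAX_LEN];
static int cache_ready = 0;
static int scans = 0; /* number of linear searches performed */

static void cache_reset(void) {
    for (int l = 0; l <= MAX_LEN; l++)
        for (int c = 0; c < (1 << MAX_LEN); c++)
            cache[l][c] = -1;
    cache_ready = 1;
}

/* Returns the table index for (code, len), or -1 if there is no entry. */
int find_index(uint32_t code, uint8_t len) {
    assert(len >= 1 && len <= MAX_LEN && code < (1u << len));
    if (!cache_ready)
        cache_reset();
    if (cache[len][code] != -1)
        return cache[len][code];      /* cached from an earlier search */
    scans++;
    for (int i = 0; i < DEMO_COUNT; i++) {
        if (demo_table[i].len == len && demo_table[i].code == code) {
            cache[len][code] = i;     /* same result every time */
            return i;
        }
    }
    return -1;
}
```

Because the table is fixed, the value written to a given cache slot is the same no matter which input triggered the search, which is why one shared cache is enough.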

Multithreading shouldn't be an issue either: in the worst case, thread 1 reads the lookup table while thread 2 is writing an index to it, so thread 1 does not find a cached index (it sees the value -1) and does an extra search loop.

Going a step further, one could fill the lookup table ahead of time, by:

looping for all elements of the table
converting each element (code/len pairs) to lookup keys
filling the table with index -> key

As you can see, this process does not depend on the input data we're decoding. I did it this way so that the lookup table is built lazily, as needed.
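
The ahead-of-time fill outlined in the three steps above might be sketched as follows. This again uses a made-up demo table and hypothetical names (fill_lookup(), lookup); the (len, code) pair is used directly as the lookup key, which is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t code; uint8_t len; } huff_entry;

/* Made-up demo table (codes 0, 10, 110, 111), NOT the real LZHS table. */
static const huff_entry demo_table[] = {
    { 0x0, 1 }, { 0x2, 2 }, { 0x6, 3 }, { 0x7, 3 }
};
#define DEMO_COUNT (sizeof demo_table / sizeof demo_table[0])
#define MAX_LEN 3

/* lookup[len][code] = index into demo_table, or -1 if no such entry. */
static int lookup[MAX_LEN + 1][1 << MAX_LEN];

void fill_lookup(void) {
    /* Start with every slot marked "no entry". */
    for (int l = 0; l <= MAX_LEN; l++)
        for (int c = 0; c < (1 << MAX_LEN); c++)
            lookup[l][c] = -1;

    /* Walk the fixed table once and record each entry's index. */
    for (size_t i = 0; i < DEMO_COUNT; i++) {
        const huff_entry *e = &demo_table[i];
        assert(e->len >= 1 && e->len <= MAX_LEN && e->code < (1u << e->len));
        lookup[e->len][e->code] = (int)i;
    }
}
```

After fill_lookup() has run, decoding only reads the table, so nothing is written to it while input is being processed.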

boom1 · 2018-11-07T16:23:51Z

Sorry, it was just a guess that the lookup table would depend on the input data. I must admit that I don't understand the Huffman algorithm. I thought that this lookup table was the tree of symbols or something.
Thanks for fixing it.

smx-smx closed this as completed Nov 9, 2018
