Any interest in adding a table based decode?

Hello. I've been enjoying your module and associated writings/docs! May I 👏 and say 'Well done!'?

I'm an R and Rcpp/c++ kinda guy who had never really looked at Python until just a few weeks ago and immediately had a use for your module and have found it easy to use and instructive _but_ since I was exploring I obviously was profiling a ton while learning and found two things that I thought might be of interest.

First let me present some benchmark numbers...

```
*** Large Sample Size (Symbols: 12, Max Code Length: 6, In Bits: 132074, Out Symbols: 42858)

Clock type: CPU
Ordered by: ttot, asc

name                                                                    ncall            tsub      ttot      tavg      
DecodeTableTest.py:81 ba_C_iterdecode                                   10000            0.017626  0.098321  0.000010
DecodeTableTest.py:89 ba_C_decode                                       10000            0.022404  0.111367  0.000011
DecodeTableTest.py:139 baTo01_Py_lookupDict                             10000            0.693772  0.715359  0.000072
DecodeTableTest.py:122 baTo01_Py_lookupTbl_builtin_int                  10000            0.863217  0.885142  0.000089
DecodeTableTest.py:177 rawBytes_Py_lookupTbl_intfrombytes_memoryview    10000            2.719999  4.090594  0.000409
DecodeTableTest.py:195 ba_Py_lookupTbl_builtin_int_baMemoryviewTo01     10000            3.397645  4.807507  0.000481
DecodeTableTest.py:155 ba_Py_lookupTbl_builtin_int_shift                10000            3.415027  4.837891  0.000484
DecodeTableTest.py:97 ba_Py_iterdecode                                  10000            0.033506  13.49097  0.001349
DecodeTableTest.py:105 ba_Py_lookupTbl_ba2int                           10000            3.263269  33.08122  0.003308
DecodeTableTest.py:211 ba_Py_lookupTbl_ba2int_baMemoryview              10000            3.923426  33.38520  0.003339
```
(where `ba` is bitarray, and `baTo01` is the string representation and the table lookups are list lookups of symbol/freq tuples and the memoryview is on the first max code length bits but shifting the whole underlying bitarray)

My first interest was in using a table lookup for the decode instead of a tree so I figured if I compared the Python `iterdecode()` to the C I could get a feel for the gains to be had going from Python to the underlying C and as you already know its a massive improvement (`ba_Py_iterdecode`) and I am thinking that even if a table lookup is a fraction of that improvement then it could possibly blaze past the `C` tree decodes.

I haven't looked into the Python C bindings modules yet and that is my next step (or, preferably, C++ bindings) but if you are interested in it then you know your code and where it'd plug in. I was thinking along the lines of a `.decode(decodeTable)` method where the decodeTable would be treated in the same fashion as the decode dict / decodeTree setup.

------
The other thing I wanted to mention, and you can see in the benchmarks, was `ba2int()` is _slow_. I see what it is doing with the checks and padding and such but thought perhaps I am missing something or perhaps it is something added for convenience that hasn't had usage enough to decide if it is worth working on to make it faster... 🤷 

```
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:19 t_bitarray_ba2int_frombytes                   10000            0.020761  0.094871  0.000009
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:13 t_bitarray_ba2int_fromstrings                 10000            0.013938  0.077259  0.000008
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:41 t_builtin_int_frombytes                       10000            0.011092  0.016091  0.000002
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:36 t_builtin_int_fromstrings                     10000            0.006036  0.006036  0.000001
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Any interest in adding a table based decode? #153

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Any interest in adding a table based decode? #153

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions