Hello. I've been enjoying your module and associated writings/docs! May I 👏 and say 'Well done!'?
I'm an R and Rcpp/c++ kinda guy who had never really looked at Python until just a few weeks ago and immediately had a use for your module and have found it easy to use and instructive but since I was exploring I obviously was profiling a ton while learning and found two things that I thought might be of interest.
First let me present some benchmark numbers...
*** Large Sample Size (Symbols: 12, Max Code Length: 6, In Bits: 132074, Out Symbols: 42858)
Clock type: CPU
Ordered by: ttot, asc
name ncall tsub ttot tavg
DecodeTableTest.py:81 ba_C_iterdecode 10000 0.017626 0.098321 0.000010
DecodeTableTest.py:89 ba_C_decode 10000 0.022404 0.111367 0.000011
DecodeTableTest.py:139 baTo01_Py_lookupDict 10000 0.693772 0.715359 0.000072
DecodeTableTest.py:122 baTo01_Py_lookupTbl_builtin_int 10000 0.863217 0.885142 0.000089
DecodeTableTest.py:177 rawBytes_Py_lookupTbl_intfrombytes_memoryview 10000 2.719999 4.090594 0.000409
DecodeTableTest.py:195 ba_Py_lookupTbl_builtin_int_baMemoryviewTo01 10000 3.397645 4.807507 0.000481
DecodeTableTest.py:155 ba_Py_lookupTbl_builtin_int_shift 10000 3.415027 4.837891 0.000484
DecodeTableTest.py:97 ba_Py_iterdecode 10000 0.033506 13.49097 0.001349
DecodeTableTest.py:105 ba_Py_lookupTbl_ba2int 10000 3.263269 33.08122 0.003308
DecodeTableTest.py:211 ba_Py_lookupTbl_ba2int_baMemoryview 10000 3.923426 33.38520 0.003339
(where ba is bitarray, and baTo01 is the string representation and the table lookups are list lookups of symbol/freq tuples and the memoryview is on the first max code length bits but shifting the whole underlying bitarray)
My first interest was in using a table lookup for the decode instead of a tree so I figured if I compared the Python iterdecode() to the C I could get a feel for the gains to be had going from Python to the underlying C and as you already know its a massive improvement (ba_Py_iterdecode) and I am thinking that even if a table lookup is a fraction of that improvement then it could possibly blaze past the C tree decodes.
I haven't looked into the Python C bindings modules yet and that is my next step (or, preferably, C++ bindings) but if you are interested in it then you know your code and where it'd plug in. I was thinking along the lines of a .decode(decodeTable) method where the decodeTable would be treated in the same fashion as the decode dict / decodeTree setup.
The other thing I wanted to mention, and you can see in the benchmarks, was ba2int() is slow. I see what it is doing with the checks and padding and such but thought perhaps I am missing something or perhaps it is something added for convenience that hasn't had usage enough to decide if it is worth working on to make it faster... 🤷
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:19 t_bitarray_ba2int_frombytes 10000 0.020761 0.094871 0.000009
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:13 t_bitarray_ba2int_fromstrings 10000 0.013938 0.077259 0.000008
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:41 t_builtin_int_frombytes 10000 0.011092 0.016091 0.000002
/home/thell/Projects/HuffmanTestPy/bit2intTesting.py:36 t_builtin_int_fromstrings 10000 0.006036 0.006036 0.000001
Hello. I've been enjoying your module and associated writings/docs! May I 👏 and say 'Well done!'?
I'm an R and Rcpp/c++ kinda guy who had never really looked at Python until just a few weeks ago and immediately had a use for your module and have found it easy to use and instructive but since I was exploring I obviously was profiling a ton while learning and found two things that I thought might be of interest.
First let me present some benchmark numbers...
(where
bais bitarray, andbaTo01is the string representation and the table lookups are list lookups of symbol/freq tuples and the memoryview is on the first max code length bits but shifting the whole underlying bitarray)My first interest was in using a table lookup for the decode instead of a tree so I figured if I compared the Python
iterdecode()to the C I could get a feel for the gains to be had going from Python to the underlying C and as you already know its a massive improvement (ba_Py_iterdecode) and I am thinking that even if a table lookup is a fraction of that improvement then it could possibly blaze past theCtree decodes.I haven't looked into the Python C bindings modules yet and that is my next step (or, preferably, C++ bindings) but if you are interested in it then you know your code and where it'd plug in. I was thinking along the lines of a
.decode(decodeTable)method where the decodeTable would be treated in the same fashion as the decode dict / decodeTree setup.The other thing I wanted to mention, and you can see in the benchmarks, was
ba2int()is slow. I see what it is doing with the checks and padding and such but thought perhaps I am missing something or perhaps it is something added for convenience that hasn't had usage enough to decide if it is worth working on to make it faster... 🤷