Would it be possible to fuse the _unicode and _byte functions? Apart from fetching the next element (and, technically, the hard-coded sizeofs), both Cython code paths are identical, I think. The pure-Python versions are 100% identical. I'm not familiar with Cython, and the code seems more complicated than the standard ahocorasick implementation, so maybe not. I'm thinking of some lookup table mapping type -> get_next_element_function. Perhaps the container type could then be extended to any (Python) sequence type and the contained type to any (Python) comparable type. Cython automatically selects the fastest type, I think, so str->UCS4, bytes->char, int->int?, bool->bint, ->.
I'd also like to support the C bitarray extension module, which is a char*-backed List[bool], without the intermediate boxing of each bit into a Python bool. Any ideas?
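To make the lookup-table idea concrete, here is a rough pure-Python sketch. It is not acora's code: every name in it (`ELEMENT_ITERATORS`, `iter_elements`, `build_trie`, `find_all`) is hypothetical, and it uses a naive trie scan rather than the Aho-Corasick automaton, purely to show one search routine shared by several container types, with only the "fetch next element" step varying by type.

```python
# Hypothetical sketch, not acora's implementation: a type -> element-iterator
# lookup table feeding one shared (naive, non-Aho-Corasick) trie search.

_END = object()  # terminal marker that cannot collide with any element


def _iter_str(text):
    # Normalise str to code points so str and bytes share one element type.
    return map(ord, text)


# Lookup table: container type -> function returning an iterator of elements.
ELEMENT_ITERATORS = {
    str: _iter_str,
    bytes: iter,       # iterating bytes already yields ints
    bytearray: iter,
}


def iter_elements(data):
    # Fall back to plain iteration for any other Python sequence type.
    return ELEMENT_ITERATORS.get(type(data), iter)(data)


def build_trie(keywords):
    root = {}
    for keyword in keywords:
        node = root
        for element in iter_elements(keyword):
            node = node.setdefault(element, {})
        node[_END] = keyword  # remember which keyword ends here
    return root


def find_all(trie, data):
    """Return (keyword, start_index) pairs for every match in data."""
    elements = list(iter_elements(data))
    matches = []
    for start in range(len(elements)):
        node = trie
        for pos in range(start, len(elements)):
            node = node.get(elements[pos])
            if node is None:
                break
            if _END in node:
                matches.append((node[_END], start))
    return matches


print(find_all(build_trie(["ab", "bc"]), "abc"))
print(find_all(build_trie([b"ab", b"bc"]), b"abc"))
print(find_all(build_trie([[True, False]]), [False, True, False]))
```

In this sketch only the table entry differs between str and bytes input, and a list of bools works through the fallback. Whether the same shape carries over to Cython without a performance cost is exactly the open question here.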
Good question. I remember thinking about merging the two at some point, but given how critical the performance is here, ended up optimising them separately. I even recall making them more similar again at some point, but that was already a while ago...
I would expect bitarray to export a buffer, which could then be unpacked and used in acora. Since the bytes type (and Python's bytearray, array.array, memoryview and others) supports the buffer interface as well, it should be enough to switch the current bytes implementation to char[:] buf (or unsigned char[:]?) as input and pass &buf[0] and length.
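As a small plain-Python illustration of why the buffer interface helps, the sketch below uses `memoryview` to read bytes, bytearray, array.array and memoryview objects through one code path without copying. The helper names `as_byte_view` and `count_byte` are made up for this example. It does not touch bitarray, whose buffer support is only an expectation above, and it stands in for, rather than reproduces, the Cython `unsigned char[:]` approach.

```python
# Hypothetical sketch: one function accepting any buffer-exporting object.
from array import array


def as_byte_view(obj):
    """Return a flat unsigned-byte view of a buffer exporter, without copying."""
    return memoryview(obj).cast("B")


def count_byte(obj, value):
    """Count occurrences of the byte `value` in any buffer-exporting object."""
    view = as_byte_view(obj)
    return sum(1 for b in view if b == value)


for source in (b"abca", bytearray(b"abca"), array("B", [97, 98, 99, 97]),
               memoryview(b"abca")):
    print(type(source).__name__, count_byte(source, ord("a")))
```

In Cython, a typed memoryview argument would play the role of `as_byte_view` here, and the search loop could then work on the underlying pointer and length.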