Slower string manipulation performance than CPython #7535
Comments
Thank you for your interest in Numba. Currently, the unicode support in Numba is primitive and does not perform well for applications that rely on string manipulation. IIRC, it is particularly bad on code that requires processing individual characters, such as the As for the performance limitation, I think the reference-counting operations are preventing optimizations. In CPython, many of the implementations can access the underlying buffer directly and bypass any reference-counted operations. Numba needs similar direct access to the chars. Lastly, besides Intel SDC, I only know of Bodo for pandas support. (ping @ehsantn)
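To illustrate the "direct access to the chars" idea above, here is a minimal pure-Python sketch. It is not Numba's or CPython's actual implementation: the helper names (`to_code_points`, `count_upper_str`, `count_upper_buf`) are hypothetical, and the assumption is only that a flat buffer of integer code points is the kind of data a compiled loop can walk without creating a string object per character.

```python
import sys

def count_upper_str(s):
    """Count ASCII uppercase letters by iterating over str characters."""
    n = 0
    for ch in s:  # each `ch` is a one-character str object
        if "A" <= ch <= "Z":
            n += 1
    return n

def to_code_points(s):
    """Return the text as a flat buffer of 32-bit code points (hypothetical helper)."""
    # Match the platform byte order so the native "I" cast reads the values correctly.
    codec = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"
    return memoryview(s.encode(codec)).cast("I")

def count_upper_buf(buf):
    """Same count, but over plain integers instead of str objects."""
    n = 0
    for cp in buf:
        if 65 <= cp <= 90:  # ord("A") .. ord("Z")
            n += 1
    return n

text = "Hello Numba, Hello CPython"
buf = to_code_points(text)
# Both views of the text must agree on the result.
assert count_upper_str(text) == count_upper_buf(buf)
```

In plain CPython the integer version is not expected to be faster; the point is only that the second loop touches nothing but integers, which is the shape of code that a JIT compiler can typically optimize well.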
Recently, I saw that SDC provides a str_ext implementation; the core idea is as follows:
It seems SDC uses But I don't know whether this kind of operation will unbox the Python unicode type to Numba's UnicodeType and box it back when calling I also found some Any comments are welcome. Also, if there is a Numba discussion group I could join, please let me know, haha. Thanks.
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
Closing this. I think not many users care about string operation performance when using Numba; most of us focus on numeric computing. Feel free to reopen it.
Feature request
After investigating Numba usage and internals, I want to popularize the usage of Numba in my work group.
However, our existing Python business code often involves user-defined functions that use Pandas DataFrames and string manipulation.
I know Numba is excellent at numeric computation but not good at string handling.
I looked into Numba's internals: it already supports the Python unicode type, but it is much slower than CPython.
I wonder:
For example, I tested a small snippet of code: