string getitem methods are slow #4694
Comments
Couple of reasons:
Looks like the perf hit is about 2x; might be able to squash that by moving the string methods to Cython.
Ah, that'll do it (I guess it's only sometimes that apply doesn't care about errors?). Maybe cythonizing these is the way forward; I guess even with object dtype you get some perf improvement.
These methods could be much faster (there is an issue out there about this) if you basically push everything to use native C calls (e.g. stuff like strcmp and such), or maybe add a nice C library to the mix. Just cythonizing doesn't help much, but this would be a bit of work.
wonder if this is worth looking into: http://bstring.sourceforge.net/
If you're going to C level, better to use a C library that handles strings.
Darn, bstring doesn't support Unicode.
Converting these functions to C without breakage is going to be very difficult. You'll probably have to use ICU and have a compatibility layer between Cython (PyICU might make this a bit easier) and ICU. We definitely cannot use the C standard library string functions since they don't handle Unicode.
Is anyone working on this currently? Would I be duplicating effort if I were to look into possible quick wins for at least getting these slow
@brandon-rhodes Don't think so. Would be great! Probably DO need some asv benchmarks for these.
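For anyone picking this up, a minimal sketch of what such an asv benchmark could look like. The class name `StrGetitem`, the method names, and the data size are hypothetical and not taken from the pandas benchmark suite; the sketch only relies on asv's convention of calling `setup()` and then timing each `time_*` method.

```python
import pandas as pd


class StrGetitem:
    """Hypothetical asv benchmark comparing the two spellings from this issue."""

    def setup(self):
        # asv runs setup() before timing each time_* method.
        # 10,000 short strings; the size is an arbitrary choice for illustration.
        self.s = pd.Series([f"row{i:06d}" for i in range(10_000)])

    def time_str_getitem(self):
        # The vectorized string accessor path.
        self.s.str[1]

    def time_apply_getitem(self):
        # The plain Python path reported as faster in this issue.
        self.s.apply(lambda x: x[1])
```

The class can also be exercised by hand (call `setup()` and then either method) to check that it runs before wiring it into a benchmark suite.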
Is this still an ongoing effort? I would like to give it a try.
@3vts Feel free to give it a try! I did not wind up with time to make progress on it, and my guess is that the project I was on that needed the extra performance found a workaround. To be honest, I had, alas, forgotten all about it in the intervening years.
This is fairly fast with the new pyarrow string type, which has a lot of benefits over the string object implementation, so closing for now. Can reopen if there are specific hotspots that are addressable.
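A rough sketch of trying the pyarrow-backed string type on the same operation, assuming pyarrow may or may not be installed. The sample data is made up, and no timing is claimed here; the point is only that `.str[1]` is spelled the same way on both dtypes.

```python
import pandas as pd

# Made-up sample data, stored as plain Python objects.
s_obj = pd.Series(["apple", "banana", "cherry"], dtype=object)

try:
    # Convert to the pyarrow-backed string dtype (needs pyarrow installed).
    s_arrow = s_obj.astype("string[pyarrow]")
except ImportError:
    s_arrow = None  # pyarrow is not available in this environment

# The accessor call is identical for both dtypes.
second_obj = s_obj.str[1]
second_arrow = s_arrow.str[1] if s_arrow is not None else None
```

Wrapping each call in `timeit` on a larger Series would show whether the pyarrow path helps for a given workload.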
related #2802
It seems that str[1] is significantly slower than .apply(lambda x: x[1]).
See this SO answer: http://stackoverflow.com/a/18473330/1240268
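A small script to reproduce the comparison, as a sketch only: the data and repeat count are arbitrary choices, and the timings will vary with machine and pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Hypothetical test data: 100,000 short strings.
s = pd.Series(["abcdefg", "hijklmn", "opqrstu", "vwxyz12"] * 25_000)

# The two spellings being compared in this issue.
via_str = s.str[1]
via_apply = s.apply(lambda x: x[1])

# Both approaches should produce the same values.
assert via_str.tolist() == via_apply.tolist()

# Time each approach over a few repetitions.
t_str = timeit.timeit(lambda: s.str[1], number=5)
t_apply = timeit.timeit(lambda: s.apply(lambda x: x[1]), number=5)
print(f".str[1]:  {t_str:.3f}s")
print(f".apply(): {t_apply:.3f}s")
```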