utf8 decoding in dbapi looks like it could be sped up #473
I was doing some timing tests loading columns from the db, comparing SQLAlchemy against using the DBAPI connection directly, and I discovered that psycopg2's UTF-8 decoding appears noticeably slower than the "fallback" decoding provided by SQLAlchemy's C extensions.
Here is some info from Mike Bayer's test:
And here is some explanation of why sqlalchemy has its own c extension for unicode decoding.
Don't know if this type of optimization is worthwhile to you but just letting you know.
The full thread is here:
Thank you for the pointer, will take a look.
At my workplace in the past, caching the codec actually provided a good speedup. The codec there was only utf8 and the Python version was well known. In psycopg there is more variability, but there are probably fast paths deserving a look.
I've played a bit with the idea and I think it's good stuff.
I've tried a quick test: storing a pointer to a fast C decode function for known codecs in the connection (e.g. for a utf8 connection store the pointer to
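The actual patch is in psycopg2's C extension; as a rough Python analogue of the same idea (the `Connection` class and method names here are hypothetical, not psycopg2's API), the point is to resolve the decoder once when the connection's encoding is known, rather than looking it up for every value:

```python
import codecs


class Connection:
    """Toy analogue of the patch idea: resolve the decoder once per
    connection instead of looking up the codec for every string."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        # One-time codec lookup. The C patch instead stores a function
        # pointer (e.g. to a fast C decode routine) on the connection.
        self._decode = codecs.getdecoder(encoding)

    def typecast_text(self, raw: bytes) -> str:
        # Per-value hot path: a direct call, no registry lookup.
        text, _consumed = self._decode(raw)
        return text


conn = Connection("utf-8")
print(conn.typecast_text(b"caff\xc3\xa8"))
```

The per-string work shrinks to one function call, which is exactly where the savings come from when decoding many values.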
Because the overhead paid for the codec lookup is per string, not per byte of data, the improvement is larger when the same amount of data arrives as more, shorter strings: 55% for 4M of 100B strings:
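To make the per-string vs per-byte distinction concrete, here is a minimal timing sketch (pure Python, not the C patch; absolute numbers will vary by Python version and machine). It decodes the same total volume of data twice, once resolving the codec for every string and once with a pre-resolved decoder, for both short and long strings:

```python
import codecs
import timeit


def decode_per_call(chunks, encoding):
    # Resolves the codec registry entry for every single string.
    return [codecs.lookup(encoding).decode(c)[0] for c in chunks]


def decode_cached(chunks, decode):
    # The decoder was resolved once, up front.
    return [decode(c)[0] for c in chunks]


decode = codecs.getdecoder("utf-8")

# Same total volume, different string sizes: many short strings pay
# the fixed per-string cost many more times than a few long ones.
short = [b"x" * 100] * 40_000     # ~4 MB in 100-byte strings
long_ = [b"x" * 100_000] * 40     # ~4 MB in 100 kB strings

for name, data in [("short", short), ("long", long_)]:
    t_lookup = timeit.timeit(lambda: decode_per_call(data, "utf-8"), number=5)
    t_cached = timeit.timeit(lambda: decode_cached(data, decode), number=5)
    print(f"{name}: per-call lookup={t_lookup:.3f}s  cached={t_cached:.3f}s")
```

The gap between the two variants should be far wider for the short strings, since the lookup cost is paid 40,000 times instead of 40.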
Other things to do:
If someone wants to contribute to the idea, the first commit is in this branch. Any feedback or help is welcome.