Low performance on short strings. #2
Comments
I've optimized all the Turbo Base64 functions for short inputs as well. |
It is now also possible to call the architecture-dependent functions directly instead of tb64enc+tb64dec. |
There are now optimized functions for short strings. |
Perfect! I will try right now... |
We run all our tests with UBSan, and it reports that unaligned stores should be replaced with memcpy: |
Unaligned access is used only for 32-bit integers, via the ctou32 macro. |
Yes, it's 100% safe from the CPU standpoint but it isn't according to the C or C++ standard. |
I've changed the unaligned access to memcpy and made the decoding 5% faster. |
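For readers following the unaligned-access discussion above, here is a minimal sketch of the memcpy approach. The helper names `load_u32` and `store_u32` are hypothetical and are not the library's ctou32 macro; the sketch only illustrates the general technique of replacing a pointer cast with a fixed-size memcpy, which compilers typically lower to a single load or store on x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only (not part of Turbo-Base64).
 * Dereferencing a misaligned uint32_t* is undefined behavior in C,
 * while copying 4 bytes with memcpy is well-defined at any address. */
static inline uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* read 4 bytes from a possibly unaligned address */
    return v;
}

static inline void store_u32(void *p, uint32_t v) {
    memcpy(p, &v, sizeof v);   /* write 4 bytes to a possibly unaligned address */
}
```

With these, a store such as `store_u32(out + 1, value)` stays clean under UBSan even when `out + 1` is not 4-byte aligned.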
Thank you! Let's see what our CI will show... |
You must always include turbob64sse in your builds (cmake files), not only for amd64. |
Ok. BTW, the performance test has finished running and we see a significant performance improvement! (the queries with base64 are near the top) |
I'll close this issue as it is completely resolved. I will try to finish integrating Turbo-Base64 into our product in the coming days... |
One minor issue remains before this library can be integrated: ClickHouse/ClickHouse#8444 (comment) |
Hi, |
Turbo Base64 has now been extended to do full checking by default in the short string functions. |
I have updated the library and also added |
You must update the library to the latest changes from 3 hours ago, as mentioned in my last comment. |
Now it looks all right, thank you! |
Congratulations, it's merged! PS. You can add a reference, tweet, whatever you want! |
Great, thank you. |
Yes, we have high demand for efficient codecs. (It will not be done by me; this is a task for intern students to try. We'll see how it goes...) |
BTW, it's possible for libraries with compatible license (not GPL). |
Nice to see you're evaluating the integration of other components. |
While running ClickHouse tests on different servers with several sanitizers, we encountered this error under UndefinedBehaviorSanitizer.

```
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
../contrib/base64/turbob64sse.c:418:25: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
    #0 0x1a70894c in cpuisa /build/obj-x86_64-linux-gnu/../contrib/base64/turbob64sse.c:418:25
    #1 0x1a708b18 in tb64ini /build/obj-x86_64-linux-gnu/../contrib/base64/turbob64sse.c:485:13
    #2 0x198e52ec in DB::registerFunctionBase64Encode(DB::FunctionFactory&) (/usr/bin/clickhouse+0x198e52ec)
    #3 0x198b6c75 in DB::registerFunctionsString(DB::FunctionFactory&) (/usr/bin/clickhouse+0x198b6c75)
    #4 0x1696f486 in DB::registerFunctions() (/usr/bin/clickhouse+0x1696f486)
    #5 0x15390b4f in DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /build/obj-x86_64-linux-gnu/../programs/server/Server.cpp:220:5
    #6 0x20c4d36f in Poco::Util::Application::run() /build/obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:334:8
    #7 0x1538fd9d in DB::Server::run() /build/obj-x86_64-linux-gnu/../programs/server/Server.cpp:184:25
    #8 0x153b0eb0 in mainEntryClickHouseServer(int, char**) /build/obj-x86_64-linux-gnu/../programs/server/Server.cpp:1084:20
    #9 0x1531cdce in main /build/obj-x86_64-linux-gnu/../programs/main.cpp:324:12
    #10 0x7fd9ab5761e2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x271e2)
    #11 0x152fb02d in _start (/usr/bin/clickhouse+0x152fb02d)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../contrib/base64/turbob64sse.c:418:25 in
```

I'm running the UBSan build on a server with **avx512vl**:

```
cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke
```

To fix the signed integer overflow, I propose this one-character fix: `1u` has an unsigned type, so there will be no error. If you need more context, feel free to ask me about it.

ClickHouse issue: ClickHouse/ClickHouse#12318
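To illustrate the proposed fix, here is a minimal sketch of testing bit 31 of a 32-bit value with an unsigned shift. The helper `bit_is_set` is hypothetical and is not the code at turbob64sse.c:418; it only shows the difference in operand type. With a 32-bit `int`, `1 << 31` is undefined behavior because the result does not fit in `int`, while `1u << 31` is well-defined and equals `0x80000000`.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only (not taken from Turbo-Base64).
 * Tests bit n (0..31) of a 32-bit register value, e.g. a CPUID result.
 * The literal 1u is unsigned, so shifting it by 31 is well-defined. */
static inline int bit_is_set(uint32_t reg, unsigned n) {
    return (reg & (1u << n)) != 0;
}
```

Using an unsigned literal keeps the feature-detection logic unchanged and removes the UBSan report.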
ClickHouse/ClickHouse#8397 (comment)
The library behaves worse than https://github.com/aklomp/base64
on strings of average length 77 bytes:
You can download the test data here:
https://clickhouse.yandex/docs/en/getting_started/example_datasets/metrica/