Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low performance on short strings. #2

Closed
alexey-milovidov opened this issue Dec 25, 2019 · 23 comments
Closed

Low performance on short strings. #2

alexey-milovidov opened this issue Dec 25, 2019 · 23 comments

Comments

@alexey-milovidov
Copy link

ClickHouse/ClickHouse#8397 (comment)

The library behaves worse than https://github.com/aklomp/base64
on strings of average length 77 bytes:

:) SELECT avg(length(URL)) FROM test.hits

SELECT avg(length(URL))
FROM test.hits

┌──avg(length(URL))─┐
│ 77.54074297450794 │
└───────────────────┘

You can download the test data here:
https://clickhouse.yandex/docs/en/getting_started/example_datasets/metrica/

@powturbo
Copy link
Owner

powturbo commented Dec 26, 2019

I've optimzed all the Turbo Base64 functions also for short inputs.
It is also possible to disable checking for more faster decoding by compiling Turbo Base64 with

make NCHECK=1

@powturbo
Copy link
Owner

powturbo commented Dec 27, 2019

Now it is also possible to do a direct call to the archtitecture dependent functions instead of tb64enc+tb64dec.
Use _tb64e+_tb64d (after calling initialization tb64ini at the program start) instead of tb64enc+tb64dec.
This saves a check + a non-inlined function call.

@powturbo
Copy link
Owner

powturbo commented Dec 27, 2019

There are now optimized functions for short strings.
Actually only for avx2. Just call the new tb64ini with the parameter isshort = 1 at the start of
the program. See turbob64.h

@alexey-milovidov
Copy link
Author

This saves a check + a non-inlined function call.

Perfect! I will try right now...

@powturbo
Copy link
Owner

Unaligned access is used only for 32-bits integers using the ctou32 macro.
You can see in "conf.h" all the recent cpu's intel/amd, arm 32/64 bits, powerpc,... are supporting unaligned 32-bits access.
If it's necessary, I can replace this macro with memcpy using a preprocessor switch in "turbob64c.c"
and "turbob64d.c"

@alexey-milovidov
Copy link
Author

alexey-milovidov commented Dec 28, 2019

Yes, it's 100% safe from the CPU standpoint but it isn't according to the C or C++ standard.
Replacing with memcpy would not impact performance in any kind.

@powturbo
Copy link
Owner

powturbo commented Dec 29, 2019

I've changed the unaligned access to memcpy and made the decoding 5% more faster.
I'll optimize late SSE, AVX and ARM for short strings.

  • You can now remove the check for 4GB
  • It's also possible to remove the check for the return value in decoding.

@alexey-milovidov
Copy link
Author

Thank you! Let's see what our CI will show...

@powturbo
Copy link
Owner

powturbo commented Dec 31, 2019

You must always include turbob64sse in your builds (cmake files), not only for amd64.
see turbobase64 makefile
Please, upload the latest version. More faster for very short strings.

@alexey-milovidov
Copy link
Author

Ok. BTW, performance test is finished to run and we see significant performance improvement!

https://clickhouse-test-reports.s3.yandex.net/8444/aa979eb438aa06c082d6825d11f40b5e402761a3/performance_test/comparison_min_time_gcc_9.html

(the queires with base64 are near the top)

@alexey-milovidov
Copy link
Author

I'll close this issue as it is completely resolved.
And I would like to especially thank you for your help!

I will try to finish the integration of Turbo-Base64 to our product in the nearest days...

@alexey-milovidov
Copy link
Author

There is one minor issue remains to integrate this library: ClickHouse/ClickHouse#8444 (comment)
If you are interested you can finish and open another PR.

@powturbo
Copy link
Owner

powturbo commented Jan 11, 2020

Hi,
I've added the checking for the short strings decoding functions.
Due to other overheads in ClickHouse, you can't see the full speed advantage of turbobase64,
but in my short strings tests it is 3 times faster in encoding and 3.5-4 times faster in decoding (with checking) than your current base64. see short strings benchmark
I don't know the origin of the base64 strings in ClickHouse, but if they are coming from an external source, a database should normally do a one time check at insertion.
If you're self generating the base64 strings, an additional checking is meaningless.
Please reopen the PR, I'm very interested in this subject.

@powturbo
Copy link
Owner

Turbobase64 now extended to do full checking per default in the short string functions.
Functional tests with full checking are successful.

@alexey-milovidov
Copy link
Author

I have updated the library and also added -DB64CHECK but it didn't help.

@powturbo
Copy link
Owner

You must update the library with latest changes 3 hours ago as in my last comment.
The short strings functions are now doing the check per default.
I've tested this and it's working.

@alexey-milovidov
Copy link
Author

Now it looks all right, thank you!

@alexey-milovidov
Copy link
Author

Congratulations, it's merged!

PS. You can add a reference, tweet, whatever you want!

@powturbo
Copy link
Owner

Great, thank you.
Your team might have a look also at TurboPFor.
There are not only the best/fastest integer compression functions, but also interesting lossless and lossy floating-point compression in fp.c. See also the error-controlled lossy convertion FP functions in bitutil.c
You can build or download icapp to test all functions with your data.

@alexey-milovidov
Copy link
Author

Yes, we have high demand for efficient codecs.
We will try to integrate your libraries and I appreciate if you will help!

(It will be done not by me, this is the task for intern students to try. Will see how it will go...)

@alexey-milovidov
Copy link
Author

BTW, it's possible for libraries with compatible license (not GPL).

@powturbo
Copy link
Owner

powturbo commented Jan 16, 2020

Nice to see you're evaluating the integration of other components.
I can probably make a separate non-GPL package with the functions you're interested in.

qoega added a commit to qoega/Turbo-Base64 that referenced this issue Jul 9, 2020
While running ClickHouse tests on different servers with several sanitizers we encounter this error under UndefinedSanitizer.
```
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
../contrib/base64/turbob64sse.c:418:25: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
    #0 0x1a70894c in cpuisa /build/obj-x86_64-linux-gnu/../contrib/base64/turbob64sse.c:418:25
    powturbo#1 0x1a708b18 in tb64ini /build/obj-x86_64-linux-gnu/../contrib/base64/turbob64sse.c:485:13
    powturbo#2 0x198e52ec in DB::registerFunctionBase64Encode(DB::FunctionFactory&) (/usr/bin/clickhouse+0x198e52ec)
    powturbo#3 0x198b6c75 in DB::registerFunctionsString(DB::FunctionFactory&) (/usr/bin/clickhouse+0x198b6c75)
    powturbo#4 0x1696f486 in DB::registerFunctions() (/usr/bin/clickhouse+0x1696f486)
    powturbo#5 0x15390b4f in DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /build/obj-x86_64-linux-gnu/../programs/server/Server.cpp:220:5
    powturbo#6 0x20c4d36f in Poco::Util::Application::run() /build/obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:334:8
    powturbo#7 0x1538fd9d in DB::Server::run() /build/obj-x86_64-linux-gnu/../programs/server/Server.cpp:184:25
    powturbo#8 0x153b0eb0 in mainEntryClickHouseServer(int, char**) /build/obj-x86_64-linux-gnu/../programs/server/Server.cpp:1084:20
    powturbo#9 0x1531cdce in main /build/obj-x86_64-linux-gnu/../programs/main.cpp:324:12
    powturbo#10 0x7fd9ab5761e2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x271e2)
    powturbo#11 0x152fb02d in _start (/usr/bin/clickhouse+0x152fb02d)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../contrib/base64/turbob64sse.c:418:25 in
```
I'm running UBSAN build on server with **avx512vl**
```
cpuinfo
flags    : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke
```

To fix signed integer overflow I propose this one char fix. `1u` has uint type and there will be no error.

If you need more context, feel free to ask me about it.

ClickHouse issue: ClickHouse/ClickHouse#12318
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants