Add LoongArch LSX and LASX support #2159

Merged
lemire merged 2 commits into simdjson:master on Apr 4, 2024


MQ-mengqing
Contributor

LSX is a 128-bit SIMD instruction set and LASX is a 256-bit one. Both can be supported by LoongArch CPUs.
LSX is more widely supported than LASX.
The expected minimum versions are GCC 14.1, Linux kernel 6.6, and LLVM 17.
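
As a rough illustration of these minimums, a build script could compare the installed tool versions against them before enabling the new kernels. The sketch below is hypothetical: `version_ge` is a helper name invented here (it is not part of simdjson), and it assumes GNU `sort -V` is available for version ordering.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is at least B.
# Hypothetical helper; assumes GNU coreutils `sort -V` is available.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example checks against the minimums quoted above (not executed here):
#   version_ge "$(gcc -dumpfullversion)" 14.1 || echo "GCC older than 14.1"
#   version_ge "$(uname -r)" 6.6             || echo "kernel older than 6.6"
```

The comparison sorts the two version strings and checks that the required minimum comes first, so `6.10` is correctly treated as newer than `6.6`.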

@MQ-mengqing
Contributor Author

With the fallback implementation,

$ cd benchmark/ ; ./bench_ondemand --benchmark_min_time=30 --benchmark_filter=partial_tweets\<simdjson_ondemand\>
2024-03-27T15:27:31+08:00
Running ./bench_ondemand
Run on (4 X 2500 MHz CPU s)
CPU Caches:
  L1 Instruction 64 KiB (x4)
  L1 Data 64 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 16384 KiB (x1)
Load Average: 1.13, 1.95, 1.61
simdjson::dom implementation:      fallback
simdjson::ondemand implementation (stage 1): fallback
simdjson::ondemand implementation (stage 2): fallback
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
partial_tweets<simdjson_ondemand>/manual_time    2079032 ns      2070087 ns        20529 best_bytes_per_sec=314.532M best_docs_per_sec=498.06 best_items_per_sec=49.806k bytes=631.515k bytes_per_second=289.683M/s docs_per_sec=480.993/s items=100 items_per_second=48.0993k/s [BEST: throughput=  0.31 GB/s doc_throughput=   498 docs/s items=       100 avg_time=   2079031 ns]

With LASX,

$ mkdir build-lasx
$ cd build-lasx
$ cmake -D SIMDJSON_DEVELOPER_MODE=ON -D SIMDJSON_PREFER_LSX=OFF ..
$ cmake --build . -j4
$ ctest
100% tests passed, 0 tests failed out of 99

Label Time Summary:
acceptance             =  65.84 sec*proc (69 tests)
assert                 =   0.01 sec*proc (1 test)
compile                =  30.89 sec*proc (12 tests)
dom                    =   2.56 sec*proc (15 tests)
explicitonly           =   0.01 sec*proc (1 test)
no_mingw               =  80.38 sec*proc (36 tests)
ondemand               =  16.96 sec*proc (35 tests)
other                  =   0.36 sec*proc (4 tests)
per_implementation     =   9.22 sec*proc (50 tests)
quickstart             =  30.89 sec*proc (12 tests)
quickstart_ondemand    =  12.88 sec*proc (5 tests)
singleheader           =   5.29 sec*proc (3 tests)

Total Test time (real) = 116.61 sec
$ cd benchmark/ ; ./bench_ondemand --benchmark_min_time=30 --benchmark_filter=partial_tweets\<simdjson_ondemand\>
2024-03-27T15:02:57+08:00
Running ./bench_ondemand
Run on (4 X 2500 MHz CPU s)
CPU Caches:
  L1 Instruction 64 KiB (x4)
  L1 Data 64 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 16384 KiB (x1)
Load Average: 0.39, 1.76, 1.34
simdjson::dom implementation:      lasx
simdjson::ondemand implementation (stage 1): lasx
simdjson::ondemand implementation (stage 2): lasx
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
partial_tweets<simdjson_ondemand>/manual_time     527599 ns       568537 ns        79456 best_bytes_per_sec=1.21989G best_docs_per_sec=1.93169k best_items_per_sec=193.169k bytes=631.515k bytes_per_second=1.11476G/s docs_per_sec=1.89538k/s items=100 items_per_second=189.538k/s [BEST: throughput=  1.22 GB/s doc_throughput=  1931 docs/s items=       100 avg_time=    527598 ns]

With LSX,

$ mkdir build-lsx
$ cd build-lsx
$ cmake -D SIMDJSON_DEVELOPER_MODE=ON ..
$ cmake --build . -j4
$ ctest
100% tests passed, 0 tests failed out of 98

Label Time Summary:
acceptance             =  57.87 sec*proc (69 tests)
assert                 =   0.00 sec*proc (1 test)
compile                =  32.34 sec*proc (12 tests)
dom                    =   2.65 sec*proc (15 tests)
explicitonly           =   0.00 sec*proc (1 test)
no_mingw               =  75.94 sec*proc (36 tests)
ondemand               =  14.71 sec*proc (35 tests)
other                  =   0.36 sec*proc (4 tests)
per_implementation     =   7.63 sec*proc (50 tests)
quickstart             =  32.34 sec*proc (12 tests)
quickstart_ondemand    =  13.72 sec*proc (5 tests)
singleheader           =   3.54 sec*proc (3 tests)

Total Test time (real) = 101.16 sec
$ cd benchmark/ ; ./bench_ondemand --benchmark_min_time=30 --benchmark_filter=partial_tweets\<simdjson_ondemand\>
2024-03-27T15:15:28+08:00
Running ./bench_ondemand
Run on (4 X 2500 MHz CPU s)
CPU Caches:
  L1 Instruction 64 KiB (x4)
  L1 Data 64 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 16384 KiB (x1)
Load Average: 1.36, 1.95, 1.54
simdjson::dom implementation:      lsx
simdjson::ondemand implementation (stage 1): lsx
simdjson::ondemand implementation (stage 2): lsx
--------------------------------------------------------------------------------------------------------
Benchmark                                              Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
partial_tweets<simdjson_ondemand>/manual_time     497467 ns       540448 ns        80073 best_bytes_per_sec=1.28521G best_docs_per_sec=2.03513k best_items_per_sec=203.513k bytes=631.515k bytes_per_second=1.18228G/s docs_per_sec=2.01018k/s items=100 items_per_second=201.018k/s [BEST: throughput=  1.29 GB/s doc_throughput=  2035 docs/s items=       100 avg_time=    497466 ns]
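
To put the three runs side by side, the speedup over the fallback implementation can be computed from the reported `avg_time` values. This is only a small arithmetic sketch; `speedup` is a helper name chosen here, and the figures are copied from the benchmark output above.

```shell
#!/bin/sh
# speedup BASE_NS NEW_NS: print BASE_NS / NEW_NS with two decimals.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# avg_time values (ns) copied from the three runs above.
fallback=2079031
lasx=527598
lsx=497466

echo "LASX vs fallback: $(speedup "$fallback" "$lasx")x"   # 3.94x
echo "LSX vs fallback:  $(speedup "$fallback" "$lsx")x"    # 4.18x
```

On this single machine and benchmark, both SIMD kernels are roughly four times faster than the fallback, with LSX slightly ahead of LASX.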

@lemire
Member

lemire commented Mar 29, 2024

@MQ-mengqing Any chance of running these kernels in CI? I realize that Microsoft may not provide the hardware, but is it available through an emulator (e.g., qemu)?

I expect we will merge this, but we want to be able to test it in CI; otherwise it is difficult to support in the long run.

@MQ-mengqing
Contributor Author

but is it available through an emulator (e.g., qemu)?

The latest version of QEMU already supports LASX and LSX.
I have tested it on "Debian GNU/Linux 12".

// Step 1: install the cross compiler.
$ cd ~
$ wget https://github.com/MQ-mengqing/CrossCompiler/releases/download/v0.1.0/cross-tools.tar.gz
$ tar -zxvf cross-tools.tar.gz
// The archive is extracted to "cross-tools".
$ export MY_CROSS_BASE_PATH=`pwd`/cross-tools
// Check that the compiler can be executed.
$ ${MY_CROSS_BASE_PATH}/bin/loongarch64-unknown-linux-gnu-c++ -v -mlasx

// Step 2: set up the QEMU environment.
$ cd ~
$ git clone https://gitlab.com/qemu-project/qemu.git
$ cd qemu
$ git submodule init
$ git submodule update --recursive
$ mkdir build
$ cd build
// You may need to install ninja, libglib2.0-0 and libglib2.0-dev on Debian 12.
$ ../configure --target-list=loongarch64-linux-user
$ ninja
// Register the binfmt_misc entry (as root).
# echo ":qemu-loongarch64:M:0:\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x02\x01:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:`pwd`/qemu-loongarch64:" > /proc/sys/fs/binfmt_misc/register
// Make sure the binfmt entry has been written to the system.
$ cat /proc/sys/fs/binfmt_misc/qemu-loongarch64
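
For reference, the magic/mask pair registered above matches the first 20 bytes of a 64-bit little-endian ELF header whose e_machine field is 0x0102 (258, EM_LOONGARCH). One way to confirm that a cross-built binary would be picked up by this rule is to inspect those header bytes directly. The sketch below is a hypothetical helper (`is_loongarch_elf` is a name invented here, not part of QEMU or simdjson) that uses only `od`; it checks the ELF magic and e_machine and ignores the other fields covered by the mask.

```shell
#!/bin/sh
# is_loongarch_elf FILE: succeed if FILE starts with the ELF magic and its
# e_machine field is 258 (EM_LOONGARCH). Hypothetical helper for illustration.
is_loongarch_elf() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 1
    # e_machine: two bytes at offset 18, stored little-endian.
    set -- $(od -An -tu1 -j18 -N2 "$1")
    [ "$#" -eq 2 ] || return 1
    [ $(( $1 + $2 * 256 )) -eq 258 ]
}

# Usage (the path is an example only):
#   is_loongarch_elf ./benchmark/bench_ondemand && echo "LoongArch ELF binary"
```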

// Step 3: build, assuming the simdjson sources have already been copied to ~/simdjson.
$ cd ~
$ cd simdjson
$ mkdir build
$ cd build
// Create the CMake toolchain file for the cross build.
$ echo -e "set(CMAKE_SYSTEM_NAME Linux)\n" > cross.cmake
$ echo "set(TOOLCHAIN_PATH ${MY_CROSS_BASE_PATH})" >> cross.cmake
$ echo "set(CMAKE_C_COMPILER ${MY_CROSS_BASE_PATH}/bin/loongarch64-unknown-linux-gnu-gcc)" >> cross.cmake
$ echo "set(CMAKE_CXX_COMPILER ${MY_CROSS_BASE_PATH}/bin/loongarch64-unknown-linux-gnu-g++)" >> cross.cmake
// -static: link statically so the tests can run without shared libraries.
// -mlsx: test LSX (use -mlasx to test LASX).
// -Wno-error=template-id-cdtor: avoids the compile error "template-id not allowed for constructor in C++20".
$ cmake -D SIMDJSON_DEVELOPER_MODE=ON -DCMAKE_TOOLCHAIN_FILE=./cross.cmake -DCMAKE_C_FLAGS="-static -mlsx -Wno-error=template-id-cdtor" -DCMAKE_CXX_FLAGS="-static -mlsx -Wno-error=template-id-cdtor" ..
// Finally, run ctest.
$ ctest
// The result is:
//     99% tests passed, 1 tests failed out of 99
//     31 - simdjson_force_implementation_error (Failed)
// This test does not fail on a native machine, so I guess the failure is due to the environment.
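
The four `echo` commands in Step 3 can also be written as a single here-document, which is easier to reuse in a CI script. This is a sketch under the same assumptions as the steps above: `write_toolchain_file` is a hypothetical helper name, and the generated file contains exactly the four settings shown in Step 3, nothing more.

```shell
#!/bin/sh
# write_toolchain_file BASE OUT: write a CMake toolchain file to OUT that
# points at the cross compilers under BASE (hypothetical helper name).
write_toolchain_file() {
    cat > "$2" <<EOF
set(CMAKE_SYSTEM_NAME Linux)
set(TOOLCHAIN_PATH $1)
set(CMAKE_C_COMPILER $1/bin/loongarch64-unknown-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER $1/bin/loongarch64-unknown-linux-gnu-g++)
EOF
}

# Usage, mirroring Step 3:
#   write_toolchain_file "$MY_CROSS_BASE_PATH" cross.cmake
```

The resulting file is then passed to CMake with `-DCMAKE_TOOLCHAIN_FILE=./cross.cmake`, as in the command above.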

@lemire
Member

lemire commented Apr 2, 2024

@MQ-mengqing I'll merge this. I just need to see about testing.

@lemire
Member

lemire commented Apr 3, 2024

@MQ-mengqing Can you review this PR:

https://github.com/simdjson/simdjson/pull/2160/files

I would be ok with not running the tests, but I'd like to at least verify that the code builds (e.g., by cross-compiling).

Otherwise it is very difficult to support the code long term.

@lemire
Member

lemire commented Apr 3, 2024

@MQ-mengqing Can you sync your fork with our main branch so that the CI tests try to compile your code? I would like to see your code compiling in CI automatically before merging it.

As I stated earlier, if we don't even compile the code, we have no chance of being able to spot problems. We need something like a build test, at least.

@lemire
Member

lemire commented Apr 4, 2024

@MQ-mengqing You may need to sync once more since I have applied the fix you suggested on the main branch. Sorry about that.

@lemire
Member

lemire commented Apr 4, 2024

Running tests.

@lemire
Member

lemire commented Apr 4, 2024

Merging. I will be issuing a release.

@lemire merged commit 4c98e51 into simdjson:master on Apr 4, 2024
42 checks passed

2 participants