
Description
Problem
AVX instructions are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel. They are used to load vector data from main memory into registers and perform calculations on the CPU. With the CUDA build, we load all data into GPU memory and all calculations are done by the GPU, so building with AVX is not necessary.
I tested the performance of the example server on an RTX 4090 and got the following results (each build type was run 3 times with 100 parallel requests in total) when offloading everything to the GPU:
- No AVX: 219 - 220 - 230 tokens/s
- AVX2: 220 - 230 - 214 tokens/s
- AVX: 231 - 219 - 214 tokens/s
Success Criteria
Add a CUDA build without any AVX instructions and remove the CUDA builds with AVX, AVX2, and AVX-512 (see the sketch below).
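
A minimal sketch of what the AVX-free CUDA build configuration could look like, assuming the project exposes llama.cpp-style CMake options; the exact flag names (e.g. `LLAMA_*` vs. `GGML_*` prefixes, `LLAMA_CUBLAS` vs. `GGML_CUDA`) depend on the llama.cpp version being vendored, so treat them as assumptions:

```bash
# Hypothetical CUDA-only configuration: CUDA enabled, all AVX variants disabled.
# Flag names follow llama.cpp's CMake conventions and may differ by version.
cmake -B build \
  -DLLAMA_CUBLAS=ON \
  -DLLAMA_NATIVE=OFF \
  -DLLAMA_AVX=OFF \
  -DLLAMA_AVX2=OFF \
  -DLLAMA_AVX512=OFF
cmake --build build --config Release
```

With a configuration along these lines, the CI matrix would only need to publish a single CUDA artifact instead of separate AVX/AVX2/AVX-512 variants.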