This repository contains several optimized C implementations of a soft-margin SVM with a polynomial kernel, trained via Stochastic Gradient Descent.
The optimizations rely on advanced programming techniques such as SIMD parallelism (exploited through the Intel SSE and AVX instruction sets) and cache blocking.
The implementations are subdivided into two groups:
- 32 bit: single precision floating point data and Intel SSE instruction set.
- 64 bit: double precision floating point data and Intel AVX instruction set.
The repository also provides:
- Bash scripts for compiling and running each version of the algorithm;
- sample datasets;
- Python scripts for assessing the algorithm's accuracy (binary classification accuracy is compared against scikit-learn's SVC classifier);
- C and Assembly source code (the Assembly code is located under src/algorithm/sse and src/algorithm/avx).
For both the 32-bit and 64-bit groups, the following implementations are provided:
- C standard SoftSvmPoly[32|64].c;
- SIMD optimization SoftSvmPolySimd[32|64].c;
- SIMD + Cache-Blocking SoftSvmPolyBlockWise[32|64].c;
- SIMD + OpenMP SoftSvmPolySimdOmp[32|64].c.
Benchmarks were run on an Intel Core i7-9750H CPU, using a 1000-dimensional dataset containing 951 samples. The algorithm was run for 1902 iterations.
| Implementation | Time (s) | Speedup | Accuracy |
|---|---|---|---|
| SoftSvmPoly32.c | 4.000 | 1x (baseline) | 96% |
| SoftSvmPolySimd32.c | 0.506 | 8x | 96% |
| SoftSvmPolyBlockWise32.c | 0.201 | 20x | 96% |
| SoftSvmPoly64.c | 4.483 | 1x (baseline) | 97% |
| SoftSvmPolySimd64.c | 0.725 | 6x | 97% |
| SoftSvmPolyBlockWise64.c | 0.327 | 13x | 97% |

Speedups are measured relative to the scalar baseline of the same precision.