CUDA implementation of Van Voorhis's optimal sorting network for 16 numbers.
The code is described in this paper: Ouyang M. Sorting sixteen numbers. Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC), 2015, 1-6.
To compile: make sort16
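
For readers new to sorting networks, the sketch below illustrates the general idea: a fixed sequence of compare-exchange operations that does not depend on the data. It is not the code from this repository and it does not use Van Voorhis's network. It uses a bitonic network (80 comparators in 10 stages) only because its comparator sequence can be generated by a simple loop. The names HD, cmp_swap, bitonic_sort16, sort16_kernel and sorts_all_zero_one_inputs are hypothetical, and the one-thread-per-group layout is an assumption made for illustration, not a claim about how the paper maps work to threads.

```cpp
#include <cassert>  // only needed by callers that assert on the check below

// Let the same source compile with nvcc and with a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Compare-exchange: after the call, a <= b.
HD inline void cmp_swap(int &a, int &b) {
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    a = lo;
    b = hi;
}

// Sort 16 values in ascending order with a bitonic network.
// The comparator positions depend only on (k, j, i), never on the data.
HD inline void bitonic_sort16(int v[16]) {
    for (int k = 2; k <= 16; k <<= 1) {          // size of the bitonic runs
        for (int j = k >> 1; j > 0; j >>= 1) {   // comparator distance
            for (int i = 0; i < 16; ++i) {
                int l = i ^ j;                   // partner wire
                if (l > i) {
                    if ((i & k) == 0) cmp_swap(v[i], v[l]);  // ascending run
                    else              cmp_swap(v[l], v[i]);  // descending run
                }
            }
        }
    }
}

#ifdef __CUDACC__
// Assumed layout: each thread sorts one group of 16 consecutive ints.
__global__ void sort16_kernel(int *data, int num_groups) {
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= num_groups) return;
    int v[16];
    for (int i = 0; i < 16; ++i) v[i] = data[16 * g + i];
    bitonic_sort16(v);
    for (int i = 0; i < 16; ++i) data[16 * g + i] = v[i];
}
#endif

// Host-side check using the zero-one principle: a comparator network
// that sorts every input of 0s and 1s sorts every input.
inline bool sorts_all_zero_one_inputs() {
    for (int bits = 0; bits < (1 << 16); ++bits) {
        int v[16];
        for (int i = 0; i < 16; ++i) v[i] = (bits >> i) & 1;
        bitonic_sort16(v);
        for (int i = 0; i + 1 < 16; ++i)
            if (v[i] > v[i + 1]) return false;
    }
    return true;
}
```

The same zero-one check can be pointed at any other 16-input comparator sequence by swapping out the body of bitonic_sort16, which makes it a cheap way to validate a hand-entered network before running it on the GPU.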