Implement the library with a Vulkan compute engine for OS X, Linux and Windows on AMD, Nvidia and Intel GPUs.
Benchmark on all supported OSes and on a range of major cards.
If Vulkan performance is on par with the CUDA implementation, implement with Vulkan for all platforms (OS X, Linux, Windows) and for all Vulkan-supported GPUs (including Intel GPUs); otherwise, add OS X support via Vulkan only. Optimize the API implementation for (n, 1, 1), similar to the phase I requirements.
The Vulkan performance target is between the OpenCL and the CUDA implementations, and no less than 80% of CUDA.
For OS X: if Vulkan performance is not good compared to CUDA, fall back to implementing CUDA for Nvidia GPUs and OpenCL for Intel and AMD GPUs.
All new implementations: optimize for (n=variable, r=1, p=1) scrypt parameters.
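As background for the (n=variable, r=1, p=1) target: scrypt's main scratch array takes 128 * r * n bytes per hash, so with r=1 the memory cost per hash depends only on n, and GPU memory caps how many hashes can run side by side. A minimal Python sketch of that arithmetic follows. The function names, the 4 GiB figure and the sample n are illustrative assumptions, not part of the library, and the estimate ignores any other buffers an implementation may need.

```python
def scratch_bytes_per_hash(n: int, r: int = 1) -> int:
    """Size in bytes of the scrypt V scratch array for one hash: 128 * r * n."""
    # scrypt requires n to be a power of two greater than 1.
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two greater than 1")
    return 128 * r * n


def max_parallel_hashes(gpu_mem_bytes: int, n: int, r: int = 1) -> int:
    """Upper bound on concurrent hashes if each one needs its own V array.

    Hypothetical helper: a real implementation also needs room for
    inputs, outputs and driver overhead, so the usable number is lower.
    """
    return gpu_mem_bytes // scratch_bytes_per_hash(n, r)


if __name__ == "__main__":
    n = 512                      # sample value, chosen only for illustration
    gpu_mem = 4 * 2**30          # assumed 4 GiB of GPU memory
    print(scratch_bytes_per_hash(n))        # bytes per hash
    print(max_parallel_hashes(gpu_mem, n))  # rough concurrency ceiling
```

Because the per-hash footprint grows linearly with n, a kernel tuned for one value of n may need a different batch size for another, which is one reason to benchmark across several n values.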
Implement gpu-post in Vulkan / SPIR-V. Support any Vulkan 1.0 compatible GPU from all major vendors (Intel, AMD, Nvidia) on Linux / Windows. Benchmark. Deliverable: the library can run on any Vulkan 1.0 compatible GPU with the appropriate drivers installed.
Implement the same as above for OS X, possibly using MoltenVK (https://github.com/KhronosGroup/MoltenVK)? Benchmark. Deliverable: the library can be built on OS X and run on any GPU (requires just Metal support)?
Optimize the Vulkan implementation for (n=arbitrary, r=1, p=1) on all supported OSes and major GPUs. Performance goal: at least 75% of CUDA gpu-post and equal to or better than OpenCL gpu-post on the same GPU.
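The performance goal above can be turned into a simple pass/fail check over benchmark results. The sketch below is one possible way to express it in Python; the function name, the use of hashes per second as the unit, and the sample numbers are assumptions for illustration and do not come from any real benchmark.

```python
from typing import Optional


def meets_vulkan_goal(
    vulkan_rate: float,
    cuda_rate: Optional[float] = None,
    opencl_rate: Optional[float] = None,
    min_cuda_ratio: float = 0.75,
) -> bool:
    """Check a Vulkan throughput figure against the stated goal.

    All rates are assumed to be measured on the same GPU with the same
    scrypt parameters, in the same unit (for example hashes per second).
    A backend that is not available on that GPU is passed as None and
    its condition is skipped.
    """
    if vulkan_rate <= 0:
        raise ValueError("vulkan_rate must be positive")
    ok = True
    if cuda_rate is not None:
        # At least min_cuda_ratio (75% by default) of the CUDA rate.
        ok = ok and vulkan_rate >= min_cuda_ratio * cuda_rate
    if opencl_rate is not None:
        # Equal to or better than the OpenCL rate.
        ok = ok and vulkan_rate >= opencl_rate
    return ok


if __name__ == "__main__":
    # Made-up sample numbers, only to show the call shape.
    print(meets_vulkan_goal(80.0, cuda_rate=100.0, opencl_rate=70.0))
    print(meets_vulkan_goal(70.0, cuda_rate=100.0, opencl_rate=60.0))
```

Keeping the ratio as a parameter lets the same check be reused for a stricter threshold by passing a different `min_cuda_ratio`.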