It can handle large inputs, e.g. n = 131072 numbers of l = 1024 bits each. The program is designed to run only on CUDA-capable GPUs. Time complexity is O(n*l), which is linear in the total input size. Final project for the GPU computing course at WUT.
Here we can see 4 distinct binary strings. The 3rd and 4th strings differ by two digits. The 3rd and 5th strings differ by one digit, so they form a pair that we are interested in. Checking every possible pair gives O(l*n^2) complexity, which is far less efficient than this implementation's O(n*l).
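For comparison, the naive pairwise check can be sketched on the CPU as below. This is only an illustrative baseline, not the GPU implementation from this project. The names `BitSeq`, `hamming_distance` and `pairs_at_distance_one` are hypothetical, and packing the bits into 64-bit words is an assumption. With n = 131072 this loop would visit about 8.6 billion pairs, each costing O(l).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: each sequence is l bits packed into 64-bit words.
using BitSeq = std::vector<std::uint64_t>;

// Hamming distance between two equal-length packed sequences, O(l).
std::size_t hamming_distance(const BitSeq& a, const BitSeq& b) {
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t w = 0; w < a.size(); ++w) {
        std::uint64_t diff = a[w] ^ b[w];  // set bits mark differing positions
        while (diff != 0) {                // clear the lowest set bit each pass
            diff &= diff - 1;
            ++distance;
        }
    }
    return distance;
}

// Brute-force baseline: tests all n*(n-1)/2 pairs, so O(l*n^2) overall.
// Returns 0-based index pairs (i, j), i < j, whose distance is exactly 1.
std::vector<std::pair<std::size_t, std::size_t>>
pairs_at_distance_one(const std::vector<BitSeq>& seqs) {
    std::vector<std::pair<std::size_t, std::size_t>> result;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        for (std::size_t j = i + 1; j < seqs.size(); ++j) {
            if (hamming_distance(seqs[i], seqs[j]) == 1) {
                result.emplace_back(i, j);
            }
        }
    }
    return result;
}
```

Calling `pairs_at_distance_one` on five made-up 4-bit strings `1011, 1011, 0110, 0000, 0111` (each stored in a single word) returns only the 0-based pair `(2, 4)`, i.e. the 3rd and 5th strings.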