A demonstration of speeding up a 1D convolution using SSE
Information about the implementations is provided in convolve.c, which contains the interesting code.
Build it using
Test it with the python script
py_test_convolve.py. This checks the output
and prints out the times taken for each implementation and the flops estimate.
The test_convolve c script seems to be broken (feel free to submit a patch).