An implementation of the Fast Multipole Method in C, as described in this paper by John Strain.
Create a `CauchyMultiplier` struct using `newCauchyMultiplier(double* sources, double* targets, double* input, int n, int precision)`. `double* fastMultiply(CauchyMultiplier cm)` uses the FMM to apply the Cauchy matrix determined by the sources and targets to the input with the given precision. When finished with the calculations, free the `CauchyMultiplier` struct with `freeCauchyMultiplier(CauchyMultiplier cm)`.
Run `make test` to generate error, speed, and flops data, which will be output to `output` in `.csv` format.
This implementation uses max(log2(n), 2) levels. Also included are some graphs using log2(n) levels, just because it is interesting. For matrices of size about 200 and larger, this implementation of the fast multipole method uses fewer floating point operations than the naive method (direct multiplication). Interestingly, in the speed tests on my computer, increasing the precision increases the runtime less than expected; the flop counts reflect this increase more faithfully. The main purpose of the speed plot is to confirm that the flop plot can be trusted.
This algorithm runs in O(n log n) time; however, it appears to run close to linear time, as shown by the plots of flops/n and flops/(n log n) below (flops/(n log n) seems to approach a constant, while flops/n increases, but very slowly). This is most likely because in this implementation the source moments are computed only at the finest level, while the Taylor coefficients are computed at every level. Since there are O(log n) levels and the Taylor coefficient evaluation is called about 3n times per level, the resulting O(n log n) behaviour is expected.
As far as programming goes, binomial coefficients are cached from n = 0 to 2 * p, and powers are accumulated instead of calling pow(). Both of these save quite a few flops. The arrays of source moments and Taylor coefficients are reallocated at each level, which is inefficient, but it probably doesn't have a large effect on performance. As of now, this implementation only works for Cauchy matrices with entries on (0, 1); making it work for arbitrary intervals would only add a few flops per cell lookup (if anyone wants this implemented, contact me and I can add it).