In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import tidynamics

In [None]:
def autocorrelation(data):

    N = len(data)

    norm = np.fft.fft(np.ones(N))
    norm = np.fft.ifft(norm*np.conj(norm))
    norm = np.real(norm).astype(int)

    fourier = np.fft.fft(data-np.mean(data))
    result  = np.fft.ifft(fourier*np.conj(fourier))
    result  = np.divide(result,norm)
    result /= np.var(data)

    return np.real(result)[0:N//2]


### Generate test data

The test data is a random signal shifted to have zero mean, so that the
removal of the mean inside `autocorrelation` does not affect the comparison.

In [None]:
N = 128
data_1 = np.random.random(size=N)
data_1 -= data_1.mean()

In [None]:
np_cor = np.correlate(data_1, data_1, mode='full')[N-1:]
np_cor /= np_cor[0]
np_cor = np_cor * N/(N-np.arange(N))
plt.plot(np_cor)

plt.plot(autocorrelation(data_1))

The two signals look close. Plotting the difference between them gives a closer look.

In [None]:
plt.plot(np_cor[:N//2] - autocorrelation(data_1))

The difference grows with the lag. This suggests that the problem with the
routine is the normalization of the FFT result by the number of terms in the
corresponding sum.

In [None]:
plt.plot(np_cor[:N//2])
plt.plot(autocorrelation(data_1) * (N/(N-np.arange(N)))[:N//2])
plt.plot(autocorrelation(data_1))

### The norm

Let's have a look at the norm variable in the routine.

In [None]:
def get_norm(data):
    N = len(data)
    norm = np.fft.fft(np.ones(N))
    norm = np.fft.ifft(norm*np.conj(norm))
    norm = np.real(norm).astype(int)
    return norm

In [None]:
get_norm(data_1)

OK, so `norm` is just a constant equal to the length of the data, and we can replace it with `N`.
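
The reason can be checked on a small case. The sketch below assumes nothing beyond NumPy; the helper name `circular_autocorrelation` is introduced here for illustration only. The circular autocorrelation of a vector of ones sums `n` products of `1*1` at every lag, so every entry should equal `n`.

```python
import numpy as np

def circular_autocorrelation(x):
    # Unnormalized circular autocorrelation computed with the FFT
    f = np.fft.fft(x)
    return np.real(np.fft.ifft(f * np.conj(f)))

n = 8
norm_check = circular_autocorrelation(np.ones(n))

# Every lag sums n products of 1*1, so all entries equal n (up to roundoff)
print(np.allclose(norm_check, n))
```

Because the wrap-around always supplies the "missing" terms, this quantity never decreases with the lag, which is why it cannot play the role of a per-lag count.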

In [None]:
def autocorrelation_no_norm(data):

    N = len(data)

    fourier = np.fft.fft(data-np.mean(data))
    result  = np.fft.ifft(fourier*np.conj(fourier))
    result /= np.var(data)

    return np.real(result)[0:N//2]/N


In [None]:
plt.plot(autocorrelation_no_norm(data_1))
plt.plot(autocorrelation(data_1))

Confirmed: removing the `norm` variable makes no difference.

Another point: computing the correlation with the Fourier transform assumes
periodic data. It is possible to circumvent this by appending zeros to the
data, so that the "periodic" overlap of the data with itself contributes 0.
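
A minimal sketch of this effect on a four-point signal (the array `x` and the helper `fft_correlation` are illustrative names, not part of the routine under study): without padding, the FFT result contains wrap-around terms; with `n` appended zeros, it agrees with `np.correlate`.

```python
import numpy as np

def fft_correlation(signal):
    # Unnormalized autocorrelation via the FFT; periodic by construction
    f = np.fft.fft(signal)
    return np.real(np.fft.ifft(f * np.conj(f)))

x = np.array([1.0, 2.0, 3.0, 4.0])
n = len(x)

# Non-periodic reference: lags 0..n-1 of the full correlation
linear = np.correlate(x, x, mode='full')[n-1:]

# Without padding, x[n-1] wraps around and multiplies x[0], etc.
circular = fft_correlation(x)

# With n appended zeros, the wrap-around terms are multiplied by zero
padded = fft_correlation(np.concatenate([x, np.zeros(n)]))[:n]

print(linear)    # lags 0..3: 30, 20, 11, 4
print(circular)  # lags 0..3: 30, 24, 22, 24 (up to roundoff)
print(padded)    # same as linear (up to roundoff)
```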

In [None]:
data_2 = np.concatenate([data_1, np.zeros(N)])

In [None]:
np_cor_data_2 = np.correlate(data_2, data_2, mode='full')[2*N-1:]
np_cor_data_2 /= np_cor_data_2[0]
np_cor_data_2 = np_cor_data_2[:N]

plt.plot(autocorrelation_no_norm(data_2))
plt.plot(np_cor_data_2)


In [None]:
plt.plot(autocorrelation_no_norm(data_2)-np_cor_data_2)


In this final result, the difference between the two signals is about
$10^{-16}$, which corresponds to machine precision and thus to roundoff errors.

### The catch

NumPy's [`correlate`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html#numpy.correlate)
computes the following formula:

$$c_k = \sum_n a_{n+k}\ v^\ast_{n}$$

Because of the zero padding, however, the number of nonzero terms in the sum for $c_k$ is $N-k$, not $N$.

The "physics" correlation is the average of the product of the variables. To
obtain that average, one must "correct" the result by dividing element $k$ not by $N$ but by $N-k$.
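
A small worked example of the two normalizations, as a sketch (the alternating signal `x` and the helper `direct_autocorrelation` are chosen here for illustration). Successive values of an alternating signal are perfectly (anti)correlated at every lag, and only the division by $N-k$ recovers that.

```python
import numpy as np

def direct_autocorrelation(x):
    # Average of x[n+k]*x[n] over the n - k pairs available at lag k
    n = len(x)
    return np.array([np.dot(x[k:], x[:n - k]) / (n - k) for k in range(n)])

x = np.array([1.0, -1.0, 1.0, -1.0])  # zero mean, alternating sign
N = len(x)

# Raw sums c_k as returned by np.correlate for lags 0..N-1
raw = np.correlate(x, x, mode='full')[N-1:]

divided_by_N = raw / N                        # 1, -0.75, 0.5, -0.25
divided_by_N_minus_k = raw / (N - np.arange(N))  # 1, -1, 1, -1

print(divided_by_N)
print(divided_by_N_minus_k)
print(direct_autocorrelation(x))
```

Dividing by $N$ produces an artificial decay with the lag, the same trend seen in the difference plot earlier.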

In [None]:
def autocorrelation_no_norm_proper_average(data):

    N = len(data)

    fourier = np.fft.fft(data-np.mean(data))
    result  = np.fft.ifft(fourier*np.conj(fourier))
    result /= np.var(data)
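    # Divide lag k by the number of terms N - k, where N is the length
    # of the array passed in (any zero padding included)
    result /= (N - np.arange(N))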

    return np.real(result)[0:N//2]


In [None]:
plt.plot(autocorrelation_no_norm_proper_average(data_2))
tiny_acf = tidynamics.acf(data_2)
plt.plot(tiny_acf/tiny_acf[0])

### Summary

The routine `autocorrelation` lacked the padding with zeros and the proper normalization.

The final routine `autocorrelation_no_norm_proper_average` still does not perform the
padding automatically, and I humbly suggest using tidynamics instead :-)
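
For readers who want to see both fixes combined, here is a minimal sketch, not the tidynamics implementation. The function name `autocorrelation_padded` is hypothetical. It relies on the `n` argument of `np.fft.fft`, which zero-pads the input to the requested length, and it assumes the input is not constant (otherwise the final normalization divides by zero).

```python
import numpy as np

def autocorrelation_padded(data):
    # Hypothetical helper: pads internally and averages over N - k terms
    data = np.asarray(data, dtype=float)
    n = len(data)
    centered = data - data.mean()

    # Transform with length 2n: np.fft.fft zero-pads the input to that length
    fourier = np.fft.fft(centered, n=2 * n)
    result = np.real(np.fft.ifft(fourier * np.conj(fourier)))[:n]

    # Average over the n - k terms that contribute to lag k
    result /= (n - np.arange(n))

    # Normalize so that the value at lag 0 is 1 (assumes nonzero variance)
    return result / result[0]

rng = np.random.default_rng(0)
sample = rng.random(64)
acf = autocorrelation_padded(sample)
print(acf[:4])
```

Unlike the routines above, this sketch returns all `n` lags rather than the first `N//2`; the largest lags average over very few terms and are correspondingly noisy.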