In [1]:
import numpy as np

# PLUMED masterclass 21.2: Statistical errors in MD (data generation)

## Initial notes

The data contained in the files <i>uncorrelated_data</i>, <i>correlated_data</i> and <i>weighted_data</i> were generated by sampling from known statistical models.  For those who are interested, this notebook includes the Python code that I used to generate the data.  Please note that the code in the cells below generates (pseudo)random numbers.  If you run these cells you will therefore not get the same list of data points as in the <i>uncorrelated_data</i>, <i>correlated_data</i> and <i>weighted_data</i> files that you downloaded from GitHub.
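
If repeatable output is wanted when re-running cells like these, one option is to draw from a seeded generator. The following is a minimal sketch, assuming NumPy's `default_rng` interface. The seed value `2021` and the variable names are purely illustrative and are not the settings used to produce the distributed files, so this will not recreate them.

```python
import numpy as np

# Hypothetical seed, chosen only for illustration
seed = 2021

# Two generators created from the same seed produce identical streams
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

samples_a = rng_a.normal(0.0, 1.0, size=5)
samples_b = rng_b.normal(0.0, 1.0, size=5)

# The two arrays agree element by element
print(np.array_equal(samples_a, samples_b))
```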

### Uncorrelated data

The uncorrelated data are all samples from a normal random variable with $\mu=0$ and $\sigma=1$.  The data is generated as follows:

In [None]:
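# Write 10001 independent samples from a normal distribution with mu=0 and sigma=1
with open("../data/uncorrelated_data", "w") as f:
    # Header line naming the two columns
    f.write("#! FIELDS time rand \n")
    for i in range(10001):
        f.write(str(i) + " " + str(np.random.normal(0, 1)) + "\n")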
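
Because these samples are independent, the error on their mean can be estimated with the usual formula $s/\sqrt{N}$. The sketch below illustrates this on freshly generated numbers rather than on the file itself. The seed and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary illustrative seed
n_samples = 10001
data = rng.normal(0.0, 1.0, size=n_samples)

mean = data.mean()
# Standard error of the mean for independent samples: s / sqrt(N)
sem = data.std(ddof=1) / np.sqrt(n_samples)

print(f"mean = {mean:.4f} +/- {sem:.4f}")
```

With $\sigma=1$ and $N=10001$ the standard error should come out close to $0.01$.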

### Correlated data

The correlated data is generated by running the following script.  You can see that the previous random variable in the time series is used when generating the next variable.  The data points are thus clearly correlated.

In [None]:
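with open("../data/correlated_data", "w") as f:
    prev = 0.0
    f.write("#! FIELDS time rand \n")
    for i in range(10001):
        # Each new value keeps 95% of the previous value and adds uniform noise on [-1, 1)
        new = 0.95 * prev + 2 * np.random.uniform(0, 1) - 1
        # Only the written value is shifted and rescaled; the recursion uses the raw value
        f.write(str(i) + " " + str(new / 2.0 + 0.5) + "\n")
        prev = new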
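
One way to see the correlation is to estimate the autocorrelation of the series. For a recursion of this form the lag-$k$ autocorrelation in the stationary regime is $0.95^k$, so the lag-1 value should be close to 0.95. The sketch below regenerates a series in memory instead of reading the file. The seed and the helper `autocorrelation` are hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary illustrative seed
n_samples = 10001

# Same recursion as in the cell above, stored in an array instead of a file
series = np.empty(n_samples)
prev = 0.0
for i in range(n_samples):
    new = 0.95 * prev + 2 * rng.uniform(0, 1) - 1
    series[i] = new / 2.0 + 0.5
    prev = new

def autocorrelation(x, lag):
    """Normalized autocorrelation estimate of x at a positive lag."""
    centred = x - x.mean()
    return np.dot(centred[:-lag], centred[lag:]) / np.dot(centred, centred)

rho1 = autocorrelation(series, 1)
print(f"lag-1 autocorrelation: {rho1:.3f}")
```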

### Data with weights

The weighted data is generated by drawing normal random variables with $\mu=0.6$ and $\sigma=0.5$ and keeping only the values that fall between 0 and 1.  In the exercise we suppose that the following bias potential is acting on the data:

$$
V(x) = \frac{1}{2} 4(x-0.6)^2
$$

Our reweighting weights counteract the effect of this bias.  We should thus see that the reweighted distribution for the CV is approximately uniform.  Masterclass-21-3 explores the theory behind this reweighting algorithm in more detail.

In [None]:
with open("../data/weighted_data", "w") as f:
    n = 0
    f.write("#! FIELDS time rand \n")
    # Rejection sampling: keep drawing until 10001 samples fall inside [0, 1]
    while n < 10001:
        x = np.random.normal(0.6, 0.5)
        if 0 <= x <= 1:
            f.write(str(n) + " " + str(x) + "\n")
            n = n + 1
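
The statement that reweighting gives an approximately uniform distribution can be checked numerically. The sketch below assumes weights of the form $e^{V(x)/k_BT}$ with $k_BT=1$. That value is an assumption, not something stated above, although it is the value for which a normal distribution with standard deviation 0.5 is exactly compensated by this bias. The samples are regenerated in memory, and the seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary illustrative seed
kT = 1.0  # assumed thermal energy; not stated in the text above

# Rejection-sample 10001 values from a normal (mu=0.6, sigma=0.5) restricted to [0, 1]
samples = []
while len(samples) < 10001:
    x = rng.normal(0.6, 0.5)
    if 0 <= x <= 1:
        samples.append(x)
samples = np.array(samples)

# Bias potential from the text and the weights that undo it
bias = 0.5 * 4 * (samples - 0.6) ** 2
weights = np.exp(bias / kT)

# Weighted histogram, normalized so that the bin fractions sum to one
counts, edges = np.histogram(samples, bins=10, range=(0.0, 1.0), weights=weights)
fractions = counts / counts.sum()
print(np.round(fractions, 3))
```

With ten bins, each fraction should lie close to 0.1, up to statistical noise.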