We would like to know the sample variance of the daily changes in value against the USD for the Euro
(EUR) and the Japanese Yen (JPY). We also want to find the p-value for the hypothesis that EUR and JPY have the same volatility.

To start off this problem, we read the files and prepare datasets as usual:

In [None]:
# Imports used throughout this notebook
import numpy as np
import pandas as pd
import scipy.stats

# Read the csv files (pd.read_csv raises FileNotFoundError if a file is missing)
eur = pd.read_csv("EURtoUSD.csv")
jpy = pd.read_csv("USDtoJPY.csv")

In [None]:
# Prepare data: strip the '%' sign and convert 'Change %' to float (values stay in percent)
eur['Change %'] = eur['Change %'].str.rstrip('%').astype(float)
jpy['Change %'] = jpy['Change %'].str.rstrip('%').astype(float)

In [None]:
# Convert to NumPy arrays; rows 8229-8232 are dropped from the JPY series
x = eur['Change %'].to_numpy()
y = jpy['Change %'].drop([8229, 8230, 8231, 8232]).to_numpy()

Now it's time to calculate the variances. Comparing variances matters because in many situations we would prefer the process with less variability in the variable of interest. Note that we stripped the percentage signs only so that NumPy can treat the data as numbers; the values are still in percent, so a change of 0.5% is stored as 0.5. Because variance scales with the square of the units, we divide the variance of these values by $10^4$ (that is, $100^2$) to express it in terms of decimal changes, where 0.5% is 0.005.
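A quick check of the scaling argument, using a few made-up percentage changes (not taken from the EUR or JPY files): the variance of the values in percent, divided by $10^4$, matches the variance of the same changes written as decimal fractions.

```python
import numpy as np

# Hypothetical daily changes, written in percent as in the 'Change %' column
pct = np.array([0.12, -0.35, 0.08, 0.51, -0.20])

# The same changes as decimal fractions (1% = 0.01)
frac = pct / 100

var_pct = np.var(pct, ddof=1)    # units: percent squared
var_frac = np.var(frac, ddof=1)  # units: fraction squared

# Variance scales with the square of the unit: (1/100)^2 = 1/10^4
print(var_pct / 1e4, var_frac)
print(np.isclose(var_pct / 1e4, var_frac))  # True
```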

In [None]:
# Sample variances (ddof=1 divides by N - 1), in percent squared
var_x = np.var(x, ddof=1)
var_y = np.var(y, ddof=1)

# Divide by 10^4 to express the variances in decimal units
print(var_x / 1e4)
print(var_y / 1e4)

3.587397931854963e-05
4.37880288952247e-05


The F-test requires the samples to come from normal distributions. The Central Limit Theorem applies to sample means, as it did in the previous question, not to the individual observations, so a large sample size alone does not justify treating the data as normal. The F-test for variances is also known to be sensitive to departures from normality. For this example, we will nevertheless assume that the daily changes are normally distributed.
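One way to probe the normality assumption is a formal normality test such as D'Agostino and Pearson's test, available as `scipy.stats.normaltest`. The sketch below runs it on simulated data only: a normal sample and a heavy-tailed Student t sample (3 degrees of freedom) standing in for fat-tailed daily changes. The sample sizes, scales, and seed are arbitrary assumptions; calling `normaltest(x)` and `normaltest(y)` would be the analogous check on the notebook's data.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

normal_sample = rng.normal(loc=0.0, scale=0.6, size=5000)
heavy_tailed = rng.standard_t(df=3, size=5000) * 0.6  # fat tails, like many return series

for name, sample in [("normal", normal_sample), ("student t, df=3", heavy_tailed)]:
    # H0: the sample comes from a normal distribution (test uses skewness and kurtosis)
    stat, p = scipy.stats.normaltest(sample)
    print(f"{name:>16}: statistic={stat:.1f}, p-value={p:.3g}")
```

With thousands of observations such tests can detect even modest departures from normality, so a small p-value says the data are not normal but not how much that matters for the F-test.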

In order to compare the variances, the hypotheses are as follows:

$H_0 : \sigma_x / \sigma_y = 1$

$H_1: \sigma_x / \sigma_y \neq 1 $ 

The $\neq$ in $H_1$ makes this a two-tailed test. The test statistic is

$F = s_1^2 / s_2^2$

where $s_1^2$ and $s_2^2$ are the sample variances of x and y, respectively.

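For a two-tailed test at significance level $\alpha$, the hypothesis that the two variances are equal is rejected if

$F < F_{1-\alpha/2,\,N_1-1,\,N_2-1}$ or $F > F_{\alpha/2,\,N_1-1,\,N_2-1}$

where $N_1$ and $N_2$ are the sizes of samples x and y, and $F_{\alpha,\,N_1-1,\,N_2-1}$ is the upper critical value of the F distribution with $N_1-1$ and $N_2-1$ degrees of freedom, i.e. the value exceeded with probability $\alpha$. If the larger sample variance is placed in the numerator (swapping the degrees of freedom accordingly), $F \ge 1$ and in practice only the upper condition needs to be checked.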
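The critical values in the rejection rule can be computed with `scipy.stats.f.ppf`, the inverse of the cdf. The sketch below uses $\alpha = 0.05$ and hypothetical sample sizes of 6000 and 8000 (placeholders, not the sizes of `x` and `y`).

```python
import scipy.stats

alpha = 0.05
n1, n2 = 6000, 8000          # hypothetical sample sizes
dfn, dfd = n1 - 1, n2 - 1    # numerator and denominator degrees of freedom

# Upper critical value F_{alpha/2}: exceeded with probability alpha/2 under H0
upper = scipy.stats.f.ppf(1 - alpha / 2, dfn, dfd)
# Lower critical value F_{1-alpha/2}: exceeded with probability 1 - alpha/2 under H0
lower = scipy.stats.f.ppf(alpha / 2, dfn, dfd)

print(f"reject H0 if F < {lower:.4f} or F > {upper:.4f}")
```

With samples this large, both critical values lie close to 1, so even a moderate ratio of sample variances can lead to rejection.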

**Regarding the following calculation**

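The p-value calculation below looks only at the upper tail of the F distribution, so it is valid only when the larger sample variance is in the numerator. Because the JPY changes (y) have the larger sample variance, we compute $F = s_2^2 / s_1^2$, with $N_2-1$ numerator and $N_1-1$ denominator degrees of freedom.

Strictly, $s_1^2$ and $s_2^2$ are sample variances (`ddof=1`), as computed earlier. The cell below calls `np.var` with its default `ddof=0`, which gives the population variance (dividing by $N$ instead of $N-1$), so its F differs slightly from the ratio of the sample variances printed above (about 1.2206). With samples this large the difference is negligible and does not change the conclusion.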
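A minimal sketch of the ordering rule, written as a hypothetical helper `variance_ratio` (not part of NumPy or SciPy): it computes both sample variances with `ddof=1`, puts the larger one in the numerator, and returns the matching degrees of freedom. The data are simulated, not the EUR or JPY series.

```python
import numpy as np

def variance_ratio(a, b):
    """Return (F, dfn, dfd) with the larger sample variance in the numerator."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1)  # sample variances
    if var_a >= var_b:
        return var_a / var_b, a.size - 1, b.size - 1
    return var_b / var_a, b.size - 1, a.size - 1

rng = np.random.default_rng(1)
a = rng.normal(scale=0.60, size=6000)  # smaller spread
b = rng.normal(scale=0.66, size=8000)  # larger spread

F, dfn, dfd = variance_ratio(a, b)
print(F, dfn, dfd)  # F >= 1; dfn comes from b, the sample with the larger variance
```

Because the helper orders the samples itself, `variance_ratio(a, b)` and `variance_ratio(b, a)` return the same result.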

In [None]:
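# Calculate the test statistic with the larger variance (y, JPY) in the numerator.
# np.var uses ddof=0 (population variance) by default.
F = np.var(y) / np.var(x)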
print(F)

1.220665945827109


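Now we will find the p-value. Since the larger variance is in the numerator, the upper-tail probability $P(F_{N_2-1,\,N_1-1} > F)$ equals $1 - \text{cdf}$ of the F distribution with numerator degrees of freedom $N_2-1$ and denominator degrees of freedom $N_1-1$. For the two-tailed test, the p-value is twice this upper-tail probability.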

In [None]:
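# Calculate the upper-tail probability P(F' > F) for F' ~ F(d_f1, d_f2)
d_f1 = y.size - 1  # numerator degrees of freedom (y, larger variance)
d_f2 = x.size - 1  # denominator degrees of freedom (x)
# Note: 1 - cdf cannot resolve probabilities much below ~1e-16
p = 1 - scipy.stats.f.cdf(F, d_f1, d_f2)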
print(p)

1.1102230246251565e-16
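The tiny value above is a floating-point artifact of computing $1 - \text{cdf}$: once the cdf rounds to (nearly) 1, the subtraction cannot resolve smaller tail probabilities. SciPy's survival function `scipy.stats.f.sf` computes the upper tail directly. The sketch below compares the two on a hypothetical, more extreme ratio F = 1.5 with made-up degrees of freedom, and forms the two-tailed p-value as twice the upper tail (valid here because the larger variance is in the numerator).

```python
import scipy.stats

F = 1.5                  # hypothetical variance ratio (larger variance on top)
dfn, dfd = 7999, 5999    # hypothetical degrees of freedom

p_naive = 1 - scipy.stats.f.cdf(F, dfn, dfd)  # loses precision: cdf rounds to ~1
p_upper = scipy.stats.f.sf(F, dfn, dfd)       # upper tail computed directly
p_two_sided = min(1.0, 2 * p_upper)           # two-tailed p-value, capped at 1

print(p_naive, p_upper, p_two_sided)
```

Applied to the notebook's own statistic, `2 * scipy.stats.f.sf(F, d_f1, d_f2)` would give the two-tailed p-value without the rounding floor.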


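The F test statistic is 1.22067. The upper-tail probability printed above, 1.110223e-16, equals $2^{-53}$, the smallest nonzero value that $1 - \text{cdf}$ can return in double precision, so it only tells us that the tail probability is roughly $10^{-16}$ or less, not its exact size. For the two-tailed test the p-value is twice the upper-tail probability, which is still far below .05, so we reject the null hypothesis. We have strong evidence that the two population variances are not equal, with the JPY changes showing the larger sample variance.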
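Because the F-test leans on the normality assumption, a common cross-check is Levene's test, which SciPy provides as `scipy.stats.levene`; with `center='median'` (the Brown-Forsythe variant) it is less sensitive to heavy tails. This is a swapped-in technique, not part of the analysis above. The sketch runs it on simulated heavy-tailed samples whose spreads differ; `scipy.stats.levene(x, y, center='median')` would be the corresponding call on the notebook's data.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(2)
# Simulated fat-tailed daily changes (Student t, 4 df) with different scales
a = rng.standard_t(df=4, size=6000) * 0.50
b = rng.standard_t(df=4, size=8000) * 0.60

# H0: equal variances; center='median' gives the Brown-Forsythe version of the test
result = scipy.stats.levene(a, b, center='median')
print(result.statistic, result.pvalue)
```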