# Moon Illusion 
One sample and paired t-test examples<br>
For data science curriculum development by John Ketterer<br>
Reference<br>
Howell, D. C. (1999). Fundamental Statistics for the Behavioral Sciences, 4th Edition. Duxbury Press, Pacific Grove, California.


Why does the moon appear to be so much larger when it is near the horizon than when it is directly overhead? This question has produced a wide variety of theories from psychologists. An important early hypothesis was put forth by Holway and Boring (1940) who suggested that the illusion was due to the fact that when the moon was on the horizon, the observer looked straight at it with eyes level, whereas when it was at its zenith, the observer had to elevate his or her eyes as well as his or her head to see it. To test this hypothesis, Kaufman and Rock (1962) devised an apparatus that allowed them to present two artificial moons, one at the horizon and one at the zenith, and to control whether the subjects elevated their eyes or kept them level to see the zenith moon. The horizon, or comparison, moon was always viewed with eyes level. Subjects were asked to adjust the variable horizon moon to match the size of the zenith moon or vice versa. For each subject the ratio of the perceived size of the horizon moon to the perceived size of the zenith moon was recorded with eyes elevated and with eyes level. A ratio of 1.00 would represent no illusion. If Holway and Boring were correct, there should be a greater illusion in the eyes-elevated condition than in the eyes-level condition.


Holway, A. H., and Boring, E. G. (1940). The moon illusion and the angle of regard. American Journal of Psychology 53, 509-516.
Kaufman, L., and Rock, I. (1962). The moon illusion I.  Science 136, 953-961.

<ul>
<li>Subject	--->	Subject number, 1 to 10 </li>
<li>Elevated --->		Perceived ratio with eyes elevated</li>
<li>Level --->	Perceived ratio with eyes level</li>
</ul>

In [1]:
import numpy as np
from numpy.random import seed
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import t, ttest_rel, ttest_1samp

In [2]:
# data from the experiment, used in both test sections below
subject = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Subject ---> Subject number, 1 to 10
elevated = np.array([1.65, 1, 2.03, 1.25, 1.05, 1.02, 1.67, 1.86, 1.56, 1.73])
# Elevated ---> Perceived ratio with eyes elevated
level = np.array([1.73, 1.06, 2.03, 1.4, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56])
# Level ---> Perceived ratio with eyes level


# One sample t-test

The following data for ten subjects are taken from Kaufman and Rock’s paper and represent the ratio of the diameter of the variable moon and the standard moon. A ratio of 1.00 would indicate no illusion; a ratio other than 1.00 would represent an illusion.

In [3]:
# given data; map each subject number to its eyes-level ratio
data = dict(zip(subject, level))

Step 1: <br>
Write your null hypothesis statement.<br>
H0: There is no change in perception of the size of the moon. <br> H0: µ = 1 <br>
<br>
Write your alternate hypothesis. This is the one you’re testing. <br>
H1: There is a change in perception of the size of the moon. <br> H1: µ ≠ 1<br>

Define a significance level: <br>
alpha = 0.05 <br>

Step 2: Check the assumptions. The one sample t-test has four main assumptions:<br>
<ul>
<li>The dependent variable must be continuous (interval/ratio).</li>
<li>The observations are independent of one another.</li>
<li>The dependent variable should be approximately normally distributed.</li>
<li>The dependent variable should not contain any outliers.</li>
</ul>
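The normality and outlier assumptions can be checked numerically before running the test. The sketch below is one possible check and is not part of the original analysis: it assumes the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and the common 1.5 × IQR rule for outliers. The names `shapiro_stat`, `shapiro_p`, and `outliers` are illustrative.

```python
import numpy as np
from scipy.stats import shapiro

# eyes-level ratios from the experiment (same values as the `level` array above)
level = np.array([1.73, 1.06, 2.03, 1.4, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56])

# Shapiro-Wilk test: a small p-value would suggest the data are not normal
shapiro_stat, shapiro_p = shapiro(level)
print('Shapiro-Wilk statistic:', round(shapiro_stat, 3), 'p-value:', round(shapiro_p, 3))

# 1.5 * IQR rule (an assumed convention): flag values far outside the middle 50%
q1, q3 = np.percentile(level, [25, 75])
iqr = q3 - q1
outliers = level[(level < q1 - 1.5 * iqr) | (level > q3 + 1.5 * iqr)]
print('outliers:', outliers)
```

With only ten observations these checks have little power, so a histogram or Q-Q plot is a useful complement.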

Step 3: Identify the following pieces of information you’ll need to calculate the test statistic. 
<ul>
    <li>sample mean(x̄)</li>
    <li>population mean(μ)</li>
    <li>sample standard deviation(s) </li>
    <li>Number of observations(n)</li>
    <li>standard error (SE)</li>
    
</ul>


In [4]:
sample_mean = level.mean()
population_mean = 1
sample_std = level.std(ddof = 1)
num_observation = len(level)
standard_error = sample_std / np.sqrt(num_observation)


Step 4: Find the t score
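
For reference, the one sample t statistic can be written as follows, where μ0 denotes the hypothesized population mean (here 1.00). The numbers plugged in are rounded values computed from the eyes-level data above, so the result is approximate:

\begin{equation}
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1.463 - 1.00}{0.3407 / \sqrt{10}} \approx 4.30
\end{equation}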

In [5]:
# calculated t value: (sample mean - hypothesized mean) / standard error
tscore = (sample_mean - population_mean) / standard_error
round(tscore, 2)

4.3

Step 5: Find the critical value.
You can use either of the following:
<ol>
    <li>Percent Point Function (PPF):<br>
         <pre><code>Returns the value x for which the probability of an observation less than or equal to x equals the provided probability (the inverse of the CDF).</code></pre>
    </li>
    <li>
    T-table: You need two values to find this:<br>
        <pre><code>1. The alpha level: given as 1%, 5%, 10%, etc.</code></pre>
        <pre><code>2. The degrees of freedom, which is the number of items in the sample (n) minus 1.</code></pre>
    </li>
</ol>
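
How the alpha level maps to the probability passed to the PPF depends on whether the test is one-sided or two-sided. The helper below is a hypothetical convenience function (`critical_t` is not part of scipy or of the original notebook) that sketches the difference, assuming a two-sided test splits alpha equally between the two tails.

```python
from scipy.stats import t

def critical_t(alpha, df, two_sided=True):
    """Return the positive critical t value for a given alpha and degrees of freedom."""
    # a two-sided test splits alpha across both tails, so look up 1 - alpha / 2
    tail_probability = alpha / 2 if two_sided else alpha
    return t.ppf(1 - tail_probability, df)

# alpha = 0.05 with 9 degrees of freedom (n = 10 subjects)
two_sided_value = critical_t(0.05, 9)
one_sided_value = critical_t(0.05, 9, two_sided=False)
print(round(two_sided_value, 3), round(one_sided_value, 3))
```

Because the alternate hypotheses in this notebook are of the form "not equal to", the two-sided value is the one to compare against the absolute value of the t score.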

In [6]:
# two-tailed test: put alpha / 2 in each tail of the standard t distribution
alpha = 0.05
degrees_freedom = num_observation - 1
critical_value = t.ppf(1 - alpha / 2, degrees_freedom)
print('critical_value is', round(critical_value, 3))
# confirm with cdf
probability = t.cdf(critical_value, degrees_freedom)
# cdf is the inverse of ppf and will return the probability
print('probability is', round(probability, 3))

critical_value is 2.262
probability is 0.975


Step 6: You can also calculate a confidence interval, create visualizations to support your evidence and/or finally compare the values from step 4 and 5 to reject the null hypothesis or fail to reject the null hypothesis.
<br>
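
As one way to follow up on the confidence interval mentioned in Step 6, the sketch below computes a 95% interval for the mean eyes-level ratio. It assumes a two-sided interval built from the t distribution with n − 1 degrees of freedom; the names `ci_lower` and `ci_upper` are illustrative.

```python
import numpy as np
from scipy.stats import t

# eyes-level ratios from the experiment (same values as the `level` array above)
level = np.array([1.73, 1.06, 2.03, 1.4, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56])

n = len(level)
sample_mean = level.mean()
standard_error = level.std(ddof=1) / np.sqrt(n)

# two-sided 95% interval: 2.5% in each tail
t_critical = t.ppf(0.975, n - 1)
ci_lower = sample_mean - t_critical * standard_error
ci_upper = sample_mean + t_critical * standard_error
print(round(ci_lower, 2), round(ci_upper, 2))
```

If the interval excludes the hypothesized mean of 1.00, that agrees with rejecting the null hypothesis at alpha = 0.05.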


In [7]:
# confirm using scipy's one sample function
ttest_1samp(level, 1)

Ttest_1sampResult(statistic=4.297591739651884, pvalue=0.001997695334372524)

|tscore| > critical_value <br>
We should reject the null hypothesis and conclude that the true mean ratio under these conditions is not equal to 1.00. In fact, it is greater than 1.00, which is what we would expect on the basis of our experience. (It is always comforting to see science confirm what we have all known since childhood, but the results also mean that Kaufman and Rock’s experimental apparatus performs as it should.)


We confirmed the result using scipy's `ttest_1samp` function, which returns the t statistic and the p-value. The p-value can also be used to reject or fail to reject the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis. A p-value at or below the significance level (here alpha = 0.05) is statistically significant: if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time. Note that the p-value is not the probability that the null hypothesis is correct.
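
The two-sided p-value reported by scipy can be reproduced by hand as the area in both tails of the t distribution beyond the observed t score. This is a minimal sketch; `p_manual` and `p_scipy` are illustrative names.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# eyes-level ratios from the experiment (same values as the `level` array above)
level = np.array([1.73, 1.06, 2.03, 1.4, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56])

n = len(level)
tscore = (level.mean() - 1) / (level.std(ddof=1) / np.sqrt(n))

# t.sf is the survival function (1 - cdf): the area in the upper tail
# doubling it gives the two-sided p-value
p_manual = 2 * t.sf(abs(tscore), n - 1)
p_scipy = ttest_1samp(level, 1).pvalue
print(p_manual, p_scipy)
```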

For years, users of statistical techniques that analyze their data have been content to declare that they found a significant difference, and then considered their work done. Many have complained and argued for some kind of statement by the experimenter that gave an indication not only that the difference was significant, but whether it was meaningful. If we use enough subjects, we can almost always find even a meaningless difference to be significant. Enter effect size...
<pre><code>Effect size: The difference between two populations divided by the standard deviation of either population, sometimes presented in raw score units.</code></pre>
The effect size is a statistic that gives a meaningful indication of how large a mean is, or how different two means are.
<br>
We know that we have a significant difference from our data based on the comparison of the t-score and critical value, but when we report this difference, we want to be able to convince readers that they should care about the effect. If the moon looks just a tiny bit larger at the horizon, that may not be that big of a deal. Recall the nature of our dependent variable. Participants looked at the moon high in the sky and adjusted a “moon” off to one side to appear to be the same size as the real moon. Then they looked at the moon just above the horizon, and made a similar adjustment. If there were no moon illusion, the two settings would be about the same, and their ratio would be about 1.00. But in actual fact, the settings for the horizon moon were much larger than the settings for the zenith moon, and the average ratio of these two settings was 1.463. This means that, on average, the moon on the horizon appeared to be 1.463 times as large as (or 46.3% larger than) the moon at its zenith.
<br>
This is a huge difference, or at least it appears so. (Notice that I am not referring to the measurement of the setting the participant made, but to the ratio of the sizes under the two conditions.) This experiment illustrates a case wherein we can convey to the reader something meaningful about the size of our effect just by reporting the mean. We don’t have to get fancy. When you tell your readers that the moon at the horizon appears nearly half again as large as the moon at its zenith, you are telling them something more than simply that the horizon moon appears significantly larger. You are certainly telling them much more than saying that the average setting for the horizon moon was 5.23 centimeters. In this example we have a situation where the ratios that we collect are such that we can express important information simply by telling the reader what the mean ratio was. It now makes sense here to say that “people perceive a moon on the horizon to be nearly 1.5 times as large as the apparent size of the moon at its zenith.”
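
Although the mean ratio is already interpretable here, a standardized effect size in the spirit of the definition above can also be computed. The sketch below assumes the comparison value is the no-illusion ratio of 1.00 and uses the sample standard deviation as the scale; this quantity is often called Cohen's d, a label that does not appear in the original text.

```python
from statistics import mean, stdev

# eyes-level ratios from the experiment (same values as the `level` array above)
level = [1.73, 1.06, 2.03, 1.4, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56]

no_illusion_ratio = 1.00

# distance of the sample mean from 1.00, in units of the sample standard deviation
cohens_d = (mean(level) - no_illusion_ratio) / stdev(level)
print(round(cohens_d, 2))
```

A value above 1 means the mean ratio sits more than one standard deviation away from the no-illusion value.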

# Paired sample t-test

Now we can add a little more complexity to the problem. In Kaufman and Rock’s experiment, they tested whether or not the eye perspective was an influence. For each subject, the ratio of the perceived size of the horizon moon compared to the perceived size of the zenith moon was recorded with eyes elevated and with eyes level. Will eyes being elevated compared to eyes being level affect the perception of the moon size? In this case, we need to compare the means of two related groups. Because each subject was measured under both conditions, the observations form n matched pairs. The sample size is n (the number of pairs), since we analyze the difference within each pair rather than the two sets of values separately.

In [8]:
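# given data; map each subject number to its (elevated, level) pair
# keying by subject avoids losing rows if two ratios happen to be equal
data = dict(zip(subject, zip(elevated, level)))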

Step 1: <br>
Write your null hypothesis statement.<br>
H0: There is no change in perception of the size of the moon based on level eyes or elevated eyes. The mean difference(μd) will equal zero. <br>						H0: μd = 0 <br>
<br>
Write your alternate hypothesis. This is the one you’re testing. <br>
H1: There is a change in perception of the size of the moon based on level eyes or elevated eyes. The mean difference (μd) will not be equal to zero. <br>						H1: μd ≠ 0<br>

Define a significance level: <br>
alpha = 0.05 <br>



Step 2:<br>
In a paired sample t-test, the observations are defined as the differences between two sets of values, and each assumption refers to these differences, not the original data values. The paired sample t-test has four main assumptions:
<ul>
<li>The dependent variable must be continuous (interval/ratio).</li>
<li>The observations are independent of one another.</li>
<li>The dependent variable should be approximately normally distributed.</li>
<li>The dependent variable should not contain any outliers.</li>
</ul>

Step 3: Identify the following pieces of information you’ll need to calculate the test statistic. 
Let x = level ratio, y = elevated ratio;
<ul>
    <li>Calculate the difference (di = xi − yi) between the two observations on each pair, making sure you distinguish between positive and negative differences.</li>
    <li>mean of the differences (d̄)</li>
    <li>Number of pairs (n)</li>
    <li>hypothesized mean difference (μd)</li>
    <li>sample standard deviation of the differences (sd)</li>
    <li>standard error (SE)</li>
    
</ul>

NOTE:<br>
For this test to be valid the differences only need to be approximately normally distributed.
Therefore, it would not be advisable to use a paired t-test where there were any extreme
outliers.
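
Because the assumptions and the test itself concern only the differences, a paired t-test is equivalent to a one sample t-test on the differences against a hypothesized mean of zero. The sketch below illustrates that equivalence with scipy; `paired_result` and `one_sample_result` are illustrative names.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# ratios from the experiment (same values as the arrays defined above)
elevated = np.array([1.65, 1, 2.03, 1.25, 1.05, 1.02, 1.67, 1.86, 1.56, 1.73])
level = np.array([1.73, 1.06, 2.03, 1.4, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56])

# one difference per subject: level minus elevated
differences = level - elevated

paired_result = ttest_rel(level, elevated)
one_sample_result = ttest_1samp(differences, 0)
print(paired_result.statistic, one_sample_result.statistic)
print(paired_result.pvalue, one_sample_result.pvalue)
```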


In [9]:
# NumPy arrays subtract element-wise, giving one difference per subject
array_differences = level - elevated
num_observation = len(level)
difference_mean = array_differences.mean()
hypothesized_mean = 0
difference_std = array_differences.std(ddof = 1)
standard_error = difference_std / np.sqrt(num_observation)


Step 4: Find the t score
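
For reference, the paired t statistic can be written as follows, using the mean of the differences, the standard deviation of the differences, and the hypothesized mean difference of zero. The numbers plugged in are rounded values computed from the level minus elevated differences, so the result is approximate:

\begin{equation}
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{-0.019 - 0}{0.1371 / \sqrt{10}} \approx -0.44
\end{equation}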

In [10]:
# calculated t value: (mean difference - hypothesized mean difference) / standard error
tscore = (difference_mean - hypothesized_mean) / standard_error
round(tscore, 2)

-0.44

Step 5: Find the critical value.
You can use either of the following:
<ol>
    <li>Percent Point Function (PPF):<br>
         <pre><code>Returns the value x for which the probability of an observation less than or equal to x equals the provided probability (the inverse of the CDF).</code></pre>
    </li>
    <li>
    T-table: You need two values to find this:<br>
        <pre><code>1. The alpha level: given as 1%, 5%, 10%, etc.</code></pre>
        <pre><code>2. The degrees of freedom, which is the number of pairs in the sample (n) minus 1.</code></pre>
    </li>
</ol>

In [11]:
# two-tailed critical value: alpha / 2 = 0.025 in each tail, 9 degrees of freedom
critical_value = t.ppf(0.975, 9)
print('critical_value is', round(critical_value, 3))


critical_value is 2.262


Step 6: You can also calculate a confidence interval, create visualizations to support your evidence and/or finally compare the values from step 4 and 5 to reject the null hypothesis or fail to reject the null hypothesis.
<br>

|tscore| < critical_value <br>
We fail to reject the null hypothesis: the data give no evidence that the position of the eyes significantly affects the perception of the moon. 


In [12]:
ttest,pval = ttest_rel(level, elevated)
ttest, pval

(-0.43808582711518157, 0.6716502377784621)

A p-value higher than 0.05 (> 0.05) is not statistically significant. It does not provide evidence for the null hypothesis; it only means the data are consistent with it. We therefore fail to reject the null hypothesis. Note that you can never accept the null hypothesis: you can only reject it or fail to reject it.
Likewise, a statistically significant result cannot prove that a research hypothesis is correct (as this implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a small probability, e.g. less than 5%, of seeing such results by chance when the null hypothesis is actually true).

It would be useful to calculate a confidence interval for the mean difference to tell us within what limits the true difference is likely to lie. A 95% confidence interval for the true mean difference is the mean of the differences plus/minus the margin of error.<br>
The margin of error is the critical t value (t*, for 95% confidence and n − 1 degrees of freedom) times the standard error:
\begin{equation}
t^{*} \cdot \frac{s_d}{\sqrt{n}}
\end{equation}
<br>
Therefore to put it all together we will subtract to get the lower bound of the interval and add to get the upper bound of the interval:
<br><br>
\begin{equation}
\left(\bar{d} - t^{*} \cdot \frac{s_d}{\sqrt{n}}\right), \quad \left(\bar{d} + t^{*} \cdot \frac{s_d}{\sqrt{n}}\right)
\end{equation}

In [13]:
# confidence interval for the moon illusion eyes level/ elevated experiment
# use the two-tailed critical value (not the calculated t score) for the margin of error
t_critical = t.ppf(0.975, num_observation - 1)
margin_of_error = t_critical * standard_error
lower_bound = difference_mean - margin_of_error
upper_bound = difference_mean + margin_of_error
round(lower_bound, 2), round(upper_bound, 2)

(-0.12, 0.08)

We have a mean difference of -0.02 (level minus elevated). The 95% confidence interval runs from about -0.12 to 0.08, so we can be 95% confident that the true mean difference lies in that range. Because the interval contains zero, it agrees with the t-test: the difference is not statistically significant. You might say that the illusion was slightly stronger with the eyes elevated in this sample, but the appearance is not affected in a major or significant way. 

In the next notebook we will talk more about effect size and how to deliver the magnitude of the results to your audience.