# Module 4 - The Normal Distribution

In this module, you will learn how to work with something very important in probability and statistics: the normal distribution. The normal distribution has the shape of a bell curve, and it approximately describes the behaviour of many real-world phenomena, from average monthly rainfall to the heights of students in your class. Specifically, we look at how to calculate probabilities relating to the normal distribution, and how to find quantiles of the normal distribution.

## Calculating Standard Normal Probabilities

Suppose that we are working with a standard normal distribution, and we want to know the probability of getting a point less than 1.5. You can get this from the table in your textbook, but it will only be precise to 4 decimal places. For most purposes this is fine, but some applications call for more precision than 4 decimal places can give.

Now suppose that we want to know the probability of being less than 1.524. Most tables do not let us look up numbers with 3 decimal places. In this case, the best we can say is that the probability we want is somewhere between the probabilities for 1.52 and 1.53.

R addresses both of these problems with a function that calculates the probability of being below any point on a standard normal distribution. The function is called "pnorm()" (for Probability from a NORMal distribution). We will start by using only its first input; later in this module, we will discuss other inputs for "pnorm()" that can be useful. The first input to "pnorm()" is called "q", and it is the point that we want the probability of being below.

Let's calculate the probability of being below 1.5, 1.524, 1.52 and 1.53 on a standard normal distribution. We will also print some text so that we know which probability corresponds to which number.

```r
print("Probability of being below 1.5")
pnorm(q = 1.5)
print("Probability of being below 1.524")
pnorm(q = 1.524)
print("Probability of being below 1.52")
pnorm(q = 1.52)
print("Probability of being below 1.53")
pnorm(q = 1.53)
```

Notice that we would have been correct in guessing that the probability of being below 1.524 (about 0.9362) is between the probabilities for 1.52 (about 0.93574) and 1.53 (about 0.93699). The "pnorm()" function, however, gives us the value for 1.524 directly, rather than just a range. It also gives us many more than 4 decimal places for each probability.
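As a side note, "pnorm()" is vectorized: if we give it a vector of points built with "c()", it returns one probability per point. Here is a small sketch that reproduces the four probabilities above in a single call (the names "points" and "probs" are just our choice):

```r
# pnorm() accepts a whole vector of points and returns one probability per point
points = c(1.5, 1.524, 1.52, 1.53)
probs = pnorm(q = points)
print(probs)
```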

## Calculating Upper-Tail Probabilities

Sometimes we want to know the probability of being above a particular point on the standard normal distribution. One way to do this is to calculate the probability of being below that point and subtract it from 1. Let's use this method to find the probability of being above 0.793.

```r
print("Probability of being above 0.793")
1 - pnorm(q = 0.793)
```

We could also have assigned a name to the probability of being below 0.793, and subtracted this name from 1. Let's use the name "p.below" for the probability of being below 0.793.

```r
print("Probability of being above 0.793")
p.below = pnorm(q = 0.793)
1 - p.below
```

Notice that the "pnorm()" function did not print anything this time. That is because we saved its output under the name "p.below". When we run a calculation directly (for example, in a notebook cell), R prints the result of most calculations unless we save it under a name. If we want to see the value of "p.below", we have to use the "print()" function on it. Similarly, if we don't want to see the value of "1 - p.below", we have to save it under a name.

```r
print(p.below)
p.above = 1 - p.below
```

This is a useful bit of R knowledge that can help you diagnose why something is printing when it shouldn't be, or not printing when it should be.
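One more R habit that may help here: wrapping an assignment in parentheses saves the value under a name and prints it in the same step. A small sketch, reusing the names "p.below" and "p.above" from above:

```r
# Parentheses around an assignment make R save the value AND print it
(p.below = pnorm(q = 0.793))

# Without parentheses, the value is saved silently...
p.above = 1 - p.below
# ...so we use print() when we want to see it
print(p.above)
```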

Now back to computing probabilities. So far, we have calculated the probability of being above a certain point by subtracting the probability of being below that point from 1. The "pnorm()" function has an optional input called "lower.tail" that lets us skip this step. If "lower.tail" is set to "TRUE", then "pnorm()" calculates the probability of being below the point. If "lower.tail" is set to "FALSE", then "pnorm()" calculates the probability of being above the point (if it's not calculating lower-tail probabilities, it must be calculating upper-tail probabilities). This can save us some time when calculating right-tailed probabilities from the standard normal distribution.

Let's calculate the probability of being above 0.793 on a standard normal distribution once more, this time by using the "pnorm()" function with its "lower.tail" input set to "FALSE".

```r
pnorm(q = 0.793, lower.tail = FALSE)
```

Note that the default value of "lower.tail" is "TRUE", so if we do not include it in our "pnorm()" function, "lower.tail" will automatically be set to "TRUE" and we will get left-tailed probabilities.
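We can confirm in R that both approaches give the same upper-tail probability. There is also a practical reason to prefer "lower.tail = FALSE" far out in the tail: there, "pnorm()" returns a lower-tail probability so close to 1 that it is stored as exactly 1, so "1 - pnorm()" gives 0, while "lower.tail = FALSE" computes the small upper-tail area directly. A short sketch (the point 10 is just an illustrative extreme value, and the names "by.subtraction" and "by.lower.tail" are our own):

```r
# Two ways to get the probability of being above 0.793
by.subtraction = 1 - pnorm(q = 0.793)
by.lower.tail = pnorm(q = 0.793, lower.tail = FALSE)
all.equal(by.subtraction, by.lower.tail)  # TRUE

# Far in the upper tail, subtracting from 1 loses all precision
1 - pnorm(q = 10)                   # 0, because pnorm(10) is stored as exactly 1
pnorm(q = 10, lower.tail = FALSE)   # a tiny but nonzero probability
```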

## Calculating General Normal Probabilities

Suppose that we are now working with a normal distribution with mean 5 and standard deviation 2. Let's find the probability of being less than 6.284.

The standard technique is to standardize the value we are interested in. Remember that to standardize, we subtract the mean of our normal distribution and divide by the standard deviation. This gives us a point on the standard normal distribution, so we can use the "pnorm()" function to get the probability.

We will calculate this probability in R in 3 steps. First, we will enter the value 6.284 and give it a name, say "x" (lower case; remember that R is case sensitive). Second, we will standardize "x" by subtracting the mean, 5, and dividing by the standard deviation, 2, and give the standardized value a name, say "z". Third, we will use "z" as the input for "pnorm()" to calculate the probability we are looking for.

```r
x = 6.284
z = (x - 5)/2 # Be careful with order of operations here
pnorm(q = z)
```

Remember that anything following the "#" symbol is called a comment and is ignored by R.

We don't need to give the probability a name because we want to see the value it produces. If we did want to name this probability so that we could use it later, we would give it a name, say "prob.below", and then use the "print()" function to see it.

```r
x = 6.284
z = (x - 5)/2
prob.below = pnorm(q = z)
print(prob.below)
```

Note that the "print()" function does not show as many digits as we saw before: by default, it shows 7 significant digits. We can change this, but 7 significant digits is more than enough for most purposes.
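If we do want more digits, "print()" has a "digits" input that controls how many significant digits it shows. A small sketch, recomputing "prob.below" so the cell runs on its own:

```r
x = 6.284
z = (x - 5)/2
prob.below = pnorm(q = z)
print(prob.below)               # default: 7 significant digits
print(prob.below, digits = 15)  # ask print() for more significant digits
```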

We did not include the "lower.tail" input with the "pnorm()" function. This is the same as setting "lower.tail" equal to its default value, "TRUE", which is exactly what we want here.

The "pnorm()" function has 2 more optional inputs that let us set the mean and standard deviation of the normal distribution we are working with. This means that we can skip the standardizing step. These optional inputs are called "mean" and "sd" respectively.

Let's calculate the probability of being below 6.284 on a normal distribution with mean 5 and standard deviation 2. Recall that "x" is the name of the value we are interested in, 6.284.

```r
pnorm(q = x, mean = 5, sd = 2)
```
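As a quick check, the one-step call with "mean" and "sd" should agree with the three-step standardizing approach from earlier. A minimal sketch (the names "three.step" and "one.step" are just illustrative):

```r
x = 6.284
three.step = pnorm(q = (x - 5)/2)          # standardize, then use the standard normal
one.step = pnorm(q = x, mean = 5, sd = 2)  # let pnorm() handle the mean and sd
all.equal(three.step, one.step)            # TRUE
```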

Let's calculate the probability of being above -3 on a normal distribution with mean -1 and standard deviation 5. To do this, we need to put together all the optional inputs that we have learned for the "pnorm()" function. 

```r
pnorm(q = -3, mean = -1, sd = 5, lower.tail = FALSE)
```

We could have calculated this without any optional inputs by standardizing and subtracting our answer from 1.

```r
y = -3
w = (y - (-1))/5
new.prob.below = pnorm(w)
1 - new.prob.below
```
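The two routes give the same answer. Here the standardized value is w = (-3 - (-1))/5 = -0.4, so both are finding the probability of being above -0.4 on the standard normal, which is about 0.6554. A short sketch comparing them side by side (the names "with.inputs" and "by.hand" are our own):

```r
with.inputs = pnorm(q = -3, mean = -1, sd = 5, lower.tail = FALSE)
by.hand = 1 - pnorm(q = (-3 - (-1))/5)   # standardize, then subtract from 1
print(c(with.inputs, by.hand))
all.equal(with.inputs, by.hand)          # TRUE
```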

## Finding Standard Normal Quantiles

Suppose that we want to find the point on a standard normal distribution that has 0.7 probability below it. We could do this using the table in your book, but 0.7 does not show up exactly. This means we could only say that the point we are looking for is between 0.52 and 0.53 (you can check this in your book).
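We can also see this range in R, without the book, by computing the lower-tail probabilities for 0.52 and 0.53 directly. The value 0.7 falls between them:

```r
# Probabilities of being below 0.52 and 0.53; 0.7 lies between these two
table.probs = pnorm(q = c(0.52, 0.53))
print(table.probs)
```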

R lets us get a precise answer to this problem using the "qnorm()" function (for Quantile of a NORMal distribution). We use "qnorm()" in much the same way as "pnorm()", but "qnorm()" gives us points corresponding to probabilities, while "pnorm()" gives us probabilities corresponding to points. The main input for "qnorm()" is called "p", and it is the probability below the point that we want to find.

We can now find the point on a standard normal distribution that has 0.7 probability below it.

```r
qnorm(p = 0.7)
```

Let's double-check that this point actually has 0.7 probability below it by using the "pnorm()" function.

```r
pnorm(0.524400512708041)
```

Alternatively, we could save the value given by "qnorm()" under a name, say "point", and then use the "pnorm()" function on "point". This means we don't have to copy and paste the answer from "qnorm()" into "pnorm()".

```r
point = qnorm(p = 0.7)
pnorm(point)
```
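More generally, "pnorm()" and "qnorm()" undo each other, and both accept vectors. A short sketch over several probabilities at once (the particular probabilities in "probs" are arbitrary):

```r
probs = c(0.025, 0.5, 0.7, 0.975)
points = qnorm(p = probs)              # points with these probabilities below them
print(points)
all.equal(pnorm(q = points), probs)    # TRUE: back to where we started
```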

## Finding General Normal Quantiles

As you might have guessed, the "qnorm()" function has several optional inputs. They are actually the same ones we used for the "pnorm()" function. 

We can use the "lower.tail" input to tell R whether we are specifying the lower-tail or upper-tail probability. If we want the point that has a certain probability below it, we want "lower.tail" to be equal to "TRUE". Alternatively, if we want the point that has a certain probability above it, we want "lower.tail" to be equal to "FALSE". In "qnorm()", the default value of "lower.tail" is "TRUE", just like in "pnorm()". This means that if we don't specify the value of "lower.tail", it is automatically set to "TRUE".

Let's use the "lower.tail" option in "qnorm()" to find the point on the standard normal distribution that has 0.2 probability above it. That means we have to set "lower.tail" equal to "FALSE".

```r
qnorm(p = 0.2, lower.tail = FALSE)
```
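Since the probability above a point is 1 minus the probability below it, the point with 0.2 probability above it is the same as the point with 0.8 probability below it. A small sketch that checks this (the names "upper.point" and "lower.point" are just illustrative):

```r
upper.point = qnorm(p = 0.2, lower.tail = FALSE)  # 0.2 probability above
lower.point = qnorm(p = 1 - 0.2)                  # 0.8 probability below
print(c(upper.point, lower.point))
all.equal(upper.point, lower.point)               # TRUE
```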

There are two more optional inputs for the "qnorm()" function, "mean" and "sd", which let us set the mean and standard deviation, respectively, of the normal distribution we are working with.

Let's use the "mean" and "sd" inputs with "qnorm()" to find the point that has probability 0.45 below it on a normal distribution with mean 1 and standard deviation 10.

```r
qnorm(p = 0.45, mean = 1, sd = 10)
```
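This is the reverse of standardizing: we could instead find the standard normal point z with 0.45 probability below it, then convert it back using x = mean + sd * z. A sketch of that by-hand route, compared with the one-step call (the names "by.hand" and "with.inputs" are our own):

```r
z = qnorm(p = 0.45)        # standard normal point with 0.45 probability below it
by.hand = 1 + 10 * z       # undo standardizing: mean + sd * z
with.inputs = qnorm(p = 0.45, mean = 1, sd = 10)
print(c(by.hand, with.inputs))
all.equal(by.hand, with.inputs)   # TRUE
```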

Let's put together all the optional inputs for "qnorm()" to find the point that has probability 0.05 above it on a normal distribution with mean -3 and standard deviation 0.2.

```r
qnorm(p = 0.05, mean = -3, sd = 0.2, lower.tail = FALSE)
```
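As with standard normal quantiles, we can double-check this answer with "pnorm()", using the same "mean", "sd" and "lower.tail" inputs. The result should be 0.05 (up to tiny rounding error):

```r
point = qnorm(p = 0.05, mean = -3, sd = 0.2, lower.tail = FALSE)
print(point)
# Probability of being above "point" on a normal with mean -3 and sd 0.2
pnorm(q = point, mean = -3, sd = 0.2, lower.tail = FALSE)
```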