# 1. Probability

## 1.1 Introduction

Probability is a measure of how likely an event is to occur. Many events cannot be predicted with total certainty; we can only predict the chance that they will occur, that is, how likely they are to happen. Probability ranges from 0 to 1, where 0 means the event is impossible and 1 means the event is certain.

The probabilities of all the outcomes in a sample space add up to 1.

For example, when we toss a coin we get either a head or a tail, so there are only two possible outcomes (H, T). When two coins are tossed, there are four possible outcomes: {(H, H), (H, T), (T, H), (T, T)}.
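The sample spaces above can be listed mechanically. The sketch below is only an illustration; the helper name `sample_space` is made up for this example.

```python
from itertools import product

def sample_space(n_coins):
    """Return every ordered outcome of tossing n_coins coins."""
    return list(product("HT", repeat=n_coins))

one_coin = sample_space(1)   # two outcomes: H or T
two_coins = sample_space(2)  # four outcomes: HH, HT, TH, TT

print(len(one_coin), one_coin)
print(len(two_coins), two_coins)
```

Each extra coin doubles the number of outcomes, so `n` coins give `2 ** n` outcomes.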

## 1.2 Formula

When all outcomes are equally likely, the probability of an event is the ratio of the number of favourable outcomes to the total number of outcomes.

**Probability of an event, P(E) = Number of favourable outcomes / Total number of outcomes**

This is the basic formula; other formulas apply to different situations or events. Students sometimes confuse a "favourable outcome" with a "desirable outcome": a favourable outcome is simply an outcome that belongs to the event in question, whether or not it is wanted.

## 1.3 Terms

![image.png](attachment:image.png)

## 1.4 Examples

**Example 1:** There are 6 pillows in a bed, 3 are red, 2 are yellow and 1 is blue. What is the probability of picking a yellow pillow?

Ans: The probability is equal to the number of yellow pillows in the bed divided by the total number of pillows, i.e. 2/6 = 1/3.
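Here is a minimal sketch of the same calculation in Python. The function name `classical_probability` is an assumption made for this example, and it only applies when every outcome is equally likely.

```python
from fractions import Fraction

def classical_probability(favourable, total):
    """P(E) = favourable outcomes / total outcomes (equally likely outcomes)."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need 0 <= favourable <= total and total > 0")
    return Fraction(favourable, total)

pillows = {"red": 3, "yellow": 2, "blue": 1}
p_yellow = classical_probability(pillows["yellow"], sum(pillows.values()))
print(p_yellow)  # 1/3
```

Using `Fraction` keeps the answer exact, so 2/6 is reduced to 1/3 automatically.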

**Example 2:** There is a container full of coloured bottles: red, blue, green and orange. A bottle is picked out and then replaced. Sumit did this 1000 times and got the following results:

- No. of blue bottles picked out: 300
- No. of red bottles: 200
- No. of green bottles: 450
- No. of orange bottles: 50

a) What is the probability that Sumit will pick a green bottle?

Ans: For every 1000 bottles picked out, 450 are green.

Therefore, P(green) = 450/1000 = 0.45

b) If there are 100 bottles in the container, how many of them are likely to be green?

Ans: The experiment suggests that about 450 out of every 1000 bottles are green.

Therefore, out of 100 bottles, about 45 are likely to be green.
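Sumit's experiment estimates probabilities from observed counts. A small sketch of that calculation, using the counts from the example (the variable names are made up for illustration):

```python
from fractions import Fraction

# Observed counts from the 1000 draws in the example
counts = {"blue": 300, "red": 200, "green": 450, "orange": 50}
total_draws = sum(counts.values())

# Relative frequency of each colour is the estimated probability
estimated = {colour: Fraction(n, total_draws) for colour, n in counts.items()}

p_green = estimated["green"]
expected_green_in_100 = p_green * 100

print(float(p_green))          # 0.45
print(expected_green_in_100)   # 45
```

The estimated probabilities add up to 1, as they should for a complete set of outcomes.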

**Note:**

- **Mutually Exclusive Event:** In logic and probability theory, two events are mutually exclusive or disjoint if they cannot both occur at the same time. A clear example is the set of outcomes of a single coin toss, which can result in either heads or tails, but not both.

- **Non-Mutually Exclusive Events:** Non-mutually exclusive events are events that can happen at the same time. Examples include driving and listening to the radio, even numbers and prime numbers on a die, losing a game and scoring, running and sweating, etc. Non-mutually exclusive events can make calculating probability more complex.

## 1.5 Addition Rule

If A and B are two events in a probability experiment, then the probability that either one of the events will occur is:

**P(A or B) = P(A) + P(B) − P(A and B)**

This can be represented in a Venn diagram as:

**P(A ∪ B) = P(A) + P(B) − P(A ∩ B)**

![image-5.png](attachment:image-5.png)

If A and B are two mutually exclusive events, **P(A ∩ B) = 0**. Then the probability that either one of the events will occur is: **P(A or B) = P(A) + P(B)**

This can be represented in a Venn diagram as:

**P(A ∪ B) = P(A) + P(B)**

![image-4.png](attachment:image-4.png)

**Example 1:**

![image-3.png](attachment:image-3.png)

**Example 2:**

What's the probability of drawing either a queen or a heart from a deck of cards?

In this case, you can get a card that's both a queen and a heart (the queen of hearts!), so we need to subtract that probability to avoid double counting. There are 4 queens, 13 hearts, and one queen of hearts:

![image-6.png](attachment:image-6.png)

Mathematically:

P(Q or ❤️) = P(Q) + P(❤️) - P(Q and ❤️)

P(Q or ❤️) = 4/52 + 13/52 - 1/52

P(Q or ❤️) = 16/52 = 30.77%
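We can check the addition rule by listing all 52 cards and counting directly. This is only a sketch; the helper names `prob`, `is_queen`, and `is_heart` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of an event, given as a true/false test on a card."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

def is_queen(card):
    return card[0] == "Q"

def is_heart(card):
    return card[1] == "hearts"

# Addition rule: P(Q) + P(heart) - P(Q and heart)
p_rule = prob(is_queen) + prob(is_heart) - prob(lambda c: is_queen(c) and is_heart(c))
# Direct count of cards that are a queen or a heart
p_direct = prob(lambda c: is_queen(c) or is_heart(c))

print(p_rule, p_direct, float(p_rule))
```

Both routes give 16/52, which `Fraction` reduces to 4/13.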

## 1.6 Multiplication Rule

The multiplication rule is a way to find the probability of two events happening at the same time. 

There are two multiplication rules:

**1. Using General Rule:**

The general multiplication rule formula is: P(A ∩ B) = P(A) P(B|A). P(B|A) means "the probability of B happening given that A has occurred".

This rule can be used for any event (they can be independent or dependent events). You still have to multiply two numbers, but first you have to use a little logic to figure out the second probability before multiplying.

**Example:**

A bag contains 6 black marbles and 4 blue marbles. Two marbles are drawn from the bag, without replacement. What is the probability that both marbles are blue?

![image.png](attachment:image.png)

Step 1: Label your events A and B. Let A be the event that marble 1 is blue and let B be the event that marble 2 is blue.

Step 2: Figure out the probability of A. There are ten marbles in the bag, so the probability of drawing a blue marble is 4/10.

Step 3: Figure out the probability of B given A. After one blue marble has been drawn, there are nine marbles left in the bag and three of them are blue, so the probability of choosing a second blue marble, P(B|A), is 3/9.

Step 4: Multiply Step 2 and 3 together: (4/10)*(3/9) = 2/15.
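The four steps above can be sketched in Python, along with a brute-force check that lists every ordered pair of marbles. The variable names are made up for this illustration.

```python
from fractions import Fraction
from itertools import permutations

bag = ["black"] * 6 + ["blue"] * 4

# General multiplication rule: P(A and B) = P(A) * P(B|A)
p_a = Fraction(4, 10)          # first marble is blue
p_b_given_a = Fraction(3, 9)   # second is blue, given the first was blue
p_rule = p_a * p_b_given_a

# Check: every ordered draw of two different marbles (no replacement)
draws = list(permutations(range(len(bag)), 2))
both_blue = sum(1 for i, j in draws if bag[i] == "blue" and bag[j] == "blue")
p_enumerated = Fraction(both_blue, len(draws))

print(p_rule, p_enumerated)  # 2/15 2/15
```

There are 90 ordered draws and 12 of them are blue-blue, which matches the rule.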

**2. Using Specific Rule:**

The specific multiplication rule, P(A and B) = P(A) * P(B), is only valid if the two events are independent. In other words, it only works if one event does not change the probability of the other event.

Examples of independent events:

- Owning a cat and getting a weekly paycheck.
- Finding a parking space and having a coin for the meter.
- Buying a book and then buying a coffee.

**Example:**

Using the specific multiplication rule formula is very straightforward. Just multiply the probability of the first event by the second. For example, if the probability of event A is 2/9 and the probability of event B is 3/9 then the probability of both events happening at the same time is (2/9)*(3/9) = 6/81 = 2/27.
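A short sketch of the specific rule follows. The first part repeats the 2/9 and 3/9 example; the second part uses a coin and a die (an illustration not taken from the text above) to show what independence looks like when we count outcomes.

```python
from fractions import Fraction
from itertools import product

# The example from the text: P(A) = 2/9, P(B) = 3/9, A and B independent
p_both = Fraction(2, 9) * Fraction(3, 9)
print(p_both)  # 2/27

# Illustration: toss a coin and roll a die (12 equally likely outcomes)
outcomes = list(product("HT", range(1, 7)))
p_heads = Fraction(sum(1 for c, d in outcomes if c == "H"), len(outcomes))
p_six = Fraction(sum(1 for c, d in outcomes if d == 6), len(outcomes))
p_heads_and_six = Fraction(
    sum(1 for c, d in outcomes if c == "H" and d == 6), len(outcomes)
)

# For independent events the joint probability equals the product
print(p_heads_and_six == p_heads * p_six)  # True
```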

# 2. Permutations

Imagine you're visiting a zoo with six animals, and I ask you to record the first three animals you see. The six animals are:

Tiger, Lion, Monkey, Zebra, Walrus, Snake

How many different ways could you run into three animals?

![image.png](attachment:image.png)

There are 6 animals you could record in the first spot. After that, there are only 5 animals left to see for the second spot. And after that, there are only 4 animals left to see for the final spot. When we multiply those three numbers, we find that there are 120 different ways you could run into three different animals. This is a permutation.

When I say that there are 120 different ways you could run into three different animals, I'm saying that order matters. For example, there are six different ways to run into the same three animals:

Tiger, Lion, Monkey

Tiger, Monkey, Lion

Lion, Monkey, Tiger

Lion, Tiger, Monkey

Monkey, Lion, Tiger

Monkey, Tiger, Lion

The permutations formula for this data would look something like this:

![image-2.png](attachment:image-2.png)

But wait, what do the exclamation points(!) mean? Those mark factorials. Here are two examples of factorials:

6! = 6 * 5 * 4 * 3 * 2 * 1 = 720

4! = 4 * 3 * 2 * 1 = 24

In a factorial, you take the initial number and multiply it by every positive integer below it, down to one. So, our final answer for the first problem would look like this:

![image-3.png](attachment:image-3.png)

As you can see, using the equation we still get an answer of 120.
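The zoo count can be checked three ways in Python: with factorials, with the standard library's `math.perm` (Python 3.8 or later), and by listing every ordering. This is just a sketch of the calculation described above.

```python
from itertools import permutations
from math import factorial, perm

animals = ["Tiger", "Lion", "Monkey", "Zebra", "Walrus", "Snake"]
n, r = len(animals), 3

by_formula = factorial(n) // factorial(n - r)       # 6! / (6 - 3)!
by_builtin = perm(n, r)                             # same count, built in
by_listing = len(list(permutations(animals, r)))    # count every ordering

print(by_formula, by_builtin, by_listing)  # 120 120 120
```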

# 3. Combinations

In a combination, order does not matter. All six of those answers found above are really the same thing. So, there are 120 / 6 = 20 different combinations of animals that we could run into.

The combinations formula for our data would look something like this:

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

As you can see, using the equation we still get an answer of 20.
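The same check works for combinations, this time with `math.comb` (Python 3.8 or later) and `itertools.combinations`. Again, this is a sketch rather than the only way to do it.

```python
from itertools import combinations
from math import comb, factorial

animals = ["Tiger", "Lion", "Monkey", "Zebra", "Walrus", "Snake"]
n, r = len(animals), 3

by_formula = factorial(n) // (factorial(r) * factorial(n - r))  # 6! / (3! 3!)
by_builtin = comb(n, r)
by_listing = len(list(combinations(animals, r)))  # order ignored

print(by_formula, by_builtin, by_listing)  # 20 20 20
```

Dividing the 120 permutations by the 3! = 6 orderings of each group gives the same 20.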

# 4. Discrete and Continuous Random Variables

**Random Variable:**

A random variable is a variable which has its value determined by a probability experiment.

If you flip a coin once, how many tails could you come up with? Let's create a new random variable called "T". "T" represents the number of tails possible from our probability experiment. After flipping a coin once (a probability experiment), T's value will be either 1 or 0. T is a random variable.

**Discrete Random Variable:**

A discrete random variable is a random variable whose possible values can be counted: either a finite number of values, or a countable list such as 0, 1, 2, and so on.

Let's say you flip a coin six times. How many tails could you come up with?

![image.png](attachment:image.png)

There are a finite number of possible values. Values such as "1.5" or "2.5923" don't make sense for this type of problem.

**Continuous Random Variable:**

A continuous random variable is a random variable which can take any value within an interval, so it has an uncountably infinite number of possible values.

Let's say you measure the speed (in miles per hour) of the first car to drive by your house. What kind of values could you obtain?

![image-2.png](attachment:image-2.png)

Maybe the car is going 25mph, or 50mph, or 62.00252mph. The variable (speed) can take on an infinite number of values.

# 5. Probability Distribution

A probability distribution displays the probabilities associated with all possible outcomes of an event.

Here's a probability distribution for one roll of a six-sided die:

![image.png](attachment:image.png)

As you can see, every outcome has an equal chance of occurring.
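A probability distribution like this one can be stored as a mapping from each outcome to its probability. The sketch below assumes a fair die; the helper `is_valid_distribution` is a made-up name for this example.

```python
from fractions import Fraction

# One roll of a fair six-sided die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_distribution(dist):
    """Every probability lies in [0, 1] and they add up to exactly 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

for face, p in die.items():
    print(face, p)

print(is_valid_distribution(die))  # True
```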

# 6. Probability Histogram

A probability histogram is a histogram with possible values on the x axis, and probabilities on the y axis.

Here's a made-up probability distribution (left) with its probability histogram (right):

![image.png](attachment:image.png)

# 7. Mean and Expected Value of Discrete Random Variables

Let's calculate the mean of a discrete random variable:

Below is the probability distribution for a golfer on a par 3 hole, where x = number of strokes to complete the hole:

![image.png](attachment:image.png)

The mean can be calculated by multiplying each "x" by its "P(x)", then adding the resulting values together:

![image-2.png](attachment:image-2.png)

Here, the mean is 2.65.

The mean we just calculated of 2.65 is an expected value. If we were to take a large enough sample of this golfer's performance on par 3 holes, we expect his mean to approach 2.65.

This is a short example of the Law of Large Numbers.
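The weighted sum described in this section is easy to sketch in code. The golfer's table is only shown in the image, so the example below uses a fair six-sided die instead; that substitution is an assumption for illustration, as is the function name `expected_value`.

```python
from fractions import Fraction

def expected_value(dist):
    """Multiply each x by its P(x), then add the results together."""
    return sum(x * p for x, p in dist.items())

# Stand-in distribution: one roll of a fair die (not the golfer data)
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expected_value(die)
print(mean, float(mean))  # 7/2 3.5
```

Note that 3.5 is not a value the die can actually show; the expected value is a long-run average, not a possible outcome.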

# 8. Variance and Standard Deviation of Discrete Random Variables

To calculate the variance of a discrete random variable, we must first calculate the mean. Here is the mean we calculated from the example in the previous lecture:

![image.png](attachment:image.png)

Now, we can move on to the variance formula:

![image-2.png](attachment:image-2.png)

To find the first part of the equation, we first square every "x". Then, we multiply each squared "x" by "P(x)". Last, we add together all resulting values.

![image-3.png](attachment:image-3.png)

We find the first part of the equation to be 7.75. Now, we can plug in the rest of the values to get our answers:

![image-4.png](attachment:image-4.png)

The variance is 0.73, while the standard deviation is 0.85.
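Here is a sketch of the same variance formula in Python. The general function is tried on a fair die (a stand-in, since the golfer's table is only in the image), and the last lines plug in the two numbers quoted above, 7.75 and 2.65. The function name `variance` is made up for this example.

```python
import math
from fractions import Fraction

def variance(dist):
    """Sum of x^2 * P(x), minus the squared mean."""
    mean = sum(x * p for x, p in dist.items())
    mean_of_squares = sum(x ** 2 * p for x, p in dist.items())
    return mean_of_squares - mean ** 2

# Stand-in distribution: one roll of a fair die
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_variance = variance(die)
print(die_variance)  # 35/12

# The golfer example, using the values quoted in the text
golfer_variance = 7.75 - 2.65 ** 2
golfer_sd = math.sqrt(golfer_variance)
print(round(golfer_variance, 2), round(golfer_sd, 2))  # 0.73 0.85
```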

# 9. The Law of Large Numbers

According to the law of large numbers, as a probability experiment is repeated more and more times, the observed value (usually a mean) will approach the expected value.

Imagine a probability experiment where a coin is flipped, and the number of heads is measured:

![image.png](attachment:image.png)

As more probability experiments are performed, the actual value will approach the expected value of 0.50. Below are 10 simulated coin flips. As you can see from the line graph on the right, the actual value is approaching the expected value.

![image-2.png](attachment:image-2.png)
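A quick simulation shows the same behaviour with many more flips. This is a sketch that assumes a fair coin; the function name and the fixed seed (used only so the run is repeatable) are choices made for this example.

```python
import random

def running_proportion_of_heads(n_flips, seed=0):
    """Flip a fair coin n_flips times; return the proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for flip_number in range(1, n_flips + 1):
        if rng.random() < 0.5:   # treat this as "heads"
            heads += 1
        proportions.append(heads / flip_number)
    return proportions

proportions = running_proportion_of_heads(100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, proportions[n - 1])
```

With only 10 flips the proportion can be far from 0.50, but after 100,000 flips it should sit very close to it.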

# 10. Binomial Distribution

An experiment is a binomial experiment if:

1. It is repeated a fixed number of times.

2. The trials are independent.

3. Trials have two mutually exclusive outcomes, either success or failure.

4. The probability of success is the same for all trials.

Try this example: In a recent survey, it was found that 85% of households in the United States have High-Speed Internet. If you take a sample of 18 households, what is the probability that exactly 15 will have High-Speed Internet?

First, let's check if it meets the four conditions:

**1. Is this experiment being repeated a fixed number of times?**

YES

**2. Are the trials independent?**

YES, discovering that one home has High-Speed Internet will not affect the probability of other homes having High-Speed Internet.

**3. Are there two mutually exclusive outcomes?**

YES, a home either has High-Speed Internet (success) or doesn't have High-Speed Internet (failure). These events cannot occur together so they are mutually exclusive.

**4. Is the probability of success the same for all trials?**

YES, the probability of success for each trial is 85%.

Now, let's work through the equation. To use this equation, you should already have a pretty good idea about what a combination is.

![image.png](attachment:image.png)

By following the above steps, you should find that the probability of 15 households having High-Speed Internet is .239.
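The binomial formula, C(n, k) · p^k · (1 − p)^(n − k), can be sketched in a few lines. The function name `binomial_pmf` is made up for this example. Evaluated without rounding the intermediate steps, it gives roughly 0.2406, a little above the .239 quoted above; the small gap is presumably down to rounding in the worked steps.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_exactly_15 = binomial_pmf(15, 18, 0.85)
print(round(p_exactly_15, 4))
```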

What if I changed the example around a little? In a recent survey, it was found that 85% of households in the United States have High-Speed Internet. If you take a sample of 18 households, what is the probability that at least 15 will have High-Speed Internet?

It says "at least" 15, so that means we have to calculate the probabilities for 15, 16, 17, and 18 homes, then add everything together:

![image-2.png](attachment:image-2.png)

By following these steps, you should find that the probability of at least 15 households having High-Speed Internet is .718.
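The "at least 15" calculation is a sum of four binomial terms. A sketch follows, with a made-up function name `binomial_pmf`. Without intermediate rounding the sum comes out at roughly 0.720, marginally above the .718 quoted above, which again looks like a rounding difference.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 18, 0.85

# P(X >= 15) = P(15) + P(16) + P(17) + P(18)
p_at_least_15 = sum(binomial_pmf(k, n, p) for k in range(15, n + 1))

# Same answer via the complement: 1 - P(X <= 14)
p_via_complement = 1 - sum(binomial_pmf(k, n, p) for k in range(0, 15))

print(round(p_at_least_15, 4), round(p_via_complement, 4))
```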

# 11. Mean and Standard Deviation of Binomial Random Variables

Let's use the data from the last example: In a recent survey, it was found that 85% of households in the United States have High-Speed Internet. If you take a sample of 18 households, what is the probability that exactly 15 will have High-Speed Internet?

Here are the equations for the mean and standard deviation of a binomial random variable:

![image.png](attachment:image.png)

We can now easily plug in the number of trials and the probability of success to come up with our answers:

![image-2.png](attachment:image-2.png)

The mean is 15.3, and the standard deviation is 1.515.
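For a binomial random variable the mean is n·p and the standard deviation is the square root of n·p·(1 − p). A minimal sketch with the numbers from this example:

```python
import math

n = 18      # number of trials (households sampled)
p = 0.85    # probability of success on each trial

mean = n * p
std_dev = math.sqrt(n * p * (1 - p))

print(round(mean, 1), round(std_dev, 3))  # 15.3 1.515
```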

# 12. Poisson Distribution

The Poisson probability distribution is used when computing the probability of a certain number of successes within a specified interval.

An experiment follows the Poisson process if:

1. The probability of two successes in a small enough interval is 0%.

2. The probability of a success is the same for any two intervals which share the same length.

3. Successes are independent of successes in other intervals.

Here's an example: At a theme park, there is a roller coaster that sends an average of three cars through its circuit every minute between 6pm and 7pm. A random variable, X, represents the number of roller coaster cars to pass through the circuit between 6pm and 6:10pm.

First, let's check if this contains a Poisson random variable:

**Is the probability of two successes in a small enough interval 0%?**

YES, in a small enough interval (say, 1 second) it would be impossible for two successes (cars through the circuit) to occur.

**Are the probabilities of success equal for any two intervals of equal length?**

YES, between any two equal intervals (say, 6pm-6:15pm, and 6:30pm-6:45pm), the average rate remains 3 cars per minute.

**Are successes independent of successes in other intervals?**

YES, a success in one interval is independent of a success in any other interval.

What is the probability that 35 cars will pass through the circuit between 6pm and 6:10pm?

![image.png](attachment:image.png)

The probability that 35 cars will pass through the circuit between 6pm and 6:10pm is 0.045.
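The Poisson formula is P(X = k) = e^(−λ) · λ^k / k!, where λ is the average number of successes in the interval. Here λ = 3 cars per minute × 10 minutes = 30. A sketch of the calculation (the function name `poisson_pmf` is made up for this example):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k successes when the interval average is lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 3 * 10   # 3 cars per minute over a 10-minute interval
p_35_cars = poisson_pmf(35, lam)

print(round(p_35_cars, 3))  # 0.045
```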

# 13. Mean and Standard Deviation of Poisson Random Variables

Here's my previous example: At a theme park, there is a roller coaster that sends an average of three cars through its circuit every minute between 6pm and 7pm. A random variable, X, represents the number of roller coaster cars to pass through the circuit between 6pm and 6:10pm. What is the probability that 35 cars will pass through the circuit between 6pm and 6:10pm?

We can use this information to calculate the mean and standard deviation of the Poisson random variable, as shown below:

![image.png](attachment:image.png)

The mean of this variable is 30, while the standard deviation is 5.477.
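For a Poisson random variable the mean equals λ and the standard deviation is the square root of λ. A short sketch with the roller coaster numbers:

```python
import math

rate_per_minute = 3
minutes = 10

lam = rate_per_minute * minutes   # average count in the interval
mean = lam
std_dev = math.sqrt(lam)

print(mean, round(std_dev, 3))  # 30 5.477
```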

# 14. Coordinate (Cartesian) Planes

Below is a Coordinate Plane:

![image.png](attachment:image.png)

This two-dimensional coordinate plane displays points on two dimensions: x and y. Any point on this plane will have an x value and a y value. The point in the very center of the plane is (0, 0).

![image-2.png](attachment:image-2.png)

A red point has been plotted on the plane above. Where is the point located? Points are described in the format (x, y). So, this point is located at (5, 4).

# 15. Quadrants 

![image.png](attachment:image.png)

Above is a coordinate plane with the point (5, 4) plotted on it. What quadrant is this point in?

![image-2.png](attachment:image-2.png)

Coordinate planes have four quadrants, as shown above. From this, we can conclude that the point is in Quadrant I.
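The quadrant of a point depends only on the signs of x and y. Here is a small sketch; the function name `quadrant` is made up, and returning `None` for points that lie on an axis is a choice made for this example.

```python
def quadrant(x, y):
    """Return "I", "II", "III" or "IV", or None for a point on an axis."""
    if x == 0 or y == 0:
        return None          # on an axis, so in no quadrant
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"

print(quadrant(5, 4))    # I
print(quadrant(-5, 4))   # II
print(quadrant(-5, -4))  # III
print(quadrant(5, -4))   # IV
```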

# 16. Scatter Plots

Scatter plots are a method of graphically displaying bivariate data.

Example bivariate data is displayed below. Our two variables are age and yearly income.

![image.png](attachment:image.png)

What is the relationship between age and yearly income?

The relationship can be plotted on a scatter plot. Below, age is on the x-axis, while yearly income is on the y-axis:

![image-2.png](attachment:image-2.png)

Just by looking at the graph, it would appear that as age increases, yearly income increases. It is a fairly good bet that our variables are related.

# 17. Covariance

In mathematics and statistics, covariance is a measure of the relationship between two random variables. The metric evaluates how much – to what extent – the variables change together. In other words, it is essentially a measure of the variance between two variables. However, the metric does not assess the dependency between variables.

Unlike the correlation coefficient, covariance is measured in units. The units are computed by multiplying the units of the two variables. The covariance can take any positive or negative value. The values are interpreted as follows:

- Positive Covariance: Indicates that two variables tend to move in the same direction.

- Negative Covariance: Reveals that two variables tend to move in inverse directions.

**Formula:**

The covariance formula is similar to the formula for correlation and is built from the deviations of data points from the average value in a dataset. For example, the covariance between two random variables X and Y can be calculated using the following formula (for population):

![image.png](attachment:image.png)

For a sample covariance, the formula is slightly adjusted:

![image-2.png](attachment:image-2.png)

Where:

- Xi – the values of the X-variable
- Yj – the values of the Y-variable
- X̄ – the mean (average) of the X-variable
- Ȳ – the mean (average) of the Y-variable
- n – the number of data points

**Example:**

John is an investor. His portfolio primarily tracks the performance of the S&P 500 and John wants to add the stock of ABC Corp. Before adding the stock to his portfolio, he wants to assess the directional relationship between the stock and the S&P 500.

John does not want to increase the unsystematic risk of his portfolio. Thus, he is not interested in owning securities in the portfolio that tend to move in the same direction.

John can calculate the covariance between the stock of ABC Corp. and S&P 500 by following the steps below:

1. Obtain the data.

First, John obtains the figures for both ABC Corp. stock and the S&P 500. The prices obtained are summarized in the table below:

![image-3.png](attachment:image-3.png)

2. Calculate the mean (average) prices for each asset.

![image-4.png](attachment:image-4.png)

3. For each security, find the difference between each value and mean price.

![image-5.png](attachment:image-5.png)

4. Multiply the results obtained in the previous step.

5. Using the number calculated in step 4, find the covariance.

![image-6.png](attachment:image-6.png)

In such a case, the positive covariance indicates that the price of the stock and the S&P 500 tend to move in the same direction.
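The steps above can be sketched in Python. John's actual prices are only shown in the image, so the numbers below are hypothetical, and the function name `covariance` is made up for this example. The sketch assumes the usual convention of dividing by n for a population and by n − 1 for a sample.

```python
def covariance(xs, ys, sample=False):
    """Covariance of two equal-length lists (population by default)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n                      # step 2: mean of each asset
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y)   # steps 3-4: deviations, multiplied
                for x, y in zip(xs, ys)]
    divisor = n - 1 if sample else n          # step 5: average the products
    return sum(products) / divisor

# Hypothetical prices, NOT the figures from the table above
stock_abc = [10, 12, 14, 16, 18]
index = [100, 110, 120, 130, 140]

print(covariance(stock_abc, index))               # 40.0
print(covariance(stock_abc, index, sample=True))  # 50.0
```

Both values are positive, so in this made-up data the two prices tend to move in the same direction.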

# 18. Covariance vs. Correlation

Covariance and correlation both primarily assess the relationship between variables. The closest analogy to the relationship between them is the relationship between the variance and standard deviation.

**Covariance** measures the total variation of two random variables from their expected values. Using covariance, we can only gauge the direction of the relationship (whether the variables tend to move in tandem or show an inverse relationship). However, it does not indicate the strength of the relationship, nor the dependency between the variables.

On the other hand, **correlation** measures the strength of the relationship between variables. Correlation is the scaled measure of covariance. It is dimensionless. In other words, the correlation coefficient is always a pure value and not measured in any units.

The relationship between the two concepts can be expressed using the formula below:

![image.png](attachment:image.png)

Where:

- ρ(X,Y) – the correlation between the variables X and Y
- Cov(X,Y) – the covariance between the variables X and Y
- σX – the standard deviation of the X-variable
- σY – the standard deviation of the Y-variable
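To see the scaling at work, here is a sketch that divides a covariance by the two standard deviations. The data is hypothetical and the function names are made up; population versions of covariance and standard deviation are used throughout so that the divisors match.

```python
import math

def population_covariance(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def population_std(values):
    # The variance of a variable is its covariance with itself
    return math.sqrt(population_covariance(values, values))

def correlation(xs, ys):
    """rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y)."""
    return population_covariance(xs, ys) / (population_std(xs) * population_std(ys))

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

rho = correlation(x, y)
print(round(rho, 4))  # 0.7746

# Rescaling a variable changes the covariance but not the correlation
y_scaled = [100 * value for value in y]
print(population_covariance(x, y), population_covariance(x, y_scaled))
print(round(correlation(x, y_scaled), 4))
```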

# 19. Pearson's r Correlation

In the previous lecture on scatter plots, we made a scatter plot for some sample bivariate data and concluded that the two variables were probably related.

![image.png](attachment:image.png)

We can use this data to calculate Pearson's r.

Pearson's r measures the strength of the linear relationship between two variables. Pearson's r is always between -1 and 1.

Here is a perfect positive relationship. r is equal to 1.0:

![image-2.png](attachment:image-2.png)

Here is a perfect negative relationship. r is equal to -1.0:

![image-3.png](attachment:image-3.png)

Here is an example of data that has no relationship. r is somewhere close to 0.0:

![image-4.png](attachment:image-4.png)

Pearson's r is calculated with the following equation:

![image-5.png](attachment:image-5.png)

Plugging in the values from our original example with ages and yearly incomes, we can calculate the following r:

![image-6.png](attachment:image-6.png)

This r is almost 1.0, so we can conclude that x (Age) and y (Yearly Income) have a strong positive relationship. As one increases, the other tends to increase as well.

# 20. Hypothesis Testing with Pearson's r

Just like with other tests such as the z-test or ANOVA, we can conduct hypothesis testing using Pearson's r.

To test if age and income are related, researchers collected the ages and yearly incomes of 10 individuals, shown below. Using alpha = 0.05, are they related?

![image.png](attachment:image.png)

Steps for Hypothesis Testing with Pearson's r

- Define Null and Alternative Hypotheses
- State Alpha
- Calculate Degrees of Freedom
- State Decision Rule
- Calculate Test Statistic
- State Results
- State Conclusion

1. Define Null and Alternative Hypotheses

![image-2.png](attachment:image-2.png)

2. State Alpha

alpha = 0.05

3. Calculate Degrees of Freedom

Where n is the number of subjects you have:

df = n - 2 = 10 - 2 = 8

4. State Decision Rule

Using our alpha level and degrees of freedom, we look up a critical value in the r-Table. We find a critical r of 0.632.

If r is greater than 0.632, reject the null hypothesis.

5. Calculate Test Statistic

We calculate r using the same method as we did in the previous lecture:

![image-3.png](attachment:image-3.png)

6. State Results

r = 0.99

Reject the null hypothesis.

7. State Conclusion

There is a relationship between age and yearly income, r(8) = 0.99, p < 0.05
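The degrees of freedom and the decision rule can be sketched as a tiny function. The critical value still has to be looked up in an r-table; 0.632 is the one found above. The function name is made up, and comparing the absolute value of r assumes a two-tailed test, which is an assumption of this sketch.

```python
def pearson_r_decision(r, n, critical_r):
    """Return (degrees of freedom, whether to reject the null hypothesis)."""
    df = n - 2
    # Assumption: two-tailed test, so compare |r| with the critical value
    reject_null = abs(r) > critical_r
    return df, reject_null

df, reject_null = pearson_r_decision(r=0.99, n=10, critical_r=0.632)
print(df, reject_null)  # 8 True
```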

# 21. The Spearman Correlation

The Spearman correlation is used when:

1. Measuring the relationship between two ordinal variables.

2. Measuring the relationship between two variables that are related, but not linearly.

Below is an example of some data that is related in a non-linear fashion. For this, we would use the Spearman correlation:

![image.png](attachment:image.png)

Let's calculate the Spearman correlation for the following data set:

![image-2.png](attachment:image-2.png)

To calculate the Spearman correlation, we must first rank the scores:

![image-3.png](attachment:image-3.png)

We then calculate the correlation using these new ranks:

![image-4.png](attachment:image-4.png)

We find an r of -1.00, meaning that our data has a negative relationship. As x increases, y decreases. As x decreases, y increases.
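The rank-then-correlate procedure can be sketched as follows. The data set in the images is not reproduced here, so the numbers are hypothetical; the helper names are made up, and the simple ranking function assumes there are no tied scores.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always falls as x rises, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [100, 50, 20, 5, 1]

print(ranks(y))                  # [5, 4, 3, 2, 1]
print(spearman(x, y))            # -1.0
print(round(pearson_r(x, y), 3)) # weaker than -1, because the trend is curved
```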

# 22. Linear Regression

In a previous lecture on Pearson's r, we found two sets of data to be highly correlated:

![image.png](attachment:image.png)

If we know that two variables are strongly correlated, we can use one variable to predict the other using the following equations:

![image-2.png](attachment:image-2.png)

Here, we first calculate beta1 and beta0 and place them in the top equation. Then, if we plug an x into the equation, we can predict what our y value will be.

The stronger your correlation (that is, the closer r is to -1 or 1), the more accurate your prediction will be.

First, we solve for beta1:

![image-3.png](attachment:image-3.png)

We then use beta1's value to solve for beta0:

![image-4.png](attachment:image-4.png)

Now, putting those values into the original equation, we have our completed regression equation:

![image-5.png](attachment:image-5.png)

Predict the yearly income of someone who is 33 years old.

![image-6.png](attachment:image-6.png)

We would expect someone who is 33 years old to make approximately $36,963 a year.
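Here is a sketch of the regression steps using the standard least-squares formulas: beta1 is the sum of the products of deviations divided by the sum of squared x deviations, and beta0 is the mean of y minus beta1 times the mean of x. The age and income table is only in the image, so the data below is hypothetical (and exactly linear, to keep the arithmetic easy to follow). It does not reproduce the $36,963 figure.

```python
def fit_line(xs, ys):
    """Least-squares line y = beta0 + beta1 * x. Returns (beta0, beta1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Hypothetical data, NOT the table from the example above
ages = [20, 30, 40, 50]
incomes = [25_000, 35_000, 45_000, 55_000]

beta0, beta1 = fit_line(ages, incomes)
predicted_income = beta0 + beta1 * 33

print(beta0, beta1)       # 5000.0 1000.0
print(predicted_income)   # 38000.0
```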

# 23. Correlation vs. Causation

**Causation:**

Causation means that one variable causes a change in another variable.

**Correlation:**

To say that two variables are correlated is to say that they share some kind of relationship.

In order to infer causation, a true experiment must be performed where subjects are randomly assigned to different conditions.

Here's an example of a true experiment where causation can be inferred:

Researchers want to test a new anti-anxiety medication. They randomly assign participants to three conditions (0mg, 50mg, and 100mg), then ask them to rate their anxiety level on a scale of 1-10. Are there any differences between the three conditions using alpha = 0.05?

![image.png](attachment:image.png)

This is a true experiment because participants are randomly assigned to different conditions. Any differences between the three groups should only be due to the effects of dosage.

Here's an example of correlational data:

![image-2.png](attachment:image-2.png)

Here, we see that students who spend more time studying for tests tend to score better than students who spend less time studying. However, because this is not a true experiment, we cannot infer that studying causes better test scores. Perhaps the high-scoring students in this sample were just better test takers.