# Introduction to Probabilistic Methods

**2025 Introduction to Quantitative Methods in Finance**

**The Erdös Institute**

**Question**: What is the expected value of the roll of a fair six-sided die?

### Mathematical Solution

Let $X$ be the outcome of rolling a single fair die. Each of the six possible outcomes has probability $1/6$, so the expected value is

$$E[X]=\frac{1}{6}\cdot 1 + \frac{1}{6}\cdot 2 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 5 + \frac{1}{6}\cdot 6 = \frac{21}{6} = 3.5.$$
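As a quick check of the arithmetic, the sketch below computes the expected value exactly with Python's standard `fractions` module. It also computes the variance $\sigma^2 = E[X^2] - E[X]^2$ of a single roll, a quantity not derived in the text above but useful when reasoning about how much simulated averages spread out.

```python
from fractions import Fraction

# Each face 1..6 of a fair die has probability 1/6
faces = range(1, 7)
p = Fraction(1, 6)

# Expected value: sum of outcome * probability
expected_value = sum(p * x for x in faces)

# Variance: E[X^2] - (E[X])^2
variance = sum(p * x**2 for x in faces) - expected_value**2

print(expected_value)  # 7/2
print(variance)        # 35/12
```

Using exact fractions avoids floating-point rounding, so the result $7/2 = 3.5$ matches the hand computation exactly.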

### Probabilistic Solution

Approaching our objective from a probabilistic point of view, we simulate the outcomes of playing the game many times and see what happens on average.

```python
import numpy as np
import matplotlib.pyplot as plt
```

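```python
# Run dice-throwing simulation with 100 rolls of the dice.
N = 100

# Array of N random integers in {1, ..., 6}; the upper bound of randint is exclusive
game_outcomes = np.random.randint(1, 7, size=N)

# Mean of the simulated dice rolls
average_outcome = np.mean(game_outcomes)
print(average_outcome)
```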

```python
# Run a second simulation with 100 rolls of the dice.
print(np.mean(np.random.randint(1, 7, size=100)))
```

### Observation

The averages from the two simulations of 100 rolls will likely differ noticeably.

Increasing the number of rolls per simulation decreases the variance of the simulated average, so repeated simulations agree more closely.

```python
# Run simulation with 10,000 trials.
print(np.mean(np.random.randint(1, 7, size=10_000)))
```

```python
# Run a second simulation with 10,000 trials.
print(np.mean(np.random.randint(1, 7, size=10_000)))
```

**The Central Limit Theorem** Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$. As $n$ becomes large, the distribution of the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is approximately a **normal distribution** with mean $\mu$ and variance $\frac{\sigma^2}{n}$.


The central limit theorem tells us two important things:

1) As we increase the number of trials (dice rolls) in each simulation, the estimate becomes more reliable, since its standard deviation $\sigma/\sqrt{n}$ shrinks.
2) We can quantify how likely it is that an estimate lies within a given distance of the true expected value.
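To make point 1 concrete, here is a small sketch of what the Central Limit Theorem predicts for the dice game. It uses the variance of a single fair roll, $\sigma^2 = E[X^2] - E[X]^2 = 91/6 - 49/4 = 35/12$, so the standard deviation of the average of $n$ rolls is $\sigma/\sqrt{n}$. The helper name `std_of_mean` is our own, chosen for illustration.

```python
import math

# Variance of a single fair die roll: E[X^2] - E[X]^2 = 91/6 - (7/2)^2 = 35/12
sigma = math.sqrt(35 / 12)


def std_of_mean(n):
    """Standard deviation of the average of n independent fair die rolls (sigma / sqrt(n))."""
    return sigma / math.sqrt(n)


# Compare the predicted spread for 100 rolls with the spread for 10,000 rolls
for n in (100, 10_000):
    print(n, round(std_of_mean(n), 4))
```

With 100 rolls a typical deviation of the average from $3.5$ is about $0.17$, while with 10,000 rolls it is about $0.017$: a hundred times more rolls gives a ten times smaller spread.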

```python
# Repeat the above simulation a large number of times.
N = 750      # total simulations
M = 10_000   # dice rolls per simulation

# N x M array of dice rolls: each row is one simulation of M rolls
game_outcomes = np.random.randint(1, 7, size=(N, M))

# Average outcome of each simulation (mean along each row)
average_outcomes = game_outcomes.mean(axis=1)

# Plot histogram of simulated means
plt.hist(average_outcomes, bins=30)
plt.show()
```

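```python
# Find sample mean and (sample) standard deviation of the simulated averages
sample_mean, sample_std = average_outcomes.mean(), average_outcomes.std(ddof=1)
```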

```python
# Create histogram of random draws from a normal distribution
# with mean and standard deviation that match the sample
plt.hist(np.random.normal(sample_mean, sample_std, size=N), bins=30)
plt.show()
```

# **The 68-95-99.7 Rule**

If a simulated estimate $E$ is approximately normally distributed with true mean $\mu$ and standard deviation $\sigma$ (in practice estimated by the sample standard deviation), then there is approximately a $68\%$ chance that $|E-\mu|\leq \sigma$, approximately a $95\%$ chance that $|E-\mu|\leq 2\sigma$, and approximately a $99.7\%$ chance that $|E-\mu|\leq 3\sigma$.
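The three percentages in the rule are rounded values of $P(|Z| \le k)$ for a standard normal $Z$ and $k = 1, 2, 3$. That probability equals $\operatorname{erf}(k/\sqrt{2})$, so a quick way to recover the numbers uses only Python's standard `math.erf`:

```python
import math

# P(|Z| <= k) for a standard normal Z equals erf(k / sqrt(2))
within_k = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}

for k, prob in within_k.items():
    print(f"within {k} sd: {prob:.4f}")
```

The printed values, $0.6827$, $0.9545$, and $0.9973$, are where the names 68, 95, and 99.7 come from.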

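```python
# Print what we expect by the 68-95-99.7 rule
print("Expected: about 68%, 95%, and 99.7% of experiments within 1, 2, and 3 standard deviations")
```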

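```python
# Sort experiments by whether the sample mean is within 1, 2, and 3 standard deviations of the true mean of 3.5
true_mean = 3.5
distance = np.abs(average_outcomes - true_mean)
average_outcomes_one_std = average_outcomes[distance <= sample_std]
average_outcomes_two_std = average_outcomes[distance <= 2 * sample_std]
average_outcomes_three_std = average_outcomes[distance <= 3 * sample_std]

# Find whether the percentage of average outcomes matches expectations.
percent_one_std = 100 * len(average_outcomes_one_std) / N
percent_two_std = 100 * len(average_outcomes_two_std) / N
percent_three_std = 100 * len(average_outcomes_three_std) / N

# Print the percentage of experiments within 1, 2, and 3 standard deviations.
print(f"{percent_one_std:.1f}% {percent_two_std:.1f}% {percent_three_std:.1f}%")
```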

## Probabilistic Concepts in Dice Roll Simulation

- The **sample space** of the simple dice game is the set of possible outcomes of playing the game, i.e. rolling a 1, 2, 3, 4, 5, or 6.

<br>

- The **probability distribution** of the simple dice game assigns a probability to each outcome. The die is fair, so each outcome has probability $1/6$.

<br>

- **Simulation** is used to estimate the expected value. Simulation techniques are central to valuing financial instruments and investment strategies.

<br>

- As we increase the number of simulations of the simple dice game, the average values stabilize around a common value, namely the expected value. This phenomenon is precisely the content of the **Law of Large Numbers**.

<br>

- As we increase the number of simulations, the spread of the simulated averages around the true expected value decreases. This aligns with the **Central Limit Theorem**, which states that the sample mean of a sufficiently large number of independent and identically distributed random variables with finite variance is approximately normally distributed. This theorem is fundamental to quantitative finance, enabling us to apply **probabilistic methodology** to value portfolios, price stock options, and assess risk. We can then use **statistical methodology** to measure the confidence of our simulated measurements.
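The stabilization described by the Law of Large Numbers can also be seen within a single long run by tracking the running average after each roll. A minimal sketch, assuming NumPy's `default_rng` generator with a fixed seed (our choice, for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# 10,000 rolls of a fair die: integers in {1, ..., 6} (the upper bound 7 is exclusive)
rolls = rng.integers(1, 7, size=10_000)

# Running average after 1, 2, ..., 10,000 rolls
running_average = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

# The running average should settle near the expected value 3.5
for n in (10, 100, 1_000, 10_000):
    print(n, running_average[n - 1])
```

Plotting `running_average` against the roll number with `plt.plot` shows large early swings that flatten out toward $3.5$.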

## More Monte-Carlo Simulations


**Estimating $\sqrt{2}$**

Consider the line segment joining the points $(0,0)$ and $(1,1)$. This segment has length $\sqrt{2}$, and the sub-segment joining $(0,0)$ to $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$ has length $1$. Therefore the probability that a point chosen uniformly at random on the segment lies within the sub-segment is $\frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{2}$. Every point on the segment has the form $(x,x)$ with $0\leq x\leq 1$, and such a point lies within the sub-segment of length $1$ if and only if $2x^2 \leq 1$. We can therefore estimate $\sqrt{2}$ by drawing $x$ uniformly from the unit interval $[0,1]$ and measuring the proportion of $x$-values with $2x^2\leq 1$. Multiplying this proportion by $2$ yields an estimate of $\sqrt{2}$.

```python
# Run 10,000 experiments to simulate the value of sqrt(2)
N = 10_000

X = np.random.uniform(0, 1, size=N)  # N random values in the unit interval

Y = X[2 * X**2 <= 1]  # keep the random values that satisfy 2x^2 <= 1

# The fraction of hits estimates 1/sqrt(2) = sqrt(2)/2, so multiply by 2
sqrt_2_estimate = 2 * len(Y) / N

print(sqrt_2_estimate)
```

```python
# Compare simulated value with the actual value of sqrt(2) given by numpy.
print(sqrt_2_estimate, np.sqrt(2), abs(sqrt_2_estimate - np.sqrt(2)))
```
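How far off should the estimate typically be? Each uniform draw satisfies $2x^2 \le 1$ with probability $p = 1/\sqrt{2}$, so the hit fraction is a binomial proportion with standard deviation $\sqrt{p(1-p)/N}$, and the estimate $2 \cdot (\text{hits}/N)$ has twice that. Combined with the 68-95-99.7 rule this gives a rough error bar. A sketch, where the function name `sqrt2_estimate_std` is our own:

```python
import math


def sqrt2_estimate_std(n):
    """Standard deviation of the estimate 2 * (fraction of x with 2x^2 <= 1) from n uniform draws."""
    p = 1 / math.sqrt(2)  # probability that a uniform x in [0, 1] satisfies 2x^2 <= 1
    return 2 * math.sqrt(p * (1 - p) / n)


sd = sqrt2_estimate_std(10_000)
# By the 68-95-99.7 rule, about 95% of runs should land within 2 * sd of sqrt(2)
print(round(sd, 4), round(2 * sd, 4))
```

For $N = 10{,}000$ this is roughly $0.009$, so a single run should land within about $\pm 0.018$ of $\sqrt{2}$ roughly $95\%$ of the time. Quadrupling $N$ halves the error bar.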

**Question**: Consider the uniform distribution on the unit interval $[0,1]$. 

What is the expected number of draws needed for the sum to exceed $1$?

```python
# Define a function that simulates drawing numbers at random from the uniform
# distribution on [0, 1] until the sum exceeds 1, and returns the number of draws needed


def exceed_one():
    s = 0.0  # s will be the sum of random draws from the unit interval
    n = 0    # n is the number of draws needed until the sum exceeds 1
    while s <= 1:
        s += np.random.uniform(0, 1)
        n += 1
    return n


# Perform 10,000 simulations
N = 10_000  # number of simulations
X = np.array([exceed_one() for _ in range(N)])  # number of draws needed in each simulation

# Print average number of draws needed.
print(X.mean())
```

## Mathematical Solution

With the right trick, it is not difficult to prove that the true expected value is Euler's number $e$.

**Sketch of Proof**

Given a real number $x\geq 0$, let $f(x)$ be the expected number of draws from the unit interval needed for the sum to exceed $x$.

To compute the expected value $f(x)$, condition on the first draw $U$. After that draw, the remaining draws must push the sum past $x-U$, and since $U$ is uniform on $[0,1]$, the shifted threshold $x-U$ is uniform on $[x-1,x]$. So the expected number of draws needed to exceed $x$ is $1$ more than the average of $f$ over the interval $[x-1,x]$. As an equation,

$$
f(x)= 1 + \int_{x-1}^{x} f(z)\,dz.
$$

If $z<0$ then $f(z)=0$, since the sum already exceeds $z$ and no further draws are needed. Restricting our attention to $x$-values in the unit interval $[0,1]$, the part of the integral over $[x-1,0)$ vanishes, so

$$
f(x)= 1 + \int_{x-1}^{x} f(z)\,dz = 1 + \int_{0}^{x} f(z)\,dz.
$$

Differentiating both sides of this equation and applying the Fundamental Theorem of Calculus gives

$$
f'(x) = f(x).
$$

We also have the initial condition $f(0)=1$, since a single draw exceeds $0$ with probability $1$. Therefore the solution for $0\leq x\leq 1$ is $f(x)=e^x$. In particular, $f(1)=e$ is the expected number of draws from the uniform distribution on the unit interval needed for the sum to exceed $1$.
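The argument gives more than $f(1) = e$: it predicts $f(x) = e^x$ for every $x \in [0,1]$. A minimal sketch that tests this by simulation, generalizing `exceed_one` to an arbitrary threshold; the function name `draws_to_exceed`, the seed, and the sample size are illustrative assumptions.

```python
import math

import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the run is reproducible


def draws_to_exceed(x):
    """Number of uniform [0, 1] draws needed for the running sum to exceed x."""
    s, n = 0.0, 0
    while s <= x:
        s += rng.uniform(0, 1)
        n += 1
    return n


N = 20_000  # simulations per threshold
f_estimates = {}
for x in (0.25, 0.5, 0.75, 1.0):
    f_estimates[x] = np.mean([draws_to_exceed(x) for _ in range(N)])
    # Compare the simulated f(x) with the predicted value e^x
    print(x, f_estimates[x], math.exp(x))
```

Each simulated value should agree with $e^x$ up to Monte Carlo error of order $1/\sqrt{N}$, which is a useful sanity check on the integral equation, not just its endpoint.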