In [None]:
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float

    # override the generated __repr__ so the point is nicely formatted when printed
    def __repr__(self) -> str:
        return f"Point({self.x:.2f}, {self.y:.2f})"

Let's start by creating a dataclass. Dataclasses were introduced in Python 3.7 with [PEP 557](https://peps.python.org/pep-0557/).

For more examples, and alternatives to dataclasses, see the [codestyle library](https://github.com/raoulg/codestyle/blob/main/docs/pydantic.md) entry on dataclasses and pydantic.
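To see what the decorator provides on its own, here is a minimal sketch using a hypothetical `PlainPoint` class (not part of this notebook) that does not override `__repr__`. It only relies on the standard library `dataclasses` module.

```python
from dataclasses import asdict, dataclass


@dataclass
class PlainPoint:  # hypothetical class, only for illustration
    x: float
    y: float


a = PlainPoint(1.0, 2.0)
b = PlainPoint(1.0, 2.0)

print(a)          # generated __repr__: PlainPoint(x=1.0, y=2.0)
print(a == b)     # generated __eq__ compares field by field: True
print(asdict(a))  # {'x': 1.0, 'y': 2.0}
```

The `Point` class in this notebook keeps the generated `__init__` and `__eq__`, and only replaces the generated `__repr__` with a two-decimal version.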


In [None]:
p = Point(2.0, 3.0)
p

This notebook showcases basic Monte Carlo techniques. In general, Monte Carlo techniques are very useful when it is difficult to calculate the exact solution. Sometimes it is much easier (or even the only option!) to get the answer by sampling.

The basic idea is this: yes, we know we can derive $\pi$, which describes the relationship between the circumference and the diameter of a circle, analytically. But what if we didn't know that? How could we estimate $\pi$?

The idea is: let's take a circle inside a square, and let's drop random balls in the square. Some balls will fall inside the circle, some will fall outside. The total number of balls dropped in the square gives us an idea of the area of the square. The number of balls that fall inside the circle gives us an idea of the area of the circle. If we divide the area of the circle by the area of the square, we get $\pi / 4$, so four times that ratio gives us an estimate of $\pi$.

First, we will create a `MonteCarloCirle` class, where we store the radius of the circle, and we prepare a uniform distribution that will simulate dropping random balls inside the square.

We will add a `.generate_points` method that randomly generates a single point within the square (where the ball is dropped).

Then, we will add a method that finds out if the ball is inside the circle, using the Pythagorean theorem.

Finally, we add a method that gives us a generator that keeps dropping balls and telling us whether they are inside the circle, until we interrupt it.
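The "keep going until interrupted" pattern can be sketched in isolation. The snippet below is a simplified stand-in, not the class used in this notebook: `drop_balls` is a hypothetical function, it uses the standard library `random` module instead of `scipy`, and the seed is only there to make the sketch reproducible.

```python
import itertools
import random
from collections.abc import Iterator


def drop_balls(
    radius: float = 1.0, seed: int = 0
) -> Iterator[tuple[tuple[float, float], bool]]:
    """Endlessly yield ((x, y), inside) for balls dropped in the square."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    while True:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        # Pythagorean theorem: the ball is inside if x^2 + y^2 <= r^2
        yield (x, y), x**2 + y**2 <= radius**2


# The generator never stops on its own, so the caller decides how much to take.
first_five = list(itertools.islice(drop_balls(), 5))
for (x, y), inside in first_five:
    print(f"({x:.2f}, {y:.2f}) inside={inside}")
```

Because the generator is lazy, no points are computed until they are requested with `next` or `itertools.islice`.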

In [None]:
from collections.abc import Iterator

from scipy import stats


class MonteCarloCirle:
    def __init__(self, radius: float) -> None:
        self.radius = radius
        # uniform distribution on [-radius, radius]: one side of the enclosing square
        self.dist = stats.uniform(loc=-radius, scale=2 * radius)

    def __repr__(self) -> str:
        return f"MonteCarloCirle(radius={self.radius:.2f})"

    def generate_points(self) -> Point:
        # draw the x and y coordinates independently
        x, y = self.dist.rvs(2)
        return Point(x, y)

    def is_in_circle(self, point: Point) -> bool:
        # Pythagorean theorem: the point is inside if x^2 + y^2 <= r^2
        return bool((point.x**2 + point.y**2) <= self.radius**2)

    # endlessly yield points and whether they are in the circle
    def generate(self) -> Iterator[tuple[Point, bool]]:
        while True:
            point = self.generate_points()
            yield point, self.is_in_circle(point)

Let's check how this works:

In [None]:
pointgenerator = MonteCarloCirle(radius=1).generate()
next(pointgenerator)

Looks good. We get coordinates of points inside the square, and we get a boolean that tells us if the point is inside the circle or not.

Now, let's run this simulation 500 times and plot the result.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(5, 5))

for _ in range(500):
    point, inside = next(pointgenerator)
    ax.scatter(point.x, point.y, color="red" if inside else "blue", alpha=0.3)

The area of the square is $(2r)^2$, while the area of the circle is $\pi r^2$. The ratio of the two is:

$$\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}$$

Because we can rewrite $(2r)^2$ as $4r^2$, and cancel $r^2$ from the numerator and denominator.

This means, if we randomly sample points in the square, the ratio of points in the circle to the total number of points should be $\pi / 4$.

So we can estimate $\pi$ by multiplying the ratio $\frac{\text{inside}}{\text{total}}$ by 4.
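Since $r^2$ cancels, the estimate should not depend on the radius we pick. A minimal sketch to check this, assuming a hypothetical helper `estimate_pi` built on the standard library `random` module (the seed and sample size are arbitrary choices):

```python
import math
import random


def estimate_pi(n_points: int, radius: float = 1.0, seed: int = 0) -> float:
    """Estimate pi as 4 * inside / total for a circle of the given radius."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x**2 + y**2 <= radius**2:
            inside += 1
    return 4 * inside / n_points


for r in (1.0, 5.0):
    estimate = estimate_pi(100_000, radius=r)
    print(f"radius={r}: estimate={estimate:.4f}, pi={math.pi:.4f}")
```

Both radii should land close to $\pi$; only the number of points, not the size of the circle, controls how close.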

Let's wrap this in a class that approximates $\pi$ for us:

In [None]:
class ApproxPi:
    def __init__(self, maximum: int, report: int) -> None:
        self.max = maximum
        self.pointgenerator = MonteCarloCirle(radius=1).generate()
        self.inside = 0
        self.total = 0
        self.report = report
        self.log: list[float] = []  # running estimates of pi, one per report interval

    def run(self) -> list[float]:
        for _ in range(self.max):
            self.total += 1
            _, is_inside = next(self.pointgenerator)
            if is_inside:
                self.inside += 1

            if self.total % self.report == 0:
                self.log.append(4 * self.inside / self.total)

        return self.log

Now let's test it!

In [None]:
import numpy as np

approx_pi = ApproxPi(maximum=40000, report=100).run()
approx_pi[-1] - np.pi

Ok, not bad! When I did this, I got a difference of 0.008. However, this is a random process! This means that if we run it again, we will get a different result!
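How large a difference should we expect? Each ball is a yes/no trial that lands inside the circle with probability $\pi / 4$, so the standard binomial result gives a standard error of $4 \sqrt{p(1-p)/n}$ for the estimate, with $p = \pi / 4$. The sketch below uses a hypothetical helper `pi_standard_error`. It plugs in the true value of $\pi$, so it is a sanity check on the simulation, not part of the estimator.

```python
import math


def pi_standard_error(n_points: int) -> float:
    """Standard error of 4 * inside / total after n_points balls."""
    p = math.pi / 4  # probability that a uniform point in the square is in the circle
    return 4 * math.sqrt(p * (1 - p) / n_points)


for n in (100, 10_000, 40_000, 1_000_000):
    print(f"n={n:>9,}: standard error ~ {pi_standard_error(n):.4f}")
```

For 40,000 points this comes out at roughly 0.008, which is the same order of magnitude as the difference reported above. The error shrinks with $\sqrt{n}$, so 100 times more points buy only one extra decimal of accuracy.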

Also, the first few results (let's say, for the first 1000 points) can be way off. The process will converge towards the correct value, so it makes sense to drop the first few results. This is often called the "burn-in" period.

Often, in Monte Carlo simulations, we want to take the average or standard deviation after a burn-in. In our case, we can just take the last logged estimate, because it is based on the most points and will be the most accurate.
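As a sketch of the general burn-in pattern, the hypothetical helper below drops the first `burn_in` entries of a log and summarizes the rest with the standard library `statistics` module. The numbers in `toy_log` are made up for illustration and are not results from this notebook.

```python
import statistics


def summarize_after_burn_in(log: list[float], burn_in: int) -> tuple[float, float]:
    """Return (mean, standard deviation) of the log after dropping the burn-in."""
    kept = log[burn_in:]
    if len(kept) < 2:
        raise ValueError("need at least two values after the burn-in")
    return statistics.mean(kept), statistics.stdev(kept)


# Made-up running estimates, only to show the mechanics.
toy_log = [3.6, 2.9, 3.3, 3.16, 3.13, 3.15, 3.14]
mean, spread = summarize_after_burn_in(toy_log, burn_in=3)
print(f"mean={mean:.3f}, stdev={spread:.3f}")
```

Note that the log in this notebook holds cumulative estimates, so later entries already contain all earlier points. That is why keeping only the last entry is enough here, while the mean-after-burn-in pattern is more relevant for samplers whose individual draws are logged.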

Let's do 16 independent runs (called `threads` in the code, although they execute one after another) and see how the results converge.

In [None]:
avg_result = []
threads = 16
max_runs = 20000
report_every = 100
custom_palette = sns.color_palette("rocket", threads)

for i in range(threads):
    approx_pi = ApproxPi(maximum=max_runs, report=report_every).run()
    # use colors from the "rocket" palette
    plt.plot(range(len(approx_pi)), approx_pi, color=custom_palette[i], alpha=0.6)
    avg_result.append(approx_pi[-1])  # we just keep the last fraction

# add a horizontal line at pi for reference
plt.axhline(y=np.pi, color="red", linestyle="--")
diff = np.pi - np.mean(avg_result)
print(f"The difference between the average of the {threads} runs and pi is {diff:.6f}")