# Pi Estimation Using Monte Carlo

In this exercise, we use MapReduce and a Monte Carlo simulation to estimate $\pi$.

The image below, taken from this [blog](https://towardsdatascience.com/how-to-make-pi-part-1-d0b41a03111f), shows a unit circle (radius $r = 1$) inscribed in a square with side length $2r = 2$:

![Circle_Box](https://miro.medium.com/max/700/1*y-GFdC5OM0ZtYfbfkjjB2w.png)

The area:

- for the circle is $A_{circle} = \pi \cdot r^2 = \pi \cdot 1^2 = \pi$
- for the square is $A_{square} = d^2 = (2 \cdot r)^2 = 4$

The ratio of the two areas is therefore $\frac{A_{circle}}{A_{square}} = \frac{\pi}{4}$.

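The Monte Carlo simulation draws many points in the square, uniformly at random. For every point, we check whether it lies within the circle. Because the points are uniformly distributed, the fraction that lands inside the circle approximates the ratio of the two areas.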

And so we get the approximation:

$\frac{\pi}{4} \approx \frac{\text{points\_in\_circle}}{\text{total\_points}}$

or

$\pi \approx 4 \cdot \frac{\text{points\_in\_circle}}{\text{total\_points}}$
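Before moving to MapReduce, the approximation can be tried out on a single machine. The following is a minimal sketch in plain Python; the function name `estimate_pi` and its `seed` parameter are illustrative and not part of the exercise material.

```python
import random


def estimate_pi(total_points, seed=None):
    """Estimate pi by sampling points uniformly from the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # private generator so runs can be reproduced
    points_in_circle = 0
    for _ in range(total_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The point is inside the unit circle if its squared distance
        # from the origin is at most 1.
        if x * x + y * y <= 1.0:
            points_in_circle += 1
    return 4.0 * points_in_circle / total_points


if __name__ == "__main__":
    print(estimate_pi(100_000, seed=42))
```

The estimate improves slowly: the statistical error shrinks proportionally to $1/\sqrt{\text{total\_points}}$, so each additional correct digit needs roughly 100 times more points. This is why distributing the sampling over many workers is attractive.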



Given a point $(x_1, y_1)$, we can decide whether it lies in the circle of radius $1$ centered at the origin with the following indicator function:

$\text{is\_in\_circle}(x_1,y_1) = 
\begin{cases}
    1, & \text{if } x_1^2 + y_1^2 \leq 1\\
    0, & \text{otherwise}
\end{cases}$
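The indicator function translates almost literally into Python. This is a small sketch; the name `is_in_circle` mirrors the formula above.

```python
def is_in_circle(x, y):
    """Return 1 if (x, y) lies inside or on the unit circle, otherwise 0."""
    return 1 if x * x + y * y <= 1.0 else 0


print(is_in_circle(0.5, 0.5))  # 0.25 + 0.25 = 0.5 <= 1, prints 1
print(is_in_circle(1.0, 1.0))  # 1 + 1 = 2 > 1, prints 0
```

Comparing squared values avoids a square root, which is unnecessary because the radius is $1$.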

## Implementation
Write a MapReduce algorithm for estimating $\pi$.
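One possible shape for such a job is sketched below in plain Python, without any MapReduce library, so it is not the reference solution for `pi.py`. It assumes that each mapper call ignores its input record and draws a fixed batch of points. `SAMPLES_PER_LINE`, `mapper`, `reducer`, and `run_job` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict

SAMPLES_PER_LINE = 1000  # hypothetical: points drawn per input record


def mapper(_line, rng):
    """Ignore the input record and emit (key, (hits, samples)) for one batch."""
    hits = 0
    for _ in range(SAMPLES_PER_LINE):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    # A single shared key sends all partial counts to the same reducer.
    yield "pi", (hits, SAMPLES_PER_LINE)


def reducer(key, values):
    """Sum the partial counts and emit the resulting estimate of pi."""
    hits = samples = 0
    for batch_hits, batch_samples in values:
        hits += batch_hits
        samples += batch_samples
    yield key, 4.0 * hits / samples


def run_job(lines, seed=None):
    """Simulate the map, shuffle, and reduce phases in a single process."""
    rng = random.Random(seed)
    grouped = defaultdict(list)  # shuffle: group mapper output by key
    for line in lines:
        for key, value in mapper(line, rng):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    fake_input = ["ignored"] * 200  # record content does not matter
    print(run_job(fake_input, seed=0))
```

Emitting `(hits, samples)` pairs instead of one record per point keeps the data sent to the reducer small, and the same summing logic could also serve as a combiner.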

### Running the Job


Unfortunately, the library does not work without an input file. This is probably because the Hadoop Streaming library does not support jobs without input either; see [Stack Overflow](https://stackoverflow.com/questions/22821005/hadoop-streaming-job-with-no-input-file).

As a workaround, we pass different input files to fake the number of mappers. Not the most elegant solution :/


In [None]:
!python pi.py /data/dataset/text/small.txt

In [None]:
!python pi.py /data/dataset/text/holmes.txt