# Playing the Roulette Experiment

Consider a European-style roulette wheel, which has 37 pockets numbered from 0 to 36. Roulette offers many different bets, but here we consider the simplest one: betting on a single number. If we pick the right number, we win 35 times what we bet.

Consider the following scenario:
* We start playing roulette with a fixed budget.
* Every spin of the roulette costs one unit of budget.
* We always bet on the same number and play until we run out of budget.

Create a function called `roulette` that implements this game. It takes two inputs: the budget and the number we always bet on. It returns our budget after every spin over the whole game, until the game ends.

```python
# code here
```

Let’s play one game with a budget of 15, betting on the number 8. Call your new function with these two parameters and a seed of `2023`.

```python
# code here
```
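Below is a minimal Python sketch of `roulette` and the seeded game. It rests on three assumptions. Each spin is a 1-unit bet. A win (our number comes up, probability 1/37) adds a net 35 units. A loss removes 1 unit. `wealth` stores the budget after every spin. The `rng` parameter (a NumPy `Generator`) is our own addition so the seed can be passed in.

```python
import numpy as np


def roulette(budget, number, rng=None):
    """Bet 1 unit on `number` every spin until the budget runs out.

    Assumed payout: a win adds a net 35 units, a loss removes 1 unit.
    Returns `wealth`, the list of budgets after each spin.
    """
    if rng is None:
        rng = np.random.default_rng()
    wealth = []
    while budget > 0:
        spin = rng.integers(0, 37)  # European wheel: pockets 0 to 36
        budget += 35 if spin == number else -1
        wealth.append(budget)
    return wealth


# One game: budget 15, always betting on 8, seed 2023
game = roulette(15, 8, rng=np.random.default_rng(2023))
print(f"The game lasted {len(game)} spins")
print("First budgets:", game[:10])
```

Passing an explicit `Generator` keeps the game reproducible without touching NumPy's global random state. A loss costs exactly 1 unit, so the last entry of `wealth` is always 0.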

We can ask ourselves many questions about such a game. For instance:

* What is the probability that the game lasts exactly 15 spins if I start with a budget of 15 and bet on the number 8?
* What is the average length of the game starting with a budget of 15 and betting on the number 8?
* What is the average maximum wealth I have during a game in which I start with a budget of 15 and bet on the number 8?

We will develop several Monte Carlo experiments to answer these questions. In each case, we only need to modify the function `roulette` so that it outputs a summary of the game.

**What is the probability that the game lasts exactly 15 spins if I start with a budget of 15 and bet on the number 8?**

If a game that started with a budget of 15 ends after 15 spins, the number 8 never came up. Each losing spin costs one unit, so without a win the budget runs out after exactly 15 spins, while any win adds to the budget and makes the game longer. We can adapt the function `roulette` to output `True` if the length of a new vector `wealth`, which tracks the money we have after each spin, is exactly equal to `budget` (that is, if the number of spins played equals the starting budget).

```python
# code here
```

So if we execute the new function, we should obtain either `True` or `False`.

```python
# code here
```
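Here is a sketch of the adapted function, under the same assumed payout. The name `roulette_lasts_budget` is our own. It keeps the `wealth` vector as the text describes and compares its length with the starting budget. The starting budget is saved before the loop because the loop changes the running budget.

```python
import numpy as np


def roulette_lasts_budget(budget, number, rng=None):
    """Return True if the game lasts exactly `budget` spins.

    Same assumed rules as `roulette`: a win adds a net 35 units, a loss
    removes 1 unit, and the game stops when the budget reaches 0.
    """
    if rng is None:
        rng = np.random.default_rng()
    current = budget  # running budget; `budget` keeps the starting value
    wealth = []
    while current > 0:
        spin = rng.integers(0, 37)  # pockets 0 to 36
        current += 35 if spin == number else -1
        wealth.append(current)
    return len(wealth) == budget


# A single game gives either True or False
print(roulette_lasts_budget(15, 8, rng=np.random.default_rng(2023)))
```

The answer is already decided by the first 15 spins, since any win makes the game longer than 15 spins. The loop could therefore stop at the first win; playing the whole game simply mirrors the wording of the exercise.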

Let’s replicate the experiment 1000 times. The proportion of `True` values we observe is our estimate of this probability.

```python
# code here
```
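This sketch of the replication uses the same assumed payout. To keep it self-contained, it redefines a compact version of the function that counts spins instead of storing the whole `wealth` vector (the count equals `len(wealth)`). It also reports the standard error of the estimated proportion, sqrt(p̂(1 − p̂)/n), which shows how precise 1000 replicates are.

```python
import numpy as np


def roulette_lasts_budget(budget, number, rng):
    """True if the game lasts exactly `budget` spins (assumed: +35 net on a win, -1 on a loss)."""
    current, spins = budget, 0
    while current > 0:
        current += 35 if rng.integers(0, 37) == number else -1
        spins += 1
    return spins == budget


rng = np.random.default_rng(2023)  # seed chosen for reproducibility
n_replicates = 1000
results = np.array([roulette_lasts_budget(15, 8, rng) for _ in range(n_replicates)])

estimate = results.mean()  # proportion of True values
std_error = np.sqrt(estimate * (1 - estimate) / n_replicates)
print(f"Estimated probability: {estimate:.3f} (standard error {std_error:.3f})")
```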

[INTERPRET HERE THE RESULTS OF THE EXPERIMENT AT THIS POINT]

Notice that we could also have computed this probability exactly. How can we do this using the random variables studied in class?

```python
# code here
```
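Here is one way to compute the probability exactly. It assumes the spins are independent and our number comes up with probability 1/37 on each spin. The number of wins in 15 spins is Binomial(15, 1/37), and the game lasts exactly 15 spins when there are zero wins. Equivalently, the spin of the first win is Geometric(1/37), and we need it to occur after spin 15. The sketch uses SciPy's `binom` and `geom` distributions.

```python
from scipy.stats import binom, geom

p_win = 1 / 37   # chance that our number comes up on a single spin
n_spins = 15     # starting budget = number of spins if we never win

# Binomial view: zero wins in 15 spins
p_binom = binom.pmf(0, n_spins, p_win)

# Geometric view: SciPy's geom counts trials up to and including the first
# success (support 1, 2, ...), so P(first win after spin 15) = geom.sf(15, p)
p_geom = geom.sf(n_spins, p_win)

# Closed form shared by both views
p_exact = (1 - p_win) ** n_spins
print(f"binomial: {p_binom:.4f}, geometric: {p_geom:.4f}, closed form: {p_exact:.4f}")
```

All three values equal (36/37)^15 ≈ 0.663, which is the benchmark for the Monte Carlo estimate.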

**Question**: Is our Monte Carlo estimate a good approximation of this probability? [ANSWER THE QUESTION HERE]

**What is the average length of the game starting with a budget of 15 and betting on the number 8?**

We can answer this question by adapting the function `roulette` to output the length of the vector `wealth`.

```python
# code here
```
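This sketch of the game-length variant uses the same assumed payout; `roulette_length` is our own name. It returns `len(wealth)`, the number of spins played.

```python
import numpy as np


def roulette_length(budget, number, rng=None):
    """Number of spins before the budget runs out, i.e. len(wealth).

    Assumed rules: a win adds a net 35 units, a loss removes 1 unit.
    """
    if rng is None:
        rng = np.random.default_rng()
    wealth = []
    while budget > 0:
        spin = rng.integers(0, 37)  # pockets 0 to 36
        budget += 35 if spin == number else -1
        wealth.append(budget)
    return len(wealth)


print(roulette_length(15, 8, rng=np.random.default_rng(2023)))
```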

Let’s replicate the experiment 1000 times and summarize the results with a plot (it may take some time to run the code!)

```python
# code here
```
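This sketch covers the replication and the plot, under the same assumed payout. The compact `roulette_length` below counts spins instead of storing `wealth`. `plot_lengths` is a hypothetical helper that imports `matplotlib.pyplot` only when a plot is actually drawn.

```python
import numpy as np


def roulette_length(budget, number, rng):
    """Spins until the budget runs out (assumed: +35 net on a win, -1 on a loss)."""
    spins = 0
    while budget > 0:
        budget += 35 if rng.integers(0, 37) == number else -1
        spins += 1
    return spins


def plot_lengths(lengths, bins=50):
    """Histogram of game lengths."""
    import matplotlib.pyplot as plt

    plt.hist(lengths, bins=bins)
    plt.xlabel("Game length (spins)")
    plt.ylabel("Number of games")
    plt.title("Budget 15, always betting on 8")
    plt.show()


rng = np.random.default_rng(2023)  # seed chosen for reproducibility
lengths = np.array([roulette_length(15, 8, rng) for _ in range(1000)])
print(f"shortest game: {lengths.min()} spins, longest game: {lengths.max()} spins")
```

Calling `plot_lengths(lengths)` draws the histogram. Importing matplotlib inside the helper keeps the simulation usable where no plotting backend is available.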

[INTERPRET HERE THE RESULTS IN THE PLOT]

Given the shape of the distribution we obtain, we should choose an appropriate measure of central tendency. Calculate that statistic and explain the result.

```python
# code here
```
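This sketch compares the median with the mean of the simulated game lengths, under the same assumed payout. The compact simulation is repeated so the block runs on its own. Under that payout, each spin loses 1/37 units on average and the game ends exactly at 0. Wald's identity therefore gives a theoretical mean length of 15 × 37 = 555 spins, a useful reference for the sample mean.

```python
import numpy as np


def roulette_length(budget, number, rng):
    """Spins until the budget runs out (assumed: +35 net on a win, -1 on a loss)."""
    spins = 0
    while budget > 0:
        budget += 35 if rng.integers(0, 37) == number else -1
        spins += 1
    return spins


rng = np.random.default_rng(2023)  # seed chosen for reproducibility
lengths = np.array([roulette_length(15, 8, rng) for _ in range(1000)])

median_length = np.median(lengths)
mean_length = lengths.mean()
print(f"median: {median_length:.0f} spins, mean: {mean_length:.1f} spins")
# Share of games that end as soon as possible (no win in the first 15 spins)
print(f"games lasting exactly 15 spins: {np.mean(lengths == 15):.1%}")
```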

[WRITE YOUR ANSWER HERE]

**What is the average maximum wealth we have during a game in which we start with a budget of 15 and bet on the number 8?**

We can answer this question by adapting the roulette function to output the maximum of the vector `wealth`.

```python
# code here
```
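This sketch of the maximum-wealth variant uses the same assumed payout; `roulette_max_wealth` is our own name. Because `wealth` stores the budget after each spin, a game without any win has a maximum of `budget - 1`.

```python
import numpy as np


def roulette_max_wealth(budget, number, rng=None):
    """Largest budget reached after any spin, i.e. max(wealth).

    Assumed rules: a win adds a net 35 units, a loss removes 1 unit.
    """
    if rng is None:
        rng = np.random.default_rng()
    wealth = []
    while budget > 0:
        spin = rng.integers(0, 37)  # pockets 0 to 36
        budget += 35 if spin == number else -1
        wealth.append(budget)
    return max(wealth)


print(roulette_max_wealth(15, 8, rng=np.random.default_rng(2023)))
```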

Again, replicate the experiment, aggregate the results, and explain what you obtain.

```python
# code here
```
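This sketch replicates and aggregates the maximum wealth, under the same assumed payout and with a compact, self-contained simulation. Alongside the mean, it reports the median and the share of games whose maximum stays at 14. Under these rules that happens exactly when no win occurs in the first 15 spins, since an earlier win lifts the budget to at least 36.

```python
import numpy as np


def roulette_max_wealth(budget, number, rng):
    """max(wealth) for one game (assumed: +35 net on a win, -1 on a loss)."""
    best = 0
    while budget > 0:
        budget += 35 if rng.integers(0, 37) == number else -1
        best = max(best, budget)  # track the running maximum of wealth
    return best


rng = np.random.default_rng(2023)  # seed chosen for reproducibility
max_wealth = np.array([roulette_max_wealth(15, 8, rng) for _ in range(1000)])

print(f"mean maximum wealth: {max_wealth.mean():.1f}")
print(f"median maximum wealth: {np.median(max_wealth):.0f}")
print(f"games that never rise above 14: {np.mean(max_wealth == 14):.1%}")
```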

[RESULTS INTERPRETATION]

# Two-Sample t-Test Experiment

Using Monte Carlo simulation, we can simulate thousands of experiments in just a few minutes. In real life, any one of these experiments might take weeks or months to conduct.

A Monte Carlo t-test is therefore simply the repeated simulation of a random sample followed by a statistical test on it.

You will create a Monte Carlo experiment that generates a random sample, performs a t-test, and calculates the p-value associated with that t-test.

Hundreds or thousands of these cycles should be run for a given simulation. From them you will estimate the long-run success rate of the experiment (the proportion of replicates with a significant p-value) under a given set of assumptions.

## Information about the experiment

Let us imagine that we are carrying out a study with a medical team, and we want to compare the mean of the RWT variable between smokers and non-smokers (RWT is an echocardiographic parameter that helps identify possible myocardial infarctions).

The doctors ask us to simulate samples of different sizes to determine how many patients the study needs to obtain a p-value below 0.05. They tell us that, according to the scientific literature, RWT is normally distributed, with a mean of 1.98 (SD = 0.25) for non-smokers and 2.12 (SD = 0.18) for smokers. They also tell us that they will control the number of patients in each group, so half of the patients will be smokers and half non-smokers.

Remember the steps to follow in any Monte Carlo simulation:

* Define the domain of the variables to be simulated.
* Choose the appropriate distribution to generate random numbers for each of your variables.
* Determine the statistic of interest for your study (hint: in this case it is specified in the exercise).
* Aggregate the results of several replicates and summarize them.

## Instructions for the experiment

1. Using the tools we have learned, generate the values of an echocardiographic variable named RWT (Relative Wall Thickness of the heart; its exact meaning is not important here) for two samples of people (smokers and non-smokers), given the predefined parameters (mean and SD) of each group. The initial sample size is not important, since it will vary in later steps.
2. Second, use the variables you have simulated to compute a t-test that compares the means of the two groups. At this point we are interested in the p-value of this test.
3. Third, wrap the whole experiment in a function, since we will replicate it at least 100 times.
4. Once you have your function, replicate the experiment 100 times for each of several sample sizes (this is why the sample size varies). You can use a sequence of sample sizes between 5 and 100 people per group.
5. Finally, aggregate the results of the 100 replicates for each sample size, then summarize and interpret them so we can answer the main question of this study: **What is the minimum sample size we need for the medical experiment to yield a significant p-value?**
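Before simulating, the normal-approximation formula for comparing two independent means gives a rough cross-check. It estimates the number of patients per group needed to detect the difference Δ = 2.12 − 1.98 with a two-sided test at level α: n = (z_(1−α/2) + z_(1−β))² · (σ₁² + σ₂²) / Δ². The study does not specify a target power (1 − β). The 80% used below is a common convention and an assumption of this sketch. The t distribution has heavier tails than the normal, so the t-test typically needs slightly more patients than this approximation.

```python
import math

from scipy.stats import norm

# Parameters given by the medical team
mean_non_smokers, sd_non_smokers = 1.98, 0.25
mean_smokers, sd_smokers = 2.12, 0.18

alpha = 0.05   # two-sided significance level
power = 0.80   # ASSUMPTION: conventional target power, not given in the study

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)
delta = mean_smokers - mean_non_smokers

# Normal-approximation sample size per group (equal group sizes)
n_raw = (z_alpha + z_beta) ** 2 * (sd_non_smokers**2 + sd_smokers**2) / delta**2
n_per_group = math.ceil(n_raw)
print(f"about {n_raw:.1f} -> {n_per_group} patients per group")
```

With these inputs the formula gives about 38, rounded up to 39 patients per group. The simulation should land in a similar range, up to Monte Carlo noise.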

This second part of the lab is not guided, but you can always ask your professor if you have any questions about the experiment.

```python
import random

import matplotlib.pyplot as plt
import numpy as np


# Generate `sampleSize` random Relative Wall Thickness (RWT) values for smokers:
# normal distribution with mean 2.12 and SD 0.18 (uses NumPy's random module)
def RWTOfRandomSmoker(sampleSize):
    return np.random.normal(2.12, 0.18, sampleSize)  # Smoker data


# Generate `sampleSize` random RWT values for non-smokers:
# normal distribution with mean 1.98 and SD 0.25
def RWTOfRandomNonSmoker(sampleSize):
    return np.random.normal(1.98, 0.25, sampleSize)  # Non-smoker data


# Example: draw one sample of 10 patients per group
# nonSmokerSample, smokerSample = RWTOfRandomNonSmoker(10), RWTOfRandomSmoker(10)
```

```python
from scipy.stats import ttest_ind


# Simulate one experiment and return the p-value of a two-sample t-test.
# Each call draws new random samples, so every run gives a different p-value.
def experiment(sampleSize):
    nonSmokers = RWTOfRandomNonSmoker(sampleSize)
    smokers = RWTOfRandomSmoker(sampleSize)
    # SciPy's default is Student's t-test (equal_var=True); equal_var=False
    # would run Welch's test, which does not assume equal SDs
    statistic, pvalue = ttest_ind(nonSmokers, smokers)
    return pvalue


# Conduct the experiment with a sample size of 10 in each group
experiment(10)
```

0.7473733290888769

```python
# Run the experiment 100 times, each time with a random sample size between
# 5 and 100 per group, and keep only the significant results (p <= 0.05)
def manyExperiments(nRuns=100):
    pvalues = []
    sampleSizes = []
    for _ in range(nRuns):
        # Draw a random sample size for this iteration
        sampleSize = random.randint(5, 100)
        pvalue = experiment(sampleSize)
        # Store each significant p-value and the sample size that produced it
        if pvalue <= 0.05:
            pvalues.append(pvalue)
            sampleSizes.append(sampleSize)
    return pvalues, sampleSizes


pvalues, sampleSizes = manyExperiments()

# Smallest sample size that produced a significant p-value in this run
print(f"The minimum sample size is around {min(sampleSizes)}")
```

The minimum sample size is around 5
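The loop above keeps only the significant p-values from 100 runs with random sample sizes and reports the smallest size among them. A p-value below 0.05 can occur by chance at almost any sample size, so that minimum mostly reflects luck rather than how many patients the study needs.

Steps 4 and 5 of the instructions instead ask for 100 replicates at each sample size and an aggregate per size. A natural aggregate is the proportion of replicates with p < 0.05, i.e. the estimated power. This sketch makes three choices:

* It assumes a target of 80% power, which the medical team did not specify.
* It draws data from a seeded NumPy `Generator` instead of `RWTOfRandomSmoker`/`RWTOfRandomNonSmoker`, so the block runs on its own.
* It uses `ttest_ind(..., axis=1)` to test all replicates of one sample size at once.

`simulate_pvalues` and `TARGET_POWER` are our own names.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2023)  # seeded for reproducibility

N_REPLICATES = 100                # replicates per sample size (step 4)
ALPHA = 0.05                      # significance threshold
TARGET_POWER = 0.80               # ASSUMPTION: required share of significant replicates
sample_sizes = np.arange(5, 101)  # 5 to 100 patients per group


def simulate_pvalues(n, n_replicates, rng):
    """p-values of `n_replicates` t-tests, each comparing n non-smokers with n smokers."""
    non_smokers = rng.normal(1.98, 0.25, size=(n_replicates, n))
    smokers = rng.normal(2.12, 0.18, size=(n_replicates, n))
    # axis=1: one Student's t-test per row, i.e. per replicate
    return ttest_ind(non_smokers, smokers, axis=1).pvalue


# Proportion of significant replicates (estimated power) for each sample size
power = np.array(
    [np.mean(simulate_pvalues(n, N_REPLICATES, rng) < ALPHA) for n in sample_sizes]
)

reaching_target = sample_sizes[power >= TARGET_POWER]
if reaching_target.size > 0:
    print(f"Smallest n per group with estimated power >= {TARGET_POWER}: {reaching_target[0]}")
else:
    print("No sample size between 5 and 100 reached the target power")
```

Plotting `power` against `sample_sizes` (for example with `matplotlib.pyplot.plot`) shows how the share of significant results grows with the sample size. With 100 replicates per size the curve is noisy, so the first size that crosses the target can move by a few patients with a different seed. More replicates give a steadier answer.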
