# Simulation of a simple example of a replication crisis for a scientific method
### Michael Cohen

In this simple simulation, we have a method for testing hypotheses (called hypothsis_testing_method).
If a hypothesis is true, the method will say so.
If a hypothesis is in fact false, the method will still say it is true with a low probability (called my_p_value).
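As a quick sanity check of the behaviour described above, the sketch below applies the same rule to many false hypotheses and measures how often it wrongly answers True. That fraction should come out close to my_p_value, while true hypotheses are always detected. The function name `hypothesis_testing_method_sketch` is hypothetical and not part of the original notebook. It takes a seeded `random.Random` instance so the result is reproducible.

```python
import random

my_p_value = 0.04  # same false-positive probability as in the notebook


def hypothesis_testing_method_sketch(hypothesis, rng):
    """Hypothetical re-implementation of the notebook's method.

    True hypotheses are always reported as True; false ones are
    (wrongly) reported as True with probability my_p_value.
    """
    if hypothesis:
        return True
    return rng.random() < my_p_value


rng = random.Random(0)  # fixed seed so the check is reproducible
trials = 100_000

# Feed only false hypotheses: every True answer is a false positive.
false_positives = sum(hypothesis_testing_method_sketch(False, rng) for _ in range(trials))
# Feed only true hypotheses: the method should never miss one.
true_detections = sum(hypothesis_testing_method_sketch(True, rng) for _ in range(1_000))

print("false-positive rate:", false_positives / trials)
print("detected true hypotheses:", true_detections, "of 1000")
```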

Nevertheless, if the vast majority of our hypotheses are in fact false (determined by setting rate_of_true_hypothses to a low value), then many of the hypotheses that our method says are true will in fact be false. To see the exact values, run the cell below.
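The expected result can also be worked out by hand with Bayes' theorem. Writing $\pi$ for rate_of_true_hypothses and $\alpha$ for my_p_value, and using the fact that the method always says true for a true hypothesis, the probability that a hypothesis is true given that the method says so is:

$$
P(\text{true} \mid \text{method says true})
= \frac{\pi \cdot 1}{\pi \cdot 1 + (1 - \pi)\,\alpha}
= \frac{0.038}{0.038 + 0.962 \times 0.04}
= \frac{0.038}{0.07648}
\approx 0.497
$$

So with these settings, roughly half of the hypotheses the method accepts are expected to be false, even though the method errs on only 4% of the false hypotheses it sees.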

This simulation demonstrates how a scientific method that seems to perform very well (in terms of its recall and p-value) can still produce many false positives if most of the hypotheses it tests are in fact false. If this sounds to you like the base rate fallacy, you are on the right track!
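To see the base rate effect directly, the sketch below applies Bayes' theorem for a fixed method (false-positive rate 0.04, never missing a true hypothesis) across several base rates of true hypotheses. The function name `probability_true_given_positive` and its `recall` parameter are hypothetical illustrations; `recall` defaults to 1.0 because that is how the notebook's method behaves.

```python
def probability_true_given_positive(base_rate, false_positive_rate, recall=1.0):
    """Bayes' theorem: P(hypothesis is true | method says true).

    Hypothetical helper; recall defaults to 1.0 because the notebook's
    method never misses a true hypothesis.
    """
    true_positive_mass = base_rate * recall
    false_positive_mass = (1 - base_rate) * false_positive_rate
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Same method quality each time; only the share of true hypotheses changes.
base_rates = (0.5, 0.2, 0.1, 0.038, 0.01)
results = [probability_true_given_positive(rate, 0.04) for rate in base_rates]

for rate, ppv in zip(base_rates, results):
    print(f"base rate {rate:>5}: P(true | says true) = {ppv:.3f}")
```

The method itself never changes, yet the share of its positive answers that are correct drops from about 0.96 at a 50% base rate to about 0.5 at the notebook's 3.8%.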

For further information, see the example mentioned in this Veritasium video (minute 2:30 onwards):
[Is Most Published Research Wrong?](https://www.youtube.com/watch?v=42QuXLucH3Q)

For the connection between the base rate fallacy and the replication crisis, see for instance:

[Understanding the Replication Crisis as a Base Rate Fallacy](https://www.journals.uchicago.edu/doi/full/10.1093/bjps/axy051)


```python
import random


# Probability that our method says a hypothesis is true when it is in fact false
# (strictly speaking a false-positive rate / significance level rather than a p-value).
my_p_value = 0.04  # value as in the tutorial exercise


# This function takes a boolean (the actual truth value of a given hypothesis)
# and returns a boolean (the truth value predicted by the method).
def hypothsis_testing_method(hypothesis):
    # Whenever the hypothesis is true, the method predicts so.
    if hypothesis:
        return True
    # Whenever the hypothesis is false, the method wrongly says it is true
    # with probability my_p_value (and correctly says false otherwise).
    return random.random() < my_p_value


# The number of hypotheses we are going to check
hypothses_num = 1300

# The rate of true hypotheses among them
rate_of_true_hypothses = 0.038  # as in the tutorial exercise

# Counters for the experiment
true_hypothses = 0
false_hypothses = 0
method_predicted_true = 0
method_predicted_false = 0
true_positives = 0  # hypotheses that are true AND predicted true

for _ in range(hypothses_num):
    # Decide whether the current hypothesis is true or false
    current_hypothesis = random.random() < rate_of_true_hypothses

    if current_hypothesis:
        true_hypothses += 1
    else:
        false_hypothses += 1

    # Test the current hypothesis with our method
    if hypothsis_testing_method(current_hypothesis):
        method_predicted_true += 1
        if current_hypothesis:
            true_positives += 1
    else:
        method_predicted_false += 1

print("Our method predicted true", method_predicted_true, "times")
print("The hypotheses were in fact true", true_hypothses, "times")
print("The probability that a hypothesis is in fact true if the method says so is")
# Precision: true positives divided by everything the method called true
if method_predicted_true > 0:
    print(true_positives / method_predicted_true)
else:
    print("undefined (the method never said true)")
```

```
Our method predicted true 114 times
The hypotheses were in fact true 64 times
The probability that a hypothesis is in fact true if the method says so is
0.5614035087719298
```
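A single run of 1300 hypotheses is noisy, and since the cell above sets no random seed, rerunning it gives different counts. The sketch below repeats the same experiment many times with a seeded generator and pools the counts, so the proportion should settle near the value 0.038 / (0.038 + 0.962 × 0.04) ≈ 0.497 given by Bayes' theorem rather than the 0.56 of one particular run. The helper name `run_experiment`, the seed, and the number of runs are hypothetical choices for illustration.

```python
import random


def run_experiment(rng, hypotheses_num=1300, rate_of_true=0.038, p_value=0.04):
    """One run of the notebook's experiment; returns (true_positives, predicted_true)."""
    true_positives = 0
    predicted_true = 0
    for _ in range(hypotheses_num):
        is_true = rng.random() < rate_of_true
        # Same rule as the notebook: true -> always True, false -> True w.p. p_value
        says_true = is_true or rng.random() < p_value
        if says_true:
            predicted_true += 1
            if is_true:
                true_positives += 1
    return true_positives, predicted_true


rng = random.Random(42)  # seeded so the sketch is reproducible
runs = 500
total_tp = 0
total_predicted = 0
for _ in range(runs):
    tp, predicted = run_experiment(rng)
    total_tp += tp
    total_predicted += predicted

pooled = total_tp / total_predicted
print(f"pooled P(true | method says true) over {runs} runs: {pooled:.3f}")
```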
