
In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Assumptions in `simulate_market`:
- **Participant Distribution**: Assumes that 10% of participants (`int(n * 0.1)`, rounded down) are 'strategic', meaning they may have better information or models; this is reflected in their smaller biases and lower variability. For `n` below 10, no participant is strategic.
- **Biases**:
  - Strategic participants have biases uniformly distributed between -0.1 and 0.1, suggesting they are less biased.
  - Normal participants have biases uniformly distributed within a wider range (-0.5 to 0.5), indicating greater variability in their judgment or information.
- **Standard Deviations**:
  - Strategic participants have scale parameters uniformly distributed between 0.1 and 0.5, implying more precise predictions.
  - Normal participants have scale parameters uniformly distributed between 0.5 and 2, indicating less precision in their predictions.
  - These values multiply a standard t variate, so they act as scale parameters rather than true standard deviations: with 5 degrees of freedom, the resulting standard deviation is about 1.29 times the scale (the square root of 5/3).
- **Prediction Distribution**: Each prediction is the true value plus the participant's bias plus a scaled draw from a heavy-tailed Student's t-distribution, which makes extreme predictions more likely than a normal distribution would. The degrees of freedom parameter (set to 5) controls the fatness of the tails: lower values give heavier tails.
- **Market Consensus**: The market consensus is the median of all predictions, which is robust to outliers, so a few extreme predictions cannot sway it disproportionately.
- **Error Measurement**: A trial counts as accurate when the absolute difference between the market consensus and the true value is at most `desired_accuracy` (default 0.05). This is an absolute tolerance in the units of the true value, not a percentage: with `true_value=100`, 0.05 corresponds to 0.05% of the true value.
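
One detail of the prediction step described above is easy to miss: a Student's t variate with 5 degrees of freedom has variance 5/3, so multiplying it by a value `s` gives a standard deviation of about `1.29 * s` rather than `s`. The following is a minimal sketch of that relationship. `effective_sd` is a hypothetical helper that is not part of the notebook, and the seed and sample size are arbitrary assumptions.

```python
import numpy as np


def effective_sd(scale, degrees_of_freedom):
    """Standard deviation of scale * T, where T is Student's t (needs df > 2)."""
    if degrees_of_freedom <= 2:
        raise ValueError("variance is undefined or infinite for df <= 2")
    return scale * np.sqrt(degrees_of_freedom / (degrees_of_freedom - 2))


# Empirical check with an arbitrary seed and sample size (illustrative only).
rng = np.random.default_rng(0)
draws = rng.standard_t(5, size=200_000) * 0.5  # 0.5 is the lower bound used for normal participants
empirical_sd = draws.std()
theoretical_sd = effective_sd(0.5, 5)
print(f"empirical: {empirical_sd:.3f}, theoretical: {theoretical_sd:.3f}")
```

If the ranges are meant to be true standard deviations, one option would be to divide each draw by `sqrt(df / (df - 2))` before scaling; whether that matches the intended model is a judgment call.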

### Assumptions in `find_minimum_participants`:
- **Statistical Power**: At least 80% of trials must achieve the desired accuracy for the market to be considered reliable. The 80% figure is borrowed from the conventional target for statistical power; here it is simply the required success rate across simulated trials, hardcoded as 0.8.
- **Maximum Participants**: Caps the search at `max_participants` (5000 by default), so larger markets are not explored. If no participant count within the cap reaches the 80% success rate, the function returns `None`.
- **Iteration Through Participants**: The function increases the number of participants one at a time, starting from 1, runs `trials` simulated markets (1000 by default) for each count, and returns the first count at which at least 80% of trials are accurate. This linear scan assumes the first crossing is a good estimate of the threshold. Because each success rate is a noisy Monte Carlo estimate and no random seed is set, repeated runs can return somewhat different values.
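
To get a feel for how noisy a success rate estimated from 1000 trials is, the sketch below computes a Wilson score interval for a binomial proportion. This is a standard interval, not something the notebook uses; `wilson_interval` is a hypothetical helper name, and the 95% level (`z=1.96`) is an assumption.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Uncertainty around an observed success rate of exactly 80% in 1000 trials.
low, high = wilson_interval(successes=800, trials=1000)
print(f"95% interval for 800/1000 successes: ({low:.3f}, {high:.3f})")
```

An observed rate of 80% is compatible with true rates a couple of percentage points either side, so the first participant count that crosses the threshold in a linear scan may differ from run to run.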

### General Assumptions:
- **Independence**: Assumes that each participant's prediction is independent of others', not accounting for potential correlations or information cascades that might occur in real prediction markets.
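- **Static Parameters**: The sampling ranges for biases and standard deviations, the share of strategic participants, and the degrees of freedom are fixed throughout the simulations; individual biases and standard deviations are redrawn independently in every trial. The model does not account for changes in market dynamics over time or under different conditions.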
- **Uniform Distribution of Parameters**: The choice of uniform distribution for biases and standard deviations assumes equal likelihood of all values within the range, which may not accurately reflect real-world distributions.
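
The independence assumption matters because a median can only average away errors that differ between participants. The sketch below is a deliberately simplified illustration, not the notebook's model: every prediction shares one `common_shock` (a hypothetical parameter standing in for correlated information), and the idiosyncratic noise is an unscaled t variate with 5 degrees of freedom.

```python
import numpy as np


def consensus_error(n, common_shock, rng, degrees_of_freedom=5):
    """Absolute error of the median when every prediction shares `common_shock`."""
    true_value = 100.0
    idiosyncratic = rng.standard_t(degrees_of_freedom, size=n)
    predictions = true_value + common_shock + idiosyncratic
    return abs(np.median(predictions) - true_value)


# Arbitrary seed; compare independent errors with errors that share a shock of 0.5.
rng = np.random.default_rng(0)
for n in (101, 1001, 10001):
    independent = consensus_error(n, common_shock=0.0, rng=rng)
    correlated = consensus_error(n, common_shock=0.5, rng=rng)
    print(f"n={n:>5}: independent error={independent:.3f}, shared-shock error={correlated:.3f}")
```

With independent errors the consensus error shrinks as `n` grows, while with a shared shock it stays close to the size of the shock no matter how many participants are added.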

In [None]:
def simulate_market(n, true_value, desired_accuracy=0.05, bias_range=(-0.5, 0.5), sd_range=(0.5, 2), degrees_of_freedom=5):
    """
    Simulate a prediction market with n participants using a heavy-tailed t-distribution for predictions.
    Introduce a subset of 'strategic' participants who have lower variability and potential informational advantages.
    """
    # Assuming the top 10% of participants are 'strategic'
    strategic_count = int(n * 0.1)
    normal_count = n - strategic_count

    # Strategic participants have smaller biases and variances
    strategic_biases = np.random.uniform(-0.1, 0.1, strategic_count)
    strategic_sds = np.random.uniform(0.1, 0.5, strategic_count)
    strategic_predictions = np.random.standard_t(degrees_of_freedom, strategic_count) * strategic_sds + strategic_biases + true_value

    # Normal participants
    normal_biases = np.random.uniform(bias_range[0], bias_range[1], normal_count)
    normal_sds = np.random.uniform(sd_range[0], sd_range[1], normal_count)
    normal_predictions = np.random.standard_t(degrees_of_freedom, normal_count) * normal_sds + normal_biases + true_value

    # Combine predictions
    predictions = np.concatenate((strategic_predictions, normal_predictions))
    market_consensus = np.median(predictions)

    return abs(market_consensus - true_value) <= desired_accuracy

def find_minimum_participants(true_value, trials=1000, max_participants=5000, desired_accuracy=0.05):
    """
    Find the minimum number of participants needed for the prediction market to achieve the desired accuracy.
    Includes strategic behavior and heavy-tailed prediction distributions.
    Returns None if no participant count up to max_participants is reliable.
    """
    # range() excludes its stop value, so add 1 to include max_participants itself
    for n in range(1, max_participants + 1):
        # Count the trials in which the market consensus lands within desired_accuracy
        successful_trials = sum(simulate_market(n, true_value, desired_accuracy) for _ in range(trials))
        if successful_trials / trials >= 0.8:  # 80% of the trials must be successful to be considered reliable
            return n
    return None

In [None]:
minimum_participants = find_minimum_participants(true_value=100, max_participants=1000)
print(f"Minimum participants needed: {minimum_participants}")

Minimum participants needed: 779


In [None]:
minimum_participants = find_minimum_participants(true_value=100, max_participants=1000, desired_accuracy=0.10)
print(f"Minimum participants needed: {minimum_participants}")

Minimum participants needed: 206


In [None]:
minimum_participants = find_minimum_participants(true_value=100, max_participants=10000, desired_accuracy=0.10)
print(f"Minimum participants needed: {minimum_participants}")

Minimum participants needed: 200


In [None]:
minimum_participants = find_minimum_participants(true_value=100, max_participants=50000, desired_accuracy=0.10)
print(f"Minimum participants needed: {minimum_participants}")

Minimum participants needed: 208
