<a href="https://colab.research.google.com/github/microprediction/monteprediction_colab_examples/blob/main/monteprediction_entry.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sector Monte Carlo Game

A game in which you submit roughly one million Monte Carlo samples of weekly sector ETF returns and are rewarded according to how many of your samples land close to the realized returns, relative to how many of your competitors' samples do (precise details below). Challenge yourself against other participants wielding what they claim to be SOTA methods for modeling joint distributions. You can enter merely by modifying this example notebook and running it. You can follow the same rough pattern or devise an entirely different method of generating representative samples of the returns of the eleven sector ETFs, measured close to close over a one-week period.



In [None]:
!pip install scikit-learn
!pip install scipy
!pip install --upgrade monteprediction

import yfinance as yf
import pandas as pd

import numpy as np
from datetime import datetime, timedelta
import time
from monteprediction import SPDR_ETFS
from monteprediction.calendarutil import get_last_wednesday
from monteprediction.submission import send_in_chunks

# Factory defaults
num_samples_per_chunk = int(1048576/8)
num_chunks = 8
num_samples = num_chunks*num_samples_per_chunk

## Step 1. Create a dataframe with just over one million hypothetical weekly returns for each sector.   

Do this however you like; this is just an example. Use one column per sector.

In [None]:
# This example uses Quasi-Monte Carlo on the empirical covariance
# There is absolutely no requirement you follow this pattern

from scipy.stats.qmc import MultivariateNormalQMC
from sklearn.covariance import EmpiricalCovariance

# Get historical weekly returns
last_wednesday = get_last_wednesday()
num_weeks = int(52+4*52*np.random.rand())
start_date = last_wednesday - timedelta(weeks=num_weeks)
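# Request unadjusted data explicitly so the 'Adj Close' column is present regardless of the yfinance default
data = yf.download(SPDR_ETFS, start=start_date, end=last_wednesday, interval="1wk", auto_adjust=False)
weekly_prices = data['Adj Close'][SPDR_ETFS]  # Keep columns in the same order as SPDR_ETFS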
weekly_returns = weekly_prices.pct_change().dropna()

# Use cov estimation to generate samples
cov_matrix = EmpiricalCovariance().fit(weekly_returns).covariance_
qmc_engine = MultivariateNormalQMC(mean=np.zeros(len(SPDR_ETFS)), cov=cov_matrix)
samples = qmc_engine.random(num_samples)
df = pd.DataFrame(columns=SPDR_ETFS, data = samples)
print(df[:3])

# Verify submission
assert len(df.index)==num_samples,f'Expecting exactly {num_samples} samples'
assert list(df.columns)==SPDR_ETFS,'Columns should match SPDR_ETFS in order'


[*********************100%%**********************]  11 of 11 completed


        XLB       XLC       XLE       XLF       XLI       XLK       XLP  \
0  0.028453  0.004129  0.066391  0.032033  0.015521 -0.002501  0.011955   
1 -0.020548 -0.012266 -0.063999 -0.054679 -0.027622  0.005228 -0.014539   
2 -0.007326 -0.016351 -0.000677  0.014795  0.015111  0.003505  0.032404   

       XLRE       XLU       XLV       XLY  
0  0.038596  0.006206  0.008662  0.020514  
1 -0.017256 -0.016046 -0.010144 -0.026960  
2 -0.004032  0.034872  0.023066 -0.010453  


## Step 2. Submit the dataframe

In [None]:
YOUR_EMAIL = 'monteprediction_entry@monteprediction.com'  # Be sure to change this
send_in_chunks(df, num_chunks=num_chunks, email=YOUR_EMAIL)

Chunk 0 of 8 sent successfully.
Chunk 1 of 8 sent successfully.
Chunk 2 of 8 sent successfully.
Chunk 3 of 8 sent successfully.
Chunk 4 of 8 sent successfully.
Chunk 5 of 8 sent successfully.
Chunk 6 of 8 sent successfully.
Chunk 7 of 8 sent successfully.


# Explaining the game ...
Here's how the reward system works, assuming you are participant $i$. Your samples $\{x_{ik}\}_{k=0}^{n-1}$, where $n=2^{20}$ and each $x_{ik}$ is an 11-vector, imply an unnormalized prediction density for $z \in \mathbf{R}^{11}$:

$$\rho_i(z) = \frac{1}{n} \sum_{k=0}^{n-1} \exp(-a \|x_{ik}-z \|_2) $$

where $a$ is a system parameter set at approximately $a=300$.

Suppose you have an initial wealth $W_i$. A system parameter $b_i=0.1$ sets the fraction of your total wealth that you deploy. You are considered to invest $\Omega_i = b_i W_i$, and similarly for the other participants, yielding a total investment of $\Omega = \sum_i \Omega_i$. This pot is split when the truth $z$ is revealed.

To this end, your 'mass' $Q_i(z) = \Omega_i \rho_i(z)$ represents, loosely, how many of your samples are close to $z$, weighted by your investment. The total mass near $z$ supplied by all participants is $Q(z) = \sum_i Q_i(z)$. Your payout is your proportional share of the total investment, namely $\Omega \frac{Q_i(z)}{Q(z)}$. Your net profit is $\delta_i(z) = \Omega \frac{Q_i(z)}{Q(z)} - \Omega_i$.
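
As a rough illustration of the payout arithmetic above, the sketch below computes $\rho_i(z)$, $Q_i(z)$ and $\delta_i(z)$ for two made-up participants. The function names `density` and `net_profits`, the synthetic sample sets and the wealth values are assumptions chosen for illustration and are not part of the `monteprediction` package. The real game uses $2^{20}$ samples per participant.

```python
import numpy as np

def density(samples, z, a=300.0):
    """Unnormalized density rho_i(z): mean of exp(-a * ||x_ik - z||_2)."""
    distances = np.linalg.norm(samples - z, axis=1)
    return np.mean(np.exp(-a * distances))

def net_profits(sample_sets, wealths, z, a=300.0, b=0.1):
    """Net profit delta_i(z) for every participant, given the revealed truth z."""
    omega = b * np.asarray(wealths, dtype=float)             # Omega_i = b_i * W_i
    rho = np.array([density(s, z, a) for s in sample_sets])  # rho_i(z)
    q = omega * rho                                          # Q_i(z)
    payouts = omega.sum() * q / q.sum()                      # Omega * Q_i(z) / Q(z)
    return payouts - omega

# Hypothetical data: the truth is the zero vector, one participant
# concentrates samples near it and the other spreads them widely.
rng = np.random.default_rng(0)
z = np.zeros(11)
tight = 0.01 * rng.standard_normal((4096, 11))
wide = 0.05 * rng.standard_normal((4096, 11))

profits = net_profits([tight, wide], wealths=[100.0, 100.0], z=z)
print(profits)
```

Because the payouts always add up to $\Omega$, the net profits sum to zero: whatever one participant gains, the others lose.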

It should be apparent that $Q$ plays the role of an unnormalized market probability (i.e. risk-neutral density) and further, that a participant with perfect knowledge of the true density $P$ will at worst break even against any opponents' play, subject only to the ability to approximate $P$ with a collection of Monte Carlo paths in this fashion.

Because this entry has been included in the mix, and is not particularly clever, there is a subsidy for participation for anyone taking even a moment to reflect statistically upon the problem (for instance, by applying shrinkage to the covariance estimation or fixing the 1-margins).


In [None]:
# The 'score' is your density
#    distances = np.linalg.norm(samples - z, axis=1)
#    score = np.sum(np.exp(-h * distances))

from monteprediction.truth import get_most_recent_truth
from monteprediction.scoring import compute_score
z = get_most_recent_truth()
score = compute_score(samples=df.values,z=z)
print(f"Total Score: {score}")


[*********************100%%**********************]  11 of 11 completed


Total Score: 0.06990740503342224


### Some suggestions

Ask GPT!

*   It will send you to LedoitWolf or ShrunkCovariance from [sklearn.covariance](https://scikit-learn.org/stable/modules/covariance.html)
*   Feel free to use covariance estimation methods from [precise](https://github.com/microprediction/precise/tree/main/precise/skaters/covariance)
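
As one possible starting point, the sketch below fits a Ledoit-Wolf shrinkage estimator from `sklearn.covariance`. The array `fake_returns` is synthetic stand-in data (an assumption made so the snippet runs on its own); in the notebook you would pass `weekly_returns` instead.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
# Synthetic stand-in for weekly_returns: 60 weeks of returns for 11 sectors
fake_returns = 0.02 * rng.standard_normal((60, 11))

# Ledoit-Wolf picks the shrinkage intensity from the data
lw = LedoitWolf().fit(fake_returns)
shrunk_cov = lw.covariance_
print(f"Shrinkage intensity: {lw.shrinkage_:.3f}")
```

The resulting `shrunk_cov` could then be passed as `cov=` to `MultivariateNormalQMC` in place of the empirical covariance used in Step 1.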


Copula approach:


* For each return series in `weekly_returns`, fit a marginal distribution. This could be any distribution that fits your data well (e.g., normal or t-distribution). Use techniques such as Maximum Likelihood Estimation (MLE) or Kernel Density Estimation (KDE) for the fitting.

* Apply the cumulative distribution function (CDF) of each fitted marginal distribution to transform your data into uniform marginals.

* Choose an appropriate copula to model the dependence structure. Common choices include Gaussian, t, Clayton, Gumbel, and Frank copulas. The choice depends on the nature of the dependence you're modeling (e.g., tail dependence). You may need to test different copulas to see which one fits your data best.

* Fit the selected copula to the uniform marginals. This usually involves estimating the parameters that best capture the dependence structure in your data. Techniques such as Maximum Likelihood or inversion of Kendall's tau can be used for parameter estimation.

* Once the copula is fitted, use it to generate new samples of uniform marginals that maintain the dependence structure of your original data. Transform these uniform marginals back to the original scale using the inverse CDFs of the marginal distributions you fitted initially. These transformed values are your simulated data points.

Python libraries like scipy, statsmodels, or copulas can be useful.
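
The copula recipe above can be sketched with `numpy` and `scipy` alone. This is a minimal illustration under simplifying assumptions, not a recommended entry: it uses empirical (rank-based) marginals instead of fitted parametric ones, a Gaussian copula, and synthetic heavy-tailed data in place of `weekly_returns`. The function name `gaussian_copula_samples` is hypothetical.

```python
import numpy as np
from scipy import stats

def gaussian_copula_samples(returns, num_samples, seed=None):
    """Sample from a Gaussian copula with empirical marginals."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n_obs, n_assets = returns.shape

    # Empirical CDF via ranks: pseudo-uniform marginals strictly inside (0, 1)
    u = stats.rankdata(returns, axis=0) / (n_obs + 1)

    # Fit the Gaussian copula: correlation matrix of the normal scores
    normal_scores = stats.norm.ppf(u)
    corr = np.corrcoef(normal_scores, rowvar=False)

    # Sample the copula and map the draws back to uniforms
    g = rng.multivariate_normal(np.zeros(n_assets), corr, size=num_samples)
    u_new = stats.norm.cdf(g)

    # Invert each marginal through its empirical quantile function
    return np.column_stack(
        [np.quantile(returns[:, j], u_new[:, j]) for j in range(n_assets)]
    )

# Synthetic stand-in for weekly_returns: 200 weeks, 11 sectors, heavy tails
rng = np.random.default_rng(0)
fake_returns = 0.02 * rng.standard_t(df=4, size=(200, 11))
sims = gaussian_copula_samples(fake_returns, num_samples=1024, seed=1)
print(sims.shape)
```

One limitation of this design: empirical quantiles never produce values outside the historical range, so fitted parametric marginals (for example a t-distribution) may be preferable if you want the samples to extend into the tails.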
