<a href="https://colab.research.google.com/github/microprediction/monteprediction_colab_examples/blob/main/monteprediction_entry.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Imports
Just run this. No need to modify.

In [11]:
!pip install scikit-learn
!pip install scipy
!pip install --upgrade monteprediction

import yfinance as yf
import pandas as pd
from scipy.stats.qmc import MultivariateNormalQMC
import numpy as np
import json
import sys
from datetime import datetime, timedelta
import time
from monteprediction import SPDR_ETFS
from monteprediction.calendarutil import get_last_wednesday
from monteprediction.submission import send_in_chunks

# Factory defaults
num_samples_per_chunk = 1048576 // 8   # 2**20 samples in total, split evenly across the chunks
num_chunks = 8
num_samples = num_chunks*num_samples_per_chunk

## Step 1. Create a dataframe with just over one million hypothetical weekly returns for each sector.   

Do this however you like; the code below is just an example. The dataframe needs one column per sector.

In [14]:
# This example uses Quasi-Monte Carlo on the empirical covariance
# There is absolutely no requirement you follow this pattern

last_wednesday = get_last_wednesday()
num_weeks = int(52+4*52*np.random.rand())
start_date = last_wednesday - timedelta(weeks=num_weeks)
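# auto_adjust=False keeps the 'Adj Close' column (recent yfinance releases default to auto_adjust=True, which drops it)
data = yf.download(SPDR_ETFS, start=start_date, end=last_wednesday, interval="1wk", auto_adjust=False)
weekly_prices = data['Adj Close'][SPDR_ETFS]   # Fix the column order so the covariance matches SPDR_ETFS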
weekly_returns = weekly_prices.pct_change().dropna()
from sklearn.covariance import EmpiricalCovariance         # See sklearn for many alternatives
cov_matrix = EmpiricalCovariance().fit(weekly_returns).covariance_
qmc_engine = MultivariateNormalQMC(mean=np.zeros(len(SPDR_ETFS)), cov=cov_matrix)
samples = qmc_engine.random(num_samples)
df = pd.DataFrame(columns=SPDR_ETFS, data = samples)
print(df[:3])

# Verify submission
assert len(df.index)==num_samples,f'Expecting exactly {num_samples} samples'
assert list(df.columns)==SPDR_ETFS,'Columns should match SPDR_ETFS in order'


[*********************100%%**********************]  11 of 11 completed


        XLB       XLC       XLE       XLF       XLI       XLK       XLP  \
0  0.028453  0.004129  0.066391  0.032033  0.015521 -0.002501  0.011955   
1 -0.020548 -0.012266 -0.063999 -0.054679 -0.027622  0.005228 -0.014539   
2 -0.007326 -0.016351 -0.000677  0.014795  0.015111  0.003505  0.032404   

       XLRE       XLU       XLV       XLY  
0  0.038596  0.006206  0.008662  0.020514  
1 -0.017256 -0.016046 -0.010144 -0.026960  
2 -0.004032  0.034872  0.023066 -0.010453  


## Step 2. Submit the dataframe

In [15]:
YOUR_EMAIL = 'monteprediction_entry@monteprediction.com'  # Be sure to change this
send_in_chunks(df, num_chunks=num_chunks, email=YOUR_EMAIL)

Chunk 0 of 8 sent successfully.
Chunk 1 of 8 sent successfully.
Chunk 2 of 8 sent successfully.
Chunk 3 of 8 sent successfully.
Chunk 4 of 8 sent successfully.
Chunk 5 of 8 sent successfully.
Chunk 6 of 8 sent successfully.
Chunk 7 of 8 sent successfully.


### Just for interest...
Here's how the reward system works. Your samples $\{x_{ik}\}_{k=0}^{n-1}$ (participant $i$, sample $k$, with $n$ samples in total) are used to imply a prediction density:

$$\rho_i(z) = \frac{1}{n} \sum_{k=0}^{n-1} \exp(-a \|x_{ik}-z \|_2) $$

We don't need to worry about the normalizing constant. $a$ is a system parameter set at approximately $a=300$.
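
As a rough illustration of the density formula, here is a minimal NumPy sketch that evaluates $\rho_i(z)$ on a toy two-dimensional sample. The function name `kernel_density_score` and the toy numbers are hypothetical, and this is not the `monteprediction` scoring code, which may normalize or scale differently.

```python
import numpy as np

def kernel_density_score(samples, z, a=300.0):
    """Unnormalized density implied by `samples` at the point `z`.

    Hypothetical helper transcribing the formula above; a=300 is the
    approximate system parameter quoted in the text.
    """
    samples = np.asarray(samples, dtype=float)
    z = np.asarray(z, dtype=float)
    # Euclidean distance from every sample (row) to the realized outcome z
    distances = np.linalg.norm(samples - z, axis=1)
    # Average of the exponential kernel over the n samples
    return float(np.mean(np.exp(-a * distances)))

# Toy data (assumption): three samples in two dimensions, truth at the origin
toy_samples = np.array([[0.00, 0.00],
                        [0.01, 0.00],
                        [0.00, 0.02]])
z = np.array([0.0, 0.0])
rho = kernel_density_score(toy_samples, z)
print(rho)
```

With $a=300$, a sample that misses the truth by 0.01 contributes only about $e^{-3}$, so only samples very close to $z$ add much to the density.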

Suppose you have an initial wealth $W_i$. A system parameter $b_i=0.1$ sets the fraction of your total wealth you deploy. Your investment is therefore $Q_i = b_i W_i$, and similarly for other participants, yielding a total investment of $Q = \sum_i Q_i$. This pot is split when the truth $z$ is revealed.

To this end, your 'mass' is $m_i(z) = b_i W_i \rho_i(z)$ and the total mass is $m(z) = \sum_i m_i(z)$. Your payout is your density-share of total investment, namely the amount $Q \frac{m_i(z)}{m(z)}$. Your net profit is $\delta_i(z) = Q \frac{m_i(z)}{m(z)} - b_i W_i$.
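
The payout rule can be sketched in a few lines of plain Python. The function `net_profits`, the three wealths, and the three densities below are made-up illustrations rather than anything taken from the `monteprediction` package; the sketch assumes every participant uses the same fraction $b=0.1$.

```python
def net_profits(wealths, densities, b=0.1):
    """Net profit per participant under the density-share rule described above.

    Hypothetical helper: wealths[i] is W_i, densities[i] is rho_i(z),
    and b is the fraction of wealth each participant deploys.
    """
    stakes = [b * w for w in wealths]                        # Q_i = b * W_i
    pot = sum(stakes)                                        # Q = sum of all stakes
    masses = [s * rho for s, rho in zip(stakes, densities)]  # m_i(z) = Q_i * rho_i(z)
    total_mass = sum(masses)                                 # m(z)
    if total_mass == 0:
        raise ValueError("total mass is zero, so the pot cannot be split")
    # Payout is Q * m_i / m; net profit subtracts the stake
    return [pot * m / total_mass - s for m, s in zip(masses, stakes)]

# Toy numbers (assumption): three participants
wealths = [100.0, 100.0, 200.0]
densities = [0.06, 0.02, 0.01]
profits = net_profits(wealths, densities)
print(profits)
```

In this toy case the stakes are 10, 10 and 20, and the profits come out at roughly 14, -2 and -12. They sum to zero because the pot is only redistributed, and the wealthiest participant loses the most because their density at $z$ is the lowest.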




In [16]:
# The 'score' is your density
#    distances = np.linalg.norm(samples - z, axis=1)
#    score = np.sum(np.exp(-h * distances))

from monteprediction.truth import get_most_recent_truth
from monteprediction.scoring import compute_score
z = get_most_recent_truth()
score = compute_score(samples=df.values,z=z)
print(f"Total Score: {score}")


[*********************100%%**********************]  11 of 11 completed


Total Score: 0.06990740503342224
