<a href="https://colab.research.google.com/github/microprediction/endersnotebooks/blob/main/regression_attacker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install --upgrade git+https://github.com/microprediction/endersgame.git
# It's probably fine to use the simpler import by the time your read this :)
#!pip install --upgrade endersgame

Collecting git+https://github.com/microprediction/endersgame.git
  Cloning https://github.com/microprediction/endersgame.git to /tmp/pip-req-build-ahgc85_7
  Running command git clone --filter=blob:none --quiet https://github.com/microprediction/endersgame.git /tmp/pip-req-build-ahgc85_7
  Resolved https://github.com/microprediction/endersgame.git to commit 0ccd0e66c6171baa04bafc36886e836b5e4aceb5
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting river (from endersgame==0.4.3)
  Downloading river-0.21.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.0 kB)
Downloading river-0.21.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.1/3.1 MB[0m [31m49.1 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: endersgame
  Building wheel for endersgame (setup.py) ... [?25l[?25hdone
  Created wheel for endersgame: filename=endersgame-0.4.3-py3-none-an

# Regression Attacker
This notebook demonstrates how to create an `Attacker` described in [attacker.md](https://github.com/microprediction/endersgame/blob/main/endersgame/attackers/attacker.md). You may want to glance at this [notebook](https://github.com/microprediction/endersnotebooks/blob/main/mean_reversion_attacker.ipynb) also, if you seek more context or wish to know how these attackers can be used in a new tournament.

Here we'll use the river package to update a running regression.

In [9]:
from endersgame import Attacker, HORIZON, EPSILON
from river import linear_model
from collections import deque
from endersgame import Attacker
from endersgame import stream_generator_generator
from pprint import pprint
from endersgame.accounting.pnlutil import zero_pnl_summary, add_pnl_summaries

### Creating a Momentum based Attacker
We derive from `Attacker` and use `linear_model.LinearRegression` from the river package to maintain a regression estimate of the value `HORIZON` steps ahead. Then, we `buy` if the prediction is considerably higher than `EPSILON` above the current value, and conversely.



In [5]:
class MyAttacker(Attacker):
    """
    An attacker that uses an online linear regression model to predict future values
    and make trading decisions based on the expected profit exceeding EPSILON.
    """

    def __init__(self, num_lags=5, threshold:float=1.0, burn_in=1000, **kwargs):
        """
        Initializes the attacker.

        Parameters:
        - lag (int): Number of lagged values to use as features.
        """
        super().__init__(**kwargs)
        self.num_lags = num_lags                      # Number of lagged values to use as features
        self.model = linear_model.LinearRegression(   # Online linear regression model
            intercept_init=0.0,  # Initialize intercept to 0
            intercept_lr=0.0     # Freeze the intercept (no learning)
        )
        self.input_queue = deque()                    # Queue to store input vectors and time indices
        self.current_ndx = 0                          # Observation index
        self.threshold = threshold
        self.burn_in = burn_in

    def tick(self, x):
        """
        Processes the new data point.

        - Updates the time index.
        - Maintains a queue of input vectors.
        - When the future value arrives after HORIZON steps, updates the model.

        Parameters:
        - x (float): The new data point.
        """
        # The history is maintained by the parent class; no need to call tick_history()

        self.current_ndx += 1
        X_t = self.get_recent_history(n=self.num_lags)
        if len(X_t) >= self.num_lags:
            self.input_queue.append({'ndx': self.current_ndx, 'X': X_t})

        # Check if we can update the model with data from HORIZON steps ago
        while self.input_queue and self.input_queue[0]['ndx'] <= self.current_ndx - HORIZON:
            # Retrieve the input vector and its time index
            past_data = self.input_queue.popleft()
            X_past = past_data['X']

            # The target value y is the data point at time 'time_past + HORIZON'
            # Since we're at 'current_time', and 'current_time = time_past + HORIZON', we can use 'x' as y
            y = x  # Current data point is the target for the input from HORIZON steps ago

            # Prepare the feature dictionary in the form demanded by river package
            X_past_dict = {f'lag_{i}': value for i, value in enumerate(X_past)}

            # Update the model incrementally
            self.model.learn_one(X_past_dict, y)

    def predict(self, horizon=HORIZON):
        """
        Makes a prediction for HORIZON steps ahead and decides whether to buy, sell, or hold.

        Parameters:
        - horizon (int): The prediction horizon (should be HORIZON).

        Returns:
        - int: 1 for buy, -1 for sell, 0 for hold.
        """
        if self.current_ndx < self.burn_in:
            return 0   # Not enough data for model to be reliable

        # Ensure we have enough history to make a prediction
        if len(self.history) >= self.num_lags:
            # Create the input vector using the most recent 'lag' values
            X_t = list(self.history)[-self.num_lags:]
            X_t_dict = {f'lag_{i}': value for i, value in enumerate(X_t)}

            # Predict the future value HORIZON steps ahead
            y_pred = self.model.predict_one(X_t_dict)

            # Get the last known value
            last_value = X_t[-1]

            # Calculate the expected profit
            expected_profit = y_pred - last_value

            # Decide based on whether expected profit exceeds a multiple of EPSILON
            if expected_profit > self.threshold*EPSILON:
                return 1  # Buy
            elif expected_profit < -self.threshold*EPSILON:
                return -1  # Sell
            else:
                return 0  # Hold
        else:
            return 0  # Not enough history to make a prediction


### Explanation

### `tick` Method

The `tick` method processes a new incoming data point and updates the attacker's state accordingly:

- **Increment the Time Index**: The method updates `self.current_ndx` to track the current observation index.
- **Maintain Input History**: It retrieves the recent history of `num_lags` values and appends the new input vector (`X_t`) to the `input_queue`, associating it with the current index.
- **Update the Model**: The method checks if it has received enough future data (after `HORIZON` steps) to use an earlier input vector as a training example. If so, it pairs the input vector from `HORIZON` steps ago with the current data point `x` (used as the target value `y`) and incrementally updates the online regression model.

### `predict` Method

The `predict` method makes a decision based on the model’s prediction for the value `HORIZON` steps ahead:

- **Burn-in Check**: If the number of processed data points is less than the `burn_in` threshold, the model refrains from making predictions.
- **Prepare Input Features**: It checks if there's enough history to form an input vector of `num_lags` values. If there is, it prepares a dictionary of lagged values (`X_t_dict`) to be used by the model.
- **Prediction**: The method predicts the next value `HORIZON` steps ahead using the online regression model.
- **Decision Logic**: It calculates the expected profit by comparing the predicted future value with the last known value. If the expected profit exceeds a threshold (a multiple of `EPSILON`), it returns:
  - `1` (buy) if the profit is positive,
  - `-1` (sell) if the profit is negative,
  - `0` (hold) if the profit is too small to act upon.


## Run the attacker on mock data
We use `tick_and_predict` from the parent class as this will track profit and loss for us.

In [7]:
attacker = MyAttacker()               # Always reset an attacker

xs = [1,3,4,2,4,5,1,5,2,5,10]*100
for x in xs:
   y = attacker.tick_and_predict(x=x)

## Run the attacker on real data
We reset the attacker every time it encounters a new stream, but track aggregate statistics.

In [14]:
gen_gen = stream_generator_generator(category='train')    # <-- You might want to change 'train' to 'test'
attacker = MyAttacker(num_lags=2, threshold=2.0, burn_in=1000)
total_pnl = zero_pnl_summary()
for stream in gen_gen:
    for message in stream:
        attacker.tick_and_predict(x=message['x'])
    stream_pnl = attacker.pnl.summary()
    total_pnl = add_pnl_summaries(total_pnl,stream_pnl)

total_pnl.update({'profit_per_decision':total_pnl['total_profit']/total_pnl['num_resolved_decisions']})
pprint(total_pnl)

{'current_ndx': 19871117,
 'losses': 10143598,
 'num_resolved_decisions': 19788877,
 'profit_per_decision': -0.06400384343235338,
 'total_profit': -1266564.1852100987,
 'wins': 9645279}


And that's all we have. Again, you may want to refer to this [notebook](https://github.com/microprediction/endersnotebooks/blob/main/mean_reversion_attacker.ipynb) also.