# LLM Generated Attackers
This example Notebook can be used to generate attackers for the Mid-One CrunchDAO competition.
- The notebook tests each generated attacker on unseen data and uses the result to iteratively improve the next generated attacker.
- When performance drops (based on PnL on unseen test data), a random scientific domain is injected into the prompt
to try and give the LLM some "creativity" to produce a novel attacker idea.

## Notes
- The printed output from the example attacker in `initial_prompt` is parsed to gather the results used in subsequent prompts. Be sure to check
the logic in `evaluate_script()` when changing what the example attacker prints.
- Increasing the LLM temperature above 0.0 seems to only introduce bugs into the generated attackers without much improvement in "creativity".
- The example attacker uses Optuna for attacker parameter optimization. A subset of the training streams is used in each trial to
try and speed up testing time. Adjust this in the example attacker's `train()` if desired, or swap in another optimization package.
    - The example script also standardizes the PnL for each tuning trial by the number of messages in the subset of streams it is trained on, so that
    the metric is consistent across optimization trials. This is not needed if the training batches are the same across trials or the entire train set
    is used for each trial.
- OpenAI is used, but AdalFlow can handle other LLM clients/models easily.
- OpenAI models weaker than gpt-4o seem to generate attacker code that is too buggy to use.
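
The per-message standardization mentioned in the notes above can be illustrated with a minimal sketch. This is not the example attacker's code: the helper name `standardized_profit`, the `(total_profit, n_messages)` tuple layout, and the numbers are all hypothetical and only show why dividing by stream length makes trials on different subsets comparable.

```python
from typing import Iterable, Tuple


def standardized_profit(stream_results: Iterable[Tuple[float, int]]) -> float:
    """Sum each stream's profit divided by its number of messages.

    stream_results holds hypothetical (total_profit, n_messages) pairs,
    one per stream. Streams with no messages contribute nothing.
    """
    total = 0.0
    for profit, n_messages in stream_results:
        if n_messages > 0:
            total += profit / n_messages
    return total


# Two made-up trials that earn the same profit per message
# but happened to sample streams of different lengths.
trial_a = [(10.0, 1000), (20.0, 2000)]
trial_b = [(5.0, 500), (40.0, 4000)]

raw_a = sum(profit for profit, _ in trial_a)  # raw totals differ: 30.0
raw_b = sum(profit for profit, _ in trial_b)  # raw totals differ: 45.0

print(raw_a, raw_b)
print(standardized_profit(trial_a), standardized_profit(trial_b))
```

In this made-up case the raw totals would rank the second trial higher purely because it sampled longer streams, while the standardized values agree.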

## Non-standard Libraries
- Uses the [adalflow](https://github.com/SylphAI-Inc/AdalFlow) library for prompting and interacting with the LLM. A next step could be to use its
built-in prompt optimization methods. In this Notebook the prompting is basic conditional logic.
- Uses [optuna](https://github.com/optuna/optuna) for optimizing attacker parameters.
    - The import and optimization setup is given to the LLM in the example attacker in `initial_prompt`.
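
As a rough picture of the "basic conditional logic" used for prompting, the sketch below reduces the feedback choice to a pure function that returns a label. The function name `feedback_kind` and the label strings are hypothetical; only the order of the checks mirrors the optimization loop in this Notebook.

```python
import math
from typing import Optional


def feedback_kind(
    pnl: Optional[float],
    best_pnl: float,
    previous_pnl: float,
    is_timeout: bool = False,
    traceback_error: Optional[str] = None,
) -> str:
    """Return a label for the feedback branch, checked in priority order."""
    if traceback_error:
        return "fix_traceback"   # ask the LLM to repair the script
    if is_timeout:
        return "timeout"         # remind the LLM of the time limit
    if pnl is None:
        return "no_pnl"          # the run produced no usable PnL
    if pnl > best_pnl:
        return "new_best"        # ask for a slight tweak of the best attacker
    if pnl <= previous_pnl:
        return "inject_domain"   # suggest a random scientific domain
    return "improved"            # better than last time, but not a new best


print(feedback_kind(10.0, -math.inf, -math.inf))
print(feedback_kind(5.0, best_pnl=10.0, previous_pnl=10.0))
print(feedback_kind(7.0, best_pnl=10.0, previous_pnl=5.0))
```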

## Before Running
- Set up the Mid-One CrunchDAO environment and place this notebook in the model folder.
- Create a `.env` file in the CrunchDAO environment's model folder and add the variable `OPENAI_API_KEY="****"` (or another LLM provider's API key).
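
A quick way to fail fast when the key is missing is sketched below. It assumes the `.env` file has already been loaded into the process environment (as the Notebook does with `dotenv.load_dotenv()`); the helper name `require_api_key` is hypothetical and is not part of any library used here.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    name: str = "OPENAI_API_KEY",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the .env file in the model folder."
        )
    return value


# Example with an explicit mapping instead of the real environment
print(require_api_key(env={"OPENAI_API_KEY": "sk-example"}))
```

Calling `require_api_key()` with no arguments right after loading the `.env` file would surface a missing key before any LLM request is made.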

In [None]:
# Packages required to run Notebook (the `dotenv` module is provided by the python-dotenv package)
!pip install python-dotenv numpy openai adalflow optuna

In [None]:
import logging
import subprocess
import traceback

import dotenv
import numpy as np

from adalflow import Generator, Parameter, ParameterType
from adalflow.components.model_client import OpenAIClient

# Set up logging
logging.basicConfig(level=logging.WARNING)
log = logging.getLogger(__name__)

# Load the environment variables from the .env file
dotenv.load_dotenv()

# Initialize the OpenAI client
model_client = OpenAIClient()

In [None]:
# Define the initial prompt
initial_prompt = '''
You are a brilliant mathematical statistician and Python expert. I need you to come up with creative
"Attackers" that can predict the future price movements of a financial asset. There is a good working example
below that uses the Hurst exponent to predict future price movements. Please use this as a starting point and
come up with your own creative ideas. The goal is to maximize the profit on the test dataset.

Your response should only include a description and script code in this format (without the <> brackets):

<Brief description of your attacker>
```python
<script code>
```
<blank>


Rules for the script:

1. MUST be a complete script that can be run from start to finish and return the profit value on the test dataset.
2. MUST print the profit value exactly like so: print(f"Profit (Test): {round(profit)}")
3. Trains for 100 trials (n_trials=100) for optimization in the train() method.
4. IMPORTANT: During tick() and predict() the model is not learning. The only "training"
    is done in the train() method which optimizes the parameters of the Attacker model.
5. In the objective() method, be sure to predict across all streams. The stream index is the trial number.
6. Should only need to modify the Attacker class and then the corresponding params in the train() and infer() methods.
7. Ensures the script adheres to the competition's constraints, including the 20-millisecond tick() + predict() time limit.

Example template of the script:

```python
import json
import os
import typing

import numpy as np
import optuna
import pandas as pd
from midone import HORIZON, Attacker
from tqdm.auto import tqdm

import crunch


class HurstAttacker(Attacker):

    def __init__(self, window_size, upper_threshold, lower_threshold):
        super().__init__()

        # State
        self.buffer = []
        self.window_size = window_size

        # Parameters
        self.upper_threshold = upper_threshold
        self.lower_threshold = lower_threshold

    def tick(self, x: float):
        # Append the new data point to the buffer
        if not np.isnan(x):
            self.buffer.append(x)
            if len(self.buffer) > self.window_size:
                self.buffer.pop(0)  # Remove the oldest data point

    def predict(self, horizon=HORIZON) -> float:
        if len(self.buffer) < self.window_size:
            return 0  # Not enough data to make a prediction

        # Estimate the Hurst exponent
        hurst_exponent = self._estimate_hurst_exponent(np.array(self.buffer))

        if hurst_exponent > self.upper_threshold:
            return 1  # Predict upward movement
        elif hurst_exponent < self.lower_threshold:
            return -1  # Predict downward movement
        else:
            return 0  # Hold

    def _estimate_hurst_exponent(self, time_series: np.ndarray) -> float:
        """Estimate the Hurst exponent using the rescaled range (R/S) method."""

        N = len(time_series)
        if N < 20:
            return 0.5  # Not enough data

        # Calculate the mean and subtract it from the series
        mean_ts = np.mean(time_series)
        centered_ts = time_series - mean_ts

        # Calculate the cumulative deviate series
        cumulative_deviation = np.cumsum(centered_ts)

        # Range (R)
        R = np.max(cumulative_deviation) - np.min(cumulative_deviation)

        # Standard deviation (S)
        S = np.std(time_series)

        if S == 0:
            return 0.5  # Avoid division by zero

        # Rescaled range (R/S)
        rescaled_range = R / S

        # Estimate Hurst exponent
        hurst_exponent = np.log(rescaled_range) / np.log(N)

        # Ensure H is between 0 and 1
        hurst_exponent = max(min(hurst_exponent, 1), 0)

        return hurst_exponent


def attacker_performance(
    streams: typing.List[typing.Iterable[dict]],
    params: typing.Dict,
    verbose: bool = True,
):
    """
    params: dict with keys 'window_size', 'upper_threshold', 'lower_threshold'
    streams: Supplies a collection of individual streams
    """
    window_size = int(params["window_size"])
    upper_threshold = params["upper_threshold"]
    lower_threshold = params["lower_threshold"]

    total_profit = 0
    std_profit = 0

    for stream in streams:
        # Reset the attacker each stream
        attacker = HurstAttacker(
            window_size=window_size,
            upper_threshold=upper_threshold,
            lower_threshold=lower_threshold,
        )

        # Run it over the stream
        for message in stream:
            x = message["x"]
            attacker.tick_and_predict(x=x, horizon=HORIZON)

        pnl = attacker.pnl.summary()
        total_profit += pnl["total_profit"]
        # Standardize profit by stream by dividing by the number of possible trades
        # in the stream. Useful when tuning parameters after each stream.
        if "current_ndx" in pnl:
            stream_std_profit = pnl["total_profit"] / pnl["current_ndx"]
            std_profit += stream_std_profit
        else:
            std_profit += 0

    if verbose:
        print(
            f"Using window_size={window_size}, upper_threshold={upper_threshold}, lower_threshold={lower_threshold}, total profit={total_profit}"
        )

    # Return total profit and standardized profit (the latter is what train() maximizes)
    return total_profit, std_profit


def get_parameter_file_path(model_directory_path: str):
    return os.path.join(model_directory_path, "mid_one_model.json")


def train(
    streams: typing.List[typing.Iterable[dict]],
    model_directory_path: str,
):
    best_params_overall = None
    best_profit_overall = -np.inf

    # Initialize parameters
    initial_params = {
        "window_size": 79,
        "upper_threshold": 0.6187887621347552,
        "lower_threshold": 0.1910861914485038,
    }

    # Create an Optuna study
    study = optuna.create_study(direction="maximize")
    # Enqueue initial parameter values
    study.enqueue_trial(initial_params)

    # Define the objective function for Optuna
    def objective(trial):
        # Pick a random subset of up to 10 streams to optimize on
        subset_size = min(10, len(streams))
        trial_index_list = np.random.choice(len(streams), subset_size, replace=False)
        # Keep only the selected streams
        subset_streams = [streams[i] for i in trial_index_list]

        window_size = trial.suggest_int("window_size", 20, 100)
        upper_threshold = trial.suggest_float("upper_threshold", 0.5, 1.0)
        lower_threshold = trial.suggest_float("lower_threshold", 0.0, 0.5)

        # Ensure lower_threshold is less than upper_threshold
        if lower_threshold >= upper_threshold:
            return float(
                "-inf"
            )  # Return negative infinity to indicate invalid parameters

        params = {
            "window_size": window_size,
            "upper_threshold": upper_threshold,
            "lower_threshold": lower_threshold,
        }

        # Evaluate the attacker on the sampled subset of streams
        profit, std_profit = attacker_performance(
            params=params, streams=subset_streams, verbose=False
        )

        return std_profit

    # Run the optimization
    study.optimize(objective, n_trials=100)

    best_params = study.best_params
    best_std_profit = study.best_value

    print(f"Best standardized profit: {best_std_profit}, best params: {best_params}")

    # Save the best parameters
    parameter_file_path = get_parameter_file_path(model_directory_path)
    with open(parameter_file_path, "w") as fd:
        json.dump(best_params, fd)
        print(f"Saved best parameters to {parameter_file_path}")

    # Check we can load it again
    try:
        with open(parameter_file_path, "r") as fd:
            params = json.load(fd)
            print(f"Loaded Best Parameters: {params}")
    except Exception as e:
        print(f"Parameter reload test failed: {e}")


def infer(
    stream: typing.Iterator[dict],
    model_directory_path: str,
):
    # Load the best parameters
    parameter_file_path = get_parameter_file_path(model_directory_path)
    with open(parameter_file_path, "r") as fd:
        params = json.load(fd)
        window_size = int(params["window_size"])
        upper_threshold = params["upper_threshold"]
        lower_threshold = params["lower_threshold"]

    # Instantiate your attacker
    attacker = HurstAttacker(
        window_size=window_size,
        upper_threshold=upper_threshold,
        lower_threshold=lower_threshold,
    )

    # NOTE: DO NOT REMOVE THIS YIELD
    # Signals to the system that your attacker is initialized and ready.
    yield

    for message in stream:
        decision = attacker.tick_and_predict(message["x"])

        # Be sure to yield, even if the decision is zero.
        yield decision


if __name__ == "__main__":
    # Load the data
    crunch = crunch.load_notebook()
    x_train, x_test = crunch.load_streams()

    # Train the model
    train(
        streams=x_train,
        model_directory_path="resources/",
    )

    # Load parameters for testing
    parameter_file_path = get_parameter_file_path("resources/")
    with open(parameter_file_path, "r") as fd:
        params = json.load(fd)

    # Evaluate on the test set
    profit, std_profit = attacker_performance(x_test, params=params, verbose=True)
    print(f"Profit (Test): {round(profit)}")

```
'''

In [None]:
# Create a Parameter object for the prompt
prompt_param = Parameter(
    data=initial_prompt,
    role_desc="Prompt for generating Python scripts for Attacker models",
    requires_opt=True,
    param_type=ParameterType.PROMPT,
)

In [None]:
# Define the generator with the prompt parameter
generator = Generator(
    model_client=model_client,
    model_kwargs={
        "model": "gpt-4o",
        "max_tokens": 16384,
        "temperature": 0.0,
    },
    prompt_kwargs={
        "input_str": prompt_param.data
    },
)

In [None]:
def evaluate_script(script_content, script_path):
    """
    Evaluate the provided Python script by running it and extracting the PnL value from the output.

    Args:
        script_content (str): The content of the Python script to evaluate.
        script_path (str): The path to save the script to.

    Returns:
        tuple: A tuple containing the PnL value (None if the script timed out or crashed),
            a boolean indicating if the process timed out, the error output or traceback
            (if any), and the parameter info line from the script output.

    Raises:
        ValueError: If the script exited cleanly but printed no "Profit (Test): " line.
    """
    # Save the script to disk so it can be run as a separate process
    with open(script_path, "w") as file:
        file.write(script_content)

    # Initialize the return variables
    result = []
    is_timeout = False
    traceback_error = None
    param_info = None
    try:
        process = subprocess.Popen(
            ["python3", script_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        # Wait for the process to complete within the specified timeout
        stdout, _ = process.communicate(timeout=60 * 20)
        result.extend(stdout.splitlines())
        for line in result:
            print(line)
        # stderr is merged into stdout, so a crashed script leaves its traceback there.
        # Keep only the tail to limit the size of the next prompt.
        if process.returncode != 0:
            traceback_error = "\n".join(result[-50:])
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # Reap the killed process and release its pipes
        is_timeout = True
        log.warning("Process timed out and was killed.")
    except Exception as e:
        log.error(f"Error occurred: {e}")
        traceback_error = traceback.format_exc()

    # Extract PnL and the parameter info line from the script's output
    pnl = None
    for line in result:
        if line.startswith("Profit (Test): "):
            log.debug(f"PnL line: {line}")
            pnl = float(line.split(": ")[1].strip())
        elif line.startswith("Using "):
            log.debug(f"Using line: {line}")
            param_info = line

    # Timeouts and crashes are reported through the returned flags;
    # only a clean run without a PnL line is treated as an error here.
    if pnl is None and not is_timeout and traceback_error is None:
        raise ValueError("PnL not found in script output.")

    return pnl, is_timeout, traceback_error, param_info
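
The output parsing in `evaluate_script()` depends on two line prefixes printed by the example attacker. The sketch below isolates that step so it can be checked without running a script. The helper name `parse_attacker_output` and the sample lines are hypothetical; the prefixes are the ones used in this Notebook.

```python
from typing import Iterable, Optional, Tuple

PNL_PREFIX = "Profit (Test): "
PARAM_PREFIX = "Using "


def parse_attacker_output(
    lines: Iterable[str],
) -> Tuple[Optional[float], Optional[str]]:
    """Return (pnl, param_info) found in the script's printed lines."""
    pnl = None
    param_info = None
    for line in lines:
        if line.startswith(PNL_PREFIX):
            # Take everything after the prefix as the number
            pnl = float(line[len(PNL_PREFIX):].strip())
        elif line.startswith(PARAM_PREFIX):
            param_info = line
    return pnl, param_info


# Made-up output lines in the shape the example attacker prints
sample = [
    "Best standardized profit: 0.12, best params: {'window_size': 42}",
    "Using window_size=42, upper_threshold=0.7, lower_threshold=0.3, total profit=1234.5",
    "Profit (Test): 1234",
]
print(parse_attacker_output(sample))
```

If a generated attacker changes either printed line, the corresponding value comes back as `None`, which is why the prompt insists on the exact `Profit (Test): ` format.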

In [None]:
# Domains injected into prompt to spur some new ideas from the LLM
science_domains = [
    "Astrophysics",
    "Biochemistry",
    "Quantum Mechanics",
    "Ecology",
    "Neuroscience",
    "Genetics",
    "Geology",
    "Meteorology",
    "Oceanography",
    "Robotics",
    "Artificial Intelligence",
    "Materials Science",
    "Immunology",
    "Anthropology",
    "Linguistics",
    "Paleontology",
    "Nanotechnology",
    "Biophysics",
    "Cognitive Science",
    "Environmental Science",
    "Astronomy",
    "Evolutionary Biology",
    "Chemical Engineering",
    "Nuclear Physics",
    "Forensic Science",
    "Pharmacology",
    "Hydrology",
    "Sociology",
    "Cybersecurity",
    "Agronomy",
    "Microbiology",
    "Virology",
    "Molecular Biology",
    "Cell Biology",
    "Systems Biology",
    "Developmental Biology",
    "Marine Biology",
    "Bioinformatics",
    "Biotechnology",
    "Theoretical Physics",
    "Condensed Matter Physics",
    "Plasma Physics",
    "Particle Physics",
    "Optics",
    "Acoustics",
    "Thermodynamics",
    "Classical Mechanics",
    "Statistical Mechanics",
    "Applied Mathematics",
    "Computational Science",
    "Information Theory",
    "Cryptography",
    "Data Science",
    "Machine Learning",
    "Game Theory",
    "Operations Research",
    "Control Systems",
    "Electrical Engineering",
    "Mechanical Engineering",
    "Civil Engineering",
    "Aerospace Engineering",
    "Biomedical Engineering",
    "Environmental Engineering",
    "Food Science",
    "Nutrition Science",
    "Public Health",
    "Epidemiology",
    "Veterinary Medicine",
    "Dentistry",
    "Psychology",
    "Psychiatry",
    "Social Work",
    "Political Science",
    "Economics",
    "Demography",
    "Archaeology",
    "Education Research",
    "Urban Planning",
    "Forestry",
    "Zoology",
    "Botany",
    "Entomology",
    "Ornithology",
    "Herpetology",
    "Ichthyology",
    "Climatology",
    "Glaciology",
    "Volcanology",
    "Seismology",
    "Mineralogy",
    "Petrology",
    "Stratigraphy",
    "Geophysics",
    "Geochemistry",
    "Astrochemistry",
    "Astrogeology",
    "Exoplanetology",
    "Cosmology",
    "Space Science",
    "Atmospheric Science",
    "Quantum Computing",
    "Speech Pathology",
    "Kinesiology",
    "Biomechanics",
    "Hydrogeology",
    "Geoinformatics",
    "Remote Sensing",
    "Spectroscopy",
    "Toxicology",
    "Structural Engineering",
]

In [None]:
# Optimization loop
num_iterations = 500
best_pnl = -np.inf
best_description = ""
best_python_script = ""
previous_pnl = -np.inf
previous_description = ""
previous_python_script = ""

# Initialize prompt feedback (empty)
feedback = ""
param_info = ""
for iteration in range(num_iterations):
    # Save prompt to file for debugging
    with open("last_prompt_temp.txt", "w") as file:
        file.write(prompt_param.data + feedback)
        
    # Generate the Python script
    output = generator(prompt_kwargs={"input_str": prompt_param.data + feedback})

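    # Extract description and code
    try:
        description, code_block = output.data.strip().split("```", 1)
        # Drop the closing fence and anything after it, then the language tag
        python_script = code_block.rsplit("```", 1)[0].strip()
        if python_script.startswith("python"):
            python_script = python_script[len("python"):].strip()
    except (ValueError, AttributeError):
        # AttributeError covers a failed generation where output.data is None
        log.error("Failed to parse the output from the generator.")
        continue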

    print(f"LLM Attacker Description:\n{description}")

    # Evaluate the script to obtain PnL
    script_path = f"attacker_{iteration:03d}.py"
    is_timeout = False
    traceback_error = None
    try:
        pnl, is_timeout, traceback_error, param_info = evaluate_script(
            python_script, script_path
        )
        print(f"Iteration {iteration + 1}: PnL = {pnl}")
    except Exception as e:
        log.error(f"Iteration {iteration + 1}: Evaluation failed with error: {e}")
        pnl = None

    # Update the prompt based on PnL
    if traceback_error:
        feedback = (
            "\n\n---\n\nIMPORTANT: Everything above was the initial prompt. "
            "Here is the feedback from your last attempt specifically:\n\n"
            "The script execution failed. Please fix using the traceback below:\n"
            f"{traceback_error}"
        )
    elif is_timeout:
        feedback = (
            "\n\n---\n\nIMPORTANT: Everything above was the initial prompt. "
            "Here is the feedback from your last attempt specifically:\n\n"
            "The script execution timed out. Ensure the tick() and predict() methods combined "
            "can run in 20 milliseconds or less for each tick in the stream."
        )
    elif pnl is None:
        # Handle cases where PnL could not be evaluated and no traceback was captured
        feedback = (
            "\n\n---\n\nIMPORTANT: Everything above was the initial prompt. "
            "Here is the feedback from your last attempt specifically:\n\n"
            "The script did not report a PnL. Make sure it runs from start to finish and "
            'prints the profit exactly like so: print(f"Profit (Test): {round(profit)}")'
        )
    elif pnl > best_pnl:
        # Use the script that was just evaluated: best_python_script is only updated below
        feedback = (
            f"\n\n---\n\nIMPORTANT: Everything above was the initial prompt. "
            "Here is the feedback from your last attempt specifically:\n\n"
            f"This was your best model yet! The PnL was {pnl} compared to your previous best of {best_pnl}. "
            f"Try a similar strategy but with a slight tweak :) For reference, your best attacker description was: "
            f"'{description.strip()}'. Your best Python script was:\n\n```python\n{python_script}\n```"
        )
        # Update best PnL for model to remember
        best_pnl = pnl
        best_description = description
        best_python_script = python_script
    elif pnl <= previous_pnl:
        # Pick a random science domain to suggest
        science_domain = np.random.choice(science_domains)
        feedback = (
            f"\n\n---\n\nIMPORTANT: Everything above was the initial prompt. "
            "Here is the feedback from your last attempt specifically:\n\n"
            f"This PnL of {pnl} was not an improvement over the last of {previous_pnl}. "
            f"Please try applying a totally different strategy inspired by the field of {science_domain}. "
            f"For reference, your previous attacker description was: '{previous_description.strip()}'. "
            f"Your previous Python script was:\n\n```python\n{previous_python_script}\n```"
        )
    elif pnl > previous_pnl:
        # Show the attacker that produced the improvement (the one just evaluated)
        feedback = (
            f"\n\n---\n\nIMPORTANT: Everything above was the initial prompt. "
            "Here is the feedback from your last attempt specifically:\n\n"
            f"This PnL of {pnl} was an improvement from the last of {previous_pnl}. "
            f"Your best so far is {best_pnl}. Keep up the good work and see if you can push the boundaries even further! "
            f"For reference, the attacker description for this attempt was: '{description.strip()}'. "
            f"The Python script for this attempt was:\n\n```python\n{python_script}\n```"
        )

    # Append this iteration's PnL, description, and parameter info to results.txt
    with open("results.txt", "a") as file:
        file.write(
            f"Iteration {iteration + 1}: PnL = {pnl}\nDescription: {description}\n{param_info}\n"
        )

    # Record the previous PnL for model to remember
    if pnl is not None:
        previous_pnl = pnl
        previous_description = description
        previous_python_script = python_script
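
The description/code split done at the top of the loop can also be written with a regular expression. The sketch below is an alternative illustration, not the Notebook's method; the name `split_response` and the sample reply are hypothetical. It assumes the reply contains a single fenced block and that the script itself contains no triple backticks.

```python
import re
from typing import Optional, Tuple

# A literal triple backtick, built indirectly so it can sit inside this example
FENCE = "`" * 3
CODE_BLOCK = re.compile(
    FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def split_response(text: str) -> Optional[Tuple[str, str]]:
    """Return (description, script), or None when no fenced block is found."""
    match = CODE_BLOCK.search(text)
    if match is None:
        return None
    description = text[: match.start()].strip()
    script = match.group(1).strip()
    return description, script


# Made-up reply in the format requested by the prompt
reply = (
    "Momentum attacker with a decaying average.\n"
    + FENCE + "python\nprint('Profit (Test): 0')\n" + FENCE + "\n"
)
print(split_response(reply))
```

Returning `None` instead of raising lets the caller decide whether to skip the iteration or ask the LLM to resend its answer in the expected format.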