# ML Application to Quantitative Trading

In this project, I merge sound financial strategies with advanced computer science skills by developing, from scratch in Python, a manual trading strategy, a custom-built market simulator, and a machine learning-based Q-Learner with Dyna optimization. The goal was not to develop a groundbreaking trading strategy, but to demonstrate my proficiency in translating financial strategies into code and developing advanced learners without relying on external libraries... and to have fun!

- **Market Simulator**: I developed this simulator to emulate realistic trading conditions, enabling effective testing of both manual and AI-driven strategies under practical constraints such as transaction fees and share limits.
- **Manual Trading Strategy**: I crafted this strategy using traditional financial indicators, aiming to establish a solid foundation for assessing the enhancements that machine learning techniques could bring.
- **Machine Learning Strategy**: My self-developed Q-Learner, enhanced with Dyna functionality, was designed to dynamically adapt its trading decisions, showcasing the potential for uncovering and leveraging complex market patterns.

This project further validates my ability to excel in innovative settings, where transforming financial insights into sophisticated algorithmic solutions is crucial.

---

### Disclaimer and Copyright Notice

All functional implementations presented in this report have been independently developed by me, Franz Adam. If you use the code or methodology, please reference me accordingly. I advise current students enrolled in courses that cover similar material to adhere to their academic institution's policies regarding integrity and plagiarism before continuing.

## Problem Space and Constraints

### Market Simulation and Trading Constraints

This project requires a simulation of the trading environment to evaluate both the manual and the AI-driven trading strategies. Using official stock prices for JPMorgan Chase & Co. (JPM), the simulation covers a training period from **January 1, 2008, to December 31, 2009**, and a testing period from **January 1, 2010, to December 31, 2011**. The primary focus is not merely on achieving superior returns but on showcasing my expertise in crafting and implementing a comprehensive market simulation and trading algorithms from scratch.

### Financial and Trading Specifications

- **Starting Capital**: Each strategy begins with a simulated portfolio of `$100,000` in cash.
- **Trading Positions**: The simulation permits positions to be either `1000 shares long` or `1000 shares short`. However, trading activity can involve up to `2000 shares` at a time, provided the net position does not exceed the constraints of being 1000 shares long or short.
- **Benchmarking**: The performance of the trading strategies is measured against a benchmark scenario where `$100,000` is used to purchase `1000 shares of JPM` and held throughout the testing period. This benchmark serves to provide a comparative baseline for the performance of the developed strategies.

### Transaction Costs

- **Commissions and Market Impact**: Each trade in the simulation incorporates transaction costs, specifically a commission fee of `$9.95` per trade and a market impact cost of `0.005`. These costs are critical in mimicking real-world trading conditions and evaluating the net profitability of the strategies.

This framework of market specifications and trading constraints is crucial for a realistic and rigorous assessment of the strategies. The implementation detail of these elements underscores my ability to develop a robust simulation environment that adheres closely to the operational realities of financial markets.
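As a rough illustration of how the two costs above could enter a single trade, the sketch below charges the flat commission and treats the impact as a fractional price move against the trader. Both that reading of `0.005` and the helper name `trade_cash_flow` are assumptions made for illustration; they are not taken from the project code.

```python
def trade_cash_flow(shares, price, commission=9.95, impact=0.005):
    """Return the change in cash for one trade (hypothetical helper).

    shares > 0 is a buy, shares < 0 is a sell. The impact is ASSUMED to
    move the execution price against the trader by a fraction of the price.
    """
    if shares == 0:
        return 0.0  # no trade, no costs
    direction = 1 if shares > 0 else -1
    # Buys fill slightly above the quoted price, sells slightly below it.
    fill_price = price * (1 + direction * impact)
    return -shares * fill_price - commission


buy_flow = trade_cash_flow(1000, 40.0)    # cash spent when buying 1000 shares
sell_flow = trade_cash_flow(-1000, 40.0)  # cash received when selling 1000 shares
print(buy_flow, sell_flow)
```

Under these assumptions a round trip at an unchanged price loses money, which is the effect the costs are meant to capture.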


## Market Simulator Overview

### Functionality of the Market Simulator

The market simulator I developed plays a pivotal role in evaluating the trading strategies implemented in this project. It is encapsulated within a Python function, `compute_portvals()`, which takes trading orders from a DataFrame and calculates the daily value of the trading portfolio over a specified period. This function returns a DataFrame with one column, representing the portfolio's total value on each trading day, indexed by date. This output facilitates a detailed performance analysis of the trading strategies under consideration.

### Input and Output Specifications

- **Input**: The function accepts a DataFrame `df_trades`, indexed by date, whose `Trades` column holds the signed number of shares traded on each day (positive for a BUY, negative for a SELL, zero for no trade). The traded symbol and the starting cash are passed as separate arguments.
- **Output**: The resulting DataFrame, indexed by date, provides the total portfolio value for each day from the start date to the end date, inclusive. This value is calculated as the sum of the cash and the current value of all stock holdings.

### Working of the Market Simulator

The simulator keeps track of stock holdings and cash balance for each day, adjusting these according to the trades executed:
- **Buying Stocks**: When a BUY order is placed, the simulator increases the stock count and deducts the cost (based on the adjusted closing price) from the cash balance.
- **Selling Stocks**: Conversely, a SELL order decreases the stock count and adds the corresponding value to the cash balance.

Negative shares indicate a short position, while negative cash signifies borrowing from the broker. The function ensures accurate portfolio valuation by updating these values daily based on the executed trades and market prices.
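The bookkeeping described above can be traced by hand on a tiny example. The sketch below uses made-up prices and trades and ignores transaction costs; it only illustrates how cash, holdings, and portfolio value evolve, including a short position.

```python
import pandas as pd

# Hypothetical three-day price series and signed trades (+ buy, - sell).
dates = pd.date_range("2008-01-02", periods=3)
prices = pd.Series([40.0, 42.0, 41.0], index=dates)
trades = pd.Series([1000, -2000, 0], index=dates)

cash, shares = 100_000.0, 0
rows = []
for day in dates:
    shares += trades[day]              # update holdings
    cash -= trades[day] * prices[day]  # buys reduce cash, sells add to it
    rows.append((cash, shares, cash + shares * prices[day]))

ledger = pd.DataFrame(rows, index=dates, columns=["Cash", "Shares", "Value"])
print(ledger)
```

On the second day the position flips from 1000 shares long to 1000 shares short, so the share count goes negative while the cash balance rises.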

### Code Implementation

Below is the core implementation of the `compute_portvals()` function, tailored to handle the specified trading operations efficiently:
```python
import pandas as pd
import numpy as np

def compute_portvals(
        df_trades,
        symbol="JPM",
        start_val=100000,
):
    # Define the trading period
    dates = pd.date_range(start=df_trades.index[0], end=df_trades.index[-1])
    
    # Initialize stock count and profit variables
    symbol_count, profit = 0, 0
    
    # Fetch stock prices for the given dates and symbol
    prices = get_data([symbol], dates)  # Automatically includes SPY for market comparison
    prices = prices.drop(columns='SPY')  # Focus solely on the portfolio symbol
    
    # Initialize the DataFrame to store portfolio values
    portfolio_values = pd.DataFrame(index=df_trades.index, columns=['Value'], data=np.nan)

    # Iterate over each trading day to update portfolio values
    for trade_day in df_trades.index:
        share_price = prices.at[trade_day, symbol]
        symbol_count += df_trades.at[trade_day, 'Trades']
        profit += share_price * (-1) * df_trades.at[trade_day, 'Trades']
        port_val = start_val + profit + symbol_count * share_price
        portfolio_values.loc[trade_day, 'Value'] = port_val

    return portfolio_values
```


### Hypothetical Optimal Trading Strategy (TOS) Over the In-Sample Period

#### Concept and Execution of TOS

To understand the upper limits of potential returns under ideal conditions, a Hypothetical Optimal Trading Strategy (TOS) was developed for the JPM stock over the in-sample period from January 1, 2008, to December 31, 2009. This strategy involved making the most profitable trade possible on each day, assuming perfect foresight of the next day's stock price movements. Essentially, this means buying, selling, or holding based on whether the stock price would increase, decrease, or stay the same the following day.

#### Strategy Details

- **Trading Actions**: On any given day, the optimal action (buy, sell, or hold) was determined based on the next day's price:
  - **Buy**: If the next day's price was higher than the current day.
  - **Sell**: If the next day's price was lower.
  - **Hold**: If the next day's price was the same, or if the maximum amount of shares were already bought or sold in previous transactions.
- **Constraints**: The strategy adhered to a limit of holding or shorting no more than 1000 shares at a time, and only one transaction type (buy or sell) was allowed per day.
- **Performance**: The TOS yielded a cumulative return of 578.61% over the two-year period, showcasing what could theoretically be achieved with perfect market foresight.

#### Mathematical and Practical Considerations

The TOS provides an illustrative benchmark for the maximum possible returns under the given constraints but lacks practical applicability due to the unrealistic advantage of knowing future market movements. It serves as a theoretical upper boundary against which we can measure the performance of more realistic strategies like the manual and Q-Learning based strategies.

### Code Implementation for TOS

Below is the `testPolicy` function used to implement the TOS. It derives each day's action from the sign of the next day's price change and skips any trade that would push the position beyond the 1000-share limit.

```python
import pandas as pd
import numpy as np

def testPolicy(symbol="JPM", sd=pd.Timestamp('2008-01-01'), ed=pd.Timestamp('2009-12-31'), sv=100000):
    dates = pd.date_range(start=sd, end=ed)
    prices = get_data([symbol], dates).drop(columns='SPY')
    df_trades = pd.DataFrame(0, index=prices.index, columns=['Trades'])
    stock_count = 0
    lot_size = 1000

    for i in range(len(prices) - 1):
        today_price = prices.iloc[i][symbol]
        next_day_price = prices.iloc[i + 1][symbol]
        action = np.sign(next_day_price - today_price) * lot_size

        if stock_count + action > 1000 or stock_count + action < -1000:
            action = 0  # Prevent exceeding holding constraints
        df_trades.iloc[i] = action
        stock_count += action

    return df_trades
```


## Manual Trading Strategy

### Overview of Technical Indicators

The manual trading strategy in this project employs a set of technical indicators as key components for generating buy and sell signals. These indicators, namely Bollinger Bands Percentage (B%), Momentum, and Relative Strength Index (RSI), are selected for their ability to effectively gauge market conditions and guide trading decisions. The emphasis of this strategy is not on discovering a groundbreaking or overly complex approach, but rather on demonstrating my ability to implement a practical and effective strategy into code, which can then be evaluated against a machine learning-based approach.

### Bollinger Bands Percentage (B%)

Bollinger Bands characterize the price and volatility of a financial instrument over time by placing an upper and a lower band a fixed number of standard deviations around a moving average. The Bollinger Bands Percentage (B%) expresses where the price sits within those bands and is calculated as follows:

$B\% = \left(\frac{\text{Price} - \text{Lower Band}}{\text{Upper Band} - \text{Lower Band}} \right) \times 100$

where:
- **Price** is the current closing price of the stock.
- **Lower Band** is the SMA - 2 standard deviations.
- **Upper Band** is the SMA + 2 standard deviations.
- **SMA** is the Simple Moving Average over the last 20 days.

**Signals**:
- **Buy**: B% value near 0 indicates the price is at the lower band, suggesting an oversold market condition.
- **Sell**: B% value near 100 indicates the price is at the upper band, suggesting an overbought condition.
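The B% formula above maps directly onto pandas rolling windows. The following is a minimal sketch, not the project's implementation; the function name `bollinger_percent` and the sample prices are made up for illustration.

```python
import pandas as pd


def bollinger_percent(prices, lookback=20, num_std=2.0):
    """Compute B%: 0 at the lower band, 100 at the upper band."""
    sma = prices.rolling(window=lookback).mean()
    std = prices.rolling(window=lookback).std()
    lower = sma - num_std * std
    upper = sma + num_std * std
    return (prices - lower) / (upper - lower) * 100


# Made-up prices; a short lookback keeps the example small.
prices = pd.Series([10, 11, 12, 11, 10, 9, 10, 12, 13, 12], dtype=float)
bbp = bollinger_percent(prices, lookback=5)
print(bbp.round(2))
```

The first `lookback - 1` values are `NaN` because the rolling window is not yet full, so a strategy built on this indicator has to skip or pad those days.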

### Momentum

Momentum is a measure of the rate of change in stock prices, identifying the strength of price movements. It is calculated using the following equation:

$\text{Momentum} = \left(\frac{\text{Current Price}}{\text{Price of N days ago}} \right) - 1$

**Signals**:
- **Buy**: A positive crossing above a specified threshold suggests that an upward price trend is likely to continue.
- **Sell**: A negative crossing below a specified threshold suggests a continuing downward trend.
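The momentum equation above is a one-liner with `Series.shift`. This is a small sketch with made-up prices; the function name `momentum` is chosen for illustration only.

```python
import pandas as pd


def momentum(prices, lookback=10):
    """Price today divided by the price `lookback` days ago, minus 1."""
    return prices / prices.shift(lookback) - 1


# Made-up prices for illustration.
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])
mom = momentum(prices, lookback=2)
print(mom)
```

With a lookback of 2, the third value compares 101 against 100 and yields a momentum of 0.01, that is, a 1% rise over two days.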

### Relative Strength Index (RSI)

The RSI is a momentum oscillator that measures the speed and change of price movements. It is calculated using the following formula:

$\text{RSI} = 100 - \left( \frac{100}{1 + \frac{\text{Average Gain}}{\text{Average Loss}}} \right)$

where:
- **Average Gain** and **Average Loss** are the average of the gains and losses over the last 14 days, respectively.

**Signals**:
- **Buy**: RSI below 30 suggests oversold conditions, potentially indicating an upcoming bullish reversal.
- **Sell**: RSI above 70 indicates overbought conditions, suggesting a possible bearish reversal.
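The RSI formula above can be sketched with simple rolling averages of gains and losses, which matches the definition given here. Note that this is the simple-average variant; other RSI implementations use exponential smoothing instead. The function name `rsi` and the sample prices are illustrative assumptions.

```python
import pandas as pd


def rsi(prices, lookback=14):
    """RSI from simple rolling averages of daily gains and losses."""
    delta = prices.diff()
    gains = delta.clip(lower=0)    # positive moves, zero otherwise
    losses = -delta.clip(upper=0)  # magnitude of negative moves
    avg_gain = gains.rolling(window=lookback).mean()
    avg_loss = losses.rolling(window=lookback).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Made-up prices: two gains of 1.0 and two losses of 0.5.
prices = pd.Series([10.0, 11.0, 10.5, 11.5, 11.0])
values = rsi(prices, lookback=4)
print(values)
```

Here the average gain is 0.5 and the average loss is 0.25, so the ratio is 2 and the RSI comes out at roughly 66.7.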

### Strategy Implementation

The implementation of this strategy in code involves a systematic application of these indicators to generate trading signals. This not only demonstrates the capability to translate financial analysis into actionable trading decisions but also sets the groundwork for a comparative evaluation with an AI-driven trading strategy. The goal is to showcase the practical application of these indicators within a structured trading strategy, emphasizing the ability to implement and test such strategies rigorously.

## Optimization and Parameter Tuning in the Manual Strategy

### Overview of Optimization Process

For the manual trading strategy, fine-tuning the parameters of the technical indicators is crucial for maximizing performance. The parameters for Bollinger Bands, Momentum, and RSI were systematically varied to identify the configuration that yields the best returns. This process involved exploring a range of values for each parameter and assessing their impact on the strategy's effectiveness.

### Parameter Values Explored

- bb_lookback_options = [30, 25, 20, 15]
- bb_low_options = [5, 10, 15, 20]
- bb_up_options = [85, 90, 95, 100]
- mom_lookback_options = [4, 8, 12]
- mom_low_options = [-10, -5, 0]
- mom_up_options = [0, 5, 10, 15]
- rsi_lookback_options = [4, 8, 12]
- rsi_buy_up_options = [110, 105, 100, 90, 85]
- rsi_buy_low_options = [10, 20, 30, 40]

These values were chosen based on preliminary analysis, which suggested they cover a reasonable range that could capture different market dynamics. Even this relatively small set of options yields 138,240 possible parameter combinations.
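The size of the search space can be checked by enumerating the grid with `itertools.product`. The sketch below is one way to build such a list of parameter tuples; the dictionary layout is an assumption for illustration, while the option values are the ones listed above.

```python
from itertools import product

# Option lists as given above; the dict layout is illustrative only.
grid = {
    "bb_lookback": [30, 25, 20, 15],
    "bb_low": [5, 10, 15, 20],
    "bb_up": [85, 90, 95, 100],
    "mom_lookback": [4, 8, 12],
    "mom_low": [-10, -5, 0],
    "mom_up": [0, 5, 10, 15],
    "rsi_lookback": [4, 8, 12],
    "rsi_buy_up": [110, 105, 100, 90, 85],
    "rsi_buy_low": [10, 20, 30, 40],
}

# Cartesian product of all option lists, one tuple per configuration.
combinations = list(product(*grid.values()))
print(len(combinations))  # 138240
```

Each tuple holds one value per parameter, in the order of the dictionary keys.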

### Utilizing Python's `concurrent.futures` for Parallel Execution

To speed up the optimization, I used Python's `concurrent.futures` module, which provides a high-level interface for asynchronously executing callables using a pool of threads (`ThreadPoolExecutor`) or a pool of processes (`ProcessPoolExecutor`). Submitting each parameter combination as a separate task lets many configurations be evaluated concurrently instead of one after another. For CPU-bound work such as backtesting, a process pool is the better fit, because separate processes are not serialized by CPython's global interpreter lock the way threads are.

### Code Implementation for Parallel Optimization

Below is the code from the project that uses `concurrent.futures` to optimize the trading strategy parameters. It submits each parameter combination to a process pool (managed with a context manager so the pool is shut down cleanly), collects the results as they complete, and keeps the combination that produces the highest final portfolio value.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def optimize_strategy_parallel(symbol, start_date, end_date, fetch_prices, compute_portfolio_values, parameter_combinations):
    """
    Optimizes trading strategy parameters in parallel using process pooling.

    Args:
        symbol (str): Stock symbol to optimize for.
        start_date (datetime): Start date for data retrieval.
        end_date (datetime): End date for data retrieval.
        fetch_prices (callable): Function to fetch prices given a symbol and date range.
        compute_portfolio_values (callable): Function to compute portfolio values given trades and prices.
            Must be a module-level function so it can be pickled and sent to worker processes.
        parameter_combinations (list): List of parameter tuples to explore.

    Returns:
        tuple: Best parameter set and its corresponding portfolio value.
    """
    prices = fetch_prices(symbol=symbol, sd=start_date, ed=end_date)
    best_portfolio_value = float('-inf')
    best_parameters = None

    with ProcessPoolExecutor() as executor:
        # Pass compute_portfolio_values along so each worker can use it
        futures = [
            executor.submit(evaluate_parameters, params, prices, compute_portfolio_values)
            for params in parameter_combinations
        ]
        for future in as_completed(futures):
            portfolio_value, params = future.result()
            if portfolio_value > best_portfolio_value:
                best_portfolio_value = portfolio_value
                best_parameters = params

    return best_parameters, best_portfolio_value


def evaluate_parameters(parameters, prices, compute_portfolio_values):
    """
    Evaluates a single set of trading parameters.

    Args:
        parameters (tuple): Tuple containing parameters for Bollinger Bands, Momentum, and RSI.
        prices (DataFrame): DataFrame containing price data.
        compute_portfolio_values (callable): Function to compute portfolio values given trades and prices.

    Returns:
        tuple: Last value of the portfolio and the parameters used.
    """
    bb_lookback, bb_low, bb_up, mom_lookback, mom_low, mom_up, rsi_lookback, rsi_buy_up, rsi_buy_low = parameters
    # apply_trading_strategy is the project's manual-strategy function (defined elsewhere)
    df_trades = apply_trading_strategy(prices, bb_lookback, bb_low, bb_up, mom_lookback, mom_low, mom_up, rsi_lookback, rsi_buy_up, rsi_buy_low)
    port_vals = compute_portfolio_values(df_trades, prices)
    # Assumes a single-column DataFrame; extract the final value as a scalar
    final_value = float(port_vals.iloc[-1, 0])
    return final_value, parameters
```


### ML-Based Trading Strategy: Q-Learning and Dyna-Q

#### Introduction to Q-Learning

In this project, I apply a reinforcement learning strategy using Q-Learning to the same technical indicators used in the manual strategy, with the aim of teaching a machine to learn a trading policy. Q-Learning is a model-free reinforcement learning algorithm that learns a Q-value for each state-action pair, which is a measure of the expected future rewards for taking a given action in a given state. Q-values are updated iteratively with a rule derived from the Bellman equation:

$Q(s, a) = Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$

where:
- $s$ is the current state,
- $a$ is the current action,
- $r$ is the reward received after performing the action,
- $s'$ is the new state after the action,
- $\alpha$ is the learning rate,
- $\gamma$ is the discount factor,
- $\max_{a'} Q(s', a')$ is the maximum predicted reward achievable in the new state.

The Q-value for a state-action pair represents the expected future rewards that can be achieved by starting in that state, taking that action, and following the optimal policy thereafter. 
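A single update step can be worked through with concrete numbers. The values below are made up, except that `alpha` and `gamma` match the defaults used later in the learner; the second form is the algebraically equivalent blend of the old estimate and the new target.

```python
alpha, gamma = 0.3, 0.9
q_sa = 2.0        # current estimate Q(s, a)
reward = 1.0      # reward r observed after taking action a
max_q_next = 4.0  # max over a' of Q(s', a')

# Target: immediate reward plus discounted best future value.
td_target = reward + gamma * max_q_next       # 1.0 + 0.9 * 4.0 = 4.6
# Move the old estimate a fraction alpha towards the target.
q_sa_new = q_sa + alpha * (td_target - q_sa)  # 2.0 + 0.3 * 2.6 = 2.78

# Equivalent "blend" form: (1 - alpha) * old + alpha * target.
q_blend = (1 - alpha) * q_sa + alpha * td_target
print(q_sa_new, q_blend)
```

Both forms give 2.78 (up to floating-point rounding), so either can be used in an implementation.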

#### Dyna-Q Integration

Dyna-Q combines direct learning from experience (real interactions with the environment) with planning (simulated interactions drawn from a learned model). Each time a real action is taken and the agent transitions from state $s$ to $s'$, receiving reward $r$, the model is updated with that transition. The Dyna-Q algorithm then randomly samples previously stored experiences and replays them to update the Q-values. This improves learning efficiency by reusing past experiences multiple times.
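The planning step can be sketched in isolation. The function below is a simplified illustration, assuming a deterministic model that maps each `(state, action)` pair to the last observed `(next_state, reward)`; the name `dyna_planning` and the toy model are hypothetical.

```python
import random

import numpy as np


def dyna_planning(q_table, model, n_updates, alpha=0.3, gamma=0.9, rng=None):
    """Replay remembered transitions to refine q_table in place."""
    rng = rng or random.Random(0)
    experiences = list(model.items())
    if not experiences:
        return q_table  # nothing has been observed yet
    for _ in range(n_updates):
        # Sample a stored transition uniformly at random (with replacement).
        (s, a), (s_next, r) = rng.choice(experiences)
        target = r + gamma * np.max(q_table[s_next])
        q_table[s, a] += alpha * (target - q_table[s, a])
    return q_table


# Toy problem: 3 states, 2 actions, two remembered transitions.
q = np.zeros((3, 2))
model = {(0, 1): (1, 0.0), (1, 0): (2, 1.0)}
dyna_planning(q, model, n_updates=50)
print(q)
```

Even though only the transition out of state 1 carries a reward, repeated replay lets that value propagate back to state 0 without any further real interaction.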

#### Implementation Overview

For this project, I implemented a Q-Learner from scratch which incorporates both Q-learning and Dyna-Q mechanisms. The key components of this Q-Learner include:

- **Action Space**: Three possible actions - Buy, Sell, or Hold.
- **State Space**: Discretized states based on multiple indicators such as price relative to moving averages, momentum, etc.
- **Reward Function**: Designed to encourage profit maximization and cost minimization, including factors like transaction costs.
- **Parameters**: Includes learning rate ($\alpha$), discount factor ($\gamma$), initial random action probability (`rar`), and decay rate of randomness (`radr`).
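To make the idea of a discretized state space concrete, the sketch below bins each indicator value and combines the bin indices into one integer. It is one plausible scheme, not necessarily the one used in the project: the bin edges, value ranges, and the function name `discretize_state` are assumptions. The choice of 12 bins for three indicators is made only because it happens to give 12³ = 1728 states, the learner's default `num_states`.

```python
import numpy as np


def discretize_state(indicator_values, bin_edges):
    """Map one row of indicator values to a single integer state."""
    state = 0
    for value, edges in zip(indicator_values, bin_edges):
        n_bins = len(edges) + 1
        bin_index = int(np.digitize(value, edges))  # 0 .. n_bins - 1
        state = state * n_bins + bin_index          # mixed-radix encoding
    return state


# 11 interior edges give 12 bins per indicator (ranges are hypothetical).
edges = [
    np.linspace(0, 100, 13)[1:-1],     # B%
    np.linspace(-0.5, 0.5, 13)[1:-1],  # momentum
    np.linspace(0, 100, 13)[1:-1],     # RSI
]
state = discretize_state([55.0, 0.02, 28.0], edges)
print(state)
```

Values outside the assumed ranges fall into the first or last bin, so every observation maps to a valid state index.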

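### Q-Learner Code Implementation

Below is the Q-Learner class. `querysetstate` sets the current state and picks an action without updating the Q-table, while `query` updates the Q-table from the observed transition, runs the optional Dyna-Q replay, and returns the next action.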

```python
import numpy as np
import random as rand

class QLearner(object):
    """
    Implementation of a Q-Learner with Dyna-Q functionality.
    
    Attributes:
        num_states (int): Total number of states.
        num_actions (int): Total number of actions.
        alpha (float): Learning rate.
        gamma (float): Discount factor.
        rar (float): Random action rate.
        radr (float): Decay rate of the random action rate.
        dyna (int): Number of Dyna-Q simulated experience updates per real experience.
        verbose (bool): Enables verbose mode for debugging.
    """

    def __init__(self, num_states=1728, num_actions=3, alpha=0.3, gamma=0.9,
                 rar=0.65, radr=0.99, dyna=0, verbose=False):
        self.num_states = num_states
        self.num_actions = num_actions
        self.alpha = alpha
        self.gamma = gamma
        self.rar = rar
        self.radr = radr
        self.dyna = dyna
        self.verbose = verbose
        self.q_table = np.zeros((num_states, num_actions))
        self.model = {}  # Used for Dyna-Q: (s, a) -> (s_prime, r)
        self.s = 0  # Most recent state
        self.a = 0  # Most recent action

    def querysetstate(self, s):
        """
        Updates the state without updating the Q-table and decides action.
        
        Args:
            s (int): The new state index.
        
        Returns:
            int: The selected action index.
        """
        self.s = s
        if rand.random() <= self.rar:
            action = rand.randint(0, self.num_actions - 1)
        else:
            action = int(np.argmax(self.q_table[s]))
        self.a = action  # Remember the action so query() can credit it
        if self.verbose:
            print(f"Querysetstate: s = {s}, a = {action}")
        return action

    def query(self, s_prime, r):
        """
        Updates the Q-table based on the transition to a new state and the reward received,
        replays stored experiences (Dyna-Q) if enabled, then returns an action.
    
        Args:
            s_prime (int): The new state index after taking an action.
            r (float): The immediate reward received after transitioning to the new state.
    
        Returns:
            int: The index of the selected action.
        """

        # 1) Update Q Table using the formula: Q(s, a) = (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max(Q(s', all actions)))
        a_prev = self.a
        s_prev = self.s
        max_future_reward = np.max(self.q_table[s_prime])
        improved_estimate = r + self.gamma * max_future_reward

        self.q_table[s_prev][a_prev] = (1 - self.alpha) * self.q_table[s_prev][a_prev] + self.alpha * improved_estimate

        # Dyna-Q integration: Simulate learning from past experiences stored in the model
        self.model[(s_prev, a_prev)] = (s_prime, r)
        if self.dyna > 0:
            experiences = list(self.model.keys())
            n_updates = min(self.dyna, len(experiences))
            # Sample indices with replacement (np.random.choice cannot sample a list of tuples directly)
            for idx in np.random.randint(len(experiences), size=n_updates):
                # Separate names keep the real s_prime and r intact for step 2
                sim_s, sim_a = experiences[idx]
                sim_s_prime, sim_r = self.model[(sim_s, sim_a)]
                sim_max = np.max(self.q_table[sim_s_prime])
                self.q_table[sim_s][sim_a] += self.alpha * (sim_r + self.gamma * sim_max - self.q_table[sim_s][sim_a])

        # 2) Determine action: either random based on the rar or the best possible based on the Q-table
        if rand.uniform(0.0, 1.0) <= self.rar:
            action = rand.randint(0, self.num_actions - 1)
        else:
            action = int(np.argmax(self.q_table[s_prime]))

        # 3) Update state and action for the next step
        self.s = s_prime
        self.a = action
        self.rar *= self.radr  # Update the probability of random action

        if self.verbose:
            print(f"Updated Q-Table State: s = {s_prime}, a = {action}, r = {r}")

        return action
```


### Overview of In-Sample Testing

To assess the effectiveness of the Q-Learning based strategy and the manual strategy, we conducted a performance evaluation using historical data from the in-sample period, which spans from January 1, 2008, to December 31, 2009. The primary trading symbol for this test was JPM (JPMorgan Chase & Co.), with a starting portfolio value set at \$100,000. Both strategies were subjected to standardized transaction costs, including a commission of \$9.95 per trade and a market impact of 0.005 per share traded.

#### Experiment Setup

- **Data Range**: January 1, 2008, to December 31, 2009.
- **Trading Symbol**: JPM.
- **Starting Portfolio Value**: \$100,000.
- **Transaction Costs**: Commission of \$9.95 per trade and an impact of 0.005.
- **Trading Conditions**: Both strategies could trade up to 1000 shares per transaction, with the capability to hold long, short, or neutral positions.

The performance metrics analyzed included cumulative return, volatility (measured as the standard deviation of daily returns), and maximum drawdown over the period. These metrics help us understand not only the return on investment but also the risk involved in each strategy.
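The three metrics named above can be computed from a series of daily portfolio values as sketched below. The function name `performance_metrics` and the sample values are made up, and maximum drawdown is taken here as the largest peak-to-trough decline relative to the running peak, which is a common definition but an assumption about how it was measured in this project.

```python
import pandas as pd


def performance_metrics(port_vals):
    """Return cumulative return, daily volatility, and maximum drawdown."""
    daily_returns = port_vals.pct_change().dropna()
    cumulative_return = port_vals.iloc[-1] / port_vals.iloc[0] - 1
    volatility = daily_returns.std()
    running_peak = port_vals.cummax()
    # Most negative decline from any previous peak.
    max_drawdown = (port_vals / running_peak - 1).min()
    return cumulative_return, volatility, max_drawdown


# Made-up portfolio values for illustration.
values = pd.Series([100_000.0, 110_000.0, 99_000.0, 120_000.0])
cum_ret, vol, max_dd = performance_metrics(values)
print(cum_ret, vol, max_dd)
```

In this toy series the portfolio ends 20% up, but along the way it falls 10% from its peak of 110,000, which is what the drawdown figure captures.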

#### Performance Outcomes and Initial Interpretations

- **Manual Strategy**: Achieved a cumulative return of approximately 89%, indicating strong performance against the market conditions of the period.
- **Strategy Learner (Q-Learner)**: Managed to achieve an even higher cumulative return of approximately 125%, showcasing the potential of machine learning in optimizing trading strategies beyond traditional methods.
- **Benchmark Performance**: The benchmark, which involved buying and holding JPM stock from the first day, resulted in a cumulative return of around -8%. This highlights the superior performance of both the manual and AI-driven strategies over a simple buy-and-hold approach.

![In-Sample Trading Strategy Performance](/images/perf_in.png)


### Out-of-Sample Performance Evaluation

#### Overview of Out-of-Sample Testing

After tuning our strategies using in-sample data, we progress to evaluating their performance using completely unseen out-of-sample data. This test is crucial as it provides insights into how the strategies might perform under real market conditions, where the future is not known and cannot be optimized for in advance. For this analysis, the trading symbol remains JPM (JPMorgan Chase & Co.), and the period extends from January 1, 2010, to December 31, 2011.

#### Performance Metrics

- **Starting Portfolio Value**: \$ 100,000
- **Transaction Costs**: Standardized for both strategies with a commission of \$ 9.95 per trade and an impact of 0.005 per share traded.
- **Trading Conditions**: Up to 1000 shares were allowed per transaction, with positions being long, short, or neutral.

The primary metrics for evaluation are cumulative return and volatility (standard deviation of daily returns). These indicators help us gauge not only the profitability of the strategies but also their risk profiles during the test period.

#### Outcomes and Interpretation

- **Manual Strategy**: Achieved a positive cumulative return of approximately 18% over the two-year out-of-sample period. This performance is significant as it demonstrates the strategy's effectiveness even under varied market conditions.
- **Q-Learning Strategy**: Interestingly, the Q-Learner, which was trained using in-sample data, produced no trading activity for the first year of the out-of-sample period. This behavior might suggest that the learned policy was overly conservative or too finely tuned to the characteristics of the in-sample data, leaving it hesitant or unable to trigger trades under slightly different market conditions. It also failed to achieve a positive return over the out-of-sample period, as seen below.

![Out-of-Sample Trading Strategy Performance](/images/perf_out.png)


### Conclusion: Insights and Forward-Looking Statements

This project was an exploration into implementing and comparing different trading strategies with a focus on learning and adaptation rather than outright profitability. The manual strategy leveraged well-established financial indicators to navigate market conditions, while the ML-based strategy, utilizing Q-Learning, aimed to adapt and learn from evolving data patterns.

The outcomes from these experiments highlighted several key points:
- The manual strategy, though simple, proved effective in out-of-sample tests, reinforcing the value of traditional trading methods under certain conditions.
- The Q-Learning approach, while promising in theory, faced practical challenges such as overfitting, underscoring the need for robust machine learning techniques that generalize well across different market scenarios.

### Transitioning to Future Optimizations

Given the findings and challenges observed, particularly with the Q-Learning strategy, there are several areas of potential optimization and research that could enhance the effectiveness and adaptability of trading algorithms:

1. **Enhanced State Representation**: Integrating more complex state representations that capture a wider array of market dynamics could help in making more informed decisions.
2. **Reward Function Enhancement**: To address the issue of the Q-Learner's inaction during crucial trading periods, refining the reward function could provide more dynamic feedback to the agent. Implementing a **risk-adjusted return measure**, such as the Sharpe ratio, as part of the reward function can encourage not just profit maximization but also optimal risk management. 
3. **Hybrid Models**: Combining Q-Learning with other machine learning approaches like deep neural networks could improve prediction accuracy and decision-making robustness.
4. **Risk Management Integration**: Refining the reward structure to include risk management parameters would likely yield strategies that are not only profitable but also resilient.

These optimizations could potentially bridge the gap between theoretical model performance and practical trading efficacy, leading to more robust trading strategies that can adapt to and capitalize on market complexities.
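As one possible reading of the risk-adjusted reward idea listed above, the sketch below scores a window of recent returns with a Sharpe-style ratio. It is a hypothetical starting point rather than a tested design: the function name `sharpe_reward`, the window contents, and the zero risk-free rate are all assumptions.

```python
import numpy as np


def sharpe_reward(recent_returns, risk_free_rate=0.0, eps=1e-8):
    """Mean excess return divided by its standard deviation."""
    excess = np.asarray(recent_returns, dtype=float) - risk_free_rate
    if excess.size < 2:
        return 0.0  # not enough data to estimate volatility
    # eps avoids division by zero when returns are constant.
    return float(excess.mean() / (excess.std(ddof=1) + eps))


# Two made-up return windows with the same mean but different volatility.
steady = sharpe_reward([0.01, 0.011, 0.009, 0.01])
choppy = sharpe_reward([0.05, -0.03, 0.04, -0.02])
print(steady, choppy)
```

Both windows average a 1% return, but the steadier one earns a much larger reward, which is the behavior a risk-aware agent would be encouraged to learn.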