# **Notebook Overview: Rule-Based Trading with Metaheuristic Rule Discovery**

This notebook presents a complete demonstration of a **rule-based trading system** applied to Ethereum price data. The objective is to illustrate how **Genetic Algorithms (GA)** can be used to *automatically discover interpretable IF–THEN trading rules* based on engineered features from **Phase 1 of the course**.

The implementation follows the conceptual framework introduced in the accompanying lecture materials on **Rule Discovery with Metaheuristics**.

---

## **Purpose of the Notebook**

The notebook serves as an educational example demonstrating:

1. **How trading rules are encoded as chromosomes** within a GA.
2. **How conditions, actions, TP/SL levels, and position sizing are represented genetically.**
3. **How a rule list is decoded and executed via a backtesting engine.**
4. **How the GA iteratively improves candidate rule sets** using fitness feedback from historical performance.

Although the demonstration uses only a subset of features, students may extend the system to incorporate the full set of engineered indicators produced in Phase 1.

---

## **Chromosome Structure and Representation**

Each chromosome represents a **complete ordered rule list**, where:

* Each rule consists of one or more **conditions** (all of which must hold for the rule to fire), each specifying a feature, operator, and threshold.
* Each rule also encodes its **action parameters**:

  * Take-profit (TP),
  * Stop-loss (SL),
  * **Fraction of capital allocated to the trade** (position size).
* Activation flags control whether a rule or condition is included in the effective strategy.

The genetic representation enables exploration of a very large combinatorial search space while retaining **transparent, human-readable rule structures** once decoded.
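
To give a rough sense of that search space, the sketch below counts only the discrete genes (activation flags, side, feature index, operator) for a configuration of 5 rules, 5 conditions per rule, and 35 candidate features. The helper name `discrete_search_space_size` is hypothetical and not part of the notebook. The continuous TP, SL, size, and threshold genes are ignored, and many genotypes decode to the same rule list because inactive genes have no effect, so treat the figure as an order-of-magnitude illustration only.

```python
def discrete_search_space_size(max_rules: int, max_conds: int, n_features: int) -> int:
    """Count distinct settings of the discrete genes only (hypothetical helper)."""
    # Per condition: active flag (2) x feature index (n_features) x operator (2)
    per_condition = 2 * n_features * 2
    # Per rule: active flag (2) x side (2) x every combination of its conditions
    per_rule = 2 * 2 * per_condition ** max_conds
    # The chromosome holds max_rules independent rule slots
    return per_rule ** max_rules


size = discrete_search_space_size(max_rules=5, max_conds=5, n_features=35)
print(f"roughly 10^{len(str(size)) - 1} discrete genotypes")
```

Exhaustive enumeration is clearly out of reach at this scale, which is the motivation for a population-based search.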

---

## **Training Data and Features**

For this demonstration, we use only a small set of the available features to keep the example focused and computationally manageable.
However, students are encouraged to:

* Use **all available engineered features**,
* Modify or extend the chromosome structure,
* Experiment with different operator sets, thresholds, or rule depths.

The system is fully modular, and feature selection is handled implicitly via the genetic encoding.

---

## **Optimization and Fitness Function**

The GA is trained exclusively on the **training dataset**.
The objective function (fitness) is defined as:

> **Final equity obtained after backtesting the rule set**, starting with an initial capital of $1000.

This formulation incorporates:

* Profitability of trades,
* Trading frequency,
* Position sizing decisions,
* Compounding through equity updates.
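
The compounding effect can be illustrated with a small sketch. It assumes each trade commits a fraction of the *current* equity and that only this allocated slice earns the trade's return. The function name `compound_equity` and the two example trades are made up for illustration.

```python
def compound_equity(starting_capital, trades):
    """Apply (return_fraction, size_fraction) pairs in order and return final equity."""
    equity = starting_capital
    for ret, size_frac in trades:
        allocated = equity * size_frac   # capital committed to this trade
        equity += ret * allocated        # only the allocated slice gains or loses
    return equity


# +10% on half the equity, then -5% on half of the updated equity
final = compound_equity(1000.0, [(0.10, 0.5), (-0.05, 0.5)])
print(round(final, 2))  # 1023.75
```

The second trade is sized from 1050, not from the initial 1000, which is why earlier wins and losses change the dollar impact of later trades.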

After training, students must evaluate their discovered rule sets on the **separate test dataset** to assess out-of-sample performance and detect overfitting.

---

## **Student Instructions**

* You may run this notebook directly in **Google Colab**, but remember to mount your Google Drive before accessing datasets.
* All configuration parameters (e.g., population size, mutation rates, TP/SL ranges, position-size bounds) can be modified to explore different design choices.
* The implementation is modular; students may:

  * Add new features,
  * Adjust how rules are represented,
  * Implement alternative fitness functions,
  * Replace GA with other metaheuristic algorithms such as PSO or DE.
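
As one possible starting point for an alternative fitness function, the sketch below divides total return by the volatility of per-trade returns. It is not the notebook's fitness: the name `risk_adjusted_fitness`, the `min_trades` cutoff, and the penalty value are all assumptions chosen for illustration, and the inputs are assumed to be the per-trade returns and final equity produced by a backtest.

```python
import numpy as np


def risk_adjusted_fitness(trade_returns, final_equity,
                          starting_capital=1000.0, min_trades=5):
    """Hypothetical alternative: total return divided by per-trade volatility."""
    if len(trade_returns) < min_trades:
        return -1.0  # assumed penalty: too few trades to judge the strategy
    total_return = final_equity / starting_capital - 1.0
    volatility = float(np.std(trade_returns))
    return total_return / (volatility + 1e-9)  # epsilon avoids division by zero


score = risk_adjusted_fitness([0.02, -0.01, 0.03, 0.01, -0.02], 1100.0)
print(f"risk-adjusted score: {score:.2f}")
```

A fitness of this kind rewards steadier strategies over ones that reach the same equity through a few large swings; whether that trade-off is desirable is a design choice to test empirically.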

---

## **Educational Goals**

By working through this notebook, you will gain hands-on understanding of:

* How rule-based trading systems are formalized and optimized,
* How metaheuristic methods operate on structured, interpretable solutions,
* How backtesting interacts with rule logic, signal generation, and position management,
* How to evaluate trading strategies rigorously using both train and test datasets.
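
A side-by-side summary of train and test results makes overfitting easier to spot. The sketch below is a hypothetical helper (`summarize_backtest` is not defined in the notebook), and the trade returns and equity values passed to it are made-up numbers used only to show the output shape.

```python
import numpy as np


def summarize_backtest(trade_returns, final_equity, starting_capital=1000.0):
    """Hypothetical helper: a few summary statistics for one backtest run."""
    n = len(trade_returns)
    returns = np.asarray(trade_returns, dtype=float)
    return {
        "n_trades": n,
        "total_return_pct": 100.0 * (final_equity / starting_capital - 1.0),
        "win_rate": float(np.mean(returns > 0)) if n else float("nan"),
        "mean_trade_return": float(np.mean(returns)) if n else float("nan"),
    }


# Illustrative inputs only, not results from the notebook
train = summarize_backtest([0.03, -0.02, 0.04, 0.03], 1040.0)
test = summarize_backtest([0.03, -0.02, -0.02], 990.0)
for key in train:
    print(f"{key:>18}: train={train[key]:.3f}  test={test[key]:.3f}")
```

A large gap between the two columns, for example a high training win rate that collapses on the test set, suggests the rules have fitted noise in the training period.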

This forms the foundation for the **Phase 2 Rule Discovery Project**.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import numpy as np
import pandas as pd
from dataclasses import dataclass
from typing import List, Optional, Tuple
import random


In [4]:
# === CONFIG: rule structure, GA, and trading ===

MAX_RULES = 5          # N (max rules in the rule list)
MAX_CONDS = 5          # K (max conditions per rule)

TP_MIN, TP_MAX = 0.01, 0.10   # 1% .. 10%
SL_MIN, SL_MAX = 0.01, 0.10   # 1% .. 10%

STARTING_CAPITAL = 1000.0    # starting money for the strategy

POS_MIN_FRAC = 0.05          # 5% of capital per trade (min)
POS_MAX_FRAC = 0.50          # 50% of capital per trade (max)

POP_SIZE = 10
N_GENERATIONS = 20
TOURNAMENT_SIZE = 3
CROSSOVER_RATE = 0.8
MUTATION_RATE = 0.1   # base mutation probability per gene

RANDOM_SEED = 42
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)


In [5]:
# === 1. Load ETH data with engineered features ===

def load_eth_features(csv_path: str,
                      feature_cols: List[str],
                      close_col: str = "close"
                      ) -> Tuple[pd.DataFrame, List[str]]:
    """
    Load ETH OHLCV + engineered features.
    We keep only 'close' and selected feature columns.

    Parameters
    ----------
    csv_path : str
        Path to CSV containing at least 'close' and feature columns.
    feature_cols : list of str
        Subset of the engineered features to use in the demo.
    close_col : str
        Name of the close price column.

    Returns
    -------
    df : pd.DataFrame
        DataFrame indexed by time with 'close' and selected features.
    feature_cols_used : list of str
        The actual feature columns we keep (intersection of requested + available).
    """
    df = pd.read_csv(csv_path, parse_dates=True, index_col=0)
    df.columns = [c.lower() for c in df.columns]

    if close_col.lower() not in df.columns:
        raise ValueError(f"Close column '{close_col}' not found in data.")

    # Intersect requested features with available columns
    feature_cols_lower = [c.lower() for c in feature_cols]
    available_features = [c for c in feature_cols_lower if c in df.columns]

    if len(available_features) == 0:
        raise ValueError("None of the requested feature columns are present in the data.")

    cols_to_keep = [close_col.lower()] + available_features
    df = df[cols_to_keep].sort_index()

    return df, available_features


In [6]:
# === 2. Gene and Rule structures (phenotype) ===

@dataclass
class Condition:
    active: bool
    feature_idx: int
    operator: str      # "<" or ">"
    threshold: float   # numeric threshold


@dataclass
class Rule:
    active: bool
    conditions: List[Condition]
    side: str          # "BUY" or "SELL"
    tp: float          # take-profit as decimal (e.g. 0.03 for 3%)
    sl: float          # stop-loss as decimal
    size_frac: float   # fraction of current capital to allocate (0.0–1.0)

In [7]:
# === 2b. Chromosome representation (genotype) ===

@dataclass
class Chromosome:
    # Rule-level genes
    rule_active: np.ndarray      # (MAX_RULES,)  {0,1}
    side_gene: np.ndarray        # (MAX_RULES,)  {0,1}  0=BUY, 1=SELL
    tp_gene: np.ndarray          # (MAX_RULES,)  [0,1]
    sl_gene: np.ndarray          # (MAX_RULES,)  [0,1]
    size_gene: np.ndarray        # (MAX_RULES,)  [0,1]  --> position size fraction

    # Condition-level genes
    cond_active: np.ndarray      # (MAX_RULES, MAX_CONDS)  {0,1}
    feature_idx_gene: np.ndarray # (MAX_RULES, MAX_CONDS)
    operator_gene: np.ndarray    # (MAX_RULES, MAX_CONDS)  {0,1}
    threshold_gene: np.ndarray   # (MAX_RULES, MAX_CONDS)  [0,1]

In [8]:
# === 3. Mapping genes to actual TP/SL and thresholds ===

def map_tp_gene(tp_gene: float) -> float:
    """Map [0,1] gene to TP% in [TP_MIN, TP_MAX]."""
    return TP_MIN + tp_gene * (TP_MAX - TP_MIN)


def map_sl_gene(sl_gene: float) -> float:
    """Map [0,1] gene to SL% in [SL_MIN, SL_MAX]."""
    return SL_MIN + sl_gene * (SL_MAX - SL_MIN)


def map_size_gene(size_gene: float) -> float:
    """
    Map [0,1] size_gene to a fraction of capital to allocate per trade.
    For example, POS_MIN_FRAC=0.05, POS_MAX_FRAC=0.5 => 5%..50%.
    """
    return POS_MIN_FRAC + size_gene * (POS_MAX_FRAC - POS_MIN_FRAC)


def map_operator_gene(op_gene: int) -> str:
    """0 -> '<', 1 -> '>'."""
    return "<" if op_gene == 0 else ">"


def map_threshold_gene_to_value(feature_series: pd.Series, thr_gene: float) -> float:
    """
    Map a [0,1] threshold_gene to a numeric threshold using feature quantiles.

    thr_gene ~ 0.0 => low quantile (e.g. oversold RSI)
    thr_gene ~ 1.0 => high quantile (e.g. overbought RSI)
    """
    # np.nanquantile handles NaNs gracefully
    return float(np.nanquantile(feature_series.values, thr_gene))
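
A small illustration of the quantile mapping, using a toy series rather than real indicator values (the numbers are made up): the gene selects a quantile of the feature's observed distribution, and missing values are ignored.

```python
import numpy as np
import pandas as pd

rsi = pd.Series([10.0, 20.0, np.nan, 30.0, 40.0, 50.0])  # toy values, one missing

# Map a few gene values in [0, 1] to thresholds on the feature's own scale
thresholds = {g: float(np.nanquantile(rsi.values, g)) for g in (0.0, 0.25, 0.5, 1.0)}
print(thresholds)  # {0.0: 10.0, 0.25: 20.0, 0.5: 30.0, 1.0: 50.0}
```

Because the threshold is a quantile of whatever data is passed in, decoding the same chromosome against a different DataFrame generally gives different numeric thresholds. Decoding once on the training data and reusing the resulting rules on the test data keeps the thresholds fixed.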


In [9]:
# === 3b. Decode Chromosome to Rule List ===

def decode_chromosome(
    chrom: Chromosome,
    df: pd.DataFrame,
    feature_cols: List[str]
) -> List[Rule]:
    rules: List[Rule] = []

    for r in range(MAX_RULES):
        if chrom.rule_active[r] == 0:
            continue

        side = "BUY" if chrom.side_gene[r] == 0 else "SELL"
        tp = map_tp_gene(chrom.tp_gene[r])
        sl = map_sl_gene(chrom.sl_gene[r])
        size_frac = map_size_gene(chrom.size_gene[r])   # fraction of equity per trade

        conds: List[Condition] = []
        for c in range(MAX_CONDS):
            if chrom.cond_active[r, c] == 0:
                continue

            feat_idx = int(chrom.feature_idx_gene[r, c]) % len(feature_cols)
            feat_name = feature_cols[feat_idx]
            feat_series = df[feat_name]

            op = map_operator_gene(int(chrom.operator_gene[r, c]))
            thr_gene = float(chrom.threshold_gene[r, c])
            thr_value = map_threshold_gene_to_value(feat_series, thr_gene)

            conds.append(
                Condition(
                    active=True,
                    feature_idx=feat_idx,
                    operator=op,
                    threshold=thr_value,
                )
            )

        if len(conds) == 0:
            continue

        rules.append(
            Rule(
                active=True,
                conditions=conds,
                side=side,
                tp=tp,
                sl=sl,
                size_frac=size_frac,
            )
        )

    return rules

In [10]:
# === 4. Rule evaluation ===

def rule_fires(rule: Rule,
               df: pd.DataFrame,
               feature_cols: List[str],
               t: int) -> bool:
    """
    Check if a rule fires at row index t.

    All active conditions must be true.
    """
    row = df.iloc[t]
    for cond in rule.conditions:
        feat_name = feature_cols[cond.feature_idx]
        x = row[feat_name]

        if np.isnan(x):
            return False  # missing feature => don't fire

        if cond.operator == "<":
            if not (x < cond.threshold):
                return False
        else:  # ">"
            if not (x > cond.threshold):
                return False

    return True
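
Checking one row at a time is easy to follow but slow on long 5-minute series. As an optional alternative, the sketch below evaluates all rows at once. It is a simplified stand-in, not a drop-in replacement: `rule_mask` is a hypothetical name, and conditions are passed as plain `(feature, operator, threshold)` tuples instead of the notebook's `Condition` objects.

```python
import numpy as np
import pandas as pd


def rule_mask(conditions, df):
    """Boolean array: True where every (feature, operator, threshold) condition holds."""
    mask = np.ones(len(df), dtype=bool)
    for feature, operator, threshold in conditions:
        values = df[feature].to_numpy()
        # Comparisons against NaN are False, so rows with missing values never fire
        mask &= (values < threshold) if operator == "<" else (values > threshold)
    return mask


toy = pd.DataFrame({"mom_rsi_14": [25.0, 45.0, np.nan, 28.0],
                    "vol_std_20": [0.8, 0.9, 0.7, 0.2]})
fires = rule_mask([("mom_rsi_14", "<", 30.0), ("vol_std_20", ">", 0.5)], toy)
print(fires.tolist())  # [True, False, False, False]
```

Precomputing one mask per rule before the backtest loop would replace the repeated `df.iloc[t]` lookups with a single array index per bar.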


In [11]:
# === 5. Backtesting engine ===

def backtest_rule_list(
    rules: List[Rule],
    df: pd.DataFrame,
    feature_cols: List[str],
    starting_capital: float = STARTING_CAPITAL
) -> Tuple[List[float], float]:
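    """
    Backtest a rule list with capital and position sizing.

    At most one position is open at a time. While flat, the first rule in
    list order that fires opens a trade at the current close; the trade is
    closed at the first later close where TP or SL is reached. A position
    still open at the last bar is not closed and does not affect the result.

    Returns
    -------
    trade_returns : list of float
        Per-trade returns as decimal fractions (e.g. 0.03 for 3%).
    final_equity : float
        Equity after all closed trades.
    """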
    if len(rules) == 0:
        return [], starting_capital

    close = df["close"].values
    n = len(df)

    equity = starting_capital

    position = 0          # 0 = flat, +1 = long, -1 = short
    entry_price = None
    entry_rule: Optional[Rule] = None
    entry_capital = None  # amount of capital allocated to this trade

    trade_returns: List[float] = []

    for t in range(n):
        price = close[t]
        if np.isnan(price):
            continue

        if position == 0:
            # --- FLAT: look for entry ---
            for rule in rules:
                if rule_fires(rule, df, feature_cols, t):
                    # Capital to allocate = fraction of current equity
                    size_frac = rule.size_frac
                    trade_capital = equity * size_frac

                    if trade_capital <= 0:
                        break  # nothing to allocate

                    position = 1 if rule.side == "BUY" else -1
                    entry_price = price
                    entry_rule = rule
                    entry_capital = trade_capital
                    break

        else:
            # --- IN POSITION: manage trade ---
            assert entry_rule is not None and entry_price is not None and entry_capital is not None

            # Return in direction of position (+ for profit)
            ret = position * (price / entry_price - 1.0)

            tp_hit = ret >= entry_rule.tp
            sl_hit = ret <= -entry_rule.sl

            if tp_hit or sl_hit:
                # Close trade
                pnl = ret * entry_capital      # profit or loss in dollars
                equity += pnl                  # update money
                trade_returns.append(ret)

                # Reset position
                position = 0
                entry_price = None
                entry_rule = None
                entry_capital = None

    return trade_returns, equity
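
The exit logic can be examined in isolation with a short sketch. `first_exit` is a hypothetical helper and the prices are made up; it applies the same signed-return test as the backtest to a sequence of closes that follow the entry bar.

```python
def first_exit(prices, entry_price, position, tp, sl):
    """Return (bar_index, realized_return) of the first TP/SL hit, or None."""
    for t, price in enumerate(prices):
        ret = position * (price / entry_price - 1.0)  # signed so profit is positive
        if ret >= tp or ret <= -sl:
            return t, ret
    return None  # neither level reached within the given prices


closes = [100.5, 102.5, 104.0]
# Long from 100 with TP = 3% and SL = 2%: exits when the close reaches +3% or more
long_exit = first_exit(closes, entry_price=100.0, position=+1, tp=0.03, sl=0.02)
# Short from 100 with the same levels: a rising price is a loss
short_exit = first_exit(closes, entry_price=100.0, position=-1, tp=0.03, sl=0.02)
print(long_exit, short_exit)
```

Here the long trade exits on the third close at about +4%, not exactly +3%, and the short is stopped out on the second close at about -2.5%. Because exits are evaluated on closing prices, the realized return can overshoot the configured TP or SL level.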


In [12]:
def compute_fitness(
    chrom: Chromosome,
    df: pd.DataFrame,
    feature_cols: List[str]
) -> float:
    """
    Decode chromosome -> rule list -> backtest -> final equity.

    Fitness = final money (higher is better).
    """
    rules = decode_chromosome(chrom, df, feature_cols)

    trade_returns, final_equity = backtest_rule_list(
        rules, df, feature_cols, starting_capital=STARTING_CAPITAL
    )

    # Optional: if you want to slightly penalize "do nothing" strategies:
    if len(trade_returns) == 0:
        return STARTING_CAPITAL - 1.0  # tiny penalty

    return final_equity


In [13]:
# === 6. GA: initialization ===

def random_chromosome(n_features: int) -> Chromosome:
    rule_active = np.random.randint(0, 2, size=(MAX_RULES,))
    if rule_active.sum() == 0:
        rule_active[np.random.randint(0, MAX_RULES)] = 1

    side_gene = np.random.randint(0, 2, size=(MAX_RULES,))
    tp_gene = np.random.rand(MAX_RULES)
    sl_gene = np.random.rand(MAX_RULES)
    size_gene = np.random.rand(MAX_RULES)  # position size genes in [0,1]

    cond_active = np.random.randint(0, 2, size=(MAX_RULES, MAX_CONDS))
    for r in range(MAX_RULES):
        if rule_active[r] == 1 and cond_active[r].sum() == 0:
            cond_active[r, np.random.randint(0, MAX_CONDS)] = 1

    feature_idx_gene = np.random.randint(0, n_features, size=(MAX_RULES, MAX_CONDS))
    operator_gene = np.random.randint(0, 2, size=(MAX_RULES, MAX_CONDS))
    threshold_gene = np.random.rand(MAX_RULES, MAX_CONDS)

    return Chromosome(
        rule_active=rule_active,
        side_gene=side_gene,
        tp_gene=tp_gene,
        sl_gene=sl_gene,
        size_gene=size_gene,
        cond_active=cond_active,
        feature_idx_gene=feature_idx_gene,
        operator_gene=operator_gene,
        threshold_gene=threshold_gene,
    )


In [14]:
# === 6b. Parent selection (tournament) ===

def tournament_selection(population: List[Chromosome],
                         fitnesses: List[float],
                         k: int = TOURNAMENT_SIZE) -> Chromosome:
    """
    Tournament selection: pick k random individuals, return the best.
    """
    idxs = np.random.choice(len(population), size=k, replace=False)
    best_idx = idxs[0]
    best_fit = fitnesses[best_idx]
    for i in idxs[1:]:
        if fitnesses[i] > best_fit:
            best_fit = fitnesses[i]
            best_idx = i
    return population[best_idx]


In [None]:
# === 6c. Crossover ===

def crossover(parent1: Chromosome,
              parent2: Chromosome) -> Tuple[Chromosome, Chromosome]:
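    """
    One-point crossover applied independently to each gene array,
    each with probability CROSSOVER_RATE. 2-D arrays are flattened first,
    so the cut point may fall inside a rule's block of conditions.
    """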
    child1 = Chromosome(
        rule_active=parent1.rule_active.copy(),
        side_gene=parent1.side_gene.copy(),
        tp_gene=parent1.tp_gene.copy(),
        sl_gene=parent1.sl_gene.copy(),
        size_gene=parent1.size_gene.copy(),
        cond_active=parent1.cond_active.copy(),
        feature_idx_gene=parent1.feature_idx_gene.copy(),
        operator_gene=parent1.operator_gene.copy(),
        threshold_gene=parent1.threshold_gene.copy(),
    )

    child2 = Chromosome(
        rule_active=parent2.rule_active.copy(),
        side_gene=parent2.side_gene.copy(),
        tp_gene=parent2.tp_gene.copy(),
        sl_gene=parent2.sl_gene.copy(),
        size_gene=parent2.size_gene.copy(),
        cond_active=parent2.cond_active.copy(),
        feature_idx_gene=parent2.feature_idx_gene.copy(),
        operator_gene=parent2.operator_gene.copy(),
        threshold_gene=parent2.threshold_gene.copy(),
    )

    def one_point(arr1, arr2):
        flat1 = arr1.reshape(-1)
        flat2 = arr2.reshape(-1)
        point = np.random.randint(1, len(flat1))
        new1 = np.concatenate([flat1[:point], flat2[point:]])
        new2 = np.concatenate([flat2[:point], flat1[point:]])
        return new1.reshape(arr1.shape), new2.reshape(arr2.shape)

    # Apply crossover with some probability
    if np.random.rand() < CROSSOVER_RATE:
        child1.rule_active, child2.rule_active = one_point(
            child1.rule_active, child2.rule_active
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.side_gene, child2.side_gene = one_point(
            child1.side_gene, child2.side_gene
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.tp_gene, child2.tp_gene = one_point(
            child1.tp_gene, child2.tp_gene
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.sl_gene, child2.sl_gene = one_point(
            child1.sl_gene, child2.sl_gene
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.cond_active, child2.cond_active = one_point(
            child1.cond_active, child2.cond_active
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.feature_idx_gene, child2.feature_idx_gene = one_point(
            child1.feature_idx_gene, child2.feature_idx_gene
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.operator_gene, child2.operator_gene = one_point(
            child1.operator_gene, child2.operator_gene
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.threshold_gene, child2.threshold_gene = one_point(
            child1.threshold_gene, child2.threshold_gene
        )

    if np.random.rand() < CROSSOVER_RATE:
        child1.size_gene, child2.size_gene = one_point(
            child1.size_gene, child2.size_gene
        )


    return child1, child2
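
To see what one-point crossover does to a 2-D gene array, the sketch below uses a variant with an explicit cut index so the result is deterministic. `one_point_at` is a hypothetical name; the notebook's version draws the cut point at random.

```python
import numpy as np


def one_point_at(arr1, arr2, point):
    """One-point crossover with an explicit cut index (hypothetical variant)."""
    flat1, flat2 = arr1.reshape(-1), arr2.reshape(-1)
    new1 = np.concatenate([flat1[:point], flat2[point:]])
    new2 = np.concatenate([flat2[:point], flat1[point:]])
    return new1.reshape(arr1.shape), new2.reshape(arr2.shape)


a = np.zeros((2, 3), dtype=int)   # parent 1: all zeros
b = np.ones((2, 3), dtype=int)    # parent 2: all ones
c1, c2 = one_point_at(a, b, point=4)
print(c1.tolist())  # [[0, 0, 0], [0, 1, 1]]
print(c2.tolist())  # [[1, 1, 1], [1, 0, 0]]
```

The second row of each child mixes genes from both parents, because the cut is made on the flattened array and can land in the middle of a rule's conditions.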


In [16]:
# === 6d. Mutation ===

def mutate(chrom: Chromosome, n_features: int) -> None:
    """
    In-place mutation of a chromosome.
    """
    # Rule activation bits
    for r in range(MAX_RULES):
        if np.random.rand() < MUTATION_RATE:
            chrom.rule_active[r] = 1 - chrom.rule_active[r]  # flip 0/1

    # Ensure at least one active rule
    if chrom.rule_active.sum() == 0:
        chrom.rule_active[np.random.randint(0, MAX_RULES)] = 1

    # Side (BUY/SELL)
    for r in range(MAX_RULES):
        if np.random.rand() < MUTATION_RATE:
            chrom.side_gene[r] = 1 - chrom.side_gene[r]

    # TP/SL genes (small Gaussian noise)
    for r in range(MAX_RULES):
        if np.random.rand() < MUTATION_RATE:
            chrom.tp_gene[r] = np.clip(
                chrom.tp_gene[r] + np.random.normal(scale=0.1), 0.0, 1.0
            )
        if np.random.rand() < MUTATION_RATE:
            chrom.sl_gene[r] = np.clip(
                chrom.sl_gene[r] + np.random.normal(scale=0.1), 0.0, 1.0
            )
        if np.random.rand() < MUTATION_RATE:
            chrom.size_gene[r] = np.clip(
                chrom.size_gene[r] + np.random.normal(scale=0.1), 0.0, 1.0
            )

    # Condition activation bits
    for r in range(MAX_RULES):
        for c in range(MAX_CONDS):
            if np.random.rand() < MUTATION_RATE:
                chrom.cond_active[r, c] = 1 - chrom.cond_active[r, c]

        # Ensure at least one active condition per active rule
        if chrom.rule_active[r] == 1 and chrom.cond_active[r].sum() == 0:
            chrom.cond_active[r, np.random.randint(0, MAX_CONDS)] = 1

    # Feature index, operator, threshold_gene
    for r in range(MAX_RULES):
        for c in range(MAX_CONDS):
            if np.random.rand() < MUTATION_RATE:
                chrom.feature_idx_gene[r, c] = np.random.randint(0, n_features)
            if np.random.rand() < MUTATION_RATE:
                chrom.operator_gene[r, c] = 1 - chrom.operator_gene[r, c]
            if np.random.rand() < MUTATION_RATE:
                chrom.threshold_gene[r, c] = np.clip(
                    chrom.threshold_gene[r, c] + np.random.normal(scale=0.1),
                    0.0,
                    1.0,
                )


In [17]:
# === 7. GA main loop ===

def run_ga(df: pd.DataFrame,
           feature_cols: List[str]
           ) -> Tuple[Chromosome, float]:
    """
    Run a simple GA to discover a good rule list.

    Returns best_chromosome, best_fitness.
    """
    n_features = len(feature_cols)

    # --- Initialize population ---
    population: List[Chromosome] = [
        random_chromosome(n_features) for _ in range(POP_SIZE)
    ]

    # Evaluate initial population
    fitnesses = [
        compute_fitness(chrom, df, feature_cols) for chrom in population
    ]

    best_idx = int(np.argmax(fitnesses))
    best_chrom = population[best_idx]
    best_fit = fitnesses[best_idx]

    print(f"Initial best fitness: {best_fit:.6f}")

    for gen in range(1, N_GENERATIONS + 1):
        new_population: List[Chromosome] = []

        # --- Elitism: keep the best individual ---
        new_population.append(best_chrom)

        # --- Generate rest of population ---
        while len(new_population) < POP_SIZE:
            # Select parents
            p1 = tournament_selection(population, fitnesses)
            p2 = tournament_selection(population, fitnesses)

            # Crossover
            child1, child2 = crossover(p1, p2)

            # Mutation
            mutate(child1, n_features)
            mutate(child2, n_features)

            new_population.append(child1)
            if len(new_population) < POP_SIZE:
                new_population.append(child2)

        population = new_population
        fitnesses = [
            compute_fitness(chrom, df, feature_cols) for chrom in population
        ]

        gen_best_idx = int(np.argmax(fitnesses))
        gen_best_fit = fitnesses[gen_best_idx]

        # Update global best
        if gen_best_fit > best_fit:
            best_fit = gen_best_fit
            best_chrom = population[gen_best_idx]

        print(f"Generation {gen:3d}: best fitness = {gen_best_fit:.6f}, global best = {best_fit:.6f}")

    return best_chrom, best_fit


In [18]:
# === 8. Pretty-print rule list ===

def pretty_print_rules(chrom: Chromosome,
                       df: pd.DataFrame,
                       feature_cols: List[str]) -> None:
    rules = decode_chromosome(chrom, df, feature_cols)

    if len(rules) == 0:
        print("No active rules.")
        return

    print("\n=== Discovered Rule List (ordered) ===\n")
    for i, rule in enumerate(rules, start=1):
        cond_strs = []
        for cond in rule.conditions:
            feat_name = feature_cols[cond.feature_idx]
            cond_strs.append(f"{feat_name} {cond.operator} {cond.threshold:.4f}")

        cond_part = " AND ".join(cond_strs)
        print(f"Rule {i}: IF {cond_part}")
        print(f"        THEN {rule.side} with TP = {rule.tp*100:.1f}%, SL = {rule.sl*100:.1f}%, Capital Fraction = {rule.size_frac*100:.1f}%\n")


# Main program

In [19]:
# Example usage (you adapt the paths and feature names):
FEATURE_COLS = [
    "vol_var_20",
    "cand_close_open_ratio_1",
    "rob_median_abs_dev_30",
    "trend_ema_12",
    "mom_roc_10",
    "vol_cmf_20",
    "rob_iqr_20",
    "trend_ema_26",
    "vol_mfi_14",
    "rob_kurt_30",
    "trend_sma_20",
    "rob_hurst_100",
    "vol_pvo_12_26",
    "vol_vpt_1",
    "vol_std_20",
    "cand_shadow_lower_1",
    "trend_tema_20",
    "trend_hma_21",
    "cand_range_1",
    "vol_zclose_60",
    "mom_willr_14",
    "mom_macd_12_26",
    "mom_stoch_d_14_3_3",
    "cand_up_down_vol_ratio_20",
    "rob_autocorr_20",
    "ent_return_30",
    "vol_bbw_20_2",
    "vol_logret_std_20",
    "vol_range_ratio_14",
    "cand_shadow_upper_1",
    "mom_ppo_12_26",
    "trend_wma_14",
    "vol_high_low_corr_20",
    "vol_vroc_10",
    "mom_rsi_14"
]

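df, FEATURE_COLS = load_eth_features("/content/drive/MyDrive/Metaheuristic Course/Phase 2/eth_5m_with_features.csv", FEATURE_COLS)
df_test, test_feature_cols = load_eth_features("/content/drive/MyDrive/Metaheuristic Course/Phase 2/eth_5m_with_features_test.csv", FEATURE_COLS)

# Conditions store feature indices, so train and test must expose the same columns in the same order
assert test_feature_cols == FEATURE_COLS, "Train and test feature columns differ."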

In [20]:
df.dropna(inplace=True)
df_test.dropna(inplace=True)

In [23]:
if __name__ == "__main__":
    # 1) Data was loaded in the cells above: df (train) and df_test (test)

    # 2) Run the GA on the training data (the whole of df)
    best_chrom, best_fit = run_ga(df, FEATURE_COLS)
    print(f"\nBest fitness found: {best_fit:.6f}")

    # 3) Show the final rule list
    best_rules = decode_chromosome(best_chrom, df, FEATURE_COLS)
    pretty_print_rules(best_chrom, df, FEATURE_COLS)

    # 4) Evaluate on TRAIN and TEST (3-month out-of-sample)
    train_returns, final_train_eq = backtest_rule_list(best_rules, df, FEATURE_COLS)
    test_returns, final_test_eq  = backtest_rule_list(best_rules, df_test,  FEATURE_COLS)

    print(f"Train final equity: {final_train_eq:.2f}, Number of positions: {len(train_returns)}")
    print(f"Test  final equity: {final_test_eq:.2f}, Number of positions: {len(test_returns)}")


Initial best fitness: 1494.411711
Generation   1: best fitness = 1494.411711, global best = 1494.411711
Generation   2: best fitness = 1494.411711, global best = 1494.411711
Generation   3: best fitness = 1494.411711, global best = 1494.411711
Generation   4: best fitness = 1495.739968, global best = 1495.739968
Generation   5: best fitness = 1521.499211, global best = 1521.499211
Generation   6: best fitness = 1521.499211, global best = 1521.499211
Generation   7: best fitness = 1521.499211, global best = 1521.499211
Generation   8: best fitness = 1521.499211, global best = 1521.499211
Generation   9: best fitness = 1521.499211, global best = 1521.499211
Generation  10: best fitness = 1521.499211, global best = 1521.499211
Generation  11: best fitness = 1521.499211, global best = 1521.499211
Generation  12: best fitness = 1521.499211, global best = 1521.499211
Generation  13: best fitness = 1521.499211, global best = 1521.499211
Generation  14: best fitness = 1571.126274, global best 