# Crypto Momentum / Trend

This notebook analyzes momentum effects in cryptocurrencies. We examine both cross-sectional and time-series momentum, and look for possible ways to monetize these effects.

Our dataset is the trading history for pairs on the Kraken exchange ([link](https://support.kraken.com/hc/en-us/articles/360047543791-Downloadable-historical-market-data-time-and-sales-)), restricted to USD pairs only. Kraken is our primary trading exchange because of geographical restrictions (US) and because its fee structure is better than that of other US exchanges such as Coinbase.

## Load Data

In [None]:
import argparse
import pandas as pd
import plotly.express as px
from scipy import stats
import numpy as np
from datetime import datetime
import pytz
from pathlib import Path
from numba import njit, jit

from analysis.analysis import analysis

from simulation.vbt import vbt
from simulation.simulation import simulation
from simulation.backtest import backtest, backtest_crypto
from signal_generation.signal_generation import (
    create_analysis_signals,
    create_trading_signals,
)
from signal_generation.constants import SignalType
from signal_generation.rohrbach import create_rohrbach_signals
from signal_generation.common import (
    ema,
    volatility_ema,
    bins,
    volatility,
    future_volatility,
)
from position_generation.benchmark import generate_benchmark_btc
from position_generation.constants import (
    SCALED_SIGNAL_COL,
    NUM_LONG_ASSETS_COL,
    NUM_SHORT_ASSETS_COL,
    NUM_KEPT_ASSETS_COL,
    NUM_OPEN_LONG_POSITIONS_COL,
    NUM_OPEN_SHORT_POSITIONS_COL,
    NUM_OPEN_POSITIONS_COL,
)
from position_generation.utils import nonempty_positions
from position_generation.generate_positions import generate_positions, compute_idm
from position_generation.utils import Direction
from data.utils import load_ohlc_to_hourly_filtered, load_ohlc_to_daily_filtered
from core.utils import filter_universe
from core.constants import (
    TIMESTAMP_COL,
    TICKER_COL,
    RETURNS_COL,
    POSITION_COL,
    PAST_7D_RETURNS_COL,
    in_universe_excl_stablecoins,
)

np.set_printoptions(linewidth=1000)
pd.set_option("display.width", 2000)
pd.set_option("display.precision", 3)
pd.set_option("display.float_format", "{:.3f}".format)

In [None]:
input_path = "/home/elo/data/usd_ohlc_fixed.csv"
input_freq = "1h"
start_date = "2014/01/01"
end_date = "2023/12/31"
tz = pytz.timezone("UTC")
start_date = tz.localize(datetime.strptime(start_date.replace("/", "-"), "%Y-%m-%d"))
end_date = tz.localize(datetime.strptime(end_date.replace("/", "-"), "%Y-%m-%d"))

# Parse data
df_daily = load_ohlc_to_daily_filtered(
    input_path, input_freq=input_freq, tz=tz, whitelist_fn=in_universe_excl_stablecoins
)

# Create signals
df_analysis = create_analysis_signals(df_daily, periods_per_day=1)

# Validate dates
data_start = df_analysis["timestamp"].min()
if start_date < data_start:
    print(f"Input start_date is before start of data! Setting to {data_start}")
    start_date = data_start
data_end = df_analysis["timestamp"].max()
if end_date > data_end:
    print(f"Input end_date is after end of data! Setting to {data_end}")
    end_date = data_end

# Exclude 2013 and older
df_analysis = df_analysis.loc[df_analysis["timestamp"] >= start_date]

## Data Analysis

### Introductory Analysis - Past Returns vs Future Returns

Let's try the simplest, dumbest thing first: look for a relationship between past (15d) returns and future (next-day) returns.
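
The idea behind this check can be sketched on synthetic data. The snippet below is a minimal illustration, not the project's `analysis` helper (whose internals are not shown in this notebook): it builds a trailing 15-day log return and a next-day log return per ticker, bins the former into pooled deciles with `pd.qcut`, and averages the latter per decile. The random-walk prices and the `make_decile_table` name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_decile_table(df: pd.DataFrame, lookback: int = 15) -> pd.DataFrame:
    """Mean next-day log return per decile of trailing `lookback`-day log return."""
    df = df.sort_values(["ticker", "timestamp"]).copy()
    log_close = np.log(df["close"])
    by_ticker = log_close.groupby(df["ticker"])
    # Trailing return only uses information available at time t.
    df["past_log_returns"] = by_ticker.diff(lookback)
    # Target: the following day's return, shifted back one row within each ticker.
    df["next_1d_log_returns"] = by_ticker.diff().groupby(df["ticker"]).shift(-1)
    df = df.dropna(subset=["past_log_returns", "next_1d_log_returns"])
    # Pooled deciles: 0 = lowest past return, 9 = highest.
    df["decile"] = pd.qcut(df["past_log_returns"], 10, labels=False)
    return df.groupby("decile")["next_1d_log_returns"].agg(["mean", "count"])


# Synthetic random-walk prices for three hypothetical tickers.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=300, freq="D")
frames = []
for ticker in ["AAA/USD", "BBB/USD", "CCC/USD"]:
    log_price = np.cumsum(rng.normal(0.0, 0.03, size=len(dates)))
    frames.append(
        pd.DataFrame(
            {"timestamp": dates, "ticker": ticker, "close": 100 * np.exp(log_price)}
        )
    )
prices = pd.concat(frames, ignore_index=True)

decile_table = make_decile_table(prices)
print(decile_table)
```

On random-walk data the decile means should show no systematic pattern; a momentum effect would show up as means that rise from the low deciles to the high ones.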

In [None]:
analysis(
    df_analysis,
    feature="15d_log_returns",
    target="next_1d_log_returns",
    bin_feature="15d_log_returns_decile",
)

We see some slight evidence of momentum effects using 15 day past returns.

The effect seems to be quite noisy year to year. Is there a relationship between momentum and whether the market was going up/down? Let's use BTC as a proxy for the market and look at returns per year.

In [None]:
# Get years where BTC returns were positive
df_tmp = (
    df_analysis.loc[df_analysis["ticker"] == "BTC/USD"]
    .groupby(["year", "ticker"])
    .agg(
        {
            "returns": "sum",
        }
    )
    .reset_index(0)
)
df_tmp["up_year"] = df_tmp["returns"] > 0
df_tmp

In [None]:
# Analyze momentum effects for up years
up_years = [2015, 2016, 2017, 2019, 2020, 2021, 2023]

feature = "15d_log_returns"
bin_feature = "15d_log_returns_decile"
target = "next_1d_log_returns"

# Plot de-meaned future returns over 15d return deciles per year
for year in up_years:
    df_tmp = (
        df_analysis.loc[df_analysis["year"] == year]
        .groupby(
            [
                bin_feature,
            ]
        )
        .agg({target: "mean"})
        .reset_index()
    )
    # De-mean
    df_tmp[target] = df_tmp[target] - df_tmp[target].mean()
    # Plot
    fig = px.bar(
        df_tmp,
        x=bin_feature,
        y=target,
        title=str(year),
    )
    fig.show()

There are still some exceptions (2015, 2020), but overall the effect seems to persist. The exceptional years support the notion that harnessing this effect is somewhat painful (it does not pay off every year), which is one reason it is likely to persist.

### Effects of Volume on Relationship

Our hypothesis for why momentum exists includes both behavioral reasons (FOMO, flows yolo-ing into coins going up) as well as limits to arbitrage (kinda risky/shitty to take the other side of such a volatile trade). If this is true, we would expect to see the relationship strengthen for the really shitty shitcoins (where limits to arbitrage are greater due to limited capacity).

In [None]:
from scipy import stats

df_effect_vs_volume = pd.DataFrame(
    {
        "max_dollar_volume": pd.Series(dtype="int"),
        "slope": pd.Series(dtype="float"),
        "r2": pd.Series(dtype="float"),
        "num_data_points": pd.Series(dtype="int"),
    }
)
feature = "30d_log_returns"
target = "next_1d_log_returns"
bin_feature = "30d_log_returns_decile"
for dollar_volume in [
    np.inf,
    100e6,
    10e6,
    5e6,
    1e6,
    500000,
    400000,
    300000,
    200000,
    100000,
    50000,
    40000,
    30000,
    20000,
    10000,
    5000,
    2000,
    1000,
    100,
]:
    volume_mask = (df_analysis["dollar_volume"] <= dollar_volume) & (
        df_analysis["dollar_volume"] > 0
    )
    df_tmp = df_analysis.loc[volume_mask].dropna()
    # Linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        df_tmp[feature], df_tmp[target]
    )
    num_data_points = len(df_tmp)
    df_effect_vs_volume.loc[len(df_effect_vs_volume.index)] = [
        dollar_volume,
        slope,
        r_value**2,
        num_data_points,
    ]

df_effect_vs_volume

I don't see this hypothesis supported in the data at all, actually: the relationship does not get stronger as the volume cap is lowered.

#### Prior analysis, back when shitcoins seemed to be a better opportunity

A follow-up question: is this driven by some shitcoin outliers? Let's plot the scatterplot for the low dollar volume datapoints to see.

In [None]:
feature = "15d_log_returns"
target = "next_1d_log_returns"
bin_feature = "15d_log_returns_decile"
volume_mask = (df_analysis["dollar_volume"] <= 100000) & (
    df_analysis["dollar_volume"] > 5000
)
analysis(
    df_analysis.loc[volume_mask],
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

In [None]:
# Print outliers
df_analysis.loc[volume_mask].dropna().sort_values(
    by="next_1d_log_returns", ascending=False
)[
    [
        "ticker",
        "timestamp",
        "volume",
        "dollar_volume",
        "30d_returns",
        "15d_returns",
        "next_1d_log_returns",
    ]
].head(
    50
)

In [None]:
# Filter outliers from analysis
feature = "15d_log_returns"
target = "next_1d_log_returns"
bin_feature = "15d_log_returns_decile"
analysis(
    df_analysis.loc[
        volume_mask
        & (np.abs(df_analysis[target]) <= 2.0)
        & (np.abs(df_analysis[feature]) <= 2.0)
    ],
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

Even after we filter out the egregiously high returns (both historical and future), the relationship seems to hold up.

At this point, we can be reasonably confident of the following:
- Momentum effects exist in cryptocurrency markets
- (Harnessable) effects do not seem to get stronger as daily traded volume decreases

### Trend Overextension

There's a phenomenon known as "trend overextension" which describes the fact that very large signals of trend may actually predict reversion in future returns (due to the trend being overextended, more capital is willing to take the other side of the trade to bring prices back down to "fair value").

Do we see this in our cryptocurrency data?

In [None]:
# Create 50 quantile bins (2% each) of 30d log returns
bin_feature = "30d_log_returns_bin50"
df_analysis[bin_feature] = bins(df_analysis, column="30d_log_returns", num_bins=50)
# Remove outliers
df_analysis_filtered = df_analysis.loc[df_analysis["next_1d_log_returns"] < 2.0]
target = "next_1d_log_returns"

# All Data
df_tmp = df_analysis_filtered.groupby([bin_feature]).agg({target: "mean"}).reset_index()
fig = px.bar(df_tmp, x=bin_feature, y=target, title="All Data")
fig.show()

# Low Volume
df_tmp = (
    df_analysis_filtered.loc[df_analysis_filtered["dollar_volume"] <= 10000]
    .groupby([bin_feature])
    .agg({target: "mean"})
    .reset_index()
)
fig = px.bar(df_tmp, x=bin_feature, y=target, title="Dollar Volume <= $10,000")
fig.show()

# High Volume
df_tmp = (
    df_analysis_filtered.loc[df_analysis_filtered["dollar_volume"] >= 1000000]
    .groupby([bin_feature])
    .agg({target: "mean"})
    .reset_index()
)
fig = px.bar(df_tmp, x=bin_feature, y=target, title="Dollar Volume >= $1,000,000")
fig.show()

I can't really say I see evidence of overextension, to be honest. I thought I had seen it in a previous analysis, but I can't reproduce it. There is maybe some evidence beyond the top ~10%, but returns shoot back up in the last 4%.

There's maybe some evidence for it in the low volume tickers.

This is relevant for deciding which activation function to use (sigmoid vs. $x e^{-x^2}$).
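
To make the trade-off concrete, here is a small sketch comparing the two response shapes. The `saturating` function uses `tanh` as a stand-in for a sigmoid-style activation (an assumption; the exact form behind `trend_signal_sigmoid` is not shown in this notebook), while `reverting` is the $x \cdot e^{-x^2}$ form mentioned above.

```python
import math


def saturating(x: float) -> float:
    """Sigmoid-style response (tanh stand-in): monotonic, flattens for large |x|."""
    return math.tanh(x)


def reverting(x: float) -> float:
    """x * exp(-x^2): peaks at |x| = 1/sqrt(2), then decays back toward zero."""
    return x * math.exp(-x * x)


for x in [0.25, 0.5, 1 / math.sqrt(2), 1.0, 2.0, 3.0]:
    print(f"x={x:4.2f}  saturating={saturating(x):.3f}  reverting={reverting(x):.3f}")
```

If overextension were real, the reverting form would usefully shrink positions on extreme signals. If it is not, the same form throws away the strongest signals, which is why the evidence above matters for the choice.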

## Trend Signal (Rohrbach et al. 2017)

Rohrbach and coauthors published a 2017 paper titled "Momentum and trend following trading strategies for currencies and bitcoin" ([link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2949379)) in which they describe a formula for a trend signal. Let's take a look at how well this predicts returns.
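
For readers unfamiliar with this family of signals, the sketch below shows one generic construction: the difference between a fast and a slow exponential moving average of price, scaled by rolling price volatility and averaged over a few fast/slow pairs. It is not the paper's exact formula and not the project's `create_rohrbach_signals`; the spans, the normalization window, and the name `ema_crossover_signal` are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def ema_crossover_signal(
    close: pd.Series,
    span_pairs=((8, 24), (16, 48), (32, 96)),  # illustrative fast/slow spans
    vol_window: int = 63,  # illustrative normalization window
) -> pd.Series:
    """Average of volatility-normalized fast-minus-slow EMA differences."""
    price_vol = close.rolling(vol_window).std()
    components = []
    for fast, slow in span_pairs:
        crossover = close.ewm(span=fast).mean() - close.ewm(span=slow).mean()
        components.append(crossover / price_vol)
    return pd.concat(components, axis=1).mean(axis=1)


# Synthetic price path: a steady uptrend followed by a steady downtrend.
close = pd.Series(
    np.concatenate([np.linspace(100, 200, 200), np.linspace(200, 120, 200)]),
    index=pd.date_range("2022-01-01", periods=400, freq="D"),
)
signal = ema_crossover_signal(close)
print(signal.iloc[[150, 399]])
```

The signal is positive while the fast average sits above the slow one (uptrend) and negative in the opposite case; dividing by volatility makes values comparable across assets with very different price levels.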

In [None]:
# The Rohrbach signal is generated under "trend_signal"
df_analysis = create_rohrbach_signals(df_analysis, periods_per_day=1)

feature = "trend_signal"
target = "next_1d_log_returns"
bin_feature = "trend_decile"
analysis(
    df_analysis,
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

The shape of the decile (cross-sectional) bar plots looks roughly the same as in the plots using 30d returns, which is a good sign. In the scatterplot, the r_value is only marginally higher, but the plot's outliers look a lot better (more up and to the right).

How about the volume-filtered analysis?

In [None]:
df_trend_vs_volume = pd.DataFrame(
    {
        "max_dollar_volume": pd.Series(dtype="int"),
        "slope": pd.Series(dtype="float"),
        "r2": pd.Series(dtype="float"),
        "num_data_points": pd.Series(dtype="int"),
    }
)
feature = "trend_signal"
target = "next_1d_log_returns"
bin_feature = "trend_decile"
for dollar_volume in [
    np.inf,
    100e6,
    10e6,
    5e6,
    1e6,
    500000,
    400000,
    300000,
    200000,
    100000,
    50000,
    40000,
    30000,
    20000,
    10000,
    5000,
    2000,
    1000,
    100,
]:
    volume_mask = (df_analysis["dollar_volume"] <= dollar_volume) & (
        df_analysis["dollar_volume"] > 0
    )
    df_tmp = df_analysis.loc[volume_mask].dropna()
    # Linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        df_tmp[feature], df_tmp[target]
    )
    num_data_points = len(df_tmp)
    df_trend_vs_volume.loc[len(df_trend_vs_volume.index)] = [
        dollar_volume,
        slope,
        r_value**2,
        num_data_points,
    ]

df_trend_vs_volume

In [None]:
feature = "trend_signal"
target = "next_1d_log_returns"
bin_feature = "trend_decile"
volume_mask = (df_analysis["dollar_volume"] <= 100000) & (
    df_analysis["dollar_volume"] > 10000
)
analysis(
    df_analysis.loc[volume_mask],
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

In [None]:
# Filter outliers from analysis
feature = "trend_signal"
target = "next_1d_log_returns"
bin_feature = "trend_decile"
analysis(
    df_analysis.loc[
        volume_mask
        & (np.abs(df_analysis[target]) <= 2.0)
        & (np.abs(df_analysis[feature]) <= 2.0)
    ],
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

As before, effects get weaker as volume decreases (and go negative around \$10k daily volume). The rough shape of the relationship persists even after removing egregious outliers.

What happens with the higher volume data?

In [None]:
# Look at higher volume data only
feature = "trend_signal"
target = "next_1d_log_returns"
bin_feature = "trend_decile"
volume_mask = df_analysis["dollar_volume"] >= 1e6
analysis(
    df_analysis.loc[volume_mask],
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

Returns are more strongly/positively correlated with the top decile, as expected.

At this point, we can probably state the following:
- Trend effects are weaker on average in lower volume tickers (and actually go negative below roughly \$10k daily volume).
- We want to include as many "mature" assets as possible, excluding anything below a minimum daily volume threshold somewhere between \$10k and \$1M.
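
One way to apply such a volume floor is sketched below: compute a trailing average of daily dollar volume per ticker and keep only rows above a threshold. The 30-day window and the helper `filter_by_avg_dollar_volume` are illustrative assumptions; in this notebook the actual filtering is driven by the `min_daily_volume` argument of `generate_positions`, whose implementation is not shown.

```python
import pandas as pd


def filter_by_avg_dollar_volume(
    df: pd.DataFrame, min_daily_volume: float, window: int = 30
) -> pd.DataFrame:
    """Keep rows whose trailing `window`-day mean dollar volume meets the floor."""
    df = df.sort_values(["ticker", "timestamp"]).copy()
    df["avg_dollar_volume"] = df.groupby("ticker")["dollar_volume"].transform(
        lambda s: s.rolling(window, min_periods=window).mean()
    )
    # Rows without a full window are NaN and fail the comparison, so they drop out.
    return df.loc[df["avg_dollar_volume"] >= min_daily_volume]


# Two hypothetical tickers with constant daily dollar volume.
dates = pd.date_range("2023-01-01", periods=40, freq="D")
toy = pd.concat(
    [
        pd.DataFrame(
            {"timestamp": dates, "ticker": "BIG/USD", "dollar_volume": 2_000_000.0}
        ),
        pd.DataFrame(
            {"timestamp": dates, "ticker": "TINY/USD", "dollar_volume": 3_000.0}
        ),
    ],
    ignore_index=True,
)
kept = filter_by_avg_dollar_volume(toy, min_daily_volume=10_000)
print(kept["ticker"].unique(), len(kept))
```

Note that the trailing average here includes the current day's volume; for a tradable filter it would need to be lagged by one period to avoid look-ahead.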

### Sigmoid Activation Function

In [None]:
df_analysis["trend_sigmoid_decile"] = bins(
    df_analysis, column="trend_signal_sigmoid", num_bins=10
)

# The Rohrbach signal w/ sigmoid activation is already generated under "trend_signal_sigmoid"
feature = "trend_signal_sigmoid"
target = "next_1d_log_returns"
bin_feature = "trend_sigmoid_decile"
analysis(
    df_analysis,
    feature=feature,
    target=target,
    bin_feature=bin_feature,
)

Sigmoid seems more or less the same as exponential, at least in data analysis. Correlation and slope are both roughly the same (as expected, both are just transformations of the same signal).

Does the answer change as a function of volume?

In [None]:
df_trend_vs_volume = pd.DataFrame(
    {
        "max_dollar_volume": pd.Series(dtype="int"),
        "slope_exponential": pd.Series(dtype="float"),
        "r2_exponential": pd.Series(dtype="float"),
        "slope_sigmoid": pd.Series(dtype="float"),
        "r2_sigmoid": pd.Series(dtype="float"),
        "num_data_points": pd.Series(dtype="int"),
    }
)
feature_exponential = "trend_signal"
feature_sigmoid = "trend_signal_sigmoid"
target = "next_1d_log_returns"
for dollar_volume in [
    np.inf,
    100e6,
    10e6,
    1e6,
    500000,
    400000,
    300000,
    200000,
    100000,
    50000,
    40000,
    30000,
    20000,
    10000,
    5000,
    2000,
    1000,
    100,
]:
    volume_mask = (df_analysis["dollar_volume"] <= dollar_volume) & (
        df_analysis["dollar_volume"] > 0
    )
    df_tmp = df_analysis.loc[volume_mask].dropna()
    # Linear regression
    (
        slope_exponential,
        intercept,
        r_value_exponential,
        p_value,
        std_err,
    ) = stats.linregress(df_tmp[feature_exponential], df_tmp[target])
    slope_sigmoid, intercept, r_value_sigmoid, p_value, std_err = stats.linregress(
        df_tmp[feature_sigmoid], df_tmp[target]
    )
    num_data_points = len(df_tmp)
    df_trend_vs_volume.loc[len(df_trend_vs_volume.index)] = [
        dollar_volume,
        slope_exponential,
        r_value_exponential**2,
        slope_sigmoid,
        r_value_sigmoid**2,
        num_data_points,
    ]

df_trend_vs_volume

In [None]:
# The Rohrbach signal w/ sigmoid activation is already generated under "trend_signal_sigmoid"
target = "next_1d_log_returns"
bin_feature_exponential = "trend_decile"
bin_feature_sigmoid = "trend_sigmoid_decile"

# All Data
df_tmp = (
    df_analysis.groupby([bin_feature_exponential]).agg({target: "mean"}).reset_index()
)
fig = px.bar(
    df_tmp, x=bin_feature_exponential, y=target, title="All Data - Exponential"
)
fig.show()
df_tmp = df_analysis.groupby([bin_feature_sigmoid]).agg({target: "mean"}).reset_index()
fig = px.bar(df_tmp, x=bin_feature_sigmoid, y=target, title="All Data - Sigmoid")
fig.show()

# Low Volume
low_volume_mask = (df_analysis["dollar_volume"] <= 100000) & (
    df_analysis["dollar_volume"] > 5000
)
df_tmp = (
    df_analysis.loc[low_volume_mask]
    .groupby([bin_feature_exponential])
    .agg({target: "mean"})
    .reset_index()
)
fig = px.bar(
    df_tmp,
    x=bin_feature_exponential,
    y=target,
    title="Dollar Volume <= $100,000 - Exponential",
)
fig.show()
df_tmp = (
    df_analysis.loc[low_volume_mask]
    .groupby([bin_feature_sigmoid])
    .agg({target: "mean"})
    .reset_index()
)
fig = px.bar(
    df_tmp, x=bin_feature_sigmoid, y=target, title="Dollar Volume <= $100,000 - Sigmoid"
)
fig.show()

# High Volume
high_volume_mask = df_analysis["dollar_volume"] >= 5000000
df_tmp = (
    df_analysis.loc[high_volume_mask]
    .groupby([bin_feature_exponential])
    .agg({target: "mean"})
    .reset_index()
)
fig = px.bar(
    df_tmp,
    x=bin_feature_exponential,
    y=target,
    title="Dollar Volume >= $5,000,000 - Exponential",
)
fig.show()
df_tmp = (
    df_analysis.loc[high_volume_mask]
    .groupby([bin_feature_sigmoid])
    .agg({target: "mean"})
    .reset_index()
)
fig = px.bar(
    df_tmp,
    x=bin_feature_sigmoid,
    y=target,
    title="Dollar Volume >= $5,000,000 - Sigmoid",
)
fig.show()

It really seems like a wash between these two signals.

**Takeaways:**
- **Backtest both exponential and sigmoid activation functions**
- **If incorporating cross-sectional momentum,**
  - **For low volume universe keep top 3 deciles (top 30%)**
  - **For high volume universe keep top 2 deciles (top 15-20%)**
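
The cross-sectional filter in the last takeaway could look roughly like the sketch below: rank the signal within each timestamp, convert the rank to a decile, and keep only the top few deciles. The helper `keep_top_deciles` and the toy data are assumptions for illustration; `generate_positions` exposes a `cross_sectional_percentage` argument, but how it is implemented is not shown here.

```python
import pandas as pd


def keep_top_deciles(df: pd.DataFrame, signal: str, top_deciles: int) -> pd.DataFrame:
    """Keep, per timestamp, assets whose signal falls in the top `top_deciles` deciles."""
    df = df.dropna(subset=[signal])
    grouped = df.groupby("timestamp")[signal]
    rank = grouped.rank(method="first").astype(int)  # 1 = lowest signal that day
    count = grouped.transform("count")
    decile = (rank - 1) * 10 // count  # integer decile, 0 (lowest) to 9 (highest)
    return df.loc[decile >= 10 - top_deciles]


# Ten hypothetical tickers over two days; the ordering flips on the second day.
tickers = [f"T{i}/USD" for i in range(10)]
toy = pd.DataFrame(
    {
        "timestamp": ["2023-01-01"] * 10 + ["2023-01-02"] * 10,
        "ticker": tickers * 2,
        "trend_signal": [0.1 * i for i in range(10)]
        + [0.1 * (9 - i) for i in range(10)],
    }
)
top = keep_top_deciles(toy, signal="trend_signal", top_deciles=3)
print(top[["timestamp", "ticker"]])
```

With fewer than ten assets on a given date the top deciles can be empty under this integer binning, so a real implementation would need a rule for thin cross-sections.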

## Combined Model - Multivariate OLS

What other factors can we combine with trend to predict returns? Some ideas I've seen mentioned in other places include: funding rates, basis, borrowing rates.

In [None]:
import statsmodels.formula.api as smf

# Copy columns whose names start with a digit so they can be referenced in the formula
df_analysis["thirty_day_returns"] = df_analysis["30d_returns"]
df_analysis["thirty_day_dollar_volume"] = df_analysis["30d_dollar_volume"]

features = ["thirty_day_returns", "thirty_day_dollar_volume"]
target = "next_1d_returns"
result = smf.ols(
    formula=f"{target} ~ {' + '.join(features)}", data=df_analysis.dropna()
).fit()
print(result.summary())

print(result.rsquared)

## Workspace

### Trend Signal Rolling Absolute Averages

In [None]:
print(f'Trend Signal Mean: {df_analysis["trend_signal"].dropna().mean():.2f}')
print(
    f'Positive Trend Signal Mean: {df_analysis.loc[df_analysis["trend_signal"] >= 0]["trend_signal"].dropna().mean():.2f}'
)
print(
    f'Negative Trend Signal Mean: {df_analysis.loc[df_analysis["trend_signal"] < 0]["trend_signal"].dropna().mean():.2f}'
)
print(
    f'Absolute Trend Signal Mean: {np.abs(df_analysis["trend_signal"].dropna()).mean():.2f}'
)
print(f'Trend Signal Median: {df_analysis["trend_signal"].dropna().median():.2f}')
fig = px.histogram(df_analysis, x="trend_signal")
fig.show()

df_analysis["abs_trend_signal"] = np.abs(df_analysis["trend_signal"])
df_tmp = (
    df_analysis.groupby(["timestamp"]).agg({"abs_trend_signal": "mean"}).reset_index()
)
df_tmp["abs_trend_signal_180d_ema"] = (
    df_tmp["abs_trend_signal"].ewm(span=180, adjust=True, ignore_na=False).mean()
)
fig = px.line(df_tmp, x="timestamp", y="abs_trend_signal_180d_ema")
fig.show()

df_positive_only = df_analysis.loc[df_analysis["trend_signal"] > 0]
df_tmp = (
    df_positive_only.groupby("timestamp").agg({"trend_signal": "mean"}).reset_index()
)
df_tmp["pos_trend_signal_180d_ema"] = (
    df_tmp["trend_signal"].ewm(span=180, adjust=True, ignore_na=False).mean()
)
fig = px.line(df_tmp, x="timestamp", y="pos_trend_signal_180d_ema")
fig.show()

In [None]:
print(f'Trend Signal Mean: {df_analysis["trend_signal_sigmoid"].dropna().mean():.2f}')
print(
    f'Positive Trend Signal Mean: {df_analysis.loc[df_analysis["trend_signal_sigmoid"] >= 0]["trend_signal_sigmoid"].mean():.2f}'
)
print(
    f'Negative Trend Signal Mean: {df_analysis.loc[df_analysis["trend_signal_sigmoid"] < 0]["trend_signal_sigmoid"].mean():.2f}'
)
print(
    f'Absolute Trend Signal Mean: {np.abs(df_analysis["trend_signal_sigmoid"].dropna()).mean():.2f}'
)
print(
    f'Trend Signal Median: {df_analysis["trend_signal_sigmoid"].dropna().median():.2f}'
)
fig = px.histogram(df_analysis, x="trend_signal_sigmoid")
fig.show()

df_analysis["abs_trend_signal_sigmoid"] = np.abs(df_analysis["trend_signal_sigmoid"])
df_tmp = (
    df_analysis.groupby(["timestamp"])
    .agg({"abs_trend_signal_sigmoid": "mean"})
    .reset_index()
)
df_tmp["abs_trend_signal_sigmoid_180d_ema"] = (
    df_tmp["abs_trend_signal_sigmoid"]
    .ewm(span=180, adjust=True, ignore_na=False)
    .mean()
)
fig = px.line(df_tmp, x="timestamp", y="abs_trend_signal_sigmoid_180d_ema")
fig.show()

df_positive_only = df_analysis.loc[df_analysis["trend_signal_sigmoid"] > 0]
df_tmp = (
    df_positive_only.groupby("timestamp")
    .agg({"trend_signal_sigmoid": "mean"})
    .reset_index()
)
df_tmp["pos_trend_signal_sigmoid_180d_ema"] = (
    df_tmp["trend_signal_sigmoid"].ewm(span=180, adjust=True, ignore_na=False).mean()
)
fig = px.line(df_tmp, x="timestamp", y="pos_trend_signal_sigmoid_180d_ema")
fig.show()

### Number of Open Positions

In [None]:
signal = "trend_signal"
periods_per_day = 1
direction = Direction.LongOnly
volatility_target = 0.35
cross_sectional_percentage = None
cross_sectional_equal_weight = False
min_daily_volume = 10000  # Minimum avg daily volume [USD]
max_daily_volume = None  # Maximum avg daily volume [USD]

df_analysis = create_trading_signals(
    df_daily, periods_per_day=periods_per_day, signal_type=SignalType.Rohrbach
)
df_positions = generate_positions(
    df_analysis,
    signal=signal,
    periods_per_day=periods_per_day,
    direction=direction,
    volatility_target=volatility_target,
    cross_sectional_percentage=cross_sectional_percentage,
    cross_sectional_equal_weight=cross_sectional_equal_weight,
    min_daily_volume=min_daily_volume,
    max_daily_volume=max_daily_volume,
)

In [None]:
df_positions.loc[
    (df_positions["filter_volume"])
    & (df_positions["avg_1d_dollar_volume_over_30d"] > 10000)
][
    [
        "ticker",
        "timestamp",
        "avg_1d_dollar_volume_over_30d",
        "trend_signal_scaled",
        "volume_above_min",
        "volume_below_max",
        "filter_volume",
    ]
].sort_values(
    by="avg_1d_dollar_volume_over_30d"
)

In [None]:
# bins=[1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10]
dollar_volume_col = "avg_1d_dollar_volume_over_30d"
fig = px.histogram(
    df_positions.loc[df_positions[dollar_volume_col] < 10e6],
    x=dollar_volume_col,
    nbins=20,
)
fig.show()

In [None]:
initial_capital = 12000
rebalancing_freq = None
volume_max_size = 0.01
rebalancing_buffer = 0.001

pf_portfolio = backtest(
    df_positions,
    periods_per_day=periods_per_day,
    initial_capital=initial_capital,
    rebalancing_freq=rebalancing_freq,
    start_date=start_date,
    end_date=end_date,
    with_fees=True,
    volume_max_size=volume_max_size,
    rebalancing_buffer=rebalancing_buffer,
    verbose=False,
)

In [None]:
def get_trade_volume(pf_portfolio: vbt.Portfolio) -> pd.Series:
    entry_trades = pf_portfolio.entry_trades.records_readable
    entry_trades["Entry Size [$]"] = (
        entry_trades["Size"] * entry_trades["Avg Entry Price"]
    )
    entry_volume = entry_trades[["Entry Timestamp", "Entry Size [$]"]]
    entry_volume = entry_volume.rename(columns={"Entry Timestamp": "timestamp"})
    entry_volume = (
        entry_volume.sort_values(by="timestamp")
        .groupby("timestamp")
        .agg({"Entry Size [$]": "sum"})
        .reset_index()
    )

    exit_trades = pf_portfolio.exit_trades.records_readable
    exit_trades["Exit Size [$]"] = exit_trades["Size"] * exit_trades["Avg Exit Price"]
    exit_volume = exit_trades[["Exit Timestamp", "Exit Size [$]"]]
    exit_volume = exit_volume.rename(columns={"Exit Timestamp": "timestamp"})
    exit_volume = (
        exit_volume.sort_values(by="timestamp")
        .groupby("timestamp")
        .agg({"Exit Size [$]": "sum"})
        .reset_index()
    )

    df_volume = entry_volume.merge(exit_volume, how="outer", on="timestamp").fillna(
        value=0
    )
    df_volume["Traded Size [$]"] = (
        df_volume["Entry Size [$]"] + df_volume["Exit Size [$]"]
    )

    trade_volume = df_volume["Traded Size [$]"]
    trade_volume.index = df_volume["timestamp"]
    return trade_volume


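def get_turnover(pf_portfolio: vbt.Portfolio) -> pd.Series:
    """Traded dollar volume per period as a fraction of portfolio value."""
    trade_volume = get_trade_volume(pf_portfolio)
    pf_value = pf_portfolio.value()
    # Periods without trades count as zero turnover
    trade_volume = trade_volume.reindex(pf_value.index, fill_value=0)
    # Result is a fraction (1.0 == 100% of portfolio value), not a percentage
    turnover = (trade_volume / pf_value).rename("Turnover")
    return turnover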
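
For intuition, the turnover calculation above reduces to dividing each day's traded dollar volume by that day's portfolio value, with non-trading days filled as zero. The sketch below reproduces that arithmetic on made-up numbers using plain pandas, without vectorbt; all values are invented for illustration.

```python
import pandas as pd

days = pd.date_range("2023-01-01", periods=5, freq="D")
# Portfolio value on every day of the backtest (made-up numbers).
portfolio_value = pd.Series(
    [10_000.0, 10_200.0, 10_100.0, 10_400.0, 10_300.0], index=days
)
# Dollar volume traded, recorded only on days with trades.
trade_volume = pd.Series([5_000.0, 1_040.0], index=[days[0], days[3]])

# Align to the full calendar so that days without trades count as zero turnover.
trade_volume = trade_volume.reindex(portfolio_value.index, fill_value=0.0)
turnover = (trade_volume / portfolio_value).rename("turnover")
print(turnover)
```

The result is a fraction of portfolio value (0.5 means half the portfolio was traded that day); multiply by 100 for a percentage.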

In [None]:
trade_volume = get_trade_volume(pf_portfolio)
idx = pd.date_range(trade_volume.index.min(), trade_volume.index.max())
trade_volume = trade_volume.reindex(idx, fill_value=0)

fig = px.line(trade_volume)
fig.show()

In [None]:
turnover = get_turnover(pf_portfolio)
fig = px.line(turnover)
fig.show()

turnover

In [None]:
entry_trades = pf_portfolio.entry_trades.records_readable
entry_trades.sort_values(by="Entry Timestamp").loc[
    entry_trades["Entry Timestamp"] >= "2022-12-31"
]

In [None]:
df_positions.loc[
    (df_positions["timestamp"] == "2023-01-01") & (df_positions["scaled_position"] > 0)
][["ticker", "timestamp", "scaled_position"]].sort_values(
    by=["timestamp", "scaled_position"], ascending=[True, False]
)

In [None]:
price = df_positions[["timestamp", "ticker", "close"]]
price = pd.pivot_table(price, index="timestamp", columns="ticker", values="close")
price = price.fillna(value=0)

volume = df_positions[["timestamp", "ticker", "volume"]]
volume = pd.pivot_table(volume, index="timestamp", columns="ticker", values="volume")
volume = volume.fillna(value=0)
volume

### Inspect IDM & FDM
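
For reference, and assuming IDM here means the instrument diversification multiplier used in volatility-targeted portfolios, it is commonly computed as $1 / \sqrt{w^\top C w}$, where $w$ are the portfolio weights and $C$ is the correlation matrix of the instruments, often with negative correlations floored at zero. The sketch below follows that definition; whether `compute_idm` matches it exactly is not shown in this notebook, and the function name `diversification_multiplier` is hypothetical.

```python
import numpy as np


def diversification_multiplier(weights: np.ndarray, corr: np.ndarray) -> float:
    """1 / sqrt(w' C w), with negative correlations floored at zero."""
    corr = np.clip(corr, 0.0, None)
    return float(1.0 / np.sqrt(weights @ corr @ weights))


equal_weights = np.array([0.5, 0.5])
uncorrelated = np.eye(2)
perfectly_correlated = np.ones((2, 2))

idm_uncorrelated = diversification_multiplier(equal_weights, uncorrelated)
idm_correlated = diversification_multiplier(equal_weights, perfectly_correlated)
print(idm_uncorrelated, idm_correlated)
```

Two uncorrelated, equally weighted assets give a multiplier of $\sqrt{2}$, while two perfectly correlated assets give 1: the less correlated the holdings, the more the position sizes can be scaled up to hit the same volatility target.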

In [None]:
idm_ser = compute_idm(df_positions, feature_column=PAST_7D_RETURNS_COL)
idm_30d_ema = idm_ser.ewm(span=30, adjust=True, ignore_na=False).mean()

In [None]:
df_tmp = (
    df_positions.groupby("timestamp")
    .agg({"idm": "first", "idm_30d_ema": "first"})
    .reset_index()
)
fig = px.line(df_tmp, x="timestamp", y=["idm", "idm_30d_ema"])
fig.show()

In [None]:
idm_trend_ser = compute_idm(df_positions, feature_column="trend_signal")
idm_trend_30d_ema = idm_trend_ser.ewm(span=30, adjust=True, ignore_na=False).mean()

In [None]:
idm_product_ser = idm_ser * idm_trend_ser
idm_product_30d_ema = idm_product_ser.ewm(span=30, adjust=True, ignore_na=False).mean()
# idm_product_30d_ema = idm_30d_ema * idm_trend_30d_ema

df_tmp = (
    pd.DataFrame.from_dict(
        {
            "idm": idm_ser,
            "idm_30d_ema": idm_30d_ema,
            "idm_trend": idm_trend_ser,
            "idm_trend_30d_ema": idm_trend_30d_ema,
            "idm_product": idm_product_ser,
            "idm_product_30d_ema": idm_product_30d_ema,
        }
    )
    .reset_index()
    .rename(columns={"index": "timestamp"})
)
fig = px.line(
    df_tmp,
    x="timestamp",
    y=[
        "idm",
        "idm_30d_ema",
        "idm_trend",
        "idm_trend_30d_ema",
        "idm_product",
        "idm_product_30d_ema",
    ],
)
fig.show()