# Chapter 4: Sample Weights for Financial Machine Learning

This notebook demonstrates the sample weighting techniques from **Advances in Financial Machine Learning** by Marcos López de Prado.

## Why Sample Weights Matter

Most machine learning algorithms assume that observations are **Independent and Identically Distributed (IID)**. This assumption is fundamentally violated in financial applications because:

1. **Overlapping outcomes**: When using the triple-barrier method, labels can depend on overlapping time periods
2. **Serial correlation**: Financial returns exhibit autocorrelation
3. **Regime changes**: Market conditions evolve over time, making older observations less relevant

This chapter addresses these issues through:
- **Uniqueness estimation**: Measuring how much independent information each sample contains
- **Sequential bootstrap**: Sampling method that accounts for overlaps
- **Return attribution**: Weighting samples by the absolute return that can be attributed to each label
- **Time decay**: Reducing the influence of older observations

## Table of Contents

1. [The Problem: Overlapping Outcomes](#1.-The-Problem:-Overlapping-Outcomes)
2. [Concurrent Labels and Uniqueness](#2.-Concurrent-Labels-and-Uniqueness)
3. [The Sequential Bootstrap](#3.-The-Sequential-Bootstrap)
4. [Sample Weights by Return Attribution](#4.-Sample-Weights-by-Return-Attribution)
5. [Time Decay](#5.-Time-Decay)
6. [Putting It All Together](#6.-Putting-It-All-Together)

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
from pathlib import Path

# Add parent directory to path for imports
sys.path.insert(0, str(Path.cwd().parent.parent))

# Import our modules
from afml.labeling import (
    get_daily_volatility,
    get_events,
    add_vertical_barrier,
    get_labels,
)
from afml.sample_weights import (
    get_num_concurrent_events,
    get_average_uniqueness,
    compute_sample_uniqueness,
    get_indicator_matrix,
    get_average_uniqueness_from_matrix,
    sequential_bootstrap,
    compare_bootstrap_methods,
    get_sample_weights_by_return,
    compute_sample_weights,
    apply_time_decay,
    apply_exponential_decay,
)

# Plotting settings
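try:
    plt.style.use('seaborn-v0_8-whitegrid')
except OSError:
    # Older matplotlib releases only know the pre-3.6 style name
    try:
        plt.style.use('seaborn-whitegrid')
    except OSError:
        pass  # Fall back to the default style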
plt.rcParams['figure.figsize'] = (12, 6)

# Reproducibility
np.random.seed(42)

## Generate Synthetic Data

We'll use synthetic price data to demonstrate the concepts. This allows us to control the data properties and clearly illustrate the effects of different weighting schemes.

In [None]:
def generate_synthetic_prices(
    n_days: int = 250,
    initial_price: float = 100.0,
    annual_volatility: float = 0.20,
    freq: str = 'h',
) -> pd.Series:
    """Generate synthetic price data using geometric Brownian motion."""
    periods = n_days * 24 if freq == 'h' else n_days
    index = pd.date_range(start='2020-01-01', periods=periods, freq=freq)
    
    # Length of one bar in years (252 trading days, 24 hourly bars per day)
    dt = 1 / 252 if freq == 'D' else 1 / (252 * 24)
    bar_vol = annual_volatility * np.sqrt(dt)
    
    returns = bar_vol * np.random.randn(periods)
    prices = initial_price * np.exp(np.cumsum(returns))
    
    return pd.Series(prices, index=index, name='close')


# Generate price data
close_prices = generate_synthetic_prices(n_days=250)

# Plot
fig, ax = plt.subplots(figsize=(14, 5))
close_prices.plot(ax=ax, linewidth=0.8)
ax.set_title('Synthetic Price Series', fontsize=14)
ax.set_xlabel('Date')
ax.set_ylabel('Price')
plt.tight_layout()
plt.show()

print(f"Price data: {len(close_prices)} bars")
print(f"Date range: {close_prices.index[0]} to {close_prices.index[-1]}")

---

## 1. The Problem: Overlapping Outcomes

### Why IID Matters

Most ML algorithms assume that training samples are **independent**: knowing one sample tells you nothing about another. In finance, this assumption breaks down when labels come from the triple-barrier method, because consecutive labels are computed from overlapping price windows.

### The Blood Sample Analogy

Imagine you're a medical researcher with blood samples from 100 patients. You want to predict cholesterol levels from diet and exercise data.

**Normal case (IID)**: Each tube contains blood from exactly one patient. The samples are independent.

**Financial ML case (non-IID)**: Someone accidentally spills blood from each tube into the next 9 tubes. Now:
- Tube 10 contains blood from patients 1-10
- Tube 11 contains blood from patients 2-11
- And so on...

The "spillage" in financial ML comes from **overlapping holding periods**.
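To make the spillage concrete, here is a minimal sketch that counts how many other events share at least one day with a given event. It assumes one event is opened per day and each is held for a fixed number of days; the function name and the `holding_days` parameter are illustrative and not part of the `afml` module.

```python
def count_overlapping_events(num_events: int, holding_days: int, event: int) -> int:
    """Count events whose holding window shares at least one day with `event`.

    Event k is assumed to span days [k, k + holding_days], endpoints included.
    """
    start, end = event, event + holding_days
    overlaps = 0
    for other in range(num_events):
        if other == event:
            continue
        other_start, other_end = other, other + holding_days
        # Two closed intervals overlap when each one starts before the other ends.
        if other_start <= end and start <= other_end:
            overlaps += 1
    return overlaps


# One event per day, 5-day holding period, 30 events in total.
print(count_overlapping_events(num_events=30, holding_days=5, event=10))  # 10
```

Under these assumptions an event in the middle of the sample shares price information with ten neighbours, five on each side.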

### Visualizing the Overlap

Let's create events using the triple-barrier method and visualize how they overlap.

In [None]:
# Create events using triple-barrier method
sample_indices = close_prices.index[::24]  # Sample daily
timestamp_events = pd.DatetimeIndex(sample_indices[:-20])

# Compute volatility for barrier thresholds
target_returns = get_daily_volatility(close_prices, lookback_span=50)

# Add vertical barriers (5-day holding period)
vertical_barriers = add_vertical_barrier(
    timestamp_events=timestamp_events,
    close_prices=close_prices,
    num_days=5,
)

# Get events
events = get_events(
    close_prices=close_prices,
    timestamp_events=timestamp_events,
    profit_taking_stop_loss=[2.0, 2.0],
    target_returns=target_returns,
    min_return=0.0001,
    num_threads=1,
    vertical_barrier_times=vertical_barriers,
    side=None,
)

print(f"Number of events: {len(events)}")
print("\nSample events (showing start time and end time t1):")
print(events.head(10))

In [None]:
def visualize_event_overlaps(events_df, num_events=15):
    """
    Visualize how events overlap in time.
    Each horizontal bar represents an event's lifespan.
    """
    fig, ax = plt.subplots(figsize=(14, 8))
    
    subset = events_df.head(num_events)
    colors = plt.cm.tab20(np.linspace(0, 1, num_events))
    
    for i, (start_time, row) in enumerate(subset.iterrows()):
        end_time = row['t1']
        if pd.notna(end_time):
            ax.barh(
                y=i,
                width=(end_time - start_time).total_seconds() / 3600,
                left=(start_time - subset.index[0]).total_seconds() / 3600,
                height=0.8,
                color=colors[i],
                alpha=0.7,
                edgecolor='black',
                linewidth=0.5,
            )
    
    ax.set_xlabel('Hours from first event', fontsize=12)
    ax.set_ylabel('Event Index', fontsize=12)
    ax.set_title('Event Lifespans: Visualizing Overlapping Outcomes', fontsize=14)
    ax.set_yticks(range(num_events))
    
    # Add grid lines
    ax.grid(True, axis='x', alpha=0.3)
    
    plt.tight_layout()
    plt.show()


visualize_event_overlaps(events, num_events=15)

print("""\n
OBSERVATION: Notice how events overlap significantly.
- Event 0 might overlap with Events 1, 2, 3, and 4
- This means their labels are NOT independent
- Standard ML methods will treat them as if they were independent
- This leads to overfitting and inflated accuracy estimates
""")

### The Consequences of Ignoring Overlap

When we ignore overlapping outcomes:

1. **Redundant information**: The same market move affects multiple labels
2. **Overfitting**: The model memorizes the redundant patterns
3. **Inflated metrics**: Out-of-bag and cross-validation scores are too optimistic
4. **Poor generalization**: The model fails on truly new data

The solution is to **measure and account for the overlap** using uniqueness scores and sample weights.

---

## 2. Concurrent Labels and Uniqueness

### Concurrency: Counting Overlaps

Two labels $y_i$ and $y_j$ are **concurrent** at time $t$ when both depend on the return at time $t$.

For each time point, we count how many labels are "active" (their event spans that time):

$$c_t = \sum_{i=1}^{I} \mathbb{1}_{t,i}$$

where $\mathbb{1}_{t,i} = 1$ if event $i$ spans time $t$, and 0 otherwise.

### Uniqueness: Measuring Independence

The **uniqueness** of label $i$ at time $t$ is:

$$u_{t,i} = \frac{\mathbb{1}_{t,i}}{c_t}$$

The **average uniqueness** of label $i$ is the mean of $u_{t,i}$ over the bars that the label spans:

$$\bar{u}_i = \frac{\sum_{t=1}^{T} u_{t,i}}{\sum_{t=1}^{T} \mathbb{1}_{t,i}}$$

**Interpretation:**
- $\bar{u}_i = 1$: Label $i$ has no overlap with any other label (fully unique)
- $\bar{u}_i = 0.5$: On average, label $i$ shares its information with one other label
- $\bar{u}_i \to 0$: Label $i$ is highly redundant with many other labels
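The two definitions can be checked by hand on a toy case. The sketch below assumes three hypothetical events given as inclusive `(start_bar, end_bar)` pairs; the helper name is made up for illustration and is not the `afml` implementation.

```python
import numpy as np


def concurrency_and_uniqueness(num_bars, spans):
    """Return (c_t, average uniqueness) for inclusive (start, end) bar spans."""
    concurrency = np.zeros(num_bars)
    for start, end in spans:
        concurrency[start:end + 1] += 1  # every live event adds 1 to c_t

    # Average of u_{t,i} = 1 / c_t over the bars that event i spans.
    avg_u = np.array([np.mean(1.0 / concurrency[s:e + 1]) for s, e in spans])
    return concurrency, avg_u


spans = [(0, 2), (2, 3), (4, 5)]
c_t, u_bar = concurrency_and_uniqueness(num_bars=6, spans=spans)
print(c_t)    # [1. 1. 2. 1. 1. 1.]
print(u_bar)  # roughly 0.833, 0.75 and 1.0
```

Only bar 2 is shared (by the first two events), so the first event averages (1 + 1 + 1/2) / 3 and the second (1/2 + 1) / 2, while the third keeps a uniqueness of 1.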

In [None]:
# Compute concurrent events at each time point
event_end_times = events['t1']

num_concurrent = get_num_concurrent_events(
    close_index=close_prices.index,
    event_end_times=event_end_times,
)

# Plot concurrency over time
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Price
price_subset = close_prices.loc[num_concurrent.index[0]:num_concurrent.index[-1]]
axes[0].plot(price_subset.index, price_subset.values, linewidth=0.8)
axes[0].set_title('Price Series', fontsize=12)
axes[0].set_ylabel('Price')

# Concurrency
axes[1].fill_between(num_concurrent.index, 0, num_concurrent.values, alpha=0.7)
axes[1].set_title('Number of Concurrent Labels at Each Time Point', fontsize=12)
axes[1].set_ylabel('Concurrent Labels')
axes[1].set_xlabel('Date')

plt.tight_layout()
plt.show()

print(f"Concurrency statistics:")
print(f"  Mean: {num_concurrent.mean():.2f}")
print(f"  Max:  {num_concurrent.max():.0f}")
print(f"  Min:  {num_concurrent.min():.0f}")

In [None]:
# Compute average uniqueness for each event
# Handle duplicates and reindex
num_concurrent_clean = num_concurrent.loc[~num_concurrent.index.duplicated(keep='last')]
num_concurrent_reindexed = num_concurrent_clean.reindex(close_prices.index).fillna(1)

avg_uniqueness = get_average_uniqueness(
    event_end_times=event_end_times,
    num_concurrent_events=num_concurrent_reindexed,
)

# Plot histogram of uniqueness
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(avg_uniqueness.dropna(), bins=30, edgecolor='black', alpha=0.7)
axes[0].axvline(x=avg_uniqueness.mean(), color='red', linestyle='--', 
                label=f'Mean: {avg_uniqueness.mean():.3f}')
axes[0].set_title('Distribution of Average Uniqueness', fontsize=12)
axes[0].set_xlabel('Average Uniqueness')
axes[0].set_ylabel('Frequency')
axes[0].legend()

# Time series
axes[1].plot(avg_uniqueness.index, avg_uniqueness.values, linewidth=0.8)
axes[1].axhline(y=avg_uniqueness.mean(), color='red', linestyle='--', alpha=0.7)
axes[1].set_title('Average Uniqueness Over Time', fontsize=12)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Average Uniqueness')

plt.tight_layout()
plt.show()

print(f"\nAverage uniqueness statistics:")
print(f"  Mean: {avg_uniqueness.mean():.4f}")
print(f"  Std:  {avg_uniqueness.std():.4f}")
print(f"  Min:  {avg_uniqueness.min():.4f}")
print(f"  Max:  {avg_uniqueness.max():.4f}")

### Interpreting Uniqueness

The histogram above shows how uniqueness is distributed across our events:

- **Peak location**: Where most events fall in terms of uniqueness
- **Spread**: How much variation exists in overlap patterns
- **Low values**: Events that are highly redundant (share information with many others)

**Key insight**: If average uniqueness is low (e.g., 0.2), each observation carries roughly 20% of the information of a truly independent sample. A standard bootstrap that draws as many observations as there are events would then oversample by about a factor of 5.

---

## 3. The Sequential Bootstrap

### The Problem with Standard Bootstrap

Standard bootstrap assumes IID observations and samples with uniform probability. When observations overlap:

1. **Redundant draws**: High probability of drawing overlapping samples
2. **Oversampling**: The same information is represented multiple times
3. **Overfitting**: Random forests become collections of similar overfit trees
4. **Inflated OOB scores**: Out-of-bag samples are too similar to in-bag samples

### The Sequential Bootstrap Solution

Sequential bootstrap adjusts sampling probabilities to reduce overlap:

1. **First draw**: Uniform probability (all events equally likely)
2. **Subsequent draws**: Probability proportional to each event's average uniqueness, computed as if that event were added to the already-selected events
3. **Effect**: Events overlapping with selected ones become less likely to be drawn

### The Indicator Matrix

The foundation of sequential bootstrap is the **indicator matrix** $\{\mathbb{1}_{t,i}\}$:

- Rows: Time points (bars)
- Columns: Events
- Value: 1 if event spans that time, 0 otherwise
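As a rough sketch of how such a matrix might be built with pandas, assume events arrive as a Series whose index holds the start bar and whose values hold the end bar. `build_indicator_matrix` is a hypothetical helper, not necessarily how `get_indicator_matrix` is implemented.

```python
import pandas as pd


def build_indicator_matrix(bar_index, event_end_times):
    """Rows are bars, columns are events; an entry is 1 while the event is live."""
    matrix = pd.DataFrame(0, index=bar_index, columns=range(len(event_end_times)))
    for col, (start, end) in enumerate(event_end_times.items()):
        matrix.loc[start:end, col] = 1  # label-based slicing includes both endpoints
    return matrix


# Three events: start bars 0, 2, 4 with end bars 2, 3, 5.
t1 = pd.Series([2, 3, 5], index=[0, 2, 4])
ind = build_indicator_matrix(pd.RangeIndex(6), t1)
print(ind)
print(ind.sum(axis=1).tolist())  # row sums give the concurrency c_t
```

The matrix grows with bars times events, which is why the dense form becomes costly on long histories.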

In [None]:
# Simple example from the book (Section 4.5.3)
# y1: r[0,3], y2: r[2,4], y3: r[4,6]

example_bar_idx = pd.RangeIndex(7)  # bars 0-6
example_t1 = pd.Series([2, 3, 5], index=[0, 2, 4])

# Build indicator matrix
example_ind_matrix = get_indicator_matrix(example_bar_idx, example_t1)

print("Indicator Matrix (Book Example):")
print("================================")
print("Rows = time points (bars 0-6)")
print("Columns = events (0, 1, 2)")
print()
print(example_ind_matrix)
print()
print("Event spans (bar k holds the return r[k,k+1]):")
print("  Event 0: bars 0-2 (returns r[0,1], r[1,2], r[2,3], i.e. r[0,3])")
print("  Event 1: bars 2-3 (returns r[2,3], r[3,4], i.e. r[2,4])")
print("  Event 2: bars 4-5 (returns r[4,5], r[5,6], i.e. r[4,6])")
print()
print("Overlaps:")
print("  Events 0 and 1 overlap at bar 2")
print("  Event 2 has no overlap with others")

In [None]:
# Visualize the indicator matrix
fig, ax = plt.subplots(figsize=(8, 6))

im = ax.imshow(example_ind_matrix.values, cmap='Blues', aspect='auto')
ax.set_xticks(range(3))
ax.set_xticklabels(['Event 0', 'Event 1', 'Event 2'])
ax.set_yticks(range(7))
ax.set_yticklabels([f'Bar {i}' for i in range(7)])
ax.set_title('Indicator Matrix Visualization', fontsize=14)
ax.set_xlabel('Events')
ax.set_ylabel('Time (Bars)')

# Add text annotations
for i in range(7):
    for j in range(3):
        value = int(example_ind_matrix.iloc[i, j])
        ax.text(j, i, value,
                ha='center', va='center', fontsize=12,
                color='white' if value else 'gray')

plt.colorbar(im, ax=ax, label='Active (1) / Inactive (0)')
plt.tight_layout()
plt.show()

In [None]:
# Compute average uniqueness from indicator matrix
example_uniqueness = get_average_uniqueness_from_matrix(example_ind_matrix)

print("Average Uniqueness:")
print("==================")
for i, uniq in enumerate(example_uniqueness):
    print(f"  Event {i}: {uniq:.4f}")

print()
print("Interpretation:")
print("  Event 0: 0.833 - spans 3 bars, 1 bar overlaps with Event 1")
print("  Event 1: 0.750 - spans 2 bars, 1 bar overlaps with Event 0") 
print("  Event 2: 1.000 - no overlap with any other event (fully unique)")

### Sequential Bootstrap: Step by Step

Let's trace through the sequential bootstrap algorithm:

**Draw 1**: All events have equal probability (1/3 each)
- Suppose we draw Event 2

**Draw 2**: Recompute each event's average uniqueness as if it were added to the sample {Event 2}, then normalize so the probabilities sum to 1
- Event 0: No overlap with Event 2 → uniqueness 1 → probability 2/5
- Event 1: No overlap with Event 2 → uniqueness 1 → probability 2/5
- Event 2: Perfect overlap with itself → uniqueness 1/2 → probability 1/5

**Draw 3**: Repeat the update using both drawn events
- Events that overlap the current sample keep getting less likely
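The trace above can be reproduced with a short NumPy sketch. It assumes a dense bars-by-events 0/1 matrix for the three-event example; the function names are illustrative and this is not necessarily how `sequential_bootstrap` is implemented in `afml`.

```python
import numpy as np


def average_uniqueness(indicator):
    """Average uniqueness of each column of a bars x events 0/1 matrix."""
    concurrency = indicator.sum(axis=1)
    safe_c = np.where(concurrency > 0, concurrency, 1.0)  # avoid 0/0 on idle bars
    uniqueness = indicator / safe_c[:, None]
    return uniqueness.sum(axis=0) / indicator.sum(axis=0)


def draw_probabilities(indicator, drawn):
    """Probability of drawing each event next, given the events drawn so far."""
    num_events = indicator.shape[1]
    avg_u = np.empty(num_events)
    for candidate in range(num_events):
        # Uniqueness of the candidate if it were appended to the current sample.
        trial = indicator[:, drawn + [candidate]]
        avg_u[candidate] = average_uniqueness(trial)[-1]
    return avg_u / avg_u.sum()


def sequential_bootstrap_sketch(indicator, sample_length=None, seed=None):
    """Draw events one at a time, re-weighting after every draw."""
    rng = np.random.default_rng(seed)
    num_events = indicator.shape[1]
    sample_length = num_events if sample_length is None else sample_length
    drawn = []
    while len(drawn) < sample_length:
        probs = draw_probabilities(indicator, drawn)
        drawn.append(int(rng.choice(num_events, p=probs)))
    return drawn


indicator = np.array(
    [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]],
    dtype=float,
)
print(draw_probabilities(indicator, drawn=[2]))  # [0.4 0.4 0.2]
print(sequential_bootstrap_sketch(indicator, seed=0))
```

Each draw recomputes uniqueness for every candidate, so the cost grows quickly with the number of events. This is the expense that motivates the `max_samples` shortcut on large datasets.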

In [None]:
# Run sequential bootstrap multiple times
print("Sequential Bootstrap Samples (10 runs):")
print("=======================================")
for i in range(10):
    sample = sequential_bootstrap(example_ind_matrix, random_state=i)
    sample_uniqueness = get_average_uniqueness_from_matrix(
        example_ind_matrix.iloc[:, sample]
    ).mean()
    print(f"  Run {i}: {sample} - Avg uniqueness: {sample_uniqueness:.3f}")

In [None]:
# For large real-world datasets, building the full indicator matrix can be memory-intensive
# Here's how you would do it (commented out for performance):
#
# real_ind_matrix = get_indicator_matrix(
#     bar_index=close_prices.index,
#     event_end_times=event_end_times,
# )

# For our demonstration, we'll use the small example matrix
# which clearly shows the sequential bootstrap advantage
print("For real-world applications:")
print(f"  - You would have ~{len(close_prices)} bars x ~{len(events)} events")
print(f"  - That's a {len(close_prices)} x {len(events)} indicator matrix")
print("  - Sequential bootstrap can be computationally expensive")
print("  - Consider using the max_samples approach instead for large datasets")

In [None]:
# Compare standard vs sequential bootstrap
# Note: Sequential bootstrap is computationally expensive on large datasets
# We'll use the small example for demonstration

# Compare bootstrap methods on the small example
comparison = compare_bootstrap_methods(
    indicator_matrix=example_ind_matrix,
    num_iterations=100,
    random_state=42,
)

# Plot comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histograms
axes[0].hist(comparison['standard_uniqueness'], bins=15, alpha=0.7, 
             label='Standard Bootstrap', edgecolor='black')
axes[0].hist(comparison['sequential_uniqueness'], bins=15, alpha=0.7,
             label='Sequential Bootstrap', edgecolor='black')
axes[0].axvline(x=comparison['standard_uniqueness'].median(), color='blue',
                linestyle='--', label=f'Std Median: {comparison["standard_uniqueness"].median():.3f}')
axes[0].axvline(x=comparison['sequential_uniqueness'].median(), color='orange',
                linestyle='--', label=f'Seq Median: {comparison["sequential_uniqueness"].median():.3f}')
axes[0].set_title('Bootstrap Sample Uniqueness Comparison', fontsize=12)
axes[0].set_xlabel('Average Uniqueness')
axes[0].set_ylabel('Frequency')
axes[0].legend()

# Box plot
axes[1].boxplot([comparison['standard_uniqueness'], comparison['sequential_uniqueness']])
axes[1].set_xticklabels(['Standard', 'Sequential'])  # avoids the deprecated `labels=` argument
axes[1].set_title('Uniqueness Distribution by Method', fontsize=12)
axes[1].set_ylabel('Average Uniqueness')

plt.tight_layout()
plt.show()

print("\nStatistical Comparison:")
print(comparison.describe())
print(f"\nImprovement: Sequential bootstrap achieves {(comparison['sequential_uniqueness'].mean() - comparison['standard_uniqueness'].mean()) / comparison['standard_uniqueness'].mean() * 100:.1f}% higher average uniqueness")

### Key Takeaway

Sequential bootstrap produces samples with **higher average uniqueness** than standard bootstrap. This means:

1. Less redundancy in training data
2. More diverse decision trees in random forests
3. More realistic out-of-bag error estimates
4. Better generalization to new data

---

## 4. Sample Weights by Return Attribution

### The Weighting Problem

Even with sequential bootstrap, we need to tell the ML algorithm how much to "trust" each sample. Two factors matter:

1. **Uniqueness**: How much independent information does this sample contain?
2. **Return magnitude**: How significant is the price move that determined this label?

### Return Attribution

The weight for event $i$ is the absolute sum of attributed returns over its lifespan:

$$\tilde{w}_i = \left| \sum_{t=t_{i,0}}^{t_{i,1}} \frac{r_{t-1,t}}{c_t} \right|$$

where:
- $r_{t-1,t}$ is the log return at time $t$
- $c_t$ is the concurrency at time $t$

**Intuition**: 
- Divide each return by concurrency (attribute fairly across overlapping events)
- Sum the attributed returns over the event's lifespan
- Take absolute value (we care about magnitude, not direction)

Finally, scale weights to sum to the number of observations:

$$w_i = \tilde{w}_i \cdot \frac{I}{\sum_{j=1}^{I} \tilde{w}_j}$$
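A small numerical sketch of the two formulas follows. It assumes that position `k` of the array holds the log return realized over bar `k` and that events are inclusive `(start_bar, end_bar)` pairs. The helper name and the return values are made up for illustration.

```python
import numpy as np


def return_attribution_weights(log_returns, spans):
    """Weights proportional to |sum of r_t / c_t| over each event's lifespan."""
    concurrency = np.zeros(len(log_returns))
    for start, end in spans:
        concurrency[start:end + 1] += 1

    raw = np.array([
        abs(np.sum(log_returns[s:e + 1] / concurrency[s:e + 1]))
        for s, e in spans
    ])
    # Rescale so the weights sum to the number of events I.
    return raw * len(spans) / raw.sum()


log_returns = np.array([0.01, -0.02, 0.03, 0.01, -0.01, 0.02])
spans = [(0, 2), (2, 3), (4, 5)]
weights = return_attribution_weights(log_returns, spans)
print(weights)        # roughly 0.375, 1.875 and 0.75
print(weights.sum())  # equals the number of events, 3
```

The first event gets the smallest weight even though it spans the most bars: its returns largely cancel, and it has to share the large move on bar 2 with the second event.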

In [None]:
# Compute sample weights by return attribution
sample_weights = compute_sample_weights(
    event_end_times=event_end_times,
    close_prices=close_prices,
    use_returns=True,
)

print(f"Sample weights computed for {len(sample_weights)} events")
print(f"\nWeight statistics:")
print(f"  Sum: {sample_weights.sum():.2f} (should equal {len(sample_weights)})")
print(f"  Mean: {sample_weights.mean():.4f}")
print(f"  Std: {sample_weights.std():.4f}")
print(f"  Min: {sample_weights.min():.4f}")
print(f"  Max: {sample_weights.max():.4f}")

In [None]:
# Visualize sample weights
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Histogram of weights
axes[0, 0].hist(sample_weights, bins=30, edgecolor='black', alpha=0.7)
axes[0, 0].axvline(x=1.0, color='red', linestyle='--', label='Default weight (1.0)')
axes[0, 0].set_title('Distribution of Sample Weights', fontsize=12)
axes[0, 0].set_xlabel('Weight')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()

# Weights over time
axes[0, 1].plot(sample_weights.index, sample_weights.values, linewidth=0.8)
axes[0, 1].axhline(y=1.0, color='red', linestyle='--', alpha=0.7)
axes[0, 1].set_title('Sample Weights Over Time', fontsize=12)
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Weight')

# Weight vs Uniqueness
common_idx = sample_weights.index.intersection(avg_uniqueness.index)
axes[1, 0].scatter(avg_uniqueness.loc[common_idx], sample_weights.loc[common_idx], 
                   alpha=0.5, s=20)
axes[1, 0].set_title('Weight vs Uniqueness', fontsize=12)
axes[1, 0].set_xlabel('Average Uniqueness')
axes[1, 0].set_ylabel('Sample Weight')

# Get labels and show weight vs return
labels = get_labels(events, close_prices)
common_idx_labels = sample_weights.index.intersection(labels.index)
axes[1, 1].scatter(labels.loc[common_idx_labels, 'ret'].abs(), 
                   sample_weights.loc[common_idx_labels], alpha=0.5, s=20)
axes[1, 1].set_title('Weight vs Absolute Return', fontsize=12)
axes[1, 1].set_xlabel('Absolute Return')
axes[1, 1].set_ylabel('Sample Weight')

plt.tight_layout()
plt.show()

### Interpreting the Weights

The scatter plots reveal important patterns:

1. **Weight vs Uniqueness**: Higher uniqueness generally leads to higher weights (more independent information)

2. **Weight vs Return**: Larger absolute returns lead to higher weights (more significant price moves)

3. **Combined effect**: A sample with high uniqueness AND large return gets the highest weight

**Usage in sklearn**:
```python
model.fit(X_train, y_train, sample_weight=sample_weights)
```

---

## 5. Time Decay

### Why Time Decay?

Markets are **adaptive systems** (Lo, 2017). As market participants learn and adapt:

- Old patterns become less relevant
- Recent observations are more predictive of future behavior
- Models trained on stale data may fail

Time decay reduces the weight of older observations, helping the model focus on recent patterns.

### The Decay Function

We apply a **piecewise-linear decay** function:

$$d(x) = \max\{0, a + bx\}$$

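where $x \in [0, \sum_{i} \bar{u}_i]$ is cumulative uniqueness (not chronological time!). The coefficients $a$ and $b$ follow from two conditions: the newest observation keeps full weight, $d = a + b \sum_{i} \bar{u}_i = 1$, and the user-chosen parameter $c$ fixes the other end of the line.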

The parameter $c \in (-1, 1]$ controls the decay shape:

| Parameter $c$ | Effect |
|--------------|--------|
| $c = 1$ | No decay (all weights unchanged) |
| $0 < c < 1$ | Linear decay; every weight stays positive, falling toward $c$ for the oldest |
| $c = 0$ | Linear decay to zero for oldest |
| $-1 < c < 0$ | Oldest $|c|$ fraction gets zero weight |

**Why cumulative uniqueness instead of time?**

If we decay by chronological time, redundant observations would be penalized too quickly. Using cumulative uniqueness ensures that decay is proportional to actual information content.
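One way to turn the decay rule into code is sketched below. It assumes the boundary conditions $d = 1$ at the newest observation, $d(0) = c$ when $c \ge 0$, and $d(-c \sum_i \bar{u}_i) = 0$ when $c < 0$. The function name is hypothetical, and `apply_time_decay` in `afml` may differ in its details.

```python
import numpy as np


def linear_decay_factors(avg_uniqueness, c=1.0):
    """Decay factor per observation (oldest first); the newest always gets 1."""
    x = np.cumsum(avg_uniqueness)  # cumulative uniqueness
    total = x[-1]
    if c >= 0:
        slope = (1.0 - c) / total           # line runs from c at x=0 to 1 at x=total
    else:
        slope = 1.0 / ((c + 1.0) * total)   # line crosses zero at x = -c * total
    intercept = 1.0 - slope * total
    return np.maximum(0.0, intercept + slope * x)


u = np.full(10, 0.5)  # ten observations, each with average uniqueness 0.5
print(linear_decay_factors(u, c=1.0))   # all ones: no decay
print(linear_decay_factors(u, c=0.5))   # rises linearly from 0.55 to 1.0
print(linear_decay_factors(u, c=-0.5))  # oldest half is zeroed out
```

The factors multiply the sample weights. Because $x$ advances by $\bar{u}_i$ rather than by one step per observation, a run of highly overlapping events ages more slowly than the same number of unique events.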

In [None]:
# Demonstrate different decay factors
decay_factors = [1.0, 0.75, 0.5, 0.0, -0.25, -0.5]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for i, c in enumerate(decay_factors):
    decayed = apply_time_decay(sample_weights, decay_factor=c)
    
    # Plot original vs decayed
    axes[i].plot(sample_weights.index, sample_weights.values, 
                 alpha=0.5, label='Original', linewidth=0.8)
    axes[i].plot(decayed.index, decayed.values, 
                 label='Decayed', linewidth=0.8)
    axes[i].set_title(f'Decay Factor c = {c}', fontsize=12)
    axes[i].set_xlabel('Date')
    axes[i].set_ylabel('Weight')
    axes[i].legend()
    
    # Add annotation
    if c == 1.0:
        axes[i].annotate('No decay', xy=(0.5, 0.9), xycoords='axes fraction',
                         fontsize=10, ha='center')
    elif c == 0:
        axes[i].annotate('Oldest → 0', xy=(0.5, 0.9), xycoords='axes fraction',
                         fontsize=10, ha='center')
    elif c < 0:
        axes[i].annotate(f'Oldest {abs(c)*100:.0f}% = 0', xy=(0.5, 0.9), 
                         xycoords='axes fraction', fontsize=10, ha='center')

plt.tight_layout()
plt.show()

In [None]:
# Compare linear vs exponential decay
linear_decayed = apply_time_decay(sample_weights, decay_factor=0.5)
exp_decayed = apply_exponential_decay(sample_weights, half_life=sample_weights.sum() / 4)

fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(sample_weights.index, sample_weights.values, 
        alpha=0.4, label='Original', linewidth=0.8)
ax.plot(linear_decayed.index, linear_decayed.values,
        label='Linear Decay (c=0.5)', linewidth=1)
ax.plot(exp_decayed.index, exp_decayed.values,
        label='Exponential Decay', linewidth=1, linestyle='--')

ax.set_title('Linear vs Exponential Time Decay', fontsize=14)
ax.set_xlabel('Date')
ax.set_ylabel('Weight')
ax.legend()

plt.tight_layout()
plt.show()

print("Comparison of decay methods:")
print(f"  Original sum: {sample_weights.sum():.2f}")
print(f"  Linear decayed sum: {linear_decayed.sum():.2f}")
print(f"  Exponential decayed sum: {exp_decayed.sum():.2f}")

### Choosing the Decay Factor

The optimal decay factor depends on:

1. **Market regime stability**: Rapidly changing markets → stronger decay (lower $c$)
2. **Data volume**: More data → can afford to discard oldest observations
3. **Prediction horizon**: Short-term predictions → stronger decay
4. **Strategy type**: Mean-reversion may need recent data; momentum may benefit from longer history

**Recommendation**: Start with $c = 0.5$ and tune based on out-of-sample performance.

---

## 6. Putting It All Together

### The Complete Sample Weighting Pipeline

Here's how to combine all the techniques for training an ML model:

```python
# 1. Generate labels using triple-barrier method
events = get_events(...)
labels = get_labels(events, close_prices)

# 2. Compute sample weights
sample_weights = compute_sample_weights(
    event_end_times=events['t1'],
    close_prices=close_prices,
    use_returns=True,
)

# 3. Apply time decay
decayed_weights = apply_time_decay(sample_weights, decay_factor=0.5)

# 4. Train model with weights
model.fit(X_train, y_train, sample_weight=decayed_weights)

# 5. For bagging, cap the size of each bootstrap draw at the average uniqueness
#    (a cheaper alternative to running the full sequential bootstrap)
uniqueness_df = compute_sample_uniqueness(close_prices, events['t1'])
avg_uniqueness = uniqueness_df['average_uniqueness'].mean()
bagging_clf = BaggingClassifier(
    max_samples=avg_uniqueness,
    ...
)
```

In [None]:
# Complete example
print("Complete Sample Weighting Pipeline")
print("=" * 40)

# Step 1: We already have events and labels
print(f"\n1. Events: {len(events)} samples")
print(f"   Labels: {len(labels)} samples")

# Step 2: Compute sample weights
weights = compute_sample_weights(
    event_end_times=events['t1'],
    close_prices=close_prices,
    use_returns=True,
)
print(f"\n2. Sample weights computed")
print(f"   Mean: {weights.mean():.4f}")
print(f"   Std: {weights.std():.4f}")

# Step 3: Apply time decay
final_weights = apply_time_decay(weights, decay_factor=0.5)
print(f"\n3. Time decay applied (c=0.5)")
print(f"   Mean: {final_weights.mean():.4f}")
print(f"   Std: {final_weights.std():.4f}")

# Step 4: Compute uniqueness for max_samples
uniqueness_df = compute_sample_uniqueness(close_prices, events['t1'])
avg_uniq = uniqueness_df['average_uniqueness'].mean()
print(f"\n4. Average uniqueness: {avg_uniq:.4f}")
print(f"   Recommended max_samples for bagging: {avg_uniq:.4f}")

# Summary visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].hist(final_weights, bins=30, edgecolor='black', alpha=0.7)
axes[0].set_title('Final Sample Weights', fontsize=12)
axes[0].set_xlabel('Weight')
axes[0].set_ylabel('Frequency')

axes[1].plot(final_weights.index, final_weights.values, linewidth=0.8)
axes[1].set_title('Weights Over Time', fontsize=12)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Weight')

# Show weight by label
common_idx = final_weights.index.intersection(labels.index)
for label_val in sorted(labels['bin'].unique()):
    mask = labels.loc[common_idx, 'bin'] == label_val
    label_weights = final_weights.loc[common_idx][mask]
    axes[2].hist(label_weights, bins=20, alpha=0.6, 
                 label=f'Label {int(label_val)}')
axes[2].set_title('Weights by Label', fontsize=12)
axes[2].set_xlabel('Weight')
axes[2].set_ylabel('Frequency')
axes[2].legend()

plt.tight_layout()
plt.show()

---

## Summary

### Key Takeaways

1. **Financial data violates IID assumptions**: Overlapping outcomes create dependencies between labels

2. **Uniqueness measures independence**: The average uniqueness score tells us how much independent information each sample contains

3. **Sequential bootstrap reduces redundancy**: By adjusting sampling probabilities, we get samples closer to IID

4. **Return attribution weights samples**: Combines uniqueness with return magnitude to determine sample importance

5. **Time decay focuses on recent data**: Markets evolve, so recent observations are more relevant

### Function Reference

| Function | Purpose |
|----------|--------|
| `get_num_concurrent_events()` | Count overlapping labels at each time |
| `get_average_uniqueness()` | Compute uniqueness score for each label |
| `get_indicator_matrix()` | Build binary matrix for bootstrap |
| `sequential_bootstrap()` | Sample with overlap-aware probabilities |
| `compute_sample_weights()` | Compute weights by return attribution |
| `apply_time_decay()` | Apply linear time decay to weights |
| `apply_exponential_decay()` | Apply exponential time decay |

### Practical Recommendations

1. **Always compute sample weights** when using triple-barrier labels
2. **Use sequential bootstrap** for bagging classifiers (Random Forest, etc.)
3. **Set `max_samples`** to average uniqueness for sklearn's BaggingClassifier
4. **Apply time decay** with $c \approx 0.5$ as a starting point
5. **Use `class_weight='balanced'`** in sklearn to handle label imbalance

---

## Exercises

1. **Vary the holding period**: Change the vertical barrier from 5 days to 10 days. How does this affect average uniqueness?

2. **Monte Carlo comparison**: Run `compare_bootstrap_methods()` with 1000 iterations. Is the difference between standard and sequential bootstrap statistically significant?

3. **Time decay sensitivity**: Try different decay factors ($c = 0.25, 0.5, 0.75$) and observe how weight distributions change.

4. **Implement exponential decay**: The book suggests exponential decay as an exercise. Compare its behavior to linear decay on your data.

5. **Cross-validation impact**: Train a simple classifier with and without sample weights. Compare cross-validation scores to see the effect of proper weighting.