In [15]:
pip install --upgrade nbformat


Collecting nbformat
  Using cached nbformat-5.10.4-py3-none-any.whl.metadata (3.6 kB)
Collecting fastjsonschema>=2.15 (from nbformat)
  Using cached fastjsonschema-2.21.1-py3-none-any.whl.metadata (2.2 kB)
Collecting jsonschema>=2.6 (from nbformat)
  Downloading jsonschema-4.24.0-py3-none-any.whl.metadata (7.8 kB)
Collecting jsonschema-specifications>=2023.03.6 (from jsonschema>=2.6->nbformat)
  Downloading jsonschema_specifications-2025.4.1-py3-none-any.whl.metadata (2.9 kB)
Collecting referencing>=0.28.4 (from jsonschema>=2.6->nbformat)
  Using cached referencing-0.36.2-py3-none-any.whl.metadata (2.8 kB)
Collecting rpds-py>=0.7.1 (from jsonschema>=2.6->nbformat)
  Downloading rpds_py-0.26.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (4.2 kB)
Using cached nbformat-5.10.4-py3-none-any.whl (78 kB)
Using cached fastjsonschema-2.21.1-py3-none-any.whl (23 kB)
Downloading jsonschema-4.24.0-py3-none-any.whl (88 kB)
Downloading jsonschema_specifications-2025.4.1-py3-none-any.whl (18 kB)
Using

Here’s a detailed, step-by-step walkthrough of what your script is doing:

---

## 1. Imports & Dependencies

```python
import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping
import vectorbt as vbt
```

* **NumPy & pandas** for array and DataFrame manipulation.
* **yfinance** to download historical/intraday price data from Yahoo Finance.
* **StandardScaler** to normalize your price series so the LSTM trains more smoothly.
* **TensorFlow/Keras** (`Sequential`, `LSTM`, etc.) to build and train your recurrent neural network.
* **vectorbt** for a quick, vectorized backtest of the generated signals.

---

## 2. User-Defined Parameters

```python
SYMBOL     = "AAPL"    # which stock ticker to use
PERIOD     = "60d"     # grab the last 60 days of data
INTERVAL   = "5m"      # with 5-minute bars
LOOKBACK   = 20        # use 20 past bars per LSTM input
TEST_RATIO = 0.2       # reserve 20% of samples for testing
EPOCHS     = 20        # max training epochs
BATCH_SIZE = 64        # gradient-descent batch size
```

These let you quickly swap symbols, timeframes, or model hyperparameters.

---

## 3. Fetch Intraday Data

```python
data = yf.download(
    SYMBOL,
    period=PERIOD,
    interval=INTERVAL,
    progress=False
)
if data.empty:
    raise ValueError("No intraday data – check your symbol/interval!")
close_series = data["Close"].dropna()
```

* Calls `yf.download()` to pull Open/High/Low/Close/Volume at 5-minute intervals over the last 60 days.
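* Grabs only the **Close** column and drops any missing values. Depending on the yfinance version, the columns may come back as a two-level MultiIndex, in which case `data["Close"]` is a one-column DataFrame rather than a Series; the full script calls `.squeeze()` later to cope with that.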
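
As a hedged illustration of the column-shape issue, the sketch below builds two small frames by hand (made-up prices, no download) and normalizes both to a 1-D Series. The helper name `to_close_series` is hypothetical and not part of the script.

```python
import numpy as np
import pandas as pd

def to_close_series(data: pd.DataFrame) -> pd.Series:
    """Return the Close prices as a 1-D Series with missing values dropped."""
    close = data["Close"]
    if isinstance(close, pd.DataFrame):
        # MultiIndex columns: selecting "Close" leaves one column per ticker,
        # so take the first (and here only) column.
        close = close.iloc[:, 0]
    return close.dropna()

idx = pd.date_range("2025-01-02 09:30", periods=4, freq="5min")

# Toy frame imitating two-level (field, ticker) columns for one ticker
cols = pd.MultiIndex.from_tuples([("Close", "AAPL"), ("Volume", "AAPL")])
multi = pd.DataFrame(
    [[100.0, 10], [101.0, 12], [np.nan, 0], [102.5, 9]],
    index=idx, columns=cols,
)

# The same toy data with flat columns
flat = pd.DataFrame(
    {"Close": [100.0, 101.0, np.nan, 102.5], "Volume": [10, 12, 0, 9]},
    index=idx,
)

close_multi = to_close_series(multi)
close_flat = to_close_series(flat)
print(close_multi.values.shape, close_flat.values.shape)
```

Either input ends up as a plain Series, so downstream code such as `close_series.values` would have shape `(N,)` in both cases.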

---

## 4. Scale the Close Prices

```python
scaler  = StandardScaler()
close_s = scaler.fit_transform(close_series.values.reshape(-1, 1)).ravel()
```

* Neural nets train more reliably on zero-mean, unit-variance data.
* `StandardScaler` fits to your close prices and transforms them so they have mean = 0 and std = 1.
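* The script keeps both the **raw** prices (`close_arr = close_series.values`, used for return targets and the backtest) and the **scaled** values (`scaled_arr = close_s`, used for LSTM inputs); these two assignments appear in the full script but not in the snippet above.
* The scaler is fit on the whole series, test period included, so the test inputs are standardized with statistics that include later prices. This is a mild form of look-ahead.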
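
A worked sketch of the standardization step on made-up numbers. It computes by hand what `StandardScaler` computes (mean and population standard deviation) and, as an assumption that departs from the script, takes those statistics from the training portion only so that later bars do not influence the scaling. The `split` value is hypothetical.

```python
import numpy as np

# Made-up closing prices, oldest first
prices = np.array([100.0, 102.0, 101.0, 103.0, 104.0,
                   110.0, 108.0, 111.0, 109.0, 112.0])
split = 8  # hypothetical boundary: first 8 bars train, last 2 test

train = prices[:split]
mu = train.mean()
sigma = train.std()  # ddof=0, the population std that StandardScaler uses

# Apply the training statistics to the whole series
scaled = (prices - mu) / sigma

print("train mean:", scaled[:split].mean())
print("train std :", scaled[:split].std())
print("test values:", scaled[split:])
```

The training slice comes out with mean 0 and standard deviation 1, while the test slice is merely expressed in the same units.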

---

## 5. Build LSTM Input Sequences & Targets

```python
seqs, targets, idxs = [], [], []

for i in range(LOOKBACK, len(close_arr) - 1):
    seqs.append(scaled_arr[i - LOOKBACK : i])
    # target is the next bar’s return: (next_close – this_close) / this_close
    targets.append((close_arr[i + 1] - close_arr[i]) / close_arr[i])
    idxs.append(close_series.index[i])
```

* **Sequences**: For each time *i*, grab the previous 20 normalized prices (bars `i - LOOKBACK` through `i - 1`; bar *i* itself is not included) → shape `(20,)`.
* **Targets**: The *actual* percentage return from bar *i* to *i+1*, computed on the raw price scale.
* **Timestamps**: Keep the datetime index for later alignment in backtesting.

You then reshape:

```python
X = np.array(seqs).reshape(-1, LOOKBACK, 1)  # (samples, timesteps, features)
y = np.array(targets)                        # (samples,)
idxs = pd.DatetimeIndex(idxs)
```
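
The windowing can be checked on a tiny made-up array. The sketch below repeats the loop from above with `LOOKBACK = 3` and compares it with a vectorized variant built on NumPy's `sliding_window_view`; the vectorized form is an alternative offered here for illustration, not something the script uses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

LOOKBACK = 3
close_arr = np.array([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0])  # made-up prices
scaled_arr = (close_arr - close_arr.mean()) / close_arr.std()

# Loop version, as in the walkthrough
seqs, targets = [], []
for i in range(LOOKBACK, len(close_arr) - 1):
    seqs.append(scaled_arr[i - LOOKBACK : i])
    targets.append((close_arr[i + 1] - close_arr[i]) / close_arr[i])
X_loop = np.array(seqs)
y_loop = np.array(targets)

# Vectorized version: window j covers scaled_arr[j : j + LOOKBACK],
# which is the input for bar i = j + LOOKBACK
n = len(close_arr) - LOOKBACK - 1
X_vec = sliding_window_view(scaled_arr, LOOKBACK)[:n]
y_vec = close_arr[LOOKBACK + 1 :] / close_arr[LOOKBACK:-1] - 1.0

# Add the trailing feature axis expected by the LSTM
X = X_vec.reshape(-1, LOOKBACK, 1)
print(X.shape, y_vec.shape)
```

With 7 bars and a lookback of 3, both versions yield 3 samples, one each for *i* = 3, 4, 5.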

---

## 6. Train/Test Split

```python
n_samples = len(X)
split     = int(n_samples * (1 - TEST_RATIO))

X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
idx_train, idx_test = idxs[:split], idxs[split:]
```

* The first 80% of rolling windows go to **training**, the last 20% to **testing**.
* You keep the corresponding timestamps (`idx_train`, `idx_test`) for plotting/backtesting.
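
One possible variation on this split, sketched under assumptions: carve a separate validation slice out of the end of the training period, so that model selection and final evaluation use different bars. The helper name `chrono_split` and the 10% validation share are hypothetical choices, not part of the script.

```python
import numpy as np

def chrono_split(n_samples: int, val_ratio: float = 0.1, test_ratio: float = 0.2):
    """Return (train_end, val_end) for a chronological train/val/test split."""
    n_test = int(n_samples * test_ratio)
    n_val = int(n_samples * val_ratio)
    val_end = n_samples - n_test      # validation stops where test begins
    train_end = val_end - n_val       # training stops where validation begins
    return train_end, val_end

X = np.arange(1000)  # stand-in for the sample axis, oldest first
train_end, val_end = chrono_split(len(X))

X_train, X_val, X_test = X[:train_end], X[train_end:val_end], X[val_end:]
print(len(X_train), len(X_val), len(X_test))
```

The order of samples is preserved, so every validation bar is later than every training bar and every test bar is later than both.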

---

## 7. Define the LSTM Model

```python
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(LOOKBACK, 1)),
    Dropout(0.2),
    LSTM(50),
    Dropout(0.2),
    Dense(1)
])
model.compile(optimizer="adam", loss="mse")
```

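* **First LSTM layer**: 50 memory cells, returns a full sequence so the next LSTM layer can consume it. Recent Keras versions warn about passing `input_shape` to a layer and recommend an `Input(shape=(LOOKBACK, 1))` object as the first layer instead.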
* **Dropout(0.2)** after each LSTM to guard against overfitting.
* **Second LSTM layer**: Another 50 units, but returns only its final hidden state.
* **Dense(1)**: A single output predicting the *next‐bar return*.
* Trained with **mean squared error** and the **Adam** optimizer.

---

## 8. Train with Early Stopping

```python
es = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=[es],
    verbose=1
)
```

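* Monitors validation MSE and stops if it doesn’t improve for 5 epochs, restoring the best weights.
* Because `validation_data` is the test set itself, the stopping epoch is selected on the same bars used for the backtest, so the backtest results are optimistic to that extent.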
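
To make the patience rule concrete, here is a simplified pure-Python model of it. This is a sketch of the logic only (no `min_delta`, no weight restoring), not Keras's actual implementation, and the loss values are made up.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is None when the patience limit is never reached.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [0.9, 0.5, 0.6, 0.7, 0.65, 0.8, 0.55, 0.9]  # made-up validation losses
print(early_stop_epoch(losses, patience=5))
print(early_stop_epoch(losses, patience=10))
```

In the first call the best loss occurs at epoch 2 and five epochs pass without beating it, so training would stop at epoch 7 and the epoch-2 weights would be the ones restored.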

---

## 9. Predict & Flatten

```python
y_pred_raw = model.predict(X_test)       # shape (n_test, 1)
y_pred = y_pred_raw.flatten()            # shape (n_test,)
```

* Runs your LSTM on the test windows to get predicted next-bar returns.

---

## 10. Build Trading Signals

```python
# Real close prices aligned to predictions:
close_seq        = close_arr[LOOKBACK : -1]
close_test       = close_seq[split:]
# Align lengths and timestamp index:
signals = pd.DataFrame({
    "Close":    close_test.squeeze(),
    "pred_ret": y_pred
}, index=idx_test)
# Simple rule: go long whenever predicted return > 0, exit otherwise
entries = signals["pred_ret"] > 0
exits   = ~entries
```

* **`signals["Close"]`** is your actual price series for the backtest.
* **`pred_ret`** drives entry/exit decisions:

  * **Entry** (`True`) when `pred_ret > 0`
  * **Exit** when `pred_ret <= 0`
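
The rule can be traced on a handful of made-up predictions. The sketch below derives the entry/exit masks and counts how many orders the rule implies, assuming that a repeated entry signal while already long does nothing.

```python
import numpy as np

# Made-up predicted next-bar returns
pred_ret = np.array([0.002, 0.001, -0.001, -0.003, 0.004, 0.0, 0.002])

entries = pred_ret > 0
exits = ~entries  # True wherever pred_ret <= 0

# Position implied by the rule: 1 = long, 0 = flat
in_position = entries.astype(int)

# A change in position (0 -> 1 or 1 -> 0) corresponds to one order
changes = np.diff(in_position, prepend=0)
n_buys = int((changes == 1).sum())
n_sells = int((changes == -1).sum())

print("position:", in_position)
print("buys:", n_buys, "sells:", n_sells)
```

Because the sign of a small predicted return can flip from bar to bar, the rule can generate many orders, which matters once fees are applied.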

---

## 11. Backtest with vectorbt

```python
pf = vbt.Portfolio.from_signals(
    close=signals["Close"],
    entries=entries,
    exits=exits,
    init_cash=100_000,
    fees=0.001,
    freq="5T"
)
```

* **`from_signals`** consumes your price series plus boolean entry/exit masks.
* **`init_cash`** = \$100 000 starting capital.
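* **`fees=0.001`** charges 0.1% of the order value on every order, so a full round trip (buy, then sell) costs roughly 0.2%.
* **`freq="5T"`** tells vectorbt these are 5-minute intervals (important for annualized metrics). Recent pandas versions deprecate the `T` alias, so `"5min"` is the forward-compatible spelling.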
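
A short arithmetic sketch of what the fee setting implies for a single round trip. It assumes the fee is a fraction of order value charged on both the buy and the sell, with all cash invested; the function names are hypothetical.

```python
def round_trip_net_return(gross_return: float, fee: float = 0.001) -> float:
    """Net return of one buy-then-sell when `fee` applies to each order's value.

    Buying with cash C at price p yields C / (p * (1 + fee)) shares;
    selling them at price p * (1 + gross_return) returns that value times (1 - fee).
    """
    return (1 + gross_return) * (1 - fee) / (1 + fee) - 1

def break_even_gross_return(fee: float = 0.001) -> float:
    """Gross price move needed for a round trip to net exactly zero."""
    return (1 + fee) / (1 - fee) - 1

print(f"net return on a flat price : {round_trip_net_return(0.0):.6f}")
print(f"break-even gross move      : {break_even_gross_return():.6f}")
```

Under these assumptions a trade needs a price move of about 0.2% just to break even, which is large relative to a typical 5-minute return.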

---

## 12. Performance Summary & Visualization

```python
print("Total Return      :", pf.total_return() * 100, "%")
print("Annualized Return :", pf.annualized_return() * 100, "%")
print("Sharpe Ratio      :", pf.sharpe_ratio())
print("Max Drawdown      :", pf.max_drawdown() * 100, "%")

pf.plot_cash_flow().show()
```

* **`total_return()`**: cumulative return as a fraction of starting capital (the script multiplies by 100 to print a percentage).
* **`annualized_return()`**: compounds the 5-minute PnL up to a yearly figure.
* **`sharpe_ratio()`**: risk-adjusted return (assumes zero risk-free).
* **`max_drawdown()`**: deepest peak-to-trough equity decline.
* **`plot_cash_flow()`**: a built-in vectorbt chart of per-bar cash flow, i.e. cash spent on entries and received on exits. It is not an equity curve.
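
For intuition, two of these metrics can be computed by hand from an equity curve. The sketch below uses a made-up four-point curve and plain NumPy; it illustrates the definitions only and is not vectorbt's implementation, whose conventions may differ in detail.

```python
import numpy as np

equity = np.array([100.0, 110.0, 99.0, 120.0])  # made-up portfolio values

# Total return: final value relative to starting value
total_return = equity[-1] / equity[0] - 1

# Drawdown at each bar: distance below the running peak, as a fraction
running_peak = np.maximum.accumulate(equity)
drawdown = equity / running_peak - 1
max_drawdown = drawdown.min()

print(f"total return : {total_return:.2%}")
print(f"max drawdown : {max_drawdown:.2%}")
```

Here the curve peaks at 110 and then falls to 99, so the deepest drawdown is 10% even though the curve finishes 20% above where it started.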

---

### In a nutshell

1. **Fetch** 5-minute Apple prices
2. **Normalize** them and build rolling windows
3. **Train** an LSTM to predict the next bar’s return
4. **Signal**: if predicted return > 0, go long; otherwise close position
5. **Backtest** that strategy vectorized with realistic fees
6. **Report** your key metrics and plot the cash-flow chart

This gives you an end-to-end pipeline—from raw data to neural prediction to strategy evaluation—all in a few dozen lines of Python.


In [23]:
import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping
import vectorbt as vbt

# ────────── 1. PARAMETERS ──────────
SYMBOL     = "AAPL"     # ticker to backtest
PERIOD     = "60d"      # last 60 days of 5m data
INTERVAL   = "5m"
LOOKBACK   = 20         # bars per LSTM input sequence
TEST_RATIO = 0.2
EPOCHS     = 20
BATCH_SIZE = 64

# ────────── 2. FETCH INTRADAY DATA ──────────
data = yf.download(
    SYMBOL,
    period=PERIOD,
    interval=INTERVAL,
    progress=False
)
if data.empty:
    raise ValueError("No intraday data – check your symbol/interval!")
close_series = data["Close"].dropna()  # pandas Series (1-D)
print(f"Fetched {len(close_series)} bars from {close_series.index.min().date()} to {close_series.index.max().date()}")

# ────────── 3. SCALE CLOSE PRICE ──────────
scaler    = StandardScaler()
# Use .values.reshape(-1,1) to give scaler a 2-D array
close_s   = scaler.fit_transform(close_series.values.reshape(-1, 1)).ravel()

# Convert to numpy arrays
close_arr  = close_series.values        # shape (N,)
scaled_arr = close_s                    # shape (N,)

# ────────── 4. BUILD SEQUENCES & TARGETS ──────────
seqs, targets, idxs = [], [], []
for i in range(LOOKBACK, len(close_arr) - 1):
    seqs.append(scaled_arr[i - LOOKBACK : i])               # last LOOKBACK scalars
    targets.append((close_arr[i + 1] - close_arr[i]) / close_arr[i])  # next-bar return
    idxs.append(close_series.index[i])                      # timestamp

X = np.array(seqs).reshape(-1, LOOKBACK, 1)  # (samples, timesteps, features)
y = np.array(targets)                        # (samples,)
idxs = pd.DatetimeIndex(idxs)

# ────────── 5. TRAIN/TEST SPLIT ──────────
n_samples = len(X)
split     = int(n_samples * (1 - TEST_RATIO))

X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
idx_train, idx_test = idxs[:split], idxs[split:]

# ────────── 6. DEFINE LSTM MODEL ──────────
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(LOOKBACK, 1)),
    Dropout(0.2),
    LSTM(50),
    Dropout(0.2),
    Dense(1)
])
model.compile(optimizer="adam", loss="mse")

# ────────── 7. TRAIN WITH EARLY STOPPING ──────────
es = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=[es],
    verbose=1
)

# ────────── 8. PREDICT & FLATTEN ──────────
y_pred_raw = model.predict(X_test)
print(f"Raw predictions shape: {y_pred_raw.shape}")
y_pred = y_pred_raw.flatten()  # Ensure 1D array
print(f"Flattened predictions shape: {y_pred.shape}")

# ────────── 9. BUILD SIGNALS ──────────
# Align raw closes with our sequences:
close_seq  = close_arr[LOOKBACK : -1]   # drop first LOOKBACK and last bar
close_test = close_seq[split:]          # test portion

# Verify both are 1-D and same length
print("Test len:", len(idx_test), 
      "predictions:", y_pred.shape, 
      "closes:", close_test.shape)

# Ensure all arrays have matching lengths
min_len = min(len(idx_test), len(y_pred), len(close_test))
idx_test_aligned = idx_test[:min_len]
y_pred_aligned = y_pred[:min_len]
close_test_aligned = close_test[:min_len]

print(f"Aligned lengths - idx: {len(idx_test_aligned)}, pred: {len(y_pred_aligned)}, close: {len(close_test_aligned)}")

signals = pd.DataFrame({
    "Close":    close_test_aligned.squeeze(),
    "pred_ret": y_pred_aligned
}, index=idx_test_aligned)

entries = signals["pred_ret"] > 0
exits   = ~entries  # exit whenever pred_ret ≤ 0

# ────────── 10. BACKTEST WITH VECTORBT ──────────
pf = vbt.Portfolio.from_signals(
    close=signals["Close"],
    entries=entries,
    exits=exits,
    init_cash=100_000,
    fees=0.001,
    freq="5T"
)

# ────────── 11. PERFORMANCE & PLOT ──────────
print("\n=== Performance Summary ===")
print(f"Total Return      : {pf.total_return() * 100:.2f}%")
print(f"Annualized Return : {pf.annualized_return() * 100:.2f}%")
print(f"Sharpe Ratio      : {pf.sharpe_ratio():.2f}")
print(f"Max Drawdown      : {pf.max_drawdown() * 100:.2f}%")

pf.plot_cash_flow().show()



YF.download() has changed argument auto_adjust default to True



Fetched 4642 bars from 2025-04-11 to 2025-07-09
Epoch 1/20



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



58/58 ━━━━━━━━━━━━━━━━━━━━ 2s 17ms/step - loss: 0.0017 - val_loss: 9.1951e-06
Epoch 2/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - loss: 1.9509e-04 - val_loss: 4.5775e-06
Epoch 3/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - loss: 1.0134e-04 - val_loss: 5.7038e-06
Epoch 4/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - loss: 7.6430e-05 - val_loss: 6.2807e-06
Epoch 5/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - loss: 6.1031e-05 - val_loss: 6.1444e-06
Epoch 6/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - loss: 4.8214e-05 - val_loss: 7.0878e-06
Epoch 7/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - loss: 3.9945e-05 - val_loss: 3.7381e-06
Epoch 8/20
58/58 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 3.2311e-05 - val_loss: 2.8209e-06


'T' is deprecated and will be removed in a future version. Please use 'min' instead of 'T'.



ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

In [22]:
import nbformat
print(nbformat.__version__)  # should be ≥ 4.2.0
import plotly.io as pio
pio.renderers.default = "vscode"

5.10.4
