We need a solid test / train / predict / backtest pipeline in place. 

Which means we need a strategy. Let's do something brutally, impishly simple to start. Sticking with the vibes thinking, we'll implement an end-of-day ETF strategy. Training prompts will be a list of headlines. For the training response, we'll take the best-performing broad-market levered ETF from close to open, and that ticker will be the expected response.
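
To make the prompt/response idea concrete, here is a minimal sketch of how one training pair per day might be assembled. The headline format, the `build_training_pairs` helper, and the toy data are all assumptions for illustration; nothing here is the final format.

```python
def build_training_pairs(headlines_by_date, winners):
    """Pair each day's headlines (prompt) with that day's winning ticker (response).

    headlines_by_date: dict of ISO date string -> list of headlines (hypothetical format)
    winners: iterable of (date, ticker) tuples
    """
    pairs = []
    for date, ticker in winners:
        headlines = headlines_by_date.get(date)
        if not headlines:
            continue  # no headlines for this day, so there is no prompt to train on
        pairs.append({
            "date": date,
            "prompt": "\n".join(headlines),
            "response": ticker,
        })
    return pairs

# Toy data, invented purely for illustration
headlines_by_date = {
    "2024-01-02": ["Placeholder headline A", "Placeholder headline B"],
    "2024-01-03": ["Placeholder headline C"],
}
winners = [("2024-01-02", "TQQQ"), ("2024-01-03", "SQQQ"), ("2024-01-04", "TQQQ")]

pairs = build_training_pairs(headlines_by_date, winners)
print(pairs)
```

Days without headlines are dropped rather than given an empty prompt, which is one possible choice, not the only one.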

I've always been a bit befuddled by overnight levered ETF pricing. What is it tracking? Who is buying? Why am I not?

First, let's come up with the "buy" signal for each day, which we'll calculate as the ETF in the pool with the best close-to-open return on that day.
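
The `Overnight_Return` column used below comes from `02-markets.ipynb`, which isn't shown here. As a rough sketch of what a close-to-open return could look like, assuming daily bars with `Open` and `Close` columns (an assumption, as is the `add_overnight_return` helper) and measuring from the previous session's close to this session's open:

```python
import pandas as pd

def add_overnight_return(df):
    """Return a copy of df with an Overnight_Return column.

    Assumes the return on date t is Open[t] / Close[t-1] - 1.
    The actual definition in 02-markets.ipynb may differ.
    """
    out = df.copy()
    out["Overnight_Return"] = out["Open"] / out["Close"].shift(1) - 1
    return out

# Toy prices, invented purely for illustration
prices = pd.DataFrame(
    {"Open": [100.0, 102.0, 99.0], "Close": [101.0, 100.0, 98.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
with_returns = add_overnight_return(prices)
print(with_returns)
```

The first row has no previous close, so its overnight return is `NaN`. If the signal should instead describe the night after a given close, the same idea works with `out["Open"].shift(-1) / out["Close"] - 1`.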

In [6]:
%run 02-markets.ipynb




In [7]:
import pandas as pd

def best_overnight_performance(ticker_days, date):
    best_performance = float('-inf')
    best_ticker = None
    
    for ticker, df in ticker_days.items():
        # Ensure the date exists in the DataFrame
        if date in df.index:
            # Get the overnight return for the given date
            overnight_return = df.loc[date, 'Overnight_Return']
            if overnight_return > best_performance:
                best_performance = overnight_return
                best_ticker = ticker
    
    return best_ticker

def calculate_winner_df(ticker_days, start, end):
    # Find the best overnight performer for each calendar day between start and end
    # (days with no data, such as weekends and holidays, are skipped) and collect them in a DataFrame
    results = []
    for date in pd.date_range(start, end):
        best_ticker = best_overnight_performance(ticker_days, date)
        if best_ticker:  # Ensure there is a best ticker for the date
            overnight_return_percentage = ticker_days[best_ticker].loc[date, 'Overnight_Return'] * 100
            results.append({'Date': date, 'Ticker': best_ticker, 'Overnight_Return_Percentage': overnight_return_percentage})

    results_df = pd.DataFrame(results)
    return results_df

def print_winner_stats(results_df):
    print('Frequency of best overnight performers')
    # print the distribution of best overnight performers
    print(results_df['Ticker'].value_counts())

    print('\nChecking dates with a return less than zero')
    # print the dates with a return less than zero
    print(results_df[results_df['Overnight_Return_Percentage'] < 0])
    print(f'\nAverage overnight return: {results_df["Overnight_Return_Percentage"].mean():.2f}%')
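
As an aside, the day-by-day loop above can probably be collapsed into a few vectorized pandas calls. This is a sketch, not a drop-in replacement: `calculate_winner_df_vectorized` is a hypothetical name, and it assumes every frame in `ticker_days` has a `DatetimeIndex` and an `Overnight_Return` column.

```python
import pandas as pd

def calculate_winner_df_vectorized(ticker_days, start, end):
    # One column of overnight returns per ticker, aligned on the union of dates
    returns = pd.DataFrame(
        {ticker: df["Overnight_Return"] for ticker, df in ticker_days.items()}
    ).sort_index()
    # Keep the requested window and drop days where no ticker has a return
    returns = returns.loc[start:end].dropna(how="all")
    return pd.DataFrame({
        "Date": returns.index,
        "Ticker": returns.idxmax(axis=1).to_numpy(),
        "Overnight_Return_Percentage": returns.max(axis=1).to_numpy() * 100,
    })

# Toy data, invented purely for illustration
idx = pd.to_datetime(["2024-01-02", "2024-01-03"])
toy_days = {
    "TQQQ": pd.DataFrame({"Overnight_Return": [0.01, -0.02]}, index=idx),
    "SQQQ": pd.DataFrame({"Overnight_Return": [-0.01, 0.02]}, index=idx),
}
toy_winners = calculate_winner_df_vectorized(toy_days, "2024-01-01", "2024-01-31")
print(toy_winners)
```

`idxmax(axis=1)` returns the column label (the ticker) of the largest value in each row, and on ties it keeps the first column, which matches the strict `>` comparison in the loop.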

In [8]:
tickers = [
    'TQQQ',
    'SPXL',
    'UDOW',
    'SQQQ',
    'SPXS',
    'SDOW'
]

ticker_days = {ticker: fetch_dailies(ticker) for ticker in tickers}
now = pd.Timestamp.now()
results_df = calculate_winner_df(ticker_days, '2022-01-01', now)
print_winner_stats(results_df)

Frequency of best overnight performers
Ticker
SQQQ    188
TQQQ    164
UDOW     63
SDOW     57
SPXS     39
SPXL     31
Name: count, dtype: int64

Checking dates with a return less than zero
          Date Ticker  Overnight_Return_Percentage
367 2023-06-21   SPXS                    -0.205321

Average overnight return: 2.83%


This isn't enough coverage: SPXS and SPXL win only 39 and 31 times. Let's extend the window to the beginning of 2020 and see if we get a more balanced distribution.

In [9]:
results_df = calculate_winner_df(ticker_days, '2020-01-01', now)
print_winner_stats(results_df)

Frequency of best overnight performers
Ticker
TQQQ    334
SQQQ    283
UDOW    170
SDOW    141
SPXS     63
SPXL     56
Name: count, dtype: int64

Checking dates with a return less than zero
          Date Ticker  Overnight_Return_Percentage
872 2023-06-21   SPXS                    -0.205321

Average overnight return: 4.27%


~~Slightly better. We'll take it.~~

To keep things simple, let's just use TQQQ and SQQQ as our initial expected responses. I don't expect a smaller model to be able to predict the less common winners, so we'll train a binary TQQQ/SQQQ picker.

In [10]:
tickers = [
    'TQQQ',
    'SQQQ'
]
ticker_days = {ticker: fetch_dailies(ticker) for ticker in tickers}
results_df = calculate_winner_df(ticker_days, '2020-01-01', now)
print_winner_stats(results_df)

Frequency of best overnight performers
Ticker
TQQQ    565
SQQQ    482
Name: count, dtype: int64

Checking dates with a return less than zero
          Date Ticker  Overnight_Return_Percentage
124 2020-06-30   TQQQ                    -0.021659
265 2021-01-21   SQQQ                    -1.173881
502 2021-12-29   TQQQ                    -0.023344
581 2022-04-22   SQQQ                    -0.011962
722 2022-11-11   SQQQ                    -0.042481
730 2022-11-23   TQQQ                    -0.090621
810 2023-03-22   SQQQ                    -0.565058
872 2023-06-21   SQQQ                    -0.306592
999 2023-12-20   TQQQ                    -1.058408

Average overnight return: 2.78%


That's a sizeable drop in the return profile: over the same window, the average overnight return of the daily winner falls from 4.27% with the full pool to 2.78% with only TQQQ and SQQQ. The holy grail model would be able to predict the less common winners too. Looks like we'll be training both a binary TQQQ/SQQQ picker and a broader market model.



In [None]:
# TODO: Save targets for TQQQ/SQQQ model
# TODO: Save targets for broader market model
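
One way the saving step might look, as a sketch only: the `save_targets` and `load_targets` helpers, the CSV format, and the file name are all assumptions, and the toy frame stands in for `results_df`.

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_targets(results_df, path):
    """Write the Date/Ticker target pairs to a CSV file (hypothetical format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    results_df[["Date", "Ticker"]].to_csv(path, index=False)
    return path

def load_targets(path):
    """Read targets back, parsing the Date column as timestamps."""
    return pd.read_csv(path, parse_dates=["Date"])

# Toy stand-in for results_df, invented purely for illustration
toy_results = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "Ticker": ["TQQQ", "SQQQ"],
    "Overnight_Return_Percentage": [1.0, 2.0],
})

with tempfile.TemporaryDirectory() as tmp:
    saved = save_targets(toy_results, Path(tmp) / "targets" / "tqqq_sqqq.csv")
    reloaded = load_targets(saved)

print(reloaded)
```

Only `Date` and `Ticker` are written, since the return itself is not something the model is expected to predict; keeping it in the file would be an equally reasonable choice.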