In [1]:
from utils import remove_extended_hours, get_market_dates, get_tickers, get_data, first_trading_date_after_equal
from datetime import datetime, date, timedelta
import mplfinance as mpf
import pandas as pd
import numpy as np
import os
import pyarrow as pa
import pyarrow.parquet as pq
import ast
DATA_PATH = "../../../data/polygon/"

# 0. Parquet Files



# 1. Simulating screeners
In backtesting, we should be able to simulate what a screener would have returned; otherwise we would have to simulate all stocks simultaneously. Because this is computationally expensive, the results are saved in the <code>processed/cache/</code> folder. For short-term scans (<1d), simulated scans should match real scans. I will use the IBKR TWS scanner to determine what is possible. Screens can be combined, just like real screens.

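The function below calculates the N most liquid non-ADR stocks, where liquidity is measured as quarterly turnover (summed quarterly volume times the last close of the quarter). This is to simulate the Russell or S&P indices. Of course there will be differences, but I care more about liquidity than whether a stock is precisely in an index or not.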

In [None]:
def calculate_top_n_liquid(n=3000, start = date(2000, 1, 1), end = date(2100, 1, 1)):
    tickers = get_tickers()
    tickers = tickers[tickers['type'] == "CS"]
    data = pd.DataFrame(index=get_market_dates())
    IDs = []

    quarterly_all = pd.DataFrame()
    for index, id in enumerate(tickers['ID']):
        bars = get_data(id, columns=['volume', 'close'], start=start, end=end)
        quarterly = bars.resample('Q').agg({'close': 'last',
                                'volume': 'sum'})
        quarterly['turnover'] = quarterly['volume'] * quarterly['close']
        quarterly = quarterly.rename(columns={'turnover': id}).drop(columns=['volume', 'close'])
        quarterly_all = quarterly_all.merge(quarterly[id], how='outer', left_index=True, right_index=True)
        #print(index)
        
        # Copying defragments the DataFrame after many column merges, which improves performance. Without this it would take more than 4x longer.
        if index % 100 == 0:
            quarterly_all = quarterly_all.copy()

    for _, row in quarterly_all.copy().iterrows():
        IDs.append(row[row.notna()].nlargest(n).index.tolist()) # Calculate largest N stocks, append to 'IDs'
        
    quarterly_all['IDs'] = IDs
    quarterly_all = quarterly_all[['IDs']] # Get only the IDs
    quarterly_all.index = quarterly_all.index + timedelta(days=1) # Day after quarter end, to avoid look-ahead bias
    quarterly_all.index = list(map(first_trading_date_after_equal, quarterly_all.index.date)) # First trading day of the new quarter

    data = data.merge(quarterly_all, how='left', left_index=True, right_index=True) # Convert to daily
    data = data.ffill() # fillna(method='ffill') is deprecated in recent pandas versions
    data.index = pd.to_datetime(data.index)

    data.to_csv(DATA_PATH + f'processed/cache/top_{n}_liquid.csv')
    return

def get_top_n_liquid(day, n=3000):
    path = DATA_PATH + f'processed/cache/top_{n}_liquid.csv'
    if not os.path.isfile(path):
        calculate_top_n_liquid(n)
    data = pd.read_csv(path, index_col=0)
    data.index = pd.to_datetime(data.index).date
    if data.index[-1] < day:
        # Cache is stale: rebuild and reload it, because calculate_top_n_liquid returns None
        calculate_top_n_liquid(n)
        data = pd.read_csv(path, index_col=0)
        data.index = pd.to_datetime(data.index).date

    # The lists are stored as strings in the CSV, so parse them back into Python lists
    return ast.literal_eval(data.loc[day, 'IDs'])

In [10]:
data = get_top_n_liquid(date(2023, 8, 25), n=500)
print(data[:5])

['TSLA-2019-01-01', 'NVDA-2019-01-01', 'AAPL-2019-01-01', 'MSFT-2019-01-01', 'AMD-2019-01-01']
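
The selection logic can be checked on toy data. The following is a minimal sketch, not the cached pipeline itself: the tickers `AAA`, `BBB`, `CCC` and their turnover numbers are made up, and `pd.bdate_range` is only a stand-in for the real market calendar. It ranks each quarter by turnover, makes the list available from the day after quarter end, and forward-fills it to daily rows.

```python
import pandas as pd

# Hypothetical quarterly turnover per ticker ID; NaN means the stock was not listed yet.
turnover = pd.DataFrame(
    {
        "AAA": [5.0, 1.0],
        "BBB": [3.0, 4.0],
        "CCC": [float("nan"), 9.0],
    },
    index=pd.to_datetime(["2023-03-31", "2023-06-30"]),
)

def top_n_per_period(turnover: pd.DataFrame, n: int) -> pd.Series:
    """For every row, return the column labels of the n largest non-NaN values."""
    return pd.Series(
        [row.dropna().nlargest(n).index.tolist() for _, row in turnover.iterrows()],
        index=turnover.index,
    )

top2 = top_n_per_period(turnover, n=2)

# A quarter's ranking is only known after the quarter has ended, so shift it by one day
# before forward-filling onto the daily calendar (business days as a stand-in here).
available = top2.copy()
available.index = available.index + pd.Timedelta(days=1)
daily = pd.bdate_range("2023-03-31", "2023-07-05")
daily_lists = available.reindex(daily, method="ffill")

print(top2.tolist())
print(daily_lists.loc[pd.Timestamp("2023-06-30")])
print(daily_lists.loc[pd.Timestamp("2023-07-03")])
```

On the last day of the second quarter the list is still the one from the first quarter; the new ranking only takes effect on the first business day after quarter end.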


In [None]:
# Top losers %

In [None]:
# Pre-market liquid (premarket open to 9:25)
# Uses m5 data
# DataFrame with lists

In [None]:
# Intraday P30 in 15-minutes with constraints
# Uses m1 data
# DataFrame with lists

# 2. Testing a mean-reversion strategy on SPY.
I will try out a popular strategy using the internal bar strength (IBS) indicator. I want to see if I can match the results reported [here](https://www.quantifiedstrategies.com/internal-bar-strength-ibs-indicator-strategy/). I will also use pure pandas.

* Universe: SPY ETF
* Entry: IBS < 0.2
* Exit: IBS > 0.8
* Trading on close prices.
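
The rules above can be sketched in pure pandas. This assumes the usual IBS definition, (close - low) / (high - low), and a DataFrame with `high`, `low` and `close` columns; the bars below are made up for illustration and the function names are hypothetical. The position is set at the close of the signal bar, so it earns the next bar's close-to-close return.

```python
import pandas as pd

def ibs(bars: pd.DataFrame) -> pd.Series:
    """Internal bar strength: where the close sits in the bar's range (0 = low, 1 = high)."""
    bar_range = bars["high"] - bars["low"]
    # Bars with zero range have no defined IBS, so they become NaN
    return (bars["close"] - bars["low"]) / bar_range.where(bar_range > 0)

def ibs_positions(ibs_values: pd.Series, entry: float = 0.2, exit_: float = 0.8) -> pd.Series:
    """1.0 while long and 0.0 while flat, decided at the close of each bar."""
    signal = pd.Series(float("nan"), index=ibs_values.index)
    signal[ibs_values < entry] = 1.0   # enter long at the close
    signal[ibs_values > exit_] = 0.0   # exit at the close
    return signal.ffill().fillna(0.0)  # otherwise keep the previous position

# Made-up daily bars, not SPY data
bars = pd.DataFrame(
    {
        "high": [10.0, 10.0, 11.0, 12.0],
        "low": [9.0, 9.0, 10.0, 11.0],
        "close": [9.1, 9.5, 10.9, 11.5],
    },
    index=pd.date_range("2023-01-02", periods=4, freq="D"),
)

positions = ibs_positions(ibs(bars))
# Yesterday's position earns today's close-to-close return
strategy_returns = (positions.shift(1) * bars["close"].pct_change()).fillna(0.0)
equity = (1.0 + strategy_returns).cumprod()
print(positions.tolist())
print(equity.iloc[-1])
```

In this toy series the strategy buys at the first close (9.1) and sells at the third close (10.9), so the final equity equals 10.9 / 9.1. Costs and slippage are ignored.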

Although it is possible to get the exact closing price using market-on-close orders, you cannot know the value of the IBS at market close. So as an extension I will look at how the strategy would have performed if the IBS is calculated using daily bars excluding the last minute.
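
One way to build that signal is to aggregate minute bars per day while leaving out the final minute. The sketch below assumes minute bars with a `DatetimeIndex`, extended hours already removed, and `high`, `low` and `close` columns; the function name and the three bars are hypothetical.

```python
import pandas as pd

def ibs_before_close(minute_bars: pd.DataFrame) -> pd.Series:
    """Daily IBS computed from every minute bar of the day except the last one."""
    values = {}
    for day, day_bars in minute_bars.groupby(minute_bars.index.date):
        known = day_bars.iloc[:-1]  # the final minute is not known before the close
        if known.empty:
            values[day] = float("nan")
            continue
        low = known["low"].min()
        bar_range = known["high"].max() - low
        last_price = known["close"].iloc[-1]
        values[day] = (last_price - low) / bar_range if bar_range > 0 else float("nan")
    return pd.Series(values)

# Made-up bars: a weak day that rallies in the final minute
minute_bars = pd.DataFrame(
    {
        "high": [10.0, 9.6, 10.5],
        "low": [9.0, 9.2, 9.2],
        "close": [9.5, 9.25, 10.4],
    },
    index=pd.to_datetime(["2023-01-03 15:57", "2023-01-03 15:58", "2023-01-03 15:59"]),
)

early_ibs = ibs_before_close(minute_bars)
print(early_ibs.iloc[0])
```

Here the signal known before the close is 0.25, while the IBS of the full daily bar would be above 0.9, which shows how much the last minute can move the indicator.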

I also want to look at the impact of taxes (36%) and inflation.
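
A rough way to account for both is sketched below. How the tax is actually levied is an assumption here: the 36% is applied once per year to positive yearly returns, with no loss carry-forward, and the result is then deflated by that year's inflation. The return and inflation numbers are made up.

```python
import pandas as pd

TAX_RATE = 0.36  # the rate mentioned in the text

def after_tax_real_returns(gross: pd.Series, inflation: pd.Series,
                           tax_rate: float = TAX_RATE) -> pd.Series:
    """Yearly returns after tax on gains, expressed in inflation-adjusted terms."""
    # Assumption: only positive yearly returns are taxed, losses give no credit
    taxed = gross - tax_rate * gross.clip(lower=0.0)
    return (1.0 + taxed) / (1.0 + inflation) - 1.0

# Hypothetical yearly strategy returns and inflation rates
gross = pd.Series([0.10, -0.05], index=[2021, 2022])
inflation = pd.Series([0.02, 0.03], index=[2021, 2022])

real = after_tax_real_returns(gross, inflation)
print(real)
```

A 10% gross year keeps 6.4% after tax, which is then divided by 1.02 to express it in real terms; the losing year is only adjusted for inflation.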

Note: I would not use the SPY for trading indices, because futures are more liquid. Even index CFDs may have lower trading costs. Also, I can only access US-domiciled ETFs through a very small number of brokers due to stupid EU regulations.

My focus will be on:
* Single or low-asset strategies for e.g. SPY/VXX. The actual execution may not be in the ETF but in a more liquid futures contract.
* Screenable strategies, e.g. S3 or Gap-up short. These use the top % winners and losers. At most the backtester has to keep track of 5-10 stocks at the same time. These may use fundamentals which I will implement later.
* All systems have an expected holding time of at least 1 hour.
* As such, I do not need tick data for signal generation. However I want it to simulate realistic fills and to clean data.

I care less about:
* Mostly fundamental strategies
* Scalping: only feasible for US-based PFOF brokers. Scalping could use lower-timeframe bars (15s) for signals, but should always use ticks to simulate fills. My backtester should always be able to handle fixed-time-interval bars.