# Scanner Development Notebook

This notebook documents the development of a scanner for the 5-minute pullback strategy.

The idea behind the scanner is the following:<br>
1. At the end of each 5-minute period (the exact time depends on execution speed), the scanner collects data for all stocks that fall under the rules of the strategy (based on market cap, float, price, etc.).<br>
2. The code then filters the downloaded stocks down to the ones that are "tradable", based on volume. The threshold is still to be specified.<br>
3. It then outputs the list of stocks with potential pullbacks, based on defined filters. The filters should be neither too strict nor too loose, so that the trader is not overwhelmed by the number of candidates while the scanner still catches as many valid moves as possible.<br>
4. Further logic to automatically draw charts, save approved pullbacks, etc. is to be defined.<br>
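The timing in step 1 can be sketched as follows. This is only an illustration: the function name `next_scan_time` and the 60-second lead before the candle close are assumptions, not values fixed by the strategy.

```python
from datetime import datetime, timedelta

CANDLE_MINUTES = 5      # candle length used by the strategy
SCAN_LEAD_SECONDS = 60  # assumption: start scanning this long before the candle closes


def next_scan_time(now: datetime) -> datetime:
    """Return the next moment (strictly after `now`) at which a scan should start."""
    # Start of the 5-minute candle that contains `now`
    candle_start = now.replace(
        minute=now.minute - now.minute % CANDLE_MINUTES, second=0, microsecond=0
    )
    candle_close = candle_start + timedelta(minutes=CANDLE_MINUTES)
    scan_time = candle_close - timedelta(seconds=SCAN_LEAD_SECONDS)
    # If the scan point of the current candle has already passed, use the next candle
    if scan_time <= now:
        scan_time += timedelta(minutes=CANDLE_MINUTES)
    return scan_time


print(next_scan_time(datetime(2025, 7, 21, 9, 42, 30)))
```

A scheduler loop could sleep until the returned time, run the scan, and then ask for the next scan time again. The lead time would need tuning against the measured download speed.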

## Yfinance limitations and options to overcome them

There are 3121 valid stocks on the market. Because the yfinance library is essentially a scraper, this number is too high for the scanner to process on every 5-minute candle.
<br>
<br>
Therefore, the set of stocks processed by the scanner needs to be narrowed down further.
<br>
<br>
Possibilities are:

1. Price (a lower upper bound). This would bring stock selection closer to the 1-minute momentum strategy that this strategy was derived from.
2. Average volume (some stocks that are actively traded today but have low average volume may be lost, but it would cut the number of stocks considerably).
<br>

The best way to analyse this is to follow the strategy for some time and record the parameters of the stocks that turn out to be suitable. This will give a picture of what the filters can be.
<br>
For testing purposes it is sufficient to restrict the scan to the stocks that are potentially the best ones, in order to test the flow of the strategy during the trading session. When more data is collected, an optimized version can be used.

## Imports

In [1]:
import yfinance as yf
import pandas as pd
import time
import pytz
import gspread
from gspread_dataframe import get_as_dataframe
from oauth2client.service_account import ServiceAccountCredentials

%cd ..

/Users/ivanosipchyk/dev/investing/5-min-pullback


A few TODOs, because it is getting messy.<br>
1. Create a Google Sheet with all the trades that I make, to import them and use for analysis - DONE
2. Define criteria to narrow down the number of stocks for the scanner to analyze - IN PROGRESS. Still a long way from DONE, but progress is good
3. Check all the trades that I made and find other pullbacks that I missed, to define the shape of the candle and other filters for the scanner

## Read Google Sheet

In [8]:
scopes = [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive"
]

creds = ServiceAccountCredentials.from_json_keyfile_name("credentials/google_sheet_credentials.json", scopes)

client = gspread.authorize(creds)
sheet = client.open("5 Minute Pullback Ledger").worksheet("DAS Report Formatted")
ledger = get_as_dataframe(sheet, evaluate_formulas=True)


In [9]:
ledger.head()

Unnamed: 0,Symbol,Entry,Exit,Qty,P/L per Share,Gross P/L,Comm,Ecn Fee,Other Fees,Net P/L,...,Date,Setup,News,Company Type,L2,Daily Volume,Entry Candle Volume,Outcome,Strategy Followed,Comments
0,IRBT,4.945,5.01,10.0,0.065,0.65,0.1,0.06,0.05,0.44,...,7/21/2025,,,,,,,,,
1,NVTS,8.77,9.03,10.0,0.26,2.6,0.1,0.03,0.06,2.41,...,7/21/2025,,,,,,,,,
2,OPEN,3.76,3.89,10.0,0.13,1.3,0.15,0.015,0.08,1.055,...,7/21/2025,,,,,,,,,
3,FFAI,2.435,2.65,10.0,0.215,2.15,0.1,0.03,0.05,1.97,...,7/22/2025,,,,,,,,,
4,DNUT,5.84,5.62,10.0,-0.22,-2.2,0.1,0.06,0.05,-2.41,...,7/23/2025,,,,,,,,,


In [10]:
traded_stocks = ledger['Symbol'].unique().tolist()
not_yet_in_ledger = ['RILY', 'BE', 'LIDR', 'FFAI', 'SRFM']
traded_stocks = traded_stocks + not_yet_in_ledger

## Retrieve Classification Results

In [11]:
classification_result = pd.read_csv('scanner/all_symbols_results.csv')

## Analyze Data

In [20]:
# TODO: renew all symbol results each day and map to the trades

In [14]:
# select classification for traded stocks
traded_classified = classification_result[classification_result['Ticker'].isin(traded_stocks)]

In [24]:
# explore market cap, price and volume
summary = traded_classified[['currentPrice', 'averageDailyVolume10Day', 'marketCap']].describe().loc[['mean', 'min', 'max']]

# Format the table nicely
summary_formatted = summary.style.format({
    'currentPrice': '${:,.2f}',
    'averageDailyVolume10Day': '{:,.0f}',
    'marketCap': '{:,.0f}'
})
summary_formatted

Unnamed: 0,currentPrice,averageDailyVolume10Day,marketCap
mean,$5.54,47415449,976638287
min,$1.13,780410,22157604
max,$26.72,504748910,6205158400


## Filter Data for Scanner

In [33]:
valid_stocks = classification_result[(classification_result['marketCapLabel'] == 'small') & (classification_result['priceRangeLabel'] == 'in')]
print(f'Number of stocks after first filtering: {len(valid_stocks)}')

Number of stocks after first filtering: 3121


In [34]:
# try price filter
max_price_offset = 1.2
price_filtered = valid_stocks[valid_stocks['currentPrice'] < summary['currentPrice'].loc['max'] * max_price_offset]
print(f'Number of stocks after filtering by price: {len(price_filtered)}')

Number of stocks after filtering by price: 2837


In [35]:
# try average volume filter
min_avg_volume_offset = 0.8
avg_volume_filtered = valid_stocks[valid_stocks['averageDailyVolume10Day'] > summary['averageDailyVolume10Day'].loc['min'] * min_avg_volume_offset]
print(f'Number of stocks after filtering by average volume: {len(avg_volume_filtered)}')

Number of stocks after filtering by average volume: 829


In [36]:
# try price and average volume filters
price_avg_volume_filtered = valid_stocks[
    (valid_stocks['Ticker'].isin(price_filtered['Ticker'].unique())) &
    (valid_stocks['Ticker'].isin(avg_volume_filtered['Ticker'].unique()))
]
print(f'Number of stocks after filtering by price and average volume: {len(price_avg_volume_filtered)}')

Number of stocks after filtering by price and average volume: 807
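The intersection above can also be written as a single boolean mask, which should select the same rows as long as tickers are unique. The sketch below uses a toy frame with made-up tickers and values. Only the column names and the two thresholds (max traded price × 1.2, min traded average volume × 0.8) come from the cells above.

```python
import pandas as pd

# Toy stand-in for `valid_stocks`; tickers and values are made up for illustration
toy_stocks = pd.DataFrame({
    'Ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'currentPrice': [4.10, 55.00, 12.30, 2.75],
    'averageDailyVolume10Day': [2_500_000, 9_000_000, 150_000, 800_000],
})

# Thresholds derived the same way as in the cells above
max_price = 26.72 * 1.2          # max traded price with a 20% margin
min_avg_volume = 780_410 * 0.8   # min traded average volume with a 20% margin

# Both conditions combined in one mask instead of two `isin` lookups
mask = (
    (toy_stocks['currentPrice'] < max_price)
    & (toy_stocks['averageDailyVolume10Day'] > min_avg_volume)
)
combined = toy_stocks[mask]
print(combined['Ticker'].tolist())
```

Here `BBB` is dropped by the price condition and `CCC` by the volume condition.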


In [32]:
test_download = price_avg_volume_filtered['Ticker'].dropna().tolist()
start_time = time.time()

df = yf.download(test_download, period='1d', interval='5m', group_by="ticker", progress=False, threads=True, ignore_tz=True)

end_time = time.time()

print(f'Elapsed time: {end_time-start_time:.2f} seconds')
print(f'Elapsed time per symbol: {(end_time-start_time)/len(test_download):.2f} seconds')

13 Failed downloads:
['SATX', 'LTRY', 'CLBR', 'HYAC', 'RDUS', 'EYEN', 'EVRI', 'SRM', 'INZY', 'DADA', 'RDFN', 'RGLS', 'KIND']: YFPricesMissingError('possibly delisted; no price data found  (period=1d)')


Elapsed time: 24.90 seconds
Elapsed time per symbol: 0.03 seconds


With about 800 stocks, the download takes about 25 seconds, which is fast enough to run in the last minute of a 5-minute candle and identify potential setups.<br>
The time may increase once filtering is added, but that shouldn't be a problem.<br>
It is also possible to run a second scanner, once every 15 minutes, on a larger subset of stocks to identify those with good volume today. This would reduce the work for the main scanner, although more data needs to be collected to decide what the volume threshold should be.
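As a rough check of this time budget, the measured figures can be extrapolated. The calculation assumes that the download list held 807 tickers, that download time scales linearly with the number of symbols (which may not hold if the data source throttles requests), and that the whole scan has a 60-second window.

```python
# Figures reported by the timing cell above
elapsed_seconds = 24.90
symbols_downloaded = 807   # assumption: size of the combined price/volume filter result

seconds_per_symbol = elapsed_seconds / symbols_downloaded

# Assumed budget: the scan has to finish within the last minute of the candle
budget_seconds = 60
max_symbols = int(budget_seconds / seconds_per_symbol)

# Extrapolated time for the full universe of 3121 valid stocks
full_universe_seconds = 3121 * seconds_per_symbol

print(f'{seconds_per_symbol:.3f} s per symbol')
print(f'about {max_symbols} symbols fit into {budget_seconds} s')
print(f'full universe would take about {full_universe_seconds:.0f} s')
```

Under these assumptions the filtered list fits comfortably into the window, while the unfiltered universe would not.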

In [38]:
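def download_data(symbols, period='5d', interval='5m', batch_size=500, delay=1):
    """Download price data in batches and return it in long format.

    Returns a DataFrame with columns Datetime, Symbol, PriceType, Price.
    """
    columns = ['Datetime', 'Symbol', 'PriceType', 'Price']
    total_batches = (len(symbols) - 1) // batch_size + 1
    frames = []

    for i in range(0, len(symbols), batch_size):
        batch_number = i // batch_size + 1
        batch_symbols = symbols[i:i + batch_size]
        print(f'Processing batch {batch_number}/{total_batches}')

        try:
            batch_df = yf.download(batch_symbols, period=period, interval=interval, group_by="ticker", progress=False, threads=True, ignore_tz=True)

            if batch_df.empty:
                continue

            # Wide (ticker, price type) columns -> one row per timestamp, symbol and price type
            batch_df_stacked = batch_df.stack(level=0).stack(level=0).reset_index()
            batch_df_stacked.columns = columns
            frames.append(batch_df_stacked)

        except Exception as e:
            print(f"Error in batch {batch_number}: {e}")
            continue

        # Pause between batches to go easy on the data source
        time.sleep(delay)

    # Concatenate once instead of growing a DataFrame inside the loop
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)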
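The batching arithmetic used in the progress message can be checked in isolation. The helper names below are hypothetical and the tickers are placeholders. With the default `batch_size=500`, a list of 807 symbols is split into two batches.

```python
def batch_count(n_symbols: int, batch_size: int) -> int:
    """Ceiling division, written the same way as in the progress message."""
    return (n_symbols - 1) // batch_size + 1


def split_into_batches(symbols, batch_size):
    """Slice the symbol list into consecutive chunks of at most `batch_size`."""
    return [symbols[i:i + batch_size] for i in range(0, len(symbols), batch_size)]


symbols = [f'SYM{i}' for i in range(807)]  # placeholder tickers
batches = split_into_batches(symbols, 500)
print(batch_count(len(symbols), 500), [len(b) for b in batches])
```

Since each batch is followed by a `delay`-second pause, the number of batches also gives the sleep overhead added to the raw download time.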
