MagicFormula

Any info is up-to-date as of 2023-09-23

Version 1.1

Changes:

Reduced stock validation time by >75% by getting more info from nasdaq_stocks.csv (see below for URL) instead of web scraping
Other fixes

WARNING: The stock list comes from https://www.nasdaq.com/market-activity/stocks/screener: click "Download CSV" and save the file as nasdaq_stocks.csv. This file is NOT automatically updated by this script, so the following info goes stale until you redownload it: market cap, volume, country, sector, and industry. This won't matter too much in the short term (since these values don't change THAT much), but you might want to redownload the file every once in a while.

A program to implement the Magic Formula from the book "The Little Book that Still Beats the Market" by Joel Greenblatt. The book is pretty short, and it's actually a really fun read, so go check it out if you want the exact details about his "Magic Formula". I know the list of stocks is already available on his website https://magicformulainvesting.com, but the top stocks there are listed in alphabetical order. I wanted to know the exact rankings, so here we are.

Stock data is retrieved using the YahooFinancials library (unofficial, so it uses web scraping).

The stock list comes from https://www.nasdaq.com/market-activity/stocks/screener: click "Download CSV" and save the file as nasdaq_stocks.csv. This file is NOT automatically updated by this script. The market cap, volume, country, sector, and industry information also comes from the CSV file. I could get most of this information automatically by scraping Yahoo Finance, but I decided the time saved by manually downloading the file is worth it, as scraping the information for every stock takes a very, very long time. The old way I found the stock tickers was going to http://ftp.nasdaqtrader.com/, opening Symbol Directory, and downloading nasdaqlisted.txt and otherlisted.txt. That method did not provide the other information listed above, so it required web scraping from Yahoo Finance.
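As a rough illustration of reading the downloaded file, here is a minimal sketch using Python's csv module. The column names (Symbol, Market Cap, Country, Volume, Sector, Industry) are an assumption about the screener export and may differ from the real header, and load_screener_rows is a hypothetical helper, not a function from this repository. An in-memory sample stands in for nasdaq_stocks.csv so the snippet runs on its own.

```python
import csv
import io

# Tiny stand-in for nasdaq_stocks.csv. The header names are an ASSUMPTION
# about the screener export; adjust them to match the real file.
SAMPLE_CSV = """Symbol,Name,Market Cap,Country,Volume,Sector,Industry
AAA,Alpha Corp,1500000000.00,United States,250000,Technology,Software
BBB,Beta Inc,,Canada,1000,Finance,Banks
"""


def load_screener_rows(handle):
    """Return {ticker: info} parsed from a screener-style CSV file object."""
    stocks = {}
    for row in csv.DictReader(handle):
        ticker = row["Symbol"].strip()
        market_cap = row["Market Cap"].strip()
        volume = row["Volume"].strip()
        stocks[ticker] = {
            # Empty cells become None instead of raising on float("").
            "market_cap": float(market_cap) if market_cap else None,
            "volume": int(volume) if volume else None,
            "country": row["Country"],
            "sector": row["Sector"],
            "industry": row["Industry"],
        }
    return stocks


stocks = load_screener_rows(io.StringIO(SAMPLE_CSV))
print(stocks["AAA"]["market_cap"], stocks["BBB"]["market_cap"])
```

To read the real file, you would pass open("nasdaq_stocks.csv", newline="") instead of the StringIO object.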

It then calculates Return on Capital and Earnings Yield, as specified by Joel Greenblatt. Some of the exact metrics he used were not available on Yahoo Finance, so I had to do some Googling to figure out what the equivalent metrics were (for example, I found that Short Term Debt is also known as Current Liabilities). He also does not reveal some of his exact calculation methods, such as for Net Working Capital, so I did my best based on this site. If things are wrong, let me know. Stocks below a certain market cap are then filtered out (as he suggests to do), and ranked based on Return on Capital and Earnings Yield.
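The two metrics and the combined ranking can be sketched as follows. This follows the definitions in Greenblatt's book (Return on Capital = EBIT / (Net Working Capital + Net Fixed Assets), Earnings Yield = EBIT / Enterprise Value, then sum the two ranks); the function names and sample numbers are hypothetical, and the exact inputs this script pulls from Yahoo Finance may differ.

```python
def return_on_capital(ebit, net_working_capital, net_fixed_assets):
    """EBIT divided by tangible capital employed."""
    return ebit / (net_working_capital + net_fixed_assets)


def earnings_yield(ebit, enterprise_value):
    """EBIT divided by enterprise value."""
    return ebit / enterprise_value


def magic_formula_ranks(metrics):
    """metrics: {ticker: (roc, ey)} -> [(ticker, combined_rank)], best first.

    Rank 1 is the highest value for each metric; the combined rank is the
    sum of both ranks, so a lower combined rank is better.
    """
    by_roc = sorted(metrics, key=lambda t: metrics[t][0], reverse=True)
    by_ey = sorted(metrics, key=lambda t: metrics[t][1], reverse=True)
    roc_rank = {ticker: i + 1 for i, ticker in enumerate(by_roc)}
    ey_rank = {ticker: i + 1 for i, ticker in enumerate(by_ey)}
    combined = {t: roc_rank[t] + ey_rank[t] for t in metrics}
    return sorted(combined.items(), key=lambda item: item[1])


# Made-up companies, values in millions.
companies = {
    "AAA": (return_on_capital(50, 100, 100), earnings_yield(50, 400)),
    "BBB": (return_on_capital(30, 50, 50), earnings_yield(30, 200)),
    "CCC": (return_on_capital(10, 100, 100), earnings_yield(10, 100)),
}
ranking = magic_formula_ranks(companies)
print(ranking)
```

The sketch does not guard against zero or negative denominators; a real run would need to skip or flag such companies.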

Balance sheet metrics use the most recent quarter's values. Income statement metrics use the TTM (trailing twelve months), computed from quarterly data. Stocks are also filtered by Market Cap and Average Dollar Volume (although the volume calculation is rough, since it only uses the current share price).
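A minimal sketch of these two calculations, assuming the quarterly values are ordered oldest to newest and that TTM means the sum of the last four quarters. The helper names are hypothetical, and the dollar volume here is the same rough estimate described above (current price times share volume).

```python
def trailing_twelve_months(quarterly_values):
    """Sum the four most recent quarters (list ordered oldest to newest)."""
    if len(quarterly_values) < 4:
        raise ValueError("need at least four quarters for a TTM figure")
    return sum(quarterly_values[-4:])


def rough_dollar_volume(current_price, share_volume):
    """Approximate dollar volume using only the current share price."""
    return current_price * share_volume


quarterly_ebit = [8.0, 10.0, 12.0, 9.0, 11.0]  # made-up figures
ttm_ebit = trailing_twelve_months(quarterly_ebit)
dollar_volume = rough_dollar_volume(20.0, 50_000)
print(ttm_ebit, dollar_volume)
```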

Getting started

Here is how to download the source code, install all dependencies, and run the program:

git clone https://github.com/nblade66/MagicFormula.git
cd MagicFormula
pip install -r requirements.txt
python magic_formula.py

Configuration

There are five flags to be aware of:

-r Retrieves the data (balance_sheet, income_statement, market_cap) of all the stocks listed in the "ticker_list" list. Uses 10 threads. This will take a really long time, since it uses web scraping. As such, retrieval is done in batches, and each batch is saved to the JSON file. If the flag isn't used, data will load from the JSON file
-t Retrieves the ticker_list using the file nasdaq_stocks.csv; if flag isn't used, ticker_list and sector_dict will load from the JSON file
-c Continues retrieving the data of the stocks that aren't in the JSON files, but are in the ticker_list. Generally used if for some reason retrieval was interrupted.
-mc Allows for multiprocessing (or multi-core) to fetch data from Yahoo Finance web scraping faster. Only applies to continued retrieval. Takes an integer specifying how many processes to run
--validate Validates the tickers based on Market Cap and Average Dollar Volume (and also updates Market Cap Data on valid tickers)
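For orientation, here is one way the flags above could be declared with argparse. This is a hypothetical layout, not the parser from magic_formula.py: the dest names, default values, and help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(description="Magic Formula ranking")
    parser.add_argument("-r", dest="refresh", action="store_true",
                        help="retrieve financial data instead of loading JSON")
    parser.add_argument("-t", dest="tickers", action="store_true",
                        help="rebuild ticker_list from nasdaq_stocks.csv")
    parser.add_argument("-c", dest="continue_refresh", action="store_true",
                        help="retrieve only tickers missing from the JSON files")
    parser.add_argument("-mc", dest="processes", type=int, default=1,
                        help="number of processes for continued retrieval")
    parser.add_argument("--validate", action="store_true",
                        help="recheck market cap and average dollar volume")
    return parser


# Equivalent to running: python magic_formula.py -c -mc 4
args = build_parser().parse_args(["-c", "-mc", "4"])
print(args.continue_refresh, args.processes, args.refresh)
```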

How to Use

Anytime you run python magic_formula.py, a CSV file with magic formula ranks will be generated, regardless of flags. Of course, if you run it without retrieving new data, the ranks generated will be the same as before.

The first thing to do is go to https://www.nasdaq.com/market-activity/stocks/screener, click "Download CSV", and save the file as nasdaq_stocks.csv. This file contains the list of tickers, as well as the following info, which is not updated automatically: market cap, volume, country, sector, and industry.

On the first run, you will usually want python magic_formula.py -t -r to refresh the ticker list, then retrieve all the data. This will also remove tickers with missing data from the ticker list, and validate tickers (set tickers that don't meet the market cap and average dollar volume criteria to "invalid"). You should rarely use the -t flag after this because it makes the run take unnecessarily long.

On subsequent runs, python magic_formula.py -r will retrieve updated values for only valid tickers. This is to speed up retrieval. Because average dollar volumes and market caps can change, use python magic_formula.py --validate -r to also recheck whether tickers are valid (and thus potentially retrieve data for newly valid tickers).
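The validation step can be pictured roughly like this. The cutoff values, the dict layout, and the function name are all hypothetical; the real script's thresholds are not stated here. Treating missing data as "invalid" is a simplification for this sketch (the script itself removes such tickers from the list).

```python
# HYPOTHETICAL cutoffs, chosen only for illustration.
MIN_MARKET_CAP = 50_000_000
MIN_DOLLAR_VOLUME = 1_000_000


def validate_tickers(stocks):
    """Return {ticker: "valid" or "invalid"} from market cap and dollar volume.

    stocks: {ticker: {"market_cap": float or None, "dollar_volume": float or None}}
    """
    status = {}
    for ticker, info in stocks.items():
        market_cap = info.get("market_cap")
        dollar_volume = info.get("dollar_volume")
        if market_cap is None or dollar_volume is None:
            # Simplification: missing data is treated as invalid here.
            status[ticker] = "invalid"
            continue
        passes = (market_cap >= MIN_MARKET_CAP
                  and dollar_volume >= MIN_DOLLAR_VOLUME)
        status[ticker] = "valid" if passes else "invalid"
    return status


sample = {
    "AAA": {"market_cap": 2e9, "dollar_volume": 5e6},
    "BBB": {"market_cap": 1e7, "dollar_volume": 5e6},    # market cap too small
    "CCC": {"market_cap": 2e9, "dollar_volume": None},   # missing data
}
status = validate_tickers(sample)
valid_tickers = [t for t, s in status.items() if s == "valid"]
print(status)
```

Only the tickers in valid_tickers would then be passed to the slower data retrieval step.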

Things to be aware of:

The ticker_list used to debug can just be any Python list; when actually running the code, make sure to use the -t flag to get a refreshed list of tickers.
Running on ~10,000 tickers could take 6+ hours the first time, as it filters through stocks
Any information parsed from nasdaq_stocks.csv is NOT updated automatically, so you'll need to go to the website and update the file yourself.
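Because a full run can take hours, saving progress in batches and resuming after an interruption matters. The sketch below shows the general pattern with a thread pool and a JSON file; it is not the repository's implementation. fetch_financials is a hypothetical stand-in for the slow Yahoo Finance request, and the batch size and file name are arbitrary.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fetch_financials(ticker):
    """HYPOTHETICAL stand-in for the slow per-ticker Yahoo Finance request."""
    return {"ticker": ticker, "market_cap": 1_000_000 * len(ticker)}


def retrieve_in_batches(tickers, json_path, batch_size=3, workers=10):
    """Fetch tickers not yet in the JSON file, saving after every batch."""
    data = {}
    if os.path.exists(json_path):
        # Load whatever an earlier (possibly interrupted) run already saved.
        with open(json_path) as handle:
            data = json.load(handle)
    pending = [t for t in tickers if t not in data]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            # pool.map returns results in the same order as the inputs.
            for ticker, result in zip(batch, pool.map(fetch_financials, batch)):
                data[ticker] = result
            # Persist after each batch so an interruption loses little work.
            with open(json_path, "w") as handle:
                json.dump(data, handle)
    return data


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "stock_data.json")
    first = retrieve_in_batches(["A", "BB", "CCC", "DDDD"], path)
    # A second call only fetches the ticker that is not in the file yet.
    resumed = retrieve_in_batches(["A", "BB", "CCC", "DDDD", "EEEEE"], path)
print(sorted(resumed))
```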

Some things I'm working on:

Increasing speed of data fetching
Updating Market Caps using Threading
Re-testing a good batch size after Yahoo API changes
An option to re-retrieve tickers that have missing data to double check if data is actually missing, or if there was an error during retrieval
Dealing with WARNING:root:yahoofinancials ticker: <TICKER> error getting income - <class 'yahoofinancials.etl.ManagedException'>; I think this has to do with an empty financial statement; I should check this
Currently, balance sheet values are taken from the last index, which is assumed to be the most current quarter. Ideally, it instead looks at all the dates on the balance sheet and takes the most recent one.
Changing the "Continued Refresh" option to use threading instead of multiprocessing. There just isn't any need for multiprocessing
Updating the nasdaq_stocks.csv file automatically from https://www.nasdaq.com/market-activity/stocks/screener.
Adjust income and balance sheet data for the currency (e.g. TSM is being reported in TWD, when market cap and price info is all in USD). THIS CHANGE IS NO LONGER NECESSARY; the bug was not in the currency, but rather in Yahoo Finance's TSM data. I think I can use yh.get_stock_price_data() to get the following fields:
- regularMarketPrice
- regularMarketVolume
- marketCap
- currency <- this is the one to check if it returns the name of the currency
I need to check what the above fields return. It is possible that I could speed up the data retrieval process if I just use one yh.get_stock_price_data() call, then use the fields to get whatever data I need from the price_data dict that the function returns.
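For the balance sheet item in the list above (taking the most recent date instead of the last index), a possible approach is sketched below. It assumes each quarterly statement arrives as a single-key dict of the form {"YYYY-MM-DD": {...}}, which is my understanding of the YahooFinancials output but should be verified; latest_statement is a hypothetical helper and the field name totalAssets is only an example.

```python
from datetime import date


def latest_statement(statements):
    """Return (date, values) for the newest entry.

    statements: list of {"YYYY-MM-DD": {...}} dicts, in any order
    (ASSUMED shape of the quarterly statement data).
    """
    dated = [
        (date.fromisoformat(day), values)
        for entry in statements
        for day, values in entry.items()
    ]
    if not dated:
        raise ValueError("no statements available")
    # Compare on the parsed date only, never on the values dict.
    return max(dated, key=lambda pair: pair[0])


quarters = [
    {"2023-03-31": {"totalAssets": 90}},
    {"2023-09-30": {"totalAssets": 110}},
    {"2023-06-30": {"totalAssets": 100}},
]
latest_day, latest_values = latest_statement(quarters)
print(latest_day.isoformat(), latest_values)
```

Here the newest quarter sits in the middle of the list, so taking the last index would have picked the wrong one.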

