IMPORTANT UPDATE on Apr 2, 2017: I've decided to stop maintaining this repo. The HTTP API (http://data.pystock.com) and the GitHub repo will be still accessible, only that latest data will stay at Mar 31, 2017 forever. However, I do offer paid service for maintaining this dataset. Please contact me at email@example.com.
US Stock Data
Historical data of US stocks, updated daily.
How Data are Collected
The data are collected using pystock-crawler and pystock-github. Every day before the US stock exchanges open at 9:30 EST/EDT, the crawler collects the stock prices and financial reports, and pushes the data to this repository. Thus, the latest data is of yesterday. The data are day-based, meaning that you won't find hourly or minute-level data here.
pystock-crawler crawls data from the following sources:
- NASDAQ.com for company ticker symbols
- Yahoo Finance for stock prices
- SEC EDGAR for financial reports
All data are stored in CSV and TXT files, archived with gzip. The files are
categorized and named by their created dates. For example, a file collected on
2015-03-23 is named
20150323.tar.gz placed under
Initial data are the first batch of collected data, whose date range spans from
2009-01-01 to 2015-03-20. They are split into three files:
Every gzip archive file may or may not contain the following files:
symbols.txt is a list of companies, line by line. For example:
AAPL Apple Inc. FB Facebook GOOGL Google Inc.
prices.csv contains daily prices in CSV format. Normally, in a
daily-generated file, two trading days of prices are included. This is because
when a company splits its shares, you will need to compare the close price
close) and the adjusted close price (
adj_close) of the previous trading
day to detect the split. See Yahoo's explaination
for more details.
Here's a sample of
symbol,date,open,high,low,close,volume,adj_close AAPL,2015-03-23,127.12,127.85,126.52,127.21,36761000,127.21 AAPL,2015-03-20,128.25,128.40,125.16,125.90,67941100,125.90
reports.csv contains several financial metrics extracted from 10-Q or 10-K
reports. 10-Q is a quarterly report. 10-K is an annual report. The CSV file
symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0 GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0 GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
The financial metrics included are:
end_date: the end date of the period of the filing report
amend: is the filing an amendment?
period_focus: Q1, Q2, Q3 for quarterly reports, or FY for annual reports
fiscal_year: fiscal year of the company
doc_type: 10-Q or 10-K
revenues: revenues or sales
op_income: operating income
net_income: net income or net earnings
eps_basic: basic earnings per share
eps_diluted: diluted earnings per share
assets: total assets
cur_assets: current assets
cur_liab: current liabilities
cash: cash and cash equivalents
equity: total equity
cash_flow_op: cash from operating activities
cash_flow_inv: cash from investing activities
cash_flow_fin: cash from financing activities
This "website" is built with Github Pages and is a mirror of the git repo. If you're building an application on pystock-data, http://data.pystock.com is a better choice over the git repo, since not only is HTTP API easier to use, but you will also benefit from the fast CDN provided by Github Pages.
To navigate the filesystem on the HTTP API, you just visit index file:
where you can find the index file for each year. For instance:
From there, you will find all the .tar.gz files for that year. For instance: