<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Getting-Started" data-toc-modified-id="Getting-Started-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Getting Started</a></span><ul class="toc-item"><li><span><a href="#Configuration" data-toc-modified-id="Configuration-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Configuration</a></span></li><li><span><a href="#Download-the-Master-Indexes" data-toc-modified-id="Download-the-Master-Indexes-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Download the Master Indexes</a></span></li><li><span><a href="#Check-Download-Plan" data-toc-modified-id="Check-Download-Plan-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Check Download Plan</a></span></li></ul></li><li><span><a href="#Multiple-Filings" data-toc-modified-id="Multiple-Filings-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Multiple Filings</a></span></li><li><span><a href="#Downloading" data-toc-modified-id="Downloading-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Downloading</a></span></li><li><span><a href="#SEC-Server-Timing" data-toc-modified-id="SEC-Server-Timing-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>SEC Server Timing</a></span></li></ul></div>

In [1]:
from EDGARConnect import EDGARConnect

# Getting Started

Instantiate an EDGARConnect object and pass it the directory where you want all the output written.

In [2]:
edgar = EDGARConnect(edgar_path = 'F:/Python Projects/Perplexity and Readability/')

Print the object to check its configuration status.

In [3]:
print(edgar)

SEC Edgar Scraper for Python, v0.0
Files to be scraped have NOT been defined.
Choose scraping targets using the configure_downloader() method


## Configuration

Call the configure_downloader() method to specify which forms and which date range you are interested in. Passing end_date = None extends the range up to the present day.

In [4]:
edgar.configure_downloader(target_forms='10-K', start_date='2020-01-01', end_date=None)

In [5]:
print(edgar)

SEC Edgar Scraper for Python, v0.0
EDGARConnect is configured for scraping.
	 Target Forms: 10-K
	 Date Range: 2020Q1 to 2021Q3
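
The configured date range is reported in calendar quarters rather than as the dates that were passed in. How EDGARConnect does this conversion internally is not shown here. As a minimal sketch, assuming ordinary calendar quarters, a date maps to a label like the ones above as follows (quarter_label is a hypothetical helper, not part of EDGARConnect):

```python
from datetime import date


def quarter_label(d: date) -> str:
    """Return a label such as '2020Q1' for the calendar quarter containing d.

    Hypothetical helper for illustration; not an EDGARConnect function.
    """
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
    return f"{d.year}Q{quarter}"


print(quarter_label(date(2020, 1, 1)))   # 2020Q1
print(quarter_label(date(2021, 9, 15)))  # 2021Q3
```

With end_date=None, the upper bound presumably comes from the date on which the notebook was run.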



## Download the Master Indexes

EDGARConnect first downloads all the SEC master indexes to your local drive. To do this, use the download_master_indexes() method. These files are quarterly, pipe-delimited tables of URLs to corporate filings. By default, EDGARConnect will update the 2 most recent quarters every time you run download_master_indexes(), but you can change this behavior with the update_range and update_all parameters.

In [6]:
edgar.download_master_indexes(update_range = 0, update_all = False)
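
To give a feel for what a pipe-delimited master index looks like once it is on disk, here is a minimal parsing sketch using only the standard library. It assumes the usual layout of a column header row (CIK|Company Name|Form Type|Date Filed|Filename), a dashed separator line, and then one row per filing. The sample rows are invented for illustration, and parse_master_index is a hypothetical helper, not an EDGARConnect method.

```python
import csv

# Invented sample in the assumed master index layout; not real filings.
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000001|EXAMPLE CORP|10-K|2020-02-14|edgar/data/1000001/0000000000-20-000001.txt
1000002|SAMPLE HOLDINGS INC|10-Q|2020-02-18|edgar/data/1000002/0000000000-20-000002.txt
"""


def parse_master_index(text):
    """Parse pipe-delimited index text into a list of dicts keyed by column name."""
    lines = text.splitlines()
    # Skip any preamble by locating the column header row.
    start = next(i for i, line in enumerate(lines) if line.startswith("CIK|"))
    header = lines[start].split("|")
    rows = []
    reader = csv.reader(lines[start + 1:], delimiter="|", quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) != len(header):
            continue  # drops the dashed separator and any blank lines
        rows.append(dict(zip(header, record)))
    return rows


rows = parse_master_index(SAMPLE_INDEX)
ten_k_paths = [row["Filename"] for row in rows if row["Form Type"] == "10-K"]
print(ten_k_paths)  # ['edgar/data/1000001/0000000000-20-000001.txt']
```

Filtering on the Form Type column, as in the last lines, is the kind of selection a target_forms setting implies, though EDGARConnect's own implementation may differ.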



## Check Download Plan

After the master indexes are downloaded, EDGARConnect can download everything you request from the SEC archive. You can preview the download plan using the show_download_plan() method. This is worth doing because the number of filings can be surprisingly large, and it's nice to know what you're signing up for before you start.

In [7]:
edgar.show_download_plan()

EDGARConnect is prepared to download 1 types of filings between 2020Q1 and 2021Q3
	Number of 10-Ks: 13039
	Total files: 13039
Estimated download time, assuming 1s per file: 0 Days, 3 hours, 37 minutes, 19 seconds
Estimated drive space, assuming 150KB per filing: 1.96GB
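
The estimates in the plan are simple arithmetic on the file count. The sketch below reproduces them under the assumptions stated in the output (1 second and 150KB per file), plus one assumption of my own: that a gigabyte is counted as 1,000,000 KB, which is what matches the printed figures. estimate_plan is a hypothetical helper, not an EDGARConnect method.

```python
def estimate_plan(n_files, seconds_per_file=1, kb_per_file=150):
    """Return (duration string, size string) for a download of n_files filings."""
    total_seconds = n_files * seconds_per_file
    days, remainder = divmod(total_seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Assumption: decimal units, 1 GB = 1,000,000 KB.
    gigabytes = n_files * kb_per_file / 1_000_000
    duration = f"{days} Days, {hours} hours, {minutes} minutes, {seconds} seconds"
    return duration, f"{gigabytes:.2f}GB"


print(estimate_plan(13039))
# ('0 Days, 3 hours, 37 minutes, 19 seconds', '1.96GB')
```

Both numbers scale linearly with the file count, so widening the date range or adding form types increases them proportionally.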


# Multiple Filings

EDGARConnect has a built-in list of common filing groups you can pass into the configure_downloader() method. To see these groups, use the show_available_forms() method.

In [8]:
edgar.show_available_forms()

Available forms:
f_10k -> ['10-K', '10-K405', '10KSB', '10-KSB', '10KSB40']
f_10ka -> ['10-K/A', '10-K405/A', '10KSB/A', '10-KSB/A', '10KSB40/A']
f_10kt -> ['10-KT', '10KT405', '10-KT/A', '10KT405/A']
f_10q -> ['10-Q', '10QSB', '10-QSB']
f_10qa -> ['10-Q/A', '10QSB/A', '10-QSB/A']
f_10qt -> ['10-QT', '10-QT/A']
f_10x -> ['10-K', '10-K405', '10KSB', '10-KSB', '10KSB40', '10-K/A', '10-K405/A', '10KSB/A', '10-KSB/A', '10KSB40/A', '10-KT', '10KT405', '10-KT/A', '10KT405/A', '10-Q', '10QSB', '10-QSB', '10-Q/A', '10QSB/A', '10-QSB/A', '10-QT', '10-QT/A']


So if you want everything in the 10-Q family, you can configure it like this:

In [9]:
edgar.configure_downloader(edgar.forms['f_10q'], start_date='2020-01-01')

In [10]:
print(edgar)

SEC Edgar Scraper for Python, v0.0
EDGARConnect is configured for scraping.
	 Target Forms: ['10-Q', '10QSB', '10-QSB']
	 Date Range: 2020Q1 to 2021Q3



In [11]:
edgar.show_download_plan()

EDGARConnect is prepared to download 3 types of filings between 2020Q1 and 2021Q3
	Number of 10-Qs: 25332
	Number of 10QSBs: 0
	Number of 10-QSBs: 0
	Total files: 25332
Estimated download time, assuming 1s per file: 0 Days, 7 hours, 2 minutes, 12 seconds
Estimated drive space, assuming 150KB per filing: 3.80GB
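
Judging by the printed output, each group is a plain list of form names, and f_10x is simply the other six groups joined end to end. The sketch below copies the lists into a standalone dictionary, so it runs without EDGARConnect, and shows how groups could be combined into a custom selection. Whether configure_downloader() accepts any such combined list is an assumption, based on it accepting edgar.forms['f_10q'] above.

```python
# Copied from the show_available_forms() output; a stand-in for edgar.forms.
forms = {
    'f_10k': ['10-K', '10-K405', '10KSB', '10-KSB', '10KSB40'],
    'f_10ka': ['10-K/A', '10-K405/A', '10KSB/A', '10-KSB/A', '10KSB40/A'],
    'f_10kt': ['10-KT', '10KT405', '10-KT/A', '10KT405/A'],
    'f_10q': ['10-Q', '10QSB', '10-QSB'],
    'f_10qa': ['10-Q/A', '10QSB/A', '10-QSB/A'],
    'f_10qt': ['10-QT', '10-QT/A'],
}

# Joining the six groups in order reproduces the f_10x list shown above.
group_order = ['f_10k', 'f_10ka', 'f_10kt', 'f_10q', 'f_10qa', 'f_10qt']
f_10x = [form for group in group_order for form in forms[group]]
print(len(f_10x))  # 22

# A custom selection: original annual and quarterly reports, no amendments.
annual_and_quarterly = forms['f_10k'] + forms['f_10q']
print(annual_and_quarterly)
# ['10-K', '10-K405', '10KSB', '10-KSB', '10KSB40', '10-Q', '10QSB', '10-QSB']
```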


# Downloading

When you're ready to go, use the download_requested_filings() method to start downloading. It always checks whether a file already exists and skips it if it does, so the process should be somewhat robust to being stopped and restarted.

In [12]:
# edgar.download_requested_filings()
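
The skip-if-exists behavior is what makes an interrupted run resumable. The sketch below illustrates the general pattern only; it is not EDGARConnect's actual code. download_missing and fake_fetch are hypothetical names, and the fetch step is faked so the example runs offline.

```python
import tempfile
from pathlib import Path


def download_missing(targets, out_dir, fetch):
    """Fetch only the targets not already on disk; return the names fetched."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in targets:
        destination = out_dir / name
        if destination.exists():
            continue  # already downloaded in an earlier run, so skip it
        destination.write_text(fetch(name))
        fetched.append(name)
    return fetched


def fake_fetch(name):
    """Stand-in for a network request; returns placeholder content."""
    return f"contents of {name}"


with tempfile.TemporaryDirectory() as tmp:
    first_run = download_missing(["a.txt", "b.txt"], tmp, fake_fetch)
    # A second run with one extra target only fetches the new file.
    second_run = download_missing(["a.txt", "b.txt", "c.txt"], tmp, fake_fetch)

print(first_run)   # ['a.txt', 'b.txt']
print(second_run)  # ['c.txt']
```

One caveat of an existence check alone: a file left half-written by an interruption would be treated as complete. Writing to a temporary name and renaming on success is a common way around that.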

# SEC Server Timing

The SEC requests that users perform bulk downloads only between 9PM and 6AM EST. By default, EDGARConnect checks whether it's a good time to download and raises an error if it's not. It also repeats this check periodically while downloads are in progress (every time a new batch of forms is selected for download).

To disable this behavior, pass ignore_time_guidelines = True to the download_requested_filings() method.

In [13]:
# edgar.download_requested_filings(ignore_time_guidelines = True)
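
The window check itself is a small piece of logic, with one subtlety: the window wraps around midnight, so it is an "or" of two comparisons rather than a single range test. The sketch below is one way to write it, not EDGARConnect's implementation. in_download_window is a hypothetical helper, and it assumes the datetime passed in is already expressed in Eastern wall-clock time.

```python
from datetime import datetime, time

WINDOW_START = time(21, 0)  # 9 PM
WINDOW_END = time(6, 0)     # 6 AM


def in_download_window(eastern_now: datetime) -> bool:
    """Return True if the time of day falls between 9 PM and 6 AM.

    Assumes eastern_now is already in Eastern wall-clock time.
    """
    t = eastern_now.time()
    # The window spans midnight: late evening OR early morning.
    return t >= WINDOW_START or t < WINDOW_END


print(in_download_window(datetime(2021, 7, 1, 23, 30)))  # True
print(in_download_window(datetime(2021, 7, 1, 14, 0)))   # False

# On Python 3.9+, the current Eastern time could be obtained with:
#   from zoneinfo import ZoneInfo
#   datetime.now(ZoneInfo("America/New_York"))
```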