Edgar Crawler

Script to extract fund holdings from EDGAR given a ticker or CIK. Content found in tabular data within the crawled documents is written as TSV files to the /results subfolder. A basic CLI runs the crawl, with optional filter text.
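A minimal sketch of the TSV output step. It assumes the extracted rows arrive as lists of strings and that each document gets its own file under results/. The write_tsv helper and its file-naming scheme are hypothetical and not taken from edgar_cik_crawler.py.

```python
import csv
from pathlib import Path


def write_tsv(rows, name, out_dir="results"):
    """Write table rows as tab-separated values to out_dir/name.tsv.

    Hypothetical helper: the real script's function names and file layout may differ.
    """
    out_path = Path(out_dir)
    # Create the results subfolder (and any parents) on first use.
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / f"{name}.tsv"
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerows(rows)
    return target
```

For example, write_tsv([["Seq", "Type"], ["1", "13F-HR"]], "example") creates results/example.tsv containing two tab-separated lines. Using the csv module instead of joining strings by hand keeps any stray tabs or quotes inside cell values correctly escaped.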

The crawled URL is essentially the one produced by the EDGAR company search at https://www.sec.gov/edgar/searchedgar/companysearch.html. Extracted data is normalized for consistency, but the table extraction only looks for tables with the class "tableFile". A future improvement would be to add steps that scan the parsed text for all tables, so that every table can be extracted and checked for meaningful data.
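A minimal sketch of the "tableFile" lookup described above. It swaps Beautiful Soup for Python's standard-library html.parser so it runs without third-party packages; the TableFileParser class and extract_table_rows function are hypothetical names, not the script's own. It collects every row of every table whose class list includes tableFile, and collapses whitespace inside each cell as a simple form of normalization.

```python
from html.parser import HTMLParser


class TableFileParser(HTMLParser):
    """Collect rows from every <table class="tableFile"> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._depth = 0     # open <table> tags while inside a tableFile table
        self._row = None    # cells of the row currently being built
        self._cell = None   # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        # Valueless attributes come through as None, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table":
            # Counting nested tables keeps an inner </table> from ending collection early.
            if self._depth or "tableFile" in classes:
                self._depth += 1
        elif self._depth:
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag == "table":
            self._depth -= 1
        elif tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so each cell is a single clean value.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_table_rows(html):
    """Return the rows of all tableFile tables in `html` as lists of strings."""
    parser = TableFileParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

One trade-off: html.parser does not repair malformed markup (a missing </td>, for instance, drops that cell), whereas Beautiful Soup backed by lxml or html5lib does, which is presumably why the project lists those parsers as prerequisites.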

Run the script with --id TICKER_CIK, and optionally --filter FILTER_STRING to have Edgar Crawler extract only documents whose titles contain the given text.
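The options above could be wired up with argparse roughly as follows. Only the --id/-i and --filter/-f flags come from this README; the helper names, the 10-digit zero-padding of numeric CIKs, and the case-sensitive substring match on titles are assumptions for illustration.

```python
import argparse
import re


def build_arg_parser():
    # Flag names mirror the README (--id/-i, --filter/-f); help text is assumed.
    parser = argparse.ArgumentParser(description="Extract tabular data from EDGAR filings.")
    parser.add_argument("-i", "--id", required=True, help="Ticker symbol or CIK")
    parser.add_argument("-f", "--filter", default=None,
                        help="Only keep documents whose title contains this text")
    return parser


def normalize_id(raw):
    """Zero-pad numeric CIKs to 10 digits; treat anything else as an uppercase ticker."""
    value = raw.strip()
    if re.fullmatch(r"\d{1,10}", value):
        return value.zfill(10)
    return value.upper()


def title_matches(title, filter_text):
    # Assumption: plain, case-sensitive substring match; no filter keeps every document.
    return filter_text is None or filter_text in title
```

For example, build_arg_parser().parse_args(["-i", "1166559", "-f", "13F"]) yields args.id == "1166559", which normalize_id turns into "0001166559", matching the padded form used in the examples below.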

Pre-requisites

Beautiful Soup 4
lxml
html5lib

pip3 install -r requirements.txt

Examples

python edgar_cik_crawler.py -i 0001068833

python edgar_cik_crawler.py -i 0001166559 -f 13F

jddunn/edar-cik-crawler

Folders and files

Latest commit

History

Repository files navigation

Edgar Crawler

Pre-requisites

Examples

Sample output screenshot

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages