gaulinmp/edgar_shortcourse

Scraping EDGAR Short Python Course

This course was designed off the top of my head for a class of 3 students that met once a week for three weeks (i.e. all design decisions were made accordingly). Each week's meeting was 45 minutes, mostly explaining the intent of the next week's lesson. Students were expected to go through the notebook on their own time, get familiar with the concepts, and execute the 'homework' on their own.

Lessons

1: Installation

Walk through installing Python on your computer, with optional software/packages to install.

Outline:

  1. Installing python
  2. Installing git
  3. Installing VSCode
  4. Installing pyEDGAR

2: Python

Walk through the basics of Python, and end with extracting simple count data from text with regular expressions.

Outline:

  1. Syntax basics (strings, variables, lists, dicts, etc.)
  2. Program control logic (if statements, for loops, etc.)
  3. Functions
  4. Reading files
  5. Regular expressions
  6. Homework on analysing text data (answers)
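As a rough illustration of where this lesson ends up, the sketch below counts occurrences of a term in text with a regular expression. The sample text, the pattern, and the variable names are made up for illustration and are not taken from the course notebooks.

```python
import re
from collections import Counter

# Made-up sample text standing in for an excerpt of a filing.
text = """
The Company faces risk from competition. Litigation risk and
currency risks may also affect results. Risk management is ongoing.
"""

# \b marks a word boundary, so "risk" and "risks" match but "brisk" would not.
pattern = re.compile(r"\brisks?\b", flags=re.IGNORECASE)

# findall returns every non-overlapping match as a list of strings.
matches = pattern.findall(text)
risk_count = len(matches)

# Count every lowercased word to compare the target term against the rest.
word_counts = Counter(re.findall(r"[a-z]+", text.lower()))

print(risk_count)
print(word_counts.most_common(3))
```

Compiling the pattern once and reusing it keeps the counting step cheap when the same search is applied to many documents.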

3: Scraping

Introduce EDGAR, and the library to download/analyze EDGAR filings (pyEDGAR). View the data, introduce the basics of HTML (BeautifulSoup), typical filing format, and extracting data from the DOM.

Outline:

  1. EDGAR (and pyEDGAR to interact with it)
  2. Filing formats
    1. Plaintext
    2. HTML
  3. Homework on analysing HTML documents (answers)
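To give a feel for extracting data from the DOM, here is a minimal sketch that pulls a small table out of an HTML string with BeautifulSoup. The HTML snippet is invented for illustration and is far simpler than a real filing; it assumes the `beautifulsoup4` package is installed and deliberately does not use pyEDGAR.

```python
from bs4 import BeautifulSoup

# Invented HTML standing in for a fragment of a filing document.
html = """
<html><body>
<p>Item 1. Business</p>
<table>
  <tr><th>Year</th><th>Revenue</th></tr>
  <tr><td>2019</td><td>1,200</td></tr>
  <tr><td>2020</td><td>1,350</td></tr>
</table>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra parser is required.
soup = BeautifulSoup(html, "html.parser")

# Walk every table row and collect the stripped text of each cell.
rows = []
for tr in soup.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

# The first row holds the column names; the rest are data rows.
header, *data = rows
records = [dict(zip(header, row)) for row in data]

first_paragraph = soup.find("p").get_text(strip=True)

print(first_paragraph)
print(records)
```

Real filings tend to have nested tables and inconsistent markup, so a sketch like this is only a starting point for the homework-style exercises.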

4: Bulk Scraping

Introduce DataFrames, and looping over them. Provide a simple scraping loop structure for convenience. Close with an example of parallelization using ipyparallel.

Outline:

  1. DataFrames
  2. Looping thereover
  3. Scraping loop framework
  4. Result aggregation and saving to disk
  5. Parallelization example
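One way the scraping loop and result aggregation might look is sketched below. It assumes `pandas` is installed; the column names, the accession numbers, and the `scrape_filing` function are hypothetical placeholders, not the framework provided in the course notebook.

```python
import pandas as pd

# Hypothetical index of filings to process; the values are invented.
filings = pd.DataFrame({
    "cik": [1111, 2222, 3333],
    "accession": [
        "0000000000-20-000001",
        "0000000000-20-000002",
        "0000000000-20-000003",
    ],
})


def scrape_filing(cik, accession):
    """Hypothetical stand-in for the per-filing work (download, parse, count)."""
    if cik == 2222:
        # Simulate a failure so the loop's error handling is exercised.
        raise ValueError("simulated download failure")
    return {"n_words": 100 + cik % 10}


results = []
for row in filings.itertuples(index=False):
    record = {"cik": row.cik, "accession": row.accession}
    try:
        record.update(scrape_filing(row.cik, row.accession))
        record["error"] = None
    except Exception as exc:
        # Record the failure and keep going so one bad filing does not stop the run.
        record["error"] = str(exc)
    results.append(record)

# Aggregate the per-filing dicts into a single DataFrame.
results_df = pd.DataFrame(results)
print(results_df)
```

The aggregated frame could then be saved with something like `results_df.to_csv("results.csv", index=False)`. Because each iteration depends only on its own row, the loop body is the natural unit to hand off to parallel workers.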
