gaulinmp/edgar_shortcourse

Scraping EDGAR Short Python Course

This course was designed off the top of my head for a class of three students that met once a week for three weeks (i.e., all design decisions were made accordingly). Each weekly meeting lasted 45 minutes and was spent mostly explaining the intent of the next week's lesson. Students were expected to go through the notebook on their own time, get familiar with the concepts, and complete the 'homework' on their own.

Lessons

Walk through installing Python on your computer, with optional software/packages to install.

Outline:

  1. Installing python
  2. Installing git
  3. Installing VSCode
  4. Installing pyEDGAR

Walk through the basics of Python, and end with extracting simple count data from text with regular expressions.

Outline:

  1. Syntax basics (strings, variables, lists, dicts, etc.)
  2. Program control logic (if statements, for loops, etc.)
  3. Functions
  4. Reading files
  5. Regular expressions
  6. Homework on analysing text data (answers)
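As a rough illustration of where this lesson ends up, the sketch below counts word occurrences in a short passage with regular expressions. The sample text and the choice of the word "risk" are assumptions made for the example, not material from the course notebooks.

```python
import re
from collections import Counter

# Hypothetical sample text; in the course this would be read from a file.
sample_text = """
The Company faces risk from competition. Risks related to litigation
are described below. Market risk and credit risk are also discussed.
"""

# \b marks word boundaries so "risk" does not match inside "brisk";
# "s?" also accepts the plural, and IGNORECASE catches "Risks".
risk_pattern = re.compile(r"\brisks?\b", re.IGNORECASE)
risk_count = len(risk_pattern.findall(sample_text))

# Count every word, lower-cased, to get a simple frequency table.
word_counts = Counter(re.findall(r"[a-z]+", sample_text.lower()))

print(risk_count)
print(word_counts.most_common(3))
```

The word-boundary anchors are the main design choice here: without them, a plain substring search would overcount by matching the pattern inside longer words.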

Introduce EDGAR, and the library to download/analyze EDGAR filings (pyEDGAR). View the data, introduce the basics of HTML (BeautifulSoup), typical filing format, and extracting data from the DOM.

Outline:

  1. EDGAR (and pyEDGAR to interact with it)
  2. Filing formats
    1. Plaintext
    2. HTML
  3. Homework on analysing HTML documents (answers)
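To give a feel for extracting data from the DOM, here is a minimal sketch that pulls the rows of a table out of an HTML fragment. The lesson itself uses BeautifulSoup; this sketch swaps in the standard-library `html.parser` module so that it runs without extra packages. The HTML fragment, the `TableRowParser` class, and the figures in it are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical fragment, loosely shaped like a table in an HTML filing.
sample_html = """
<html><body>
<p>Item 8. Financial Statements</p>
<table>
  <tr><td>Revenue</td><td>1,200</td></tr>
  <tr><td>Net income</td><td>300</td></tr>
</table>
</body></html>
"""


class TableRowParser(HTMLParser):
    """Collect the text of each <td>, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text outside table cells (e.g. the <p> heading) is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


parser = TableRowParser()
parser.feed(sample_html)

# Turn the two-column rows into a label -> integer mapping.
table = {label: int(value.replace(",", "")) for label, value in parser.rows}
print(table)
```

With BeautifulSoup the same traversal would typically be written with `find_all("tr")` and `find_all("td")`, which is shorter but relies on the same idea of walking from rows down to cells.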

Introduce DataFrames and how to loop over them. Provide a simple scraping loop structure for convenience. Close with an example of parallelization using ipyparallel.

Outline:

  1. DataFrames
  2. Looping thereover
  3. Scraping loop framework
  4. Result aggregation and saving to disk
  5. Parallelization example
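The outline above can be sketched roughly as follows. This is not the course's actual framework: a plain list of dicts stands in for the DataFrame, `concurrent.futures` stands in for ipyparallel, and an in-memory buffer stands in for the output file, so that the sketch runs with the standard library alone. The names `filings`, `scrape_one`, and `safe_scrape` are hypothetical.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical filing index: each dict plays the role of one DataFrame row.
filings = [
    {"cik": 1, "text": "Revenue grew. Risk factors include risk of loss."},
    {"cik": 2, "text": ""},
    {"cik": 3, "text": "No material risk was identified."},
]


def scrape_one(row):
    """Extract one result record from one filing row (hypothetical logic)."""
    if not row["text"]:
        raise ValueError("empty filing")
    return {"cik": row["cik"], "risk_count": row["text"].lower().count("risk")}


def safe_scrape(row):
    """Wrap scrape_one so that one bad filing does not stop the whole run."""
    try:
        return {**scrape_one(row), "error": ""}
    except Exception as exc:
        # Record the failure alongside the row identifier and move on.
        return {"cik": row["cik"], "risk_count": None, "error": str(exc)}


# Sequential scraping loop.
results = [safe_scrape(row) for row in filings]

# The same work spread over a small thread pool; map() preserves input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel_results = list(pool.map(safe_scrape, filings))

# Aggregate the records into CSV; StringIO stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["cik", "risk_count", "error"])
writer.writeheader()
writer.writerows(results)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the error handling in a wrapper, separate from the extraction logic, means the same function can be handed unchanged to either the sequential loop or the parallel map.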
