
khosrogh/DC_Web_Scraping_Python

Web Scraping in Python with Scrapy

Course Description

The ability to build tools that retrieve and parse information stored across the internet has long been valuable in many areas of data science. In this repo, you will learn to navigate and parse HTML code and to build tools that crawl websites automatically. Although the scraping here is done with the versatile Python library Scrapy, many of the techniques you learn in this course carry over to other popular Python libraries, including BeautifulSoup and Selenium. By the end, you will have a strong mental model of HTML structure, be able to build tools that parse HTML code and extract the information you need, and be able to create simple Scrapy spiders that crawl the web at scale.

Table of Contents

  1. Introduction to HTML

    • Learn the structure of HTML.
    • Web Scraping Overview
    • HTML tree wordy navigation
    • From Tree to HTML
    • Attributes
    • Keep it Classy
    • Finding href
    • Crash Course in XPath
    • Where am I?
    • It's Time to P
    • A classy span
  2. XPaths and Selectors

    • Leverage XPath syntax to explore Scrapy selectors.
    • XPathology
    • Counting Elements in the Wild
    • Body Appendages
    • Choose DataCamp!
    • Off the Beaten XPath
    • Where it's @
    • Check your Class
    • Hyper(link) Active
    • Secret Links
    • Selector Objects
    • XPath Chaining
    • Divvy Up This Exercise
    • The Source of the Source
    • Course Class by Inspection
    • Requesting a Selector
  3. CSS Locators, Chaining, and Responses

    • Learn CSS Locator syntax and begin playing with the idea of chaining together CSS Locators with XPath.
    • From XPath to CSS
    • The (X)Path to CSS Locators
    • Get an "a" in this Course
    • The CSS Wildcard
    • CSS Attributes and Text Selection
    • You've been hrefed
    • Top Level Text
    • All Level Text
    • Respond Please!
    • Reveal By Response
    • Responding with Selectors
    • Selecting from a Selection
    • Survey
    • Titular
    • Scraping with Children
  4. Spiders

    • Learn to create web crawlers with Scrapy.
    • Your First Spider
    • Inheriting the Spider
    • Hurl the URLs
    • Start Requests
    • Self Referencing is Classy
    • Starting with Start Requests
    • Parse and Crawl
    • Pen Names
    • Crawler Time
    • Capstone
    • Time to Run
    • DataCamp Descriptions
    • Capstone Crawler
    • The Finale

Getting Started

To get started with the course, navigate to the respective folders for each section and follow the instructions provided in the course materials.

Happy learning!
