Web Scraping in Python with Scrapy

Course Description

The ability to build tools that retrieve and parse information stored across the internet has been, and continues to be, valuable in many areas of data science. In this repo, you will learn to navigate and parse HTML code and to build tools that crawl websites automatically. Although the scraping is done with the versatile Python library Scrapy, many of the techniques in this course also apply to other popular Python libraries, including BeautifulSoup and Selenium. By the end, you will have a strong mental model of HTML structure, be able to build tools that parse HTML code and access the desired information, and create simple Scrapy spiders to crawl the web at scale.

Introduction to HTML
- Learn the structure of HTML.
- Web Scraping Overview
- HTML tree wordy navigation
- From Tree to HTML
- Attributes
- Keep it Classy
- Finding href
- Crash Course in XPath
- Where am I?
- It's Time to P
- A classy span

XPaths and Selectors
- Leverage XPath syntax to explore Scrapy selectors.
- XPathology
- Counting Elements in the Wild
- Body Appendages
- Choose DataCamp!
- Off the Beaten XPath
- Where it's @
- Check your Class
- Hyper(link) Active
- Secret Links
- Selector Objects
- XPath Chaining
- Divvy Up This Exercise
- The Source of the Source
- Course Class by Inspection
- Requesting a Selector

CSS Locators, Chaining, and Responses
- Learn CSS Locator syntax and start chaining CSS Locators together with XPath.
- From XPath to CSS
- The (X)Path to CSS Locators
- Get an "a" in this Course
- The CSS Wildcard
- CSS Attributes and Text Selection
- You've been hrefed
- Top Level Text
- All Level Text
- Respond Please!
- Reveal By Response
- Responding with Selectors
- Selecting from a Selection
- Survey
- Titular
- Scraping with Children

Spiders
- Learn to create web crawlers with Scrapy.
- Your First Spider
- Inheriting the Spider
- Hurl the URLs
- Start Requests
- Self Referencing is Classy
- Starting with Start Requests
- Parse and Crawl
- Pen Names
- Crawler Time
- Capstone
- Time to Run
- DataCamp Descriptions
- Capstone Crawler
- The Finale

Getting Started

To get started with the course, navigate to the folder for each chapter (Chapter01, chapter02, chapter03, chapter04) and follow the instructions provided in the course materials.

Happy learning!

Repository: khosrogh/DC_Web_Scraping_Python
