sangaline/advanced-web-scraping-tutorial

Advanced Web Scraping Tutorial Project

This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.

This is a Scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:

  1. User agent filtering.
  2. Obfuscated JavaScript redirects.
  3. Captchas.
  4. Header consistency checks.
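
As a rough illustration of the first and fourth items, the sketch below shows one way a Scrapy project might keep its user agent and default headers in a single place so that every request presents the same browser-like identity. `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are standard Scrapy settings, but the header values and the `browser_headers` helper are assumptions made for this example and are not taken from this repository's code.

```python
# Hypothetical fragment in the style of a Scrapy settings.py.
# USER_AGENT and DEFAULT_REQUEST_HEADERS are real Scrapy setting names;
# the values are illustrative assumptions, not copied from this project.

# A browser-like user agent string (example value only).
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Headers sent with every request, chosen to resemble what the
# browser named in USER_AGENT would plausibly send.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def browser_headers(user_agent=USER_AGENT, defaults=DEFAULT_REQUEST_HEADERS):
    """Merge the defaults and the user agent into one header dict.

    Hypothetical helper: any non-Scrapy HTTP client used alongside the
    spider can reuse the exact same headers, which keeps them consistent.
    """
    headers = dict(defaults)  # copy so the shared defaults are not mutated
    headers["User-Agent"] = user_agent
    return headers
```

Keeping these values in one module means a secondary client (for example, one that fetches a captcha image) can call `browser_headers()` rather than define its own headers, which is the kind of mismatch a header consistency check would likely flag.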

The scraper is not actually functional because Zipru is not a real site. The code, however, is otherwise complete and can easily be adapted to work on other sites.
