kuleafenu/customizable-web-crawler

web_scraping

Liked it? Please give it a ⭐️ to help make this project 💪 stronger.

👋 Introduction

  • This is a Scrapy and Splash project which can be customized to scrape almost all types of websites.
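For orientation, here is a minimal sketch of the settings that the scrapy-splash plugin documents for connecting a Scrapy project to Splash. It assumes Splash is running locally (for example in Docker) and listening on port 8050; the URL is an assumption, and this project's actual `settings.py` may differ.

```python
# settings.py (sketch): wiring scrapy-splash into a Scrapy project.
# Assumption: Splash is reachable at localhost:8050; adjust to your setup.
SPLASH_URL = "http://localhost:8050"

# Middlewares that route requests through Splash and handle its cookies.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Deduplicates large Splash arguments (such as Lua scripts) between requests.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With settings along these lines, a spider can request JavaScript-rendered pages through Splash instead of fetching raw HTML.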

💻 Published on Medium

Click the link below for a comprehensive tutorial on how to set up the project environment.

How to scrape all types of websites with python — part 1

🔥 Scraping the data

Click the link below for a comprehensive tutorial on getting the project up and running.

How to scrape all types of websites with python — part 2

Project Goal: A comprehensive guide on how I scraped 19 thousand Medium posts with Scrapy and Splash.

🔢 What you will learn

  • Download and install Anaconda Navigator and Docker
  • Install Scrapy and Splash
  • Program in VS Code
  • Write a Splash script
  • Extract patterns with Scrapy
  • Store data in CSV, JSON and XML
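To make the last item concrete, here is a small standard-library sketch of the same records stored as CSV, JSON and XML. The item fields (`title`, `author`, `claps`) and the helper names are hypothetical and not taken from this project. In a Scrapy project you would normally let the built-in feed exports do this, for example `scrapy crawl <spider> -o posts.json`.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical scraped items; the field names are illustrative only.
items = [
    {"title": "First post", "author": "alice", "claps": "120"},
    {"title": "Second post", "author": "bob", "claps": "45"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize a list of dicts to a JSON array."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_xml(rows):
    """Serialize a list of dicts to <items><item>...</item></items>."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(items)
json_text = to_json(items)
xml_text = to_xml(items)
```

Each helper returns a string, so the result can be written to a file or inspected directly before committing to one format.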

🏗️ How to reproduce the project

You can run this code locally with a few easy steps.

  1. Clone the repository:
https://github.com/kuleafenu/customizable-web-crawler.git
  2. Click the link below for a comprehensive tutorial on how to set up the project environment.

How to scrape all types of websites with python — part 1

  3. Click the link below for a comprehensive tutorial on getting the project up and running.

How to scrape all types of websites with python — part 2

🛡️ License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Support

We all need support and motivation. If you liked this project, please leave a star ⭐️ before you go.