DataCamp Crawler

Problem Statement

The crawl begins at a starting web page that lists courses related to the Python programming language, each shown as a separate block.

[Image: start_url]
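The first step is to collect the link from every course block on the starting page. In the project this is presumably done by the Scrapy spider. The snippet below is only a minimal standard-library sketch of the idea: it gathers the `href` of each course link and resolves it to an absolute URL. The CSS class `course-block`, the helper names and the base URL are hypothetical placeholders, since the real page markup is not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CourseLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries a given CSS class.

    "course-block" is a hypothetical class name; adjust it to the real markup.
    """

    def __init__(self, base_url, block_class="course-block"):
        super().__init__()
        self.base_url = base_url
        self.block_class = block_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Attributes without a value arrive as None, so guard before splitting.
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if self.block_class in classes and href:
            # Resolve relative links against the starting page URL.
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep first occurrence, drop duplicates
                self.links.append(url)


def extract_course_links(html, base_url):
    """Return the absolute course URLs found in the listing page HTML."""
    parser = CourseLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Inside Scrapy itself, the equivalent is usually a selector query in the spider's `parse` method followed by `response.follow(...)` for each extracted link.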

Each link leads to that course's own page, which shows information such as:

  • course name
  • course description
  • number of exercises
  • participants
  • time (hours)
  • URL
  • videos
  • XP points
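Each course page can be reduced to one flat record with the fields listed above. A minimal sketch, assuming the figures appear as text such as `1,234,567` or `4 hours`; the field names and the `parse_number`, `parse_int` and `build_course` helpers are hypothetical and not taken from the project's code:

```python
import re
from dataclasses import dataclass
from typing import Optional


def parse_number(text: Optional[str]) -> Optional[float]:
    """Return the first number in `text`, e.g. '4 hours' -> 4.0.

    Assumes digits with optional comma thousands separators and an optional
    decimal part; returns None when no number is present.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def parse_int(text: Optional[str]) -> Optional[int]:
    """Like parse_number, but truncated to an int for count-style fields."""
    value = parse_number(text)
    return int(value) if value is not None else None


@dataclass
class Course:
    # Hypothetical field names mirroring the list of course information.
    course_name: str
    course_description: str
    number_of_exercises: Optional[int]
    participants: Optional[int]
    time_hours: Optional[float]
    url: str
    videos: Optional[int]
    xp_points: Optional[int]


def build_course(raw: dict) -> Course:
    """Turn a dict of raw strings scraped from one course page into a Course."""
    return Course(
        course_name=(raw.get("course_name") or "").strip(),
        course_description=(raw.get("course_description") or "").strip(),
        number_of_exercises=parse_int(raw.get("number_of_exercises")),
        participants=parse_int(raw.get("participants")),
        time_hours=parse_number(raw.get("time_hours")),
        url=raw.get("url") or "",
        videos=parse_int(raw.get("videos")),
        xp_points=parse_int(raw.get("xp_points")),
    )
```

Keeping the raw text and the parsed numbers separate like this makes it easy to change the parsing rule if the page shows figures in a different form.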

[Image: example]

We need to collect this information for each course and save the output in CSV format, like this:


[Image: excel]
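When the spider is run with `-o datacamp.csv`, Scrapy's feed export writes the CSV file itself. To show the expected shape of the output (one header row followed by one row per course), here is a standalone sketch using Python's `csv` module; the column names and helper functions are hypothetical and simply mirror the field list above:

```python
import csv
import io

# Hypothetical column names mirroring the list of course information.
FIELDNAMES = [
    "course_name", "course_description", "number_of_exercises",
    "participants", "time_hours", "url", "videos", "xp_points",
]


def write_courses_csv(courses, stream):
    """Write one header row, then one row per course dict, to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for course in courses:
        # Values containing commas or quotes are quoted automatically.
        writer.writerow(course)


def courses_to_csv_text(courses):
    """Render the courses as CSV text in memory (handy for checking output)."""
    buffer = io.StringIO()
    write_courses_csv(courses, buffer)
    return buffer.getvalue()
```

To write a real file, open it with `newline=""` as the `csv` documentation recommends, for example `with open("datacamp.csv", "w", newline="", encoding="utf-8") as f: write_courses_csv(courses, f)`.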

Run

Make sure Scrapy is installed in your environment. Navigate to the desired folder and create a Scrapy project from the console:

scrapy startproject datacamp

Copy this project's files into that datacamp folder. They should end up in ..../datacamp/datacamp.../

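Open a console, change into the inner datacamp folder, and run the following command (the -o option writes the scraped items to datacamp.csv, and Scrapy infers the CSV format from the file extension):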

scrapy crawl my_scraper -o datacamp.csv
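After the crawl finishes, a quick sanity check of datacamp.csv can reveal course pages where a field was not scraped. A minimal sketch, assuming the file is UTF-8 encoded and starts with a header row; `summarize_csv` is a hypothetical helper, not part of the project:

```python
import csv
from collections import Counter


def summarize_csv(path):
    """Return (number of data rows, {column: count of empty cells})."""
    empty = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rows += 1
            for column, value in row.items():
                if column is None:
                    # Extra cells beyond the header land under the key None; skip them.
                    continue
                if value is None or not value.strip():
                    empty[column] += 1
    return rows, dict(empty)
```

For example, `rows, empty = summarize_csv("datacamp.csv")` reports how many courses were written and which columns have gaps; a column that is empty for most rows usually points to a selector that needs adjusting.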