Web-Scraping-Demo

This is just a demo of "Web Scraping" with Python.

Usage

python3 scrapper.py visual - To scrape visually in a browser with Selenium
python3 scrapper.py headless - To scrape from the command line only, with Requests
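
The two commands above differ only in their first argument. As a minimal sketch of how such a mode argument could be validated (the function name `parse_mode` and the usage message are assumptions for illustration, not taken from scrapper.py):

```python
VALID_MODES = ("visual", "headless")


def parse_mode(argv):
    """Return the scraping mode named by the first command-line argument.

    Hypothetical helper: exits with a usage message when the argument
    is missing or is not one of the two documented modes.
    """
    if len(argv) != 2 or argv[1] not in VALID_MODES:
        raise SystemExit("usage: python3 scrapper.py [visual|headless]")
    return argv[1]
```

A script could call `parse_mode(sys.argv)` at startup and then hand off to either the Selenium-based or the Requests-based scraper.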

How The Script Works

This script scrapes https://blog.scrapinghub.com/ recursively.
First, it scrapes the home page for every blog post's "Post Title", "Post Date", "Post Author", and "Post Link".
After successfully scraping the home page, it checks whether there is a next page; if there is, it scrapes that page in the same way.
It keeps doing this until it reaches the last page of the blog.
After collecting all the data, it stores them locally in a CSV file called result.csv.
Then it creates a directory named data/ under the current directory.
Finally, it takes every blog post link from result.csv, scrapes the actual "Blog Article", and saves it
in the data/ directory under the name "ACTUAL-POST-TITLE".txt
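
The flow described above can be sketched roughly as follows. This is not the code from scrapper.py: every function name here is hypothetical, and the page-specific work (downloading with Requests or Selenium, parsing with BeautifulSoup) is passed in as the callables `fetch_page` and `fetch_article`, since the blog's actual markup is not described in this README. The file-name cleanup rule is also an assumption.

```python
import csv
import re
from pathlib import Path

FIELDS = ["title", "date", "author", "link"]


def crawl_listing(start_url, fetch_page):
    """Collect posts from every listing page, following "next page" links.

    fetch_page(url) is an assumed callable returning (posts, next_url):
    posts is a list of dicts keyed by FIELDS, next_url is None on the
    last page.
    """
    posts, url, seen = [], start_url, set()
    while url and url not in seen:  # stop on the last page or on a loop
        seen.add(url)
        page_posts, url = fetch_page(url)
        posts.extend(page_posts)
    return posts


def save_posts(posts, csv_path="result.csv"):
    """Write the collected post metadata to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(posts)


def safe_filename(title):
    """Turn a post title into a file name (assumed rule: drop punctuation,
    join words with hyphens)."""
    cleaned = re.sub(r"[^\w\- ]", "", title).strip()
    return re.sub(r"\s+", "-", cleaned) or "untitled"


def save_articles(csv_path, fetch_article, out_dir="data"):
    """Read links back from the CSV and save each article as a .txt file.

    fetch_article(link) is an assumed callable returning the article text.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            text = fetch_article(row["link"])
            target = out / f"{safe_filename(row['title'])}.txt"
            target.write_text(text, encoding="utf-8")
```

Keeping the download and parsing steps behind callables means the same loop could serve both the visual and the headless mode, and it can be exercised with fake pages without touching the network.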

Modules used

Requests
- pip3 install requests
BeautifulSoup
- pip3 install bs4
Selenium
- pip3 install selenium
CSV
- part of the Python standard library, no installation needed

POC-Proof-of-concept

Visual scraping. Headless scraping.

mahimsafa/Web-Scraping-Demo
