Web-Scraping-Projects

A set of projects for learning and experimenting with web scraping in Python.

Project	Description	Dataset size
greatplacetowork-web-scraper	Scrape company data	Small
rottentomatoes-web-scraper	Scrape movie data	Small
euronics-web-scraper	Scrape smart TV product data from an e-commerce platform	Small

Technologies

Python version: 3.11.

Python libraries:

BeautifulSoup
selenium
urllib
pandas

Status

Project is: in progress.

Warnings

Website structure changes over time, and a web scraper that previously worked perfectly can break after such updates. The code must be maintained by running periodic tests and updated when they fail. The required adjustments are usually small, since website changes tend to be small and incremental. I will try to update the code periodically, but keep in mind that occasional errors are part of the web scraping process.

Make sure to check the robots.txt file before scraping a website. This standardized file tells crawlers which parts of the website they may access and which user agents each rule applies to. You can view it by appending /robots.txt to the root of the website domain, e.g. https://www.amazon.it/robots.txt.
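
As a rough illustration of the robots.txt check, here is a minimal sketch using `urllib.robotparser` from the Python standard library. It is not taken from the projects in this repository. The robots.txt content, the `my-scraper` user agent, and the `example.com` URLs are all made-up assumptions, inlined so that the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not copied from any real site).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical user agent name for the scraper.
USER_AGENT = "my-scraper"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether this user agent may fetch each (hypothetical) URL.
allowed = parser.can_fetch(USER_AGENT, "https://www.example.com/products")
blocked = parser.can_fetch(USER_AGENT, "https://www.example.com/private/page")
print(allowed, blocked)
```

Against a live website, the same parser can load the file itself: call `parser.set_url(...)` with the robots.txt address and then `parser.read()`, instead of `parser.parse(...)`.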

The robots.txt file can specify a crawl delay (a waiting time between consecutive requests). Make sure to use a crawl delay (around 5-10 seconds) even if robots.txt does not specify one, to avoid degrading the performance of the scraped website and getting blocked.
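
The crawl delay advice could be applied along the following lines. This is only a sketch under stated assumptions: `resolve_crawl_delay` and `crawl` are hypothetical helper names, the 5-second fallback is simply the lower end of the range suggested above, and the `fetch` callable stands in for whatever download function a scraper actually uses.

```python
import time
from urllib.robotparser import RobotFileParser

# Fallback delay (an assumption: the lower end of the 5-10 second range above).
DEFAULT_DELAY_SECONDS = 5.0


def resolve_crawl_delay(parser, user_agent, default=DEFAULT_DELAY_SECONDS):
    """Return the Crawl-delay from robots.txt, or the fallback if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(urls, fetch, delay_seconds, sleep=time.sleep):
    """Fetch each URL in order, waiting delay_seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results


# Demo with a hypothetical robots.txt and a stand-in fetch function.
# The sleep function is replaced by a recorder so the demo finishes instantly.
demo_parser = RobotFileParser()
demo_parser.parse(["User-agent: *", "Crawl-delay: 7"])
demo_delay = resolve_crawl_delay(demo_parser, "my-scraper")

recorded_sleeps = []
pages = crawl(
    ["https://www.example.com/a", "https://www.example.com/b"],
    fetch=lambda url: f"fetched {url}",
    delay_seconds=demo_delay,
    sleep=recorded_sleeps.append,
)
print(demo_delay, recorded_sleeps, pages)
```

Passing `sleep` as a parameter keeps the helper easy to test; in a real run the default `time.sleep` applies the actual pause.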

Contact

Created by mary_0094@hotmail.it, feel free to get in touch! 👩‍💻
