Web scraping is the practice of extracting content and data from a website using bots. This custom code I built searches the source code of the page for specific parts I've defined and extracts the content I've asked it to extract.

octocatblain/Webscraper

🕸️ Web Scraper 💻

Web scraping is the practice of extracting content and data from a website using bots. Unlike screen scraping, which copies only the pixels shown onscreen, web scraping retrieves the underlying HTML and, with it, the data the site serves from its database. A scraper can then copy that content to another location. The code in this repository searches a page's source for the specific elements I have defined and extracts their content.
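To make the idea of "searching the source code of the page for specific parts" concrete, here is a minimal sketch. It is not the code from this repository: to stay dependency-free it uses the standard library's html.parser as a stand-in for Beautiful Soup, and the HTML snippet, the class names `item-title` and `price`, and the `ClassTextExtractor` helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a downloaded product page.
SAMPLE_HTML = """
<div class="item"><a class="item-title">Graphics Card A</a><span class="price">$299.99</span></div>
<div class="item"><a class="item-title">Graphics Card B</a><span class="price">$449.00</span></div>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given CSS class.

    Simplification: assumes the target elements contain plain text only,
    with no nested tags.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._capturing = True
            self.results.append("")

    def handle_data(self, data):
        if self._capturing:
            self.results[-1] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current capture (see simplification above).
        self._capturing = False


def extract(html, target_class):
    """Return the stripped text of each element with the given class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    return [text.strip() for text in parser.results]


titles = extract(SAMPLE_HTML, "item-title")
prices = extract(SAMPLE_HTML, "price")
print(list(zip(titles, prices)))
```

With Beautiful Soup the same lookup would typically be a one-line search by tag and class; the point here is only that the scraper works on the HTML text, not on what is drawn onscreen.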

⚠️ Beware

Before scraping any website, check its terms and conditions page for explicit rules about scraping, and follow them if they exist. If there are none, it is a judgment call, so err on the side of scraping conservatively.

😔 Note

  • Sadly, not all websites support web scraping.

📚 Resources Used

  • Newegg eCommerce Online Store. Newegg Commerce, Inc. is a company that sells computer hardware and consumer gadgets online. Its headquarters are in the City of Industry, California. (Demo video: Newegg.eCommerce.mp4)
  • National Weather Service. The National Weather Service is a federal government agency responsible for delivering weather forecasts, hazardous weather warnings, and other weather-related services to organizations and the general public for protection, safety, and general information. (Demo video: NWS.mp4)

🛠️ Tools & Languages Used

  • Anaconda version 4.10.1
  • Python version 3.8.8
  • Beautiful Soup, a Python package for parsing HTML and XML documents
  • The urllib.request module, an extensible library for opening URLs
  • The Python requests library (used in NWS-webscraper.py)

🔆 Best Practices when Web Scraping

  • Never scrape more frequently than you need to.
  • Consider caching the content you scrape so that it’s only downloaded once.
  • Build pauses into your code using functions like time.sleep() to keep from overwhelming servers with too many requests too quickly.
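The caching and pausing advice above can be sketched roughly as follows. This is an illustration, not code from this repository: `PoliteFetcher`, `download`, and the 2-second default delay are hypothetical choices, and the cache lives only in memory for the lifetime of the script.

```python
import time
import urllib.request


def download(url):
    """Download a page with urllib.request and return its text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class PoliteFetcher:
    """Fetch each URL at most once and pause between real requests."""

    def __init__(self, fetch=download, delay_seconds=2.0):
        self.fetch = fetch                  # function that performs the request
        self.delay_seconds = delay_seconds  # minimum gap between requests
        self.cache = {}                     # url -> page text
        self._last_request = None           # time of the previous real request

    def get(self, url):
        # Serve repeated URLs from the cache so they are downloaded only once.
        if url in self.cache:
            return self.cache[url]

        # Sleep just long enough to keep requests delay_seconds apart.
        if self._last_request is not None:
            wait = self.delay_seconds - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)

        body = self.fetch(url)
        self._last_request = time.monotonic()
        self.cache[url] = body
        return body
```

A script would create one `PoliteFetcher()` and call `get(url)` wherever it would otherwise open the URL directly. Passing a different `fetch` function (for example one built on requests) is possible because the class only assumes a callable that takes a URL and returns text.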

🔌 What to Expect

  • Script results in cmd.exe
  • Results in the Products.csv file
  • DataFrame display in the terminal
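For readers curious how scraped rows might end up in a file such as Products.csv, here is a small sketch using the standard library's csv module. The column names and sample rows are assumptions made for illustration; the actual columns written by this repository's script may differ, and the DataFrame display is not reproduced here.

```python
import csv
import os
import tempfile

# Assumed column names; the real Products.csv may use different ones.
FIELDNAMES = ["brand", "product_name", "shipping"]


def write_products(rows, path):
    """Write a list of product dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def read_products(path):
    """Read the CSV back into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


# Made-up rows; note the comma inside the first product name.
sample_rows = [
    {"brand": "ExampleBrand", "product_name": "Graphics Card A, 8GB", "shipping": "Free Shipping"},
    {"brand": "OtherBrand", "product_name": "Graphics Card B", "shipping": "$4.99 Shipping"},
]

# Write to a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "Products.csv")
    write_products(sample_rows, csv_path)
    loaded = read_products(csv_path)

print(loaded[0]["product_name"])
```

Using csv.DictWriter rather than joining strings by hand means values that contain commas, such as product names, are quoted automatically and survive the round trip.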

This code was built with ❤️ and 2 cups of Coffee☕
