Purpose of the tutorial:
Show how to automate basic data collection using Python. Prior knowledge of Python is helpful but not required to follow along.
Audience:
Data analysts, scientists, engineers, or data enthusiasts.
Overview:
This tutorial shows several ways to build simple web scrapers that capture data from Wikipedia pages. It was written in the VS Code IDE and is intended to be run on your local machine.
Prerequisites:
- Basic knowledge of Python
- A Jupyter Notebook environment
- A virtual environment is highly recommended
Part 1 - Simple
Vocabulary: web scraper, Python, Pandas, NumPy, Matplotlib
- Import Python Libraries
- Read HTML tables
- Output the data in a data frame
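The three steps above can be sketched as follows. The inline HTML table is a stand-in for a Wikipedia page so the snippet runs offline; in the tutorial itself you would pass the article's URL straight to `pd.read_html`.

```python
import io
import pandas as pd  # NumPy and Matplotlib ride along for analysis/plotting

# Inline HTML stands in for a Wikipedia page; with a live page you would
# pass the article URL to pd.read_html instead of this StringIO buffer.
html = io.StringIO("""
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Canada</td><td>38000000</td></tr>
  <tr><td>Japan</td><td>125000000</td></tr>
</table>
""")

tables = pd.read_html(html)  # returns a list of DataFrames, one per table
df = tables[0]               # the data, already in a data frame
print(df)
```

Note that `read_html` always returns a list, even when the page has a single table, so you index into the result to get your DataFrame.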
Part 2 - Beautiful Soup
Vocabulary: requests, Beautiful Soup
- Import Python Libraries
- Read HTML
- Output the data
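A minimal sketch of the requests + Beautiful Soup flow described above. The inline HTML stands in for a downloaded Wikipedia page (so the snippet runs without a network connection), and the tag names are illustrative.

```python
from bs4 import BeautifulSoup

# With a live page you would fetch the HTML first:
#   import requests
#   html = requests.get(url).text
html = """
<html><body>
  <h1>Sample article</h1>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
</body></html>
"""

# Parse the HTML, then pull out the pieces you want
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()
items = [li.get_text() for li in soup.find_all("li")]
print(title, items)
```

Unlike `pd.read_html`, Beautiful Soup gives you element-level control, which is what you need when the data is not in a `<table>`.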
Part 3 - Selenium
Vocabulary: Beautiful Soup, Selenium
- Import Libraries
- Read in Web Page
- Output Data
Part 4 - APIs
Vocabulary: API, urllib, request, json, ssl, API key
- Without Key
- With Key
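A sketch of the keyless flow using the standard-library pieces named above. The Wikipedia Action API endpoint is real; the sample JSON payload is illustrative so the parsing step runs offline, and the key-handling comment shows the general pattern only (the parameter name varies by API).

```python
import json
import ssl
import urllib.parse
import urllib.request

# Build the request URL for the (keyless) Wikipedia Action API
params = {
    "action": "query",
    "titles": "Web scraping",
    "prop": "info",
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)

# A live call would look like this (ssl supplies the certificate context):
#   context = ssl.create_default_context()
#   with urllib.request.urlopen(url, context=context) as resp:
#       data = json.loads(resp.read().decode())
# With a key, services usually expect one extra query parameter or header,
# e.g. params["key"] = "YOUR_API_KEY"  (parameter name varies -- illustrative)

# Sample payload (illustrative) so the parsing step runs without a network call:
sample = '{"query": {"pages": {"12345": {"title": "Web scraping"}}}}'
data = json.loads(sample)
page = next(iter(data["query"]["pages"].values()))
print(page["title"])
```

APIs return structured JSON rather than HTML, so there is no scraping step at all: you decode the response and index into it.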