Purpose of the tutorial:
Show how to automate basic data collection using Python. Prior knowledge of Python is helpful but not required to follow along.
Audience:
Data analysts, scientists, engineers, or data enthusiasts.
Overview:
This tutorial shows several ways to build simple web scrapers that capture data from Wikipedia pages. It was written in the VS Code IDE and is intended to be run on your local machine.
Prerequisites:
- Basic knowledge of Python
- A Jupyter Notebook environment
- A virtual environment is highly recommended
Part 1 - Simple
Vocabulary: web scraper, Python, Pandas, NumPy, Matplotlib
- Import Python Libraries
- Read HTML tables
- Output the data in a data frame
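The three steps above can be sketched as follows. The inline HTML table is a stand-in for a Wikipedia page so the snippet runs offline; in the tutorial itself you would pass the article's URL straight to `pd.read_html`.

```python
import io
import pandas as pd  # NumPy and Matplotlib ride along for analysis/plotting

# Inline HTML stands in for a Wikipedia page; with a live page you would
# pass the article URL to pd.read_html instead of this StringIO buffer.
html = io.StringIO("""
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Canada</td><td>38000000</td></tr>
  <tr><td>Japan</td><td>125000000</td></tr>
</table>
""")

tables = pd.read_html(html)  # returns a list of DataFrames, one per table
df = tables[0]               # the data, already in a data frame
print(df)
```

Note that `read_html` always returns a list, even when the page has a single table, so you index into the result to get your DataFrame.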
Part 2 - Beautiful Soup
Vocabulary: requests, Beautiful Soup
- Import Python Libraries
- Read HTML
- Output the data
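A minimal sketch of the requests + Beautiful Soup flow described above. The inline HTML stands in for a downloaded Wikipedia page (so the snippet runs without a network connection), and the tag names are illustrative.

```python
from bs4 import BeautifulSoup

# With a live page you would fetch the HTML first:
#   import requests
#   html = requests.get(url).text
html = """
<html><body>
  <h1>Sample article</h1>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
</body></html>
"""

# Parse the HTML, then pull out the pieces you want
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()
items = [li.get_text() for li in soup.find_all("li")]
print(title, items)
```

Unlike `pd.read_html`, Beautiful Soup gives you element-level control, which is what you need when the data is not in a `<table>`.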
Part 3 - Selenium
Vocabulary: Beautiful Soup, Selenium
- Import Libraries
- Read in Web Page
- Output Data
Part 4 - APIs
Vocabulary: API, urllib, request, json, ssl, API key
- Without Key
- With Key
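A sketch of the keyless flow using the standard-library pieces named above. The Wikipedia Action API endpoint is real; the sample JSON payload is illustrative so the parsing step runs offline, and the key-handling comment shows the general pattern only (the parameter name varies by API).

```python
import json
import ssl
import urllib.parse
import urllib.request

# Build the request URL for the (keyless) Wikipedia Action API
params = {
    "action": "query",
    "titles": "Web scraping",
    "prop": "info",
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)

# A live call would look like this (ssl supplies the certificate context):
#   context = ssl.create_default_context()
#   with urllib.request.urlopen(url, context=context) as resp:
#       data = json.loads(resp.read().decode())
# With a key, services usually expect one extra query parameter or header,
# e.g. params["key"] = "YOUR_API_KEY"  (parameter name varies -- illustrative)

# Sample payload (illustrative) so the parsing step runs without a network call:
sample = '{"query": {"pages": {"12345": {"title": "Web scraping"}}}}'
data = json.loads(sample)
page = next(iter(data["query"]["pages"].values()))
print(page["title"])
```

APIs return structured JSON rather than HTML, so there is no scraping step at all: you decode the response and index into it.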