parcheesime/data-collection-methods

Webscraper Tutorial

Purpose of the tutorial:

Show how to automate basic data collection using Python. Prior knowledge of Python is helpful but not necessary to engage in the tutorial.

Audience:

Data analysts, scientists, engineers, or data enthusiasts.

Overview:

This tutorial shows several ways to build simple web scrapers that capture data from Wikipedia pages. It was written in the VS Code IDE and is meant to be run on your local computer.

Prerequisites:

  • Basic knowledge of Python
  • Jupyter Notebook Environment
  • A virtual environment is highly recommended.

Part 1 - Simple

Vocabulary: web scraper, Python, Pandas, NumPy, Matplotlib

  • Import Python Libraries
  • Read HTML tables
  • Output the data in a data frame
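
The three steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the sample HTML and the `read_tables` helper are hypothetical stand-ins, and `pandas.read_html` is assumed to have an HTML parser such as `lxml` available.

```python
import io

import pandas as pd

# Hypothetical sample HTML standing in for a Wikipedia page with one table.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>1000</td></tr>
  <tr><td>Shelbyville</td><td>2500</td></tr>
</table>
"""


def read_tables(source):
    """Return a list of DataFrames, one per <table> found in the source.

    `source` may be a file-like object holding HTML or a page URL string.
    """
    return pd.read_html(source)


# Wrap the literal HTML in StringIO so pandas treats it as a document.
tables = read_tables(io.StringIO(SAMPLE_HTML))
df = tables[0]  # first table on the page
print(df)
```

A page URL can be passed to `read_tables` in place of the `StringIO` object, assuming the site accepts the request.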

Part 2 - Beautiful Soup

Vocabulary: requests, Beautiful Soup

  • Import Python Libraries
  • Read HTML
  • Output the data
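
A rough sketch of these steps with `requests` and Beautiful Soup is shown below. The sample HTML, the helper names (`fetch_html`, `extract_headings`, `extract_links`), and the User-Agent string are assumptions made for illustration; the notebook itself may structure this differently.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
<h1>Web scraping</h1>
<p>Intro with a <a href="/wiki/Data">data</a> link.</p>
<h2>History</h2>
<h2>Techniques</h2>
</body></html>
"""


def fetch_html(url):
    """Download a page and return its HTML text (not called in this sketch)."""
    headers = {"User-Agent": "webscraper-tutorial-example"}  # assumed value
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def extract_headings(html):
    """Return the text of every <h1> and <h2>, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(["h1", "h2"])]


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


headings = extract_headings(SAMPLE_HTML)
links = extract_links(SAMPLE_HTML)
print(headings)
print(links)
```

To scrape a live page, pass the result of `fetch_html(url)` to the extract functions instead of `SAMPLE_HTML`.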

Other Methods

Part 3 - Selenium

Vocabulary: Beautiful Soup, Selenium

  • Import Libraries
  • Read in Web Page
  • Output Data
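
One possible shape for these steps is sketched below, assuming Selenium 4 with a local Chrome installation. The helper names `fetch_rendered_html` and `first_paragraph` are hypothetical, and the browser choice is an assumption rather than something the tutorial specifies.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load a page in headless Chrome and return the rendered HTML.

    Selenium is imported here so the parsing helper below also works
    on machines where Selenium or a browser is not installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def first_paragraph(html):
    """Return the text of the first <p> element, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    paragraph = soup.find("p")
    return paragraph.get_text(strip=True) if paragraph else None


# Hypothetical sample HTML used in place of a live browser session.
sample = "<div><p> Hello from a rendered page. </p><p>Second</p></div>"
print(first_paragraph(sample))
```

Selenium is mainly useful when a page builds its content with JavaScript; for static pages, `requests` alone is usually enough.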

Part 4 - APIs

Vocabulary: API, urllib, request, json, ssl, API key

  • Without Key
  • With Key
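
The two cases can be sketched with the standard library alone. In this illustration the `build_url` and `fetch_json` helpers, the `api_key` parameter name, the `api.example.com` endpoint, and the sample response are all hypothetical; real APIs document their own parameter names and response shapes.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url, params, api_key=None):
    """Build a request URL, adding the key only when one is supplied."""
    query = dict(params)
    if api_key is not None:
        query["api_key"] = api_key  # hypothetical parameter name
    return base_url + "?" + urllib.parse.urlencode(query)


def fetch_json(url):
    """Request a URL and decode the JSON body (not called in this sketch)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "webscraper-tutorial-example"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Without a key: the public MediaWiki API endpoint on Wikipedia.
no_key_url = build_url(
    "https://en.wikipedia.org/w/api.php",
    {"action": "query", "format": "json", "titles": "Web scraping"},
)

# With a key: a placeholder endpoint and key, for illustration only.
key_url = build_url("https://api.example.com/v1/data", {"q": "x"}, api_key="SECRET")

# Hypothetical response body, parsed the same way fetch_json would parse it.
SAMPLE_RESPONSE = '{"query": {"pages": [{"title": "Web scraping"}]}}'
data = json.loads(SAMPLE_RESPONSE)
titles = [page["title"] for page in data["query"]["pages"]]
print(no_key_url)
print(titles)
```

In practice, an API key is better read from an environment variable than written into the notebook.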

Resources

Beautiful Soup Documentation

Python Docs
