lorey/mlscraper-experiments

mlscraper-experiment

This repository tries out some ideas for extending my main library, mlscraper.

Features:

  • scraping arbitrary items (dicts, lists, lists of dicts, etc.)
  • smart scraper selection

Structure

This class diagram shows the basic relationships.

[Figure: class diagram]

Terminology

  • Scraper: turns a page into an item by scraping its HTML
  • Sample: one item on a page (to be scraped later), i.e. what the user provides as input
  • Match: one possible occurrence of a sample, i.e. the nodes in which the sample occurs
  • Extractor: gets the value out of a DOM node
  • Selector: an algorithm that selects nodes
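
The terms above can be illustrated with a minimal sketch. This is not mlscraper's actual API: every class and function name here (Sample, Match, Scraper, text_extractor, class_selector, find_matches) is hypothetical, and the standard library's xml.etree.ElementTree stands in for a real HTML parser, so it only handles well-formed markup.

```python
# Hypothetical illustration of the terminology; NOT mlscraper's real API.
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    """One item on a page, as provided by the user."""
    page: ET.Element
    value: str


@dataclass
class Match:
    """One node in which the sample's value occurs."""
    node: ET.Element


def text_extractor(node: ET.Element) -> str:
    """Extractor: gets the value out of a node (here: its stripped text)."""
    return (node.text or "").strip()


def class_selector(class_name: str) -> Callable[[ET.Element], List[ET.Element]]:
    """Selector: an algorithm that selects nodes (here: by class attribute)."""
    def select(page: ET.Element) -> List[ET.Element]:
        return [n for n in page.iter() if n.get("class") == class_name]
    return select


def find_matches(sample: Sample) -> List[Match]:
    """Find every node whose extracted text equals the sample's value."""
    return [Match(n) for n in sample.page.iter()
            if text_extractor(n) == sample.value]


@dataclass
class Scraper:
    """Scraper: turns a page into an item using a selector and an extractor."""
    selector: Callable[[ET.Element], List[ET.Element]]
    extractor: Callable[[ET.Element], str]

    def scrape(self, page: ET.Element) -> Optional[str]:
        nodes = self.selector(page)
        return self.extractor(nodes[0]) if nodes else None


# "Train" on one page with a known sample, then scrape a different page.
train_page = ET.fromstring(
    '<html><body><h1 class="title">Hello</h1>'
    '<p class="price">42 EUR</p></body></html>'
)
sample = Sample(train_page, "42 EUR")
matches = find_matches(sample)

# Derive a selector from the first match (assumes it carries a class).
scraper = Scraper(class_selector(matches[0].node.get("class")), text_extractor)

other_page = ET.fromstring('<html><body><p class="price">7 EUR</p></body></html>')
result = scraper.scrape(other_page)
```

In this sketch the selector is derived naively from the first match's class attribute; choosing among several candidate selectors is where something like the "smart scraper selection" feature would presumably come in.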

Does mlscraper support ...?

  • scraping arbitrary items? yes
  • scraping dicts with missing values? yes
  • detecting specific pages that have no results? no
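
As a rough illustration of what scraping a dict with missing values could look like, the sketch below returns None for any field whose node is absent instead of failing. It assumes "missing values" means fields that do not appear on a given page; the function scrape_dict and the class-based lookup are hypothetical and are not taken from mlscraper.

```python
# Hypothetical sketch; scrape_dict is not part of mlscraper.
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def scrape_dict(page: ET.Element, field_classes: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Scrape one dict item; fields without a matching node become None."""
    item: Dict[str, Optional[str]] = {}
    for field, class_name in field_classes.items():
        # Take the first node carrying the expected class, if any.
        node = next((n for n in page.iter() if n.get("class") == class_name), None)
        item[field] = (node.text or "").strip() if node is not None else None
    return item


# This page has a name but no price node.
page = ET.fromstring('<div><span class="name">Widget</span></div>')
item = scrape_dict(page, {"name": "name", "price": "price"})
```

Here item keeps the "price" key with a None value, so callers can tell a missing field apart from an empty string.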
