
An easy to use application for consuming text content from any website in the command line

mk-hill/brows

brows

An easy to use application for consuming text content from any website in the command line. Uses CSS selectors to retrieve content.

brows demo: shows basic usage, importing, and groups

Installation

npm install -g brows

Usage

brows can be used either with one URL followed by one selector, or with any number of saved target or group names.

brows [options] <url> <selector>
brows [options] <name> [<name> ...]

Options

Option               Alias  Description
--save <name>        -s     Save target or group under the given name for future use
--save-only <name>          Save target or group and exit without retrieving content
--html               -h     Retrieve outer HTML instead of text content
--all-matches        -a     Target all matching elements instead of just the first one
--delim              -d     Set delimiter between results for -a, defaults to newline
--force-browser      -f     Skip the initial request and always launch a browser
--list-saved         -l     Print a list of all saved targets and groups
--import <source>    -i     Import targets and groups from source file
--export <target>    -e     Export all saved targets and groups to target file
--ordered-print      -o     Print results in the order their targets were passed
--verbose            -v     Print information about what is being done
--yes                -y     Accept confirmation prompts without displaying them
--help                      Print a detailed explanation of usage and options

Examples

Basic usage

By default, brows will retrieve the first matching HTML element's text content.

$ brows info.cern.ch/hypertext/WWW/TheProject.html h1
World Wide Web

The --html option can be used to retrieve its outer HTML instead.

$ brows -h info.cern.ch/hypertext/WWW/TheProject.html h1
<h1>World Wide Web</h1>

--all-matches will target all elements matching the selector. By default, results are separated by a newline.

$ brows -a todomvc.com/examples/react 'ul:first-of-type li'
Tutorial
Philosophy
Support
Flux architecture example
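The three switches above (default text content, --html, and --all-matches with its delimiter) can be summarized in a few lines. This TypeScript sketch is not brows' source: `MatchedElement`, `ExtractOptions`, and `extractContent` are hypothetical names, the matched elements are assumed to come from some HTML parser or browser page, and trimming whitespace from text content is an assumption.

```typescript
// Hypothetical stand-in for a matched DOM element; a real implementation
// would obtain these from an HTML parser or a browser page.
interface MatchedElement {
  textContent: string | null;
  outerHTML: string;
}

type ContentType = "textContent" | "outerHTML";

interface ExtractOptions {
  contentType: ContentType; // --html switches this to "outerHTML"
  allMatches: boolean;      // --all-matches
  delim: string;            // --delim, "\n" by default
}

// Turns the elements matched by a selector into the string that is printed.
// Returns null when the selector matched nothing.
function extractContent(matches: MatchedElement[], opts: ExtractOptions): string | null {
  if (matches.length === 0) return null;
  const read = (el: MatchedElement): string =>
    opts.contentType === "outerHTML"
      ? el.outerHTML
      : (el.textContent ?? "").trim(); // trimming is an assumption
  // Without --all-matches, only the first match is used.
  const selected = opts.allMatches ? matches : matches.slice(0, 1);
  return selected.map(read).join(opts.delim);
}
```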

Options can be placed anywhere.

$ brows info.cern.ch/hypertext/WWW/TheProject.html h1 -v
# ...
Found h1 in response data
World Wide Web

Saving targets

Targets can be saved under a given name using --save or --save-only. Content type and other options, such as --all-matches and --delim, are saved as well.

$ brows --save-only listItems todomvc.com/examples/react 'ul:first-of-type li' -a -d ', '
$ brows -s titleHtml info.cern.ch/hypertext/WWW/TheProject.html h1 -h
<h1>World Wide Web</h1>

This name can then be used in future executions.

$ brows listItems
Tutorial, Philosophy, Support, Flux architecture example

Multiple saved names can be used at a time.

$ brows titleHtml listItems
titleHtml: <h1>World Wide Web</h1>
listItems: Tutorial, Philosophy, Support, Flux architecture example

Saving groups

Multiple saved targets can also be grouped under a different name.

$ brows 'google.com/search?q=weather' '#wob_ttm' --save-only temperature
$ brows 'google.com/search?q=weather' '#wob_pp' --save-only precipitation
$ brows temperature precipitation --save-only weather
$ brows weather
temperature: 28
precipitation: 64%

It's generally much faster to retrieve all desired content together rather than performing a separate run for each target.

Groups can themselves include other groups, which makes it easy to combine saved targets and groups for content you expect to retrieve frequently.

$ brows --save-only latestKurzgesagt 'youtube.com/user/Kurzgesagt/videos?sort=dd' '#video-title'
$ brows --save-only availability https://amazon.com/How-Absurd-Scientific-Real-World-Problems/dp/0525537090 '#availability span'
$ brows --save-only examples weather availability latestKurzgesagt titleHtml listItems

Results are printed as they are retrieved by default.

$ brows examples
titleHtml: <h1>World Wide Web</h1>
listItems: Tutorial, Philosophy, Support, Flux architecture example
temperature: 28
precipitation: 64%
latestKurzgesagt: Why Are You Alive – Life, Energy & ATP
availability: Temporarily out of stock.
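Expanding a group such as examples can be pictured as a depth-first walk that keeps each target only once, in the order it is first reached, which matches the notes on groups under Additional Details. This is a minimal sketch assuming a hypothetical in-memory `saved` map rather than brows' actual storage; `SavedEntry` and `expandNames` are illustrative names, and the cycle check is a defensive addition.

```typescript
// Hypothetical store entry: a saved name is either a target or a group
// whose members are other saved names (targets or groups).
type SavedEntry =
  | { kind: "target"; selector: string }
  | { kind: "group"; members: string[] };

// Expands names (possibly nested groups) into target names, preserving the
// order they were passed and keeping each target only once.
function expandNames(names: string[], saved: Map<string, SavedEntry>): string[] {
  const result: string[] = [];
  const seen = new Set<string>();

  const visit = (name: string, path: Set<string>): void => {
    const entry = saved.get(name);
    if (!entry) throw new Error(`No saved target or group named "${name}"`);
    if (entry.kind === "target") {
      if (!seen.has(name)) {
        seen.add(name);
        result.push(name);
      }
      return;
    }
    // Defensive: refuse groups that (indirectly) contain themselves.
    if (path.has(name)) throw new Error(`Group "${name}" contains itself`);
    path.add(name);
    entry.members.forEach((member) => visit(member, path));
    path.delete(name);
  };

  names.forEach((name) => visit(name, new Set<string>()));
  return result;
}
```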

Importing and exporting

--import and --export accept either a relative or an absolute path.

$ brows -i /absolute/path/to/example.yaml
$ brows -e readme_examples.yml

A default file name will be used if the provided path is a directory.

$ brows -e .
$ ls
brows_exports.yml
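Resolving the path passed to --import or --export might look like the following. The sketch assumes the default name is the `brows_exports.yml` shown above and uses POSIX separators; `resolveDataPath` is a hypothetical name, and the directory check is passed in as a parameter (in Node, `fs.statSync(p).isDirectory()` could supply it).

```typescript
// Assumed default file name, taken from the example output above.
const DEFAULT_FILE_NAME = "brows_exports.yml";

// Returns the file path to read or write. If the given path is a directory,
// the default file name is appended (POSIX-style separator assumed).
function resolveDataPath(p: string, isDirectory: (p: string) => boolean): string {
  if (!isDirectory(p)) return p;
  return p.replace(/\/+$/, "") + "/" + DEFAULT_FILE_NAME;
}
```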

brows will prompt for confirmation before overwriting anything by default.

$ brows -i .
8 names match existing ones and would be overwritten: availability, precipitation, temperature, titleHtml, listItems, latestKurzgesagt, examples, weather
Import anyway? Y/N:
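Finding the names that would be overwritten only requires a set lookup. In this sketch, `findOverwrites` and `overwritePrompt` are hypothetical helpers; the plural message copies the output above, while the singular wording is an assumption.

```typescript
// Returns the incoming names that already exist among the saved names,
// in the order they appear in the imported file.
function findOverwrites(existing: Set<string>, incoming: string[]): string[] {
  return incoming.filter((name) => existing.has(name));
}

// Builds the confirmation prompt, or returns null if nothing would be
// overwritten (so no prompt is needed).
function overwritePrompt(conflicts: string[]): string | null {
  if (conflicts.length === 0) return null;
  const phrase =
    conflicts.length === 1
      ? "name matches an existing one and" // assumed singular wording
      : "names match existing ones and";   // wording from the example above
  return `${conflicts.length} ${phrase} would be overwritten: ${conflicts.join(", ")}\nImport anyway? Y/N:`;
}
```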

Overriding defaults

--yes will accept any confirmation prompts which would have otherwise been displayed.

$ brows -i . -y

--delim can be used to specify a different delimiter than the default newline for --all-matches.

$ brows -a -d ', ' todomvc.com/examples/react 'ul:first-of-type li'
Tutorial, Philosophy, Support, Flux architecture example

The --ordered-print option can be used to wait for all results to be ready and print them in the order their targets were passed instead of printing each result as it's retrieved.

$ brows examples -o
temperature: 28
precipitation: 64%
availability: Temporarily out of stock.
latestKurzgesagt: Why Are You Alive – Life, Energy & ATP
titleHtml: <h1>World Wide Web</h1>
listItems: Tutorial, Philosophy, Support, Flux architecture example
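The difference between the default behavior and --ordered-print is the difference between printing inside each pending retrieval and awaiting all of them first. A sketch, assuming a hypothetical `fetchContent` callback that resolves to one target's content; `runTargets` is an illustrative name, and the `name: ` prefix for multiple targets follows the outputs shown earlier.

```typescript
// Hypothetical callback that retrieves the content of one saved target.
type Fetcher = (name: string) => Promise<string>;

// Prints results as each one arrives (default) or, when `ordered` is true,
// only after all are ready and in the order the names were passed.
async function runTargets(
  names: string[],
  fetchContent: Fetcher,
  ordered: boolean,
  print: (line: string) => void,
): Promise<void> {
  // With more than one target, each line is prefixed with its name.
  const label = (name: string, value: string): string =>
    names.length > 1 ? `${name}: ${value}` : value;

  if (ordered) {
    // Wait for every retrieval, then print in argument order.
    const values = await Promise.all(names.map((name) => fetchContent(name)));
    values.forEach((value, i) => print(label(names[i], value)));
  } else {
    // Start all retrievals together; print each one as soon as it resolves.
    await Promise.all(
      names.map(async (name) => print(label(name, await fetchContent(name)))),
    );
  }
}
```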

brows decides automatically whether a browser is needed, which covers the vast majority of use cases. The --force-browser option overrides this: the initial request is skipped and a browser is launched directly.

$ brows my-single-page-app.com html -h --force-browser > spa.html
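The automatic decision follows the steps listed under Additional Details: try a plain request first, fall back to a headless browser, and remember targets that needed one. This sketch puts the network, the browser, and selector matching behind a hypothetical `PageSource` interface; `retrieve` and `SavedTarget` are illustrative names, not brows' API.

```typescript
// Hypothetical abstraction over how pages are loaded and searched.
interface PageSource {
  requestHtml(url: string): Promise<string>;     // plain GET of the page HTML
  renderInBrowser(url: string): Promise<string>; // HTML after a headless browser renders it
  find(html: string, selector: string): string | null; // null if not found
}

interface SavedTarget {
  url: string;
  selector: string;
  forceBrowser: boolean;
}

async function retrieve(target: SavedTarget, source: PageSource): Promise<string | null> {
  if (!target.forceBrowser) {
    const found = source.find(await source.requestHtml(target.url), target.selector);
    if (found !== null) return found;
    // Not in the static response: mark the target so later runs skip the
    // request and go straight to the browser.
    target.forceBrowser = true;
  }
  return source.find(await source.renderInBrowser(target.url), target.selector);
}
```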

Import/Export Format

The import/export format is designed to make creating, editing, and transferring any number of targets and groups as easy as possible:

  • Uses the easy-to-read, quick-to-type YAML format by default.
  • Targets are listed under their URLs.
  • Defaults don't need to be entered.
  • If no other options are being entered, each target name can be directly mapped to its corresponding selector.
  • As in the command line, http:// is automatically prepended to the URL if it doesn't begin with http:// or https://.
  • Groups can be entered as arrays of target names in any valid YAML format.
  • You don't need to specify whether a browser is needed except for niche use cases.
Targets:
  example.com:
    myHeader: h1
    mySpan: div span.my-span
  example2.com:
    myAnchors:
      selector: a
      contentType: outerHTML
      allMatches: true
Groups:
  myGroup: [myHeader, mySpan]
  anotherGroup: [mySpan, myAnchors]

The example above is effectively the same as:

Targets:
  http://example.com:
    myHeader:
      selector: h1
      contentType: textContent
      forceBrowser: false
      allMatches: false
    mySpan:
      selector: div span.my-span
      contentType: textContent
      forceBrowser: false
      allMatches: false
  http://example2.com:
    myAnchors:
      selector: a
      contentType: outerHTML
      forceBrowser: false
      allMatches: true
      delim: "\n"
Groups:
  myGroup:
    - myHeader
    - mySpan
  anotherGroup:
    - mySpan
    - myAnchors
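Expanding the short form into the long form is mechanical. This sketch (hypothetical names `normalizeTargets`, `normalizeTarget`, `normalizeUrl`) applies the defaults shown in the expanded example and the `http://` rule; it assumes the YAML or JSON file has already been parsed into plain objects and covers only the `Targets` section.

```typescript
type ContentType = "textContent" | "outerHTML";

interface TargetOptions {
  selector: string;
  contentType: ContentType;
  forceBrowser: boolean;
  allMatches: boolean;
  delim?: string; // only meaningful when allMatches is true
}

// A file entry is either a bare selector or a partial options object.
type TargetEntry = string | (Partial<TargetOptions> & { selector: string });

// Prepends http:// unless the URL already starts with http:// or https://.
function normalizeUrl(url: string): string {
  return /^https?:\/\//.test(url) ? url : `http://${url}`;
}

// Fills in the defaults from the expanded example.
function normalizeTarget(entry: TargetEntry): TargetOptions {
  const given = typeof entry === "string" ? { selector: entry } : entry;
  const full: TargetOptions = {
    contentType: "textContent",
    forceBrowser: false,
    allMatches: false,
    ...given,
  };
  if (full.allMatches && full.delim === undefined) full.delim = "\n";
  return full;
}

function normalizeTargets(
  file: Record<string, Record<string, TargetEntry>>,
): Record<string, Record<string, TargetOptions>> {
  const out: Record<string, Record<string, TargetOptions>> = {};
  for (const [url, targets] of Object.entries(file)) {
    const normalized: Record<string, TargetOptions> = {};
    for (const [name, entry] of Object.entries(targets)) {
      normalized[name] = normalizeTarget(entry);
    }
    out[normalizeUrl(url)] = normalized;
  }
  return out;
}
```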

Targets and groups saved in the command-line examples earlier are exported as:

Targets:
  google.com/search?q=weather:
    precipitation:
      forceBrowser: true
      selector: '#wob_pp'
    temperature:
      forceBrowser: true
      selector: '#wob_ttm'
  https://amazon.com/How-Absurd-Scientific-Real-World-Problems/dp/0525537090:
    availability: '#availability span'
  info.cern.ch/hypertext/WWW/TheProject.html:
    titleHtml:
      contentType: outerHTML
      selector: h1
  todomvc.com/examples/react:
    listItems:
      allMatches: true
      delim: ', '
      forceBrowser: true
      selector: ul:first-of-type li
  youtube.com/user/Kurzgesagt/videos?sort=dd:
    latestKurzgesagt:
      forceBrowser: true
      selector: '#video-title'
Groups:
  examples:
    - temperature
    - precipitation
    - availability
    - latestKurzgesagt
    - titleHtml
    - listItems
  weather:
    - temperature
    - precipitation
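The export above goes the other way: fields with default values are omitted, and a target whose only remaining field is its selector is written in the short `name: selector` form (as with `availability`). A sketch of that compaction, with `compactTarget` as a hypothetical name; the exported keys above are also sorted alphabetically, which this sketch does not do.

```typescript
type ContentType = "textContent" | "outerHTML";

interface TargetOptions {
  selector: string;
  contentType: ContentType;
  forceBrowser: boolean;
  allMatches: boolean;
  delim?: string;
}

// Exported form: a bare selector, or only the non-default fields.
type ExportedTarget = string | (Partial<TargetOptions> & { selector: string });

function compactTarget(t: TargetOptions): ExportedTarget {
  const out: Partial<TargetOptions> & { selector: string } = { selector: t.selector };
  if (t.contentType !== "textContent") out.contentType = t.contentType;
  if (t.forceBrowser) out.forceBrowser = true;
  if (t.allMatches) {
    out.allMatches = true;
    // A delimiter is only kept when it differs from the default newline;
    // without allMatches it has no effect and is dropped.
    if (t.delim !== undefined && t.delim !== "\n") out.delim = t.delim;
  }
  // Only the selector is left: use the shorthand "name: selector" form.
  return Object.keys(out).length === 1 ? t.selector : out;
}
```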

Additional Details

  • By default, brows will initially make a GET request to the URL and attempt to find the selector in the response HTML. If this fails, a headless browser will be used instead.
  • If a saved target isn't found in the response data on the first attempt, it will be automatically updated to skip the unnecessary request in the future and directly launch the browser.
  • When multiple saved names are passed, brows will only make a request (and/or navigate a browser page) to each URL once. All targets in the same URL will be retrieved from the same response data and/or browser page.
  • Saving multiple targets with a new name will create a group. Groups are essentially just aliases which expand to their member targets in the order they were passed when saving.
  • When saving or retrieving content from multiple overlapping groups, each individual target is only used once. No duplicates will be retrieved or saved under the new combined group.
  • Conventional HTTP_PROXY/HTTPS_PROXY/NO_PROXY environment variables will be used if they exist.
  • Importing JSON files with the same structure as the YAML examples above is also supported without any additional configuration. Just pass a JSON file instead.
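Loading each URL only once amounts to grouping targets by URL before anything is requested. A minimal sketch, assuming hypothetical `load` (plain request or browser page) and `find` callbacks and a simplified `Target` shape; `groupByUrl` and `retrieveAll` are illustrative names.

```typescript
interface Target {
  name: string;
  url: string;
  selector: string;
}

// Groups targets by URL, preserving the order in which URLs first appear.
function groupByUrl(targets: Target[]): Map<string, Target[]> {
  const byUrl = new Map<string, Target[]>();
  for (const t of targets) {
    const list = byUrl.get(t.url);
    if (list) list.push(t);
    else byUrl.set(t.url, [t]);
  }
  return byUrl;
}

// Loads each distinct URL once and looks up every target on that page.
async function retrieveAll(
  targets: Target[],
  load: (url: string) => Promise<string>,
  find: (html: string, selector: string) => string | null,
): Promise<Map<string, string | null>> {
  const results = new Map<string, string | null>();
  await Promise.all(
    Array.from(groupByUrl(targets)).map(async ([url, group]) => {
      const html = await load(url); // one load per URL
      for (const t of group) results.set(t.name, find(html, t.selector));
    }),
  );
  return results;
}
```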
