Perform a lookup by CSS selector on an HTML input
Python
Latest commit aa85dc1 Feb 22, 2017 @plainas committed on GitHub added note about python3 requirement and pip


tq

tq is a command-line utility that selects HTML elements from HTML content passed on stdin, using the CSS selectors that everybody knows.

Since input comes from stdin and output is sent to stdout, it can easily be used inside traditional UNIX pipelines to extract content from web pages and HTML files.

tq provides extra formatting options, such as JSON encoding and newline squashing, so it can play nicely with everyone's favourite command-line tooling.

Installation

sudo pip install https://github.com/plainas/tq/zipball/stable

WARNING: tq requires Python 3. On some systems, pip installs Python 2 packages. In that case, use pip3 instead.

Example usage

Get headlines from Hacker News

curl https://news.ycombinator.com/news | tq -tj ".title a"

Get the title of an html document stored in a file

cat mydocument.html | tq -t title

Get all the images from a webpage

curl -s 'http://example.com/' | tq "img" -a src | wget -i -
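To make clear what the `img` selector combined with `-a src` picks out, here is a minimal Python sketch that collects one attribute from every occurrence of a tag using only the standard library. It is an illustration of the idea, not tq's implementation: the names `AttributeCollector` and `collect_attribute` are hypothetical, and it handles only a bare tag name rather than full CSS selectors.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect one attribute from every occurrence of a given tag.

    Hypothetical helper for illustration; it matches a bare tag name only,
    not an arbitrary CSS selector.
    """

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases names.
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == self.attribute and value is not None:
                self.values.append(value)


def collect_attribute(html, tag, attribute):
    """Return the attribute values found on matching tags, in document order."""
    collector = AttributeCollector(tag, attribute)
    collector.feed(html)
    collector.close()
    return collector.values


sample = '<p><img src="/a.png" alt="A"><img alt="no source"><img src="b.jpg"></p>'
for value in collect_attribute(sample, "img", "src"):
    print(value)
```

Elements without the requested attribute are skipped in this sketch. Relative values such as `/a.png` come out exactly as written in the page, so a downloader that expects absolute URLs may need them resolved against the page address first.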

Note that tq does not make HTTP requests or read files itself. Fetch the page with your favorite HTTP client, or pipe in HTML from any other source.

For a modern, user-friendly HTTP client, try httpie. Alternatively, use curl, wget, netcat, etc.

Command options

  • SELECTOR A CSS selector.

  • -a ATTRIBUTE, --attr=ATTRIBUTE Outputs only the value of the given HTML ATTRIBUTE.

  • -t, --text Outputs only the inner text of the selected elements.

  • -q, --squash Squash lines.

  • -s, --squash-space Squash spaces.

  • -j, --json-lines JSON encode each match.

  • -J, --json Output as a JSON array of strings.

  • -v, --version Prints the tq version.
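The difference between the two JSON options is easiest to see side by side. The sketch below shows, with Python's `json` module, the two output shapes the descriptions suggest: one JSON-encoded string per line versus a single JSON array of strings. The function names and sample matches are hypothetical, and the exact bytes tq emits may differ (for example in spacing or escaping).

```python
import json

# Hypothetical matches, standing in for the text of three selected elements.
matches = ["First headline", 'Second "quoted" headline', "Line one\nline two"]


def as_json_lines(items):
    """One JSON-encoded string per line, in the spirit of -j / --json-lines."""
    return "\n".join(json.dumps(item) for item in items)


def as_json_array(items):
    """A single JSON array of strings, in the spirit of -J / --json."""
    return json.dumps(items)


print(as_json_lines(matches))
print(as_json_array(matches))
```

Because JSON encoding escapes embedded newlines and quotes, each match stays on exactly one line in the first form, which is what makes it safe to consume with line-oriented tools.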