pquery

grep for HTML; CLI for pyquery

Demo

$ curl -s https://github.com/hupili/pquery | pquery '.content a' -p text
.gitignore
LICENSE
MANIFEST.in
README.md
pquery
setup.py

pquery is intended to integrate into your UNIX pipeline.

Install

pip install pquery

Syntax

Usage:
    pquery <selector>
    pquery <selector> -p <projector>
    pquery <selector> -f <format_string>
    pquery -h | --help

Options:
    -p: project the dict onto field `<projector>`.
    -f: equivalent of `<format_string>.format(item)`,
        where item is the dict form of one selected HTML element.
    -h | --help: show this doc.

Dict keys:
    'tag': The HTML tag
    'html': Inner HTML of the element
    'text': Inner text of the element
    ...: [optional] Other attributes: e.g. 'href'
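The sketch below shows what `-p` and `-f` do to the dict form of a single element. It rests on several assumptions. The `item` is hand-built and hypothetical. `project` and `render` are illustrative helper names, not pquery's API. A missing field is assumed to print as an empty string. `render` follows the option text literally (`<format_string>.format(item)`), so fields are addressed positionally as `{0[key]}`. If the real tool instead expands the dict as keyword arguments, the placeholders would be `{href}` and so on.

```python
# Hypothetical dict form of one selected <a> element, built by hand to
# mirror the keys listed above: 'tag', 'html', 'text', plus attributes.
item = {
    "tag": "a",
    "html": "Week 1 slides",
    "text": "Week 1 slides",
    "href": "/slides/week1.pdf",
}


def project(item, projector):
    """Mimic -p: pick a single field out of the element dict."""
    # Assumption: a missing field yields an empty string rather than an error.
    return item.get(projector, "")


def render(item, format_string):
    """Mimic -f: apply <format_string>.format(item), as the option text says."""
    # The dict is passed positionally, so fields are written as {0[key]}.
    return format_string.format(item)


print(project(item, "href"))                   # /slides/week1.pdf
print(render(item, "{0[text]} -> {0[href]}"))  # Week 1 slides -> /slides/week1.pdf
```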

Why

grep is powerful for line-oriented text, but HTML is structured and does not lend itself to line-based processing. A CSS selector is a natural grep for HTML. This script simply wraps pyquery to provide a CLI.
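The following stdlib-only sketch illustrates why structure matters. It is not pyquery and not pquery's actual code. It uses Python's `html.parser` to build element dicts and supports only a bare tag name in place of a full CSS selector. It also omits the `'html'` key and does not handle nested elements with the same tag. `TagCollector` is a hypothetical name. In the sample page, the first link's `href` sits on a different line from its `<a`, so a line-oriented `grep '<a href'` would miss it, while a parser still sees one element.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect one dict per element whose tag name equals `wanted`.

    A stand-in for a CSS selector: only bare tag names are supported,
    and nested elements with the same tag are not handled.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.items = []
        self._current = None  # dict being filled while inside a matching tag

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted and self._current is None:
            # Attributes become extra keys (e.g. 'href'); 'tag' and 'text' win on clashes.
            self._current = {k: (v or "") for k, v in attrs}
            self._current.update(tag=tag, text="")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == self.wanted and self._current is not None:
            self.items.append(self._current)
            self._current = None


page = """<ul>
  <li><a
        href="week1.pdf">Week 1</a></li>
  <li><a href="week1.pptx">Week 1 (pptx)</a></li>
</ul>"""

parser = TagCollector("a")
parser.feed(page)
parser.close()
for element in parser.items:
    print(element["href"])
```

Both links are printed, including the one whose attribute was split across lines.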

Example 1

A course webpage lists slides in both PDF and PPTX formats, and we want to download all the PDFs. This saves you some clicks.

wget --load-cookies=cookies.txt -O- 'https://class.coursera.org/crypto-008/wiki/LectureSlidesPublicCourse' | pquery a -p href | grep pdf | xargs -P 5 -I{} wget {}

Grepping the PDF links directly out of the raw HTML would be tedious.
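In the pipeline above, `grep pdf` keeps every href that contains "pdf" anywhere, and `wget` receives the hrefs exactly as they appear on the page. The Python sketch below applies a stricter filter, assuming you want only links whose URL path ends in `.pdf`. It also resolves relative links against the page URL, which only matters if the page uses relative hrefs. The function name `pdf_links`, the base URL and the sample hrefs are all hypothetical.

```python
from urllib.parse import urljoin, urlparse


def pdf_links(hrefs, base_url):
    """Keep links whose URL path ends in .pdf, made absolute against base_url."""
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # relative hrefs need a base before wget can fetch them
        if urlparse(absolute).path.lower().endswith(".pdf"):  # ignore case and the query string
            result.append(absolute)
    return result


base = "https://example.org/course/wiki/Slides"  # hypothetical page URL
hrefs = [
    "week1.pdf",
    "week1.pptx",
    "/static/pdf-help.html",  # contains "pdf" but is not a PDF
    "https://cdn.example.org/w2.PDF?dl=1",
]
for url in pdf_links(hrefs, base):
    print(url)
```

Unlike `grep pdf`, this drops `/static/pdf-help.html` and keeps the upper-case `.PDF` link that has a query string.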
