hq

jq, but for HTML. Try it in your browser here

hq reads HTML and converts it into a JSON object based on a series of CSS selectors. Queries are written in a JSON-like syntax in which the values are CSS selectors. For example:

{posts: .athing | [ {title: .titleline > a, url: .titleline > a | @(href)} ] }

This selects all .athing elements and creates an array (| [{...}]) containing one object per selected element. For each element, it selects the text of the .titleline > a element and its href attribute (| @(href)).

The end result is the following structure:

{
  "posts": [
    {
      "title": "...",
      "url": "..."
    }
  ]
}
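To make the mapping from query to output concrete, here is a rough Python model of what the query above describes. It is a sketch of the semantics only, not how hq is implemented: the sample markup, the helper names (has_class, first_title_link, extract_posts) and the use of xml.etree.ElementTree are all assumptions made for illustration, and the sample has to be well-formed XML for this parser to accept it.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample standing in for a real page.
SAMPLE = """
<table>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/a">First story</a></span></td></tr>
  <tr class="athing"><td><span class="titleline"><a href="https://example.com/b">Second story</a></span></td></tr>
</table>
""".strip()


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in el.get("class", "").split()


def first_title_link(row):
    """Model of `.titleline > a`: first <a> that is a direct child of a .titleline element."""
    for el in row.iter():
        if has_class(el, "titleline"):
            link = el.find("a")  # find() only looks at direct children
            if link is not None:
                return link
    return None


def extract_posts(html):
    """Model of `{posts: .athing | [ {title: ..., url: ...} ]}`."""
    root = ET.fromstring(html)
    posts = []
    for row in root.iter():
        if not has_class(row, "athing"):
            continue
        link = first_title_link(row)
        posts.append({
            # A bare selector yields the element's text content.
            "title": "".join(link.itertext()).strip() if link is not None else None,
            # `| @(href)` yields the named attribute instead.
            "url": link.get("href") if link is not None else None,
        })
    return {"posts": posts}


result = extract_posts(SAMPLE)
print(json.dumps(result, indent=2))
```

Each .athing row becomes one object in the posts array, which is the role of the | [ {...} ] part of the query.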

Install

brew install hq, or cargo install html-query

Special query syntax

Text

.foo | @text

This will select the text content from the first element matching .foo.

Selecting attributes

.foo | @(href)

This will select the href attribute from the first element matching .foo.

Parents

.foo | @parent

This will return the parent element from the first element matching .foo.

Siblings

.foo | @sibling(1)

This will return the sibling element at offset 1 (the next sibling) of the first element matching .foo.
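The two navigation operators can be pictured with a small Python model. This is only a sketch under stated assumptions: the sample markup and the helper names (first_with_class, parent, sibling) are hypothetical, and it assumes that the argument to @sibling is a forward offset among element siblings. It does not use or reproduce hq's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup.
SAMPLE = """
<ul id="list">
  <li class="foo">one</li>
  <li class="bar">two</li>
  <li class="baz">three</li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree keeps no parent pointers, so build a child -> parent map once.
parent_of = {child: par for par in root.iter() for child in par}


def first_with_class(name):
    """First element in document order whose class attribute contains name."""
    for el in root.iter():
        if name in el.get("class", "").split():
            return el
    return None


def parent(el):
    """Model of `@parent`: the element's parent, or None for the root."""
    return parent_of.get(el)


def sibling(el, offset):
    """Model of `@sibling(offset)`, assuming offset counts element siblings forward."""
    par = parent_of.get(el)
    if par is None:
        return None
    children = list(par)
    idx = children.index(el) + offset
    return children[idx] if 0 <= idx < len(children) else None


foo = first_with_class("foo")
print(parent(foo).tag, parent(foo).get("id"))
print(sibling(foo, 1).text)
```

Under that assumption, .foo | @parent corresponds to parent(foo) and .foo | @sibling(1) to sibling(foo, 1).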

Examples

Full Hacker News story extraction

{posts: .athing | [{href: .titleline > a | @(href), title: .titleline > a, meta: @sibling(1) | {user: .hnuser, posted: .age | @(title) }}]}

This selects each .athing element and extracts the title and the URL (from the href attribute). It then selects the sibling of each .athing element and extracts the user and post time from it:

{
  "posts": [
    {
      "title": "...",
      "href": "...",
      "meta": {
        "posted": "...",
        "user": "..."
      }
    }
  ]
}
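Because the result is plain JSON, it can be passed to any JSON-aware tool. As a minimal sketch, the Python below loads a document shaped like the output above and prints one summary line per post. The sample values and the summarize helper are made up for illustration; how the JSON reaches the script (a file, a pipe, and so on) is left open.

```python
import json

# Hypothetical sample shaped like the output above; the values are invented.
raw = """
{
  "posts": [
    {
      "title": "Example story",
      "meta": {"posted": "2024-01-01T00:00:00", "user": "alice"}
    }
  ]
}
"""


def summarize(doc):
    """Return one human-readable line per post in the extracted document."""
    lines = []
    for post in doc.get("posts", []):
        meta = post.get("meta") or {}  # tolerate posts without a meta object
        lines.append(
            f'{post.get("title")} (by {meta.get("user")}, posted {meta.get("posted")})'
        )
    return lines


lines = summarize(json.loads(raw))
for line in lines:
    print(line)
```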
