HTMLExtract

Francesco edited this page May 1, 2018 · 33 revisions

Introduced in t-ui beta 6.6

This feature lets you extract text from HTML pages and display it inside t-ui.

XPath

XPath is a query language for selecting particular nodes and tags in HTML/XML documents. Its syntax is compact and easy to learn, yet powerful.
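To get a feel for what an XPath expression selects, here is a minimal sketch in Python. It uses the standard library's ElementTree, which supports only a subset of XPath, and a made-up sample page; it is not how t-ui evaluates expressions internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
PAGE = """
<html>
  <body>
    <a class="foo" href="https://example.com/one">First</a>
    <a class="bar" href="https://example.com/two">Second</a>
    <div><a class="foo" href="https://example.com/three">Third</a></div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# ElementTree wants a leading "." for a search relative to the root element;
# this corresponds to the full XPath expression //a[@class="foo"].
matches = root.findall(".//a[@class='foo']")

for node in matches:
    # Tag name, one attribute value, and the text of each matched node.
    print(node.tag, node.get("href"), node.text)
```

Only the two links with class "foo" are selected, regardless of how deeply they are nested.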

JSONPath

JSONPath offers the same kind of node selection as XPath, but it works on JSON documents.

Warning

This JSONPath expression
$.[bid,ask,last]
won't work unless you quote each key, like this:
$.['bid','ask','last']
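As a rough illustration of what the quoted expression selects, the sketch below does the equivalent lookup in plain Python. The JSON response is invented, and the code assumes the usual JSONPath meaning of a union of quoted member names; it does not use t-ui's JSONPath engine.

```python
import json

# Hypothetical ticker response; only the key names come from the expression above.
RESPONSE = '{"bid": 101.5, "ask": 102.0, "last": 101.8, "volume": 3400}'

data = json.loads(RESPONSE)

# $.['bid','ask','last'] names three members of the root object ($).
# In plain Python that is a lookup of each key in turn, skipping missing ones.
wanted = ["bid", "ask", "last"]
selected = [data[key] for key in wanted if key in data]

print(selected)
```

The "volume" member is left out because the expression does not name it.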

Format

Values:

  • %n -> newline

  • %t -> tag name

  • %t(attributeName) -> the value of the attribute attributeName of the matched node

  • %a(format)(separator) -> prints every attribute of the matched node, each one rendered with format and joined by separator; inside format you can use:

    • %an -> attribute name
    • %av -> attribute value
  • %v -> tag value

  • #[URL] -> link

  • #rrggbb[text] -> color the text

  • #[replaceThis/with][replaceAlsoThis/withThis]... -> replaces text in the item that immediately follows the group

The replace group

Note that a replace group applies only to the word or group of words immediately after it. For instance

#[replace/with]%t%v

will affect only %t.

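Moreover, colors and links aren't allowed inside a replace group. You'll have to put them outside it.
The / symbol is the value of optional_value_separator. If that setting holds a different character (Example 3 below is written with :), use that character between the two halves of each pair.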

Example

Matched node:
<a href="https://github.com/Andre1299/TUI-ConsoleLauncher/subscription" class="myClass" role="button">This is a link</a>

Example 1

Format:
#[%t(href)]

Output:
https://github.com/Andre1299/TUI-ConsoleLauncher/subscription

Example 2

Format:
%t -> %v%n%a(%an = %av)(%n)

Output:

a -> This is a link
href = https://github.com/Andre1299/TUI-ConsoleLauncher/subscription
class = myClass
role = button
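
The expansion in Example 2 can be imitated with a short Python sketch. The function name expand and its arguments are hypothetical, it covers only %t, %t(attributeName), %v, %n and %a(format)(separator), and it ignores links, colors and replace groups. It is meant as a reading aid, not as t-ui's implementation.

```python
import re

# One pattern for all supported placeholders, so substituted text is never re-expanded.
TOKEN = re.compile(r"%a\(([^)]*)\)\(([^)]*)\)|%t\(([^)]*)\)|%([tvn])")

def expand(fmt, tag, value, attrs):
    """Expand a subset of the format placeholders for one matched node."""

    def render(match):
        item_fmt, separator, attr_name, simple = match.groups()
        if item_fmt is not None:  # %a(format)(separator)
            items = []
            for name, val in attrs.items():
                # %an -> attribute name, %av -> attribute value
                items.append(re.sub(r"%a([nv])",
                                    lambda m: name if m.group(1) == "n" else val,
                                    item_fmt))
            return separator.replace("%n", "\n").join(items)
        if attr_name is not None:  # %t(attributeName)
            return attrs.get(attr_name, "")
        return {"t": tag, "v": value, "n": "\n"}[simple]

    return TOKEN.sub(render, fmt)

# The matched node from the Example section.
NODE_ATTRS = {
    "href": "https://github.com/Andre1299/TUI-ConsoleLauncher/subscription",
    "class": "myClass",
    "role": "button",
}

print(expand("%t -> %v%n%a(%an = %av)(%n)", "a", "This is a link", NODE_ATTRS))
```

Calling expand("%t(href)", "a", "This is a link", NODE_ATTRS) returns just the URL, which is the text that Example 1 wraps in a link.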

Example 3

Format:
#[a:linkTag]%t -> #[is:is not][link:plain text]%v%n%a(%an = %av)(%n)

Output:

linkTag -> This is not a plain text
href = https://github.com/Andre1299/TUI-ConsoleLauncher/subscription
class = myClass
role = button
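
The first line of Example 3's output can be reproduced with the sketch below. It assumes that a replace group matches whole words only; that rule is a guess chosen because it yields the output shown above, and t-ui's real matching rule may differ. The helper name apply_replacements is hypothetical.

```python
import re

def apply_replacements(text, pairs):
    """Apply each (old, new) pair in order, matching whole words only (an assumption)."""
    for old, new in pairs:
        pattern = r"\b" + re.escape(old) + r"\b"
        # A function replacement keeps `new` literal (no backslash processing).
        text = re.sub(pattern, lambda m, new=new: new, text)
    return text

# #[a:linkTag]%t -> the group touches only the tag name that follows it.
tag_part = apply_replacements("a", [("a", "linkTag")])

# #[is:is not][link:plain text]%v -> both pairs are applied to the tag value.
value_part = apply_replacements("This is a link",
                                [("is", "is not"), ("link", "plain text")])

print(tag_part + " -> " + value_part)
```

Each group is applied only to the placeholder right after it, which is why the a inside "This is a link" is left alone.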

Steps

1. Find a webpage

2. Decide the node kind

You can select any number of nodes, but they will all be of the same kind. Decide carefully what kind of nodes you need.

3. Create a new XPath/JSONPath expression

4. Test!

5. Add the expression to t-ui

htmlextract -add [json OR xpath] [ID] [expression]

For instance:
htmlextract -add xpath 1 //a[@class="foo"]

6. Add a new format to t-ui (you can also use the default one)

htmlextract -add format [ID] [expression]

For instance:
htmlextract -add format 5 #[%t(href)]

7. Use it!

htmlextract -query [ID] [optional: Format ID] [webpage]
For instance:
htmlextract -query 1 5 https://website.com/page.html

Note that [Format ID] is optional: if you omit it, t-ui uses the value of htmlextract_default_format instead.