Query a document tree with selectors
Extracts nodes using a selector syntax that is a subset of the CSS selectors specification.
npm i mkql --save
For the command line interface, install mkdoc globally (npm i -g mkdoc).
- Install
- Usage
- Example
- Selectors
- Help
- API
- License
Pass selectors when creating the stream:
var ql = require('mkql')
  , ast = require('mkast');
ast.src('Paragraph\n\n* 1\n* 2\n* 3\n\n```javascript\nvar foo;\n```')
  .pipe(ql('p, ul, pre[info^=javascript]'))
  .pipe(ast.stringify({indent: 2}))
  .pipe(process.stdout);
mkcat README.md | mkql 'p, ul, pre[info^=javascript]' | mkout
printf 'Para 1\n\nPara 2\n\n* List item\n\n' | mkcat | mkql '*' | mkout -y
Implemented selectors work like their CSS counterparts; in some cases extensions specific to markdown tree nodes have been added.
Types are based on the equivalent HTML element name, so to select a node of paragraph type use p; the universal selector * will select nodes of any type.
The map of standard HTML tag names to node types is:
- p: paragraph
- ul: list
- ol: list
- li: item
- h1-h6: heading
- pre: code_block
- blockquote: block_quote
- hr: thematic_break
- code: code
- em: emph
- strong: strong
- a: link
- br: linebreak
- img: image
Extensions for markdown specific types:
- nl: softbreak
- text: text
- html: html_block
- inline: html_inline
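As a quick reference, the two lists above can be written as a lookup table. This is only an illustrative sketch: TAG_TO_TYPE and nodeType are hypothetical names, not part of the mkql API, and the library may store this mapping differently.

```javascript
// Illustrative lookup table built from the lists above.
// Not mkql's internal data structure.
const TAG_TO_TYPE = {
  p: 'paragraph',
  ul: 'list',
  ol: 'list',
  li: 'item',
  pre: 'code_block',
  blockquote: 'block_quote',
  hr: 'thematic_break',
  code: 'code',
  em: 'emph',
  strong: 'strong',
  a: 'link',
  br: 'linebreak',
  img: 'image',
  // markdown-specific extensions
  nl: 'softbreak',
  text: 'text',
  html: 'html_block',
  inline: 'html_inline'
};

// Hypothetical helper: resolve a selector tag name to a node type,
// returning null for names that are not in the table.
function nodeType(tag) {
  if (/^h[1-6]$/.test(tag)) {
    return 'heading';
  }
  return Object.prototype.hasOwnProperty.call(TAG_TO_TYPE, tag)
    ? TAG_TO_TYPE[tag]
    : null;
}

console.log(nodeType('ul'), nodeType('h3'), nodeType('inline'));
```

Note that ul and ol both resolve to the list type, so the tag name alone does not identify which one a node came from.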
Use whitespace for a descendant combinator or, if you prefer, use the explicit >> notation from CSS4:
ol li
ol >> li
A selector such as ol li will find all descendants; use the child combinator operator when you just want direct children:
ol > li
The adjacent sibling combinator is supported; select all lists that are directly preceded by a paragraph:
p + ul
The following sibling combinator is supported; select code that is preceded by a text node:
p text ~ code
You can match on attributes in the usual way, but attributes are matched against tree nodes, not HTML elements, so the attribute names are often different.
a[href^=http://domain.com]
See attribute selectors (@mdn) for more information on the available operators.
The operator =~ (not to be confused with ~=) is a non-standard operator that may be used to match by regular expression pattern:
img[src=~\.(png|jpg)$]
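To make the difference between ~= and =~ concrete, here is a minimal sketch of how the operators could be evaluated against a single attribute value. It assumes the standard operators follow CSS semantics and that =~ compiles its operand with RegExp; matchAttribute is a hypothetical helper, not a function exported by mkql.

```javascript
// Sketch only: evaluate one attribute operator against a string value.
function matchAttribute(value, operator, expected) {
  if (typeof value !== 'string') {
    return false;
  }
  switch (operator) {
    case '=': // exact match
      return value === expected;
    case '^=': // prefix match
      return expected !== '' && value.startsWith(expected);
    case '$=': // suffix match
      return expected !== '' && value.endsWith(expected);
    case '*=': // substring match
      return expected !== '' && value.includes(expected);
    case '~=': // one word in a whitespace-separated list
      return expected !== '' && !/\s/.test(expected) &&
        value.split(/\s+/).includes(expected);
    case '=~': // non-standard: regular expression match
      return new RegExp(expected).test(value);
    default:
      return false;
  }
}

console.log(matchAttribute('logo.png', '=~', '\\.(png|jpg)$'));
console.log(matchAttribute('an example here', '~=', 'example'));
```

With ~= the operand must equal a whole word of the value, whereas =~ treats the operand as a pattern that may match anywhere in the value.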
For all nodes that have a literal property, you may match on the literal attribute.
p text[literal~=example]
Nodes that have a literal property include:
- pre: code_block
- code: code
- text: text
- html: html_block
- inline: html_inline
The content attribute is available for containers that can contain text nodes. This is a more powerful (but slower) method to match on the text content.
Consider the document:
Paragraph with some *emphasis* and *italic*.
If we select on the literal attribute we would get a text node, for example:
p [literal^=emph]
Results in the child text node with a literal value of emphasis. Often we may wish to match the parent element instead; to do so, use the content attribute:
p [content^=emph]
Which returns the emph node containing the text node matched with the previous literal query.
The value for the content attribute is all the child text nodes concatenated together, which is why it will always be less performant than matching on the literal attribute.
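The difference between the two attributes can be sketched on a hand-built tree for the example document. The node shape used here (type, literal, children) is an assumption made for illustration and may not match the objects mkql actually sees; content and descendants are hypothetical helpers, and only direct child text nodes are concatenated.

```javascript
// Hand-built tree for: Paragraph with some *emphasis* and *italic*.
const paragraph = {
  type: 'paragraph',
  children: [
    {type: 'text', literal: 'Paragraph with some '},
    {type: 'emph', children: [{type: 'text', literal: 'emphasis'}]},
    {type: 'text', literal: ' and '},
    {type: 'emph', children: [{type: 'text', literal: 'italic'}]},
    {type: 'text', literal: '.'}
  ]
};

// Concatenate the literals of the direct child text nodes.
function content(node) {
  return (node.children || [])
    .filter(function(child) { return child.type === 'text'; })
    .map(function(child) { return child.literal; })
    .join('');
}

// Collect every descendant of a node in document order.
function descendants(node) {
  return (node.children || []).reduce(function(list, child) {
    return list.concat(child, descendants(child));
  }, []);
}

// Roughly equivalent to: p [literal^=emph]
const byLiteral = descendants(paragraph).filter(function(node) {
  return typeof node.literal === 'string' && node.literal.startsWith('emph');
});

// Roughly equivalent to: p [content^=emph]
const byContent = descendants(paragraph).filter(function(node) {
  return content(node).startsWith('emph');
});

console.log(byLiteral[0].type, byContent[0].type);
```

The content check has to build a string from child nodes for every candidate, while the literal check reads a single property, which is consistent with the performance note above.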
Links support the href and title attributes.
a[href^=http://]
a[title^=Example]
Images support the src and title attributes.
img[src$=.jpg]
img[title^=Example]
Code blocks support the info and fenced attributes.
pre[info^=javascript]
pre[fenced]
The list and item types (ul, ol and li) support the bullet and delimiter attributes.
So you can select elements depending upon the bullet character used (unordered lists) or the delimiter (ordered lists). For the bullet attribute valid values are +, * and -; for the delimiter attribute valid values are . or ).
This selector will match lists declared using the * character:
ul[bullet=*]
Or for all ordered lists declared using the 1) style:
ol[delimiter=)]
Use a descendant selector to get list items:
ul li[bullet=+]
The pseudo classes :first-child, :last-child, :only-child and :nth-child are supported.
p a:first-child
p a:last-child
ul li:nth-child(5)
ul li:nth-child(2n+1)
ul li:nth-child(odd) /* same as above */
ul li:nth-child(2n)
ul li:nth-child(even) /* same as above */
ul li:only-child
See the :nth-child docs (@mdn) for more information.
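The An+B notation used by :nth-child can be sketched as a small predicate. This follows the standard CSS definition (a 1-based position matches when it equals a*n + b for some integer n >= 0) and is not taken from the mkql source; matchesNth is a hypothetical name.

```javascript
// Does the 1-based position `index` satisfy a*n + b for some n >= 0?
function matchesNth(index, a, b) {
  if (a === 0) {
    // nth-child(5): a plain position
    return index === b;
  }
  const n = (index - b) / a;
  return Number.isInteger(n) && n >= 0;
}

const positions = [1, 2, 3, 4, 5, 6];

// 2n+1, the same as odd
const odd = positions.filter(function(i) { return matchesNth(i, 2, 1); });

// 2n, the same as even
const even = positions.filter(function(i) { return matchesNth(i, 2, 0); });

console.log(odd, even);
```

Under this definition nth-child(5) is the case a = 0, b = 5, and the keywords odd and even are shorthand for 2n+1 and 2n.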
The relational pseudo-class :has is useful for selecting parents based on a condition:
p:has(em)
a:has(> img)
The negation pseudo-class :not is also available:
p:not(:first-child)
Use the :empty pseudo-class to select nodes with no children:
p :empty
Use the pseudo element prefix :: to select elements not directly in the tree.
The pseudo elements used to select the html_block and html_inline nodes by type are:
- ::comment: Select comments <!-- -->
- ::pi: Select processing instructions <? ?>
- ::doctype: Select doctype declarations <!doctype html>
- ::cdata: Select CDATA declarations <![CDATA[]]>
- ::element: Select block and inline elements <div></div>
::doctype /* select doctype declarations */
p ::comment /* select inline html comments */
Usage: mkql [-dprmnh] [--delete] [--preserve] [--range] [--multiple]
[--newline] [--help] [--version] <selector...>
mkql [-dprmnh] [--delete] [--preserve] [--multiple] [--newline] [--help]
[--version] --range <start-selector> [end-selector]
Query documents with selectors.
Options
  -d, --delete    Remove matched nodes
  -p, --preserve  Preserve text when deleting
  -r, --range     Execute a range query
  -m, --multiple  Include multiple ranges
  -n, --newline   Add line break between matches
  -h, --help      Display help and exit
      --version   Print the version and exit
mkql@1.0.8
compile(source)
Compile a source selector string to a tree representation.
Returns Object result tree.
- source: String input selector.
range(start[, end])
Compile a range query.
When an end selector is given it must have the same number of selectors in the list as the start selector.
If the end selector is not given, the range will end when the start selector matches again or the end of file is reached.
- start: String selector to start the range match.
- end: String selector to end the range match.
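The end-of-range rule can be sketched on a flat list of top-level nodes. This is an illustration of the behaviour described above, not the library's implementation: firstRange is a hypothetical helper, selectors are replaced by plain predicate functions, only the first range is returned, and the node that terminates the range is assumed to be excluded (the text above does not say whether it is).

```javascript
// Return the first run of nodes beginning at a start match.
// Without an end predicate, the next start match ends the range.
function firstRange(nodes, isStart, isEnd) {
  const begin = nodes.findIndex(isStart);
  if (begin === -1) {
    return [];
  }
  const stop = isEnd || isStart;
  for (let i = begin + 1; i < nodes.length; i++) {
    if (stop(nodes[i])) {
      // assumption: the terminating node is not part of the range
      return nodes.slice(begin, i);
    }
  }
  // no terminator found: the range runs to the end of the document
  return nodes.slice(begin);
}

const doc = [
  {type: 'heading', level: 1},
  {type: 'paragraph'},
  {type: 'heading', level: 2},
  {type: 'paragraph'},
  {type: 'list'},
  {type: 'heading', level: 2},
  {type: 'paragraph'}
];

function isH2(node) { return node.type === 'heading' && node.level === 2; }
function isList(node) { return node.type === 'list'; }

// Start at the first h2 and stop when h2 matches again.
const untilNextStart = firstRange(doc, isH2);

// Start at the first h2 and stop at the first list.
const untilList = firstRange(doc, isH2, isList);

console.log(untilNextStart.length, untilList.length);
```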
slice(source[, opts])
Execute a range query on the input nodes.
Returns Range query execution object.
- source: Object compiled range query.
- opts: Object range query options.
query(markdown, source[, opts])
Query a markdown document tree with a source selector.
If the markdown parameter is a string it is parsed into a document tree.
If the given source selector is a string it is compiled otherwise it should be a previously compiled result tree.
If the source selector appears to be a range query, the slice function is called with the range query.
Returns Array list of matched nodes.
- markdown: Array|Object|String input data.
- source: String|Object input selector.
- opts: Object query options.
ql([opts][, cb])
Run queries on an input stream.
Returns an output stream.
- opts: Object processing options.
- cb: Function callback function.
- input: Readable input stream.
- output: Writable output stream.
MIT
Created by mkdoc on April 24, 2016