# XPath Reference Sheet

XPath is a language for selecting nodes in an XML document. Although XML is rarely used on websites nowadays, XPath is still widely used to select elements in HTML, because an HTML page, like an XML document, is a tree of nested tags.



---
## 1. Basic selectors

- `tagName` = All `<tagName>` elements that are children of the current node.

- `/` = The document (root) node. It is the parent of the top-level element, which in HTML is `<html>`. A path that starts with `/` is an absolute path.

- `/parent/tagName` = All `<tagName>` children of the top-level `<parent>` element, selected by an absolute path from the root.

- `parent/tagName` = All `<tagName>` elements that are children of `<parent>`.

- `//tagName` = All `<tagName>` elements anywhere in the document, i.e. at any depth below the root.

- `parent//tagName` = All `<tagName>` elements that are descendants of `<parent>` at any depth.

- `.` = The current (context) node.

- `tagName/..` = The parent of `<tagName>`.
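To try these selectors without a browser, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which implements only a limited subset of XPath. The sample document is made up, and ElementTree paths must be relative, so `//book` is written `.//book`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
doc = ET.fromstring("""<library>
  <shelf id="s1">
    <book><title>Dune</title></book>
    <book><title>Emma</title></book>
  </shelf>
  <shelf id="s2">
    <box><book><title>Ulysses</title></book></box>
  </shelf>
</library>""")

# `shelf/book`: <book> elements that are direct children of a <shelf> child.
direct = doc.findall("shelf/book")

# `.//book`: <book> elements at any depth (ElementTree needs the leading `.`).
anywhere = doc.findall(".//book")

# `.`: the current (context) element itself.
current = doc.find(".")

# `.//title/..`: the parent of every <title> element.
title_parents = doc.findall(".//title/..")

print([b.find("title").text for b in direct])    # ['Dune', 'Emma']
print([b.find("title").text for b in anywhere])  # ['Dune', 'Emma', 'Ulysses']
print(current.tag)                               # library
print([p.tag for p in title_parents])            # ['book', 'book', 'book']
```

Note that `shelf/book` misses "Ulysses" because that `<book>` sits inside a `<box>`, while `.//book` finds it.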

**Note that attributes are treated as individual nodes in XPath.** So attributes have their own selectors, written with `@`. Selecting an attribute returns the attribute node, which many tools (for example lxml in Python) return as **the value of the attribute**.

- `tagName/@*` = All attributes of `<tagName>`

- `tagName/@attr` = The attribute named `attr` of `<tagName>`

- `parent//@attr` = All attributes named `attr` on `<parent>` and on all of its descendants


Selecting an element returns the element node itself, which tools usually display as its full markup (like `outerHTML` in JavaScript). To get only the text, use `text()`: text content is stored in separate text nodes in the tree.

- `tagName/text()` = The text nodes that are direct children of `<tagName>`. Text inside nested child elements is **not** included; use `tagName//text()` to get the text nodes at every level, or `string(tagName)` to get the full text of the first `<tagName>` as one string.
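ElementTree's XPath subset cannot select attribute or text nodes, so the sketch below reads them through the element API instead (the sample markup is hypothetical). It mainly illustrates the difference between `text()` and `//text()`: ElementTree stores the direct text of an element in its `.text` and in the `.tail` of each child element, while `itertext()` walks the text at every level.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: text directly inside <p> and inside a child <b>.
p = ET.fromstring('<p class="intro" lang="en">Hello <b>big</b> world</p>')

# `p/@class`: read the attribute value through the element API.
class_value = p.get("class")

# `p/@*`: all attributes, as a dict of name -> value.
all_attrs = dict(p.attrib)

# `p/text()`: only the text nodes that are direct children of <p>,
# i.e. p.text plus the .tail of each child element (skipping empty ones).
direct_text = [t for t in [p.text] + [child.tail for child in p] if t]

# `p//text()`: text nodes at every level, including inside <b>.
all_text = list(p.itertext())

print(class_value)  # intro
print(all_attrs)    # {'class': 'intro', 'lang': 'en'}
print(direct_text)  # ['Hello ', ' world']
print(all_text)     # ['Hello ', 'big', ' world']
```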

---
## 2. Predicates
Predicates filter a node set down to the nodes that satisfy a condition. The condition is always written inside square brackets `[]`.

- `parent/tagName[n]` = The nth `<tagName>` child of `<parent>`. **Note that indexing begins at 1 in XPath.**

- `parent/tagName[last()]` = The last `<tagName>` child of `<parent>`.

- `parent/tagName[position()<n]` = The first n-1 `<tagName>` children of `<parent>`.

- `tagName[@attr]` = Any `<tagName>` that has an attribute named `attr`.

- `tagName[@attr = 'value']` = Any `<tagName>` whose `attr` attribute equals `value`.

- `tagName[child>15.00]` = Any `<tagName>` that has at least one `<child>` element whose numeric value is greater than 15.00.
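Most of these predicates can be tried with ElementTree's `findall()`, which supports `[n]`, `[last()]`, `[@attr]`, `[@attr='value']` and `[child='text']` (the sample catalog is hypothetical). It has no numeric comparisons, so in this sketch the `[child>15.00]` case is emulated in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used to try the predicates.
shelf = ET.fromstring(
    "<shelf>"
    "<book lang='en'><title>Dune</title><price>9.99</price></book>"
    "<book><title>Emma</title><price>18.50</price></book>"
    "<book lang='fr'><title>Candide</title><price>21.00</price></book>"
    "</shelf>"
)


def titles(books):
    """Return the <title> text of each <book> element."""
    return [b.findtext("title") for b in books]


first = titles(shelf.findall("book[1]"))  # positions start at 1
last = titles(shelf.findall("book[last()]"))
with_lang = titles(shelf.findall("book[@lang]"))
french = titles(shelf.findall("book[@lang='fr']"))
emma = titles(shelf.findall("book[title='Emma']"))

# No numeric comparisons in ElementTree, so `book[price>15.00]` is emulated here.
expensive = titles(
    b for b in shelf.findall("book") if float(b.findtext("price")) > 15.00
)

print(first, last)   # ['Dune'] ['Candide']
print(with_lang)     # ['Dune', 'Candide']
print(french, emma)  # ['Candide'] ['Emma']
print(expensive)     # ['Emma', 'Candide']
```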

---
## 3. Operators
These operators are most often used **inside predicates** to compute and compare values.

- `+`, `-`, `*`, `div`, `mod` = Addition / Subtraction / Multiplication / Division / Modulus

- `=`, `!=` = Equal / Not equal

- `<`, `<=`, `>`, `>=` = Less than / Less than or equal to / Greater than / Greater than or equal to

- `and`, `or` = Logical AND / OR, for combining multiple conditions
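ElementTree does not evaluate XPath operators, so this sketch shows what a predicate such as `book[price * 2 > 30 and position() mod 2 = 1]` selects by evaluating it by hand in Python (the data is hypothetical). XPath's `mod` maps to Python's `%` and `and` to `and`; note that `position()` counts from 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical data. Emulates: book[price * 2 > 30 and position() mod 2 = 1]
shelf = ET.fromstring(
    "<shelf>"
    "<book><price>20</price></book>"
    "<book><price>40</price></book>"
    "<book><price>10</price></book>"
    "<book><price>16</price></book>"
    "<book><price>35</price></book>"
    "</shelf>"
)

selected = [
    float(book.findtext("price"))
    # enumerate from 1 to mirror XPath's position()
    for position, book in enumerate(shelf.findall("book"), start=1)
    if float(book.findtext("price")) * 2 > 30 and position % 2 == 1
]

print(selected)  # [20.0, 35.0]
```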

---
## 4. Wildcard

These wildcards are commonly used to replace the node name:
- `*` = any element node
- `@*` = any attribute node
- `node()` = any node of any kind

For example,
- `parent/*` = All element children of `<parent>`.
- `//*` = All element nodes in the document.
- `//tagName[@*]` = All `<tagName>` elements that have at least one attribute, whatever its name.
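A short sketch of the `*` wildcard with ElementTree, on a hypothetical HTML-like fragment. ElementTree supports `*` but has no `@*` or `node()` tests, so here `//div[@*]` is emulated by checking whether the element's attribute dictionary is non-empty.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML-like fragment (must be well-formed XML for ElementTree).
body = ET.fromstring(
    '<body><div id="a"><p>one</p></div><span/><div><p class="x">two</p></div></body>'
)

# `body/*`: every element child of <body>, whatever its tag.
children = [e.tag for e in body.findall("*")]

# `//*` (written `.//*` in ElementTree): every element below <body>.
all_elements = [e.tag for e in body.findall(".//*")]

# `//div[@*]`: no `@*` test in ElementTree, so check for a non-empty attribute dict.
divs_with_attrs = [d.get("id") for d in body.iter("div") if d.attrib]

print(children)         # ['div', 'span', 'div']
print(all_elements)     # ['div', 'p', 'span', 'div', 'p']
print(divs_with_attrs)  # ['a']
```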

---
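## 5. Text search
XPath also provides string functions for searching text. Inside these functions, `.` stands for the string value of the current node: its own text together with the text of all its descendants.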

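- `//tagName[contains(.,"str")]` = All `<tagName>` whose text contains the string `"str"` (case-sensitive).

- `//tagName[starts-with(.,"str")]` = All `<tagName>` whose text starts with the string `"str"` (case-sensitive).

- `//tagName[ends-with(.,"str")]` = All `<tagName>` whose text ends with the string `"str"` (case-sensitive). This function exists only in XPath 2.0 and later, so it does not work in XPath 1.0 engines such as those built into browsers or lxml.

- `//tagName[matches(.,"str")]` = All `<tagName>` whose text matches the **regular expression** `"str"` (case-sensitive by default). The pattern is not anchored, so use `"^str$"` to match the whole text. Also XPath 2.0 and later only; for an exact match in XPath 1.0, use `//tagName[.="str"]`.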
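The sketch below emulates these functions in Python, since ElementTree's subset has none of them (the sample list is hypothetical). Its main point is what `.` means here: the string value of an element includes the text of its descendants, so `<li>Apple <b>pie</b></li>` contains `"pie"`. Python's `re` syntax stands in for the `matches()` case; XPath regular expressions are similar but not identical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list; the first item has part of its text inside a child <b>.
ul = ET.fromstring(
    "<ul><li>Apple <b>pie</b></li><li>apple juice</li><li>Pineapple</li></ul>"
)


def string_value(elem):
    """XPath string value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


texts = [string_value(li) for li in ul.findall("li")]

contains = [t for t in texts if "pie" in t]               # contains(., "pie")
starts = [t for t in texts if t.startswith("Apple")]      # starts-with(., "Apple")
ends = [t for t in texts if t.endswith("apple")]          # ends-with(., "apple"), XPath 2.0+
regex = [t for t in texts if re.search(r"^[Aa]pple", t)]  # matches(., "^[Aa]pple"), XPath 2.0+

print(texts)     # ['Apple pie', 'apple juice', 'Pineapple']
print(contains)  # ['Apple pie']
print(starts)    # ['Apple pie']
print(ends)      # ['Pineapple']
print(regex)     # ['Apple pie', 'apple juice']
```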

---
## 6. Axes
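Besides the abbreviated selectors above, XPath provides a full syntax in which each step names an **axis**: the relationship between the current (context) node and the nodes to select. The syntax is

```
axisName::nodeTest
```

where `nodeTest` is a tag name, `*`, or a test such as `text()` or `node()`. For example, `child::tagName` is the long form of `tagName`, and `parent::node()` is the long form of `..`.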

There are 13 axis names in total.

- `self` = the context node itself
- `child` = all children of the context node
- `descendant` = all descendants (children, grandchildren, and so on)
- `parent` = the parent (empty if at the root)
- `ancestor` = all ancestors from the parent to the root
- `descendant-or-self` = the union of descendant and self
- `ancestor-or-self` = the union of ancestor and self
- `following-sibling` = all siblings to the right
- `preceding-sibling` = all siblings to the left
- `following` = all nodes after the context node in document order, excluding its descendants
- `preceding` = all nodes before the context node in document order, excluding its ancestors
- `attribute` = all the attributes of the context node
- `namespace` = all the namespace nodes of the context node


![tree](https://jrebecchi.github.io/xpath-helper/_images/xpath-axes.jpg)

Picture retrieved from https://jrebecchi.github.io/xpath-helper/xpath-axes.html.
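ElementTree has no axis syntax and its elements do not store a parent pointer, so this sketch emulates a few axes with a child-to-parent map (the tree is hypothetical). The upward and leftward axes return the nearest node first, which matches how XPath numbers positions on these reverse axes; for example, `ancestor::*[1]` is the parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree:  a -> (b, c -> (d, e), f)
root = ET.fromstring("<a><b/><c><d/><e/></c><f/></a>")

# ElementTree elements have no parent pointer, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor(node):
    """ancestor::*  nearest first (parent, grandparent, ..., root)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_sibling(node):
    """following-sibling::*  in document order (empty for the root)."""
    if node not in parent_of:
        return []
    kids = list(parent_of[node])
    return kids[kids.index(node) + 1:]


def preceding_sibling(node):
    """preceding-sibling::*  nearest first (empty for the root)."""
    if node not in parent_of:
        return []
    kids = list(parent_of[node])
    return kids[:kids.index(node)][::-1]


def descendant(node):
    """descendant::*  in document order, excluding the node itself."""
    return list(node.iter())[1:]


c = root.find("c")
d = root.find("c/d")
print([n.tag for n in ancestor(d)])           # ['c', 'a']
print([n.tag for n in following_sibling(c)])  # ['f']
print([n.tag for n in preceding_sibling(c)])  # ['b']
print([n.tag for n in descendant(c)])         # ['d', 'e']
```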