A couple of minor fixes / error handling improvements (#29)

filter should handle all iterables consistently (no special cases).
More consistent naming of variables in API and better descriptions in documents.
Capture pseudo-element pattern so that we can give a better exception.
Add DEBUG flag for development purposes.
Track index in selector pattern to bubble up more useful exceptions.
More accurate exception descriptions.
Fail for bad input type.
Cache when no meta language is found.
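
As a rough illustration of the "track index" and "capture pseudo-element pattern" items, the sketch below shows one way a tokenizer can carry the current index so that exceptions report where the pattern went wrong. The token table, the `tokenize` name, the exception types, and the message wording are all assumptions made for this example; it is not Soup Sieve's actual parser.

```python
import re

# Hypothetical, heavily simplified token table; not Soup Sieve's real grammar.
TOKENS = (
    ("pseudo_element", re.compile(r"::[a-zA-Z-]+")),
    ("pseudo_class", re.compile(r":[a-zA-Z-]+")),
    ("class", re.compile(r"\.[a-zA-Z_][\w-]*")),
    ("id", re.compile(r"#[a-zA-Z_][\w-]*")),
    ("tag", re.compile(r"[a-zA-Z][\w-]*|\*")),
    ("combinator", re.compile(r"\s*[>+~]\s*|\s+")),
)


def tokenize(pattern):
    """Yield (name, text, index) tuples, reporting the position of any failure."""
    index = 0
    end = len(pattern)
    while index < end:
        for name, regex in TOKENS:
            m = regex.match(pattern, index)
            if m is None:
                continue
            if name == "pseudo_element":
                # Captured only so that a specific, positioned error can be raised.
                raise NotImplementedError(
                    "Pseudo-element found at position {}".format(m.start())
                )
            yield name, m.group(0), index
            index = m.end()
            break
        else:
            # No token matched at this index, so point at the offending character.
            raise SyntaxError(
                "Invalid character {!r} at position {}".format(pattern[index], index)
            )
```

Because the index travels with every token, a caller that fails later (for example, on an unexpected combinator) can still say where in the pattern the problem started.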
facelessuser committed Dec 20, 2018
1 parent 025e80d commit 96dc075
Showing 14 changed files with 190 additions and 100 deletions.
1 change: 1 addition & 0 deletions docs/src/dictionary/en-custom.txt
@@ -25,6 +25,7 @@ builtin
 deprecations
 html
 iterable
+iterables
 linter
 lxml
 matchable
8 changes: 8 additions & 0 deletions docs/src/markdown/about/changelog.md
@@ -1,5 +1,13 @@
 # Changelog

+## 1.2.1
+
+- **FIX**: More descriptive exceptions. Exceptions will also now mention the position in the pattern that is problematic.
+- **FIX**: `filter` ignores `NavigableString` objects in normal iterables and `Tag` iterables. Basically, it filters all Beautiful Soup document parts regardless of iterable type, whereas it used to only filter out a `NavigableString` in a `Tag` object. This is viewed as fixing an inconsistency.
+- **FIX**: `DEBUG` flag has been added to help with debugging CSS selector parsing. This is mainly for development.
+- **FIX**: If forced to search for language in `meta` tag, and no language is found, cache that there is no language in the `meta` tag to prevent searching again during the current select.
+- **FIX**: If a non `BeautifulSoup`/`Tag` object is given to the API to compare against, raise a `TypeError`.
+
 ## 1.2.0

 - **NEW**: Add Python 2.7 support.
10 changes: 5 additions & 5 deletions docs/src/markdown/about/development.md
@@ -189,7 +189,7 @@ class Selector:

 Flags | Description
 ------------------- | -----------
-`SEL_EMPTY` | The current compound selector contained a `:empty` pseudo.
+`SEL_EMPTY` | The current compound selector contained an `:empty` pseudo-class.
 `SEL_ROOT` | The current compound selector contains `:root`.
 `SEL_DEFAULT` | The compound selector has a `:default` pattern and requires additional logic to determine if it is the first `submit` button in a form.
 `SEL_INDETERMINATE` | The compound selector has a `:indeterminate` pattern and requires additional logic to ensure a `radio` element and all of the `radio` elements with the same `name` under a form are not set.
@@ -201,8 +201,8 @@ Attribute | Description
 `classes` | Contains a tuple of class names to match.
 `attributes` | Contains a tuple of attributes. Each attribute is represented as a [`SelectorAttribute`](#selectorattribute).
 `nth` | Contains a tuple containing `nth` selectors, each selector being represented as a [`SelectorNth`](#selectornth). `nth` selectors contain things like `:first-child`, `:only-child`, `#!css :nth-child()`, `#!css :nth-of-type()`, etc.
-`selectors` | Contains a tuple of `SelectorList` objects for each pseudo class selector part of the compound selector: `#!css :is()`, `#!css :not()`, `#!css :has()`, etc.
-`relation` | This will contain a `SelectorList` object with one `Selector` object, which could in turn chain an additional relation depending on the complexity of the compound selector. For instance, `div > p + a` would be a `Selector` for `a` that contains a `relation` for `p` (another `SelectorList` object) which also contains a relation of `div`. When matching, we would match that the tag is `a`, and then walk its relation chain verifying that they all match. In this case, the relation chain would be a direct, previous sibling of `p`, which has a direct parent of `div`. A `:has()` pseudo class would walk this in the opposite order. `div:has(> p + a)` would verify `div`, and then check for a child of `p` with a sibling of `a`.
+`selectors` | Contains a tuple of `SelectorList` objects for each pseudo-class selector part of the compound selector: `#!css :is()`, `#!css :not()`, `#!css :has()`, etc.
+`relation` | This will contain a `SelectorList` object with one `Selector` object, which could in turn chain an additional relation depending on the complexity of the compound selector. For instance, `div > p + a` would be a `Selector` for `a` that contains a `relation` for `p` (another `SelectorList` object) which also contains a relation of `div`. When matching, we would match that the tag is `a`, and then walk its relation chain verifying that they all match. In this case, the relation chain would be a direct, previous sibling of `p`, which has a direct parent of `div`. A `:has()` pseudo-class would walk this in the opposite order. `div:has(> p + a)` would verify `div`, and then check for a child of `p` with a sibling of `a`.
 `rel_type` | `rel_type` is attached to relational selectors. In the case of `#!css div > p + a`, the relational selectors of `div` and `p` would get a relational type of `>` and `+` respectively. `:has()` relational `rel_type` are preceded with `:` to signify a forward looking relation.
 `contains` | Contains a tuple of strings of content to match in an element.
 `lang` | Contains a tuple of [`SelectorLang`](#selectorlang) objects.
@@ -268,8 +268,8 @@ Attribute | Description
 `a` | The `a` value in the formula `an+b` specifying an index.
 `n` | `True` if the provided formula has included a literal `n` which signifies the formula is not a static index.
 `b` | The `b` value in the formula `an+b`.
-`type` | `True` if the `nth` pseudo class is an `*-of-type` variant.
-`last` | `True` if the `nth` pseudo class is a `*last*` variant.
+`type` | `True` if the `nth` pseudo-class is an `*-of-type` variant.
+`last` | `True` if the `nth` pseudo-class is a `*last*` variant.
 `selectors` | A `SelectorList` object representing the `of S` portion of `:nth-child(an+b [of S]?)`.

 ### `SelectorLang`
34 changes: 17 additions & 17 deletions docs/src/markdown/api.md
@@ -23,13 +23,13 @@ Early in development, flags were used to specify document type, but as of 1.0.0,
 ## `soupsieve.select()`

 ```py3
-def select(select, node, namespaces=None, limit=0, flags=0):
+def select(select, parent, namespaces=None, limit=0, flags=0):
     """Select the specified tags."""
 ```

-`select` will return all tags under the given a tag, that match the given CSS selectors provided. You can also limit the number of tags returned by providing a positive integer via the `limit` parameter (0 means to return all tags).
+`select` will return all tags under the given tag that match the given CSS selectors provided. You can also limit the number of tags returned by providing a positive integer via the `limit` parameter (0 means to return all tags).

-`select` accepts a CSS selector string, a `node`/element, an optional [namespace](#namespaces) dictionary, a `limit`, and `flags`.
+`select` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary, a `limit`, and `flags`.

 ```pycon3
 >>> import soupsieve as sv
@@ -49,13 +49,13 @@ def iselect(select, node, namespaces=None, limit=0, flags=0):
 ## `soupsieve.match()`

 ```py3
-def match(select, node, namespaces=None, flags=0):
+def match(select, tag, namespaces=None, flags=0):
     """Match node."""
 ```

-The `match` function matches a given `node`/element with a given CSS selector.
+The `match` function matches a given tag with a given CSS selector.

-`match` accepts a CSS selector string, a `node`/element, an optional [namespace](#namespaces) dictionary, and flags.
+`match` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary, and flags.

 ```pycon3
 >>> nodes = sv.select('p:is(.a, .b, .c)', soup)
@@ -72,9 +72,9 @@ def filter(select, nodes, namespaces=None, flags=0):
     """Filter list of nodes."""
 ```

-`filter` takes an iterable containing HTML `nodes`/elements and will filter them based on the provided CSS selector string. If given a Beautiful Soup tag, it will iterate the direct children that are tags.
+`filter` takes an iterable containing HTML nodes and will filter them based on the provided CSS selector string. If given a `Tag`/`BeautifulSoup` object, it will iterate the direct children filtering them.

-`filter` accepts a CSS selector string, an iterable containing tags, an optional [namespace](#namespaces) dictionary, and flags.
+`filter` accepts a CSS selector string, an iterable containing nodes, an optional [namespace](#namespaces) dictionary, and flags.

 ```pycon3
 >>> sv.filter('p:not(.b)', soup.div)
@@ -84,13 +84,13 @@ def filter(select, nodes, namespaces=None, flags=0):
 ## `soupsieve.comments()`

 ```
-def comments(node, limit=0, flags=0):
+def comments(parent, limit=0, flags=0):
     """Get comments only."""
 ```

-The `comments` function can be used to extract all comments from a document or document tag. It will extract from the given tag down through all of its children. You can limit how many comments are returned with `limit`.
+The `comments` function can be used to extract all comments from a document or document tag. It will return comments from the given tag down through all of its children. You can limit how many comments are returned with `limit`.

-`comments` accepts a `node`/element, a `limit`, and flags.
+`comments` accepts a `Tag`/`BeautifulSoup` object, a `limit`, and flags.

 ## `soupsieve.icomments()`

@@ -114,22 +114,22 @@ def compile(pattern, namespaces=None, flags=0):
 class SoupSieve:
     """Match tags in Beautiful Soup with CSS selectors."""

-    def match(self, node):
+    def match(self, tag):
         """Match."""

-    def filter(self, nodes):
+    def filter(self, iterable):
         """Filter."""

-    def comments(self, node, limit=0):
+    def comments(self, parent, limit=0):
         """Get comments only."""

-    def icomments(self, node, limit=0):
+    def icomments(self, parent, limit=0):
         """Iterate comments only."""

-    def select(self, node, limit=0):
+    def select(self, parent, limit=0):
         """Select the specified tags."""

-    def iselect(self, node, limit=0):
+    def iselect(self, parent, limit=0):
         """Iterate the specified tags."""
 ```

3 changes: 3 additions & 0 deletions docs/src/markdown/selectors.md
@@ -59,6 +59,9 @@ Selector | Example | Descript
 !!! warning "Experimental Selectors"
     `:has()` and `of S` support (in `:nth-child(an+b [of S]?)`) is experimental and may change. There are currently no reference implementations available in any browsers, not to mention the CSS4 specifications have not been finalized, so current implementation is based on our best interpretation. Any issues should be reported.

+!!! danger "Pseudo-elements"
+    Pseudo-elements are not supported as they do not represent real elements.
+
 ### HTML Only Selectors

 There are a number of selectors that apply specifically to HTML documents. Such selectors will only match tags in HTML documents. Use of these selectors is not restricted from XML, but when used with XML documents, they will never match.
29 changes: 14 additions & 15 deletions soupsieve/__init__.py
@@ -30,11 +30,10 @@
 from . import css_parser as cp
 from . import css_match as cm
 from . import css_types as ct
-from .util import HTML, HTML5, XHTML, XML
+from .util import DEBUG

 __all__ = (
-    'HTML', 'HTML5', 'XHTML', 'XML',
-    'SoupSieve', 'compile', 'purge',
+    'SoupSieve', 'compile', 'purge', 'DEBUG',
     'comments', 'icomments', 'select', 'iselect', 'match', 'filter'
 )

@@ -65,39 +64,39 @@ def purge():
     cp._purge_cache()


-def match(select, node, namespaces=None, flags=0):
+def match(select, tag, namespaces=None, flags=0):
     """Match node."""

-    return compile(select, namespaces, flags).match(node)
+    return compile(select, namespaces, flags).match(tag)


-def filter(select, nodes, namespaces=None, flags=0):  # noqa: A001
+def filter(select, iterable, namespaces=None, flags=0):  # noqa: A001
     """Filter list of nodes."""

-    return compile(select, namespaces, flags).filter(nodes)
+    return compile(select, namespaces, flags).filter(iterable)


-def comments(node, limit=0, flags=0):
+def comments(parent, limit=0, flags=0):
     """Get comments only."""

-    return compile("", None, flags).comments(node, limit)
+    return compile("", None, flags).comments(parent, limit)


-def icomments(node, limit=0, flags=0):
+def icomments(parent, limit=0, flags=0):
     """Iterate comments only."""

-    for tag in compile("", None, flags).icomments(node, limit):
+    for tag in compile("", None, flags).icomments(parent, limit):
         yield tag


-def select(select, node, namespaces=None, limit=0, flags=0):
+def select(select, parent, namespaces=None, limit=0, flags=0):
     """Select the specified tags."""

-    return compile(select, namespaces, flags).select(node, limit)
+    return compile(select, namespaces, flags).select(parent, limit)


-def iselect(select, node, namespaces=None, limit=0, flags=0):
+def iselect(select, parent, namespaces=None, limit=0, flags=0):
     """Iterate the specified tags."""

-    for tag in compile(select, namespaces, flags).iselect(node, limit):
+    for tag in compile(select, namespaces, flags).iselect(parent, limit):
         yield tag
2 changes: 1 addition & 1 deletion soupsieve/__meta__.py
@@ -186,5 +186,5 @@ def parse_version(ver, pre=False):
     return Version(major, minor, micro, release, pre, post, dev)


-__version_info__ = Version(1, 2, 0, "final")
+__version_info__ = Version(1, 2, 1, "final")
 __version__ = __version_info__._get_canonical()
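
The two behavioral fixes in the 1.2.1 changelog (`filter` treating every iterable the same way, and a `TypeError` for input that is not a tag) can be pictured with the following self-contained sketch. `StubTag`, `StubString`, `match`, and `filter_nodes` are hypothetical stand-ins invented for illustration; they do not use Beautiful Soup and are not Soup Sieve's implementation, and matching is reduced to a simple name comparison.

```python
class StubString(str):
    """Hypothetical stand-in for a Beautiful Soup `NavigableString`."""


class StubTag:
    """Hypothetical stand-in for a Beautiful Soup `Tag`; iterating yields children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __iter__(self):
        return iter(self.children)


def match(tag, name):
    """Match one tag by name, rejecting input that is not a tag at all."""
    if not isinstance(tag, StubTag):
        raise TypeError("Expected a tag, but received {}".format(type(tag).__name__))
    return tag.name == name


def filter_nodes(iterable, name):
    """Return matching tags; non-tag nodes are skipped for every iterable type."""
    return [node for node in iterable if isinstance(node, StubTag) and match(node, name)]


div = StubTag("div", [StubString("text"), StubTag("p"), StubTag("span"), StubTag("p")])

# A tag and a plain list holding the same nodes go through the same code path.
from_tag = filter_nodes(div, "p")
from_list = filter_nodes(list(div), "p")
```

The point of the design is that no branch depends on whether the caller passed a tag or an ordinary list: text nodes are dropped in both cases, and only a direct `match` call on a non-tag raises.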