A couple of minor fixes / error handling improvements (#29)

- filter should handle all iterables consistently (no special cases).
- More consistent naming of variables in API and better descriptions in documents.
- Capture pseudo-element pattern so that we can give a better exception.
- Add DEBUG flag for development purposes.
- Track index in selector pattern to bubble up more useful exceptions.
- More accurate exception descriptions.
- Fail for bad input type
- cache no meta language found
facelessuser · Dec 20, 2018 · 96dc075
1 parent 025e80d

Showing 14 changed files with 190 additions and 100 deletions.
diff --git a/docs/src/dictionary/en-custom.txt b/docs/src/dictionary/en-custom.txt
@@ -25,6 +25,7 @@ builtin
 deprecations
 html
 iterable
+iterables
 linter
 lxml
 matchable

diff --git a/docs/src/markdown/about/changelog.md b/docs/src/markdown/about/changelog.md
@@ -1,5 +1,13 @@
 # Changelog
 
+## 1.2.1
+
+- **FIX**: More descriptive exceptions. Exceptions will also now mention position in the pattern that is problematic.
+- **FIX**: `filter` ignores `NavigableString` objects in normal iterables and `Tag` iterables. Basically, it filters all Beautiful Soup document parts regardless of iterable type, whereas it used to only filter out a `NavigableString` in a `Tag` object. This is viewed as fixing an inconsistency.
+- **FIX**: `DEBUG` flag has been added to help with debugging CSS selector parsing. This is mainly for development.
+- **FIX**: If forced to search for language in `meta` tag, and no language is found, cache that there is no language in the `meta` tag to prevent searching again during the current select.
+- **FIX**: If a non `BeautifulSoup`/`Tag` object is given to the API to compare against, raise a `TypeError`.
+
 ## 1.2.0
 
 - **NEW**: Add Python 2.7 support.

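The `meta` language entry in the changelog above describes negative caching: remembering that a search found nothing so it is not repeated within the same select. A minimal sketch of that idea follows, assuming a hypothetical `LangLookup` wrapper and a hypothetical `find_meta_lang` callable. Neither name comes from Soup Sieve, and its real cache layout may differ.

```python
# Sentinel that distinguishes "never searched" from "searched, found nothing".
_MISSING = object()


class LangLookup:
    """Hypothetical helper: run an expensive language search at most once."""

    def __init__(self, find_meta_lang):
        # find_meta_lang is an assumed callable returning a language
        # string, or None when no language is declared in a meta tag.
        self._find = find_meta_lang
        self._cached = _MISSING

    def get(self):
        """Return the cached result, searching only on the first call."""
        if self._cached is _MISSING:
            # A None result is stored too, so a miss is not searched again.
            self._cached = self._find()
        return self._cached


calls = []


def fake_search():
    """Stand-in for walking the document; records each invocation."""
    calls.append(1)
    return None


lookup = LangLookup(fake_search)
first = lookup.get()
second = lookup.get()
```

Using a dedicated sentinel rather than `None` for the "not yet searched" state is what allows the "no language" answer itself to be cached.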
diff --git a/docs/src/markdown/about/development.md b/docs/src/markdown/about/development.md
@@ -189,7 +189,7 @@ class Selector:
 
 Flags               | Description
 ------------------- | -----------
-`SEL_EMPTY`         | The current compound selector contained a `:empty` pseudo.
+`SEL_EMPTY`         | The current compound selector contained an `:empty` pseudo-class.
 `SEL_ROOT`          | The current compound selector contains `:root`.
 `SEL_DEFAULT`       | The compound selector has a `:default` pattern  and requires additional logic to determine if it is the first `submit` button in a form.
 `SEL_INDETERMINATE` | The compound selector has a `:indeterminate` pattern and requires additional logic to ensure a `radio` element and all of the `radio` elements with the same `name` under a form are not set.
@@ -201,8 +201,8 @@ Attribute       | Description
 `classes`       | Contains a tuple of class names to match.
 `attributes`    | Contains a tuple of attributes. Each attribute is represented as a [`SelectorAttribute`](#selectorattribute).
 `nth`           | Contains a tuple containing `nth` selectors, each selector being represented as a [`SelectorNth`](#selectornth). `nth` selectors contain things like `:first-child`, `:only-child`, `#!css :nth-child()`, `#!css :nth-of-type()`, etc.
-`selectors`     | Contains a tuple of `SelectorList` objects for each pseudo class selector  part of the compound selector: `#!css :is()`, `#!css :not()`, `#!css :has()`, etc.
-`relation`      | This will contain a `SelectorList` object with one `Selector` object, which could in turn chain an additional relation depending on the complexity of the compound selector.  For instance, `div > p + a` would be a `Selector` for `a` that contains a `relation` for `p` (another `SelectorList` object) which also contains a relation of `div`.  When matching, we would match that the tag is `a`, and then walk its relation chain verifying that they all match. In this case, the relation chain would be a direct, previous sibling of `p`, which has a direct parent of `div`. A `:has()` pseudo class would walk this in the opposite order. `div:has(> p + a)` would verify `div`, and then check for a child of `p` with a sibling of `a`.
+`selectors`     | Contains a tuple of `SelectorList` objects for each pseudo-class selector  part of the compound selector: `#!css :is()`, `#!css :not()`, `#!css :has()`, etc.
+`relation`      | This will contain a `SelectorList` object with one `Selector` object, which could in turn chain an additional relation depending on the complexity of the compound selector.  For instance, `div > p + a` would be a `Selector` for `a` that contains a `relation` for `p` (another `SelectorList` object) which also contains a relation of `div`.  When matching, we would match that the tag is `a`, and then walk its relation chain verifying that they all match. In this case, the relation chain would be a direct, previous sibling of `p`, which has a direct parent of `div`. A `:has()` pseudo-class would walk this in the opposite order. `div:has(> p + a)` would verify `div`, and then check for a child of `p` with a sibling of `a`.
 `rel_type`      | `rel_type` is attached to relational selectors. In the case of `#!css div > p + a`, the relational selectors of `div` and `p` would get a relational type of `>` and `+` respectively. `:has()` relational `rel_type` are preceded with `:` to signify a forward looking relation.
 `contains`      | Contains a tuple of strings of content to match in an element.
 `lang`          | Contains a tuple of [`SelectorLang`](#selectorlang) objects.
@@ -268,8 +268,8 @@ Attribute     | Description
 `a`           | The `a` value in the formula `an+b` specifying an index.
 `n`           | `True` if the provided formula has included a literal `n` which signifies the formula is not a static index.
 `b`           | The `b` value in the formula `an+b`.
-`type`        | `True` if the `nth` pseudo class is an `*-of-type` variant.
-`last`        | `True` if the `nth` pseudo class is a `*last*` variant.
+`type`        | `True` if the `nth` pseudo-class is an `*-of-type` variant.
+`last`        | `True` if the `nth` pseudo-class is a `*last*` variant.
 `selectors`   | A `SelectorList` object representing the `of S` portion of `:nth-child(an+b [of S]?)`.
 
 ### `SelectorLang`

diff --git a/docs/src/markdown/api.md b/docs/src/markdown/api.md
@@ -23,13 +23,13 @@ Early in development, flags were used to specify document type, but as of 1.0.0,
 ## `soupsieve.select()`
 
 ```py3
-def select(select, node, namespaces=None, limit=0, flags=0):
+def select(select, parent, namespaces=None, limit=0, flags=0):
     """Select the specified tags."""
 ```
 
-`select` will return all tags under the given a tag, that match the given CSS selectors provided. You can also limit the number of tags returned by providing a positive integer via the `limit` parameter (0 means to return all tags).
+`select` will return all tags under the given tag that match the given CSS selectors provided. You can also limit the number of tags returned by providing a positive integer via the `limit` parameter (0 means to return all tags).
 
-`select` accepts a CSS selector string, a `node`/element, an optional [namespace](#namespaces) dictionary, a `limit`, and `flags`.
+`select` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary, a `limit`, and `flags`.
 
 ```pycon3
 >>> import soupsieve as sv
@@ -49,13 +49,13 @@ def iselect(select, node, namespaces=None, limit=0, flags=0):
 ## `soupsieve.match()`
 
 ```py3
-def match(select, node, namespaces=None, flags=0):
+def match(select, tag, namespaces=None, flags=0):
     """Match node."""
 ```
 
-The `match` function matches a given `node`/element with a given CSS selector.
+The `match` function matches a given tag with a given CSS selector.
 
-`match` accepts a CSS selector string, a `node`/element, an optional [namespace](#namespaces) dictionary, and flags.
+`match` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary, and flags.
 
 ```pycon3
 >>> nodes = sv.select('p:is(.a, .b, .c)', soup)
@@ -72,9 +72,9 @@ def filter(select, nodes, namespaces=None, flags=0):
     """Filter list of nodes."""
 ```
 
-`filter` takes an iterable containing HTML `nodes`/elements and will filter them based on the provided CSS selector string. If given a Beautiful Soup tag, it will iterate the direct children that are tags.
+`filter` takes an iterable containing HTML nodes and will filter them based on the provided CSS selector string. If given a `Tag`/`BeautifulSoup` object, it will iterate the direct children filtering them.
 
-`filter` accepts a CSS selector string, an iterable containing tags, an optional [namespace](#namespaces) dictionary, and flags.
+`filter` accepts a CSS selector string, an iterable containing nodes, an optional [namespace](#namespaces) dictionary, and flags.
 
 ```pycon3
 >>> sv.filter('p:not(.b)', soup.div)
@@ -84,13 +84,13 @@ def filter(select, nodes, namespaces=None, flags=0):
 ## `soupsieve.comments()`
 
 ```
-def comments(node, limit=0, flags=0):
+def comments(parent, limit=0, flags=0):
     """Get comments only."""
 ```
 
-The `comments` function can be used to extract all comments from a document or document tag. It will extract from the given tag down through all of its children.  You can limit how many comments are returned with `limit`.
+The `comments` function can be used to extract all comments from a document or document tag. It will return comments from the given tag down through all of its children.  You can limit how many comments are returned with `limit`.
 
-`comments` accepts a `node`/element, a `limit`, and flags.
+`comments` accepts a `Tag`/`BeautifulSoup` object, a `limit`, and flags.
 
 ## `soupsieve.icomments()`
 
@@ -114,22 +114,22 @@ def compile(pattern, namespaces=None, flags=0):
 class SoupSieve:
     """Match tags in Beautiful Soup with CSS selectors."""
 
-    def match(self, node):
+    def match(self, tag):
         """Match."""
 
-    def filter(self, nodes):
+    def filter(self, iterable):
         """Filter."""
 
-    def comments(self, node, limit=0):
+    def comments(self, parent, limit=0):
         """Get comments only."""
 
-    def icomments(self, node, limit=0):
+    def icomments(self, parent, limit=0):
         """Iterate comments only."""
 
-    def select(self, node, limit=0):
+    def select(self, parent, limit=0):
         """Select the specified tags."""
 
-    def iselect(self, node, limit=0):
+    def iselect(self, parent, limit=0):
         """Iterate the specified tags."""
 ```
 

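The documentation changes above describe two behaviors: `filter` treats every iterable the same way and skips non-tag nodes, and the API rejects input that is not a `Tag`/`BeautifulSoup` object with a `TypeError`. The sketch below illustrates both using only the standard library. `Text`, `Element`, `filter_nodes`, and `match_node` are hypothetical stand-ins invented for this example. They are not Beautiful Soup or Soup Sieve classes, and the real checks may be structured differently.

```python
class Text(str):
    """Hypothetical stand-in for a text node such as NavigableString."""


class Element:
    """Hypothetical stand-in for a Tag: a name plus direct children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __iter__(self):
        # Iterating an element yields its direct children, text included.
        return iter(self.children)


def filter_nodes(predicate, iterable):
    """Keep matching elements; skip text nodes for any iterable type."""
    return [
        node for node in iterable
        if isinstance(node, Element) and predicate(node)
    ]


def match_node(predicate, tag):
    """Match a single element, failing loudly on the wrong input type."""
    if not isinstance(tag, Element):
        raise TypeError(
            "Expected an Element, but received {}".format(type(tag))
        )
    return predicate(tag)


def is_p(el):
    """Toy predicate standing in for a compiled selector."""
    return el.name == "p"


div = Element("div", [Text("hi"), Element("p"), Element("span"), Element("p")])
from_element = filter_nodes(is_p, div)
from_list = filter_nodes(is_p, [Text("x"), Element("p")])
```

Because the text-node check lives inside the single loop, an element, a list, and a generator all go through the same code path, which is the consistency the changelog entry refers to.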
diff --git a/docs/src/markdown/selectors.md b/docs/src/markdown/selectors.md
@@ -59,6 +59,9 @@ Selector                        | Example                             | Descript
 !!! warning "Experimental Selectors"
     `:has()` and `of S` support (in `:nth-child(an+b [of S]?)`) is experimental and may change. There are currently no reference implementations available in any browsers, not to mention the CSS4 specifications have not been finalized, so current implementation is based on our best interpretation. Any issues should be reported.
 
+!!! danger "Pseudo-elements"
+    Pseudo-elements are not supported as they do not represent real elements.
+
 ### HTML Only Selectors
 
 There are a number of selectors that apply specifically to HTML documents. Such selectors will only match tags in HTML documents. Use of these selectors are not restricted from XML, but when used with XML documents, they will never match.

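The commit message mentions capturing the pseudo-element pattern and tracking the index in the selector pattern so exceptions can point at the problem. The toy tokenizer below is a rough sketch of that approach, not Soup Sieve's parser. The `TOKENS` table, the `tokenize` function, and the exception types chosen here are all assumptions made for illustration.

```python
import re

# Hypothetical token table; a real CSS grammar is far larger.
# The pseudo-element pattern is listed first so it is recognized explicitly.
TOKENS = [
    ("pseudo_element", re.compile(r"::[a-zA-Z-]+")),
    ("pseudo_class", re.compile(r":[a-zA-Z-]+")),
    ("class", re.compile(r"\.[a-zA-Z_][\w-]*")),
    ("id", re.compile(r"#[a-zA-Z_][\w-]*")),
    ("tag", re.compile(r"[a-zA-Z][\w-]*|\*")),
    ("combinator", re.compile(r"\s*[>+~]\s*|\s+")),
]


def tokenize(pattern):
    """Yield (name, text) pairs, tracking the index for error messages."""
    index = 0
    end = len(pattern)
    while index < end:
        for name, regex in TOKENS:
            m = regex.match(pattern, index)
            if m:
                if name == "pseudo_element":
                    # Recognized but unsupported: report where it occurred.
                    raise NotImplementedError(
                        "Pseudo-element found at position {}".format(index)
                    )
                yield name, m.group(0)
                index = m.end()
                break
        else:
            # No token matched at this index: name the offending character.
            raise SyntaxError(
                "Invalid character {!r} at position {}".format(
                    pattern[index], index
                )
            )


tokens = list(tokenize("div > p.a"))
```

Matching the pseudo-element as its own token, instead of letting it fall through as an invalid character, is what makes a more specific error message possible.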
diff --git a/soupsieve/__init__.py b/soupsieve/__init__.py
@@ -30,11 +30,10 @@
 from . import css_parser as cp
 from . import css_match as cm
 from . import css_types as ct
-from .util import HTML, HTML5, XHTML, XML
+from .util import DEBUG
 
 __all__ = (
-    'HTML', 'HTML5', 'XHTML', 'XML',
-    'SoupSieve', 'compile', 'purge',
+    'SoupSieve', 'compile', 'purge', 'DEBUG',
     'comments', 'icomments', 'select', 'iselect', 'match', 'filter'
 )
 
@@ -65,39 +64,39 @@ def purge():
     cp._purge_cache()
 
 
-def match(select, node, namespaces=None, flags=0):
+def match(select, tag, namespaces=None, flags=0):
     """Match node."""
 
-    return compile(select, namespaces, flags).match(node)
+    return compile(select, namespaces, flags).match(tag)
 
 
-def filter(select, nodes, namespaces=None, flags=0):  # noqa: A001
+def filter(select, iterable, namespaces=None, flags=0):  # noqa: A001
     """Filter list of nodes."""
 
-    return compile(select, namespaces, flags).filter(nodes)
+    return compile(select, namespaces, flags).filter(iterable)
 
 
-def comments(node, limit=0, flags=0):
+def comments(parent, limit=0, flags=0):
     """Get comments only."""
 
-    return compile("", None, flags).comments(node, limit)
+    return compile("", None, flags).comments(parent, limit)
 
 
-def icomments(node, limit=0, flags=0):
+def icomments(parent, limit=0, flags=0):
     """Iterate comments only."""
 
-    for tag in compile("", None, flags).icomments(node, limit):
+    for tag in compile("", None, flags).icomments(parent, limit):
         yield tag
 
 
-def select(select, node, namespaces=None, limit=0, flags=0):
+def select(select, parent, namespaces=None, limit=0, flags=0):
     """Select the specified tags."""
 
-    return compile(select, namespaces, flags).select(node, limit)
+    return compile(select, namespaces, flags).select(parent, limit)
 
 
-def iselect(select, node, namespaces=None, limit=0, flags=0):
+def iselect(select, parent, namespaces=None, limit=0, flags=0):
     """Iterate the specified tags."""
 
-    for tag in compile(select, namespaces, flags).iselect(node, limit):
+    for tag in compile(select, namespaces, flags).iselect(parent, limit):
         yield tag
diff --git a/soupsieve/__meta__.py b/soupsieve/__meta__.py
@@ -186,5 +186,5 @@ def parse_version(ver, pre=False):
     return Version(major, minor, micro, release, pre, post, dev)
 
 
-__version_info__ = Version(1, 2, 0, "final")
+__version_info__ = Version(1, 2, 1, "final")
 __version__ = __version_info__._get_canonical()