Better exception messages

Improve some of the exception messages and fix the reporting of the character index after a combinator is parsed.
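In the parser diff, the `combine` branch ends with `continue`. Before this commit it did so without touching `index`, so a later "Expected a selector" error reported where the trailing combinator started instead of the position just past it. The sketch below assumes, as the placement of the new `index = m.end(0)` suggests, that the other branches update `index` further down the loop. It is not Soup Sieve's parser: `RE_TAG`, `RE_COMBINE`, `tokenize`, `check`, its `fixed` toggle, and `error_message` are all hypothetical, and the toy grammar only knows bare tag names and does not strip leading or trailing whitespace.

```python
import re

# Hypothetical, heavily simplified grammar for illustration only: a selector is
# a bare lowercase tag name; a combinator is `,`, `>`, `+`, `~`, or whitespace.
RE_TAG = re.compile(r'[a-z]+')
RE_COMBINE = re.compile(r'\s*[,+>~]\s*|\s+')


def tokenize(pattern):
    """Yield ('combine' | 'tag', match) pairs across the whole pattern."""
    pos = 0
    while pos < len(pattern):
        m = RE_COMBINE.match(pattern, pos) or RE_TAG.match(pattern, pos)
        if m is None:
            raise SyntaxError("Invalid character at position {}".format(pos))
        yield ('combine' if m.re is RE_COMBINE else 'tag'), m
        pos = m.end(0)


def check(pattern, fixed=True):
    """Validate `pattern`; `index` is tracked only for error messages.

    `fixed` is a hypothetical toggle: False mimics the old behavior of
    continuing past a combinator without updating `index`.
    """
    index = 0
    has_selector = False
    split_last = False
    for key, m in tokenize(pattern):
        if key == 'combine':
            # Whitespace alone is the descendant combinator.
            combinator = m.group(0).strip() or ' '
            if split_last:
                raise SyntaxError("Unexpected combinator at position {}".format(m.start(0)))
            if not has_selector:
                raise SyntaxError(
                    "The combinator '{}' at position {} must have a selector before it".format(combinator, index)
                )
            has_selector = False
            split_last = True
            if fixed:
                index = m.end(0)  # record the position just past the combinator
            continue
        has_selector = True
        split_last = False
        index = m.end(0)
    if split_last:
        raise SyntaxError("Expected a selector at position {}".format(index))


def error_message(pattern, fixed=True):
    """Return the SyntaxError message for `pattern`, or None if it is valid."""
    try:
        check(pattern, fixed)
    except SyntaxError as e:
        return str(e)
    return None
```

With `fixed=False`, the error for `"div >"` points at position 3, the end of `div`. With the update it points at position 5, the end of the pattern, where the missing selector belongs.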
facelessuser · Jan 8, 2019 · 1a9d879
1 parent 0ee713f

Showing 3 changed files with 53 additions and 40 deletions.
diff --git a/docs/src/dictionary/en-custom.txt b/docs/src/dictionary/en-custom.txt
@@ -25,6 +25,7 @@ XHTML
 accessor
 boolean
 builtin
+combinator
 deprecations
 directionality
 html

diff --git a/docs/src/markdown/about/changelog.md b/docs/src/markdown/about/changelog.md
@@ -7,6 +7,7 @@
 ## 1.6.3
 
 - **FIX**: Fix pickling issue when compiled selector contains a `NullSelector` object. (#70)
+- **FIX**: Better exception messages and fix a position reporting issue that can occur in some exceptions.
 
 ## 1.6.2
 
@@ -45,20 +46,20 @@
 
 - **NEW**: Add support for `:scope`.
 - **NEW**: `:user-invalid`, `:playing`, `:paused`, and `:local-link` will not cause a failure, but all will match
-nothing as their use cases are not possible in an environment outside a web browser.
+  nothing as their use cases are not possible in an environment outside a web browser.
 - **FIX**: Fix `[attr~=value]` handling of whitespace. According to the spec, if the value contains whitespace, or is an
-empty string, it should not match anything.
+  empty string, it should not match anything.
 - **FIX**: Precompile internal patterns for pseudo-classes to prevent having to parse them again.
 
 ## 1.2.1
 
 - **FIX**: More descriptive exceptions. Exceptions will also now mention position in the pattern that is problematic.
 - **FIX**: `filter` ignores `NavigableString` objects in normal iterables and `Tag` iterables. Basically, it filters all
-Beautiful Soup document parts regardless of iterable type where as it used to only filter out a `NavigableString` in a
-`Tag` object. This is viewed as fixing an inconsistency.
+  Beautiful Soup document parts regardless of iterable type, whereas it used to only filter out a `NavigableString` in a
+  `Tag` object. This is viewed as fixing an inconsistency.
 - **FIX**: `DEBUG` flag has been added to help with debugging CSS selector parsing. This is mainly for development.
 - **FIX**: If forced to search for language in `meta` tag, and no language is found, cache that there is no language in
-the `meta` tag to prevent searching again during the current select.
+  the `meta` tag to prevent searching again during the current select.
 - **FIX**: If a non `BeautifulSoup`/`Tag` object is given to the API to compare against, raise a `TypeError`.
 
 ## 1.2.0
@@ -70,25 +71,26 @@ the `meta` tag to prevent searching again during the current select.
 
 - **NEW**: Adds support for `[attr!=value]` which is equivalent to `:not([attr=value])`.
 - **NEW**: Add support for `:active`, `:focus`, `:hover`, `:visited`, `:target`, `:focus-within`, `:focus-visible`,
-`:target-within`, `:current()`/`:current`, `:past`, and `:future`, but they will never match as these states don't exist
+  `:target-within`, `:current()`/`:current`, `:past`, and `:future`, but they will never match as these states don't
+  exist
 in the Soup Sieve environment.
 - **NEW**: Add support for `:checked`, `:enabled`, `:disabled`, `:required`, `:optional`, `:default`, and
-`:placeholder-shown` which will only match in HTML documents as these concepts are not defined in XML.
+  `:placeholder-shown` which will only match in HTML documents as these concepts are not defined in XML.
 - **NEW**: Add support for `:link` and `:any-link`, both of which will target all `<a>`, `<area>`, and `<link>` elements
     with an `href` attribute as all links will be treated as unvisited in Soup Sieve.
 - **NEW**: Add support for `:lang()` (CSS4) which works in XML and HTML.
 - **NEW**: Users must install Beautiful Soup themselves. This requirement is removed in the hopes that Beautiful Soup
-may use this in the future.
+  may use this in the future.
 - **FIX**: Attributes in the form `prefix:attr` can be matched with the form `[prefix\:attr]` without specifying a
-namespaces if desired.
+  namespace, if desired.
 - **FIX**: Fix exception when `[type]` is used (with no value).
 
 ## 1.0.2
 
 - **FIX**: Use proper CSS identifier patterns for tag names, classes, ids, etc. Things like `#3` or `#-3` should not
-match and should require `#\33` or `#-\33`.
+  match and should require `#\33` or `#-\33`.
 - **FIX**: Do not raise `NotImplementedError` for supported pseudo classes/elements with bad syntax, instead raise
-`SyntaxError`.
+  `SyntaxError`.
 
 ## 1.0.1
 
@@ -104,9 +106,9 @@ match and should require `#\33` or `#-\33`.
 - **NEW**: Drop document flags. Document type can be detected from the Beautiful Soup object directly.
 - **FIX**: CSS selectors should be evaluated with CSS whitespace rules.
 - **FIX**: Processing instructions, CDATA, and declarations should all be ignored in `:contains` and child
-considerations for `:empty`.
+  considerations for `:empty`.
 - **FIX**: In Beautiful Soup, the document itself is the first tag. Do not match the "document" tag by returning false
-for any tag that doesn't have a parent.
+  for any tag that doesn't have a parent.
 
 ## 1.0.0b1
 
@@ -122,7 +124,7 @@ for any tag that doesn't have a parent.
 ## 0.5.3
 
 - **FIX**: Previously, all pseudo classes' selector lists were evaluated as one big group, but now each pseudo classes'
-selector lists are evaluated separately.
+  selector lists are evaluated separately.
 - **FIX**: CSS selector tokens are not case sensitive.
 
 ## 0.5.2
@@ -131,7 +133,7 @@ selector lists are evaluated separately.
 - **FIX**: Relax attribute pattern matching to allow non-essential whitespace.
 - **FIX**: Attribute selector flags themselves are not case sensitive.
 - **FIX**: `type` attribute in HTML is handled special. While all other attributes values are case sensitive, `type` in
-HTML is usually treated special and is insensitive. In XML, this is not the case.
+  HTML is usually treated specially and is case insensitive. In XML, this is not the case.
 
 ## 0.5.1
 

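In the parser diff that follows, the `:has()` branch (`parse_has_combinator`) normalizes the matched `relation` text with `.strip()`, treats bare whitespace as `WS_COMBINATOR`, and builds each relation type by prefixing the combinator with `:`. It starts from `":" + WS_COMBINATOR` and resets to it after every comma. The sketch below models that bookkeeping under stated assumptions. The `tokens` input format and `relative_chains` are hypothetical and not Soup Sieve's API, and its error messages omit positions for brevity. The real code attaches `rel_type` to a selector when the next combinator (or the end) is reached. The sketch attaches it when the compound is seen, which, as far as the diff shows, yields the same pairing.

```python
COMMA_COMBINATOR = ','
WS_COMBINATOR = ' '


def relative_chains(tokens):
    """Assign a relation type to each compound selector inside `:has()`.

    `tokens` is a hypothetical pre-tokenized form: a list of ('sel', name) or
    ('combine', raw_text) tuples. Returns one list of (rel_type, name) pairs
    per comma-separated relative selector.
    """
    rel_type = ':' + WS_COMBINATOR  # default relation is descendant
    chains = [[]]
    has_selector = False
    split_last = False
    for kind, value in tokens:
        if kind == 'combine':
            if split_last:
                raise SyntaxError('Unexpected combinator')
            # Normalize as in the diff: strip whitespace; bare whitespace is descendant.
            combinator = value.strip() or WS_COMBINATOR
            if combinator == COMMA_COMBINATOR:
                if not has_selector:
                    raise SyntaxError("The combinator ',' must have a selector before it")
                # Each comma-separated relative selector restarts as descendant.
                rel_type = ':' + WS_COMBINATOR
                chains.append([])
            else:
                rel_type = ':' + combinator
            has_selector = False
            split_last = True
        else:
            # A compound selector takes the relation of the combinator before it.
            chains[-1].append((rel_type, value))
            has_selector = True
            split_last = False
    if not has_selector:
        raise SyntaxError('Expected a selector')
    return chains
```

For example, a `:has(a + b, c)` argument yields `[[(': ', 'a'), (':+', 'b')], [(': ', 'c')]]`. A leading combinator such as `> p` is allowed because `:has()` takes relative selectors.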
diff --git a/soupsieve/css_parser.py b/soupsieve/css_parser.py
@@ -154,9 +154,9 @@
 
 # Constants
 # List split token
-SPLIT = ','
-# Relation type `:has()` descendant, the default relation type.
-REL_HAS_CHILD = ": "
+COMMA_COMBINATOR = ','
+# Relation token for descendant
+WS_COMBINATOR = " "
 
 # Parse flags
 FLG_PSEUDO = 0x01
@@ -550,32 +550,42 @@ def parse_pseudo_open(self, sel, name, has_selector, iselector, index):
         has_selector = True
         return has_selector
 
-    def parse_has_split(self, sel, m, has_selector, selectors, rel_type):
-        """Parse splitting tokens."""
+    def parse_has_combinator(self, sel, m, has_selector, selectors, rel_type, index):
+        """Parse combinator tokens."""
 
-        if m.group('relation') == SPLIT:
+        combinator = m.group('relation').strip()
+        if not combinator:
+            combinator = WS_COMBINATOR
+        if combinator == COMMA_COMBINATOR:
             if not has_selector:
-                raise SyntaxError("Cannot start or end selector with '{}'".format(m.group('relation')))
+                raise SyntaxError(
+                    "The combinator '{}' at position {} must have a selector before it".format(combinator, index)
+                )
             sel.rel_type = rel_type
             selectors[-1].relations.append(sel)
-            rel_type = REL_HAS_CHILD
+            rel_type = ":" + WS_COMBINATOR
             selectors.append(_Selector())
         else:
             if has_selector:
                 sel.rel_type = rel_type
                 selectors[-1].relations.append(sel)
-            rel_type = ':' + m.group('relation')
+            rel_type = ':' + combinator
         sel = _Selector()
 
         has_selector = False
         return has_selector, sel, rel_type
 
-    def parse_split(self, sel, m, has_selector, selectors, relations, is_pseudo):
-        """Parse splitting tokens."""
+    def parse_combinator(self, sel, m, has_selector, selectors, relations, is_pseudo, index):
+        """Parse combinator tokens."""
 
+        combinator = m.group('relation').strip()
+        if not combinator:
+            combinator = WS_COMBINATOR
         if not has_selector:
-            raise SyntaxError("Cannot start or end selector with '{}'".format(m.group('relation')))
-        if m.group('relation') == SPLIT:
+            raise SyntaxError(
+                "The combinator '{}' at position {} must have a selector before it".format(combinator, index)
+            )
+        if combinator == COMMA_COMBINATOR:
             if not sel.tag and not is_pseudo:
                 # Implied `*`
                 sel.tag = ct.SelectorTag('*', None)
@@ -584,10 +594,7 @@ def parse_split(self, sel, m, has_selector, selectors, relations, is_pseudo):
             del relations[:]
         else:
             sel.relations.extend(relations)
-            rel_type = m.group('relation').strip()
-            if not rel_type:
-                rel_type = ' '
-            sel.rel_type = rel_type
+            sel.rel_type = combinator
             del relations[:]
             relations.append(sel)
         sel = _Selector()
@@ -662,7 +669,7 @@ def parse_selectors(self, iselector, index=0, flags=0):
         has_selector = False
         closed = False
         relations = []
-        rel_type = REL_HAS_CHILD
+        rel_type = ":" + WS_COMBINATOR
         split_last = False
         is_open = flags & FLG_OPEN
         is_pseudo = flags & FLG_PSEUDO
@@ -720,22 +727,25 @@ def parse_selectors(self, iselector, index=0, flags=0):
                     is_html = True
                 elif key == 'pseudo_close':
                     if split_last:
-                        raise SyntaxError("Expecting more selectors at postion {}".format(m.start(0)))
+                        raise SyntaxError("Expected a selector at position {}".format(m.start(0)))
                     if is_open:
                         closed = True
                         break
                     else:
                         raise SyntaxError("Unmatched pseudo-class close at postion {}".format(m.start(0)))
                 elif key == 'combine':
                     if split_last:
-                        raise SyntaxError("Unexpected combining character at position {}".format(m.start(0)))
+                        raise SyntaxError("Unexpected combinator at position {}".format(m.start(0)))
                     if is_relative:
-                        has_selector, sel, rel_type = self.parse_has_split(
-                            sel, m, has_selector, selectors, rel_type
+                        has_selector, sel, rel_type = self.parse_has_combinator(
+                            sel, m, has_selector, selectors, rel_type, index
                         )
                     else:
-                        has_selector, sel = self.parse_split(sel, m, has_selector, selectors, relations, is_pseudo)
+                        has_selector, sel = self.parse_combinator(
+                            sel, m, has_selector, selectors, relations, is_pseudo, index
+                        )
                     split_last = True
+                    index = m.end(0)
                     continue
                 elif key == 'attribute':
                     has_selector = self.parse_attribute_selector(sel, m, has_selector)
@@ -755,7 +765,7 @@ def parse_selectors(self, iselector, index=0, flags=0):
             raise SyntaxError("Unclosed pseudo-class at position {}".format(index))
 
         if split_last:
-            raise SyntaxError("Expected more selectors at position {}".format(index))
+            raise SyntaxError("Expected a selector at position {}".format(index))
 
         if has_selector:
             if not sel.tag and not is_pseudo:
@@ -770,7 +780,7 @@ def parse_selectors(self, iselector, index=0, flags=0):
                 selectors.append(sel)
         elif is_relative:
             # We will always need to finish a selector when `:has()` is used as it leads with combining.
-            raise SyntaxError('Missing selectors after combining type.')
+            raise SyntaxError('Expected a selector at position {}'.format(index))
 
         # Some patterns require additional logic, such as default. We try to make these the
         # last pattern, and append the appropriate flag to that selector which communicates