
Commit

Better exception messages
Improve some of the exception messages and fix the reporting of the
character index after a combinator is parsed.
facelessuser committed Jan 8, 2019
1 parent 0ee713f commit 1a9d879
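The position-reporting fix described in the commit message can be illustrated with a minimal, standalone sketch. This is not soupsieve's parser: the `TOKEN` pattern and the `scan` function are hypothetical simplifications, and only the error message wording and the `index = m.end(0)` update after a combinator are taken from the diff below.

```python
import re

# Hypothetical, simplified token pattern (NOT soupsieve's real patterns):
# a combinator is `,`, `+`, `>`, `~` with optional whitespace, or bare whitespace.
TOKEN = re.compile(r'(?P<combine>\s*[,+>~]\s*|\s+)|(?P<tag>[A-Za-z][\w-]*)')


def scan(pattern):
    """Tokenize a toy selector, reporting accurate positions in errors."""
    index = 0
    tokens = []
    has_selector = False
    split_last = False  # True when the previous token was a combinator

    while index < len(pattern):
        m = TOKEN.match(pattern, index)
        if m is None:
            raise SyntaxError("Invalid character at position {}".format(index))

        if m.lastgroup == 'combine':
            # Bare whitespace normalizes to the descendant combinator.
            combinator = m.group('combine').strip() or ' '
            if split_last:
                raise SyntaxError(
                    "Unexpected combinator at position {}".format(m.start(0))
                )
            if not has_selector:
                raise SyntaxError(
                    "The combinator '{}' at position {}, must have a selector "
                    "before it".format(combinator, index)
                )
            tokens.append(combinator)
            has_selector = False
            split_last = True
            # Advance the index before `continue`; skipping this step would
            # leave later error messages pointing at a stale position.
            index = m.end(0)
            continue

        tokens.append(m.group('tag'))
        has_selector = True
        split_last = False
        index = m.end(0)

    if split_last:
        raise SyntaxError("Expected a selector at position {}".format(index))
    return tokens
```

With this sketch, `scan("div > ")` fails with a message that points just past the trailing combinator rather than at where the combinator began.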
Showing 3 changed files with 53 additions and 40 deletions.
1 change: 1 addition & 0 deletions docs/src/dictionary/en-custom.txt
@@ -25,6 +25,7 @@ XHTML
accessor
boolean
builtin
combinator
deprecations
directionality
html
32 changes: 17 additions & 15 deletions docs/src/markdown/about/changelog.md
@@ -7,6 +7,7 @@
## 1.6.3

- **FIX**: Fix pickling issue when compiled selector contains a `NullSelector` object. (#70)
- **FIX**: Better exception messages and fix a position reporting issue that can occur in some exceptions.

## 1.6.2

@@ -45,20 +46,20 @@

- **NEW**: Add support for `:scope`.
- **NEW**: `:user-invalid`, `:playing`, `:paused`, and `:local-link` will not cause a failure, but all will match
nothing as their use cases are not possible in an environment outside a web browser.
nothing as their use cases are not possible in an environment outside a web browser.
- **FIX**: Fix `[attr~=value]` handling of whitespace. According to the spec, if the value contains whitespace, or is an
empty string, it should not match anything.
empty string, it should not match anything.
- **FIX**: Precompile internal patterns for pseudo-classes to prevent having to parse them again.

## 1.2.1

- **FIX**: More descriptive exceptions. Exceptions will also now mention position in the pattern that is problematic.
- **FIX**: `filter` ignores `NavigableString` objects in normal iterables and `Tag` iterables. Basically, it filters all
Beautiful Soup document parts regardless of iterable type, whereas it used to only filter out a `NavigableString` in a
`Tag` object. This is viewed as fixing an inconsistency.
Beautiful Soup document parts regardless of iterable type, whereas it used to only filter out a `NavigableString` in a
`Tag` object. This is viewed as fixing an inconsistency.
- **FIX**: `DEBUG` flag has been added to help with debugging CSS selector parsing. This is mainly for development.
- **FIX**: If forced to search for language in `meta` tag, and no language is found, cache that there is no language in
the `meta` tag to prevent searching again during the current select.
the `meta` tag to prevent searching again during the current select.
- **FIX**: If a non `BeautifulSoup`/`Tag` object is given to the API to compare against, raise a `TypeError`.

## 1.2.0
@@ -70,25 +71,26 @@ the `meta` tag to prevent searching again during the current select.

- **NEW**: Adds support for `[attr!=value]` which is equivalent to `:not([attr=value])`.
- **NEW**: Add support for `:active`, `:focus`, `:hover`, `:visited`, `:target`, `:focus-within`, `:focus-visible`,
`:target-within`, `:current()`/`:current`, `:past`, and `:future`, but they will never match as these states don't exist
`:target-within`, `:current()`/`:current`, `:past`, and `:future`, but they will never match as these states don't
exist
in the Soup Sieve environment.
- **NEW**: Add support for `:checked`, `:enabled`, `:disabled`, `:required`, `:optional`, `:default`, and
`:placeholder-shown` which will only match in HTML documents as these concepts are not defined in XML.
`:placeholder-shown` which will only match in HTML documents as these concepts are not defined in XML.
- **NEW**: Add support for `:link` and `:any-link`, both of which will target all `<a>`, `<area>`, and `<link>` elements
with an `href` attribute as all links will be treated as unvisited in Soup Sieve.
- **NEW**: Add support for `:lang()` (CSS4) which works in XML and HTML.
- **NEW**: Users must install Beautiful Soup themselves. This requirement is removed in the hopes that Beautiful Soup
may use this in the future.
may use this in the future.
- **FIX**: Attributes in the form `prefix:attr` can be matched with the form `[prefix\:attr]` without specifying a
namespace if desired.
namespace if desired.
- **FIX**: Fix exception when `[type]` is used (with no value).

## 1.0.2

- **FIX**: Use proper CSS identifier patterns for tag names, classes, ids, etc. Things like `#3` or `#-3` should not
match and should require `#\33` or `#-\33`.
match and should require `#\33` or `#-\33`.
- **FIX**: Do not raise `NotImplementedError` for supported pseudo classes/elements with bad syntax, instead raise
`SyntaxError`.
`SyntaxError`.

## 1.0.1

@@ -104,9 +106,9 @@ match and should require `#\33` or `#-\33`.
- **NEW**: Drop document flags. Document type can be detected from the Beautiful Soup object directly.
- **FIX**: CSS selectors should be evaluated with CSS whitespace rules.
- **FIX**: Processing instructions, CDATA, and declarations should all be ignored in `:contains` and child
considerations for `:empty`.
considerations for `:empty`.
- **FIX**: In Beautiful Soup, the document itself is the first tag. Do not match the "document" tag by returning false
for any tag that doesn't have a parent.
for any tag that doesn't have a parent.

## 1.0.0b1

@@ -122,7 +124,7 @@ for any tag that doesn't have a parent.
## 0.5.3

- **FIX**: Previously, all pseudo classes' selector lists were evaluated as one big group, but now each pseudo classes'
selector lists are evaluated separately.
selector lists are evaluated separately.
- **FIX**: CSS selector tokens are not case sensitive.

## 0.5.2
@@ -131,7 +133,7 @@ selector lists are evaluated separately.
- **FIX**: Relax attribute pattern matching to allow non-essential whitespace.
- **FIX**: Attribute selector flags themselves are not case sensitive.
- **FIX**: `type` attribute in HTML is handled special. While all other attributes values are case sensitive, `type` in
HTML is usually treated special and is insensitive. In XML, this is not the case.
HTML is usually treated special and is insensitive. In XML, this is not the case.

## 0.5.1

60 changes: 35 additions & 25 deletions soupsieve/css_parser.py
@@ -154,9 +154,9 @@

# Constants
# List split token
SPLIT = ','
# Relation type `:has()` descendant, the default relation type.
REL_HAS_CHILD = ": "
COMMA_COMBINATOR = ','
# Relation token for descendant
WS_COMBINATOR = " "

# Parse flags
FLG_PSEUDO = 0x01
@@ -550,32 +550,42 @@ def parse_pseudo_open(self, sel, name, has_selector, iselector, index):
has_selector = True
return has_selector

def parse_has_split(self, sel, m, has_selector, selectors, rel_type):
"""Parse splitting tokens."""
def parse_has_combinator(self, sel, m, has_selector, selectors, rel_type, index):
"""Parse combinator tokens."""

if m.group('relation') == SPLIT:
combinator = m.group('relation').strip()
if not combinator:
combinator = WS_COMBINATOR
if combinator == COMMA_COMBINATOR:
if not has_selector:
raise SyntaxError("Cannot start or end selector with '{}'".format(m.group('relation')))
raise SyntaxError(
"The combinator '{}' at position {}, must have a selector before it".format(combinator, index)
)
sel.rel_type = rel_type
selectors[-1].relations.append(sel)
rel_type = REL_HAS_CHILD
rel_type = ":" + WS_COMBINATOR
selectors.append(_Selector())
else:
if has_selector:
sel.rel_type = rel_type
selectors[-1].relations.append(sel)
rel_type = ':' + m.group('relation')
rel_type = ':' + combinator
sel = _Selector()

has_selector = False
return has_selector, sel, rel_type

def parse_split(self, sel, m, has_selector, selectors, relations, is_pseudo):
"""Parse splitting tokens."""
def parse_combinator(self, sel, m, has_selector, selectors, relations, is_pseudo, index):
"""Parse combinator tokens."""

combinator = m.group('relation').strip()
if not combinator:
combinator = WS_COMBINATOR
if not has_selector:
raise SyntaxError("Cannot start or end selector with '{}'".format(m.group('relation')))
if m.group('relation') == SPLIT:
raise SyntaxError(
"The combinator '{}' at position {}, must have a selector before it".format(combinator, index)
)
if combinator == COMMA_COMBINATOR:
if not sel.tag and not is_pseudo:
# Implied `*`
sel.tag = ct.SelectorTag('*', None)
@@ -584,10 +594,7 @@ def parse_split(self, sel, m, has_selector, selectors, relations, is_pseudo):
del relations[:]
else:
sel.relations.extend(relations)
rel_type = m.group('relation').strip()
if not rel_type:
rel_type = ' '
sel.rel_type = rel_type
sel.rel_type = combinator
del relations[:]
relations.append(sel)
sel = _Selector()
@@ -662,7 +669,7 @@ def parse_selectors(self, iselector, index=0, flags=0):
has_selector = False
closed = False
relations = []
rel_type = REL_HAS_CHILD
rel_type = ":" + WS_COMBINATOR
split_last = False
is_open = flags & FLG_OPEN
is_pseudo = flags & FLG_PSEUDO
@@ -720,22 +727,25 @@ def parse_selectors(self, iselector, index=0, flags=0):
is_html = True
elif key == 'pseudo_close':
if split_last:
raise SyntaxError("Expecting more selectors at position {}".format(m.start(0)))
raise SyntaxError("Expected a selector at position {}".format(m.start(0)))
if is_open:
closed = True
break
else:
raise SyntaxError("Unmatched pseudo-class close at position {}".format(m.start(0)))
elif key == 'combine':
if split_last:
raise SyntaxError("Unexpected combining character at position {}".format(m.start(0)))
raise SyntaxError("Unexpected combinator at position {}".format(m.start(0)))
if is_relative:
has_selector, sel, rel_type = self.parse_has_split(
sel, m, has_selector, selectors, rel_type
has_selector, sel, rel_type = self.parse_has_combinator(
sel, m, has_selector, selectors, rel_type, index
)
else:
has_selector, sel = self.parse_split(sel, m, has_selector, selectors, relations, is_pseudo)
has_selector, sel = self.parse_combinator(
sel, m, has_selector, selectors, relations, is_pseudo, index
)
split_last = True
index = m.end(0)
continue
elif key == 'attribute':
has_selector = self.parse_attribute_selector(sel, m, has_selector)
@@ -755,7 +765,7 @@ def parse_selectors(self, iselector, index=0, flags=0):
raise SyntaxError("Unclosed pseudo-class at position {}".format(index))

if split_last:
raise SyntaxError("Expected more selectors at position {}".format(index))
raise SyntaxError("Expected a selector at position {}".format(index))

if has_selector:
if not sel.tag and not is_pseudo:
@@ -770,7 +780,7 @@ def parse_selectors(self, iselector, index=0, flags=0):
selectors.append(sel)
elif is_relative:
# We will always need to finish a selector when `:has()` is used as it leads with combining.
raise SyntaxError('Missing selectors after combining type.')
raise SyntaxError('Expected a selector at position {}'.format(index))

# Some patterns require additional logic, such as default. We try to make these the
# last pattern, and append the appropriate flag to that selector which communicates
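The combinator normalization that this commit moves to the top of both parsing methods can be sketched in isolation. The constants `COMMA_COMBINATOR` and `WS_COMBINATOR` come from the diff; the helper names `normalize_combinator` and `has_relation_type` are hypothetical and exist only for this illustration, so treat this as a rough reading of the change rather than the library's API.

```python
# Constants as introduced in the diff.
COMMA_COMBINATOR = ','
WS_COMBINATOR = " "


def normalize_combinator(raw):
    """Hypothetical helper: strip whitespace from a matched combinator.

    A match that is whitespace only becomes the descendant combinator.
    """
    combinator = raw.strip()
    if not combinator:
        combinator = WS_COMBINATOR
    return combinator


def has_relation_type(raw):
    """Hypothetical helper: relation type for the next selector in `:has()`.

    A comma starts a new selector in the list, so the relation type resets
    to the default descendant relation; any other combinator is prefixed
    with a colon.
    """
    combinator = normalize_combinator(raw)
    if combinator == COMMA_COMBINATOR:
        return ":" + WS_COMBINATOR
    return ":" + combinator
```

Normalizing once up front is what lets the new error messages print the bare combinator (for example `>`) instead of the raw match with its surrounding whitespace.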

