Improve exception messages when a token is not found (#73)
In most cases, a full selector is parsed as a token; if it is not, we
report the character at the current position as invalid. Exceptions
raised for tokens that are fully parsed and validated already provide
good context, but when no token matches, very little is conveyed back
to the user. Only a few types of selectors are allowed in CSS: tags,
attributes, classes, IDs, pseudo-classes, pseudo-elements, and
combinators. There are variations of these, but that is it. Most of
these cases are already reported with decent context, but a few are
still reported as plain invalid characters. Moving forward, we check
whether the invalid character is the start of one of the basic
supported types. Only attributes, classes, IDs, and
pseudo-classes/pseudo-elements can produce these ambiguous exceptions,
so we identify their starting characters ('[', '.', '#', and ':') and
raise an appropriate "malformed" selector exception.
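The mechanism described above can be sketched as a toy tokenizer: try each token pattern at the current index and, when none matches, name the selector type that the offending character starts. This is a minimal sketch under stated assumptions, not soupsieve's code; the token patterns, the `TOKENS` and `MALFORMED` tables, and `tokenize()` are hypothetical simplifications, and the dictionary lookup stands in for the if/elif chain used in the commit.

```python
import re

# Hypothetical, heavily simplified token patterns for illustration only;
# soupsieve's real patterns (PAT_ATTR, PAT_PSEUDO_CLASS, ...) are far richer.
IDENT = r'-?[A-Za-z_][\w-]*'
TOKENS = [
    ('tag', re.compile(IDENT)),
    ('class', re.compile(r'\.' + IDENT)),
    ('id', re.compile(r'#' + IDENT)),
    ('attribute', re.compile(r'\[\s*' + IDENT + r'\s*(?:=\s*' + IDENT + r'\s*)?\]')),
    ('pseudo-class', re.compile(r':' + IDENT)),
    ('combinator', re.compile(r'\s*[>+~]\s*|\s+')),
]

# Hypothetical table: characters that begin a known selector type.
MALFORMED = {'[': 'attribute', '.': 'class', '#': 'id', ':': 'pseudo-class'}


def tokenize(pattern):
    """Yield (kind, text) tokens; raise SyntaxError where no token matches."""
    index = 0
    while index < len(pattern):
        for kind, regex in TOKENS:
            m = regex.match(pattern, index)
            if m:
                yield kind, m.group(0)
                index = m.end()
                break
        else:
            # No token matched: name the selector type if the character starts one,
            # otherwise fall back to reporting the raw character.
            c = pattern[index]
            if c in MALFORMED:
                msg = "Malformed {} selector at position {}".format(MALFORMED[c], index)
            else:
                msg = "Invalid character {!r} at position {}".format(c, index)
            raise SyntaxError(msg)


for selector in ('div[attr={}]', 'td.+#some-id', 'div?'):
    try:
        list(tokenize(selector))
    except SyntaxError as err:
        print(selector, '->', err)
# div[attr={}] -> Malformed attribute selector at position 3
# td.+#some-id -> Malformed class selector at position 2
# div? -> Invalid character '?' at position 3
```

A lookup table keeps each start character next to its selector name, so supporting another start character is a one-line change; the if/elif chain in the commit is equivalent.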

Also remove unnecessary "+" in PAT_PSEUDO_CLASS.
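A small sketch of why the "+" is unnecessary, assuming a simplified identifier pattern (`IDENT` below is hypothetical, not soupsieve's `IDENTIFIER`) and `\s*` in place of `{ws}*`. Because the identifier's trailing `[\w-]*` already consumes every name character, repeating the whole group adds nothing, and in this sketch the old and new patterns match the same text.

```python
import re

# Hypothetical identifier pattern: a non-capturing group whose trailing
# `[\w-]*` greedily consumes the rest of the name.
IDENT = r'(?:-?[A-Za-z_][\w-]*)'

# Before and after the change; `\s*` stands in for soupsieve's {ws}*.
OLD = re.compile(r'(?P<name>:{ident}+)(?P<open>\(\s*)?'.format(ident=IDENT))
NEW = re.compile(r'(?P<name>:{ident})(?P<open>\(\s*)?'.format(ident=IDENT))

for text in (':hover', ':nth-child( 2n+1)', ':not(.a)'):
    old, new = OLD.match(text), NEW.match(text)
    # A second iteration of the outer `+` would have to start with a name
    # character, but `[\w-]*` has already consumed all of them, so the `+`
    # adds nothing to the match.
    print(text, old.group(0) == new.group(0), new.group('name'), new.group('open'))
```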
facelessuser committed Jan 9, 2019
1 parent 73b460a commit 9fb1f46
Showing 3 changed files with 36 additions and 5 deletions.
docs/src/markdown/about/changelog.md: 2 changes (1 addition, 1 deletion)
@@ -7,7 +7,7 @@
 ## 1.6.3
 
 - **FIX**: Fix pickling issue when compiled selector contains a `NullSelector` object. (#70)
-- **FIX**: Better exception messages and fix a position reporting issue that can occur in some exceptions.
+- **FIX**: Better exception messages in the CSS selector parser and fix a position reporting issue that can occur in some exceptions.
 
 ## 1.6.2
 
soupsieve/css_parser.py: 20 changes (16 additions, 4 deletions)
@@ -111,7 +111,7 @@
 # Attributes (`[attr]`, `[attr=value]`, etc.)
 PAT_ATTR = r'\[{ws}*(?P<ns_attr>(?:(?:{ident}|\*)?\|)?{ident}){attr}'.format(ws=WSC, ident=IDENTIFIER, attr=ATTR)
 # Pseudo class (`:pseudo-class`, `:pseudo-class(`)
-PAT_PSEUDO_CLASS = r'(?P<name>:{ident}+)(?P<open>\({ws}*)?'.format(ws=WSC, ident=IDENTIFIER)
+PAT_PSEUDO_CLASS = r'(?P<name>:{ident})(?P<open>\({ws}*)?'.format(ws=WSC, ident=IDENTIFIER)
 # Closing pseudo group (`)`)
 PAT_PSEUDO_CLOSE = r'{ws}*\)'.format(ws=WSC)
 # Pseudo element (`::pseudo-element`)
@@ -820,9 +820,21 @@ def selector_iter(self, pattern):
                     yield k, m
                     break
             if m is None:
-                if self.debug:  # pragma: no cover
-                    print("TOKEN: 'invalid' --> {!r} at position {}".format(pattern[index], index))
-                raise SyntaxError("Invlaid character {!r} at position {}".format(pattern[index], index))
+                c = pattern[index]
+                # If the character represents the start of one of the known selector types,
+                # raise an exception mentioning that the known selector type is in error;
+                # otherwise, report the invalid character.
+                if c == '[':
+                    msg = "Malformed attribute selector at position {}".format(index)
+                elif c == '.':
+                    msg = "Malformed class selector at position {}".format(index)
+                elif c == '#':
+                    msg = "Malformed id selector at position {}".format(index)
+                elif c == ':':
+                    msg = "Malformed pseudo-class selector at position {}".format(index)
+                else:
+                    msg = "Invalid character {!r} at position {}".format(c, index)
+                raise SyntaxError(msg)
         if self.debug:  # pragma: no cover
             print('## END PARSING')
 
tests/test_soupsieve.py: 19 changes (19 additions, 0 deletions)
@@ -346,6 +346,25 @@ def test_invalid_syntax(self):
         with self.assertRaises(SyntaxError):
             sv.compile('div?')
 
+    def test_malformed_selectors(self):
+        """Test malformed selectors."""
+
+        # Malformed attribute
+        with self.assertRaises(SyntaxError):
+            sv.compile('div[attr={}]')
+
+        # Malformed class
+        with self.assertRaises(SyntaxError):
+            sv.compile('td.+#some-id')
+
+        # Malformed id
+        with self.assertRaises(SyntaxError):
+            sv.compile('td#.some-class')
+
+        # Malformed pseudo-class
+        with self.assertRaises(SyntaxError):
+            sv.compile('td:[href]')
+
     def test_invalid_namespace(self):
         """Test invalid namespace."""
 
