Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] exception handling was hiding original exception #38

Merged
merged 1 commit into from
Apr 25, 2016

Conversation

Digenis
Copy link
Member

@Digenis Digenis commented Mar 30, 2016

Any xpath error was caught and reraised as a ValueError
complaining about an Invalid XPath,
quoting the original xpath for debugging purposes.

First, "Invalid XPath" is misleading
because the same exception is also raised for xpath evaluation errors.
However it also hides the original exception message
which ends up making xpath debugging harder.
I made it quote the original exception message too
which can be "Unregisted function", "Invalid Expression",
"Undefined namespace prefix" etc.

Now before merging:
Any exception can occur during xpath evaluation
because any python function can be registered and called in an xpath.
I doubt there's anyone wrapping xpath method calls in user code
in another try/except blocks for "ValueError".
Even if somebody actually does this,
I bet it's for logging some custom error message
and this can't justify the usefulness of the current try/except block.
That's why I'm leaning more towards dropping the try/except altogether.
However I opened this PR instead because I doubt you'd accept dropping it.

@codecov-io
Copy link

Current coverage is 100.00%

Merging #38 into master will not affect coverage as of fa844b9

Powered by Codecov. Updated on successful CI builds.

@eliasdorneles
Copy link
Member

Yeah, I've been bitten by this in the past, +1.

I agree that Invalid XPath is misleading, but I could see one reason for keeping it: an user might be catching ValueError and testing for exception message to catch errors in XPath.
This may be a bit of a stretch, I don't know.

What do you think of creating a custom error for this?

class ParselExpressionError(ValueError):
    "Raised when there is an error parsing or evaluating a CSS or XPath expression"

@Digenis
Copy link
Member Author

Digenis commented Mar 30, 2016

What do you think of creating a custom error

I think it would make sense when parsel becomes an abstract selector library
with lxml.etree being just one of its backends.
For now, I think the ValueError is already too much
because XPaths are supposed to be known (or predictable) in compile time.
The only case I can think of where this can be usefull
is testing xpaths provided by users.
This is still problematic because the enabled exslt extensions
allow functions to raise different exceptions.
Also, xpaths provided by users can pose security risks,
especially with exslt enabled, regexes can be evaluated that can make the program hang.
(I think I should open a documentation PR for this)

@Digenis
Copy link
Member Author

Digenis commented Apr 13, 2016

Any decisions on this?

@redapple
Copy link
Contributor

@Digenis , do you have console output example of before and after your change?

@eliasdorneles
Copy link
Member

Hey, sorry for the delay!

I'm okay with this change -- it's simply adding more information to the exception, which already makes things better.
I still think it would be nice to raise a custom exception instead of plain ValueError.

@Digenis not sure if I understand your point about ValueError being too much.
Do you mean we should just let it crash, and user should figure out what's wrong from the stack trace because it would have more context?

We could also re-raise the exception with the original traceback (which seems a bit tricky to do while being compatible way to both PY2 and PY3).

@eliasdorneles
Copy link
Member

@Digenis hey, what about using six.reraise here?

@eliasdorneles
Copy link
Member

My suggestion:

   except etree.XPathError as exc:
        _, _, tb = sys.exc_info()
        msg = u"XPath error: %s in %s" % (exc, query)
        if six.PY2:
            msg = msg.encode("unicode_escape"))
        six.reraise(ValueError, ValueError(msg), tb)

This will maintain context both in PY2 and PY3, and will still throw it as ValueError -- thus keeping backwards compatibility with client code already expecting that.

@Digenis
Copy link
Member Author

Digenis commented Apr 22, 2016

amended

@Digenis
Copy link
Member Author

Digenis commented Apr 22, 2016

After patch

>>> parsel.Selector(text=u'').xpath('string(1,2)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 183, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:61854)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:178516)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:177421)
ValueError: XPath error: Invalid number of arguments in string(1,2)
>>> parsel.Selector(text=u'').xpath('-')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 183, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:61854)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:178516)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:177421)
ValueError: XPath error: Invalid expression in -
>>> parsel.Selector(text=u'').xpath('a:b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 183, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:61854)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:178516)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:177421)
ValueError: XPath error: Undefined namespace prefix in a:b
>>> parsel.Selector(text=u'').xpath('k()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 183, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:61854)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:178516)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:177421)
ValueError: XPath error: Unregistered function in k()
>>> parsel.Selector(text=u'').xpath('re:test(.,"\\1")')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:61854)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:178516)
  File "src/lxml/xpath.pxi", line 223, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:177375)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:11090)
  File "src/lxml/extensions.pxi", line 842, in lxml.etree._extension_function_call (src/lxml/lxml.etree.c:173853)
  File "src/lxml/extensions.pxi", line 503, in lxml.etree._ExsltRegExp.test (src/lxml/lxml.etree.c:169051)
  File "src/lxml/extensions.pxi", line 496, in lxml.etree._ExsltRegExp._compile (src/lxml/lxml.etree.c:168844)
  File "/home/digenis/py2env/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/home/digenis/py2env/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\1'

Before patch

>>> parsel.Selector(text=u'').xpath('string(1,2)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 180, in xpath
    raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))
ValueError: Invalid XPath: string(1,2)
>>> parsel.Selector(text=u'').xpath('-')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 180, in xpath
    raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))
ValueError: Invalid XPath: -
>>> parsel.Selector(text=u'').xpath('a:b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 180, in xpath
    raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))
ValueError: Invalid XPath: a:b
>>> parsel.Selector(text=u'').xpath('k()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 180, in xpath
    raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))
ValueError: Invalid XPath: k()
>>> parsel.Selector(text=u'').xpath('re:test(.,"\\1")')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parsel/selector.py", line 177, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:61854)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:178516)
  File "src/lxml/xpath.pxi", line 223, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:177375)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:11090)
  File "src/lxml/extensions.pxi", line 842, in lxml.etree._extension_function_call (src/lxml/lxml.etree.c:173853)
  File "src/lxml/extensions.pxi", line 503, in lxml.etree._ExsltRegExp.test (src/lxml/lxml.etree.c:169051)
  File "src/lxml/extensions.pxi", line 496, in lxml.etree._ExsltRegExp._compile (src/lxml/lxml.etree.c:168844)
  File "/home/digenis/py2env/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/home/digenis/py2env/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\1'

@eliasdorneles eliasdorneles changed the title exception handling was hiding original exception [MRG+1] exception handling was hiding original exception Apr 22, 2016
@eliasdorneles
Copy link
Member

Looks good to me -- thanks @Digenis !

@kmike kmike merged commit 5006628 into scrapy:master Apr 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants