New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

explicitly passed parametrize ids do not get escaped to ascii #1351

Closed
davehunt opened this Issue Feb 1, 2016 · 16 comments

Comments

Projects
None yet
5 participants
@davehunt
Contributor

davehunt commented Feb 1, 2016

If values in parametrize contain unicode bytes, they are escaped when generating the IDs, however if these same objects are provided via the ids argument, they are not escaped. As a result, the resulting IDs can't be used with pylib's XML/HTML generation.

This was discovered and previously discussed in https://github.com/davehunt/pytest-html/pull/30

To reproduce, create a conftest.py with the following contents:

# -*- coding: utf-8 -*-
def pytest_runtest_logreport(report):
    # print report.nodeid  # test_unicode.py::test_unicode[〈]
    print unicode(report.nodeid)  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 30: ordinal not in range(128)
    # print report.nodeid.encode('string-escape')  # test_unicode.py::test_unicode[〈]
    # print unicode(report.nodeid.encode('string-escape'))  # test_unicode[\xe3\x80\x88]

Then, create a test file with the following contents:

# -*- coding: utf-8 -*-
import pytest
@pytest.mark.parametrize('val', ['foo'], ids=[''])
def test_unicode(val):
    pass

Run the test, and you will get the following traceback:

test_unicode.py 
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/main.py", line 90, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/main.py", line 121, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
INTERNALERROR>     return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
INTERNALERROR>     _MultiCall(methods, kwargs, hook.spec_opts).execute()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/main.py", line 146, in pytest_runtestloop
INTERNALERROR>     item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
INTERNALERROR>     return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
INTERNALERROR>     _MultiCall(methods, kwargs, hook.spec_opts).execute()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 595, in execute
INTERNALERROR>     return _wrapped_call(hook_impl.function(*args), self.execute)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 253, in _wrapped_call
INTERNALERROR>     return call_outcome.get_result()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 279, in get_result
INTERNALERROR>     _reraise(*ex)  # noqa
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 264, in __init__
INTERNALERROR>     self.result = func()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/runner.py", line 65, in pytest_runtest_protocol
INTERNALERROR>     runtestprotocol(item, nextitem=nextitem)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/runner.py", line 72, in runtestprotocol
INTERNALERROR>     rep = call_and_report(item, "setup", log)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/runner.py", line 123, in call_and_report
INTERNALERROR>     hook.pytest_runtest_logreport(report=report)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
INTERNALERROR>     return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
INTERNALERROR>     _MultiCall(methods, kwargs, hook.spec_opts).execute()
INTERNALERROR>   File "/Users/dhunt/.virtualenvs/tmp-19232ea9309e990c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/dhunt/workspace/pytest-scratchpad/conftest.py", line 6, in pytest_runtest_logreport
INTERNALERROR>     print unicode(report.nodeid)  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 30: ordinal not in range(128)
INTERNALERROR> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 30: ordinal not in range(128)
@RonnyPfannschmidt

This comment has been minimized.

Show comment
Hide comment
@RonnyPfannschmidt

RonnyPfannschmidt Feb 10, 2016

Member

at second glace, this isbreaking as expected, print on python2 will break on unicode

please verify

Member

RonnyPfannschmidt commented Feb 10, 2016

at second glace, this isbreaking as expected, print on python2 will break on unicode

please verify

@The-Compiler

This comment has been minimized.

Show comment
Hide comment
@The-Compiler

The-Compiler Feb 10, 2016

Member

No, unicode is - in the case of pytest-html, it's when trying to add the element to HTML

The point is if pytest should always make sure that ids are valid unicode objects, by doing _escape_bytes if they are bytes as well.

Member

The-Compiler commented Feb 10, 2016

No, unicode is - in the case of pytest-html, it's when trying to add the element to HTML

The point is if pytest should always make sure that ids are valid unicode objects, by doing _escape_bytes if they are bytes as well.

@RonnyPfannschmidt

This comment has been minimized.

Show comment
Hide comment
@RonnyPfannschmidt

RonnyPfannschmidt Feb 10, 2016

Member

my understanding is, that currently node id's are native strings so bytes on python2
if we want to make them unicode in all cases, then we need a major release

Member

RonnyPfannschmidt commented Feb 10, 2016

my understanding is, that currently node id's are native strings so bytes on python2
if we want to make them unicode in all cases, then we need a major release

@The-Compiler

This comment has been minimized.

Show comment
Hide comment
@The-Compiler

The-Compiler Feb 11, 2016

Member

Sorry, my comment above is wrong - it should ensure they're always ascii-only native strings (i.e. bytes on Python 2), like it does with the autogenerated IDs.

For example, with this conftest.py:

from __future__ import print_function

def pytest_runtest_logreport(report):
    if report.when == 'call':
        print('\n  nodeid: {!r}'.format(report.nodeid))

and this test_foo.py:

# encoding: utf-8

import pytest

@pytest.mark.parametrize('foo', ['büpf'])
def test_args(foo):
    pass

@pytest.mark.parametrize('foo', ['a'], ids=['büpf'])
def test_ids(foo):
    pass

The autogenerated ID gets escaped to be ascii only, but not the ids one:

$ python2 -m py.test  -v
[...]

test_foo.py::test_args[b\xc3\xbcpf] PASSED
  nodeid: 'test_foo.py::test_args[b\\xc3\\xbcpf]'

test_foo.py::test_ids[büpf] PASSED
  nodeid: 'test_foo.py::test_ids[b\xc3\xbcpf]'
Member

The-Compiler commented Feb 11, 2016

Sorry, my comment above is wrong - it should ensure they're always ascii-only native strings (i.e. bytes on Python 2), like it does with the autogenerated IDs.

For example, with this conftest.py:

from __future__ import print_function

def pytest_runtest_logreport(report):
    if report.when == 'call':
        print('\n  nodeid: {!r}'.format(report.nodeid))

and this test_foo.py:

# encoding: utf-8

import pytest

@pytest.mark.parametrize('foo', ['büpf'])
def test_args(foo):
    pass

@pytest.mark.parametrize('foo', ['a'], ids=['büpf'])
def test_ids(foo):
    pass

The autogenerated ID gets escaped to be ascii only, but not the ids one:

$ python2 -m py.test  -v
[...]

test_foo.py::test_args[b\xc3\xbcpf] PASSED
  nodeid: 'test_foo.py::test_args[b\\xc3\\xbcpf]'

test_foo.py::test_ids[büpf] PASSED
  nodeid: 'test_foo.py::test_ids[b\xc3\xbcpf]'

@RonnyPfannschmidt RonnyPfannschmidt changed the title from bytes objects in parametrize ID lists are not encoded to explicitly passed parametrize ids do not get escaped to ascii Feb 11, 2016

@RonnyPfannschmidt

This comment has been minimized.

Show comment
Hide comment
@RonnyPfannschmidt

RonnyPfannschmidt Feb 11, 2016

Member

@hpk42 ping?

im under the impression we should either ensure the id's are always ascii,
and warn if the given id's do not match canonical ascii or support unicode

since python is growing more international and identifiers are unicode in 3.x, i think we should support unicode node id's simply because that is what we will get in future
for non-ascii byte string node id's we should issue a pytest-warning and escape them as if they where byte strings

Member

RonnyPfannschmidt commented Feb 11, 2016

@hpk42 ping?

im under the impression we should either ensure the id's are always ascii,
and warn if the given id's do not match canonical ascii or support unicode

since python is growing more international and identifiers are unicode in 3.x, i think we should support unicode node id's simply because that is what we will get in future
for non-ascii byte string node id's we should issue a pytest-warning and escape them as if they where byte strings

@ceridwen

This comment has been minimized.

Show comment
Hide comment
@ceridwen

ceridwen Mar 17, 2016

Contributor

I'd like to try this bug for my Outreachy application for pytest-html. What's the final conclusion on what parametrize should do with the ids arguments? I'd assume it's probably better to support Unicode ids. Should I call the same _escape_bytes() that's being used in _idval?

Contributor

ceridwen commented Mar 17, 2016

I'd like to try this bug for my Outreachy application for pytest-html. What's the final conclusion on what parametrize should do with the ids arguments? I'd assume it's probably better to support Unicode ids. Should I call the same _escape_bytes() that's being used in _idval?

@The-Compiler

This comment has been minimized.

Show comment
Hide comment
@The-Compiler

The-Compiler Mar 17, 2016

Member

FWIW, @RonnyPfannschmidt's proposal makes sense from my POV.

Member

The-Compiler commented Mar 17, 2016

FWIW, @RonnyPfannschmidt's proposal makes sense from my POV.

@RonnyPfannschmidt

This comment has been minimized.

Show comment
Hide comment
@RonnyPfannschmidt

RonnyPfannschmidt Mar 17, 2016

Member

a simplistic implementation needs to ensure a few details

  1. ensue non-ascii byte strings are escaped to something ascii-ish
  2. ensure unicode/utf8 is mapped correctly to text

we should always byte-escape non-aphanumeric id values that are byte strings
we should always unicode-escape unicode id values
the target is, that node id's are always ascii text in the end

Member

RonnyPfannschmidt commented Mar 17, 2016

a simplistic implementation needs to ensure a few details

  1. ensue non-ascii byte strings are escaped to something ascii-ish
  2. ensure unicode/utf8 is mapped correctly to text

we should always byte-escape non-aphanumeric id values that are byte strings
we should always unicode-escape unicode id values
the target is, that node id's are always ascii text in the end

@nicoddemus

This comment has been minimized.

Show comment
Hide comment
@nicoddemus

nicoddemus Mar 17, 2016

Member

@RonnyPfannschmidt in your proposal do you mean Python 2 and 3, or only for Python 2? Python 3 should we strive to support full unicode?

Member

nicoddemus commented Mar 17, 2016

@RonnyPfannschmidt in your proposal do you mean Python 2 and 3, or only for Python 2? Python 3 should we strive to support full unicode?

@RonnyPfannschmidt

This comment has been minimized.

Show comment
Hide comment
@RonnyPfannschmidt

RonnyPfannschmidt Mar 17, 2016

Member

for both - even if we support full unicode id's we still will run into problems on windows, osx and other things

Member

RonnyPfannschmidt commented Mar 17, 2016

for both - even if we support full unicode id's we still will run into problems on windows, osx and other things

@nicoddemus

This comment has been minimized.

Show comment
Hide comment
@nicoddemus

nicoddemus Mar 17, 2016

Member

Displaying them in the terminal for example?

Member

nicoddemus commented Mar 17, 2016

Displaying them in the terminal for example?

@RonnyPfannschmidt

This comment has been minimized.

Show comment
Hide comment
@RonnyPfannschmidt

RonnyPfannschmidt Mar 17, 2016

Member

yes, also they are sometimes used for filenames/lookup keys - making them consistent ascii text in all cases seems sensible simply to avoid platform issues on unicode boundaries

Member

RonnyPfannschmidt commented Mar 17, 2016

yes, also they are sometimes used for filenames/lookup keys - making them consistent ascii text in all cases seems sensible simply to avoid platform issues on unicode boundaries

@nicoddemus

This comment has been minimized.

Show comment
Hide comment
@nicoddemus

nicoddemus Mar 17, 2016

Member

Fair enough, sounds reasonable. Thanks! 😁

Member

nicoddemus commented Mar 17, 2016

Fair enough, sounds reasonable. Thanks! 😁

@ceridwen

This comment has been minimized.

Show comment
Hide comment
@ceridwen

ceridwen Mar 18, 2016

Contributor

A draft pull request is up at #1463.

Contributor

ceridwen commented Mar 18, 2016

A draft pull request is up at #1463.

@ceridwen

This comment has been minimized.

Show comment
Hide comment
@ceridwen

ceridwen Mar 22, 2016

Contributor

New pull request targeted at features is up at #1470.

Contributor

ceridwen commented Mar 22, 2016

New pull request targeted at features is up at #1470.

@The-Compiler

This comment has been minimized.

Show comment
Hide comment
@The-Compiler

The-Compiler Apr 5, 2016

Member

Closing as the PR was merged

Member

The-Compiler commented Apr 5, 2016

Closing as the PR was merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment