Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tempfile mixes str and bytes in an inconsistent manner #84878

Closed
ericzolf mannequin opened this issue May 20, 2020 · 18 comments
Closed

tempfile mixes str and bytes in an inconsistent manner #84878

ericzolf mannequin opened this issue May 20, 2020 · 18 comments
Assignees
Labels
3.10 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@ericzolf
Copy link
Mannequin

ericzolf mannequin commented May 20, 2020

BPO 40701
Nosy @gpshead, @ambv, @serhiy-storchaka, @miss-islington, @ericzolf
PRs
  • bpo-40701: tempfile mixes str and bytes in an inconsistent manner #20442
  • [3.9] bpo-40701: tempfile mixes str and bytes in an inconsistent manner (GH-20442) #24734
  • bpo-40701: doc typo historcal -> historical #25334
  • Files
  • tempfile.py.diff: Patching tempfile.py to properly handle tempdir set to bytes
  • tempfile_py_2020-05-23.diff: Proper patch for tempfile.py for consistent bytes handling
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ambv'
    closed_at = <Date 2021-04-10.08:16:15.400>
    created_at = <Date 2020-05-20.19:20:24.162>
    labels = ['type-bug', '3.10']
    title = 'tempfile mixes str and bytes in an inconsistent manner'
    updated_at = <Date 2021-04-10.17:55:21.156>
    user = 'https://github.com/ericzolf'

    bugs.python.org fields:

    activity = <Date 2021-04-10.17:55:21.156>
    actor = 'gregory.p.smith'
    assignee = 'lukasz.langa'
    closed = True
    closed_date = <Date 2021-04-10.08:16:15.400>
    closer = 'lukasz.langa'
    components = []
    creation = <Date 2020-05-20.19:20:24.162>
    creator = 'ericzolf'
    dependencies = []
    files = ['49176', '49185']
    hgrepos = []
    issue_num = 40701
    keywords = ['patch']
    message_count = 18.0
    messages = ['369469', '369511', '369698', '369742', '369810', '369817', '369850', '369855', '370049', '370110', '370125', '370126', '387922', '387930', '388056', '390684', '390702', '390727']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'lukasz.langa', 'python-dev', 'serhiy.storchaka', 'miss-islington', 'ericzolf']
    pr_nums = ['20442', '24734', '25334']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue40701'
    versions = ['Python 3.10']

    @ericzolf
    Copy link
    Mannequin Author

    ericzolf mannequin commented May 20, 2020

    tempfile fails on mixed str and bytes when setting tempfile.tempdir to a non-existent bytes path but succeeds when set to an existing bytes path.

    $ python3.9
    Python 3.9.0a6 (default, Apr 28 2020, 00:00:00) 
    [GCC 10.0.1 20200430 (Red Hat 10.0.1-0.14)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tempfile
    >>> tempfile.tempdir = b'/doesntexist'
    >>> tempfile.TemporaryFile()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.9/tempfile.py", line 615, in TemporaryFile
        (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
      File "/usr/lib64/python3.9/tempfile.py", line 248, in _mkstemp_inner
        file = _os.path.join(dir, pre + name + suf)
      File "/usr/lib64/python3.9/posixpath.py", line 90, in join
        genericpath._check_arg_types('join', a, *p)
      File "/usr/lib64/python3.9/genericpath.py", line 155, in _check_arg_types
        raise TypeError("Can't mix strings and bytes in path components") from None
    TypeError: Can't mix strings and bytes in path components
    >>> tempfile.tempdir = b'/tmp'
    >>> tempfile.TemporaryFile()
    <_io.BufferedRandom name=3>
    >>> tempfile.mktemp()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.9/tempfile.py", line 400, in mktemp
        file = _os.path.join(dir, prefix + name + suffix)
      File "/usr/lib64/python3.9/posixpath.py", line 90, in join
        genericpath._check_arg_types('join', a, *p)
      File "/usr/lib64/python3.9/genericpath.py", line 155, in _check_arg_types
        raise TypeError("Can't mix strings and bytes in path components") from None
    TypeError: Can't mix strings and bytes in path components

    It seems to me that the case of tempfile.tempdir being of type bytes hasn't been completely considered and is handled inconsistently. My suggestion would be to manage the paths as bytes if there is no other indication like suffix or prefix.

    @ericzolf ericzolf mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes type-bug An unexpected behavior, bug, or error labels May 20, 2020
    @ericzolf
    Copy link
    Mannequin Author

    ericzolf mannequin commented May 21, 2020

    In the meantime, I noticed the following in addition:

    [ericl@tuxedo ~]$ python3.9
    Python 3.9.0a6 (default, Apr 28 2020, 00:00:00) 
    [GCC 10.0.1 20200430 (Red Hat 10.0.1-0.14)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tempfile
    >>> tempfile.tempdir = b'/tmp'
    >>> tempfile.gettempdir()
    b'/tmp'
    >>> tempfile.tempdir = '/tmp'
    >>> tempfile.gettempdirb()
    b'/tmp'

    This actually explicitly hurts the interface description which states that tempfile.gettempdir() returns a string.

    "Encouraged" by this discovery, I've tried to write a patch of tempfile.py addressing the issues discovered. It's my first patch ever of Python so bare with me. The default remains string but if someone _explicitly_ sets tempdir to bytes, it'll become bytes. I've tried all the commands listed previously and it all looks consistent to me.

    @terryjreedy terryjreedy added the 3.10 only security fixes label May 23, 2020
    @ericzolf
    Copy link
    Mannequin Author

    ericzolf mannequin commented May 23, 2020

    Sorry, I uploaded by mistake an early version of the patch. The new one is the one I had actually tested (the old one would fail with mixing bytes and string under certain circumstances, I can't remember any more).

    @gpshead
    Copy link
    Member

    gpshead commented May 23, 2020

    Could you please turn that into a Github PR?

    @ericzolf
    Copy link
    Mannequin Author

    ericzolf mannequin commented May 24, 2020

    On 23/05/2020 21:41, Gregory P. Smith wrote:

    Could you please turn that into a Github PR?
    I can, if you don't tell me that I need to setup a full-blown Python
    development environment to do this.

    @serhiy-storchaka
    Copy link
    Member

    Maybe just document that tempdir should be a string?

    @ericzolf
    Copy link
    Mannequin Author

    ericzolf mannequin commented May 25, 2020

    On 24/05/2020 20:30, Serhiy Storchaka wrote:

    Maybe just document that tempdir should be a string?
    I would definitely prefer to have bytes paths considered as 1st class
    citizen.

    @serhiy-storchaka
    Copy link
    Member

    In any case this is a new feature, so it can be added only in 3.10, and we need the documentation patch for 3.9 and older.

    As a workaround you can use os.fsdecode():

    tempfile.tempdir = os.fsdecode(b'/doesntexist')

    @ericzolf
    Copy link
    Mannequin Author

    ericzolf mannequin commented May 27, 2020

    Well, your decision but, as a user of the library, it didn't feel like a new feature just like a bug to be fixed, the main issue being the inconsistent handling of bytes vs. str.

    @gpshead
    Copy link
    Member

    gpshead commented May 27, 2020

    We consider it closer to new feature as it changes existing behavior in a way that people cannot _depend_ on being present in older Python releases as it'd only appear in a bugfix release, so most people could never write code depending on it while claiming to generally support 3.7-3.9.

    Anyways your PR overall looks good for 3.10. I left some comments.

    @serhiy-storchaka
    Copy link
    Member

    Well, the behavior for an existing bytes path was not unintended, but some people can depend on it.

    But before making it an official feature, we should also check other cases of an unintended behavior. What if set tempfile.tempdir to a Path object or to a file descriptor of a directory? Does anything work in these cases and should we support them?

    @gpshead
    Copy link
    Member

    gpshead commented May 27, 2020

    I expect the best decision to be to get rid of tempfile.tempdir entirely. That would need be its own issue with a deprecation period involved.

    A process global that alters behavior of all calls into a module that don't explicitly opt-out is a bad API.

    @gpshead gpshead self-assigned this Mar 2, 2021
    @serhiy-storchaka
    Copy link
    Member

    A process global that alters behavior of all calls into a module that don't explicitly opt-out is a bad API.

    I don't think that it is so bad. The behavior depends on environment variables TMPDIR, TEMP, TMP. The tempdir variable is just a cache for them. As sys.path is a cache for PYTHONPATH. We need just document that it should be a string if not None. Nobody expects bytes paths be valid in sys.path.

    On other hand, there is gettempdir(), so we have two different ways to get the value of tempfile.tempdir.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 2, 2021

    I clarified the documentation in the PR and added a regression test.

    I chose to explicitly document that tempfile.tempdir may only be str or bytes and cannot be a path-like object.

    We already document that people really should not set it and instead pass dir= to their APIs. Now the docs make that more clear when it comes to setting it to a bytes object due to the global API return type change consequences.

    I view this PR as cleaning up a partial misbehavior while preserving an API wart of it ever working in any manner when set to a bytes value.

    getting rid of tempfile.tempdir entirely or preventing it from being a bytes value at all would be much more of a breaking change, even though desirable. Not a bugfix. Now with the PR as is, we at least document that people should avoid doing that and make it clear what consistent behavior happens when they do it anyways. With a test.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 3, 2021

    New changeset 9c79274 by Eric L in branch 'master':
    bpo-40701: tempfile mixes str and bytes in an inconsistent manner (GH-20442)
    9c79274

    @gpshead
    Copy link
    Member

    gpshead commented Apr 10, 2021

    Fixed for 3.10. Marking as a release blocker for 3.9 and assigning for a release manager decision on if they accept this change (the PR) as a bugfix in 3.9 or not. (see the PR)

    It is late enough in 3.8's lifetime I wouldn't touch that one.

    @gpshead gpshead added release-blocker and removed 3.7 (EOL) end of life 3.8 (EOL) end of life 3.10 only security fixes labels Apr 10, 2021
    @gpshead gpshead removed their assignment Apr 10, 2021
    @ambv
    Copy link
    Contributor

    ambv commented Apr 10, 2021

    Sorry, I decided not to take this. We're four bugfix releases into 3.9 and there is non-zero chance somebody relies on existing behavior somehow. Marking this as changing with 3.10.0 is cleaner from an end user standpoint.

    @ambv ambv added 3.10 only security fixes and removed 3.9 only security fixes labels Apr 10, 2021
    @ambv ambv closed this as completed Apr 10, 2021
    @gpshead
    Copy link
    Member

    gpshead commented Apr 10, 2021

    New changeset e05a703 by Gregory P. Smith in branch 'master':
    bpo-40701: doc typo historcal -> historical (GH-25334)
    e05a703

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants