string.printable.isprintable() returns False #67206

Open
planet36 mannequin opened this issue Dec 9, 2014 · 5 comments
Labels
3.9 (only security fixes), 3.10 (only security fixes), 3.11 (only security fixes), docs (Documentation in the Doc dir), stdlib (Python modules in the Lib dir), topic-unicode, type-bug (An unexpected behavior, bug, or error)

Comments

@planet36
Mannequin

planet36 mannequin commented Dec 9, 2014

BPO 23017
Nosy @birkenfeld, @vstinner, @ezio-melotti, @stevendaprano, @bitdancer, @4kir4, @iritkatriel
Files
  • bug-string-ascii.py: Test case shows that string.printable has control characters
  • 0001-Fix-string.printable-respect-POSIX-spec.patch
  • docs-string.printable.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2014-12-09.03:52:01.009>
    labels = ['type-bug', '3.9', '3.10', '3.11', 'library', 'expert-unicode', 'docs']
    title = 'string.printable.isprintable() returns False'
    updated_at = <Date 2021-11-29.16:17:13.755>
    user = 'https://bugs.python.org/planet36'

    bugs.python.org fields:

    activity = <Date 2021-11-29.16:17:13.755>
    actor = 'iritkatriel'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Library (Lib)', 'Unicode']
    creation = <Date 2014-12-09.03:52:01.009>
    creator = 'planet36'
    dependencies = []
    files = ['37391', '37398', '37441']
    hgrepos = []
    issue_num = 23017
    keywords = ['patch']
    message_count = 5.0
    messages = ['232343', '232376', '232382', '232613', '407290']
    nosy_count = 10.0
    nosy_names = ['georg.brandl', 'vstinner', 'ezio.melotti', 'steven.daprano', 'r.david.murray', 'docs@python', 'akira', 'planet36', 'bru', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23017'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @planet36
    Mannequin Author

    planet36 mannequin commented Dec 9, 2014

    string.printable includes all whitespace characters. However, the only whitespace character that is printable is the space (0x20).

    By definition, the only ASCII characters considered printable are:
    alphanumeric characters
    punctuation characters
    the space character (not all whitespace characters)
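The reported behavior can be reproduced with a minimal sketch like the following. It uses only the standard `string` module; the variable name `offenders` is illustrative and does not come from the attached test case.

```python
import string

# The whole constant fails the check because it contains a few
# characters that str.isprintable() rejects.
result = string.printable.isprintable()
print(result)

# Collect the individual characters that are not printable.
offenders = [c for c in string.printable if not c.isprintable()]
print([hex(ord(c)) for c in offenders])
```

The rejected characters are the whitespace characters other than the space.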

    Source:
    http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03

    7.2 POSIX Locale

    Conforming systems shall provide a POSIX locale, also known as the C locale.

    7.3.1 LC_CTYPE

    space
    Define characters to be classified as white-space characters.

    In the POSIX locale, exactly <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> shall be included.
    

    cntrl
    Define characters to be classified as control characters.

    In the POSIX locale, no characters in classes alpha or print shall be included.
    

    graph
    Define characters to be classified as printable characters, not including the <space>.

    In the POSIX locale, all characters in classes alpha, digit, and punct shall be included; no characters in class cntrl shall be included.
    

    print
    Define characters to be classified as printable characters, including the <space>.

    In the POSIX locale, all characters in class graph shall be included; no characters in class cntrl shall be included.
    

    LC_CTYPE Category in the POSIX Locale

    # "print" is by default "alnum", "punct", and the <space>
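As a rough illustration of the quoted definition, the POSIX-locale "print" class can be rebuilt from the `string` constants and compared with `string.printable`. This is a sketch assuming ASCII only; the names `posix_print` and `extra` are hypothetical.

```python
import string

# POSIX "print" in the C locale: alnum + punct + the space character.
posix_print = set(
    string.ascii_letters + string.digits + string.punctuation + " "
)

# In ASCII this is exactly the range 0x20..0x7E.
assert posix_print == {chr(code) for code in range(0x20, 0x7F)}

# Characters in string.printable that fall outside POSIX "print".
extra = set(string.printable) - posix_print
print(sorted(hex(ord(c)) for c in extra))
```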

    @planet36 (mannequin) added the stdlib and type-bug labels Dec 9, 2014
    @bru
    Mannequin

    bru mannequin commented Dec 9, 2014

    Here is a simple fix for the issue, plus a test.
    It does not break any unit tests, but it raises a backward-compatibility problem. I would therefore advise applying it only to Python 3.5+, not to 3.4.

    @bitdancer
    Member

    This is a bit of a conundrum. Our (string module) definition of printable is very clear, and it includes the other whitespace characters.

    We could document that this does not match the POSIX definition of printable. It also does not match the RFC 5822 definition of printable (for example), which does *not* include whitespace characters (not even space), but the POSIX definition is a more likely source of confusion.

    isprintable is a newer function than string.printable, and serves a different purpose. I suppose that when PEP 3138 was written and implemented, the disconnect between the two definitions was not noticed.

    For backward compatibility reasons I suspect we are stuck with the discrepancy, but perhaps others will think it worth the pain of changing string.printable. I kind of doubt it, though.

    @bitdancer added the docs label Dec 9, 2014
    @4kir4
    Mannequin

    4kir4 mannequin commented Dec 13, 2014

    The C standard defines locale-specific *printing characters*, which are
    [ -~] in the "C" locale for implementations that use the 7-bit US-ASCII
    character set; i.e., SP (space, 0x20) is a printing character in C
    (isprint() returns nonzero).

    The isgraph() function returns zero for the space but is otherwise
    equivalent to isprint().

    The POSIX definition is aligned with the ISO C standard.

    I don't know what RFC 5822 has to do with this issue, but the RFC
    contradicts itself: in one place it has "printable US-ASCII
    characters except SP", which implies that SP *is* printable, but in
    other places it treats isprint == isgraph. The authors probably meant
    characters for which isgraph() is nonzero when they wrote "printable
    US-ASCII" (which is incorrect according to the C standard).

    Tests from bpo-9770 show the relation between C character classes and
    string constants [1]:

    set(string.printable) == set(C['graph']) | set(C['space'])

    where C['space'] is '\t\n\v\f\r ' (the standard C whitespace).
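The relation above can be checked with a small sketch. The sets `c_graph`, `c_space`, and `c_print` are built by hand here for the ASCII "C" locale; they are assumptions standing in for the `C[...]` mapping and are not taken from the linked test file.

```python
import string

# Hand-built C character classes for 7-bit ASCII in the "C" locale.
c_graph = {chr(code) for code in range(0x21, 0x7F)}  # visible, no space
c_space = set("\t\n\v\f\r ")                         # standard C whitespace
c_print = c_graph | {" "}                            # graph plus the space

# string.printable is graph plus *all* C whitespace ...
assert set(string.printable) == c_graph | c_space

# ... which makes it a strict superset of C "print".
assert c_print < set(string.printable)
```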

    It is a documented behavior [2]:

    This is a combination of digits, ascii_letters, punctuation,
    and whitespace

    where *whitespace* is C['space'].

    In Python 2, *printable* is locale-dependent and it coincides with the
    corresponding Python 3 definition in "C" locale with ASCII charset.

    Unlike other string constants, *printable* differs from C['print'] on
    both Python 2 and 3 because it includes whitespace characters other than
    space.

    str.isprintable [3] obeys C['print'] (in ASCII range) and considers SP
    to be printable.
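The claim about the ASCII range can be spot-checked as follows. This is a sketch in which `c_isprint` is a hypothetical pure-Python stand-in for C's isprint() in the "C" locale, not a binding to the C function.

```python
def c_isprint(ch: str) -> bool:
    """Stand-in for C isprint() in the "C" locale with ASCII: [ -~]."""
    return 0x20 <= ord(ch) <= 0x7E


# Compare str.isprintable() against the stand-in for every ASCII code.
mismatches = [
    code
    for code in range(128)
    if chr(code).isprintable() != c_isprint(chr(code))
]
print(mismatches)
```

An empty list means the two agree on all 128 ASCII code points, including the space.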

    ---

    It might be too late to change string.printable to correspond to C
    isprint() (for ASCII characters).

    I've uploaded a documentation patch that mentions that string.printable
    and str.isprintable differ.

    [1] http://bugs.python.org/review/9770/diff/12212/Lib/test/test_curses_ascii.py
    [2] https://hg.python.org/cpython/file/3.4/Doc/library/string.rst#l62
    [3] https://docs.python.org/3.4/library/stdtypes.html#str.isprintable

    @iritkatriel
    Member

    Reproduced on 3.11.

    @iritkatriel added the 3.9, 3.10, and 3.11 labels Nov 29, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022