Missing documentation for codecs.escape_decode #74773

Open
MatthieuDartiailh mannequin opened this issue Jun 7, 2017 · 9 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes docs Documentation in the Doc dir

Comments

@MatthieuDartiailh
Mannequin

MatthieuDartiailh mannequin commented Jun 7, 2017

BPO 30588
Nosy @gpshead, @njsmith, @asvetlov, @serhiy-storchaka, @MatthieuDartiailh, @carlbordum
PRs
  • bpo-30588: document codecs.escape_decode #14747
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2017-06-07.15:02:55.530>
    labels = ['3.7', '3.8', 'docs']
    title = 'Missing documentation for codecs.escape_decode'
    updated_at = <Date 2019-07-14.14:55:14.718>
    user = 'https://github.com/MatthieuDartiailh'

    bugs.python.org fields:

    activity = <Date 2019-07-14.14:55:14.718>
    actor = 'serhiy.storchaka'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation']
    creation = <Date 2017-06-07.15:02:55.530>
    creator = 'mdartiailh'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30588
    keywords = ['patch']
    message_count = 9.0
    messages = ['295342', '295344', '295347', '327259', '327268', '339469', '347827', '347919', '347922']
    nosy_count = 8.0
    nosy_names = ['gregory.p.smith', 'njs', 'asvetlov', 'docs@python', 'serhiy.storchaka', 'paulehoffman', 'mdartiailh', 'carlbordum']
    pr_nums = ['14747']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue30588'
    versions = ['Python 3.7', 'Python 3.8']

    codecs.escape_decode does not appear in the codecs documentation. To my knowledge, this function is the only convenient way to process the escape sequences in a string literal (I found it here: https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python). It is most useful when implementing a parser for a language that extends Python's semantics while retaining Python's processing of strings (cf. https://github.com/MatthieuDartiailh/enaml).

    Is there a reason this function is not documented?

    @MatthieuDartiailh MatthieuDartiailh mannequin added the 3.7 (EOL) end of life label Jun 7, 2017
    @MatthieuDartiailh MatthieuDartiailh mannequin added the docs Documentation in the Doc dir label Jun 7, 2017
    @serhiy-storchaka
    Member

    This is an internal function kept for compatibility. It is used only for decoding pickle protocol 0 data created in Python 2. Look at unicode_escape and raw_unicode_escape codecs for doing similar decoding to strings in Python 3.
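For readers unfamiliar with the two codecs mentioned above, here is a minimal sketch of how they differ on ASCII-only input. The sample bytes are made up for illustration; the codec names are the ones given in the comment.

```python
# Hypothetical sample: a backslash-t escape and a backslash-u escape, ASCII only.
escaped = b"tab:\\t delta:\\u0394"

# unicode_escape interprets all string-literal escapes (\t, \n, \xhh, \uXXXX, ...).
with_unicode_escape = escaped.decode("unicode_escape")

# raw_unicode_escape interprets only \uXXXX and \UXXXXXXXX; other backslashes stay.
with_raw_unicode_escape = escaped.decode("raw_unicode_escape")

print(repr(with_unicode_escape))
print(repr(with_raw_unicode_escape))
```

Both codecs treat bytes that are not part of an escape sequence as Latin-1, which matters as soon as the input contains non-ASCII data.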

    @MatthieuDartiailh
    Mannequin Author

    MatthieuDartiailh mannequin commented Jun 7, 2017

    The issue is that unicode_escape does not properly handle strings that
    mix Unicode characters and escape sequences, because it assumes that all
    non-escaped bytes are Latin-1. For example, the literal string 'Δ\nΔ'
    cannot be encoded as Latin-1, and encoding it as UTF-8 and then decoding
    with unicode_escape produces the wrong output: 'Î\x94\nÎ\x94'. However,
    codecs.escape_decode(r'Δ\nΔ'.encode('utf-8'))[0].decode('utf-8') gives
    the proper output. Internally, the Python parser handles this case, but I
    was unable to find where, and this is the closest solution I found. I
    guess it may be possible using error handlers, but that seems much more
    cumbersome.

    Best regards

    Matthieu
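The comparison described in this comment can be reproduced with a short script. This is only a sketch: codecs.escape_decode is undocumented, so its behavior is assumed to match what the thread reports, and the variable names are illustrative.

```python
import codecs

source = r"Δ\nΔ"             # four characters: Δ, backslash, n, Δ
raw = source.encode("utf-8")

# unicode_escape treats every non-escape byte as Latin-1, so the two UTF-8
# bytes of each Δ come out as two separate characters.
via_unicode_escape = raw.decode("unicode_escape")

# escape_decode maps bytes to bytes and leaves non-escape bytes untouched,
# so the result can still be decoded as UTF-8 afterwards.
unescaped, _ = codecs.escape_decode(raw)
via_escape_decode = unescaped.decode("utf-8")

print(repr(via_unicode_escape))
print(repr(via_escape_decode))
```

The second result contains a real newline between the two Δ characters, while the first contains the mojibake quoted in the comment.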

    @paulehoffman
    Mannequin

    paulehoffman mannequin commented Oct 6, 2018

    Bumping this thread a bit. It appears that this "internal" function is being talked about out in the real world. I came across it in a recent blog post, saw that it wasn't in the official documentation, and went looking here.

    I propose that it be documented even if it feels like a tad of a kludge.

    @asvetlov
    Contributor

    asvetlov commented Oct 7, 2018

    -1
    An internal function means that you can use it at your own risk, but the function can be changed or even removed in any Python release.
    I see no point in documenting it and making it public.

    @gpshead
    Member

    gpshead commented Apr 5, 2019

    We can't change or remove it; it is public by virtue of its name. We should document it.

    Removing it or renaming it to be _private requires a PendingDeprecationWarning -> DeprecationWarning -> removal cycle. It is well known and used.

    https://stackoverflow.com/questions/14820429/how-do-i-decodestring-escape-in-python3/23151714#23151714

    @gpshead gpshead added the 3.8 only security fixes label Apr 5, 2019
    @serhiy-storchaka
    Member

    I disagree. We can change, rename or remove it because it is not a public function and never was. But we cannot just remove it while it is used in the pickle module, and there is no reason to change it as it works pretty well for its purpose.

    If you want to make it public and maintain it, I suggest first discussing this on the Python-Ideas mailing list. You should prove that the benefit of adding it is larger than the cost of the maintenance.

    @carlbordum
    Mannequin

    carlbordum mannequin commented Jul 14, 2019

    You have a point: the function is not in codecs.__all__. Reading the Stack Overflow questions, though, it seems that this function is useful.
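The point about codecs.__all__ is easy to check. A minimal sketch, assuming a CPython version like those discussed in this thread, where the function is still reachable as an attribute of the module:

```python
import codecs

# Names listed in __all__ are the ones the module declares as its public API.
in_all = "escape_decode" in codecs.__all__

# The function is still reachable as an attribute of the codecs module.
is_reachable = hasattr(codecs, "escape_decode")

print(in_all, is_reachable)
```

Being absent from __all__ while lacking a leading underscore is exactly what makes the function's public or private status debatable here.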

    @serhiy-storchaka
    Member

    Reading the Stack Overflow questions, I am not sure that this function would be useful for the author of the question. He just needs to remove b'\\000'; that is all we know. There are many ways to do it, and after using codecs.escape_decode() you will still need to remove b'\000'.

    If you want to add a feature similar to the "string-escape" codec in Python 3, it is better to provide it officially as a new codec "bytes-escape" (functions like codecs.utf_16_le_decode() are internal). But we should discuss its behavior, taking into account the difference between string literals in Python 2 and bytes literals in Python 3. For example, how should non-escaped non-ASCII bytes be treated (they were acceptable in Python 2, but not in Python 3)?
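The difference in how non-escaped non-ASCII data is treated can be seen by comparing a Python 3 bytes literal with codecs.escape_decode on the same input. This is a sketch only: escape_decode is undocumented, and the variable names are illustrative.

```python
import ast
import codecs

# A Python 3 bytes literal containing a raw non-ASCII character is rejected
# by the parser, so evaluating its source text raises SyntaxError.
try:
    ast.literal_eval("b'Δ\\n'")
    literal_ok = True
except SyntaxError:
    literal_ok = False

# escape_decode accepts the same content as UTF-8 bytes and passes the
# non-ASCII bytes through unchanged while processing the \n escape.
raw = "Δ\\n".encode("utf-8")
decoded, _ = codecs.escape_decode(raw)

print(literal_ok, decoded)
```

A hypothetical "bytes-escape" codec would have to pick one of these two behaviors, which is the design question raised above.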

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022