Missing documentation for codecs.escape_decode #74773

MatthieuDartiailh · 2017-06-07T15:02:56Z

BPO	30588
Nosy	@gpshead, @njsmith, @asvetlov, @serhiy-storchaka, @MatthieuDartiailh, @carlbordum
PRs	bpo-30588: document codecs.escape_decode #14747

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-06-07.15:02:55.530>
labels = ['3.7', '3.8', 'docs']
title = 'Missing documentation for codecs.escape_decode'
updated_at = <Date 2019-07-14.14:55:14.718>
user = 'https://github.com/MatthieuDartiailh'

bugs.python.org fields:

activity = <Date 2019-07-14.14:55:14.718>
actor = 'serhiy.storchaka'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation = <Date 2017-06-07.15:02:55.530>
creator = 'mdartiailh'
dependencies = []
files = []
hgrepos = []
issue_num = 30588
keywords = ['patch']
message_count = 9.0
messages = ['295342', '295344', '295347', '327259', '327268', '339469', '347827', '347919', '347922']
nosy_count = 8.0
nosy_names = ['gregory.p.smith', 'njs', 'asvetlov', 'docs@python', 'serhiy.storchaka', 'paulehoffman', 'mdartiailh', 'carlbordum']
pr_nums = ['14747']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue30588'
versions = ['Python 3.7', 'Python 3.8']

MatthieuDartiailh · 2017-06-07T15:02:55Z

codecs.escape_decode does not appear in the codecs documentation. To my knowledge, this function is the only convenient way to process the escape sequences in a literal string (I found it here: https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python). It is most useful when implementing a parser for a language that extends Python semantics while retaining Python's processing of strings (cf. https://github.com/MatthieuDartiailh/enaml).

Is there a reason this function is not documented?

serhiy-storchaka · 2017-06-07T15:22:15Z

This is an internal function kept for compatibility. It is used only for decoding pickle protocol 0 data created in Python 2. Look at unicode_escape and raw_unicode_escape codecs for doing similar decoding to strings in Python 3.
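As a rough illustration of the two codecs mentioned above, here is a minimal sketch. The sample byte strings are made up for the example and do not come from this thread.

```python
import codecs

# Hypothetical sample: ASCII bytes containing backslash escapes.
raw = b"caf\\xe9\\n\\u0394"

# unicode_escape interprets \xNN, \n, \uXXXX and similar escapes.
decoded = codecs.decode(raw, "unicode_escape")
print(repr(decoded))  # 'café\nΔ'

# raw_unicode_escape only interprets \uXXXX and \UXXXXXXXX;
# other backslash sequences are left untouched.
raw_decoded = codecs.decode(b"\\u0394\\n", "raw_unicode_escape")
print(repr(raw_decoded))  # 'Δ\\n'
```

Both codecs treat unescaped input bytes as Latin-1, which matters once the input contains non-ASCII data.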

MatthieuDartiailh · 2017-06-07T15:36:22Z

The issue is that unicode_escape does not properly handle strings that mix
non-Latin-1 Unicode characters with escape sequences, because it assumes
Latin-1 compatible characters only. For example, the literal string 'Δ\nΔ'
cannot be encoded using latin-1, and encoding it using utf-8 and then
decoding it with unicode_escape produces the wrong output: 'Î\x94\nÎ\x94'.
However, codecs.escape_decode(r'Δ\nΔ'.encode('utf-8'))[0].decode('utf-8')
gives the proper output. Internally the Python parser handles this case, but
I was unable to find where, and this is the closest solution I found. I
guess it may be possible using error handlers, but that seems much more
cumbersome.
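The behaviour described above can be reproduced with a short script. This is only a sketch of the comparison; the variable names are illustrative, and codecs.escape_decode is undocumented, so its behaviour may change between Python versions.

```python
import codecs

text = r"Δ\nΔ"  # Δ, backslash, n, Δ (the escape is not yet processed)

# Encoding to latin-1 fails because Δ (U+0394) is outside Latin-1.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

data = text.encode("utf-8")

# unicode_escape reads each UTF-8 byte as a Latin-1 character: mojibake.
wrong = data.decode("unicode_escape")
print(repr(wrong))  # 'Î\x94\nÎ\x94'

# escape_decode works on bytes, so the UTF-8 sequences survive intact
# and can be decoded afterwards.
right = codecs.escape_decode(data)[0].decode("utf-8")
print(repr(right))  # 'Δ\nΔ'
```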

Best regards

Matthieu

paulehoffman · 2018-10-06T21:48:10Z

Bumping this thread a bit. It appears that this "internal" function is being talked about out in the real world. I came across it in a recent blog post, saw that it wasn't in the official documentation, and went looking here.

I propose that it be documented even if it feels like a tad of a kludge.

asvetlov · 2018-10-07T07:58:25Z

-1
An internal function means you can use it at your own risk, but the function can be changed or even removed in any Python release.
I see no point in documenting it and making it public.

gpshead · 2019-04-05T01:03:29Z

We can't change it or remove it, it is public by virtue of its name. We should document it.

Removing it or renaming it to be _private requires a PendingDeprecationWarning -> DeprecationWarning -> removal cycle. It is well known and used.

https://stackoverflow.com/questions/14820429/how-do-i-decodestring-escape-in-python3/23151714#23151714

serhiy-storchaka · 2019-07-13T14:32:01Z

I disagree. We can change, rename, or remove it because it is not a public function and never was. But we cannot just remove it while it is used in the pickle module, and there is no reason to change it, as it works pretty well for its purpose.

If you want to make it public and maintain it, I suggest first discussing this on the Python-Ideas mailing list. You should prove that the benefit of adding it is larger than the cost of the maintenance.

carlbordum · 2019-07-14T14:30:26Z

You have a point, the function is not in codecs.__all__. Reading the stackoverflow questions, it seems like this is a function that is useful.
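The observation about codecs.__all__ can be checked directly. A small sketch follows; the result reflects the CPython versions discussed in this thread and is not guaranteed for other releases.

```python
import codecs

# The name is reachable as an attribute of the module...
is_reachable = callable(getattr(codecs, "escape_decode", None))

# ...but it is not listed among the module's declared public names.
is_in_all = "escape_decode" in codecs.__all__

print(is_reachable, is_in_all)
```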

serhiy-storchaka · 2019-07-14T14:55:14Z

Reading the stackoverflow questions, I am not sure that this function would be useful for the author of the question. He just needs to remove b'\\000'; that is all we know. There are many ways to do it, and after using codecs.escape_decode() you will still need to remove b'\000'.

If you want to add a feature similar to the "string-escape" codec in Python 3, it is better to provide it officially as a new codec "bytes-escape" (functions like codecs.utf_16_le_decode() are internal). But we should discuss its behavior, taking into account the difference between string literals in Python 2 and bytes literals in Python 3. For example, how to treat non-escaped non-ASCII bytes (they were acceptable in Python 2, but not in Python 3).
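To make the point concrete, here is a sketch using a made-up input (the actual data from the stackoverflow question is not reproduced here). Unescaping turns the escaped sequence into a real NUL byte, which then still has to be removed.

```python
import codecs

# Hypothetical input: 'abc' followed by a literal backslash and '000'.
data = b"abc\\000"

# escape_decode converts the octal escape into an actual NUL byte.
unescaped = codecs.escape_decode(data)[0]
print(unescaped)  # b'abc\x00'

# The NUL byte still needs a second step to be removed.
stripped = unescaped.rstrip(b"\x00")

# One of the other ways: drop the escaped sequence without unescaping.
direct = data.replace(b"\\000", b"")
print(stripped, direct)  # b'abc' b'abc'
```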
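The open question about non-escaped non-ASCII bytes can be illustrated with a short sketch. It contrasts Python 3 bytes literals with what codecs.escape_decode currently does; it is not a proposal for how a "bytes-escape" codec should behave.

```python
import codecs

# Source text of a bytes literal that contains a raw non-ASCII character.
source = "b'\u00e9'"

# Python 3 rejects such a literal at compile time.
try:
    compile(source, "<demo>", "eval")
    literal_accepted = True
except SyntaxError:
    literal_accepted = False

# escape_decode passes non-ASCII bytes through unchanged.
passthrough = codecs.escape_decode("\u00e9".encode("utf-8"))[0]

print(literal_accepted, passthrough)  # False b'\xc3\xa9'
```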

MatthieuDartiailh mannequin added the 3.7 (EOL) end of life label Jun 7, 2017

MatthieuDartiailh mannequin assigned docspython Jun 7, 2017

MatthieuDartiailh mannequin added the docs Documentation in the Doc dir label Jun 7, 2017

gpshead added the 3.8 only security fixes label Apr 5, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

nadove-ucsc mentioned this issue Feb 16, 2023

Forward ALB logs from S3 to CloudWatch (#4178) DataBiosphere/azul#4920

Merged

80 tasks
