Remove backslash escapes from tokenize.c. #44961

Closed
ronadam mannequin opened this issue May 16, 2007 · 11 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@ronadam
Mannequin

ronadam mannequin commented May 16, 2007

BPO 1720390
Nosy @gvanrossum, @tiran
Files
  • norawescape3.diff: Removes escape characters from raw strings.
  • tokenize_cleanup_patch.diff
  • no_raw_escapes_patch.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2007-11-16.00:52:49.527>
    created_at = <Date 2007-05-16.22:23:54.000>
    labels = ['interpreter-core']
    title = 'Remove backslash escapes from tokenize.c.'
    updated_at = <Date 2008-01-06.22:29:46.075>
    user = 'https://bugs.python.org/ronadam'

    bugs.python.org fields:

    activity = <Date 2008-01-06.22:29:46.075>
    actor = 'admin'
    assignee = 'gvanrossum'
    closed = True
    closed_date = <Date 2007-11-16.00:52:49.527>
    closer = 'gvanrossum'
    components = ['Interpreter Core']
    creation = <Date 2007-05-16.22:23:54.000>
    creator = 'ron_adam'
    dependencies = []
    files = ['7994', '8762', '8763']
    hgrepos = []
    issue_num = 1720390
    keywords = ['patch']
    message_count = 11.0
    messages = ['52631', '52632', '52633', '52634', '52635', '52636', '57250', '57262', '57290', '57578', '57579']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'christian.heimes', 'ron_adam']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1720390'
    versions = ['Python 3.0']

    @ronadam
    Mannequin Author

    ronadam mannequin commented May 16, 2007

    This patch modifies tokenizer.c so that, for raw strings only, it does not skip the character after a backslash when determining the end of the string.

    A few strings needed changes in order to compile: two in textwrap.py and one in distutils/util.py.

    This does not include changes needed for tests to pass. I'll include those in a separate patch.
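
    A minimal sketch, in Python, of the two end-of-string scanning rules at issue. This is an illustration only; the function name and the simplified one-line grammar are assumptions, not the actual tokenizer.c logic.

        def find_string_end(source, start, skip_after_backslash):
            """Return the index just past the closing quote, else raise ValueError.

            `start` points at the opening quote.  With skip_after_backslash=True
            (the current rule, raw string or not), the character after a backslash
            can never terminate the literal; with False (the rule this patch
            proposes for raw strings), any matching quote ends the literal.
            """
            quote = source[start]
            i = start + 1
            while i < len(source):
                c = source[i]
                if c == "\\" and skip_after_backslash:
                    i += 2          # the backslash "protects" the next character
                    continue
                if c == quote:
                    return i + 1    # found the closing quote
                i += 1
            raise ValueError("unterminated string literal")

        text = 'r"a\\" tail'   # source text containing the literal  r"a\" ...
        # Current rule: the \" does not close the literal, so the scan runs off the
        # end of the line (this is also why a raw string cannot end in a single
        # backslash).  Proposed rule for raw strings: the first quote closes it.
        print(find_string_end(text, 1, skip_after_backslash=False))   # -> 5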

    @ronadam ronadam mannequin assigned gvanrossum May 16, 2007
    @ronadam ronadam mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 16, 2007
    @ronadam
    Mannequin Author

    ronadam mannequin commented May 16, 2007

    Forgot to specify...

    This is against the py3k-struni branch, revision 55388.

    @ronadam
    Mannequin Author

    ronadam mannequin commented May 20, 2007

    Here's a more complete patch which modifies the following files... (in py3k_struni branch)

    M Python/ast.c
    M Parser/tokenizer.c
    M Lib/test/tokenize_tests.txt
    M Lib/tokenize.py

    The test still doesn't pass, but it fails in the same way as it did before these changes were made. I'll continue to look into this. I think it's a problem with the test itself rather than with the modules, or it may be a bug in the struni branch that has yet to be fixed.

    The following each alter one or two raw strings, in most cases replacing the outermost quotes with triple quotes (see the sketch after this list).

    M Lib/sgmllib.py
    M Lib/markupbase.py
    M Lib/textwrap.py
    M Lib/distutils/util.py
    M Lib/cookielib.py
    M Lib/pydoc.py
    M Lib/doctest.py
    M Lib/xml/etree/ElementTree.py
    M Lib/HTMLParser.py
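
    A hypothetical before/after of the kind of edit described above (illustrative names, not taken from the actual diff). Since \" and " match the same character in a regular expression, the rewrite is behaviour-preserving:

        # before: the \" only works because the tokenizer skips the character
        # after a backslash, even in raw strings
        old_pattern = r"name=\"(\w+)\""
        # after: triple quotes remove the need for a backslash before the quote
        new_pattern = r'''name="(\w+)"'''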

    @ronadam
    Mannequin Author

    ronadam mannequin commented May 20, 2007

    Here's a more complete patch which modifies the following files... (in py3k_struni branch)

    M Python/ast.c
    M Parser/tokenizer.c
    M Lib/test/tokenize_tests.txt
    M Lib/tokenize.py

    The test still doesn't pass, but it fails in the same way as it did before these changes were made. I'll continue to look into this. I think it's a problem with the test itself rather than with the modules, or it may be a bug in the struni branch that has yet to be fixed.

    The following each alter one or two raw strings, in most cases replacing the outermost quotes with triple quotes.

    M Lib/sgmllib.py
    M Lib/markupbase.py
    M Lib/textwrap.py
    M Lib/distutils/util.py
    M Lib/cookielib.py
    M Lib/pydoc.py
    M Lib/doctest.py
    M Lib/xml/etree/ElementTree.py
    M Lib/HTMLParser.py

    File Added: norawescape2.diff

    @gvanrossum
    Member

    Just FYI, I have downloaded this and will attempt to apply it some time next week.

    @ronadam
    Mannequin Author

    ronadam mannequin commented Jun 14, 2007

    Updated patch.

    The error that I had mentioned before has been fixed.
    Added changes to the tokenize_tests output comparison file.

    It has random failures because it uses a random sample of other test files as sources for round-trip tests. If those files have problems in them, then this test fails.
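
    A rough sketch of the kind of round-trip check involved (an assumption-level illustration using the tokenize module, not the actual test code):

        import io
        import tokenize

        def check_roundtrip(source):
            """Tokenize, untokenize, re-tokenize, and compare the token streams."""
            toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
            rebuilt = tokenize.untokenize(toks)
            retoks = list(tokenize.generate_tokens(io.StringIO(rebuilt).readline))
            # Compare only token type and string; exact positions may differ.
            return [t[:2] for t in toks] == [t[:2] for t in retoks]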

    Added a filename output line to the test so the problem file can be identified.

    Patch is against the py3k_struni branch, revision 55970.

    File Added: norawescape3.diff

    @tiran
    Member

    tiran commented Nov 8, 2007

    Can you create a new patch and verify that the problem still exists?
    norawescape3.diff doesn't apply cleanly any more.

    @ronadam
    Mannequin Author

    ronadam mannequin commented Nov 8, 2007

    Yes, I will update it.

    @gvanrossum
    Member

    FWIW, I'm +1 on the part of this patch that disables \u in raw strings.
    I just had a problem with a doctest that couldn't be run in verbose mode
    because \u was being interpreted in raw mode... But I'm still solidly
    -1 on allowing trailing \.
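
    For context, the "\u in raw strings" behaviour in question (interpreting \u in raw mode is how Python 2's ur"..." literals behaved; Python 3 as released leaves all escapes alone in raw strings):

        s = r"\u0041"
        print(len(s), s)   # 6 \u0041  -- the escape is left alone in a raw string
        # Under the pre-patch py3k behaviour (and Python 2's ur"\u0041"), the \u
        # escape was interpreted even in raw mode, yielding the single character "A".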

    @ronadam
    Mannequin Author

    ronadam mannequin commented Nov 16, 2007

    It looks like the disabling of \u and \U in raw strings is done. Does
    tokenize.py need to be fixed, to match?

    While working on this I was able to clean up the string parsing parts of
    tokenize.c, and have a separate patch with just that.

    And an updated patch with both the cleaned-up tokenize.c and the
    no-escapes-in-raw-strings change, in case it is desired after all.

    @gvanrossum
    Member

    I don't think tokenize.py needs to be changed -- it never interpreted
    backslashes in string literals anyway (not even in regular, non-raw
    literals).
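
    A quick way to see this (a sketch using the tokenize module as it exists today, not code from the issue): STRING tokens come back verbatim, and escape processing happens later during compilation.

        import io
        import tokenize

        src = 'x = "\\u0041"\ny = r"\\u0041"\n'
        for tok in tokenize.generate_tokens(io.StringIO(src).readline):
            if tok[0] == tokenize.STRING:
                print(tok[1])   # prints the literal exactly as written in the source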

    The tokenizer.c cleanup is submitted as revision 59007.

    I still am not warming up towards the no-raw-escapes feature, so I'm
    closing this as rejected. Nevertheless, thanks for your efforts!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022