Update Regular Expression HOWTO #55084

Closed
terryjreedy opened this issue Jan 9, 2011 · 16 comments
Labels
docs Documentation in the Doc dir

Comments

@terryjreedy
Member

BPO 10875
Nosy @akuchling, @birkenfeld, @terryjreedy, @merwok
Files
  • regex.rst.diff
  • zregex2.rst.diff: Matching Chars addition
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2011-01-10.23:19:16.710>
    created_at = <Date 2011-01-09.20:20:07.356>
    labels = ['docs']
    title = 'Update Regular Expression HOWTO'
    updated_at = <Date 2011-01-11.17:46:26.622>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2011-01-11.17:46:26.622>
    actor = 'terry.reedy'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2011-01-10.23:19:16.710>
    closer = 'terry.reedy'
    components = ['Documentation']
    creation = <Date 2011-01-09.20:20:07.356>
    creator = 'terry.reedy'
    dependencies = []
    files = ['20332', '20340']
    hgrepos = []
    issue_num = 10875
    keywords = ['patch']
    message_count = 16.0
    messages = ['125855', '125858', '125859', '125861', '125862', '125865', '125866', '125868', '125869', '125874', '125876', '125891', '125946', '125950', '125962', '126024']
    nosy_count = 6.0
    nosy_names = ['akuchling', 'georg.brandl', 'terry.reedy', 'eric.araujo', 'SilentGhost', 'docs@python']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue10875'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @terryjreedy
    Member Author

    1. Does 'Release 0.05' at the top have any useful current meaning, or could it be deleted?

    2. Introduction:

    The history paragraph "The re module was added in Python 1.5, and provides Perl-style regular expression patterns. Earlier versions of Python came with the regex module, which provided Emacs-style patterns. The regex module was removed completely in Python 2.5." might be eliminated in 3.x, or at least the irrelevant-for-py3 reference to regex. This is a policy decision.

    3. Performing matches:

    "If you have Tkinter available, you may also want to look at Tools/scripts/redemo.py,"

    Change 'Tkinter' to 'tkinter' and make it a module reference.
    In the path, change 'scripts' to 'demo', as redemo.py has moved.

    "Phil Schwartz’s Kodos is also an interactive tool for developing and testing RE patterns."

    Add the url '(http://kodos.sourceforge.net/)' to the text so that Windows help users can copy and paste it into a browser. (This should be a general policy.)

    "Python 2.2.2 (#1, Feb 10 2003, 12:57:01)"
    Delete this line.

    <_sre.SRE_Match object at 80c4f68>

    This is correctly updated (for late 2.x and 3.x)

    "<re.MatchObject instance at 80c9650>" (7 like this)

    Globally replace 're.MatchObject instance' with '_sre.SRE_Match object'

    4. Footnote

    "[1] Introduced in Python 2.2.2."

    Remove this footnote for 3.x, along with every place in the text that references it.

    5. "Not Using re.VERBOSE"

    This section is about *using* re.VERBOSE and the benefit thereof, not about not using it. I recommend deleting 'Not' as it gives the impression that the section is a warning about not using, the opposite of the intent.

    6. Code example output and doctest:

    I ran doctest.testfile("C:/programs/PyDev/py32/Doc/howto/regex.rst", module_relative = False)

    After the 're...' to '_sre...' substitution above, all 11 failures would be due to 'at 0x#######' address mismatches. I believe changing all 11 addresses to '0x...' (I took this from the doctest doc) would both fix the failures and remove irrelevant detail for human readers.

    The other 87 examples all passed ;-!.

    Is there any current doctest-related markup that should be added?
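A minimal sketch of how the substitutions proposed in points 3 and 6 could be applied mechanically to the .rst text. `normalize_match_reprs` is a hypothetical helper, not part of the patches attached to this issue, and the regexes assume the reprs look exactly like the ones quoted above (a bare hex address with or without a `0x` prefix).

```python
import re

# Old-style repr still quoted in the HOWTO, e.g. "<re.MatchObject instance at 80c9650>".
OLD_REPR = re.compile(r"<re\.MatchObject instance at [0-9a-fA-F]+>")
# Current-style repr with a concrete address, e.g. "<_sre.SRE_Match object at 80c4f68>".
CONCRETE_ADDR = re.compile(r"(<_sre\.SRE_Match object at )(?:0x)?[0-9a-fA-F]+>")

PLACEHOLDER = "<_sre.SRE_Match object at 0x...>"


def normalize_match_reprs(text):
    """Hypothetical helper: rename the old repr, then hide concrete addresses."""
    text = OLD_REPR.sub(PLACEHOLDER, text)
    # \g<1> keeps the "<_sre.SRE_Match object at " prefix captured above;
    # already-normalized "0x..." placeholders contain no hex digits, so they are left alone.
    return CONCRETE_ADDR.sub(r"\g<1>0x...>", text)


# Illustrative input in the style of the HOWTO's interactive examples.
sample = """>>> p.match('tempo')
<re.MatchObject instance at 80c9650>
>>> m
<_sre.SRE_Match object at 80c4f68>"""
print(normalize_match_reprs(sample))
```

Running the helper twice gives the same result as running it once, so it is safe to apply to a file that has already been partly edited by hand.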

    @terryjreedy terryjreedy added the docs Documentation in the Doc dir label Jan 9, 2011
    @birkenfeld
    Member

    Your points 1-5 all sound valid to me. Would you like to make a patch? I don't know what to do about the release number. Probably doesn't hurt anyone to keep it.

    @merwok
    Member

    merwok commented Jan 9, 2011

    Good points overall.

    The only subpoint I disagree with is this one: “Add the url '(http://kodos.sourceforge.net/)' to the text so that Windows help users can copy and paste it into a browser. (This should be a general policy.)” IMO, it’s the job of the Sphinx builder to add URIs in plaintext if the format does not have hyperlinks. -1 on cluttering the source and HTML output with duplicated links.

    @birkenfeld
    Member

    Oh right, I misread that one. Can't Windows help users right-click and select "Copy URL"?

    @SilentGhost
    Mannequin

    SilentGhost mannequin commented Jan 9, 2011

    Here is the patch implementing all but the url suggestion.

    Doctest still has 11 failures (changing to '0x...' didn't help).

    @SilentGhost
    Mannequin

    SilentGhost mannequin commented Jan 9, 2011

    A few bits and pieces fixed compared to the previous patch.

    >>> doctest.testfile("/home/mischa/pydev/Doc/howto/regex.rst", module_relative = False, optionflags=doctest.ELLIPSIS)
    TestResults(failed=0, attempted=98)
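This sketch illustrates why rewriting the addresses as '0x...' did not help on its own: doctest only treats "..." in expected output as a wildcard when the ELLIPSIS option is enabled, as in the testfile call above. The sample text and the `run_sample` helper are hypothetical; the pattern `<...Match object...>` is an assumption chosen so it matches both the 3.2-era `<_sre.SRE_Match object at 0x...>` repr and the longer repr used by newer Pythons.

```python
import doctest

# Hypothetical doctest text in the style of the HOWTO's examples. The expected
# output abbreviates the varying part of the repr with "...": the memory
# address on older Pythons, or the span/match details on newer ones.
SAMPLE = """
>>> import re
>>> p = re.compile('[a-z]+')
>>> p.match('tempo')
<...Match object...>
"""


def run_sample(optionflags=0):
    """Run SAMPLE as a doctest and return its TestResults (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(SAMPLE, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda report: None)


print(run_sample())                  # without ELLIPSIS: the repr line fails
print(run_sample(doctest.ELLIPSIS))  # with ELLIPSIS: "..." acts as a wildcard
```

Instead of passing the flag on every run, a single example can also opt in with a `# doctest: +ELLIPSIS` directive at the end of its source line.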

    @SilentGhost
    Mannequin

    SilentGhost mannequin commented Jan 9, 2011

    It seems that the special sequences description in the Matching Characters section needs to be updated to incorporate information on Unicode and bytes. I don't think, however, that it's a good idea just to copy that information from Doc/library/re.rst. Maybe the section could be shortened and linked to the RE Syntax section? There aren't any deeper links available, unfortunately.

    @terryjreedy
    Copy link
    Member Author

    I agree that the .rst should not have two copies and that any windows.chm specific fixup should be in the tool. Right now, right clicking gives a context menu with one item: Properties. Clicking that brings up a dialog box with a url that can be copied. Good enough for me at the moment but not terribly obvious. A possible separate issue.

    Unless A Kuchling says different, I would like to remove the version number. It implies to me that this doc is in pre-alpha condition and it is far beyond that. I see that the patch already does so.

    -:file:`Tools/scripts/redemo.py`, a demonstration program included with the
    +:file:`Tools/scripts/demo.py`, a demonstration program included with the

    should (currently) be
    +:file:`Tools/demo/redemo.py`, a demonstration program included with the

    Other than that, the patch looks good. Thanks. I am still thinking about Matching Characters. Once the patch is fixed, with a possible addition, a 2.7 version can easily be made by deleting the 3.x-specific deletions.

    @SilentGhost
    Mannequin

    SilentGhost mannequin commented Jan 9, 2011

    I don't know whether it would be easy to strip down py3k version to 2.7 version.

    Seeing as it's just a basic introduction, I think a single statement about Unicode support might be sufficient. For an exhaustive description of the special sequences, refer to the docs and carry on with ASCII strings.

    Attached patch fixes path issue.

    @terryjreedy
    Member Author

    Since I think I know how to do it easily, I will try to derive the 2.7 patch.

    In Matching Characters, I think
    "The following predefined special sequences are available:"

    should be expanded to

    "The following predefined special sequences are a subset of those available. The equivalent classes are for bytes patterns. For a complete list of sequences and expanded class definitions for Unicode string patterns, see the end of Regular Expression Syntax."
    (with section reference markup).

    Note to myself. /bytes/byte string/ for 2.7.

    While the changes all look innocuous to me with respect to building the docs, I am curious if you have tried to rebuild the HOWTO (if you have the tool chain, which I do not).
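To illustrate the str-versus-bytes difference that the proposed wording points readers to, here is a small Python 3 sketch using `\d`. The sample character (U+0663, ARABIC-INDIC DIGIT THREE) is an arbitrary choice. In a str pattern `\d` matches any Unicode decimal digit, while in a bytes pattern, or with `re.ASCII`, it matches only `[0-9]`, the class the HOWTO lists.

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE: a Unicode decimal digit, but not an ASCII one.
text = "7\u0663"

# In a str pattern, \d matches any Unicode decimal digit.
str_digits = re.findall(r"\d", text)
# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d", text, re.ASCII)
# A bytes pattern sees only raw bytes, so \d is [0-9] there too;
# the UTF-8 bytes of U+0663 (0xD9 0xA3) are not ASCII digits.
bytes_digits = re.findall(rb"\d", text.encode("utf-8"))

print(str_digits, ascii_digits, bytes_digits)
```

The same split applies to `\w`, `\s` and `\b`, which is why pointing to the full table in the Regular Expression Syntax section is preferable to duplicating it in the HOWTO.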

    @merwok
    Member

    merwok commented Jan 10, 2011

    > I agree that the .rst should not have two copies and that any windows.chm specific fixup should be in the tool. Right now, right clicking gives a context menu with one item: Properties. Clicking that brings up a dialog box with a url that can be copied. Good enough for me at the moment but not terribly obvious. A possible separate issue.

    I would argue that this is a bug in the CHM viewers, not Python :)

    @SilentGhost
    Mannequin

    SilentGhost mannequin commented Jan 10, 2011

    > While the changes all look innocuous to me with respect to building the docs, I am curious if you have tried to rebuild the HOWTO (if you have the tool chain, which I do not).

    I did rebuild the docs with 'make html'. Build was clean every time. If you meant something else please let me know.

    @terryjreedy
    Member Author

    I applied the patch to 3.2 and 3.1 in r87904 and r87905. Thanks.
    I had to re-edit for 2.7: r87909.

    I made a separate small patch for my suggested addition to Matching Characters. Could someone check that it is correct, given that re.rst contains the target directive (or whatever it is called):
    .. _re-syntax:

    @merwok
    Member

    merwok commented Jan 10, 2011

    Looks good, builds without warnings.

    Note that you can use :ref:`re-syntax` and Sphinx will substitute the heading for you. The :role:`some special text <real-target>` form is used when you want to control the text of the link.

    (That thing is called a hyperlink target: http://docutils.sourceforge.net/docs/user/rst/quickref.html#hyperlink-targets)

    @terryjreedy
    Member Author

    and r87918 for 2.7, with bytes -> byte string

    @terryjreedy
    Member Author

    Correction: r87912 and r87913 for 3.x

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022