HTMLParser handle_starttag replaces entity references in attribute value even without semicolon #69426

Open
frogcoder mannequin opened this issue Sep 26, 2015 · 2 comments · May be fixed by #95215
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

frogcoder mannequin commented Sep 26, 2015

BPO 25239
Nosy @ezio-melotti
Files
  • parserentity.py: an example of the behavior described

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = None
    created_at = <Date 2015-09-26.16:46:39.294>
    labels = ['type-bug', 'library']
    title = 'HTMLParser handle_starttag replaces entity references in attribute value even without semicolon'
    updated_at = <Date 2015-09-26.17:06:12.433>
    user = 'https://bugs.python.org/frogcoder'

    bugs.python.org fields:

    activity = <Date 2015-09-26.17:06:12.433>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2015-09-26.16:46:39.294>
    creator = 'frogcoder'
    dependencies = []
    files = ['40588']
    hgrepos = []
    issue_num = 25239
    keywords = []
    message_count = 2.0
    messages = ['251654', '251657']
    nosy_count = 2.0
    nosy_names = ['ezio.melotti', 'frogcoder']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'test needed'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25239'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    frogcoder mannequin commented Sep 26, 2015

    The documentation for HTMLParser.handle_starttag states: "All entity references from html.entities are replaced in the attribute values." However, the parser also replaces any text that matches an ampersand followed by an entity name, even when the terminating semicolon is missing.

    For example, <a href="go?t=buy&currency=usd">foo</a> produces "go?t=buy¤cy=usd" as the value of the href attribute, because "curren" is the entity name for the currency sign.
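
A minimal reproduction might look like the sketch below. The class name HrefCollector is made up for illustration and is not taken from the attached parserentity.py. The printed value depends on the Python version: interpreters with the behavior reported here print the mangled string, while one that follows the HTML5 attribute rule would print the original query string.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper that records every href value passed to handle_starttag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive already unescaped.
        for name, value in attrs:
            if name == "href":
                self.hrefs.append(value)


parser = HrefCollector()
parser.feed('<a href="go?t=buy&currency=usd">foo</a>')
parser.close()

# Affected versions show 'go?t=buy¤cy=usd' instead of 'go?t=buy&currency=usd'.
print(parser.hrefs)
```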

    frogcoder mannequin added the stdlib and type-bug labels Sep 26, 2015

    ezio-melotti commented Sep 26, 2015

    This does indeed seem to be a bug. The relevant part of the spec is at http://www.w3.org/TR/html5/syntax.html#consume-a-character-reference :

    If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a "=" (U+003D) character, then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.

    Off the top of my head, this paragraph is not implemented in HTMLParser (and it should be).
    Also note that <a href="go?t=buy&currency=usd">foo</a> is not valid HTML: the & should have been escaped as &amp;.
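
The attribute rule quoted above could be sketched roughly as follows. This is only an illustration of the rule, not the code used by HTMLParser or by any proposed fix; the function name unescape_attribute and the regular expression are assumptions, and numeric references such as &#38; are deliberately left untouched to keep the sketch short.

```python
import re
from html.entities import html5

# "&", a run of ASCII alphanumerics, then an optional ";".
_NAMED_REF = re.compile(r"&([A-Za-z0-9]+)(;?)")


def unescape_attribute(value):
    """Hypothetical helper: decode named references using the attribute-value rule."""

    def replace(match):
        name, semicolon = match.groups()
        if semicolon:
            # Properly terminated reference: decode it if the name is known.
            return html5.get(name + ";", match.group(0))
        # No semicolon. The run is greedy, so a shorter legacy name inside it
        # (such as "curren" in "currency") is followed by an alphanumeric
        # character and must be left alone. Only a full legacy name that is
        # not followed by "=" may be decoded.
        next_char = value[match.end():match.end() + 1]
        if name in html5 and next_char != "=":
            return html5[name]
        return match.group(0)

    return _NAMED_REF.sub(replace, value)


print(unescape_attribute("go?t=buy&currency=usd"))  # left unchanged
print(unescape_attribute("a&amp;b"))                # decoded to a&b
```

The html5 mapping in html.entities contains both the semicolon-terminated names and the legacy names that browsers accept without a semicolon, which is why a plain dictionary lookup is enough here.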
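
On the producing side, escaping the attribute value before writing it into markup avoids the ambiguity entirely. A small sketch using html.escape from the standard library (the variable names are illustrative):

```python
import html

url = "go?t=buy&currency=usd"

# Escape &, <, > and quotes so the value is safe inside a quoted attribute.
escaped = html.escape(url, quote=True)
markup = '<a href="{}">foo</a>'.format(escaped)
print(markup)

# Decoding the escaped value gives back the original URL.
print(html.unescape(escaped))
```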
