HTMLParser: subsequent duplicate attributes should be ignored #86987

Closed
karlcow mannequin opened this issue Jan 4, 2021 · 2 comments
Labels: 3.10 (only security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments


karlcow mannequin commented Jan 4, 2021

BPO 42821
Nosy @ezio-melotti, @karlcow

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/ezio-melotti'
closed_at = <Date 2021-01-06.06:46:27.971>
created_at = <Date 2021-01-04.08:00:54.637>
labels = ['invalid', 'type-bug', 'library', '3.10']
title = 'HTMLParser: subsequent duplicate attributes should be ignored'
updated_at = <Date 2021-01-06.06:46:27.969>
user = 'https://github.com/karlcow'

bugs.python.org fields:

activity = <Date 2021-01-06.06:46:27.969>
actor = 'ezio.melotti'
assignee = 'ezio.melotti'
closed = True
closed_date = <Date 2021-01-06.06:46:27.971>
closer = 'ezio.melotti'
components = ['Library (Lib)']
creation = <Date 2021-01-04.08:00:54.637>
creator = 'karlcow'
dependencies = []
files = []
hgrepos = []
issue_num = 42821
keywords = []
message_count = 2.0
messages = ['384308', '384475']
nosy_count = 2.0
nosy_names = ['ezio.melotti', 'karlcow']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue42821'
versions = ['Python 3.10']


karlcow mannequin commented Jan 4, 2021

This came up while working on bpo-41748.

browser input
data:text/html,<!doctype html><div class="bar" class="foo">text</div>

browser output
<div class="bar">text</div>

Actual HTMLParser output

see #24072 (comment)
('starttag', 'div', [('class', 'bar'), ('class', 'foo')])

Expected HTMLParser output
('starttag', 'div', [('class', 'bar')])
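
For anyone who wants to reproduce the "Actual HTMLParser output" above, here is a minimal sketch. The `EventCollector` subclass and its `events` list are names made up for this illustration; only `HTMLParser`, `feed`, `close`, and `handle_starttag` come from the standard library.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; duplicates are passed through as-is.
        self.events.append(("starttag", tag, attrs))


collector = EventCollector()
collector.feed('<!doctype html><div class="bar" class="foo">text</div>')
collector.close()

print(collector.events)
```

Both `class` attributes show up in the attribute list, in source order, which is the behavior this report is about.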

@karlcow (mannequin) added the labels 3.10 (only security fixes) and stdlib (Python modules in the Lib dir) on Jan 4, 2021
@ezio-melotti
Member

If we follow the behavior of the browser, we will have to pick one of the two values and discard the other, making the discarded value inaccessible. If we provide both, scripts and libraries that use HTMLParser will have access to both and can decide what to do.

For example BeautifulSoup already does the right thing:
>>> bs4.BeautifulSoup('<!doctype html><div class="bar" class="foo">text</div>')
<!DOCTYPE html>
<html><body><div class="bar">text</div></body></html>

Changing this might also break code that relies on this behavior. I'm therefore going to close this as "not a bug".
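
Since HTMLParser hands over every attribute, a caller that wants the browser behavior can apply its own policy. The following is only a sketch of that idea: `first_wins` and `all_values` are hypothetical helpers written for this comment, not part of `html.parser`.

```python
def first_wins(attrs):
    """Keep only the first value for each attribute name, as browsers do."""
    seen = {}
    for name, value in attrs:
        # setdefault leaves an existing entry untouched, so later duplicates are ignored.
        seen.setdefault(name, value)
    return list(seen.items())


def all_values(attrs):
    """Alternative policy: group every value seen under its attribute name."""
    grouped = {}
    for name, value in attrs:
        grouped.setdefault(name, []).append(value)
    return grouped


attrs = [("class", "bar"), ("class", "foo")]
print(first_wins(attrs))   # [('class', 'bar')]
print(all_values(attrs))   # {'class': ['bar', 'foo']}
```

Either helper could be called from `handle_starttag` in a subclass, which keeps the choice in user code rather than in the parser.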

@ezio-melotti self-assigned this on Jan 6, 2021
@ezio-melotti added the type-bug (An unexpected behavior, bug, or error) label on Jan 6, 2021
@ezio-melotti transferred this issue from another repository on Apr 10, 2022