[Accessibility bug] Semantic whitespace implied by block elements isn't retained properly #369
Can you make a list of which block elements you want to handle?
https://developer.mozilla.org/en-US/docs/Web/HTML/Block-level_elements#Elements Using this or a similar list would be a good idea, I think.
Transcribing that list here:
Can you take a stab at implementing this? I don't think I'm going to get to this for a while.
I wrote a rich-text layouter recently that imports HTML, and I got this as a byproduct, so right now I have no immediate need for it. I just thought it'd be a nice thing to add at some point.
I took a stab at this ticket. The solution is not perfect, but it produces readable and accessible text. While discussion of my approach should continue on the PR, I wanted to raise one point here: the whitespace character should be a NEWLINE, not a SPACE.
The main reason is that more complex use cases, such as lists, become unreadable with a space character: all the blocks bleed together.
Yeah, you are correct, I got that wrong. Between block elements there should be a newline, since that is also how they are rendered in a browser 👍
Block elements are tracked, and a newline is inserted when they are stripped. New tests are included.
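The approach described above (track block elements and emit a newline whenever one is stripped) can be sketched with Python's standard-library `html.parser.HTMLParser`. This is a minimal illustration under stated assumptions, not the code from the PR and not Bleach's API: `BLOCK_ELEMENTS` is a hypothetical subset of the MDN list, and `BlockAwareStripper` and `strip_tags` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical subset of block-level elements (assumption; not the full MDN list).
BLOCK_ELEMENTS = frozenset({
    "address", "article", "aside", "blockquote", "dd", "div", "dl", "dt",
    "fieldset", "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5",
    "h6", "header", "hr", "li", "main", "nav", "ol", "p", "pre", "section",
    "table", "ul",
})


class BlockAwareStripper(HTMLParser):
    """Drop every tag, but emit a newline where a block element starts or ends."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _break(self):
        # Add a newline only after some text, and never two in a row,
        # so nested blocks (e.g. <li> inside <ul>) don't pile up breaks.
        if self.parts and not self.parts[-1].endswith("\n"):
            self.parts.append("\n")

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased.
        if tag in BLOCK_ELEMENTS:
            self._break()

    def handle_endtag(self, tag):
        if tag in BLOCK_ELEMENTS:
            self._break()

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with block boundaries as newlines."""
    parser = BlockAwareStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(repr(strip_tags("<p>Test!</p><p>Hello</p>")))  # 'Test!\nHello'
```

Consecutive breaks are collapsed into one newline, but whitespace-only text between tags is passed through unchanged, so pretty-printed markup can still yield blank lines between blocks.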
Separate block elements imply visual spacing on screen, which should be retained when those elements are removed so that words stay properly separated. Bleach doesn't seem to handle this right now:
The expected result would be:
'Test! Hello'
(since `<p>` is a block element, two of them in a row imply a visual line break that is vital for the readability of the text)

Edit: just to make this clear, I am not proposing parsing the CSS to find out what is a block element, or anything over-the-top like that. But at least a reasonable default behavior that covers proper semantic HTML would be nice (without support for rogue CSS that unreasonably makes `<p>` inline, or nonsense like that). That would work properly for 99% of the web content out there, unlike the current implementation, which seems destined to drop vital whitespace on any non-trivial page.
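To see the failure mode without depending on Bleach itself, here is a minimal sketch of a stripper that keeps only text nodes and has no notion of block elements, built on the standard-library `html.parser.HTMLParser`. The input `<p>Test!</p><p>Hello</p>` is an assumed example consistent with the expected result above, and `naive_strip` is a hypothetical helper, not Bleach's implementation.

```python
from html.parser import HTMLParser


class NaiveStripper(HTMLParser):
    """Keep only text nodes; every tag is dropped with nothing in its place."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def naive_strip(html):
    """Hypothetical helper: concatenate all text, ignoring block boundaries."""
    parser = NaiveStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Assumed input (not taken from the original report).
print(naive_strip("<p>Test!</p><p>Hello</p>"))  # Test!Hello
```

Because nothing is emitted when the closing `</p>` and the next opening `<p>` are dropped, the two words are glued together, which is exactly the readability problem described above.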