Add support for UL, OL, PRE, other non-P elements #42
Comments
“Q” is another tag that gets dropped in the HTML cleaning process, and it should be kept.
@migurski Right, all tags get dropped. So the only formatters that stay are '\n'. Another option would be some attribute |
It’s not just the tags, it’s also the content. In the example above, the content of the lists is not included in the cleaned text or in the |
I’m testing Goose and finding that elements other than paragraphs are unavailable in `cleaned_text` or `top_node`.

What I did:
I expected to find the single-item list with “Xavier Grangier” and all the code samples in the output, but they were not there. I would be interested to see an additional property in the output, something like `source_node`, that made the non-cleaned element tree of the original content available.
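
To make the request concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use Goose and is not how Goose cleans HTML. It is a toy extractor built on Python's standard `html.parser`, and the names `BlockTextExtractor`, `extract_text`, `PARAGRAPH_ONLY` and `WITH_NON_P` are made up for illustration. It only shows how a paragraph-only whitelist loses list and `pre` content, and how widening the whitelist keeps it.

```python
from html.parser import HTMLParser

# Hypothetical tag sets for illustration; not Goose's actual configuration.
PARAGRAPH_ONLY = {"p"}
WITH_NON_P = {"p", "li", "pre", "q"}


class BlockTextExtractor(HTMLParser):
    """Collect the text found inside a chosen set of elements."""

    def __init__(self, keep_tags):
        super().__init__()
        self.keep_tags = keep_tags
        self.depth = 0       # number of kept elements currently open
        self.blocks = []     # finished text blocks, in document order
        self._buffer = []    # text pieces of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in self.keep_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.keep_tags and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Outermost kept element closed: emit one block.
                text = "".join(self._buffer).strip()
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        # Text outside any kept element is dropped, tags and content alike.
        if self.depth > 0:
            self._buffer.append(data)


def extract_text(html, keep_tags):
    parser = BlockTextExtractor(keep_tags)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.blocks)


SAMPLE = (
    "<div><p>Intro paragraph.</p>"
    "<ul><li>Xavier Grangier</li></ul>"
    "<pre>print('hello')</pre></div>"
)

if __name__ == "__main__":
    print(extract_text(SAMPLE, PARAGRAPH_ONLY))
    print("---")
    print(extract_text(SAMPLE, WITH_NON_P))
```

With `PARAGRAPH_ONLY`, the list item and the code sample disappear from the result. With `WITH_NON_P`, both survive as separate lines. A real fix in Goose would also need to decide how such elements are scored and formatted, which this sketch does not attempt.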