Remove script content with tag #67
Comments
👍 :-)
Ok, for me to continue here...
So... should I continue with my approach or not?
This probably shouldn't be limited to I'd suggest
👍
Another use case for this feature is cleaning incoming HTML e-mail. These are often full HTML documents, with html, head, body, script and style tags. Running bleach on them leaves the full content of the script and style tags as page content. The only workaround I've found so far is to allow script and style tags in bleach and clear them in a later step with lxml.html.clean.Cleaner.
Bleach operates on document fragments, not full documents. Full document support is explicitly listed in the "Non Goals" section of the docs.
That's a shame. This is the only missing feature for that use case.
I've thought about this for a while and I think I'm going to pass on it for There are two big reasons for doing this:
Towards that, I've clarified the goals/non-goals language and updated the docs to make it clearer what Given that, I'm going to pass on this and close it out. I'm game for talking about building a
I've opened a new issue for adding a
In clean(), there's no way to remove the contents of a <script> tag. In the discussion in #57 we decided the best approach was an additional kwarg and an optional treewalk to remove a <script> and any of its children, including text nodes (e.g. the script content).
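To make the request concrete, here is a minimal sketch of the "drop the tag and everything inside it" behaviour, using only Python's standard-library html.parser. This is not bleach's API and not the kwarg discussed in #57: the names ContentStripper, strip_tag_content and the tags parameter are hypothetical, chosen for illustration. It does no sanitizing of the tags it keeps, and it silently drops comments and doctype declarations.

```python
from html import escape
from html.parser import HTMLParser


class ContentStripper(HTMLParser):
    """Re-emit an HTML fragment, dropping the listed tags and all of their children.

    Hypothetical helper for illustration only; it is not part of bleach.
    """

    def __init__(self, tags=("script", "style")):
        super().__init__(convert_charrefs=True)
        self.tags = set(tags)
        self._skip_depth = 0  # > 0 while we are inside a tag being dropped
        self._out = []

    @staticmethod
    def _render_attrs(attrs):
        # Attribute values arrive unescaped, so escape them again on output.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._skip_depth += 1
        elif not self._skip_depth:
            self._out.append(f"<{tag}{self._render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no children to skip.
        if tag not in self.tags and not self._skip_depth:
            self._out.append(f"<{tag}{self._render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in self.tags:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth:
            self._out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text nodes inside a dropped tag (e.g. the script source) are discarded.
        if not self._skip_depth:
            self._out.append(escape(data, quote=False))

    def result(self):
        return "".join(self._out)


def strip_tag_content(html, tags=("script", "style")):
    """Return html with the given tags and everything inside them removed."""
    parser = ContentStripper(tags)
    parser.feed(html)
    parser.close()
    return parser.result()


if __name__ == "__main__":
    print(strip_tag_content('<p>Hello</p><script>alert("x")</script><b>ok</b>'))
```

A pass like this could run before clean(), so the script and style text never reaches the sanitizer as plain page content.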