Consider replacing lxml_html_clean with nh3 #4254

Closed
iaindillingham opened this issue Apr 2, 2024 · 0 comments · Fixed by #4384

Currently, we use both lxml_html_clean and nh3 for cleaning HTML. Assuming that both packages perform similar functions, using both means we carry an unnecessary dependency.

I'd suggest replacing lxml_html_clean with nh3, rather than the other way around, because of the following warning about lxml_html_clean:

Note: the HTML Cleaner in lxml_html_clean is not considered appropriate for security sensitive environments. See e.g. bleach for an alternative.

However, bleach is deprecated, which is why we added nh3 in the first place.

Usage

We use lxml_html_clean to prepare OS Interactive reports for display on OS Reports:

cleaned = Cleaner(page_structure=False, style=True, kill_tags=["head"]).clean_html(
    html
)

We use nh3 to prepare a project's status description:

project.status_description = nh3.clean(markdown(project.status_description))

"status_description_html": nh3.clean(
    markdown(self.object.status_description)
),
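
One possible way to consolidate is to route every call site through a single helper, so the choice of cleaning library lives in one place. The sketch below is only an illustration: clean_html, Sanitizer and _nh3_sanitizer are hypothetical names, not existing code in this project, and the helper simply defaults to nh3.clean with nh3's default allow-lists.

```python
from typing import Callable, Optional

# Hypothetical type alias: anything that maps an HTML string to a cleaned one.
Sanitizer = Callable[[str], str]


def _nh3_sanitizer() -> Sanitizer:
    # nh3 is a third-party package; importing it lazily keeps this module
    # importable (and testable with a stub) where nh3 is not installed.
    import nh3

    return nh3.clean


def clean_html(html: str, sanitize: Optional[Sanitizer] = None) -> str:
    """Hypothetical single entry point for cleaning HTML.

    Uses nh3.clean with its default settings unless another
    sanitizer callable is passed in.
    """
    if sanitize is None:
        sanitize = _nh3_sanitizer()
    return sanitize(html)
```

With a helper like this, the status description call sites would become clean_html(markdown(project.status_description)), and the reports code would call clean_html(html). nh3's defaults are not the same as Cleaner(page_structure=False, style=True, kill_tags=["head"]), so the rendered reports would need checking before switching.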

