Consider replacing lxml_html_clean with nh3 #4254

Open
iaindillingham opened this issue Apr 2, 2024 · 0 comments

We currently use both lxml_html_clean and nh3 to clean HTML. If the two packages perform similar functions, keeping both means we carry an unnecessary dependency.

I'd suggest replacing lxml_html_clean with nh3, rather than the other way around, because the lxml_html_clean documentation carries this warning:

Note: the HTML Cleaner in lxml_html_clean is not considered appropriate for security sensitive environments. See e.g. bleach for an alternative.

However, bleach is deprecated, which is why we added nh3 in the first place.

Usage

We use lxml_html_clean to prepare OS Interactive reports for display on OS Reports:

from lxml_html_clean import Cleaner

cleaned = Cleaner(page_structure=False, style=True, kill_tags=["head"]).clean_html(
    html
)

We use nh3 to prepare a project's status description:

project.status_description = nh3.clean(markdown(project.status_description))

"status_description_html": nh3.clean(
    markdown(self.object.status_description)
),

Related
