This repository has been archived by the owner on Apr 29, 2024. It is now read-only.

Does not censor properly when using a markdown parser #1

Open
callmeclover opened this issue Mar 21, 2024 · 0 comments
Labels
bug Something isn't working


callmeclover commented Mar 21, 2024

Running censor on the text "fuck" returns "f***", as expected. The problem appears when we add a markdown parser such as Earmark. If we run censor before as_html, the parser treats the asterisks in the censored output as markdown. If we run censor after as_html, text that a user has already self-censored is parsed as markdown in some cases, which sabotages the filter. Running it afterwards also turns "<p>fuck</p>" into "<p>f****/p>", because the filter replaces the "<" of the closing tag along with the word. We could iterate over the HTML and censor only the text, but then we would run into the same self-censoring issue.

Possible solutions:

  • Use a character other than "*" so that "</p>" does not become "/p>"
  • Raise this issue with Earmark's developers
  • Implement our own markdown parser
  • Use a different markdown parser (probably a Rust crate like pulldown)
  • Don't support markdown officially

I'm probably going to try pulldown before implementing our own parser.

UPDATE: Closer testing shows that running censor after as_html does not affect self-censoring. I will only forward the "</p>" -> "/p>" part of this to finnbear's repo.

EDIT: Solution 1 won't work, since the filter detects all characters. The remaining options are to add whitespace at the end of the string (probably not, since that just wastes characters) or to remove the tags that Earmark/pulldown adds.
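
For illustration, here is a minimal sketch of the "keep the tags away from the filter" idea, assuming the HTML is simple enough to split on "<" and ">". Both function names are hypothetical, and `toy_censor` is only a stand-in that masks the placeholder word "darn"; it is not the real filter and does not reproduce its behaviour.

```rust
/// Hypothetical stand-in for the real filter: masks the placeholder word
/// "darn" as "d***". It only exists to make the sketch self-contained.
fn toy_censor(text: &str) -> String {
    text.replace("darn", "d***")
}

/// Applies `censor` only to the text between tags. Every `<...>` tag is
/// copied through unchanged, so the filter never sees markup characters.
fn censor_text_nodes(html: &str, censor: impl Fn(&str) -> String) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;

    while !rest.is_empty() {
        match rest.find('<') {
            Some(start) => {
                // Censor the text that precedes the tag, if there is any.
                if start > 0 {
                    out.push_str(&censor(&rest[..start]));
                }
                let tail = &rest[start..];
                match tail.find('>') {
                    Some(end) => {
                        // Copy the whole tag verbatim, including '<' and '>'.
                        out.push_str(&tail[..=end]);
                        rest = &tail[end + 1..];
                    }
                    None => {
                        // Unclosed '<': treat the remainder as plain text.
                        out.push_str(&censor(tail));
                        rest = "";
                    }
                }
            }
            None => {
                // No more tags: the remainder is plain text.
                out.push_str(&censor(rest));
                rest = "";
            }
        }
    }
    out
}
```

Calling `censor_text_nodes("<p>darn</p>", toy_censor)` leaves the closing tag intact. This is only a sketch: a word split across tags would slip past the filter, an attribute value containing ">" would confuse the tag scan, and it does nothing about the self-censoring question raised above.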
