Allow MathML Core tags in sanitized post content #19806
Comments
I started work on this a bit, and thinking about mastodon's philosophy a bit I think it'd be good to scrub all length percentage attributes from MathML tags. This means that e.g. an exemption to this, however, is probably |
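A minimal sketch of the attribute-scrubbing idea, assuming a standalone helper built on Ruby's bundled REXML rather than Mastodon's actual sanitizer configuration. The helper name `scrub_length_attributes` and the attribute list are hypothetical and chosen for illustration; the exemption mentioned in the comment above is not modeled.

```ruby
require 'rexml/document'

# Attributes that accept length-percentage values in MathML Core.
# Which of these to strip (and which to exempt) is an assumption.
LENGTH_ATTRIBUTES = %w[
  mathsize lspace rspace minsize maxsize
  width height depth voffset
].freeze

# Hypothetical helper: returns the markup with length attributes removed
# from every element, leaving all other attributes and content untouched.
def scrub_length_attributes(mathml)
  doc = REXML::Document.new(mathml)
  REXML::XPath.each(doc, '//*') do |element|
    LENGTH_ATTRIBUTES.each { |name| element.delete_attribute(name) }
  end
  doc.to_s
end

puts scrub_length_attributes('<math><mn mathsize="300%">1</mn></math>')
```

A real implementation would more likely live in the sanitizer's allowlist so that HTML and MathML go through a single pass.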
I was thinking about this yesterday. I'm glad someone else is taking it on! But I don't think this solves the problem of rendering math in native apps. |
Apps already have to render a subset of html (the content body in existing activities) this would just require them to render MathML in addition, if they want to support Math. Really, if we want different apps, instances, etc. to agree on math formatting it needs to be presented in a standard way. This isn't currently possible because different instances set up math differently (e.g. some use But if instances produce MathML, apps and other instances could consume it generically |
Now that I think about it, the confusion might be around this issue, which concerns "sanitation". There's a difference between fedi software (frontends, apps, and relays) producing content, and being able to display/replicate it. This issue only considers the latter for the Mastodon project. If you are worried about apps, it would be good to open a similar issue for them. First, though, it would be good for an instance to actually be producing MathML, and I will be working on that after this is complete. |
Yes, this is definitely an improvement on the current situation. |
See mastodon#19806 for more info. Test Plan: ---------- ``` $ RAILS_ENV=test bundle exec rspec spec/lib/sanitize_config_spec.rb Randomized with seed 19230 11/11 |========================================================================================== 100 ===========================================================================================>| Time: 00:00:00 Finished in 0.07389 seconds (files took 1.67 seconds to load) 11 examples, 0 failures Randomized with seed 19230 Coverage report generated for RSpec to /home/pounce/programming/mastodon/coverage. 1343 / 35156 LOC (3.82%) covered. ``` observed 100% code coverage of lib/sanitize_ext/sanitize_config.rb. closes mastodon#19806
The core Mastodon project has never been interested in introducing rich-text formatting into posts, because it complicates the UI and adds many additional concerns when compared to the current plain-text nature of Mastodon posts. (For example, rendering support for MathML would be a huge burden for native mobile apps that do not have access to a browser implementation to rely on.) If we ever decided to add rich-text formatting, there would be many other lower-hanging fruit to support, such as bold, italics, etc., that are much more likely to have wider, cross-platform support. However, we currently don't believe rich-text formatting matches Mastodon's model well; it is better suited for other software / clients to implement. That said, if you wanted to write code for transforming incoming MathML content from other servers into a plain-text equivalent, to preserve semantics, then I believe the current project policy is that we would consider it. While that may be a better step forward from a compatibility standpoint, I think there are many logistical/practicality challenges to handle there, especially since users rarely author MathML markup manually and are instead more accustomed to having it produced from e.g. pseudo-TeX. I'm also not sure whether MathML has any support for "round-tripping" source text like this, which would probably be necessary to implement this well. |
(sorry, didn't mean to close, happy to leave this issue open to consolidate discussion on alternatives) |
Thanks for the response. I've been somewhat expecting this since MathML can be used to produce rich text, even if it is not designed to. <math>
<semantics>
<mfrac>
<mn>1</mn>
<mn>2</mn>
</mfrac>
<annotation encoding="application/x-tex">\frac{1}{2}</annotation>
</semantics>
</math> provides the source TeX of the provided MathML. I have a bit of a philosophical question though: |
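The plain-text fallback discussed above could be sketched roughly as follows. This assumes the incoming MathML carries a TeX `<annotation>` like the one shown, and uses Ruby's bundled REXML; the helper name `tex_fallback` is hypothetical and not part of Mastodon.

```ruby
require 'rexml/document'

TEX_ENCODING = 'application/x-tex'

# Hypothetical helper: returns the TeX source from the first matching
# <annotation> element, or nil when there is none or the markup fails
# to parse as XML.
def tex_fallback(mathml)
  doc = REXML::Document.new(mathml)
  annotation = REXML::XPath.first(doc, "//annotation[@encoding='#{TEX_ENCODING}']")
  annotation&.text
rescue REXML::ParseException
  nil
end

sample = <<~'XML'
  <math>
    <semantics>
      <mfrac><mn>1</mn><mn>2</mn></mfrac>
      <annotation encoding="application/x-tex">\frac{1}{2}</annotation>
    </semantics>
  </math>
XML

puts tex_fallback(sample) # prints \frac{1}{2}
```

Markup without an annotation yields `nil`, so a caller would still need some other fallback for that case.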
It might be a struggle to render MathML without an annotation, but since we're trying to sanitize chaotic HTML, I think it's safe to assume that "well behaved" incoming MathML has a first node of |
@4e554c4c the "source" field is in whatever language the user happens to author it in—the spec makes no guarantees about it even being human readable. It's purely designed for round-tripping content when edited by multiple client applications, it's not appropriate for display. |
MathML is frequently inadequate in practice. Regardless, is rendering not a client-side concern? It seems like a nice feature for the web client to offer to inline a polyfill for rendering math text (perhaps with an option to disable it). But as far as the consistency of the rendering goes, is this really in scope beyond its impact on the layout? (Which could be addressed by overflow rules.) Bigger picture: LaTeX is supported in a lot of places. It's a common markdown extension that's supported here on GitHub for example: It feels like this could dovetail with something like #18958 |
See mastodon#19806 for more info. Test Plan: ---------- ``` $ RAILS_ENV=test bundle exec rspec spec/lib/sanitize_config_spec.rb -f d Randomized with seed 26282 Sanitize::Config ::MASTODON_OUTGOING keeps a with href and rel tag, not adding to rel or target if url is local behaves like common HTML sanitization removes a with unsupported scheme in href removes a with unparsable href keeps math keeps ul removes a without href and only keeps text content removes a without href keeps a with href keeps a with translate="no" removes "translate" attribute with invalid value keeps h1 does not re-interpret HTML when removing unsupported links keeps title in abbr keeps start and reversed attributes of ol keeps a with supported scheme and no host correctly sanitizes linethickness Finished in 0.61166 seconds (files took 4.76 seconds to load) 16 examples, 0 failures Randomized with seed 26282 ``` observed 100% code coverage of lib/sanitize_ext/sanitize_config.rb. See mastodon#19806, glitch-soc#1432
Closing in favor of #26943 |
Pitch
MathML Core is a standard language to describe the structure and content of mathematical expressions in browsers. Unlike TeX-family languages, MathML is not an entire typesetting language reliant on macro-processing. Instead, it relies on Unicode and other features of browser engines to display mathematics efficiently.
This specification is well-supported on Firefox and Safari, and is currently being shipped in Chrome (as of Chrome v109, the current beta, it is no longer behind a browser feature flag).
I propose that, behind an enabled-by-default feature, MathML tags should not be sanitized from the content body of activities. This allows mathematical posts from across the fediverse to retain their mathematical content and be rendered in browsers that support MathML. It will render poorly on older versions of Google Chrome, but no worse than mathematical content already looks there.
See thread for more info: https://types.pl/@pounce/109286683477125171
Motivation
Previous suggestions to bundle MathJax (#822) were turned down due to the performance loss for all users, when only a few instances/users care about mathematics. However, this will not be the case if instances which produce mathematics render it to MathML themselves! This puts the majority of the computational effort and JavaScript bloat on instances which care about it, while other instances will be able to simply consume the content in their web browser.
ActivityPub example
For example, a math-based instance could produce the activity
The user would compose the "source/content" post, containing LaTeX math syntax, which would be rendered to MathML Core. This could then be rendered on any Mastodon instance that does not scrub the math tags.
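As a rough illustration of the shape such an activity might take, here is a sketch that builds one as a Ruby hash and prints it as JSON. Everything in it is hypothetical: the actor URL, the `text/x-latex` media type, the sample text, and the `math_note` helper are assumptions for illustration, not values taken from any real instance.

```ruby
require 'json'

# Hypothetical builder: pairs the user's TeX source with the MathML
# rendering that the producing instance would generate.
def math_note(tex_source:, mathml:)
  {
    '@context' => 'https://www.w3.org/ns/activitystreams',
    'type' => 'Create',
    'actor' => 'https://math.example/users/alice', # made-up actor
    'object' => {
      'type' => 'Note',
      # What the user typed; the media type here is an assumption.
      'source' => { 'content' => tex_source, 'mediaType' => 'text/x-latex' },
      # What other instances would display after sanitization.
      'content' => mathml
    }
  }
end

note = math_note(
  tex_source: 'One half: $\frac{1}{2}$',
  mathml: '<p>One half: <math><mfrac><mn>1</mn><mn>2</mn></mfrac></math></p>'
)
puts JSON.pretty_generate(note)
```

The receiving instance only needs the `content` field; `source` stays available for clients that want to re-edit the post.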
This is important, since several math-based instances exist (such as https://mathstodon.xyz and https://types.pl) and produce math-based posts. However, when these posts federate to other instances they cannot be rendered, since other instances do not have MathJax installed. Thus, a more portable version based on open standards is necessary.