-
Notifications
You must be signed in to change notification settings - Fork 13.3k
fix: add remark plugin to render raw HTML as literal text #16505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add remark plugin to render raw HTML as literal text #16505
Conversation
Test sheet reasoning_content: Final content: This patch aligns the WebUI Markdown pipeline with industry-standard LLM renderers (OpenAI ChatGPT, Hugging Face Spaces, Anthropic...) by ensuring raw HTML safety without sacrificing formatting fidelity This patch doesn't just "sanitize HTML" : it neutralizes raw XML-like output (e.g. <think>, <tool>, <meta>, <response>, <step>, <node>, <data>), ensuring these symbolic or structural tags, whether produced by LLMs or part of generic XML fragments, are displayed as plain text rather than parsed as DOM, preserving structure while keeping the UI safe and consistent. |
To reproduce the issue now, you need to explicitly ask the model to output XML-like tags in the stream, which is already a bit of a hack, since LLMs naturally know they’re emitting Markdown. |
Test prompt : Write HTML with real blank lines and indentation inside a code block and then output the same HTML outside a code block, so we can compare the rendering. https://chatgpt.com/share/68ea480b-5c3c-8012-9201-62cfb687dc67 And also on llama.cpp with this PR : conversation_6b69f066-c32c-4b0b-9f0d-92dad9c31764_tu_peux_crire_exacte.json At this point, we’re actually doing slightly better than some major LLM WebUIs so that’s a good sign 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff overall! Just left a few architectural remarks that need to be addressed.
Implemented a missing MDAST stage to neutralize raw HTML like major LLM WebUIs do ensuring consistent and safe Markdown rendering Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the Markdown AST into plain-text equivalents while preserving indentation and line breaks. This ensures consistent rendering and prevents unintended HTML execution, without altering valid Markdown structure Kept 'remarkRehype' in the pipeline since it performs the required conversion from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization Refined the link-enhancement logic to skip unnecessary DOM rewrites, fixing a subtle bug where extra paragraphs were injected after the first line due to full innerHTML reconstruction, and ensuring links open in new tabs only when required Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml -> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify
6555d6d
to
24ec6ac
Compare
* origin/master: (32 commits) metal : FA support F32 K and V and head size = 32 (ggml-org#16531) graph : support cacheless embeddings with FA and iSWA (ggml-org#16528) opencl: fix build targeting CL 2 (ggml-org#16554) CUDA: fix numerical issues in tile FA kernel (ggml-org#16540) ggml : fix build broken with -march=armv9-a on MacOS (ggml-org#16520) CANN: fix CPU memory leak in CANN backend (ggml-org#16549) fix: add remark plugin to render raw HTML as literal text (ggml-org#16505) metal: add support for opt_step_sgd (ggml-org#16539) ggml : fix scalar path for computing norm (ggml-org#16558) CANN: Update several operators to support FP16 data format (ggml-org#16251) metal : add opt_step_adamw and op_sum (ggml-org#16529) webui: remove client-side context pre-check and rely on backend for limits (ggml-org#16506) [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (ggml-org#16521) ci : add Vulkan on Ubuntu with default packages build (ggml-org#16532) common : handle unicode during partial json parsing (ggml-org#16526) common : update presets (ggml-org#16504) ggml : Fix FP16 ELU positive branch (ggml-org#16519) hparams : add check for layer index in is_recurrent (ggml-org#16511) ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518) CUDA: faster tile FA, add oob checks, more HSs (ggml-org#16492) ...
fix: add remark plugin to render raw HTML as literal text
Implemented a missing MDAST stage to neutralize raw HTML like major LLM WebUIs
do ensuring consistent and safe Markdown rendering
Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the
Markdown AST into plain-text equivalents while preserving indentation and
line breaks. This ensures consistent rendering and prevents unintended HTML
execution, without altering valid Markdown structure
Kept 'remarkRehype' in the pipeline since it performs the required conversion
from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization
Refined the link-enhancement logic to skip unnecessary DOM rewrites,
fixing a subtle bug where extra paragraphs were injected after the first
line due to full innerHTML reconstruction, and ensuring links open in new
tabs only when required
Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml
-> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify
Close #16417