Cody: Alternative implementation for markdown escaping #51151
Codenotify: Notifying subscribers in OWNERS files for diff 3d20392...27d0914.
Bundle size report 📦
Look at the Statoscope report for a full comparison between the commits 27d0914 and 3ec3c34.
Looks good, left one minor suggestion to further improve it.
Co-authored-by: David Veszelovszki <veszelovszki@gmail.com>
LGTM
  * that becomes necessary, we can add that.
  */
-export function renderMarkdown(markdown: string): string {
+export function renderCodyMarkdown(markdown: string): string {
I renamed this export so it's more obvious if we ever import the wrong file by accident (it happened to me while working on this PR)
This replaces #51144
The previous implementation was limiting in several ways:
- The `sanitize` option of the Markdown library has been deprecated for a long time, and it logged a bunch of nasty warnings to the console. The recommended way to deal with escaping is to run a sanitizer (`DOMPurify` in our case) after the Markdown step.
- We already do the above, hooray! This means that at no point did we ever have an XSS vulnerability, only a style-related issue.
- The problem is that Cody sometimes emits HTML outside of code blocks, and any visual HTML is allowed by our current `DOMPurify` config. We could change this, but then we would have to maintain a complicated allowlist of all HTML tags generated by the Markdown transformation, and all non-compliant tags would be removed. In addition, Cody could still emit such HTML outside of code blocks, e.g. with the prompt "Write some HTML but don't use Markdown to format it". The rule of thumb is that anything from Cody should be relayed to the user 1:1, so we can neither remove nor "render" any tags (even if they are just empty `<div>`s that do nothing). When Cody returns `<div>I’m a banana</div>`, we want to surface the `<div>` string to the user.
- We have to support two use cases where we deliberately insert HTML into the message (and we will have more of these as we add more code intel goodies): error messages and hallucination detection. I wanted to keep these abstractions close to where they are now, though, as anything more complicated would require bigger restructurings.
Because of this, I came up with a different implementation. We have two clear boundaries where these messages come from: either the Cody API endpoint or the user input (similarly, if a user types `<div>I’m a banana</div>` into the input box and presses enter, it would be strange if only `I’m a banana` showed up in the prompt).
At these two places, we now call `escapeCodyMarkdown`, which replaces `<` and `>` with `&lt;` and `&gt;` respectively (as long as the content is outside of a code block, where we just leave it as-is).
Remember: this is not security-related XSS prevention. We already have that, because the output of the Markdown parser is sanitized by `DOMPurify`. The goal here is just to preserve HTML tags in the prompt and relay them to the end user.
ToDo
Test plan
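To make the escaping behavior described above concrete, here is a minimal sketch of replacing `<` and `>` with their HTML entities everywhere except inside fenced code blocks. It is an illustration under stated assumptions, not the implementation in this PR: the name `escapeOutsideCodeFences` is hypothetical, only triple-backtick fences are treated as code (inline code spans are not handled), and `&` is left untouched.

```typescript
// Hypothetical sketch, not the PR's `escapeCodyMarkdown`.
// Assumption: a line whose first non-whitespace characters are three
// backticks opens or closes a code block; everything else is prose.
const FENCE = '`'.repeat(3)

function escapeOutsideCodeFences(text: string): string {
    let insideFence = false
    return text
        .split('\n')
        .map(line => {
            if (line.trim().indexOf(FENCE) === 0) {
                // Fence lines toggle the state and are passed through unchanged.
                insideFence = !insideFence
                return line
            }
            if (insideFence) {
                // Code block content is left as-is.
                return line
            }
            // Outside code blocks, angle brackets become entities so that the
            // Markdown renderer shows the tag as text instead of rendering it.
            return line.replace(/</g, '&lt;').replace(/>/g, '&gt;')
        })
        .join('\n')
}
```

With this sketch, `escapeOutsideCodeFences('<div>I’m a banana</div>')` returns `&lt;div&gt;I’m a banana&lt;/div&gt;`, while the same tag inside a fenced block is returned unchanged. Because sanitization still runs after the Markdown step, a helper like this only affects what is displayed, not what is safe.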