Cody: Alternative implementation for markdown escaping #51151

philipp-spiess · 2023-04-26T12:47:32Z

This replaces #51144

The previous implementation was limiting in various factors:

The sanitze option of the Markdown library is deprecated since forever. It actually logged a bunch of nasty warnings to the console. The recommended way to deal with escaping is by using a library (like DOMPurify in our case) after the markdown step.

We already do the above, hooray! This means that in no point in time did we ever had a XSS vulnerability but only a style related issue.

The problem is that sometimes Cody emits HTML outside of code blocks and any visual HTML is allowed by our current DOMPurify config. We could change this, but then we would have to maintain a complicated allowlist for all HTML tags generated in the markdown transformation and all compliant tags are removed. In addition to that, Cody could still emit these HTML events outside of code blocks. E.g. use this prompt "Write some HTML but don't use Markdown to format it". The rule of thumb here is that anything from cody should be relayed to the user 1:1, so we can neither remove nor "render" some tags (even if they are just empty <div> that do nothing). When Cody returns <div>I’m a banana</div> we want to surface the <div> string to the user.
We have to support two use cases where we insert HTML into the message deliberately (and we will have more of these as we add more code intel goodies): Error messages and hallucination detection. I wanted to keep these abstractions similar to where they are though, as anything more complicated would require bigger restructurings.

Because of this, I came up with a different implementation: We have two clear boundaries of where these messages come from: Either form the Cody API endpoint or from the user input (because similarly, if a user types <div>I’m a banana</div> into the input box and presses enter, it would be strange if only I’m a banana shows up in the prompt.).

At these distinct places, we now call escapeCodyMarkdown which will replace < and > to < and > respectively (as long as the content is outside of a code block where we just leave it as-is.

Remember: This is not a security related XSS prevention. We already do that because the output from the markdown parser is being escaped but DOMPurify. The goal here is just to preserve HTML tags in the prompt and relay them to the end user.

ToDo

Apply the same fix to the web client

Test plan

sourcegraph-bot · 2023-04-26T12:49:55Z

Codenotify: Notifying subscribers in OWNERS files for diff 3d20392...27d0914.

Notify	File(s)
@sourcegraph/code-exploration-devs	client/common/src/util/markdown/markdown.ts

sg-e2e-regression-test-bob · 2023-04-26T12:54:57Z

Bundle size report 📦

Initial size	Total size	Async size	Modules
-0.00% (-0.02 kb)	0.00% (+0.38 kb)	0.00% (+0.40 kb)	0.00% (0)

Look at the Statoscope report for a full comparison between the commits 27d0914 and 3ec3c34 or learn more.

Open explanation

Initial size is the size of the initial bundle (the one that is loaded when you open the page)
Total size is the size of the initial bundle + all the async loaded chunks
Async size is the size of all the async loaded chunks
Modules is the number of modules in the initial bundle

client/cody-shared/src/chat/markdown.ts

vdavid

Looks good, left one minor suggestion to further improve it.

Co-authored-by: David Veszelovszki <veszelovszki@gmail.com>

abeatrix

LGTM

philipp-spiess · 2023-04-26T14:46:56Z

client/cody-shared/src/chat/markdown.ts

 * that becomes necessary, we can add that.
 */
-export function renderMarkdown(markdown: string): string {
+export function renderCodyMarkdown(markdown: string): string {


I renamed this export so it's more obvious if we ever import the wrong file by accident (it happened to me while working on this PR)

Cody: Alternative implementation for markdown escaping

02cf7c9

philipp-spiess requested review from eseliger, abeatrix and a team April 26, 2023 12:47

philipp-spiess self-assigned this Apr 26, 2023

cla-bot bot added the cla-signed label Apr 26, 2023

github-actions bot added the team/code-exploration Issues owned by the Code Exploration team label Apr 26, 2023

vdavid reviewed Apr 26, 2023

View reviewed changes

client/cody-shared/src/chat/markdown.ts Outdated Show resolved Hide resolved

vdavid approved these changes Apr 26, 2023

View reviewed changes

Update client/cody-shared/src/chat/markdown.ts

573637e

Co-authored-by: David Veszelovszki <veszelovszki@gmail.com>

abeatrix approved these changes Apr 26, 2023

View reviewed changes

philipp-spiess commented Apr 26, 2023

View reviewed changes

Also fix web client

27d0914

philipp-spiess merged commit 21b8501 into main Apr 26, 2023
15 checks passed

philipp-spiess deleted the ps/cody-escaping branch April 26, 2023 16:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cody: Alternative implementation for markdown escaping #51151

Cody: Alternative implementation for markdown escaping #51151

philipp-spiess commented Apr 26, 2023

sourcegraph-bot commented Apr 26, 2023 •

edited

Loading

sg-e2e-regression-test-bob commented Apr 26, 2023 •

edited

Loading

vdavid left a comment

abeatrix left a comment

philipp-spiess Apr 26, 2023

Cody: Alternative implementation for markdown escaping #51151

Cody: Alternative implementation for markdown escaping #51151

Conversation

philipp-spiess commented Apr 26, 2023

ToDo

Test plan

sourcegraph-bot commented Apr 26, 2023 • edited Loading

sg-e2e-regression-test-bob commented Apr 26, 2023 • edited Loading

Bundle size report 📦

vdavid left a comment

Choose a reason for hiding this comment

abeatrix left a comment

Choose a reason for hiding this comment

philipp-spiess Apr 26, 2023

Choose a reason for hiding this comment

sourcegraph-bot commented Apr 26, 2023 •

edited

Loading

sg-e2e-regression-test-bob commented Apr 26, 2023 •

edited

Loading