Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cody: Alternative implementation for markdown escaping #51151

Merged
merged 3 commits into from
Apr 26, 2023

Conversation

philipp-spiess
Copy link
Contributor

This replaces #51144

The previous implementation was limiting in various factors:

  • The sanitze option of the Markdown library is deprecated since forever. It actually logged a bunch of nasty warnings to the console. The recommended way to deal with escaping is by using a library (like DOMPurify in our case) after the markdown step.

    We already do the above, hooray! This means that in no point in time did we ever had a XSS vulnerability but only a style related issue.

    The problem is that sometimes Cody emits HTML outside of code blocks and any visual HTML is allowed by our current DOMPurify config. We could change this, but then we would have to maintain a complicated allowlist for all HTML tags generated in the markdown transformation and all compliant tags are removed. In addition to that, Cody could still emit these HTML events outside of code blocks. E.g. use this prompt "Write some HTML but don't use Markdown to format it". The rule of thumb here is that anything from cody should be relayed to the user 1:1, so we can neither remove nor "render" some tags (even if they are just empty <div> that do nothing). When Cody returns <div>I’m a banana</div> we want to surface the <div> string to the user.

  • We have to support two use cases where we insert HTML into the message deliberately (and we will have more of these as we add more code intel goodies): Error messages and hallucination detection. I wanted to keep these abstractions similar to where they are though, as anything more complicated would require bigger restructurings.

Because of this, I came up with a different implementation: We have two clear boundaries of where these messages come from: Either form the Cody API endpoint or from the user input (because similarly, if a user types <div>I’m a banana</div> into the input box and presses enter, it would be strange if only I’m a banana shows up in the prompt.).

At these distinct places, we now call escapeCodyMarkdown which will replace < and > to &lt; and &gt; respectively (as long as the content is outside of a code block where we just leave it as-is.

Remember: This is not a security related XSS prevention. We already do that because the output from the markdown parser is being escaped but DOMPurify. The goal here is just to preserve HTML tags in the prompt and relay them to the end user.

ToDo

  • Apply the same fix to the web client

Test plan

Screenshot 2023-04-26 at 14 27 41

Screenshot 2023-04-26 at 14 17 13

@philipp-spiess philipp-spiess requested review from eseliger, abeatrix and a team April 26, 2023 12:47
@philipp-spiess philipp-spiess self-assigned this Apr 26, 2023
@cla-bot cla-bot bot added the cla-signed label Apr 26, 2023
@github-actions github-actions bot added the team/code-exploration Issues owned by the Code Exploration team label Apr 26, 2023
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Apr 26, 2023

Codenotify: Notifying subscribers in OWNERS files for diff 3d20392...27d0914.

Notify File(s)
@sourcegraph/code-exploration-devs client/common/src/util/markdown/markdown.ts

@sg-e2e-regression-test-bob
Copy link

sg-e2e-regression-test-bob commented Apr 26, 2023

Bundle size report 📦

Initial size Total size Async size Modules
-0.00% (-0.02 kb) 0.00% (+0.38 kb) 0.00% (+0.40 kb) 0.00% (0)

Look at the Statoscope report for a full comparison between the commits 27d0914 and 3ec3c34 or learn more.

Open explanation
  • Initial size is the size of the initial bundle (the one that is loaded when you open the page)
  • Total size is the size of the initial bundle + all the async loaded chunks
  • Async size is the size of all the async loaded chunks
  • Modules is the number of modules in the initial bundle

Copy link
Contributor

@vdavid vdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left one minor suggestion to further improve it.

Co-authored-by: David Veszelovszki <veszelovszki@gmail.com>
Copy link
Contributor

@abeatrix abeatrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* that becomes necessary, we can add that.
*/
export function renderMarkdown(markdown: string): string {
export function renderCodyMarkdown(markdown: string): string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this export so it's more obvious if we ever import the wrong file by accident (it happened to me while working on this PR)

@philipp-spiess philipp-spiess merged commit 21b8501 into main Apr 26, 2023
15 checks passed
@philipp-spiess philipp-spiess deleted the ps/cody-escaping branch April 26, 2023 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed team/code-exploration Issues owned by the Code Exploration team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants