-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cody: Handle inline code blocks and use marked to lex markdown #51576
Conversation
Codenotify: Notifying subscribers in OWNERS files for diff 5eeb210...e257ca5. No notifications. |
} | ||
|
||
function highlightLine(line: string, tokens: HighlightedToken[]): string { | ||
let highlightedLine = line | ||
for (const token of tokens) { | ||
highlightedLine = highlightedLine.replaceAll(token.outerValue, getHighlightedTokenHTML(token)) | ||
highlightedLine = highlightedLine.replaceAll(token.innerValue, getHighlightedTokenHTML(token)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@novoselrok Do you have some context why we used the outerValue
here? it was screwing up formatting in some cases, e.g. the boundary could start with a comma (,
) which would be put inside the highlighted line for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was because the Markdown backticks styling was messing up something with the highlights underline. If everything works and looks correct, then go ahead with your changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool I can test specifically regarding backticks to make sure, ty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is still an issue, reverting.
Bundle size report 📦
Look at the Statoscope report for a full comparison between the commits e257ca5 and 5eeb210 or learn more. Open explanation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
if (isStreaming) { | ||
return markdown | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's the odd thing, it already visually renders a codeblock like this, but when I use the lexer
not the render
feature, it would not detect the in-progress code block as such. That's so odd :(
This PR visually works the way you describe but by basically not escaping HTML while the message is not done. so it might render a <h1>Title</h1>
as a title and then shift to showing the <h1>
tags when done. I found this to be better (and less common) than the inverse where the encoding inside the code block is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Co-authored-by: Thorsten Ball <mrnugget@gmail.com>
Fixes #51768 When working on #51576 I was trying to fix some issues where the hallucination detection logic highlighting was not working well and later, based on Rok's input, noticed that this was causing a regression. The problem? I forgot to push my revert 😭 <img width="896" alt="Screenshot 2023-05-11 at 12 29 47" src="https://github.com/sourcegraph/sourcegraph/assets/458591/af334243-7a30-4f4b-af90-ec8f5e2bc86e"> So here's the revert for the changes in #51576 ## Test plan <img width="556" alt="Screenshot 2023-05-11 at 12 31 03" src="https://github.com/sourcegraph/sourcegraph/assets/458591/c7ce8465-2ac7-4b49-b205-3555c433942a"> <!-- All pull requests REQUIRE a test plan: https://docs.sourcegraph.com/dev/background-information/testing_principles -->
Closes #51528
We found more issues where the naive markdown parser would not work well, including inline code blocks. To fix this, this PR uses the
marked
lexer to parse Markdown instead of doing our own thing. This comes with some other issues, though:For some reason, tilde fenced blocks are not working if we use the inline lexer on the whole text. Instead, we have to use a combination of the normal lexer (for root blocks) and inline lexer for detecting nested layouts like inline code blocks. We only use the inline lexer for escaping and not the hallucination detection which means that inline code blocks can have hallucination spans which actually seems desired from my testing
The
marked
lexer produces different results for documents that aren't fully streamed yet. In general, we have some issues already where applying the markdown formatter for every token that is streamed in, can cause layout shifts and I observed another issue where a top-line code block was not detected as one by the lexer, if the ending fence was missing. What's odd is that it does seem to parse the markdown in this case. This resulted in a very noticeable layout shift from>html ...
to<html ...
which was super annoying. To fix this, we now do not escape HTML while the message is in flight instead. This means that if you let Cody generate some HTML (without using markdown) you'd see it formatted first before the full message is received and we can properly replace.IMO this is still desired as there is no harm in rendering the HTML (especially when we do more aggressive deny-listing here) however you still want the html tags shown to you at the end because we are, after all, we generate code and sometimes Cody is not using the markdown properly.
There are some other changes to the hallucination detection that I'll mark inline
Test plan
Did a bunch of different prompts to ensure it kinda works. Also piped the test cases through the right functions and added a case for inline code snippets.