Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edit: Improve response reliability with different chat models #3192

Merged
merged 5 commits into from
Mar 27, 2024

Conversation

umpox
Copy link
Contributor

@umpox umpox commented Feb 16, 2024

Description

This PR improves the responseTransformer (prev contentSanitizer) to better handle different responses from the LLM.

Specific changes:

  • Better handling of XML tags, if the numbers differ from what we specified
  • Markdown blocks are stripped (e.g. ```typescript foo ```)

Test plan

  1. Make Edits
  2. Confirm that there's no regressions in output quality

@umpox umpox force-pushed the tr/edit-response-reliability branch from 71b8a1d to a6a122c Compare March 26, 2024 16:24
@umpox umpox marked this pull request as ready for review March 26, 2024 16:28
@umpox umpox requested a review from abeatrix March 26, 2024 16:31
Copy link
Contributor

@abeatrix abeatrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done! I tested this with Claude 2.0, 2.1, instant, 3 Opus, 3 Sonnet, 3 Haiku, Ollama llama2, gpt-3.5, gpt-4, didn't spot any big issue with Edit and Doc commands!

With test commands, most of the models work fine except GPT-3.5 where it would behave strangely and open 2 files before inserting. I can reproduce on main though so not related to this PR but just wanted to flag:

Screen.Recording.2024-03-26.at.7.55.00.PM.mov

I did receive this response when using ChatGPT 4 once (I can't reproduce it again):
image

Maybe we should skip all the none-code responses (responses that are not enclosed with the XML tag)?

@umpox
Copy link
Contributor Author

umpox commented Mar 27, 2024

@abeatrix You are the best! Thank you so much for thoroughly testing this.

Maybe we should skip all the none-code responses (responses that are not enclosed with the XML tag)?

Hmmm, I thought we only look for the XML tag and skip if it's not there. I'll double check. There's still a potential issue where we can a non-code response in the XML tag though 🤔

I couldn't find any other issues either so I'll merge this and look at that in a follow-up

@umpox umpox enabled auto-merge (squash) March 27, 2024 09:26
@umpox umpox merged commit 64bfb54 into main Mar 27, 2024
19 of 20 checks passed
@umpox umpox deleted the tr/edit-response-reliability branch March 27, 2024 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants