
Remove trailing Markdown code tags in completion suggestions #726

Merged: 3 commits into jupyterlab:main on Apr 22, 2024

Conversation

bartleusink
Contributor

This is my attempt at fixing #686
I have also added a unit test and adapted the MockProvider a little bit so that unit tests can test different responses.
As a result I've also changed the assertions in the test_handle_stream_request test to what I think are the intended responses.


welcome bot commented Apr 11, 2024

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@krassowski krassowski added the bug Something isn't working label Apr 12, 2024
MockProvider,
{
"model_id": "model",
"responses": ["```python\nTest python code\n```"],
Member


What if there is a trailing newline after the triple backtick? Will the test still pass? Maybe worth parametrising it?

Member

@krassowski krassowski left a comment


It looks good in general, thank you! One suggestion to test if it works with trailing whitespace/new lines too.

@bartleusink
Contributor Author

Thanks @krassowski! The current code does not work with trailing whitespace/newlines (since in my testing GPT never returned any whitespace after the closing markdown suffix) but I will add some code to handle any trailing whitespace and will adjust the test accordingly.
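The fix described here can be sketched roughly as follows. This is an illustrative example, not the actual jupyter-ai implementation; the function name is made up:

```python
# Illustrative sketch of the post-processing discussed above: first strip
# trailing whitespace/newlines, then remove a closing Markdown fence if
# present, then strip again in case whitespace preceded the fence.
FENCE = "`" * 3  # a triple backtick

def strip_trailing_fence(suggestion: str) -> str:
    """Drop a trailing Markdown code fence and any surrounding whitespace."""
    trimmed = suggestion.rstrip()
    if trimmed.endswith(FENCE):
        trimmed = trimmed[: -len(FENCE)].rstrip()
    return trimmed
```

For example, `strip_trailing_fence("x = 1\n```\n\n")` returns `"x = 1"`, while input with no trailing fence passes through unchanged.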

@srdas
Collaborator

srdas commented Apr 12, 2024

@bartleusink Thanks for taking up issue #686, much appreciated. I was also trying to fix this one, see PR #698, which I will drop once we have a better one (thanks @krassowski for your feedback on that one). I tested your modification and am getting erratic behavior. Can you try your solution out to see if I am doing something wrong? See the two examples below (one where the backticks remain, and one where they are gone but the forward ticks remain). Is this the intended behavior?
[screenshot: two example completions]
FYI, I am using bedrock-chat:anthropic.claude-instant-v1 as the LLM, but each LLM may exhibit different behavior as well.

@krassowski
Member

FYI @srdas #717 will enable tuning the post-processing on per-provider level, so this should help with your use case.

@krassowski
Member

Frankly, the issue is that bedrock-chat:anthropic.claude-instant-v1 is not a great code-completion LLM, because it generates a lot of explanatory comments without putting them into the right comment syntax. @srdas did you try adjusting the template by overriding get_completion_prompt_template() for the bedrock-chat provider and putting in some more explicit instructions (or maybe even examples) to make it add # before comments and not add triple backticks?
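As a rough illustration of the prompt tuning suggested here: only the method name `get_completion_prompt_template()` comes from the discussion; the template text, placeholder names, and the plain-string form below are all made up for the sketch:

```python
# Hypothetical example of a more explicit completion prompt template.
# In jupyter-ai this string would be returned (wrapped in the framework's
# prompt-template type) from a provider's get_completion_prompt_template();
# the exact wrapping is omitted here to keep the sketch self-contained.
COMPLETION_TEMPLATE = (
    "Complete the following {language} code.\n"
    "Rules: output only code; write any commentary as {language} comments "
    "(e.g. lines starting with #); do not wrap the output in triple backticks.\n\n"
    "{prefix}"
)

prompt = COMPLETION_TEMPLATE.format(language="python", prefix="def add(a, b):")
```

Adding a few worked examples (few-shot prompting) to the template may help further with models that ignore plain instructions.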

srdas
srdas previously requested changes Apr 12, 2024
Collaborator

@srdas srdas left a comment


I spoke with @dlqqq and we both are inclined to move forward with this PR, thanks a lot to @bartleusink @krassowski for all your help with resolving this issue. I closed my PR #698 in favor of this.

However, the proposed PR still fails if the LLM generates commentary. We should not expect every LLM to produce a suitable code completion even with artful prompting. After all, an LLM may sometimes generate only code and sometimes also commentary. We should make a best effort at producing correct code even when the LLM includes unnecessary commentary.

If we expect LLMs to never generate commentary, why not do some post-processing to “assist” LLMs which are unable to do so? I recommend using a line-by-line approach as suggested in my PR #698. More specifically, we should drop all lines of text not surrounded by markdown code delimiters.

Can you implement this and include the unit tests?
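The line-by-line filtering proposed above could be sketched roughly like this. It is illustrative only, not the code from PR #698:

```python
# Hypothetical sketch of the line-by-line approach: keep only the lines
# that fall between Markdown code fences, dropping prose outside them.
def extract_code_lines(response: str) -> str:
    fence = "`" * 3  # a triple backtick
    inside = False
    kept = []
    for line in response.splitlines():
        if line.lstrip().startswith(fence):
            inside = not inside  # toggle on every fence line
            continue
        if inside:
            kept.append(line)
    return "\n".join(kept)
```

For example, given `"Here is the code:\n```python\nx = 1\n```\nHope this helps!"`, this returns just `"x = 1"`.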

@krassowski
Member

If we expect LLMs to never generate commentary, why not do some post-processing to “assist” LLMs which are unable to do so?

This is not what I meant to say (and I think I did not say this). The expectation, in my view, should be that LLMs generate commentary wrapped in the appropriate comment syntax for a given language.

More specifically, we should drop all lines of text not surrounded by markdown code delimiters.

This sounds like a good default, especially if customisable, but I would suggest that this is planned for separately and implemented in a new pull request. This is because implementing it correctly is not trivial if you consider that an LLM may begin generating code with or without prefixing it with markdown code delimiters, and that those may be included in the code itself, especially code for generating Markdown (which is not an uncommon task, including in notebooks).
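To make the two edge cases concrete, here is a hedged sketch of how a naive fence-based filter misbehaves on them; the helper and the sample inputs are hypothetical:

```python
# A naive filter that keeps only lines between Markdown fences.
def keep_fenced(text: str) -> str:
    fence = "`" * 3  # a triple backtick
    inside, kept = False, []
    for line in text.splitlines():
        if line.lstrip().startswith(fence):
            inside = not inside
            continue
        if inside:
            kept.append(line)
    return "\n".join(kept)

fence = "`" * 3
# Edge case 1: the model starts emitting code with no opening fence;
# the filter drops every line, losing the whole completion.
no_fence = "x = 1\ny = 2"
# Edge case 2: code whose *content* is Markdown (a triple-quoted string
# holding a fenced block); the embedded fences mislead the filter, which
# keeps only the string's interior and discards the assignment itself.
markdown_code = f'doc = """\n{fence}python\nx = 1\n{fence}\n"""'
```

Here `keep_fenced(no_fence)` returns the empty string and `keep_fenced(markdown_code)` returns `"x = 1"`, mangling both completions, which is why a robust default would need per-provider customisation.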

I spoke with ...

That's great, and I understand that the internal back channels may be easier for communication, but it would be great if design decisions for open source projects were discussed in public issues, or at least in public meetings :)

@dlqqq dlqqq changed the title Remove closing markdown identifiers (#686) Remove closing markdown identifiers Apr 15, 2024
@dlqqq dlqqq changed the title Remove closing markdown identifiers Remove trailing Markdown code tags in completion suggestions Apr 15, 2024
@3coins
Collaborator

3coins commented Apr 16, 2024

@krassowski

That's great, and I understand that the internal back channels may be easier for communication, but it would be great if design decisions for open source projects were discussed in public issues, or at least in public meetings :)

We agree; in fact, @srdas had planned to discuss this in the last JLab meeting on Wednesday but got sidetracked with other priorities. He has attempted to post all the discussion points in his comments here.

I see that @dlqqq has approved this PR, is there anything else missing before we merge this?

@dlqqq
Member

dlqqq commented Apr 22, 2024

@3coins I was about to merge this before realizing that this PR may introduce conflicts with #717, which is currently in review.

@krassowski Will #717 supersede the implementation in this PR? If so, I think we should close this PR in favor of #717.

@krassowski
Member

Will #717 supersede the implementation in this PR?

#717 moves the default post-processing method to a different file, but it is still there and I think changes in this PR are still useful even after #717 is merged.

@dlqqq dlqqq dismissed srdas’s stale review April 22, 2024 22:00

resolved based on recent discussion

@dlqqq dlqqq merged commit 0a6a029 into jupyterlab:main Apr 22, 2024
8 checks passed

welcome bot commented Apr 22, 2024

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024
…lab#726)

* Remove closing markdown identifiers (jupyterlab#686)

* Remove whitespace after closing markdown identifier

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>