Add starcoder2 (and dolphincoder) support to autocomplete (not complete yet) #174

Closed

Conversation

hafriedlander

This needed two changes:

  • Starcoder2 requires raw mode for ollama. Without it, ollama wraps the prompt in the model template and the model no longer detects the FIM tokens.
  • The file details passed to the autocomplete template needed to be structured, because starcoder2 expects them in a different format from the previous fixed string encoding (see https://arxiv.org/pdf/2402.19173.pdf for format details).
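To make the two changes concrete, here is a minimal sketch of building a starcoder2 FIM prompt from structured file details and wrapping it in a raw-mode request body for ollama's `/api/generate` endpoint. The token layout follows my reading of the repository-context format in the linked paper and should be checked against the tokenizer of the model actually in use. The `FileSnippet` and `FimInput` types and both function names are hypothetical and are not taken from this PR.

```typescript
// Hypothetical shapes for the structured file details; not the PR's types.
interface FileSnippet {
  path: string;
  content: string;
}

interface FimInput {
  repoName: string;
  contextFiles: FileSnippet[]; // other files offered as context
  currentPath: string;         // file being edited
  prefix: string;              // text before the cursor
  suffix: string;              // text after the cursor
}

// Assumed layout (from the StarCoder2 paper):
// <repo_name>name<file_sep>path\ncode...<file_sep><fim_prefix>path\npre<fim_suffix>suf<fim_middle>
function buildStarcoder2Prompt(input: FimInput): string {
  const context = input.contextFiles
    .map((f) => `<file_sep>${f.path}\n${f.content}`)
    .join("");
  return (
    `<repo_name>${input.repoName}${context}` +
    `<file_sep><fim_prefix>${input.currentPath}\n${input.prefix}` +
    `<fim_suffix>${input.suffix}<fim_middle>`
  );
}

// Request body for ollama's /api/generate. `raw: true` asks ollama to skip
// the model template and pass the prompt through unchanged.
function buildOllamaRequest(model: string, input: FimInput) {
  return {
    model,
    prompt: buildStarcoder2Prompt(input),
    raw: true,
    stream: false,
  };
}
```

Keeping the file details as objects until the last step is what lets each model family pick its own encoding; a string-only interface would have to be parsed back apart for starcoder2.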

To do before this could be merged:

  • Change the file context to also be a structured object, rather than just a string. That needs changes to the file-interaction cache.
  • Probably broke tests

And some other ideas to improve results generally:

  • Add language-specific stop words (which is how Continue stops very long multi-line autocompletes)
  • A different idea would be to look at the depth in the tree hierarchy and stop if the autocomplete goes up a level (so if you're inside an if block, stop at the latest once that block has been completed)
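The depth idea can be approximated without a parser. The sketch below is an assumption-laden stand-in: it counts brackets instead of walking a real syntax tree, it ignores brackets inside strings and comments, and it cuts just before the closer that would leave the enclosing block, on the assumption that the FIM suffix already contains that closer. `truncateAtBlockExit` is a hypothetical name.

```typescript
// Cut the completion at the first closing bracket that has no matching
// opener inside the completion itself, i.e. where it would leave the block
// the cursor started in.
function truncateAtBlockExit(completion: string): string {
  const openers = new Set(["{", "(", "["]);
  const closers = new Set(["}", ")", "]"]);
  let depth = 0;
  for (let i = 0; i < completion.length; i++) {
    const ch = completion[i];
    if (openers.has(ch)) {
      depth++;
    } else if (closers.has(ch)) {
      depth--;
      // Depth below zero means this closer belongs to an enclosing block.
      if (depth < 0) return completion.slice(0, i);
    }
  }
  return completion;
}
```

Balanced brackets inside the completion pass through unchanged, so a call such as `foo({ a: 1 });` is not truncated. A tree-based version would be more robust for languages where block depth is not bracket-delimited, such as Python.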

BTW, I think there's a bug in the file-interaction code: onDidOpenTextDocument doesn't track focus or which window is active, so multiple files can be "open" in different windows at the same time. I think it should be rewritten to use onDidChangeActiveTextEditor instead?
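A rough sketch of what focus-based tracking could look like follows. To keep it runnable outside the extension host it uses minimal structural stand-ins (`EditorLike`, `DisposableLike`, `EventLike`) instead of importing `vscode`; in an extension, `vscode.window.onDidChangeActiveTextEditor` would be passed to the constructor. `ActiveFileTracker` and its methods are hypothetical and are not the project's file-interaction cache.

```typescript
// Structural stand-ins for the parts of the VS Code API used here.
interface EditorLike {
  document: { fileName: string };
}
interface DisposableLike {
  dispose(): void;
}
type EventLike<T> = (listener: (e: T) => void) => DisposableLike;

class ActiveFileTracker {
  private activeFile: string | undefined;
  private readonly visits = new Map<string, number>();
  private readonly subscription: DisposableLike;

  constructor(onDidChangeActiveTextEditor: EventLike<EditorLike | undefined>) {
    this.subscription = onDidChangeActiveTextEditor((editor) => {
      // The event delivers undefined when no text editor is active.
      this.activeFile = editor?.document.fileName;
      if (this.activeFile !== undefined) {
        const seen = this.visits.get(this.activeFile) ?? 0;
        this.visits.set(this.activeFile, seen + 1);
      }
    });
  }

  // The file that currently has focus, if any.
  current(): string | undefined {
    return this.activeFile;
  }

  // How many times a file has become the active editor.
  visitCount(fileName: string): number {
    return this.visits.get(fileName) ?? 0;
  }

  dispose(): void {
    this.subscription.dispose();
  }
}
```

Because only one editor is active at a time, this records a single focused file rather than every document that happens to be open.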

Raising for now to start discussion.

@hafriedlander
Author

BTW, here's the list of extra stop-words that Continue uses per language. https://github.com/continuedev/continue/blob/main/core/autocomplete/languages.ts.
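As an illustration of how per-language stop words could be applied, here is a small sketch. The entries in `STOP_WORDS` are made-up examples and are not copied from Continue's `languages.ts`; the function names are hypothetical too.

```typescript
// Illustrative stop sequences only; a real list would come from a curated
// per-language table such as the one linked above.
const STOP_WORDS: Record<string, string[]> = {
  python: ["\ndef ", "\nclass "],
  typescript: ["\nfunction ", "\nclass ", "\nexport "],
};

// Combine model-level stops (for example FIM tokens) with language stops.
function stopSequencesFor(languageId: string, base: string[] = []): string[] {
  return [...base, ...(STOP_WORDS[languageId] ?? [])];
}

// Trim a completion at the earliest occurrence of any stop sequence.
function applyStopSequences(completion: string, stops: string[]): string {
  let end = completion.length;
  for (const stop of stops) {
    const idx = completion.indexOf(stop);
    if (idx !== -1 && idx < end) end = idx;
  }
  return completion.slice(0, end);
}
```

With ollama, the same list could probably be sent as `options.stop` in the request so generation halts server-side; trimming on the client as above is a fallback that also works for streamed output.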

@rjmacarthy
Collaborator

rjmacarthy commented Mar 14, 2024

This is awesome thanks, is it ready for merge or are you adding more commits?

Edit: I see that there are more commits to come, no problem. One problem I have with starcoder2 is that its completions are followed by random code from other source files. Do you notice it too? Does this PR fix that?

I realise that there are some improvements to make on FIM completions, but I have not had the time to concentrate on it. Those tests were a bit brittle and only a foundation, so we can skip them for now if necessary to make improvements. I welcome more PRs from you if you have improvements you would like to make.

Many thanks,

@hafriedlander
Author

Yeah, I do notice starcoder2 doesn't know when to stop - or at least the 15B version I use. There's a note in the paper that they messed up the 15B FIM training, but the actual results (at least with the dolphincoder finetune) seem good. I'll test with 7B later. I have an idea for how to fix it anyway though.

(I still think deepseek-coder-33B gives better results, but you need different -base and -instruct versions and it's a much bigger model generally)

@rjmacarthy
Collaborator

Since this PR is a month old, I will close it for now: there are many conflicts and things have changed. Please reopen it if you are willing to finish. Many thanks.

@rjmacarthy rjmacarthy closed this Apr 16, 2024