model: Mistral Nemo #19

Closed
1 of 4 tasks
Tracked by #21
offgridtech opened this issue Jul 27, 2024 · 6 comments
Labels
P1: important, type: bug, type: model request

Comments

@offgridtech

offgridtech commented Jul 27, 2024

  • I have searched the existing issues

Current behavior

I've seen a lot of discussion on Hugging Face and in the llama.cpp GitHub repo about pre-tokenizer issues when the quantized Mistral Nemo model was first released, but a llama.cpp update seems to have cleared everything up over the last few days. What worked for other people didn't work for Jan: I've tried several quant versions, and the model fails to start. KoboldCPP and LM Studio say they made some updates and it's fixed now, so I'm guessing you all need to do the same. Thanks

More information here:
ggerganov/llama.cpp#8579
ggerganov/llama.cpp#8604

Minimum reproduction step

The model doesn't start. Other models, like Llama 3.1, start fine.

Expected behavior

The model starts

Screenshots / Logs

image

This log looks like it is the pre-tokenizer issue they were talking about.

Jan version

v0.5.2

In which operating systems have you tested?

  • macOS
  • Windows
  • Linux

Environment details

AppImage on Linux

offgridtech added the "type: bug" label Jul 27, 2024
@dan-homebrew

dan-homebrew commented Aug 30, 2024

@nguyenhoangthuan99 Can you look into this:

  • Is this the Tekken tokenizer?
  • Would this need to be refactored into tokenizer.cpp?
  • I've scheduled this for the current sprint: the scope is just to investigate and articulate the long-term path
  • However, if there's a fast solution, we should go for it

@nguyenhoangthuan99
Contributor

  • The cortex.llamacpp engine can support Mistral Nemo now. I tested with the current llamacpp source, and the model loads and answers questions correctly
  • Next steps:
    • Create a model hub repo for Mistral Nemo and upload the model
    • Integrate with cortex; investigate the chat template, stop token, etc.

dan-homebrew transferred this issue from janhq/jan Sep 10, 2024
dan-homebrew changed the title from "bug: Unable to Run Mistral Nemo" to "model: Mistral Nemo" Sep 10, 2024
@dan-homebrew

@offgridtech I am transferring this issue to the cortex.cpp repo. We should be working on it; ETA is 2 weeks.

@nguyenhoangthuan99
Contributor

nguyenhoangthuan99 commented Sep 25, 2024

I think Mistral Nemo is the first model we can run through this pipeline automatically. To add support for a new model from Hugging Face:

  • Create a model repo mistral-nemo under cortexso. cc @0xSage @dan-homebrew: please help me create it, since my account doesn't have permission to do so
  • Prepare ReadMe.md and model.yml for this model arch
  • Run the CI with this instruction to automatically pull, convert, and quantize the model at different quantization levels.
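
For readers unfamiliar with the pull, convert, and quantize steps, here is a minimal sketch of the commands such a pipeline might run. It is not the actual cortexso CI, which is not shown in this thread. The upstream repo id, the working directory, and the list of ten quantization types are assumptions chosen for illustration, and `build_pipeline` is a hypothetical helper. The sketch only builds and prints the command lines; it does not execute them.

```python
import shlex

# Assumed values for illustration only; the real CI configuration is not shown here.
HF_REPO = "mistralai/Mistral-Nemo-Instruct-2407"
QUANT_LEVELS = [
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "Q4_K_S",
    "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
]


def build_pipeline(repo, workdir="work", quants=QUANT_LEVELS):
    """Return the pull, convert, and quantize commands as argument lists."""
    name = repo.split("/")[-1]
    src = f"{workdir}/{name}"            # local copy of the Hugging Face checkpoint
    f16 = f"{workdir}/{name}-f16.gguf"   # unquantized GGUF used as the quantizer input
    cmds = [
        # 1. Pull the original weights from Hugging Face.
        ["huggingface-cli", "download", repo, "--local-dir", src],
        # 2. Convert to GGUF with llama.cpp's conversion script.
        ["python", "convert_hf_to_gguf.py", src, "--outfile", f16, "--outtype", "f16"],
    ]
    # 3. Quantize once per level with llama.cpp's llama-quantize tool.
    for q in quants:
        cmds.append(["llama-quantize", f16, f"{workdir}/{name}-{q}.gguf", q])
    return cmds


if __name__ == "__main__":
    for cmd in build_pipeline(HF_REPO):
        print(shlex.join(cmd))
```

Building the commands as lists instead of running them directly makes the plan easy to inspect or dry-run before spending time on a large download and conversion.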

@nguyenhoangthuan99
Contributor

Mistral-nemo is now supported at cortexso, with 10 quantization levels available. All models are created and uploaded automatically through CI.
Image

You can try mistral-nemo with cortex-nightly.
Image

dan-homebrew transferred this issue from janhq/cortex.cpp Sep 29, 2024
0xSage added the "P1: important" label Sep 29, 2024
@0xSage
Contributor

0xSage commented Oct 13, 2024

Closing as done and QA'd.

0xSage closed this as completed Oct 13, 2024
Projects
Archived in project
Status: To Announce (Jan and/or Cortex)

5 participants