
feat: Jan supports safetensors #1056

Closed
0xSage opened this issue Dec 18, 2023 · 4 comments
Labels

engineering: Jan Inference Layer (Jan can serve models locally: with correct data structs, APIs, multi-inference engines, multi-model)
P1: important (Important feature / fix)
type: feature request (A new feature)

Comments

0xSage (Contributor) commented Dec 18, 2023

@Van-QA quoted this from feature request #2723:

Problem
Jan can only run GGUF models, not the wider range of model formats out there. If it could run Transformers models, most models would be usable.

Success Criteria
Take any Hugging Face Transformers model and make it available in Jan.

[Two screenshots attached: 無題 2024-04-15 16-27-20, 無題 2024-04-15 16-28-09]

@0xSage 0xSage added the type: epic label Dec 18, 2023
@0xSage 0xSage added this to the Jan supports multiple Inference Engines milestone Dec 18, 2023
@0xSage 0xSage changed the title epic: jan supports safetensors epic: Jan supports safetensors Dec 18, 2023
@0xSage 0xSage added the type: feature request label and removed the type: epic label Dec 18, 2023
@0xSage 0xSage changed the title epic: Jan supports safetensors feat: Jan supports safetensors Dec 18, 2023
@0xSage 0xSage added the engineering: Jan Inference Layer label Dec 22, 2023
@0xSage 0xSage removed this from the Jan supports multiple Inference Engines milestone Dec 27, 2023
hiro-v (Contributor) commented Feb 11, 2024

Support added in #1972 for converting a Hugging Face safetensors model to GGUF and using it.
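
A minimal sketch of how an importer could decide whether a repo needs that safetensors-to-GGUF conversion path, assuming the huggingface_hub Python package; the repo id in the usage note is a hypothetical example, not something from this thread:

# Check a Hugging Face repo's file list to see whether it already ships
# GGUF or only safetensors (and therefore needs the #1972 conversion path).
from huggingface_hub import list_repo_files

def needs_conversion(repo_id: str) -> bool:
    files = list_repo_files(repo_id)
    has_gguf = any(f.endswith(".gguf") for f in files)
    has_safetensors = any(f.endswith(".safetensors") for f in files)
    return has_safetensors and not has_gguf

# Hypothetical usage:
# needs_conversion("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # True: safetensors only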

@Van-QA Van-QA added this to the v0.4.8 milestone Feb 18, 2024
@Van-QA Van-QA added the P1: important label Feb 18, 2024
@Van-QA Van-QA modified the milestones: v0.4.8, v0.4.10 Mar 1, 2024
Van-QA (Contributor) commented Mar 6, 2024

Although issue #2167 is resolved, Import via Hugging Face is on hold until the epic janhq/cortex#571 is complete.

@hiro-v hiro-v removed their assignment Mar 14, 2024
@Van-QA Van-QA modified the milestones: v0.4.10, v0.4.11 Mar 25, 2024
@Van-QA Van-QA modified the milestones: v0.4.11, v0.4.12 Apr 4, 2024
@louis-jan louis-jan modified the milestones: v0.4.12, v0.4.13 Apr 16, 2024
@louis-jan louis-jan assigned Inchoker and unassigned namchuai and louis-jan Apr 16, 2024
hiro-v (Contributor) commented May 17, 2024

I have checked the technical possibilities for this.

Please read more in this doc (draft): https://f1da82fe.docs-9ba.pages.dev/guides/glossaries/gguf

Basically, there are two steps to produce a single GGUF model (see the sketch after this list):

  • Convert the Hugging Face .safetensors model to GGUF BF16 (normally takes around 2 minutes). This requires the convert-hf-to-gguf script in Python (which can be executed using the cortex Python runtime). Example command: python llama.cpp/convert-hf-to-gguf.py models --outtype bf16 --outfile "${{ env.MODEL_NAME }}/${{ env.bf16 }}"
  • Once we have the GGUF BF16 model, the user can choose the quantization type they want and run the quantization (around 2 minutes). llama.cpp exposes a low-level C++ API for this in quantize. Example command: ./llama.cpp/quantize "${{ env.MODEL_NAME }}/${{ env.bf16 }}" "${{ env.MODEL_NAME }}/$qtype" "$method"
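
A minimal Python sketch of that two-step pipeline, wrapping the two commands above, assuming a local llama.cpp checkout containing convert-hf-to-gguf.py and the quantize binary; the model directory, output file names, and default quantization type are hypothetical:

# Step 1 converts a Hugging Face safetensors snapshot to GGUF BF16;
# step 2 quantizes the BF16 file to the type the user picked.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("llama.cpp")            # local llama.cpp checkout (assumption)
MODEL_DIR = Path("models/my-model")      # hypothetical downloaded HF snapshot
BF16_FILE = MODEL_DIR / "model-bf16.gguf"

def convert_to_bf16() -> None:
    # ~2 minutes: safetensors -> GGUF BF16 via the Python converter
    subprocess.run(
        ["python", str(LLAMA_CPP / "convert-hf-to-gguf.py"), str(MODEL_DIR),
         "--outtype", "bf16", "--outfile", str(BF16_FILE)],
        check=True,
    )

def quantize(method: str = "Q4_K_M") -> Path:
    # ~2 minutes: BF16 GGUF -> user-chosen quantization via the quantize binary
    out_file = MODEL_DIR / f"model-{method}.gguf"
    subprocess.run(
        [str(LLAMA_CPP / "quantize"), str(BF16_FILE), str(out_file), method],
        check=True,
    )
    return out_file

convert_to_bf16()
print("quantized model at", quantize())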

I think this would help a lot with the adoption of cortex-cli and the Jan app.

0xSage (Contributor, Author) commented Jun 11, 2024

related: janhq/cortex#555

@0xSage 0xSage closed this as completed Jun 11, 2024