
Please add support for neural-chat-7b-v3-1 #1284

@odellus

Model description

I'm using neural-chat-7b-v3-1 locally on my laptop and it would sure be sweet if I could serve it through tgi.

I can currently use it from Python with the following pattern:

import torch
from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM

# 4-bit NF4 quantization with double quantization and fp16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Local clone of Intel/neural-chat-7b-v3-1
fpath = '/home/thomas/src/neural-chat-7b-v3-1'

tokenizer = AutoTokenizer.from_pretrained(fpath)
model = AutoModelForCausalLM.from_pretrained(
    fpath,
    device_map='auto',
    quantization_config=quantization_config,
)
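
Generation then works as expected; here's a minimal sketch continuing from the snippet above (the prompt template is illustrative, based on my reading of the model card):

# Continuing from the snippet above: build a prompt and generate.
prompt = (
    "### System:\nYou are a helpful assistant.\n"
    "### User:\nWhat is text-generation-inference?\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))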

but when I pass the path of the cloned repo to TGI I get:

(text-generation-inference) thomas@computer-1:~/src/notes/projects/assistant/backend/text-generation-inference$ target/release/text-generation-launcher --model-id /home/thomas/src/neural-chat-7b-v3-1 --port=8080 --quantize bitsandbytes-nf4
2023-11-24T15:56:31.756672Z  INFO text_generation_launcher: Args { model_id: "/home/thomas/src/neural-chat-7b-v3-1", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "0.0.0.0", port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-11-24T15:56:31.756747Z  INFO download: text_generation_launcher: Starting download process.
2023-11-24T15:56:33.786649Z  INFO text_generation_launcher: Peft model detected.
2023-11-24T15:56:33.786688Z  INFO text_generation_launcher: Loading the model it might take a while without feedback
2023-11-24T15:56:34.161062Z ERROR download: text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/peft/utils/config.py", line 117, in from_pretrained
    config_file = hf_hub_download(
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/thomas/src/neural-chat-7b-v3-1'. Use `repo_type` argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/thomas/src/notes/projects/assistant/backend/text-generation-inference/server/text_generation_server/utils/peft.py", line 16, in download_and_unload_peft
    model = AutoPeftModelForCausalLM.from_pretrained(
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/peft/auto.py", line 69, in from_pretrained
    peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/peft/utils/config.py", line 121, in from_pretrained
    raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")
ValueError: Can't find 'adapter_config.json' at '/home/thomas/src/neural-chat-7b-v3-1'

So I'm seeing an error that appears to be related to #1283, in addition to TGI complaining there's no adapter_config.json, which is odd because the repo contains the full model and is not a PEFT adapter. It doesn't even look like TGI can see the local repo, so I'm not sure what's going on.
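
For what it's worth, a quick sanity check on the directory (a sketch using my local path) backs up that this is a full model rather than an adapter:

import os

fpath = '/home/thomas/src/neural-chat-7b-v3-1'
# A PEFT adapter repo ships adapter_config.json; a full model ships config.json.
print(os.path.exists(os.path.join(fpath, 'adapter_config.json')))  # False here
print(os.path.exists(os.path.join(fpath, 'config.json')))          # True here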

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

https://huggingface.co/Intel/neural-chat-7b-v3-1
