Model description
I'm using neural-chat-7b-v3-1 locally on my laptop and it would sure be sweet if I could serve it through tgi.
I can currently use it from Python with the following pattern:
import torch
from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

fpath = '/home/thomas/src/neural-chat-7b-v3-1'
tokenizer = AutoTokenizer.from_pretrained(fpath)
model = AutoModelForCausalLM.from_pretrained(
    fpath,
    device_map='auto',
    quantization_config=quantization_config,
)

but when I try to pass the path of the repo I cloned through to tgi, I get:
(text-generation-inference) thomas@computer-1:~/src/notes/projects/assistant/backend/text-generation-inference$ target/release/text-generation-launcher --model-id /home/thomas/src/neural-chat-7b-v3-1 --port=8080 --quantize bitsandbytes-nf4
2023-11-24T15:56:31.756672Z INFO text_generation_launcher: Args { model_id: "/home/thomas/src/neural-chat-7b-v3-1", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "0.0.0.0", port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-11-24T15:56:31.756747Z INFO download: text_generation_launcher: Starting download process.
2023-11-24T15:56:33.786649Z INFO text_generation_launcher: Peft model detected.
2023-11-24T15:56:33.786688Z INFO text_generation_launcher: Loading the model it might take a while without feedback
2023-11-24T15:56:34.161062Z ERROR download: text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/peft/utils/config.py", line 117, in from_pretrained
    config_file = hf_hub_download(
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/thomas/src/neural-chat-7b-v3-1'. Use `repo_type` argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/thomas/src/notes/projects/assistant/backend/text-generation-inference/server/text_generation_server/utils/peft.py", line 16, in download_and_unload_peft
    model = AutoPeftModelForCausalLM.from_pretrained(
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/peft/auto.py", line 69, in from_pretrained
    peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/thomas/miniconda3/envs/text-generation-inference/lib/python3.9/site-packages/peft/utils/config.py", line 121, in from_pretrained
    raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")
ValueError: Can't find 'adapter_config.json' at '/home/thomas/src/neural-chat-7b-v3-1'
So I'm seeing an error that appears to be related to #1283, in addition to tgi complaining there's no adapter_config.json, which is odd because the repo contains the full model and is not a PEFT adapter. But it doesn't even look like it can see the local repo at all, so I'm not sure what's going on.
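For what it's worth, a quick sanity check (my own throwaway snippet, not tgi code) confirms the directory looks like a full model rather than a PEFT adapter:

from pathlib import Path

# Repo path from this issue; adjust for your own checkout.
repo = Path('/home/thomas/src/neural-chat-7b-v3-1')

# A full model ships config.json; a PEFT adapter ships adapter_config.json.
print('config.json present:        ', (repo / 'config.json').exists())
print('adapter_config.json present:', (repo / 'adapter_config.json').exists())

# List the weight files the repo actually ships.
print('safetensors files:', sorted(p.name for p in repo.glob('*.safetensors')))
print('pytorch .bin files:', sorted(p.name for p in repo.glob('*.bin')))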
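My guess is that the PEFT path gets tried because the launcher finds no local safetensors weights and falls back to treating the directory as an adapter; that's an assumption on my part, not something I've confirmed in the tgi source. If it's right, pre-converting the repo's .bin shards to safetensors might sidestep the problem. A rough, untested sketch (the output file naming is my guess, not anything tgi specifies):

import torch
from pathlib import Path
from safetensors.torch import save_file

repo = Path('/home/thomas/src/neural-chat-7b-v3-1')

for bin_path in sorted(repo.glob('pytorch_model*.bin')):
    state_dict = torch.load(bin_path, map_location='cpu')
    # save_file rejects tensors that share storage (e.g. tied embeddings),
    # so clone everything into independent, contiguous tensors first.
    state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}
    out_path = bin_path.with_suffix('.safetensors')
    save_file(state_dict, str(out_path), metadata={'format': 'pt'})
    print(f'wrote {out_path.name}')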
Open source status
- The model implementation is available
- The model weights are available