Unable to make imatrix (and likely quant) for nvidia's ChatQA-1.5 8B #7046
This model: https://huggingface.co/nvidia/ChatQA-1.5-8B

Conversion worked with no issue, but then when it's time to calculate the imatrix I see:

```
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 323, got 291
```

Doing a search, the last time this came up slaren mentioned running gguf-dump.py; here's the output:
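As a reference for reproducing that kind of dump, here is a minimal sketch that lists the tensor names in the converted file using the `gguf` Python package from llama.cpp's gguf-py (`pip install gguf`); the model filename is illustrative:

```python
# Minimal sketch: inspect tensor names in the converted GGUF with the
# gguf package from llama.cpp's gguf-py (pip install gguf).
# The filename below is illustrative.
from gguf import GGUFReader

reader = GGUFReader("ChatQA-1.5-8B-f16.gguf")
names = [t.name for t in reader.tensors]
print(len(names), "tensors")

# If the 32 extra tensors are the per-layer rotary inv_freq buffers,
# they should show up here (possibly mapped to rope_freqs names):
for name in names:
    if "inv_freq" in name or "rope_freqs" in name:
        print(name)
```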
The extra tensors are the `rotary_emb.inv_freq` tensors that the conversion should have skipped: 323 − 291 = 32, one per layer.
@slaren thanks for the insight, is there anything I can do as a temporary fix? Even if it's hardcoding some values here and there. Interestingly, if I use convert.py, it converts it in a way that llama.cpp is happy with (besides the missing pre-tokenizer).
This should work. I guess you also added the tokenizer hash as llama3?

```diff
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -1426,8 +1426,9 @@ class LlamaModel(Model):
         n_experts = self.hparams.get("num_local_experts")
         experts = dict()
         for name, data_torch in self.get_tensors():
             # we don't need these
-            if name.endswith((".attention.masked_bias", ".attention.bias", ".attention.rotary_emb.inv_freq")):
+            if name.endswith((".attention.masked_bias", ".attention.bias", ".rotary_emb.inv_freq")):
                 continue
 
             old_dtype = data_torch.dtype
```
Actually the tokenizer hash lined up already (after I fixed their tokenizer.json to match the latest version from Meta).
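For context on that hash check: the converter identifies the pre-tokenizer by encoding a fixed probe string and hashing the resulting token ids. A minimal sketch of the idea, with the probe string and registered hash left as placeholders (see get_vocab_base_pre() in convert-hf-to-gguf.py for the real values):

```python
# Sketch of how convert-hf-to-gguf.py picks the pre-tokenizer: encode a
# fixed probe string, sha256 the token-id list, and compare against known
# hashes. The probe string and hash below are placeholders.
from hashlib import sha256
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/ChatQA-1.5-8B")

chktxt = "..."  # the converter's fixed probe string (elided here)
chktok = tokenizer.encode(chktxt)
chkhsh = sha256(str(chktok).encode()).hexdigest()

LLAMA3_HASH = "<hash registered for Meta-Llama-3>"  # placeholder
if chkhsh == LLAMA3_HASH:
    res = "llama-bpe"  # the pre-tokenizer name used for Llama 3
```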
With that change, so far so good! imatrix started up with no issue and I seem to have the right pre-tokenizer. Any reason to replace `.attention.rotary_emb.inv_freq` with `.rotary_emb.inv_freq` vs just adding it to the list?
The name of the tensor in this model is `model.layers.N.self_attn.rotary_emb.inv_freq`, which the old `.attention.rotary_emb.inv_freq` pattern never matches. Any name that ends with the longer suffix also ends with `.rotary_emb.inv_freq`, so the shorter suffix covers both cases and there is no need to keep both entries.
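A quick demonstration of that subsumption with `str.endswith` (the first tensor name is a GPT-style layout the old pattern targeted, the second is the Llama-style layout discussed here; both names are for illustration only):

```python
# Illustrative tensor names: a GPT-style layout the old pattern was
# written for, and the Llama-style layout discussed above.
names = [
    "transformer.h.0.attn.attention.rotary_emb.inv_freq",
    "model.layers.0.self_attn.rotary_emb.inv_freq",
]

old_skip = (".attention.masked_bias", ".attention.bias", ".attention.rotary_emb.inv_freq")
new_skip = (".attention.masked_bias", ".attention.bias", ".rotary_emb.inv_freq")

for name in names:
    # the shorter .rotary_emb.inv_freq suffix matches both layouts,
    # so replacing the longer entry loses nothing
    print(name, "| old:", name.endswith(old_skip), "| new:", name.endswith(new_skip))
```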
Everything looks good: it's generating properly and tokenizing correctly based on the results of ./tokenize. Thanks for your help @slaren, would you like me to open a PR to make your proposed change? Or would you rather get to it yourself?
A PR would be very welcome.