
convert : update Falcon script for new HF config #3448

Merged
2 commits merged on Oct 5, 2023

Conversation

cebtenzzre
Collaborator

This aligns the Falcon convert script with the recent changes to the HuggingFace models.

@Green-Sky
Collaborator

Is this a continuation of #3049? :)

@cebtenzzre
Collaborator Author

cebtenzzre commented Oct 3, 2023

Is this a continuation of #3049? :)

#3049 is a separate script that deals with 180B safetensors files. I am trying to convert 7B PyTorch files using convert-falcon-hf-to-gguf.py.

By recent changes, I mean the changes made three days ago.

@Green-Sky
Collaborator

Yeah, but your changes are a subset of the actual changes done for 180B.

$ diff convert-falcon-hf-to-gguf.py convert-falcon180-hf-to-gguf.py --color=always -w
2c2
< # HF falcon--> gguf conversion
---
> # HF falcon180B--> gguf conversion
16a17
> from safetensors import safe_open
48c49
<         if filename.startswith("pytorch_model-"):
---
>         if filename.startswith("model-00"):
90c91
< if hparams["architectures"][0] != "RWForCausalLM":
---
> if hparams["architectures"][0] != "FalconForCausalLM":
103c104
< block_count = hparams["n_layer"]
---
> block_count = hparams["num_hidden_layers"]
111,113c112,114
< gguf_writer.add_head_count(hparams["n_head"])
< if "n_head_kv" in hparams:
<     gguf_writer.add_head_count_kv(hparams["n_head_kv"])
---
> gguf_writer.add_head_count(hparams["num_attention_heads"])
> if "num_kv_heads" in hparams:
>     gguf_writer.add_head_count_kv(hparams["num_kv_heads"])
181,182c182,183
< n_head    = hparams["n_head"]
< n_head_kv = hparams["n_head_kv"] if "n_head_kv" in hparams else 1
---
> n_head    = hparams["num_attention_heads"]
> n_head_kv = hparams["num_kv_heads"] if "num_kv_heads" in hparams else 1
193c194
<         f"pytorch_model-{n:05}-of-{num_parts:05}.bin" for n in range(1, num_parts + 1)
---
>         f"model-{n:05}-of-{num_parts:05}.safetensors" for n in range(1, num_parts + 1)
200c201
<     model_part = torch.load(dir_model / part_name, map_location="cpu")
---
>     with safe_open(dir_model / part_name, framework="pt", device="cpu") as model_part:
203c204
<         data = model_part[name]
---
>             data = model_part.get_tensor(name)

(I compared an older version of convert-falcon, because there have been some small changes on master since.)
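
For reference, a minimal sketch of what the updated script reads from config.json under the new HF key names. The model directory is a hypothetical placeholder, and the logic mirrors the right-hand side of the diff above rather than the exact merged code:

import json
from pathlib import Path

dir_model = Path("falcon-7b")  # hypothetical local checkout of the HF model

with open(dir_model / "config.json", "r", encoding="utf-8") as f:
    hparams = json.load(f)

# The updated HF repos report the new architecture name.
if hparams["architectures"][0] != "FalconForCausalLM":
    raise SystemExit("Model architecture not supported")

# New-style hyperparameter keys.
block_count = hparams["num_hidden_layers"]
n_head      = hparams["num_attention_heads"]
n_head_kv   = hparams.get("num_kv_heads", 1)  # multi-query models may omit this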

@cebtenzzre
Collaborator Author

Yeah, but your changes are a subset of the actual changes done for 180B.

Okay, so what do you suggest I do in order to get Falcon <180B working on master again?

Owner

@ggerganov ggerganov left a comment

Do we still need #3049?

@cebtenzzre
Collaborator Author

Do we still need #3049?

The script in this PR does not know how to read safetensors files, so I think we still need that PR.

@FNsi
Contributor

FNsi commented Oct 5, 2023

In any case, why not merge #3049 first, abandon it afterwards, and finally merge this?

That way no one loses their contribution record, and Falcon inference keeps working.

@cebtenzzre
Collaborator Author

Could someone test this script with Falcon-180B?

@ggerganov
Owner

Testing at the moment - it will take some time.
Would this change fix the issues reported in #3484?

@cebtenzzre
Collaborator Author

Would this change fix the issues reported in #3484?

Unfortunately not. That appears to be another regression caused by #3252.

@ggerganov
Owner

ggerganov commented Oct 5, 2023

Could someone test this script with Falcon-180B?

It works with this repo: https://huggingface.co/tiiuae/falcon-180B/tree/main

Which is strange, since this looks like safetensors data, which we don't expect to be able to read with this script.

@cebtenzzre
Collaborator Author

Which is strange, since this looks like safetensors data, which we don't expect to be able to read with this script.

I recently added safetensors support to this PR so it can support 180B, which is why I asked.
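
In case it helps reviewers, reading a sharded safetensors checkpoint boils down to something like the sketch below. The directory name and shard count are hypothetical placeholders, and the PR's actual code may differ in details:

from pathlib import Path
from safetensors import safe_open

dir_model = Path("falcon-180B")  # hypothetical local checkout
num_parts = 8                    # hypothetical number of shards

part_names = (
    f"model-{n:05}-of-{num_parts:05}.safetensors" for n in range(1, num_parts + 1)
)

for part_name in part_names:
    # safe_open memory-maps the shard and exposes tensors lazily by name.
    with safe_open(dir_model / part_name, framework="pt", device="cpu") as model_part:
        for name in model_part.keys():
            data = model_part.get_tensor(name)
            # ... permute / convert and write the tensor via the gguf writer ...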

Collaborator

@Green-Sky Green-Sky left a comment

I don't have the means to test the 180B, but it looks like you merged it nicely.

@cebtenzzre cebtenzzre merged commit 48edda3 into ggerganov:master Oct 5, 2023
10 checks passed
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 6, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  kv cache slot search improvements (ggerganov#3493)
  prompts : fix editorconfig checks after ggerganov#3416
  parallel : add option to load external prompt file (ggerganov#3416)
  server : reuse llama_sample_token common util (ggerganov#3494)
  llama : correct hparams comparison (ggerganov#3446)
  ci : fix xcodebuild destinations (ggerganov#3491)
  convert : update Falcon script for new HF config (ggerganov#3448)
  build : use std::make_tuple() for compatibility with older GCC versions (ggerganov#3488)
  common : process escape sequences in reverse prompts (ggerganov#3461)
  CLBlast: Fix handling of on-device tensor data
  server : fix incorrect num_tokens_predicted (ggerganov#3480)
  swift : disable ACCELERATE_NEW_LAPACK (ggerganov#3481)
  ci : add swift build via xcodebuild (ggerganov#3482)
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023
Also adds Falcon-180B support.
Closes ggerganov#3049

Co-authored-by: jb <jonathan.t.barnard@gmail.com>
@rlanday
Contributor

rlanday commented Oct 18, 2023

This change breaks the old architecture (RWForCausalLM). It's kind of annoying because there was another change that requires reconverting the original HuggingFace model:
#3484

@cebtenzzre
Collaborator Author

This change breaks the old architecture (RWForCausalLM).

The official falcon-7b, falcon-40b, and falcon-180B models on HF are compatible with the new version of this script. What is your use case?

@rlanday
Contributor

rlanday commented Oct 19, 2023

I have this model downloaded from Hugging Face:
https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/tree/main

Is this the official version of Falcon?
https://huggingface.co/tiiuae/falcon-7b/tree/main
The most recent commit message on the model files is “Upload RWForCausalLM.” Does the convert script work properly on this?

@cebtenzzre
Collaborator Author

The most recent commit message on the model files is “Upload RWForCausalLM.” Does the convert script work properly on this?

The latest commit is 898df13 "Move to in-library checkpoint (for real this time)". That's the one you want.

I suppose I could add back support for the older format until the finetunes are updated.
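
A fallback like that could be as simple as preferring the new key names and falling back to the old RW-style ones, roughly as in this sketch (the helper name is hypothetical, and this is not what the merged script currently does):

def get_hparam(hparams: dict, new_key: str, old_key: str, default=None):
    # hparams is the parsed config.json dict.
    # Prefer the new HF key name, fall back to the legacy RWForCausalLM-style name.
    if new_key in hparams:
        return hparams[new_key]
    if old_key in hparams:
        return hparams[old_key]
    return default

block_count = get_hparam(hparams, "num_hidden_layers", "n_layer")
n_head      = get_hparam(hparams, "num_attention_heads", "n_head")
n_head_kv   = get_hparam(hparams, "num_kv_heads", "n_head_kv", default=1)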

@rlanday
Contributor

rlanday commented Oct 20, 2023

Oh, I see; they updated the config.json file but the actual multi-GB model files did not change. I think this might not be clear to some people, who might end up re-downloading the whole model, which is not a good user experience. Also, some fine-tunes might never be updated. In my opinion, it would be preferable to support both formats, but I will let you decide how to prioritize this. Thank you for looking into this issue.
