
convert.py doesn't know how to handle large models #35

Closed
leedrake5 opened this issue Dec 7, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@leedrake5

The instructions here don't work for large models, which are often stored as shards. Can /llama/convert.py be adapted to handle models available on Hugging Face such as this one? Even saving the shards individually with MLX support would work.
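(For context: these repos ship the weights as numbered shard files plus an index JSON, typically pytorch_model.bin.index.json, whose "weight_map" lists which shard holds each parameter. Something roughly like the sketch below – untested, paths are placeholders – is the kind of merging step I'd hope convert.py could do, or that could at least be documented:)

import json
import os

import torch

model_dir = "path/to/sharded-model"  # hypothetical local clone of the HF repo

# The index maps each parameter name to the shard file that contains it
with open(os.path.join(model_dir, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

# Load every shard and merge them into one state dict
state = {}
for shard in sorted(set(index["weight_map"].values())):
    state.update(torch.load(os.path.join(model_dir, shard), map_location="cpu"))

torch.save(state, os.path.join(model_dir, "merged.pth"))  # a single, mergeable state dict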

@awni awni added the enhancement New feature or request label Dec 7, 2023
@awni awni mentioned this issue Dec 7, 2023
@BuildBackBuehler

BuildBackBuehler commented Dec 8, 2023

You can cat shards together; that's no problem.

What I am running into, though, is that none of the models on HuggingFace seem to include the (apparently required) SavedModel files – I just get OSError: SavedModel file does not exist at: [InsertPathHere]/{saved_model.pbtxt|saved_model.pb}. Instead they'll have JSONs for the weights and so on and so forth, and unfortunately there is no straightforward conversion to protobuf.

(That was for providing an arg for the TF model's weight config.)

That is just for TensorFlow, though. For my Llama conversion I have a feeling that with PyTorch it'd go smoothly; at least I believe everything is typically bundled into a .tar/.zip or whatever. TBH, though I couldn't find any documentation on it, I figure the .safetensors files do have some sort of archive-like contents, but they can't be readily seen and don't necessarily always include the same files.
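(If it helps anyone: from poking at the safetensors docs, it seems a .safetensors file isn't really an archive, just a flat mapping of tensor names to tensors plus a small JSON header. A rough, untested sketch for peeking inside one – the filename is a placeholder:)

from safetensors import safe_open

# Inspect the tensor names and the optional metadata header of a .safetensors file
with safe_open("model-00001-of-00002.safetensors", framework="pt") as f:
    print(list(f.keys())[:5])  # a few of the stored tensor names
    print(f.metadata())        # metadata from the JSON header, if any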

I found that the Llama one was a tad more difficult to modify for TF. The Mistral one was easy enough that I think anyone could modify it quite quickly.

Edit: Hmmm, well, glancing over the code earlier I got the feeling that it should still work. It looks like the JSONs provide all the necessary info... I'll have to look into it more after some sleep.

https://sourcegraph.com/github.com/tensorflow/tensorflow/-/blob/tensorflow/python/saved_model/loader_impl.py
I'll have to look into TF more; maybe I just misused a function, and/or there's an alternative to the one that seems to be at the root of it.

Edit edit: Agh, silly me. I was thinking of TF and safetensors as one package; it looks like I should be importing safetensors' functionality to try the conversion... whoops.

@leedrake5
Author

Embarrassed I didn't think to use cat. That worked, though it still errors out with safetensors:

File "~/GitHub/mlx-examples/llama/convert.py", line 49, in <module>
   state = torch.load(args.torch_weights)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "~/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/serialization.py", line 1040, in load
   return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "~/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/serialization.py", line 1258, in _legacy_load
   magic_number = pickle_module.load(f, **pickle_load_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '\x04'.

...and with the original .bin:

Traceback (most recent call last):
  File "~/GitHub/mlx-examples/llama/convert.py", line 49, in <module>
    state = torch.load(args.torch_weights)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/serialization.py", line 1005, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/serialization.py", line 457, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted

Looking through the code and examples, it's not clear what torch_weights means. Maybe a .ckpt? But I'm not sure how to recreate one from either safetensors or .bin files.

@leedrake5
Author

leedrake5 commented Dec 8, 2023

Ok, so what failed less dramatically was this:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("~/LLM/Phind-CodeLlama-34B-v2")
model = LlamaForCausalLM.from_pretrained("~/LLM/Phind-CodeLlama-34B-v2")

torch.save(model, "~/LLM/phind.pth")

Then running their example:

python ~/GitHub/mlx-examples/llama/convert.py ~/LLM/phind.pth ~/LLM/mlx_phind_weights.npz

But this also fails.

~/miniconda3/envs/pytorch/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
~/miniconda3/envs/pytorch/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
~/miniconda3/envs/pytorch/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Traceback (most recent call last):
  File "~/GitHub/mlx-examples/llama/convert.py", line 52, in <module>
    **{k: v for k, v in starmap(map_torch_to_mlx, state.items()) if k is not None}
                                                  ^^^^^^^^^^^
  File "~/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaForCausalLM' object has no attribute 'items'

I'm increasingly confident I can figure out how to get the model to work, but the convert.py script has very specific yet undocumented expectations for what the .pth file should look like. More documentation would make it much easier to use.
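(From reading convert.py, whatever torch.load(args.torch_weights) returns needs to support .items(), i.e. it seems to want a plain state dict rather than a pickled model object. A rough sketch of producing such a file – the paths are just examples:)

import os

import torch
from transformers import LlamaForCausalLM

model_dir = os.path.expanduser("~/LLM/Phind-CodeLlama-34B-v2")
model = LlamaForCausalLM.from_pretrained(model_dir)

# Save only the parameter dictionary, not the wrapped model object,
# so that torch.load() returns a dict that has .items()
torch.save(model.state_dict(), os.path.expanduser("~/LLM/phind_state_dict.pth"))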

@leedrake5
Author

And finally, a solution: convert.py has to be modified to handle this method of conversion. This part of convert.py:

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert Llama weights to MLX")
    parser.add_argument("torch_weights")
    parser.add_argument("output_file")
    args = parser.parse_args()

    state = torch.load(args.torch_weights)
    np.savez(
        args.output_file,
        **{k: v for k, v in starmap(map_torch_to_mlx, state.items()) if k is not None}
    )

Has to be changed to:

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert Llama weights to MLX")
    parser.add_argument("torch_weights")
    parser.add_argument("output_file")
    args = parser.parse_args()

    model = torch.load(args.torch_weights)  # Load the entire model object instead of just its state dictionary
    np.savez(
        args.output_file,
        **{k: v for k, v in starmap(map_torch_to_mlx, model.state_dict().items()) if k is not None}  # Use .state_dict() to access parameters
    )

And that (plus my comment above) is a workflow to get the large Hugging Face models to produce an MLX model. That said, I'm getting key errors when trying to predict from the model, so more work needs to be done. Again, this could all be solved by more specific instructions on what the example scripts are looking for.

@BuildBackBuehler

BuildBackBuehler commented Dec 9, 2023

Ah, I see! So if I'm reading this correctly, you mitigated that issue I had mentioned(?), thereby handling the out-of-the-box torch .bin you get for some models.

I imagine the way you are doing it would take a little more time, but not knowing the inner machinations, I don't know whether it can quickly find the weights within the full torch .bin model; if it can, the extra time is negligible and it's a good shortcut/workaround.

There's a HuggingFace convert.py that handles a whole lot more scenarios; it may elucidate the code variations needed to make this work as intended. I still haven't gotten around to fixing up my .safetensors attempt. I got interested in a Mistral model that I'm trying next, which should be straightforward; then I'll be running this for a Llama-based model.

Edit: It appears that, actually, 99% of (new) models use .GGUF, which bundles everything up! So we will need to do it with your fix.

@BuildBackBuehler

BuildBackBuehler commented Dec 9, 2023

I've found that everything needed is definitely within the safetensors docs:
https://huggingface.co/docs/safetensors/api/torch#safetensors.torch.load.returns

Still having problems, nonetheless. I also realized that .gguf is not optimal for Apple silicon; better to use GPTQ, perhaps compressed with exllama2. I don't know how well MLX/ARM64 handles higher-precision floats... I just know that for Stable Diffusion I needed 8-bit.

I think what also needs a bit of expansion is what exactly the "output_file" arg is. While it would normally make sense that it just names convert.py's final product, I could've sworn I saw some example where that argument was used to identify the mmapping or whatever else.

---

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("~/LLM/Phind-CodeLlama-34B-v2")
model = LlamaForCausalLM.from_pretrained("~/LLM/Phind-CodeLlama-34B-v2")

torch.save(model, "~/LLM/phind.pth")

This works if you want to also download the repo with that code. I couldn't get it to work with just the weights – which, from what I read, seems to mean the pytorch_model.bin.index.json.

I tried to replace torch.save(model). Personally, I'd already cloned the repo and just wanted a cached-away torch/safetensors weights file for convert.py, but I couldn't quite get the code right.

I have a feeling it has to do with safetensors' usage of load vs. load_file.

Or save vs. safetensors.torch.save_file – it seems the _file variants are the ones that work on files on disk.

The other thing I noted is that convert.py requires the model.state_dict() – that is, the pre-convert.py output file shouldn't come from save(model, [PATH]) but from save(model.state_dict(), [PATH]) – but this didn't work on a go-round; I may have messed something small up. See the sketch below.
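(In case it saves someone a round trip, a rough, untested sketch of what I mean with the _file variants – the shard names are placeholders:)

import torch
from safetensors.torch import load_file

# load_file reads a .safetensors file into a plain dict of name -> tensor
state = {}
for shard in ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]:
    state.update(load_file(shard))

# A plain state dict is what convert.py's torch.load() then expects
torch.save(state, "merged_state_dict.pth")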

No sleep. If I remember, I'll reorganize my thoughts; figured in the meantime this might help someone.

Also, this is valuable to anyone who likes to reverse-engineer things:
https://github.com/huggingface/transformers/blob/df5c5c62ae253055336f5bb0828ca8e3e15ab6bd/src/transformers/models/llama/convert_llama_weights_to_hf.py

@AlexanderIstomin

You can do it using the transformers library, which can load a sharded model automatically:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./Mistral-7B-Instruct-v0.2/")

^^^ sharded model folder

and then

import numpy as np
import torch

np.savez(
    "./weights.npz",
    **{k: v.to(torch.float16).numpy() for k, v in model.state_dict().items()}
)
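A quick spot check of the result, assuming the file was written as above:

import numpy as np

weights = np.load("./weights.npz")
print(len(weights.files), weights.files[:5])  # array count and a few parameter names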

@awni
Member

awni commented Dec 17, 2023

This should work in the latest llama example #82. Let me know if you run into issues!

@awni awni closed this as completed Dec 17, 2023