
Cannot convert bigger models #12

Closed
Conquerix opened this issue Apr 16, 2023 · 5 comments

Comments

@Conquerix

Hello,
I wanted to try the bigger models, but these come in several .bin files. When I run the converter Python script, it fails because it cannot find pytorch_model.bin (there are four pytorch_model-0000x-of-00004.bin files instead).
Do I have to merge them with cat *.bin > pytorch_model.bin (making sure there are no other .bin files in the directory first, obviously)?
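
For context, the sharded checkpoint directory looks roughly like this (illustrative listing; the index file maps each tensor to its shard):

pytorch_model-00001-of-00004.bin
pytorch_model-00002-of-00004.bin
pytorch_model-00003-of-00004.bin
pytorch_model-00004-of-00004.bin
pytorch_model.bin.index.json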

@Conquerix
Author

OK, so that was not the way to do it; it returned errors.
I asked ChatGPT to help me with this, and it gave me the following script, which seems to work:

import os
import sys

import torch

# Merge the sharded pytorch_model-*.bin files in a model directory
# into a single state dict and save it as one .bin file.

if len(sys.argv) < 3:
    print("Usage: python combine_model_files.py <model_directory> <output_file_name>")
    sys.exit(1)

model_directory = sys.argv[1]
output_file_name = sys.argv[2]

model_files = [
    file
    for file in os.listdir(model_directory)
    if file.startswith("pytorch_model-") and file.endswith(".bin")
]

if not model_files:
    print("No model files found in the specified directory.")
    sys.exit(1)

combined_state_dict = {}

# The zero-padded shard names (e.g. pytorch_model-00001-of-00004.bin)
# sort correctly with a plain lexicographic sort.
for model_file in sorted(model_files):
    file_path = os.path.join(model_directory, model_file)
    partial_state_dict = torch.load(file_path, map_location="cpu")
    combined_state_dict.update(partial_state_dict)

output_file_path = os.path.join(model_directory, output_file_name)
torch.save(combined_state_dict, output_file_path)
print(f"Combined model saved as {output_file_path}")

The Docker instance launched successfully, and I could convert and quantize without any problems.

@ravenscroftj
Owner

Hey @PierreFrn

Thanks for doing some digging here. It's strange that you were getting that error. Can I double-check what OS you're running, and whether you were using python3 with the dependencies from requirements.txt to run the script?

Transformers 4.27, when paired with accelerate, should automatically load sharded models (the multi-part .bin checkpoints you describe) without any manual merging. Here is the output I get on my system:

$ conda create -n tbp python=3.10
Collecting package metadata (current_repodata.json): done
Solving environment: done
...

Then

conda activate tbp
pip install -r turbopilot/requirements.txt 

Then if I activate the environment and run the script, it loads the shards:

python turbopilot/convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 1
Loading checkpoint shards:   0%|                     | 0/2 [00:00<?, ?it/s]
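
The shard handling happens inside transformers itself, so no manual merge should be needed. A minimal sketch of the equivalent load (the class name is my assumption; the convert script may construct the model differently):

from transformers import AutoModelForCausalLM

# transformers >= 4.27 reads pytorch_model.bin.index.json and pulls in
# each pytorch_model-0000x-of-0000x.bin shard automatically
model = AutoModelForCausalLM.from_pretrained("./codegen-6B-multi-gptj")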

I'm guessing there's some discrepancy between operating systems or library behaviour at play here. Thanks for supplying the conversion script you used. I will add this thread to the documentation in case others run into the same problem.

@ravenscroftj
Owner

(Leaving open in case you reply and we're able to work out what is going on)

@Conquerix
Author


I am using NixOS. I made a virtual env in a Nix shell as follows, starting from this shell.nix file (basically to enable the use of virtualenv without installing it system-wide):

{ pkgs ? import <nixpkgs> {} }:
(pkgs.buildFHSUserEnv {
  name = "pipzone";
  targetPkgs = pkgs: (with pkgs; [
    python310
    python310Packages.pip
    python310Packages.virtualenv
  ]);
  runScript = "bash";
}).env

Then:

nix-shell shell.nix
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
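
A quick way to confirm the venv actually picks up the pinned packages (assuming accelerate is in requirements.txt):

python -c "import transformers, accelerate; print(transformers.__version__, accelerate.__version__)"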

I will try to see if I can find anything.

@ravenscroftj
Copy link
Owner

Thanks a lot - let me know if you figure out the weirdness. For now, I documented this thread in the model conversion wiki page.
