
Cannot convert bigger models #12

Closed
Conquerix opened this issue Apr 16, 2023 · 5 comments

Comments

@Conquerix

Hello,
I wanted to try the bigger models, but these come in several .bin files. When I run the converter Python script, it fails because it cannot find pytorch_model.bin (there are four pytorch_model-0000x-of-00004.bin files instead).
Do I have to merge them with cat *.bin > pytorch_model.bin (making sure there are no other .bin files in the directory first, obviously)?
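
For context, the sharded checkpoint directory looks roughly like this (illustrative listing; the index file maps each tensor to its shard):

pytorch_model-00001-of-00004.bin
pytorch_model-00002-of-00004.bin
pytorch_model-00003-of-00004.bin
pytorch_model-00004-of-00004.bin
pytorch_model.bin.index.json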

@Conquerix
Author

OK, so that was not the way to do it; it returned errors.
I asked ChatGPT to help me with this, and it gave me the following script, which seems to work:

import os
import sys

import torch

# Merge the sharded pytorch_model-*.bin files in a model directory
# into a single state dict and save it as one .bin file.

if len(sys.argv) < 3:
    print("Usage: python combine_model_files.py <model_directory> <output_file_name>")
    sys.exit(1)

model_directory = sys.argv[1]
output_file_name = sys.argv[2]

model_files = [
    file
    for file in os.listdir(model_directory)
    if file.startswith("pytorch_model-") and file.endswith(".bin")
]

if not model_files:
    print("No model files found in the specified directory.")
    sys.exit(1)

combined_state_dict = {}

# The zero-padded shard names (e.g. pytorch_model-00001-of-00004.bin)
# sort correctly with a plain lexicographic sort.
for model_file in sorted(model_files):
    file_path = os.path.join(model_directory, model_file)
    partial_state_dict = torch.load(file_path, map_location="cpu")
    combined_state_dict.update(partial_state_dict)

output_file_path = os.path.join(model_directory, output_file_name)
torch.save(combined_state_dict, output_file_path)
print(f"Combined model saved as {output_file_path}")

The Docker instance launched successfully, and I could convert and quantize without any problems.

@ravenscroftj
Owner

Hey @PierreFrn

Thanks for doing some digging here. It's strange that you were getting that error. Can I double-check what OS you're running, and whether you were using python3 with the dependencies from requirements.txt to run the script?

Transformers 4.27, when paired with accelerate, should automatically load sharded models (the multi-part .bin checkpoints you describe) without any manual merging. Here is the output I get on my system:

$ conda create -n tbp python=3.10
Collecting package metadata (current_repodata.json): done
Solving environment: done
...

Then

conda activate tbp
pip install -r turbopilot/requirements.txt 

Then if I activate the environment and run the script, it loads the shards:

python turbopilot/convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 1
Loading checkpoint shards:   0%|                     | 0/2 [00:00<?, ?it/s]
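
The shard handling happens inside transformers itself, so no manual merge should be needed. A minimal sketch of the equivalent load (the class name is my assumption; the convert script may construct the model differently):

from transformers import AutoModelForCausalLM

# transformers >= 4.27 reads pytorch_model.bin.index.json and pulls in
# each pytorch_model-0000x-of-0000x.bin shard automatically
model = AutoModelForCausalLM.from_pretrained("./codegen-6B-multi-gptj")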

I'm guessing there's some discrepancy between operating systems or library behaviour at play here. Thanks for supplying the conversion script you used. I will add this thread to the documentation in case others run into the same problem.

@ravenscroftj
Owner

(Leaving open in case you reply and we're able to work out what is going on)

@Conquerix
Author


I am using NixOS. I made a virtual env in a Nix shell as follows, starting from this shell.nix file (basically to enable the use of virtualenv without installing it system-wide):

{ pkgs ? import <nixpkgs> {} }:
(pkgs.buildFHSUserEnv {
  name = "pipzone";
  targetPkgs = pkgs: (with pkgs; [
    python310
    python310Packages.pip
    python310Packages.virtualenv
  ]);
  runScript = "bash";
}).env

Then:

nix-shell shell.nix
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
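
A quick way to confirm the venv actually picks up the pinned packages (assuming accelerate is in requirements.txt):

python -c "import transformers, accelerate; print(transformers.__version__, accelerate.__version__)"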

I will try to see if I can find anything.

@ravenscroftj
Copy link
Owner

Thanks a lot - let me know if you figure out the weirdness. For now, I documented this thread in the model conversion wiki page.
