[BUG] Can't load OPT-30B and OPT-66B through checkpoints.json #2616

Open
anselmwang opened this issue Dec 15, 2022 · 19 comments
Labels: bug, inference

Comments

@anselmwang

anselmwang commented Dec 15, 2022

Describe the bug

I can't load OPT-30B and OPT-66B through checkpoints.json. If I load them with Huggingface from_pretrained, everything works fine. This bug is troublesome because my production nodes have far less memory than my dev node, so they don't have enough CPU memory to load OPT-30B and OPT-66B.
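For context, "loading through checkpoints.json" means the meta-tensor path in bloom-ds-inference.py: build the model on the meta device, write a checkpoints.json listing the original Hugging Face shard files, and hand it to deepspeed.init_inference. A minimal sketch of that flow (the snapshot path and the "type" value are placeholders, not the exact script):

```python
import io
import json
from pathlib import Path

import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "facebook/opt-30b"
config = AutoConfig.from_pretrained(model_name)

# Build the model on the meta device so no CPU RAM is spent on weights.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

# checkpoints.json simply lists the original HF shard files for DeepSpeed
# to read directly; snapshot_dir is a placeholder for the local HF snapshot.
snapshot_dir = Path("/path/to/opt-30b-snapshot")
checkpoints = {
    "type": "ds_model",  # assumed value; the script sets this per model family
    "checkpoints": sorted(str(p) for p in snapshot_dir.glob("*.bin")),
    "version": 1.0,
}
with io.open("checkpoints.json", "w", encoding="utf-8") as f:
    json.dump(checkpoints, f)

# DeepSpeed materializes the meta tensors from the listed shards here;
# this is the call that fails for OPT-30B/66B.
model = deepspeed.init_inference(
    model,
    mp_size=4,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    checkpoint="checkpoints.json",
)
```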

To Reproduce
python 3.7.7

git clone https://github.com/anselmwang/transformers-bloom-inference/
cd transformers-bloom-inference
git checkout explore_ds

pip install --upgrade pip
pip install "transformers>=4.21.3" "accelerate>=0.12.0"
pip install "deepspeed>=0.7.3"

Without checkpoints_json, this command works:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-30b; date

Below is the stack trace when using checkpoints.json:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-30b --use_checkpoints_json; date

Traceback (most recent call last):                                                                                                                                                        
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py", line 192, in <module>                                               
    model = deepspeed.init_inference(                                                                                                                                                     
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 311, in init_inference                                
    engine = InferenceEngine(model, config=ds_inference_config)                                                                                                                           
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 127, in __init__
    self.module.to(device)                                                                                                                                                                
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1682, in to  
    return super().to(*args, **kwargs)                                                       
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 987, in to
    return self._apply(convert)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 662, in _apply
    param_applied = fn(param)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 985, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

For OPT-66B, this command works:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-66b; date

But when turning on checkpoints.json with

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-66b --use_checkpoints_json; date

below is the stack trace:

Traceback (most recent call last):
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py", line 190, in <module>                                               
    model = deepspeed.init_inference(                                                                                                                                                     
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)                            
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 124, in __init__
    self._apply_injection_policy(config)                                                     
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 349, in _apply_injection_policy                   replace_transformer_layer(client_module,                                                                                                                                              
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 926, in replace_transformer_layer
    load_model_with_checkpoint(                                                              
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 349, in load_model_with_checkpoin
t
    load_module_recursive(r_module)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 341, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 258, in load_transformer_layer
    maybe_copy_qkv(module.attention,
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 203, in maybe_copy_qkv
    k = sd[0][src_names[1]]
KeyError: 'model.decoder.layers.28.self_attn.k_proj.weight'


ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report                                                                                              
--------------------------------------------------                                                                                 
NOTE: Ops not installed will be just-in-time (JIT) compiled at                    
      runtime if needed. Op compatibility means that your system                                                                                          
      meet the required dependencies to JIT install the op.                                                         
--------------------------------------------------
JIT compiled ops requires ninja                                                                                                              
ninja .................. [OKAY]                                                            
--------------------------------------------------
op name ................ installed .. compatible                                                                        
--------------------------------------------------        
cpu_adam ............... [NO] ....... [OKAY]            
cpu_adagrad ............ [NO] ....... [OKAY]                                                             
fused_adam ............. [NO] ....... [OKAY]                                                                      
fused_lamb ............. [NO] ....... [OKAY]                                                                       
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
spatial_inference ...... [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/tmp/code/transformers-bloom-inference/venv/lib/python3.7/site-packages/torch']
torch version .................... 1.13.0+cu117
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed install path ........... ['/tmp/code/transformers-bloom-inference/venv/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.7.6, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

Screenshots
If applicable, add screenshots to help explain your problem.

System info (please complete the following information):

  • OS: Ubuntu 18.04
  • GPU count and types: 1 node with 4x A6000, 46GB memory per GPU
  • Hugging Face Transformers/DeepSpeed/Torch versions:
    transformers 4.25.1
    deepspeed 0.7.7
    torch 1.13.0
  • Python version: 3.7.7

Docker context
Not using Docker.

@anselmwang added the bug and inference labels on Dec 15, 2022
@mrwyattii
Contributor

I can confirm that I'm able to replicate this. Interestingly, I'm finding that smaller OPT models load fine with meta tensors. It appears that models whose HuggingFace checkpoints are split into multiple files (e.g., multiple pytorch_model-*-of-*.bin) trigger this error.

@RezaYazdaniAminabadi any idea of the cause? I'm guessing we don't catch this in our unit tests because we use small versions of these larger models to save time.

@mrwyattii
Contributor

mrwyattii commented Dec 20, 2022

@anselmwang I see you mentioned you are only trying to load the models with meta tensor on your production node. One possible solution (until we determine the cause of this error) would be to create a pre-sharded version of each model on your dev node and copy that over to the production node. I'm able to properly load these models from DeepSpeed-sharded checkpoints. See my comment here on how to generate those sharded checkpoints: #2379 (comment)
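Roughly, that pre-sharding step looks like the minimal sketch below, assuming the save_mp_checkpoint_path option the linked comment describes (the model name, mp_size, and output directory are placeholders):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# On the dev node (which has enough CPU RAM): load the full model once the normal way.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b", torch_dtype=torch.float16
)

# Ask DeepSpeed to write tensor-parallel shards it can later load directly.
model = deepspeed.init_inference(
    model,
    mp_size=4,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path="/data/opt-30b-ds-sharded",  # placeholder path
)
# Copy /data/opt-30b-ds-sharded to the production node and pass the config
# JSON written there as init_inference's `checkpoint` argument.
```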

@felifri

felifri commented Dec 21, 2022

I'm experiencing the same issue with the BLOOM models

@asafkar

asafkar commented Dec 26, 2022

Regarding BLOOM models, downgrading deepspeed to 0.7.6 works for me.
Using 0.7.7 / 0.8.0 produces this error (using this script: https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-inference.py)

@njhill

njhill commented Dec 27, 2022

Also encountered this when upgrading from 0.7.6 to 0.7.7, with BLOOM 176B.

@RezaYazdaniAminabadi
Contributor

Hi,

I have fixed some bugs regarding the checkpoint loading for these model architectures. Could you please retry using this PR? You can also try our updated test-suite here.
Thanks,
Reza

@RezaYazdaniAminabadi
Contributor

Hi @niumanar, @asafkar and @anselmwang,

I just wanted to see if you got a chance to try this PR and see whether it fixed the issue.

Thanks,
Reza

@njhill

njhill commented Jan 19, 2023

@RezaYazdaniAminabadi I can confirm that version 0.8.0 fixed the issue for me.

@anselmwang
Author

@RezaYazdaniAminabadi, @njhill said version 0.8.0 fixed the issue; unfortunately, this version doesn't fix it for me.

PR #2662 fixes OPT-30B, i.e. this command now works:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-30b --use_checkpoints_json; date

But OPT-66B still hits another error with

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-66b --use_checkpoints_json; date

The four ranks print their tracebacks interleaved; one rank's trace shows:

Traceback (most recent call last):
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py", line 194, in <module>
    model = deepspeed.init_inference(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 126, in __init__
    self._apply_injection_policy(config)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 339, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 820, in replace_transformer_layer
    load_model_with_checkpoint(replaced_module,
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 252, in load_model_with_checkpoint
    load_module_recursive(r_module)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 246, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 244, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv_deepspeed_dev/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 30, in load
    module.weight = mp_replace.copy(module.weight.data, sd[0][prefix + 'weight'])
KeyError: 'decoder.embed_tokens.weight'

@njhill

njhill commented Jan 23, 2023

@RezaYazdaniAminabadi apologies, I spoke too soon... it's now working for BLOOM 175B with the pre-sharded fp16 weights, but not the original .bin checkpoint shards (which do work with 0.7.6). With those we are getting the NotImplementedError: Cannot copy out of meta tensor; no data! error.

@dhar174

dhar174 commented Jan 23, 2023

Me too:
Traceback (most recent call last):
  File "/home/darf3/buddy/test.py", line 13, in <module>
    main()
  File "/home/darf3/buddy/test.py", line 5, in main
    BlenderBot1b.init()
  File "/home/darf3/buddy/BlenderBot1b.py", line 325, in init
    model = GPTJForCausalLM.from_pretrained(
  File "/home/darf3/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2113, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 353, in wrapper
    f(module, *args, **kwargs)
  File "/home/darf3/.local/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 727, in __init__
    self.transformer = GPTJModel(config)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 353, in wrapper
    f(module, *args, **kwargs)
  File "/home/darf3/.local/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 487, in __init__
    self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 361, in wrapper
    self._post_init_method(module)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 757, in _post_init_method
    param.partition()
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 889, in partition
    self._partition(param_list, has_been_updated=has_been_updated)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1033, in _partition
    self._partition_param(param, has_been_updated=has_been_updated)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/home/darf3/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1122, in _partition_param
    param.ds_tensor.copy_(src_tensor)
NotImplementedError: Cannot copy out of meta tensor; no data!

Any idea when a fix might be available?

@dhar174

dhar174 commented Jan 23, 2023

Also, I seem to get the same "NotImplementedError: Cannot copy out of meta tensor; no data!" error even when I roll back to 0.7.6. Is that expected? How can I get this working?

P.S.: I am attempting to load a model with checkpoints that are split into two .bin files.

@asafkar

asafkar commented Feb 1, 2023

> @RezaYazdaniAminabadi apologies, I spoke too soon... it's now working for BLOOM 175B with the pre-sharded fp16 weights, but not the original .bin checkpoint shards (which do work with 0.7.6). With those we are getting the NotImplementedError: Cannot copy out of meta tensor; no data! error.

Same on my end.

@felifri

felifri commented Feb 2, 2023

Same for me. Everything works on 0.7.6 now, and before it didn't. However, 0.8.0 does not resolve the issue and gives behavior similar to what the others showed.

@dhar174

dhar174 commented Feb 2, 2023

@asafkar @felifri Have you tried with the low_cpu_mem_usage=False argument in from_pretrained?

But ultimately, what I think got it loading correctly (on 0.8.0) was to load the model once on CPU (and thus into RAM), re-save the checkpoints to a local folder in sharded form using save_pretrained, e.g. model.save_pretrained("checkpoint", max_shard_size="200MB"), and from then on load from that local checkpoint, as sketched below.
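A minimal sketch of that workaround (the model id and shard size are illustrative; I was loading GPT-J):

```python
from transformers import AutoModelForCausalLM

# One-time: load fully onto CPU (needs enough RAM), then re-save in shards.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", low_cpu_mem_usage=False
)
model.save_pretrained("checkpoint", max_shard_size="200MB")

# From then on, load from the local sharded copy:
# model = AutoModelForCausalLM.from_pretrained("checkpoint")
```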

I am using Hugging Face Accelerate for handling config and initialization, so I am not using deepspeed.initialize() or deepspeed.init_inference() at all; instead, I simply pass my DeepSpeed config to the Hugging Face DeepSpeed config object (something like dschf = HfDeepSpeedConfig(ds_config)). I don't know whether using the Accelerate library makes a difference to this problem. It wasn't working either way before I used the strategy above. I've been able to load a 6B model onto a single 8GB graphics card by offloading unused params to CPU using ZeRO-3; I haven't tried anything larger yet.
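For reference, a rough sketch of the HfDeepSpeedConfig route, based on the Hugging Face non-Trainer ZeRO-Inference recipe rather than my exact Accelerate setup (the config values are illustrative, and HfDeepSpeedConfig lived under transformers.deepspeed in this era):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must exist BEFORE from_pretrained so weights are partitioned/offloaded
# as they load instead of being materialized in full.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained("checkpoint")  # local sharded copy
engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
engine.module.eval()
```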

@molohov

molohov commented May 17, 2023

> I can confirm that I'm able to replicate this. Interestingly, I'm finding that smaller OPT models load fine with meta tensors. It appears that models whose HuggingFace checkpoints are split into multiple files (e.g., multiple pytorch_model-*-of-*.bin) trigger this error.
>
> @RezaYazdaniAminabadi any idea of the cause? I'm guessing we don't catch this in our unit tests because we use small versions of these larger models to save time.

On DS 0.9.2, I tried with opt-350m, which only has one .bin file, and it doesn't work (it throws the NotImplementedError: Cannot copy out of meta tensor; no data! error)

@dhar174

dhar174 commented May 17, 2023

> > I can confirm that I'm able to replicate this. Interestingly, I'm finding that smaller OPT models load fine with meta tensors. It appears that models whose HuggingFace checkpoints are split into multiple files (e.g., multiple pytorch_model-*-of-*.bin) trigger this error.
> >
> > @RezaYazdaniAminabadi any idea of the cause? I'm guessing we don't catch this in our unit tests because we use small versions of these larger models to save time.
>
> On DS 0.9.2, I tried with opt-350m, which only has one .bin file, and it doesn't work (it throws the NotImplementedError: Cannot copy out of meta tensor; no data! error)

What is low_cpu_mem_usage set to?

@molohov

molohov commented May 17, 2023

> > > I can confirm that I'm able to replicate this. Interestingly, I'm finding that smaller OPT models load fine with meta tensors. It appears that models whose HuggingFace checkpoints are split into multiple files (e.g., multiple pytorch_model-*-of-*.bin) trigger this error.
> > >
> > > @RezaYazdaniAminabadi any idea of the cause? I'm guessing we don't catch this in our unit tests because we use small versions of these larger models to save time.
> >
> > On DS 0.9.2, I tried with opt-350m, which only has one .bin file, and it doesn't work (it throws the NotImplementedError: Cannot copy out of meta tensor; no data! error)
>
> What is low_cpu_mem_usage set to?

If I set low_cpu_mem_usage = False, the error still occurs.

@puyuanOT

I got the same error with NousResearch/Nous-Capybara-34B:

  File "/home/ec2-user/SageMaker/anaconda3/envs/ot-gpt-package/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ec2-user/SageMaker/anaconda3/envs/ot-gpt-package/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3480, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ec2-user/SageMaker/anaconda3/envs/ot-gpt-package/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/ec2-user/SageMaker/anaconda3/envs/ot-gpt-package/lib/python3.10/site-packages/transformers/modeling_utils.py", line 751, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/home/ec2-user/SageMaker/anaconda3/envs/ot-gpt-package/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 108, in set_module_quantized_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
