
ModuleNotFoundError: No module named 'torch._higher_order_ops' #1038

Closed
guotong1988 opened this issue May 31, 2024 · 2 comments

Comments

guotong1988 commented May 31, 2024

System Info

torch==2.0.0
torchtune==0.1.1
transformers==4.41.1
safetensors==0.4.3

Reproduction

from torchtune.utils import FullModelHFCheckpointer
from torchtune.models import convert_weights
import torch

checkpointer = FullModelHFCheckpointer(
    checkpoint_dir="pythonProject/llama3_main/meta-llama-3-8b-instruct/",
    checkpoint_files=["model-00001-of-00004.safetensors", "model-00002-of-00004.safetensors",
                      "model-00003-of-00004.safetensors", "model-00004-of-00004.safetensors"],
    output_dir="./tmp",
    model_type='LLAMA3'
)

print("loading checkpoint")
sd = checkpointer.load_checkpoint()
sd = convert_weights.tune_to_meta(sd['model'])
print("saving checkpoint")
torch.save(sd, "./tmp/checkpoint.pth")

ERROR INFO

Traceback (most recent call last):
  File "pythonProject/convert.py", line 1, in <module>
    from torchtune.utils import FullModelHFCheckpointer
  File "python3.8/site-packages/torchtune/__init__.py", line 9, in <module>
    from torchtune import datasets, models, modules, utils
  File "python3.8/site-packages/torchtune/datasets/__init__.py", line 7, in <module>
    from torchtune.datasets._alpaca import alpaca_cleaned_dataset, alpaca_dataset
  File "python3.8/site-packages/torchtune/datasets/_alpaca.py", line 10, in <module>
    from torchtune.datasets._instruct import InstructDataset
  File "python3.8/site-packages/torchtune/datasets/_instruct.py", line 12, in <module>
    from torchtune.config._utils import _get_instruct_template
  File "python3.8/site-packages/torchtune/config/__init__.py", line 7, in <module>
    from ._instantiate import instantiate
  File "python3.8/site-packages/torchtune/config/_instantiate.py", line 12, in <module>
    from torchtune.config._utils import _get_component_from_path, _has_component
  File "python3.8/site-packages/torchtune/config/_utils.py", line 16, in <module>
    from torchtune.utils import get_logger, get_world_size_and_rank
  File "python3.8/site-packages/torchtune/utils/__init__.py", line 7, in <module>
    from ._checkpointing import (  # noqa
  File "python3.8/site-packages/torchtune/utils/_checkpointing/__init__.py", line 7, in <module>
    from ._checkpointer import (  # noqa
  File "python3.8/site-packages/torchtune/utils/_checkpointing/_checkpointer.py", line 17, in <module>
    from torchtune.models import convert_weights
  File "python3.8/site-packages/torchtune/models/__init__.py", line 7, in <module>
    from torchtune.models import convert_weights, gemma, llama2, mistral  # noqa
  File "python3.8/site-packages/torchtune/models/gemma/__init__.py", line 7, in <module>
    from ._component_builders import gemma  # noqa
  File "python3.8/site-packages/torchtune/models/gemma/_component_builders.py", line 9, in <module>
    from torchtune.modules import (
  File "python3.8/site-packages/torchtune/modules/__init__.py", line 8, in <module>
    from .common_utils import reparametrize_as_dtype_state_dict_post_hook
  File "python3.8/site-packages/torchtune/modules/common_utils.py", line 12, in <module>
    from torchao.dtypes.nf4tensor import NF4Tensor
  File "python3.8/site-packages/torchao/__init__.py", line 2, in <module>
    from .quantization.quant_api import apply_dynamic_quant
  File "python3.8/site-packages/torchao/quantization/__init__.py", line 7, in <module>
    from .smoothquant import *  # noqa: F403
  File "python3.8/site-packages/torchao/quantization/smoothquant.py", line 18, in <module>
    import torchao.quantization.quant_api as quant_api
  File "python3.8/site-packages/torchao/quantization/quant_api.py", line 22, in <module>
    from .dynamic_quant import DynamicallyPerAxisQuantizedLinear
  File "python3.8/site-packages/torchao/quantization/dynamic_quant.py", line 10, in <module>
    from .quant_primitives import (
  File "python3.8/site-packages/torchao/quantization/quant_primitives.py", line 9, in <module>
    from torch._higher_order_ops.out_dtype import out_dtype
ModuleNotFoundError: No module named 'torch._higher_order_ops'
RdoubleA (Contributor) commented Jun 3, 2024

Thanks for adding all the details needed to reproduce this. I was not able to reproduce the error. Can you update your torch version to the latest stable release? Also, what is your torchao version? I did not hit any issues with torch==2.3.0, torchao==0.1, torchtune==0.1.1.
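The traceback shows the failure happens when torchao tries to import `torch._higher_order_ops`, a submodule that does not exist in torch==2.0.0. A minimal sketch of a pre-flight check (the helper name `module_available` is hypothetical, not part of torchtune) that confirms the submodule is importable before running the conversion script:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A parent package in the dotted path is missing entirely.
        return False

# Before importing torchtune/torchao, check that the installed torch
# provides the submodule torchao needs; if not, torch is too old for
# this torchao build, e.g.:
#     module_available("torch._higher_order_ops")
```

Note that `find_spec` on a dotted name imports the parent package, so the `ModuleNotFoundError` guard is needed when the top-level package itself is absent.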

guotong1988 (Author) commented Jun 4, 2024

CUDA Version: 11.4
torch==2.3.0
torchao==0.1
torchtune==0.1.1

With torch==2.3.0 I first got this error:
libtorch_cuda.so: undefined symbol: ncclCommRegister

I then installed torch==2.2.0, and it worked.

After that I re-installed torch==2.3.0, and it also worked.
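Since the resolution came down to matching package versions, a quick sketch for verifying an environment against the combination the maintainer reported working (torch==2.3.0, torchao==0.1, torchtune==0.1.1) — the `installed_version` helper is illustrative, not part of either library:

```python
from importlib import metadata

def installed_version(dist: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

# The version combination reported working in this issue.
expected = {"torch": "2.3.0", "torchao": "0.1", "torchtune": "0.1.1"}

for dist, want in expected.items():
    got = installed_version(dist)
    if got != want:
        print(f"{dist}: have {got}, want {want}")
```

Running this before the conversion script surfaces a version mismatch immediately, instead of via a deep import-time traceback like the one above.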
