
AttributeError: module 'torchvision.models' has no attribute 'get_model' #6761

Closed
satpalsr opened this issue Oct 13, 2022 · 7 comments

@satpalsr

🐛 Describe the bug

Reproduce with Colab

I am trying to execute RetinaNet training with

torchrun --nproc_per_node=1 /content/vision/references/detection/train.py\
    --dataset coco --data-path=/content/vision/dataset --model retinanet_resnet50_fpn --epochs 26\
    --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01 --weights-backbone ResNet50_Weights.IMAGENET1K_V1

but get an AttributeError: module 'torchvision.models' has no attribute 'get_model'.

Complete trace:

| distributed init (rank 0): env://
Namespace(amp=False, aspect_ratio_group_factor=3, batch_size=2, data_augmentation='hflip', data_path='/content/vision/dataset', dataset='coco', device='cuda', dist_backend='nccl', dist_url='env://', distributed=True, epochs=26, gpu=0, lr=0.01, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='retinanet_resnet50_fpn', momentum=0.9, norm_weight_decay=None, opt='sgd', output_dir='.', print_freq=20, rank=0, resume='', rpn_score_thresh=None, start_epoch=0, sync_bn=False, test_only=False, trainable_backbone_layers=None, use_copypaste=False, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone='ResNet50_Weights.IMAGENET1K_V1', workers=4, world_size=1)
Loading data
loading annotations into memory...
Done (t=14.51s)
creating index...
index created!
loading annotations into memory...
Done (t=2.34s)
creating index...
index created!
Creating data loaders
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [  104   982 24236  2332  8225 74466  5763  1158]
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:566: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Creating model
Traceback (most recent call last):
  File "/content/vision/references/detection/train.py", line 311, in <module>
    main(args)
  File "/content/vision/references/detection/train.py", line 222, in main
    model = torchvision.models.get_model(
AttributeError: module 'torchvision.models' has no attribute 'get_model'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1179) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 755, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/content/vision/references/detection/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-10-13_08:26:16
  host      : 291a9c949d94
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1179)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Versions

PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.22.6
Libc version: glibc-2.26

Python version: 3.7.14 (default, Sep  8 2022, 00:06:44)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.2.152
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.1+cu113
[pip3] torchaudio==0.12.1+cu113
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.13.1
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect
@NicolasHug
Member

Hi @satpalsr
get_model() is only available on the dev branch (main) and in the nightly release of torchvision. It will be included in the upcoming release in a few weeks. Mind you though, it's still marked as Beta right now, so backward compatibility isn't guaranteed.
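As a rough sketch (assuming a torchvision 0.13.x or newer install, and reusing the model and backbone-weights names from the command above), the failing call can be guarded so that it falls back to the per-model builder on stable releases:

import torchvision
from torchvision.models import ResNet50_Weights

if hasattr(torchvision.models, "get_model"):
    # nightly / >= 0.14: the generic builder used by references/detection/train.py
    model = torchvision.models.get_model(
        "retinanet_resnet50_fpn", weights=None, weights_backbone=ResNet50_Weights.IMAGENET1K_V1
    )
else:
    # stable 0.13.x: call the per-model constructor directly
    model = torchvision.models.detection.retinanet_resnet50_fpn(
        weights=None, weights_backbone=ResNet50_Weights.IMAGENET1K_V1
    )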

@satpalsr
Author

satpalsr commented Oct 13, 2022

I cloned the repo and ran python setup.py install to install it. Shouldn't that have fixed it?

@NicolasHug
Member

Yes, installing from source should make get_model() available.
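A quick way to sanity-check which build actually ends up being imported (a sketch; the exact dev version string will differ per commit):

import torchvision

print(torchvision.__version__)                    # a source/nightly build reports a dev version (e.g. 0.14.0a0+<hash>), not 0.13.1
print(hasattr(torchvision.models, "get_model"))   # should print True once the new build is the one being used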

@satpalsr
Author

But I am having trouble building it in Colab.

@NicolasHug
Member

Looking at your logs, the source build is failing. You'll also need to install the nightly version of torch core (check out the instructions in our README). But your best bet is to install the nightly version of torchvision instead of building from source.

@BUGUANLAN

I met the same problem. My torchvision version is 0.2.2; I know that version is too old, so torchvision doesn't have 'get_model'. Have you resolved it? My GPU is very old, so I cannot update my torchvision, and I don't know how to work around it. Can you share your idea?

@datumbox
Contributor

@BUGUANLAN The get_model() method was added in v0.14. If you can't upgrade to the latest version due to hardware constraints, then I think the best option for you would be to fetch the models using the legacy idiom:

torchvision.models.__dict__[model_name](**kwargs)
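For example, with a hypothetical model name and keyword argument (on an old release such as 0.2.2 only the classification models exist and the supported kwargs are limited):

import torchvision

model_name = "resnet18"                                           # illustrative; any name present in torchvision.models
model = torchvision.models.__dict__[model_name](pretrained=True)  # pretrained is the pre-0.13 style keyword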

Given there is nothing to resolve, I'll be closing this issue.
