This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

Imagenet example fails during accuracy calculation (v0.2.2 on 1.8.1) #150

Closed
10 tasks
assapin opened this issue May 30, 2021 · 1 comment

assapin commented May 30, 2021

🐛 Bug

When running the imagenet example from examples/imagenet,
I get the following error:

[INFO] 2021-05-30 13:09:18,531 api: [default] Starting worker group
=> set cuda device = 0
=> creating model: resnet18
=> no workers have checkpoints, starting from epoch 0
=> start_epoch: 0, best_acc1: 0
Traceback (most recent call last):
  File "main.py", line 594, in <module>
    main()
  File "main.py", line 183, in main
    train(train_loader, model, criterion, optimizer, epoch, device_id, print_freq)
  File "main.py", line 455, in train
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "main.py", line 588, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Component (check all that apply):

  • state api
  • train_step api
  • train_loop
  • rendezvous
  • checkpoint
  • rollback
  • metrics
  • petctl
  • [x] examples
  • docker
  • other

To Reproduce

See environment

Expected behavior

Training should run and accuracy should be reported correctly.

Environment

Dockerfile:

FROM pytorch/pytorch:1.8.1-cuda11.1-cudnn8-runtime

RUN apt-get -q update && apt-get -q install -y wget unzip
RUN pip install torchelastic==0.2.2

RUN mkdir ./train
COPY elastic/examples/imagenet/main.py ./train
WORKDIR ./train
RUN chmod -R a+w .
USER root
ENTRYPOINT ["python", "-m", "torchelastic.distributed.launch"]
CMD ["--help"]


assapin commented May 30, 2021

I see you fixed it in master.
I was going to open a pull request... next time :-)
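
For anyone hitting this on the v0.2.2 example: the error message itself points at the workaround, which is to replace `.view(-1)` with `.reshape(-1)` on the line shown in the traceback. The snippet below is only a minimal sketch, assuming the `accuracy` helper in main.py follows the usual top-k pattern; apart from the `correct_k` line, the body is reconstructed for illustration and is not necessarily the exact change made in master.

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Compute top-k accuracy (in percent) for each k in topk."""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        # Indices of the maxk highest scores per sample, shape (batch, maxk).
        _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
        # Transpose to (maxk, batch); the result is a non-contiguous view.
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            # .reshape(-1) copies the data when the slice is not contiguous,
            # which is the case where .view(-1) raises the RuntimeError above.
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
```

`.reshape` behaves like `.view` when the memory layout allows it and falls back to a copy otherwise, so the returned values are unchanged; calling `.contiguous().view(-1)` would be an equivalent alternative.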

assapin closed this as completed May 30, 2021