
Bugs? #1

Closed
WangWenhao0716 opened this issue Dec 10, 2021 · 14 comments

Comments

@WangWenhao0716

Congratulations! We really appreciate this work. When I run

python v107.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 99999 \
  --epochs 10 --lr 0.5 --wd 1e-6 --batch-size 16 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.1 --weight ./v98/train/checkpoint_0001.pth.tar \
  --input-size 512 --sample-size 1000000 --memory-size 1000 \
  ../input/training_images/

I get the following error:

Traceback (most recent call last):                                              
  File "v107.py", line 774, in <module>
    train(args)
  File "v107.py", line 425, in train
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/wangwenhao/fbisc-descriptor-1st/exp/v107.py", line 573, in main_worker
    train_one_epoch(train_loader, model, loss_fn, optimizer, scaler, epoch, args)
  File "/home/wangwenhao/fbisc-descriptor-1st/exp/v107.py", line 595, in train_one_epoch
    labels = torch.cat([torch.tile(i, dims=(args.ncrops,)), torch.tensor(j)])
ValueError: only one element tensors can be converted to Python scalars

Do you know how to fix it?
Thanks.

@lyakaap
Owner

lyakaap commented Dec 10, 2021

Congrats to you too! Your wins in both tracks are incredible :)

I haven't seen such an error before. Are the image files in the correct location? If they are, please print the i and j values and show me the output; it might be useful information for debugging.

@WangWenhao0716
Author

Thanks for your reply. In fact, I knew that might be useful for debugging and tried it yesterday, but I could not work it out by myself.

i =  tensor([1537, 1191])
j =  [tensor([1546283, 1867690]), tensor([1780914, 1504719]), tensor([1353055, 1878239]), tensor([1931255, 1205254]), tensor([1178165, 1401500]), tensor([1713147, 1749940]), tensor([1333900, 1671408]), tensor([1732070, 1593446]), tensor([1475793, 1149125]), tensor([1002561, 1548406]), tensor([1634161, 1714439]), tensor([1729160, 1631621]), tensor([1257713, 1890521]), tensor([1896319, 1713320]), tensor([1085255, 1081381]), tensor([1392220, 1799155]), tensor([1460125, 1605860]), tensor([1426539, 1045038]), tensor([1722017, 1349333]), tensor([1371985, 1360729]), tensor([1332006, 1671282]), tensor([1339213, 1493030]), tensor([1909343, 1060632]), tensor([1400760, 1459965]), tensor([1692564, 1535537]), tensor([1494376, 1822024]), tensor([1878225, 1558317]), tensor([1288187, 1682532]), tensor([1793712, 1596738]), tensor([1348662, 1096824])]

A toy example:

import torch
i = torch.Tensor([1537, 1191])
j = [torch.Tensor([1380528, 1715717]), torch.Tensor([1614647, 1619035])]
torch.cat([torch.tile(i, dims=(2,)), torch.tensor(j)])

It also results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_1097398/2617762925.py in <module>
----> 1 torch.cat([torch.tile(i, dims=(2,)), torch.tensor(j)])

ValueError: only one element tensors can be converted to Python scalars
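If it helps narrow things down: torch.tensor(j) seems to fail here because it tries to convert every tensor in the list to a Python scalar, which only works when each tensor holds a single element. Concatenating the list directly avoids that conversion; a minimal sketch against the toy example above (whether the resulting label ordering is what the loss expects, I am not sure):

import torch

i = torch.tensor([1537, 1191])
j = [torch.tensor([1380528, 1715717]), torch.tensor([1614647, 1619035])]

# torch.cat flattens the list of 1-D id tensors into a single 1-D tensor,
# so no element-to-scalar conversion happens.
labels = torch.cat([torch.tile(i, dims=(2,)), torch.cat(j)])
print(labels)  # tensor([1537, 1191, 1537, 1191, 1380528, 1715717, 1614647, 1619035])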

Looking forward to your reply.

@lyakaap
Owner

lyakaap commented Dec 12, 2021

In my case, i and j are as follows:

i:

tensor([1537]) 

j:

[tensor([1493751]), tensor([1594483]), tensor([1310616]), tensor([1566637]), tensor([1041634]), tensor([1321072]), tensor([1756446]), tensor([1876031]), tensor([1949834]), tensor([1317828]), tensor([1293972]), tensor([1700646]), tensor([1928488]), tensor([1719636]), tensor([1716178]), tensor([1565452]), tensor([1281302]), tensor([1904498]), tensor([1212152]), tensor([1821218]), tensor([1004454]), tensor([1903469]), tensor([1583914]), tensor([1809848]), tensor([1894128]), tensor([1311861]), tensor([1405172]), tensor([1122038]), tensor([1628859]), tensor([1761828])]

It is strange that the tensors have two elements in your case.
Please double-check your data location and your PyTorch version.

@WangWenhao0716
Author

It is strange that the tensors have two elements in your case. Please double-check your data location and your PyTorch version.

Thanks, that is interesting. I will double-check all the related files and get back to you.

@WangWenhao0716
Author

Hi, I have double-checked the PyTorch version:

>>> import torch
>>> torch.__version__
'1.9.0+cu111'

And the data directory:

input
  query_images
  reference_images
  training_images
  public_ground_truth.csv
exp
...

However, the problem still exists 😭😭😭.
Please make sure you are running v107.py rather than the other scripts (the others work fine for me).
It is very strange.
Or could we have a real-time meeting (e.g. Zoom) to reproduce the bug together?
Thanks a lot!

@WangWenhao0716
Author

I plan to do future work on this topic, so your method is crucial to my research as a benchmark. Thanks!

@lyakaap
Copy link
Owner

lyakaap commented Dec 14, 2021

That's truly strange.
Okay, let's arrange a real-time meeting via email: bepemgdlp@gmail.com

@WangWenhao0716
Author

That's truly strange. Okay, let's arrange a real-time meeting via email: bepemgdlp@gmail.com

I'm free any time today. Could you arrange a Zoom meeting at a time convenient for you? Zoom does not allow users in China to host meetings. Thanks!

@WangWenhao0716
Author

My email is wangwenhao0716@gmail.com

@lyakaap
Owner

lyakaap commented Dec 15, 2021

I noticed that my code in v107.py doesn't handle the case of using fewer than 16 GPUs.
I will fix it and commit.
Thanks @WangWenhao0716
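For reference, a rough sketch of the direction such a fix could take (not necessarily the exact committed change), assuming i holds the anchor ids of the per-GPU mini-batch and j is a list of 1-D id tensors, which is my reading of the values printed above:

import torch

def build_labels(i, j, ncrops):
    # With 16 GPUs each tensor in j holds a single id, so torch.tensor(j) works;
    # with fewer GPUs the per-GPU batch grows and each tensor holds several ids.
    # torch.cat(j) covers both cases without converting elements to Python scalars.
    return torch.cat([torch.tile(i, dims=(ncrops,)), torch.cat(j)])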

@WangWenhao0716
Author

Thanks for your reply and all your contributions.

@lyakaap
Owner

lyakaap commented Dec 16, 2021

Fixed it. Closing this issue.

@lyakaap lyakaap closed this as completed Dec 16, 2021
@WangWenhao0716
Author

All the other parts work well. Thanks again for your work. By the way, faiss works well on the A100 (faiss 1.7.1 with CUDA 11.1).
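In case it is useful to others, a minimal check along these lines (not necessarily the exact code used) confirms the GPU path:

import faiss
import numpy as np

d = 256
xb = np.random.rand(10000, d).astype('float32')  # database vectors
xq = np.random.rand(5, d).astype('float32')      # query vectors

res = faiss.StandardGpuResources()                    # allocate GPU resources
index = faiss.index_cpu_to_gpu(res, 0, faiss.IndexFlatL2(d))
index.add(xb)
distances, ids = index.search(xq, 5)                  # runs on the A100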

@lyakaap
Owner

lyakaap commented Dec 20, 2021

Thanks for reporting! Will take a look.
