[BUG] Object Detection AutoTrain Error: iteration over a 0-d tensor #656

rileybolen · 2024-05-24T14:22:14Z

Prerequisites

I have read the documentation.
I have checked other issues for similar problems.

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Error Logs

100%|██████████| 13/13 [00:10<00:00, 1.51it/s]/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1712608935911/work/torch/csrc/utils/tensor_new.cpp:274.)
batch_image_sizes = torch.tensor([x["orig_size"] for x in batch])
INFO: 10.16.9.183:64413 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.27.38:51108 - "GET /ui/is_model_training HTTP/1.1" 200 OK
ERROR | 2024-05-24 14:09:32 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/main.py", line 199, in train
trainer.train()
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
return inner_training_loop(
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2311, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2721, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3572, in evaluate
output = eval_loop(
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3854, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "/app/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/utils.py", line 188, in object_detection_metrics
for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):
File "/app/env/lib/python3.10/site-packages/torch/_tensor.py", line 1047, in __iter__
raise TypeError("iteration over a 0-d tensor")
TypeError: iteration over a 0-d tensor

ERROR | 2024-05-24 14:09:32 | autotrain.trainers.common:wrapper:121 - iteration over a 0-d tensor
INFO | 2024-05-24 14:09:32 | autotrain.trainers.common:pause_space:77 - Pausing space...

33%|███▎ | 100/300 [01:32<03:04, 1.08it/s]

Additional Information

The training is able to start and make some progress, but it seems that after the first epoch of training is completed the training fails with this error.

abhishekkrthakur · 2024-05-24T14:27:17Z

did you also upload validation data or just training data?

rileybolen · 2024-05-24T14:32:38Z

I only uploaded training data; it looked like it automatically did the train/val split. I did find an image that was listed in my metadata twice, so I am wondering if maybe one of those entries ended up in validation and one in training, causing the image to not be found in the validation set. I fixed this and I am trying again. I can also try manually splitting and uploading my validation data. I will let you know if that fixes the error.
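
For anyone who wants to rule out the same duplicate-entry problem, here is a minimal sketch of a check for repeated records in metadata.jsonl. It assumes each line is a JSON object with a `file_name` key; the helper name `find_duplicate_file_names` and the sample records are hypothetical and only for illustration.

```python
import json
from collections import Counter


def find_duplicate_file_names(jsonl_text):
    """Return the file_name values that occur more than once in the JSONL text."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["file_name"]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample data; the "objects" layout is an assumption.
sample = "\n".join([
    '{"file_name": "a.jpg", "objects": {"bbox": [[0, 0, 10, 10]], "category": [0]}}',
    '{"file_name": "b.jpg", "objects": {"bbox": [[5, 5, 20, 20]], "category": [0]}}',
    '{"file_name": "a.jpg", "objects": {"bbox": [[0, 0, 10, 10]], "category": [0]}}',
])
duplicates = find_duplicate_file_names(sample)
print(duplicates)
```

In practice you would read the real file with `open("metadata.jsonl").read()` and pass its contents to the helper.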

abhishekkrthakur · 2024-05-24T14:38:15Z

It does auto splitting, so that shouldn't be an issue.

> I did find an image that was listed in my metadata twice, so I am wondering if maybe one of those entries ended up in validation and one in training, causing the image to not be found in the validation set. I fixed this and I am trying again

Please let me know. This case should be caught earlier.

rileybolen · 2024-05-24T14:39:29Z

@abhishekkrthakur I tried removing the duplicated image record from metadata.jsonl and I still got the same error.

abhishekkrthakur · 2024-05-24T14:54:04Z

Okay, so the issue happens for datasets that have a single class. I'm fixing it and will update here ASAP.
I really hope it works for you end to end now, and deep apologies.
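
To illustrate the failure mode, here is a minimal sketch, not the actual fix shipped in AutoTrain. It assumes that with a single class the per-class metric values arrive as 0-d tensors, which cannot be iterated by `zip`. The helper name `per_class_pairs` is hypothetical; `torch.atleast_1d` is one possible way to normalize the shapes before looping.

```python
import torch

# Reproduce the error from the traceback: a 0-d tensor cannot be iterated.
try:
    list(zip(torch.tensor(0), torch.tensor(0.5)))
except TypeError as exc:
    error_message = str(exc)


def per_class_pairs(classes, map_per_class, mar_100_per_class):
    """Return (class_id, mAP, mAR) tuples, also when the inputs are 0-d tensors."""
    # atleast_1d reshapes a 0-d tensor to shape (1,) and leaves 1-d tensors as-is.
    classes = torch.atleast_1d(classes)
    map_per_class = torch.atleast_1d(map_per_class)
    mar_100_per_class = torch.atleast_1d(mar_100_per_class)
    return [
        (int(c), float(m), float(r))
        for c, m, r in zip(classes, map_per_class, mar_100_per_class)
    ]


# Single-class case: every input is a 0-d tensor.
single = per_class_pairs(torch.tensor(0), torch.tensor(0.5), torch.tensor(0.75))

# Multi-class case: inputs are already 1-d and pass through unchanged.
multi = per_class_pairs(
    torch.tensor([0, 1]), torch.tensor([0.5, 0.25]), torch.tensor([0.75, 0.5])
)
print(error_message, single, multi)
```

The same normalization could be done with `tensor.unsqueeze(0)` after checking `tensor.dim() == 0`; the version above just avoids the explicit branch.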

rileybolen · 2024-05-24T14:55:53Z

@abhishekkrthakur Sounds good, thanks! And no problem, I'm glad I can help test a new feature.

abhishekkrthakur · 2024-05-24T15:11:39Z

Just pushed a fix and tested it on my own. Please make sure you are on v0.7.110 or above.

abhishekkrthakur · 2024-05-24T15:51:05Z

please let me know if you still face issues

rileybolen · 2024-05-24T15:52:40Z

It seems that the training has worked, thanks! I am just facing issues now with the Serverless Inference API, but I think that is separate from this repo. So I think this issue is solved now!

abhishekkrthakur · 2024-05-24T17:51:25Z

The API won't work immediately. Try a few minutes after training is done :) and thank you so much for all the help :)

abhishekkrthakur · 2024-05-25T13:50:46Z

@rileybolen thank you very much for helping debug this, and apologies for the inconvenience. As a token of gratitude, we have added a $25 credit to your Hugging Face account that you can use for Spaces, Inference Endpoints, AutoTrain, or other Hugging Face services.

abhishekkrthakur · 2024-05-29T18:01:00Z

fixed

rileybolen added the "bug" label on May 24, 2024

abhishekkrthakur closed this as completed May 29, 2024
