[BUG] Object Detection AutoTrain Error: iteration over a 0-d tensor #656

Closed
2 tasks done
rileybolen opened this issue May 24, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@rileybolen

Prerequisites

  • I have read the documentation.
  • I have checked other issues for similar problems.

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Screenshot 2024-05-22 at 8 01 39 AM

Error Logs

100%|██████████| 13/13 [00:10<00:00, 1.51it/s]
/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1712608935911/work/torch/csrc/utils/tensor_new.cpp:274.)
  batch_image_sizes = torch.tensor([x["orig_size"] for x in batch])
INFO: 10.16.9.183:64413 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.27.38:51108 - "GET /ui/is_model_training HTTP/1.1" 200 OK
ERROR | 2024-05-24 14:09:32 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/main.py", line 199, in train
    trainer.train()
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2311, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2721, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3572, in evaluate
    output = eval_loop(
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3854, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/app/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/utils.py", line 188, in object_detection_metrics
    for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):
  File "/app/env/lib/python3.10/site-packages/torch/_tensor.py", line 1047, in __iter__
    raise TypeError("iteration over a 0-d tensor")
TypeError: iteration over a 0-d tensor

ERROR | 2024-05-24 14:09:32 | autotrain.trainers.common:wrapper:121 - iteration over a 0-d tensor
INFO | 2024-05-24 14:09:32 | autotrain.trainers.common:pause_space:77 - Pausing space...

33%|███▎ | 100/300 [01:32<03:04, 1.08it/s]

Additional Information

The training starts and makes some progress, but it fails with this error once the first epoch completes.

rileybolen added the bug label May 24, 2024
@abhishekkrthakur
Member

did you also upload validation data or just training data?

@rileybolen
Author

I only uploaded training data; it looked like it automatically did the train/val split. I did find an image that was listed in my metadata twice, so I am wondering if maybe one of those entries ended up in validation and one in training, causing the image to not be found in the validation set. I fixed this and I am trying again. I can also try manually splitting and uploading my validation data. I will let you know if that fixes the error.
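
For anyone who wants to check for the same kind of duplicate entry, here is a minimal sketch that scans metadata.jsonl content for repeated records. The `file_name` key, the `objects` layout, and the sample records are assumptions made for illustration; they are not taken from this thread, so adjust them to your own file.

```python
import json
from collections import Counter


def find_duplicate_entries(jsonl_text, key="file_name"):
    """Return the values of `key` that appear on more than one line."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        counts[record[key]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical metadata.jsonl content; the field names are assumptions.
sample = "\n".join(
    json.dumps(record)
    for record in [
        {"file_name": "img_001.jpg", "objects": {"bbox": [[0, 0, 10, 10]], "category": [0]}},
        {"file_name": "img_002.jpg", "objects": {"bbox": [[5, 5, 20, 20]], "category": [0]}},
        {"file_name": "img_001.jpg", "objects": {"bbox": [[1, 1, 8, 8]], "category": [0]}},
    ]
)

duplicates = find_duplicate_entries(sample)
print(duplicates)
```

For a real file, pass `open("metadata.jsonl").read()` instead of `sample`.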

@abhishekkrthakur
Member

it does auto splitting. that shouldn't be an issue.

"I did find an image that was listed in my metadata twice, so I am wondering if maybe one of those entries ended up in validation and one in training, causing the image to not be found in the validation set. I fixed this and I am trying again"

please let me know. this case should be caught earlier

@rileybolen
Author

@abhishekkrthakur I tried removing the duplicated image record from metadata.jsonl and I still got the same error.

@abhishekkrthakur
Member

okay. so the issue is happening for datasets that have a single class. I'm fixing the issue and will update here asap.
I really hope it works for you end to end now. and deep apologies.
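
To illustrate the failure mode from the traceback, here is a minimal sketch. It assumes that with a single class the per-class metric values arrive as 0-d (scalar) tensors rather than 1-d tensors; the values are made up, and `as_1d` is a hypothetical helper, not the fix that was shipped in AutoTrain.

```python
import torch


def as_1d(value):
    """Return `value` as a 1-d tensor, so 0-d scalars become length-1 tensors."""
    return torch.as_tensor(value).reshape(-1)


# Assumed shapes for a single-class dataset: each per-class entry is a scalar.
classes = torch.tensor(0)
map_per_class = torch.tensor(0.42)
mar_100_per_class = torch.tensor(0.57)

# zip() calls iter() on each argument, which raises for 0-d tensors.
try:
    list(zip(classes, map_per_class, mar_100_per_class))
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# Reshaping to 1-d first makes the same loop work for one class or many.
per_class = [
    (int(c), float(m), float(r))
    for c, m, r in zip(as_1d(classes), as_1d(map_per_class), as_1d(mar_100_per_class))
]
print(per_class)
```

Using `reshape(-1)` leaves tensors that are already 1-d unchanged, so the multi-class path is unaffected.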

@rileybolen
Author

@abhishekkrthakur Sounds good, thanks! And no problem, I'm glad I can help test a new feature.

@abhishekkrthakur
Member

just pushed a fix and tried it on my own. please make sure you are on v0.7.110 or above.
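
A minimal sketch for checking a local install against that minimum version. The package name `autotrain-advanced` is an assumption (it is not stated in this thread), and `version_tuple`, `has_fix`, and `installed_version` are hypothetical helpers that only handle plain numeric versions such as 0.7.110.

```python
from importlib import metadata


def version_tuple(version):
    """Convert a version such as 'v0.7.110' into (0, 7, 110)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def has_fix(installed, minimum="0.7.110"):
    """True if `installed` is at least `minimum`, compared numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package="autotrain-advanced"):
    """Return the installed version string, or None if the package is absent.

    The default package name is an assumption.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("package not installed in this environment")
else:
    print(current, "has the fix" if has_fix(current) else "is older than 0.7.110")
```

The comparison is done on integer tuples because plain string comparison would rank "0.7.99" above "0.7.110".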

@abhishekkrthakur
Member

please let me know if you still face issues

@rileybolen
Author

It seems that the training has worked, thanks! I am just facing issues now with the Serverless Inference API, but I think that is separate from this repo. So I think this issue is solved now!

@abhishekkrthakur
Member

The API won't work immediately. Try again a few minutes after training is done :) and thank you so much for all the help :)

@abhishekkrthakur
Member

@rileybolen thank you very much for helping debug this, and apologies for the inconvenience. As a token of gratitude, we have added a $25 credit to your Hugging Face account that you can use for Spaces, Inference Endpoints, AutoTrain or other Hugging Face services.

@abhishekkrthakur
Member

fixed
