[BUG] Object Detection AutoTrain Error: iteration over a 0-d tensor #656
Comments
Did you also upload validation data, or just training data?
I only uploaded training data; it looked like it automatically did the train/val split. I did find an image that was listed in my metadata twice, so I am wondering if one of those entries ended up in validation and one in training, causing the image to not be found in the validation set. I fixed this and I am trying again. I can also try manually splitting and uploading my validation data. I will let you know if that fixes the error.
It does auto splitting, so that shouldn't be an issue.
Please let me know. This case should be caught earlier.
@abhishekkrthakur I tried removing the duplicated image record from
Okay, so the issue is happening for datasets that have a single class. I'm fixing the issue and will update here ASAP.
@abhishekkrthakur Sounds good, thanks! And no problem, I'm glad I can help test a new feature.
Just pushed a fix and tried it on my own. Please make sure you are on v0.7.110 or above.
Please let me know if you still face issues.
It seems that the training has worked, thanks! I am just facing issues now with the Serverless Inference API, but I think that is separate from this repo. So I think this issue is solved now!
The API won't work immediately. Try a few minutes after training is done :) and thank you so much for all the help :)
@rileybolen Thank you very much for helping debug this, and apologies for the inconvenience. As a token of gratitude, we have added a $25 credit to your Hugging Face account that you can use for Spaces, Inference Endpoints, AutoTrain, or other Hugging Face services.
Fixed.
Prerequisites
Backend
Hugging Face Space/Endpoints
Interface Used
UI
CLI Command
No response
UI Screenshots & Parameters
Error Logs
100%|██████████| 13/13 [00:10<00:00, 1.51it/s]
/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1712608935911/work/torch/csrc/utils/tensor_new.cpp:274.)
  batch_image_sizes = torch.tensor([x["orig_size"] for x in batch])
INFO: 10.16.9.183:64413 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.27.38:51108 - "GET /ui/is_model_training HTTP/1.1" 200 OK
ERROR | 2024-05-24 14:09:32 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/main.py", line 199, in train
    trainer.train()
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2311, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2721, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3572, in evaluate
    output = eval_loop(
  File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3854, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/app/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/utils.py", line 188, in object_detection_metrics
    for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):
  File "/app/env/lib/python3.10/site-packages/torch/_tensor.py", line 1047, in __iter__
    raise TypeError("iteration over a 0-d tensor")
TypeError: iteration over a 0-d tensor
ERROR | 2024-05-24 14:09:32 | autotrain.trainers.common:wrapper:121 - iteration over a 0-d tensor
INFO | 2024-05-24 14:09:32 | autotrain.trainers.common:pause_space:77 - Pausing space...
33%|███▎ | 100/300 [01:32<03:04, 1.08it/s]
Additional Information
Training starts and makes some progress, but it fails with this error once the first epoch completes and evaluation runs.
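For anyone hitting the same traceback on an older version: the failing line zips over per-class metric values, and the error suggests that at least one of them was a 0-d (scalar) tensor, which appears to be what happens with a single-class dataset. Below is a minimal sketch of one way to guard such a loop. It is not the fix that was pushed to AutoTrain; `to_per_class_list` and `per_class_metrics` are hypothetical helper names, and the code only assumes that the metric values expose `tolist()`, as `torch.Tensor` and `numpy.ndarray` do.

```python
from typing import Any, Dict, List


def to_per_class_list(values: Any) -> List[Any]:
    """Return per-class values as a plain Python list (hypothetical helper).

    Objects exposing ``tolist()`` (e.g. torch.Tensor, numpy.ndarray) are
    converted first. For a 0-d tensor, ``tolist()`` returns a bare scalar,
    so the scalar is wrapped in a one-element list to make it iterable.
    """
    if hasattr(values, "tolist"):
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        return list(values)
    return [values]


def per_class_metrics(classes: Any, map_per_class: Any, mar_100_per_class: Any) -> Dict[Any, Dict[str, Any]]:
    """Pair each class id with its mAP and mAR@100 values.

    Normalizing every input first means a single-class run (scalar inputs)
    and a multi-class run (1-d inputs) go through the same loop.
    """
    rows = zip(
        to_per_class_list(classes),
        to_per_class_list(map_per_class),
        to_per_class_list(mar_100_per_class),
    )
    return {
        class_id: {"map": class_map, "mar_100": class_mar}
        for class_id, class_map, class_mar in rows
    }
```

With PyTorch installed, calling `per_class_metrics(torch.tensor(0), torch.tensor(0.5), torch.tensor(0.25))` returns a one-entry dictionary instead of raising `TypeError: iteration over a 0-d tensor`. Staying in tensor land with `torch.atleast_1d(...)` before the `zip` would be an equally reasonable choice.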