
[Ray + YOLOv8] YOLOv8 model.tune  #47859

@lucasescucha

Description


Hi there!

I'm trying to tune a YOLOv8 model using Ray. To do this, I'm using the following code:

from ultralytics import YOLO

model = YOLO(model_version)
model.to('cuda:0')

model.tune(data=model_config_file_path, epochs=trial_epochs, batch=0.9, iterations=n_trials, use_ray=True, gpu_per_trial=1)

where

model_version: str = "yolov8n.pt"
n_trials: int = 50
trial_epochs: int = 50

model_config_file_path points to a dataset YAML file with the following contents:

names:
- abcd
nc: 1
test: test/images
train: train/images
val: valid/images
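Note that `batch=0.9` in the `model.tune` call above is not a batch size. In ultralytics, a `batch` value between 0 and 1 is read as the fraction of CUDA memory that AutoBatch should target, which is what the "Computing optimal batch size ... at 90.0% CUDA memory utilization" line in the log below reflects. The sketch below is a simplified, stdlib-only re-creation of that estimate. It assumes AutoBatch fits a straight line to the profiled (batch size, GPU memory) pairs and solves for the batch size that fills `fraction` of the free memory; the helper names `fit_line` and `estimate_batch` are hypothetical, and the real implementation handles more edge cases (failed profiles, out-of-range results).

```python
# Simplified re-creation of the AutoBatch estimate shown in the log.
# Assumption: a straight-line fit of GPU memory vs. batch size, solved for
# the batch size that uses `fraction` of the free memory.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_batch(total_gb, reserved_gb, allocated_gb, batch_sizes, mem_gb, fraction):
    """Return (batch size, predicted fraction of total GPU memory used)."""
    free_gb = total_gb - (reserved_gb + allocated_gb)
    slope, intercept = fit_line(batch_sizes, mem_gb)
    # Solve slope * b + intercept = free_gb * fraction for b, truncating to an int.
    batch = int((free_gb * fraction - intercept) / slope)
    used_gb = slope * batch + intercept + reserved_gb + allocated_gb
    return batch, used_gb / total_gb


# Values copied from the AutoBatch lines of the log (already rounded there).
batch, frac = estimate_batch(
    total_gb=11.99,
    reserved_gb=0.09,
    allocated_gb=0.08,
    batch_sizes=[1, 2, 4, 8, 16],
    mem_gb=[0.214, 0.308, 0.537, 1.015, 2.003],
    fraction=0.9,
)
print(batch, f"{frac:.0%}")
```

Because it is fed the rounded figures the log prints, the result lands within one of the batch size of 88 that AutoBatch reported, close to the 90% target.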

When I run the script, the following error appears:

(_tune pid=62308)  22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]
(_tune pid=62308) Model summary: 225 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs
(_tune pid=62308)
(_tune pid=62308) Transferred 319/355 items from pretrained weights
(_tune pid=62308) Freezing layer 'model.22.dfl.conv.weight'
(_tune pid=62308) AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
(_tune pid=62308) Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.25M/6.25M [00:00<00:00, 9.87MB/s]
(_tune pid=62308) AMP: checks passed ✅
(_tune pid=62308) AutoBatch: Computing optimal batch size for imgsz=640 at 90.0% CUDA memory utilization.
(_tune pid=62308) AutoBatch: CUDA:0 (NVIDIA GeForce RTX 4080 Laptop GPU) 11.99G total, 0.09G reserved, 0.08G allocated, 11.82G free
(_tune pid=62308)       Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
(_tune pid=62308)      3011043       8.194         0.214            13         28.51        (1, 3, 640, 640)                    list
(_tune pid=62308)      3011043       16.39         0.308         12.01         17.69        (2, 3, 640, 640)                    list
(_tune pid=62308)      3011043       32.78         0.537         14.03         16.56        (4, 3, 640, 640)                    list
(_tune pid=62308)      3011043       65.55         1.015         15.66         18.55        (8, 3, 640, 640)                    list
(_tune pid=62308)      3011043       131.1         2.003         22.33         26.39       (16, 3, 640, 640)                    list
(_tune pid=62308) AutoBatch: Using batch-size 88 for CUDA:0 10.81G/11.99G (90%) ✅
train: Scanning ******working folder*****\data\processed\train\labels.cache... 119 images, 1983 backgrounds, 0 corrupt: 100%|██████████| 2102/2102 [00:00<?, ?it/s]
2024-09-29 16:08:43,180 ERROR tune_controller.py:1331 -- Trial task failed for trial _tune_4fc7c_00000
Traceback (most recent call last):
  File "******working folder*****\.venv\Lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
             ^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\worker.py", line 2691, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\worker.py", line 871, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): ray::ImplicitFunc.train() (pid=62308, ip=127.0.0.1, actor_id=922bada3f118e93ed6f578c401000000, repr=_tune)        
  File "python\ray\_raylet.pyx", line 1859, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1800, in ray._raylet.execute_task.function_executor
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\function_manager.py", line 696, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "******working folder*****\.venv\Lib\site-packages\ray\air\_internal\util.py", line 104, in run
    self._ret = self._target(*self._args, **self._kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\function_trainable.py", line 45, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\function_trainable.py", line 250, in _trainable_func
    output = fn()
             ^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\utils\tuner.py", line 103, in _tune
    results = model_to_train.train(**config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\model.py", line 803, in train
    self.trainer.train()
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 207, in train
    self._do_train(world_size)
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 327, in _do_train
    self._setup_train(world_size)
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 291, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=LOCAL_RANK, mode="train")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\models\yolo\detect\train.py", line 55, in get_dataloader
    return build_dataloader(dataset, batch_size, workers, shuffle, rank)  # return dataloader
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\data\build.py", line 135, in build_dataloader
    return InfiniteDataLoader(
           ^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\data\build.py", line 39, in __init__
    self.iterator = super().__iter__()
                    ^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 440, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1038, in __init__
    w.start()
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\context.py", line 337, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\popen_spawn_win32.py", line 75, in __init__
    hp, ht, pid, tid = _winapi.CreateProcess(
                       ^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 87] The parameter is incorrect

I tried to debug the code but couldn't find anything useful.
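The traceback ends where PyTorch's DataLoader starts its worker processes (`w.start()` → `popen_spawn_win32.py` → `_winapi.CreateProcess`). One way to narrow the problem down is to rerun the same tuning job with `workers=0`, which should make each trial load data in its own process instead of spawning DataLoader workers. This is a diagnostic sketch, not a confirmed fix: it assumes `model.tune(..., use_ray=True)` forwards extra training arguments such as `workers` to each trial's `train()` call, and `build_tune_kwargs` is a hypothetical helper.

```python
import sys


def build_tune_kwargs(data_yaml: str, epochs: int, iterations: int,
                      platform: str = sys.platform) -> dict:
    """Hypothetical helper: keyword arguments for YOLO.tune() mirroring the call above."""
    kwargs = {
        "data": data_yaml,
        "epochs": epochs,
        "batch": 0.9,          # fraction of CUDA memory for AutoBatch, as in the original call
        "iterations": iterations,
        "use_ray": True,
        "gpu_per_trial": 1,
    }
    if platform == "win32":
        # Assumed workaround: with workers=0 the DataLoader should not spawn
        # worker processes, which is the step that raised WinError 87.
        kwargs["workers"] = 0
    return kwargs


if __name__ == "__main__":
    print(build_tune_kwargs("data.yaml", epochs=50, iterations=50, platform="win32"))
```

With the variables defined earlier, the call would become `model.tune(**build_tune_kwargs(model_config_file_path, trial_epochs, n_trials))`. If trials then get past the dataloader setup, that would point to process spawning inside the Ray actor rather than to the dataset or model.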

To install Ray, I used: pip install -U ultralytics "ray[tune]"

Versions / Dependencies

Python version: 3.12.6
OS version: Windows 11 23H2

Installed packages:

aiosignal                          1.3.1
alembic                            1.13.3
aniso8601                          9.0.1
asttokens                          2.4.1
attrs                              24.2.0
backcall                           0.2.0
beautifulsoup4                     4.12.3
black                              24.8.0
bleach                             6.1.0
blinker                            1.8.2
cachetools                         5.5.0
certifi                            2024.8.30
charset-normalizer                 3.3.2
click                              8.1.7
cloudpickle                        3.0.0
colorama                           0.4.6
colorlog                           6.8.2
comm                               0.2.2
contourpy                          1.3.0
cycler                             0.12.1
databricks-sdk                     0.32.3
debugpy                            1.8.5
decorator                          5.1.1
defusedxml                         0.7.1
Deprecated                         1.2.14
docker                             7.1.0
docker-pycreds                     0.4.0
docopt                             0.6.2
executing                          2.1.0
fastjsonschema                     2.20.0
filelock                           3.16.1
filetype                           1.2.0
flake8                             7.1.1
Flask                              3.0.3
fonttools                          4.53.1
frozenlist                         1.4.1
fsspec                             2024.9.0
gitdb                              4.0.11
GitPython                          3.1.43
google-auth                        2.35.0
graphene                           3.3
graphql-core                       3.2.4
graphql-relay                      3.2.0
greenlet                           3.1.1
idna                               3.7
importlib_metadata                 8.4.0
ipykernel                          6.29.5
ipython                            8.12.3
isort                              5.13.2
itsdangerous                       2.2.0
jedi                               0.19.1
Jinja2                             3.1.4
joblib                             1.4.2
jsonschema                         4.23.0
jsonschema-specifications          2023.12.1
jupyter_client                     8.6.3
jupyter_core                       5.7.2
jupyterlab_pygments                0.3.0
kiwisolver                         1.4.7
loguru                             0.7.2
Mako                               1.3.5
Markdown                           3.7
markdown-it-py                     3.0.0
MarkupSafe                         2.1.5
matplotlib                         3.9.2
matplotlib-inline                  0.1.7
mccabe                             0.7.0
mdurl                              0.1.2
mistune                            3.0.2
mlflow                             2.16.2
mlflow-skinny                      2.16.2
mpmath                             1.3.0
msgpack                            1.1.0
mypy-extensions                    1.0.0
nbclient                           0.10.0
nbconvert                          7.16.4
nbformat                           5.10.4
nest-asyncio                       1.6.0
networkx                           3.3
numpy                              1.26.4
opencv-python                      4.10.0.84
opencv-python-headless             4.10.0.84
opentelemetry-api                  1.27.0
opentelemetry-sdk                  1.27.0
opentelemetry-semantic-conventions 0.48b0
optuna                             4.0.0
packaging                          24.1
pandas                             2.2.3
pandocfilters                      1.5.1
parso                              0.8.4
pathspec                           0.12.1
pickleshare                        0.7.5
pillow                             10.4.0
pip                                24.2
pipreqs                            0.5.0
platformdirs                       4.3.6
prompt_toolkit                     3.0.47
protobuf                           5.28.2
psutil                             6.0.0
pure_eval                          0.2.3
py-cpuinfo                         9.0.0
pyarrow                            17.0.0
pyasn1                             0.6.1
pyasn1_modules                     0.4.1
pycodestyle                        2.12.1
pyflakes                           3.2.0
Pygments                           2.18.0
pyparsing                          3.1.4
python-dateutil                    2.9.0.post0
python-dotenv                      1.0.1
pytz                               2024.2
pywin32                            306
PyYAML                             6.0.2
pyzmq                              26.2.0
ray                                2.37.0
referencing                        0.35.1
requests                           2.32.3
requests-toolbelt                  1.0.0
rich                               13.8.1
roboflow                           1.1.45
rpds-py                            0.20.0
rsa                                4.9
scikit-learn                       1.5.2
scipy                              1.14.1
seaborn                            0.13.2
sentry-sdk                         2.14.0
setproctitle                       1.3.3
setuptools                         75.1.0
shellingham                        1.5.4
six                                1.16.0
smmap                              5.0.1
soupsieve                          2.6
SQLAlchemy                         2.0.35
sqlparse                           0.5.1
stack-data                         0.6.3
sympy                              1.13.3
tensorboardX                       2.6.2.2
threadpoolctl                      3.5.0
tinycss2                           1.3.0
torch                              2.4.1+cu124
torchaudio                         2.4.1+cu124
torchvision                        0.19.1+cu124
tornado                            6.4.1
tqdm                               4.66.5
traitlets                          5.14.3
typer                              0.12.5
typing_extensions                  4.12.2
tzdata                             2024.1
ultralytics                        8.2.103
ultralytics-thop                   2.0.6
urllib3                            2.2.3
waitress                           3.0.0
wcwidth                            0.2.13
webencodings                       0.5.1
Werkzeug                           3.0.4
win32-setctime                     1.1.0
wrapt                              1.16.0
yarg                               0.1.9
zipp                               3.20.2

Reproduction script

from ultralytics import YOLO

model = YOLO("yolov8n.yaml").load("yolov8n.pt")
results = model.tune(data="coco8.yaml", epochs=100, imgsz=640, use_ray=True, gpu_per_trial=1)

Issue Severity

High: It blocks me from completing my task.

Metadata

Assignees: No one assigned

Labels: bug, triage, tune, windows

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions