Race condition when importing Menagerie descriptions concurrently from multiple processes #114

@hartikainen

Bug description

When I import MuJoCo Menagerie descriptions concurrently from more than one process, the git clone fails in all but one of them with a FileExistsError. I was expecting (or at least hoping) for this to work.

Reproduction steps

Steps to reproduce the bug:

  1. Install GNU parallel and robot_descriptions
  2. Run:
rm -rf ~/.cache/robot_descriptions/ && \
parallel --verbose -j 4 "
    python -c 'from robot_descriptions import fr3_mj_description; print({1}, fr3_mj_description.MJCF_PATH)'
" ::: {1..4}
  3. See error

Code

See above

Logs

Full error:
$ rm -rf ~/.cache/robot_descriptions/ && \
parallel --verbose -j 4 "
    python -c 'from robot_descriptions import fr3_mj_description; print({1}, fr3_mj_description.MJCF_PATH)'
" ::: {1..4}

    python -c 'from robot_descriptions import fr3_mj_description; print(1, fr3_mj_description.MJCF_PATH)'


    python -c 'from robot_descriptions import fr3_mj_description; print(2, fr3_mj_description.MJCF_PATH)'


    python -c 'from robot_descriptions import fr3_mj_description; print(3, fr3_mj_description.MJCF_PATH)'


    python -c 'from robot_descriptions import fr3_mj_description; print(4, fr3_mj_description.MJCF_PATH)'

Cloning https://github.com/deepmind/mujoco_menagerie.git...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/fr3_mj_description.py", line 14, in <module>
    REPOSITORY_PATH: str = _clone_to_cache(
                           ^^^^^^^^^^^^^^^^
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 138, in clone_to_cache
    clone = clone_to_directory(
            ^^^^^^^^^^^^^^^^^^^
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 80, in clone_to_directory
    os.makedirs(target_dir)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Users/user/.cache/robot_descriptions/mujoco_menagerie'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/fr3_mj_description.py", line 14, in <module>
    REPOSITORY_PATH: str = _clone_to_cache(
                           ^^^^^^^^^^^^^^^^
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 138, in clone_to_cache
    clone = clone_to_directory(
            ^^^^^^^^^^^^^^^^^^^
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 80, in clone_to_directory
    os.makedirs(target_dir)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Users/user/.cache/robot_descriptions/mujoco_menagerie'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/fr3_mj_description.py", line 14, in <module>
    REPOSITORY_PATH: str = _clone_to_cache(
                           ^^^^^^^^^^^^^^^^
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 138, in clone_to_cache
    clone = clone_to_directory(
            ^^^^^^^^^^^^^^^^^^^
  File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 80, in clone_to_directory
    os.makedirs(target_dir)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Users/user/.cache/robot_descriptions/mujoco_menagerie'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
4 /Users/user/.cache/robot_descriptions/mujoco_menagerie/franka_fr3/fr3.xml
100%|██████████| 1058.0/1058.0 [00:24<00:00, 43.20it/s]
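
The traceback looks like a check-then-create race: presumably each process finds the cache directory missing and then calls os.makedirs, which can only succeed for one of them. Below is a minimal sketch of that failure mode without git or robot_descriptions. The helper name `create_if_missing` and the temporary path are made up for illustration, and I have not verified that this is exactly what `_cache.py` does.

```python
import os
import tempfile


def create_if_missing(target_dir: str, saw_missing: bool) -> str:
    """Mimic check-then-create; `saw_missing` is the (possibly stale) check result."""
    if not saw_missing:
        return "already cached"
    try:
        os.makedirs(target_dir)  # fails if another process created it in between
        return "created"
    except FileExistsError:
        return "FileExistsError"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "mujoco_menagerie")
    # Both "processes" run their existence check before either creates the directory.
    first_saw_missing = not os.path.exists(target)
    second_saw_missing = not os.path.exists(target)
    results = [
        create_if_missing(target, first_saw_missing),
        create_if_missing(target, second_saw_missing),
    ]
    print(results)  # ['created', 'FileExistsError']
```

Passing exist_ok=True to os.makedirs would silence this particular error, but the later git clone into an already populated directory would likely still fail, so some form of mutual exclusion seems necessary.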

System

  • OS: macOS Sequoia 15.0.1
  • robot_descriptions version: 1.13.0

Additional context

I spent a bit of time trying to get a locking mechanism to work for clone_to_cache. I never finished it, but below is what I had so far. I'm putting it here in case someone finds it useful and wants to pick it up.

`clone_to_cache_with_lock`
import os
import pathlib
import subprocess
import time
from typing import Optional

# clone_to_cache lives in robot_descriptions/_cache.py, per the traceback above.
from robot_descriptions._cache import clone_to_cache


def clone_to_cache_with_lock(description_name: str, commit: Optional[str] = None) -> str:
    # ROBOT_DESCRIPTIONS_CACHE is assumed to name the cache directory, so the
    # lock file is placed inside it rather than replacing it.
    lock_file_path = (
        pathlib.Path(
            os.environ.get(
                "ROBOT_DESCRIPTIONS_CACHE",
                "~/.cache/robot_descriptions",
            )
        ).expanduser()
        / "CLONE_LOCK"
    )
    lock_file_path.parent.mkdir(parents=True, exist_ok=True)

    print(f"{lock_file_path=}")

    timeout = 300  # seconds to wait before treating the lock as stale
    poll_every = 5  # seconds between attempts
    start_time = time.time()
    while True:
        try:
            # Attempt to create the lock file atomically
            lock_fd = os.open(str(lock_file_path), os.O_CREAT | os.O_EXCL | os.O_RDWR)
        except FileExistsError:
            # Lock file exists, wait for cloning to complete
            print(f"Process {os.getpid()} detected cloning in progress. Waiting...")
            time.sleep(poll_every)
            if time.time() - start_time > timeout:
                print(f"Process {os.getpid()} timeout exceeded. Removing stale lock file.")
                lock_file_path.unlink(missing_ok=True)
                start_time = time.time()  # restart the clock after clearing the lock
            continue
        except Exception as e:
            print(f"Process {os.getpid()} encountered an error: {e}")
            raise

        # The lock is held. The clone runs outside the `except FileExistsError`
        # above so that an error from the clone itself is not mistaken for a held lock.
        try:
            print(f"Process {os.getpid()} is cloning the repository...")
            path = clone_to_cache(description_name, commit)
            print(f"Process {os.getpid()} completed cloning.")
            return path
        except subprocess.CalledProcessError:
            print(f"Process {os.getpid()} failed to clone the repository.")
            raise
        finally:
            os.close(lock_fd)
            lock_file_path.unlink(missing_ok=True)
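
One weakness of the lock-file approach above is the stale lock: if the process holding it dies, everyone else waits for the timeout, and removing the file after a timeout can also evict a holder that is merely slow. A possible alternative, sketched here as an assumption rather than a tested fix, is an advisory lock via fcntl.flock, which the OS releases when the holding process exits. The name `exclusive_lock` is hypothetical, and fcntl is only available on Unix-like systems (macOS and Linux).

```python
import contextlib
import fcntl
import os
import pathlib
import tempfile
from typing import Iterator


@contextlib.contextmanager
def exclusive_lock(lock_path: pathlib.Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on `lock_path` for the duration of the block."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(str(lock_path), os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Demonstration with a temporary lock file instead of the real cache directory.
with tempfile.TemporaryDirectory() as tmp:
    lock_path = pathlib.Path(tmp) / "CLONE_LOCK"
    with exclusive_lock(lock_path):
        # clone_to_cache(description_name, commit) would be called here.
        # A second, independent open of the same file cannot take the lock:
        other_fd = os.open(str(lock_path), os.O_RDWR)
        try:
            fcntl.flock(other_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            contended = False
        except BlockingIOError:
            contended = True
        finally:
            os.close(other_fd)
    print(contended)
```

With this in place, each process would wrap its clone_to_cache call in `with exclusive_lock(...)`. Processes that acquire the lock later would rely on clone_to_cache finding the repository already in the cache and returning its path, which I have not verified.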
