Closed
Description
Bug description
When importing MuJoCo Menagerie descriptions concurrently from more than one process, the git clone fails. I was expecting (or at least hoping) that this would work.
Reproduction steps
Steps to reproduce the bug:
- Install GNU parallel and robot_descriptions
- Run:
rm -rf ~/.cache/robot_descriptions/ && \
parallel --verbose -j 4 "
python -c 'from robot_descriptions import fr3_mj_description; print({1}, fr3_mj_description.MJCF_PATH)'
" ::: {1..4}
- See error
Code
See above
Logs
Full error:
$ rm -rf ~/.cache/robot_descriptions/ && \
$ parallel --verbose -j 4 "
python -c 'from robot_descriptions import fr3_mj_description; print({1}, fr3_mj_description.MJCF_PATH)'
" ::: {1..4}
python -c 'from robot_descriptions import fr3_mj_description; print(1, fr3_mj_description.MJCF_PATH)'
python -c 'from robot_descriptions import fr3_mj_description; print(2, fr3_mj_description.MJCF_PATH)'
python -c 'from robot_descriptions import fr3_mj_description; print(3, fr3_mj_description.MJCF_PATH)'
python -c 'from robot_descriptions import fr3_mj_description; print(4, fr3_mj_description.MJCF_PATH)'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/fr3_mj_description.py", line 14, in <module>
REPOSITORY_PATH: str = _clone_to_cache(
^^^^^^^^^^^^^^^^
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 138, in clone_to_cache
clone = clone_to_directory(
^^^^^^^^^^^^^^^^^^^
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 80, in clone_to_directory
os.makedirs(target_dir)
File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Users/user/.cache/robot_descriptions/mujoco_menagerie'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/fr3_mj_description.py", line 14, in <module>
REPOSITORY_PATH: str = _clone_to_cache(
^^^^^^^^^^^^^^^^
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 138, in clone_to_cache
clone = clone_to_directory(
^^^^^^^^^^^^^^^^^^^
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 80, in clone_to_directory
os.makedirs(target_dir)
File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Users/user/.cache/robot_descriptions/mujoco_menagerie'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/fr3_mj_description.py", line 14, in <module>
REPOSITORY_PATH: str = _clone_to_cache(
^^^^^^^^^^^^^^^^
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 138, in clone_to_cache
clone = clone_to_directory(
^^^^^^^^^^^^^^^^^^^
File "/Users/user/github/Code/code-worktree-3/.venv/lib/python3.12/site-packages/robot_descriptions/_cache.py", line 80, in clone_to_directory
os.makedirs(target_dir)
File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Users/user/.cache/robot_descriptions/mujoco_menagerie'
Cloning https://github.com/deepmind/mujoco_menagerie.git...
4 /Users/user/.cache/robot_descriptions/mujoco_menagerie/franka_fr3/fr3.xml
100%|██████████| 1058.0/1058.0 [00:24<00:00, 43.20it/s]
System
- OS: macOS Sequoia 15.0.1
- robot_descriptions version: 1.13.0
Additional context
I spent a bit of time trying to get a locking mechanism working for clone_to_cache. I never finished it, but below is what I had so far. I'm putting it here in case someone finds it useful and wants to pick it up.
`clone_to_cache_with_lock`
import os
import pathlib
import subprocess
import time
from typing import Optional

from robot_descriptions._cache import clone_to_cache


def clone_to_cache_with_lock(description_name: str, commit: Optional[str] = None) -> str:
    # Keep the lock file inside the cache directory, so that setting
    # ROBOT_DESCRIPTIONS_CACHE does not turn the directory itself into the lock path
    cache_dir = pathlib.Path(
        os.environ.get("ROBOT_DESCRIPTIONS_CACHE", "~/.cache/robot_descriptions")
    ).expanduser()
    lock_file_path = cache_dir / "CLONE_LOCK"
    lock_file_path.parent.mkdir(parents=True, exist_ok=True)
    print(f"{lock_file_path=}")
    timeout = 300  # seconds before a lock file is considered stale
    poll_every = 5  # seconds between attempts to take the lock
    start_time = time.time()
    while True:
        try:
            # Attempt to create the lock file atomically
            lock_fd = os.open(str(lock_file_path), os.O_CREAT | os.O_EXCL | os.O_RDWR)
        except FileExistsError:
            # Lock file exists, wait for cloning to complete
            print(f"Process {os.getpid()} detected cloning in progress. Waiting...")
            time.sleep(poll_every)
            if time.time() - start_time > timeout:
                print(f"Process {os.getpid()} timeout exceeded. Removing stale lock file.")
                lock_file_path.unlink(missing_ok=True)
                start_time = time.time()  # restart the clock after removing the lock
            continue
        # The lock is held from here on. Cloning happens outside the try block above,
        # so a FileExistsError raised by the clone itself is not mistaken for a held lock.
        try:
            print(f"Process {os.getpid()} is cloning the repository...")
            path = clone_to_cache(description_name, commit)
            print(f"Process {os.getpid()} completed cloning.")
            return path
        except subprocess.CalledProcessError:
            print(f"Process {os.getpid()} failed to clone the repository.")
            raise
        finally:
            os.close(lock_fd)
            # Another process may already have removed the lock as stale
            lock_file_path.unlink(missing_ok=True)
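One weakness of a lock file created with O_EXCL is that a crashed process leaves it behind, which is why the draft needs a timeout and stale-lock removal. A possible alternative is an advisory lock taken with fcntl.flock, which the kernel releases when the holding process exits. The following is only a minimal sketch under stated assumptions: it is Unix-only, the helper names exclusive_file_lock and run_exclusively are hypothetical and not part of robot_descriptions, and it assumes that clone_to_cache returns without cloning again once the repository is already in the cache (not verified here).

```python
import contextlib
import fcntl
import os
import pathlib
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextlib.contextmanager
def exclusive_file_lock(lock_path: pathlib.Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path (hypothetical helper, Unix only)."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    # The lock file itself is never deleted; only the flock held on it matters.
    fd = os.open(str(lock_path), os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until every other holder has released the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock, also if the body raised.
        os.close(fd)


def run_exclusively(lock_path: pathlib.Path, action: Callable[[], T]) -> T:
    """Run action() while holding the lock and return its result (hypothetical helper)."""
    with exclusive_file_lock(lock_path):
        return action()
```

Under those assumptions, usage could look like run_exclusively(cache_dir / "CLONE_LOCK", lambda: clone_to_cache(description_name, commit)), where cache_dir is the cache directory. Each process then waits its turn instead of polling, and no stale-lock cleanup is needed because the lock does not outlive the process that holds it.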