
dvc checkout: permission errors with hardlinks and shared cache #10057

Open
montry25 opened this issue Oct 28, 2023 · 0 comments
Labels
A: data-management Related to dvc add/checkout/commit/move/remove research

@montry25

Bug Report

Issue name

dvc checkout: permission errors with hardlinks and shared cache

Description

While working with hardlinks/symlinks and a shared cache, DVC does not allow users other than the owner to check out data (more precisely, to create links between the cache and their workspace), even if they belong to the same group as the cache files.

Reproduce

0. Set up users, group, and DVC cache path:

  • Users: max, egor, storage; home folders are located in /home/{user}, mounted on the same volume
  • All users belong to group scoring
  • /home/storage and all its contents belong to group scoring

1. Create the DVC cache directory with the necessary permissions and setgid:

  • mkdir /home/storage/dvc_cache_test
  • sudo chown :scoring /home/storage/dvc_cache_test
  • sudo chmod -R g+s /home/storage/dvc_cache_test
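Step 1 relies on the setgid bit: on Linux, files and subdirectories created inside a setgid directory inherit that directory's group (here scoring) instead of the creator's primary group. A small standard-library sketch to confirm the bit is in place; the helper name describe_dir is hypothetical, not part of DVC:

```python
import grp
import os
import stat


def describe_dir(path: str) -> tuple[str, str, bool]:
    """Return (ls-style mode string, group name, setgid flag) for a path."""
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid has no entry in the group database
        group = str(st.st_gid)
    return stat.filemode(st.st_mode), group, bool(st.st_mode & stat.S_ISGID)


# drwxrwsr-x (the mode on the cache directories in step 3) is 0o2775:
# the "s" in the group-execute slot means setgid plus group execute.
print(stat.filemode(stat.S_IFDIR | 0o2775))

if __name__ == "__main__":
    cache_dir = "/home/storage/dvc_cache_test"  # path from step 1
    if os.path.isdir(cache_dir):
        print(describe_dir(cache_dir))
```

Expect the flag to be True and the group to be scoring for every directory under the cache.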

2. Initialize the repo and configure the DVC cache:

  • git init
  • dvc init
  • dvc config --local cache.dir /home/storage/dvc_cache_test
  • dvc config --local cache.type hardlink
  • dvc config --local cache.shared group

3. Add data to the cache:

  • echo "some data" > test.txt
  • dvc add test.txt
  • git add, commit, and push
  • Check cache permissions with find /home/storage/dvc_cache_test -type d -exec ls -al {} +:
/home/storage/dvc_cache_test:
total 12
drwxrwsr-x+  3 max     scoring 4096 Oct 28 12:35 .
drwxrws---+ 11 storage scoring 4096 Oct 28 10:59 ..
drwxrwsr-x+  3 max     scoring 4096 Oct 28 12:35 files

/home/storage/dvc_cache_test/files:
total 12
drwxrwsr-x+ 3 max scoring 4096 Oct 28 12:35 .
drwxrwsr-x+ 3 max scoring 4096 Oct 28 12:35 ..
drwxrwsr-x+ 3 max scoring 4096 Oct 28 13:01 md5

/home/storage/dvc_cache_test/files/md5:
total 12
drwxrwsr-x+ 3 max scoring 4096 Oct 28 13:01 .
drwxrwsr-x+ 3 max scoring 4096 Oct 28 12:35 ..
drwxrwsr-x+ 2 max scoring 4096 Oct 28 12:35 5f

/home/storage/dvc_cache_test/files/md5/5f:
total 12
drwxrwsr-x+ 2 max scoring 4096 Oct 28 12:35 .
drwxrwsr-x+ 3 max scoring 4096 Oct 28 13:01 ..
-r--r--r--  2 max scoring   10 Oct 28 11:18 ebbef14389ebcfc3e501fa1091adcb
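The listing shows the cache layout: each object sits under files/md5/, in a subdirectory named after the first two hex digits of its MD5, with the remaining 30 digits as the file name. A sketch that rebuilds that path; cache_object_path and file_md5 are hypothetical helpers, and hashing the raw file bytes is an assumption that fits a plain file like test.txt but may not match DVC's hash for every kind of output:

```python
import hashlib
import os


def cache_object_path(cache_dir: str, md5_hex: str) -> str:
    """Map an MD5 digest to its location in the cache layout shown above."""
    return os.path.join(cache_dir, "files", "md5", md5_hex[:2], md5_hex[2:])


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's raw bytes, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Digest taken from the listing: directory "5f" + file "ebbef14389ebcfc3e501fa1091adcb".
print(cache_object_path("/home/storage/dvc_cache_test", "5febbef14389ebcfc3e501fa1091adcb"))
```

The link count of 2 on that object is consistent with max's hardlinked test.txt in his workspace being the second name for the same inode.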

4. Switch user and try to get the data from the cache:

  • Switch to user egor, git clone the repo, and configure the DVC cache as above
  • dvc checkout --verbose
2023-10-28 13:05:02,608 DEBUG: v3.27.0 (conda), CPython 3.12.0 on Linux-5.4.0-153-generic-x86_64-with-glibc2.31                              
2023-10-28 13:05:02,608 DEBUG: command: /home/egor/.conda/envs/dvc_test/bin/dvc checkout --verbose                                           
Building workspace index                                                                                           |0.00 [00:00,    ?entry/s]
Comparing indexes                                                                                                 |2.00 [00:00, 1.45kentry/s]
2023-10-28 13:05:02,869 DEBUG: failed to create '/home/egor/dvc_test/test.txt' from '/home/storage/dvc_cache_test/files/md5/5f/ebbef14389ebcfc3e501fa1091adcb' - [Errno 95] no more link types left to try out: [Errno 1] Operation not permitted: '/home/storage/dvc_cache_test/files/md5/5f/ebbef14389ebcfc3e501fa1091adcb' -> '/home/egor/dvc_test/test.txt'
Traceback (most recent call last):
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 273, in _try_links
    _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 62, in _link
    func(from_path, to_path)
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/base.py", line 387, in link
    return self.fs.link(from_info, to_info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/local.py", line 168, in link
    return system.hardlink(path1, path2)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/system.py", line 32, in hardlink
    os.link(src, link_name)
PermissionError: [Errno 1] Operation not permitted: '/home/storage/dvc_cache_test/files/md5/5f/ebbef14389ebcfc3e501fa1091adcb' -> '/home/egor/dvc_test/test.txt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 331, in transfer
    _try_links(
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 290, in _try_links
    raise OSError(errno.ENOTSUP, "no more link types left to try out") from error
OSError: [Errno 95] no more link types left to try out

Applying changes                                                                                                   |0.00 [00:00,     ?file/s]
2023-10-28 13:05:02,870 DEBUG: Removing '/home/egor/dvc_test/test.txt'
2023-10-28 13:05:02,870 ERROR: Checkout failed for following targets:
test.txt
Is your cache up to date?
<https://error.dvc.org/missing-files>
Traceback (most recent call last):
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc/cli/__init__.py", line 209, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc/commands/checkout.py", line 54, in run
    raise exc
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc/commands/checkout.py", line 34, in run
    stats = self.repo.checkout(
            ^^^^^^^^^^^^^^^^^^^
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc/repo/__init__.py", line 61, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/egor/.conda/envs/dvc_test/lib/python3.12/site-packages/dvc/repo/checkout.py", line 207, in checkout
    raise CheckoutError([relpath(out_path) for out_path in failed], stats)
dvc.exceptions.CheckoutError: Checkout failed for following targets:
test.txt
Is your cache up to date?
<https://error.dvc.org/missing-files>

2023-10-28 13:05:02,872 DEBUG: Analytics is enabled.
2023-10-28 13:05:02,898 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpc4u_hu01']'
2023-10-28 13:05:02,900 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpc4u_hu01']'
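The traceback shows the shape of the failure: _try_links in dvc_objects/fs/generic.py tries each configured link type and, once none is left, raises OSError(errno.ENOTSUP, "no more link types left to try out") chained from the last error. Since step 2 configures only hardlink, a single PermissionError from os.link exhausts the list. A simplified, hypothetical re-creation of that fallback pattern (not DVC's actual code; try_links and the funcs mapping are illustrative names):

```python
import errno
from typing import Callable, Dict, List, Optional


def try_links(links: List[str], funcs: Dict[str, Callable[[str, str], None]],
              src: str, dst: str) -> str:
    """Try each link type in order and return the name of the one that worked."""
    last_error: Optional[OSError] = None
    for name in links:
        try:
            funcs[name](src, dst)
            return name
        except OSError as exc:  # e.g. PermissionError (EPERM) from os.link
            last_error = exc
    # Mirrors the message in the log; "from" keeps the original error as __cause__.
    raise OSError(errno.ENOTSUP, "no more link types left to try out") from last_error


if __name__ == "__main__":
    def deny_hardlink(src: str, dst: str) -> None:
        # Stand-in for what os.link raised for egor in the log above.
        raise PermissionError(errno.EPERM, "Operation not permitted")

    try:
        try_links(["hardlink"], {"hardlink": deny_hardlink}, "cache_obj", "test.txt")
    except OSError as exc:
        print(exc, "<-", repr(exc.__cause__))
```

With more link types configured, a later entry (for example a copy) would be tried before giving up.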

Expected

Checkout succeeds, as it does for the file owner (dvc checkout):

Building workspace index                                                                                           |0.00 [00:00,    ?entry/s]
Comparing indexes                                                                                                 |2.00 [00:00, 1.47kentry/s]
Applying changes                                                                                                   |1.00 [00:00,   193file/s]
A       test.txt

Environment information

Output of dvc doctor:

DVC version: 3.27.0 (conda)                                                                                                                  
---------------------------                                                                                                                  
Platform: Python 3.12.0 on Linux-5.4.0-153-generic-x86_64-with-glibc2.31                                                                     
Subprojects:                                                                                                                                 
        dvc_data = 2.18.1                                                                                                                    
        dvc_objects = 1.0.1                                                                                                                  
        dvc_render = 0.6.0                                                                                                                   
        dvc_task = 0.3.0                                                                                                                     
        scmrepo = 1.4.0                                                                                                                      
Supports:                                                                                                                                    
        http (aiohttp = 3.9.0b0, aiohttp-retry = 2.8.3),                                                                                     
        https (aiohttp = 3.9.0b0, aiohttp-retry = 2.8.3)                                                                                     
Config:                                                                                                                                      
        Global: /home/egor/.config/dvc                                                                                                       
        System: /etc/xdg/dvc                                                                                                                 
Cache types: hardlink, symlink                                                                                                               
Cache directory: ext4 on /dev/sda1                                                                                                           
Caches: local                                                                                                                                
Remotes: None                                                                                                                                
Workspace directory: ext4 on /dev/sda1                                                                                                       
Repo: dvc, git                                                                                                                               
Repo.site_cache_dir: /var/tmp/dvc/repo/c384d60d3f3370db552be3c1646fb9ec   

Additional Information (if any):

P.S. Some bad ways to work around the problem:

  • run DVC commands as root
  • change group permissions on all cache files to rwx: this contradicts the 444 mode that cache files currently need and leads to issues with other commands (like dvc exp run)
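A likely explanation for the [Errno 1] Operation not permitted from os.link, offered as an assumption rather than a confirmed diagnosis: when the Linux sysctl fs.protected_hardlinks is 1 (systemd ships that default), the kernel refuses with EPERM to let a user hardlink a file they do not own unless it is a regular, non-setuid file they can both read and write. The cache object is mode 444, so egor can read it but not write it. That would also explain why both workarounds above help: group write access satisfies the check, and root passes it. A simplified model of the rule (ignores ACLs, user namespaces and capabilities; all function names are hypothetical):

```python
import stat
from typing import Optional


def read_protected_hardlinks(path: str = "/proc/sys/fs/protected_hardlinks") -> Optional[int]:
    """Current sysctl value on Linux, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def group_member_access(st_mode: int) -> tuple[bool, bool]:
    """(read, write) for a user in the file's group who is not its owner."""
    return bool(st_mode & stat.S_IRGRP), bool(st_mode & stat.S_IWGRP)


def hardlink_allowed(st_mode: int, owner_uid: int, caller_uid: int,
                     can_read: bool, can_write: bool, protected: int = 1) -> bool:
    """Simplified model of the kernel's protected_hardlinks check."""
    if not protected or caller_uid == owner_uid:
        return True  # check disabled, or the caller owns the file
    if not stat.S_ISREG(st_mode):
        return False  # special files are never "safe" sources
    if st_mode & stat.S_ISUID:
        return False  # setuid files
    if (st_mode & (stat.S_ISGID | stat.S_IXGRP)) == (stat.S_ISGID | stat.S_IXGRP):
        return False  # executable setgid files
    return can_read and can_write  # must be readable AND writable by the caller


MAX, EGOR = 1001, 1002  # hypothetical uids
cache_mode = stat.S_IFREG | 0o444  # -r--r--r--, as in the listing
print(hardlink_allowed(cache_mode, MAX, EGOR, *group_member_access(cache_mode)))
group_rw = stat.S_IFREG | 0o464  # group read+write added
print(hardlink_allowed(group_rw, MAX, EGOR, *group_member_access(group_rw)))
print("fs.protected_hardlinks =", read_protected_hardlinks())
```

Under this model the first call is refused and the second allowed, matching what the report describes for the 444 cache file versus group-writable files.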
@dberenbaum added the p1-important and A: data-management labels and removed the p1-important label on Dec 1, 2023
@efiop assigned efiop and skshetry and unassigned efiop on Dec 5, 2023
@skshetry removed their assignment on Mar 4, 2024
Projects
No open projects
Status: Backlog
Development

No branches or pull requests

5 participants