Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add : deletes files from the cache on NAS servers on Windows when duplicate images are present #6368

Closed
louistransfer opened this issue Jul 28, 2021 · 11 comments
Labels
awaiting response we are waiting for your reply, please respond! :) P: windows Related to the Platform: Windows

Comments

@louistransfer
Copy link

louistransfer commented Jul 28, 2021

Bug Report

Description

With @anasitomtn, we have been working on using DVC on a Windows NAS server using a NTFS file system. One of our data scientists reported a strange issue when he started to use DVC, as files which were supposed to be copied to the cache completely disappeared when he used the dvc add command.

We managed to narrow the issue down. Initially we couldn't manage to reproduct the issue with the same dvc and python versions on windows and with different images. However, when we used the same images as him, the issue appeared again. Reducing the images folder to only 2 duplicate images yielded the bug.

We are also investigating an issue with links on a Windows NAS which may or may not be linked to this.

There appears to be an issue with os.rename (see the "Additional information" section). A theory of ours is that when a duplicate is present, a cache file is created by dvc with a hash for the first duplicate image. For an unknown reason DVC supposes that all hashes are unique when building the cache, but when it tries to create a cache file for the second duplicate image, it fails as it has insufficient permissions to replace this existing cache file with the new one (which has the exact same name as the hash is deterministic). However this hypothesis needs to be confirmed. Please note that all files are removed from the original folder in any case (not only the duplicates).

This issue from 2019 appears to have a similar configuration, however it is run from Ubuntu and not from Windows (in our case, there are no issues on Linux). I do not think that the original issue was fixed, instead the ticket was closed when another bug related to "dvc version" evoked in the ticket thread was fixed.

Fortunately, all of our tests were run on test data, but we believe that this bug can be very dangerous for data scientists who would want to run experiments from production data on a NAS, as it can happen any time in the dvc workflow (before any push to a S3 remote for instance).
Even worse, the bug wipes any batch of images from the workspace as long as they are included in the new dvc add : if you do a "dvc add images_folder" after having added 1000 images containing only two duplicates to a folder which is already tracked by dvc , the 1000 images will be deleted from the workspace and will not be added to the cache. If a lot of images are already present in the workspace, the DS may never notice that those new images have disappeared. If a production pipeline is run on a windows NAS with dvc add commands for ML experiments, some images could disappear silently.

Reproduce

The results of the two scenarios are the same, we added them for easier reproduction.

1 - Initialisation situation

  1. Get in an empty folder on the NAS server
  2. Put an "images" folder inside containing at least two image duplicates (and other images if you want)
  3. git init
  4. dvc init

The project should look like this :

PROJECT
    |
    ____ images/
    |
    ____ .dvc/
    |
    ____ .git/
  1. dvc add images/ -v

2 - Adding a new batch of images

  1. Put a batch of images containing at least two duplicates inside the project folder (it doesn't matter if the files are added to the existing "images" folder or in a new "images_2" folder).
    The project should look like this :
PROJECT
    |
    ____ images/
    |
    ____ .dvc/
    |
    ____ .git/

or

PROJECT
    |
    ____ images/
    |
    ____ images_2/
    |
    ____ .dvc/
    |
    ____ .git/
  1. dvc add images/ -v

Expected

The images should be moved to the cache without being removed from the workspace.

DVC should at least output an error when it fails to copy the files to the cache and not touch any of the original files.

Environment information

Any version of DVC and python running on Windows, on a NAS server.

2021-07-28 17:40:28,766 DEBUG: Version info for developers:
DVC version: 2.5.4 (pip)

Platform: Python 3.7.10 on Windows-10-10.0.17763-SP0
Supports:
        http (requests = 2.26.0),
        https (requests = 2.26.0)
Cache types:
Cache directory: ('unknown', 'none')
Caches: local
Remotes: None
Workspace directory: ('unknown', 'none')
Repo: dvc, git

Additional Information (if any):

Here are the logs (note : "accès refusé" means "access not permitted") :

2021-07-28 15:33:50,749 DEBUG: Removing 'random_images\test\img_with_labels_batch_0_img_2.jpg'
2021-07-28 15:33:50,801 DEBUG: state save (3148873755809107765, 1624811388452225792, 25167) 86054b29dbbdec89e748fe596058c843
2021-07-28 15:33:50,902 DEBUG: 'random_images\test\img_with_labels_batch_0_img_6.jpg' file already exists, skipping
2021-07-28 15:33:50,903 DEBUG: Removing 'random_images\test\img_with_labels_batch_0_img_6.jpg'
2021-07-28 15:33:50,934 DEBUG: state save (3150879666208347930, 1624811388499105280, 2259) d00fece80aab30a39148e4418ce4ca6a
2021-07-28 15:33:51,182 ERROR: unexpected error - [WinError -2147024891] Acc▒s refus▒


Traceback (most recent call last):
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\shutil.py", line 566, in move
    os.rename(src, real_dst)
PermissionError: [WinError 32] Le processus ne peut pas acc▒der au fichier car ce fichier est utilis▒ par un autre processus: 'N:\\Projets01\\STAGE\\test_dvc_isma_5\\.dvc\\cache\\d0\\0fece80aab30a39148e4418ce4ca6a.a3eEysHxDTQD2RF3DvqSdQ' -> 'N:\\Projets01\\STAGE\\test_dvc_isma_5\\.dvc\\cache\\d0\\0fece80aab30a39148e4418ce4ca6a'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\main.py", line 55, in main
    ret = cmd.do_run()
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\command\base.py", line 50, in do_run
    return self.run()
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\command\add.py", line 32, in run
    jobs=self.args.jobs,
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\repo\__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\repo\add.py", line 131, in add
    **kwargs,
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\repo\add.py", line 195, in _process_stages
    stage.commit()
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\stage\__init__.py", line 492, in commit
    out.commit(filter_info=filter_info)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\output.py", line 567, in commit
    objects.save(self.odb, obj)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\objects\__init__.py", line 29, in save
    future.result()
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\concurrent\futures\_base.py", line 428, in result
    return self.__get_result()
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\objects\db\base.py", line 59, in add
    self.fs.move(path_info, cache_info)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\fs\local.py", line 97, in move
    move(from_info, to_info)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\dvc\utils\fs.py", line 110, in move
    shutil.move(tmp, dst)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\shutil.py", line 580, in move
    copy_function(src, real_dst)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\shutil.py", line 266, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "c:\users\elatifi\miniforge3\envs\py37-tf250\lib\site-packages\speedcopy\__init__.py", line 289, in copyfile
    '\\\\?\\' + dest_file, None)
  File "_ctypes/callproc.c", line 922, in GetResult
PermissionError: [WinError -2147024891] Acc▒s refus▒`
@efiop efiop added the P: windows Related to the Platform: Windows label Jul 28, 2021
@efiop
Copy link
Contributor

efiop commented Jul 28, 2021

Hi @louistransfer !

Could you try pip uninstall speedcopy(make sure that it is no longer present in the same env that dvc is installed) and try this again, please?

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Jul 28, 2021
@louistransfer
Copy link
Author

Hi @efiop !

We executed the command, then the command pip freeze | grep -i speedcopy is empty as expected on git bash on the environment but we still experience the bug.

The output has slightly changed however, as the WinErrors "Accès refusé" have turned into "[Errno 13] Permission denied":

2021-07-29 09:12:36,905 DEBUG: Check for update is enabled.
2021-07-29 09:12:36,984 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-07-29 09:12:37,140 DEBUG: Spawned '['daemon', '-q', 'updater']'
2021-07-29 09:12:37,905 DEBUG: Adding 'duplicate_images' to '.gitignore'.
2021-07-29 09:12:38,140 DEBUG: state save (3148487627879621958, 1624811388452225792, 25167) 86054b29dbbdec89e748fe596058c843
2021-07-29 09:12:38,140 DEBUG: state save (3149166022054064103, 1624811388468200704, 25167) 86054b29dbbdec89e748fe596058c843
2021-07-29 09:12:38,452 DEBUG: state save (3152817586669890752, 1627542758384071168, 186) 78f9ee65de388114d9519f727d353216.dir
2021-07-29 09:12:38,483 DEBUG: state save (3151620508966472395, 305f694047fd278c51c13d4cc0f4b798, 50334) 78f9ee65de388114d9519f727d353216.dir
2021-07-29 09:12:38,483 DEBUG: {'duplicate_images': 'modified'}
2021-07-29 09:12:38,811 DEBUG: Computed stage: 'duplicate_images.dvc' md5: 'None'
2021-07-29 09:12:39,171 DEBUG: state save (3149166022054064103, 1624811388468200704, 25167) 86054b29dbbdec89e748fe596058c843
2021-07-29 09:12:39,374 ERROR: unexpected error - [Errno 13] Permission denied: 'N:\\Projets01\\STAGE\\DE\\Louis-Anas\\tests-dvc\\tests-anas\\test_bug_no_speedcopy\\.dvc\\cache\\86\\054b29dbbdec89e748fe596058c843'

Traceback (most recent call last):
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\shutil.py", line 566, in move
    os.rename(src, real_dst)
PermissionError: [WinError 5] Acc▒s refus▒: 'N:\\Projets01\\STAGE\\DE\\Louis-Anas\\tests-dvc\\tests-anas\\test_bug_no_speedcopy\\.dvc\\cache\\86\\054b29dbbdec89e748fe596058c843.4TE6eucqdqRnCbPYUw4ZmT' -> 'N:\\Projets01\\STAGE\\DE\\Louis-Anas\\tests-dvc\\tests-anas\\test_bug_no_speedcopy\\.dvc\\cache\\86\\054b29dbbdec89e748fe596058c843'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\main.py", line 55, in main
    ret = cmd.do_run()
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\command\base.py", line 50, in do_run
    return self.run()
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\command\add.py", line 32, in run
    jobs=self.args.jobs,
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\utils\collections.py", line 128, in inner
    result = func(*ba.args, **ba.kwargs)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\repo\__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\repo\add.py", line 187, in add
    stage.commit()
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\stage\__init__.py", line 507, in commit
    out.commit(filter_info=filter_info)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\output.py", line 576, in commit
    objects.save(self.odb, obj)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\objects\__init__.py", line 39, in save
    future.result()
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\concurrent\futures\_base.py", line 428, in result
    return self.__get_result()
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\objects\db\base.py", line 78, in add
    self.fs.move(path_info, cache_info)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\fs\local.py", line 97, in move
    move(from_info, to_info)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\site-packages\dvc\utils\fs.py", line 114, in move
    shutil.move(tmp, dst)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\shutil.py", line 580, in move
    copy_function(src, real_dst)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\shutil.py", line 266, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "C:\Users\CHETOUA\Miniconda3\envs\env-py3.7\lib\shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: 'N:\\Projets01\\STAGE\\DE\\Louis-Anas\\tests-dvc\\tests-anas\\test_bug_no_speedcopy\\.dvc\\cache\\86\\054b29dbbdec89e748fe596058c843'

@efiop
Copy link
Contributor

efiop commented Jul 29, 2021

@louistransfer Looks like you don't have rights to write files in that location. Could you check if you can create files in N:\\Projets01\\STAGE\\DE\\Louis-Anas\\tests-dvc\\tests-anas\\test_bug_no_speedcopy\\?

@louistransfer
Copy link
Author

@eliop I managed to create a new txt file in the directory and I also managed to move image files inside so I have the rights to write files in that location. Maybe python uses different permissions ?

@cubrink
Copy link

cubrink commented Aug 3, 2021

I am also running into this problem. I can reproduce the error as described and I get the same verbose output as @louistransfer.

I've done some debugging and I'm thinking it is some sort of a race condition. While debugging I saw this error which wasn't shown in the verbose output:

[WinError 183] Cannot create a file when that file already exists: '<path_to_my_dvc_project>\\.dvc\\cache\\f6\\5bcf2182da5af309d2b30c77f79350.dKvQKmJbSYaXVe6gD9k2j6' -> '<path_to_my_dvc_project>\\.dvc\\cache\\f6\\5bcf2182da5af309d2b30c77f79350'
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\shutil.py", line 791, in move
    os.rename(src, real_dst)
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\site-packages\dvc\utils\fs.py", line 114, in move
    shutil.move(tmp, dst)
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\site-packages\dvc\fs\local.py", line 97, in move
    move(from_info, to_info)
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\site-packages\dvc\objects\db\base.py", line 78, in add
    self.fs.move(path_info, cache_info)
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\concurrent\futures\thread.py", line 80, in _worker
    work_item.run()
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\cbrinker\Anaconda3\envs\dvc-test\Lib\threading.py", line 890, in _bootstrap
    self._bootstrap_inner()

(Note that the file already existed as I had duplicate files that I was adding)

I saw in the verbose stack trace that dvc.objects.__init__.save() was being called, which uses concurrent.futures.ThreadPoolExecutor. I tried to limit to 1 thread using dvc add images/ -v --jobs 1, but apparently --jobs requires --to-remote to be used.

Instead I put a breakpoint inside of dvc.objects.__init__.save() and set jobs = 1 manually. Doing this prevented the files from being removed.

Environment

dvc doctor

DVC version: 2.5.4 (pip)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
        http (requests = 2.25.1),
        https (requests = 2.25.1)
Cache types:
Cache directory: ('unknown', 'none')
Caches: local
Remotes: local, local, local, local, local
Workspace directory: ('unknown', 'none')
Repo: dvc, git

Also, I've ran pip list and I do not have speedcopy installed

@efiop
Copy link
Contributor

efiop commented Aug 3, 2021

@cubrink Are you also using nfs?

@efiop
Copy link
Contributor

efiop commented Aug 3, 2021

Regarding the deletion, dvc add moves stuff into cache instead of copying to make it as fast as possible, but the problem is that if some error arises during linking, we are not rolling back safely, so even though the data is not lost (it is in pieces though, in workspace and in cache dir), it is not intact in the workspace. We are planning to introduce a rollback mechanism in the near future. #6387

But it is more important to debug the errors you guys ran into, as that is the main problem that prevents dvc add from operating correctly.

@cubrink
Copy link

cubrink commented Aug 4, 2021

@efiop Just spoke to IT as I couldn't speak authoritatively about the network configuration. I was told accessing the NAS is via SMB protocol, not NFS

@efiop
Copy link
Contributor

efiop commented Aug 4, 2021

@cubrink Could you show us full log, please?

@cubrink
Copy link

cubrink commented Aug 4, 2021

@efiop, I ran some tests and have logged the output. If there are other logs that you need just let me know.

Setup

These tests are with 3 total files.

  • 2 files are duplicate
  • 1 unique file
images
├───img0-a.jpg
├───img0-b.jpg
└───img1.jpg

I ran the following tests, which vary by the state of the cache:

  1. The cache is empty
  2. The cache already contains the duplicate file
  3. There are files in the cache, but not for the duplicate files

1. Empty cache

Results

All images in images/ were removed

Output

dvc add images/ -v

2021-08-04 13:22:47,985 DEBUG: Check for update is enabled.
2021-08-04 13:22:48,071 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-08-04 13:22:48,118 DEBUG: Spawned '['daemon', '-q', 'updater']'
Adding...                                                                                                                            2021-08-04 13:22:48,454 DEBUG: state save (8262666151072389689, 1628101306255478528, 7563) b8cf30f4746145cefc80f3dce8e0aee9    ?md5/s]
2021-08-04 13:22:48,459 DEBUG: state save (3369592122099693323, 1628101306224222976, 367529) f65bcf2182da5af309d2b30c77f79350
2021-08-04 13:22:48,462 DEBUG: state save (641099020852704712, 1628101306208571136, 367529) f65bcf2182da5af309d2b30c77f79350
                                                                                                                                     2021-08-04 13:22:48,490 DEBUG: state save (9219199384940878093, 1628101368989038592, 205) 582ccb789a45fc816efc5f5ed47ada9e.dir
2021-08-04 13:22:48,492 DEBUG: state save (8189424156304719068, 7ed664205d3177de31678d30c57a25d7, 742621) 582ccb789a45fc816efc5f5ed47ada9e.dir
2021-08-04 13:22:48,495 DEBUG: {'images': 'modified'}
2021-08-04 13:22:48,506 DEBUG: Computed stage: 'images.dvc' md5: 'None'
                                                                                                                                     2021-08-04 13:22:48,554 DEBUG: state save (8262666151072389689, 1628101306255478528, 7563) b8cf30f4746145cefc80f3dce8e0aee9   ?file/s]
2021-08-04 13:22:48,558 DEBUG: state save (641099020852704712, 1628101306208571136, 367529) f65bcf2182da5af309d2b30c77f79350
Adding...
2021-08-04 13:22:48,590 ERROR: unexpected error - [Errno 13] Permission denied: '<path_to_my_dvc_project>\\.dvc\\cache\\f6\\5bcf2182da5af309d2b30c77f79350'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 791, in move
    os.rename(src, real_dst)
PermissionError: [WinError 5] Access is denied: '<path_to_my_dvc_project>\\.dvc\\cache\\f6\\5bcf2182da5af309d2b30c77f79350.cJGKLGRMxxHxi39ZLzvE6R' -> '<path_to_my_dvc_project>\\.dvc\\cache\\f6\\5bcf2182da5af309d2b30c77f79350'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\main.py", line 55, in main
    ret = cmd.do_run()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\command\base.py", line 50, in do_run
    return self.run()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\command\add.py", line 21, in run
    self.repo.add(
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\utils\collections.py", line 128, in inner
    result = func(*ba.args, **ba.kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\add.py", line 187, in add
    stage.commit()
  File "C:\Users\cbrinker\AppData\Roaming\Python\Python38\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "C:\Users\cbrinker\AppData\Roaming\Python\Python38\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\stage\__init__.py", line 507, in commit
    out.commit(filter_info=filter_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\output.py", line 576, in commit
    objects.save(self.odb, obj)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\objects\__init__.py", line 39, in save
    future.result()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\objects\db\base.py", line 78, in add
    self.fs.move(path_info, cache_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\fs\local.py", line 97, in move
    move(from_info, to_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\utils\fs.py", line 114, in move
    shutil.move(tmp, dst)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 811, in move
    copy_function(src, real_dst)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 435, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: '<path_to_my_dvc_project>\\.dvc\\cache\\f6\\5bcf2182da5af309d2b30c77f79350'
------------------------------------------------------------
2021-08-04 13:22:50,402 DEBUG: Version info for developers:
DVC version: 2.5.4 (pip)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
        http (requests = 2.25.1),
        https (requests = 2.25.1)
Cache types:
Cache directory: ('unknown', 'none')
Caches: local
Remotes: local, local, local, local, local
Workspace directory: ('unknown', 'none')
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-08-04 13:22:50,415 DEBUG: Analytics is enabled.
2021-08-04 13:22:50,417 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\cbrinker\\AppData\\Local\\Temp\\tmp0ra57hku']'
2021-08-04 13:22:50,471 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\cbrinker\\AppData\\Local\\Temp\\tmp0ra57hku']'

2. Cache already contains duplicate file

Results

Only duplicate files in images/ were kept

Output

dvc add images/ -v

2021-08-04 13:27:45,325 DEBUG: Check for update is enabled.
2021-08-04 13:27:45,408 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-08-04 13:27:45,473 DEBUG: Spawned '['daemon', '-q', 'updater']'
Adding...                                                                                                                            2021-08-04 13:27:45,780 DEBUG: Adding 'images' to '.gitignore'.
                                                                                                                                     2021-08-04 13:27:45,823 DEBUG: state save (3134965523380195232, 1627999366736115456, 367529) f65bcf2182da5af309d2b30c77f79350  ?md5/s]
2021-08-04 13:27:45,826 DEBUG: state save (5810031232835155488, 1628000036176760576, 7563) b8cf30f4746145cefc80f3dce8e0aee9
2021-08-04 13:27:45,830 DEBUG: state save (8074608159519882170, 1627999366740106240, 367529) f65bcf2182da5af309d2b30c77f79350
                                                                                                                                     2021-08-04 13:27:45,858 DEBUG: state save (39950517512335132, 1628101666359935232, 274) 559e3d5b4ba59eb701d29bd45dd9bdef.dir
2021-08-04 13:27:45,861 DEBUG: state save (8189424156304719068, 103d3b5ea78f3d05f4a38a7dfce13db9, 757981) 559e3d5b4ba59eb701d29bd45dd9bdef.dir
2021-08-04 13:27:45,863 DEBUG: {'images': 'modified'}
2021-08-04 13:27:45,875 DEBUG: Computed stage: 'images.dvc' md5: 'None'
                                                                                                                                     2021-08-04 13:27:45,919 DEBUG: state save (5810031232835155488, 1628000036176760576, 7563) b8cf30f4746145cefc80f3dce8e0aee9
Adding...
2021-08-04 13:27:45,962 ERROR: unexpected error - [WinError 32] The process cannot access the file because it is being used by another process: '<path_to_my_dvc_project>\\images\\Thumbs.db'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 791, in move
    os.rename(src, real_dst)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '<path_to_my_dvc_project>\\images\\Thumbs.db' -> '<path_to_my_dvc_project>\\.dvc\\cache\\39\\efca5cf287d61175024b42a2ca8527.2fUsc9jKPw8SkWgAoFsE26'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\main.py", line 55, in main
    ret = cmd.do_run()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\command\base.py", line 50, in do_run
    return self.run()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\command\add.py", line 21, in run
    self.repo.add(
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\utils\collections.py", line 128, in inner
    result = func(*ba.args, **ba.kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\add.py", line 187, in add
    stage.commit()
  File "C:\Users\cbrinker\AppData\Roaming\Python\Python38\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "C:\Users\cbrinker\AppData\Roaming\Python\Python38\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\stage\__init__.py", line 507, in commit
    out.commit(filter_info=filter_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\output.py", line 576, in commit
    objects.save(self.odb, obj)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\objects\__init__.py", line 39, in save
    future.result()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\objects\db\base.py", line 78, in add
    self.fs.move(path_info, cache_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\fs\local.py", line 97, in move
    move(from_info, to_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\utils\fs.py", line 112, in move
    shutil.move(src, tmp)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 812, in move
    os.unlink(src)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '<path_to_my_dvc_project>\\images\\Thumbs.db'
------------------------------------------------------------
2021-08-04 13:27:46,204 DEBUG: Version info for developers:
DVC version: 2.5.4 (pip)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
        http (requests = 2.25.1),
        https (requests = 2.25.1)
Cache types:
Cache directory: ('unknown', 'none')
Caches: local
Remotes: local, local, local, local, local
Workspace directory: ('unknown', 'none')
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-08-04 13:27:46,215 DEBUG: Analytics is enabled.
2021-08-04 13:27:46,218 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\cbrinker\\AppData\\Local\\Temp\\tmpuw_myffg']'
2021-08-04 13:27:46,263 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\cbrinker\\AppData\\Local\\Temp\\tmpuw_myffg']'

3. Cache contains only non-duplicate files

Results

Only duplicate files in images/ were removed

Output

dvc add images/ -v

2021-08-04 13:30:26,125 DEBUG: Check for update is enabled.
2021-08-04 13:30:26,213 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-08-04 13:30:26,259 DEBUG: Spawned '['daemon', '-q', 'updater']'
Adding...                                                                                                                            2021-08-04 13:30:26,564 DEBUG: Adding 'images' to '.gitignore'.
                                                                                                                                     2021-08-04 13:30:26,651 DEBUG: state save (232056546587901106, 1628000036176760576, 7563) b8cf30f4746145cefc80f3dce8e0aee9     ?md5/s]
                                                                                                                                     2021-08-04 13:30:26,680 DEBUG: state save (3860289204771833624, 1628101827168513536, 274) 559e3d5b4ba59eb701d29bd45dd9bdef.dir
2021-08-04 13:30:26,683 DEBUG: state save (8189424156304719068, 103d3b5ea78f3d05f4a38a7dfce13db9, 757981) 559e3d5b4ba59eb701d29bd45dd9bdef.dir
2021-08-04 13:30:26,685 DEBUG: {'images': 'modified'}
2021-08-04 13:30:26,697 DEBUG: Computed stage: 'images.dvc' md5: 'None'
                                                                                                                                     2021-08-04 13:30:26,743 DEBUG: state save (8074608159519882170, 1627999366740106240, 367529) f65bcf2182da5af309d2b30c77f79350
Adding...
2021-08-04 13:30:26,785 ERROR: unexpected error - [WinError 32] The process cannot access the file because it is being used by another process: '<path_to_my_dvc_project>\\images\\Thumbs.db'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 791, in move
    os.rename(src, real_dst)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '<path_to_my_dvc_project>\\images\\Thumbs.db' -> '<path_to_my_dvc_project>\\.dvc\\cache\\39\\efca5cf287d61175024b42a2ca8527.iwJpd79FSwcKJkh9uHuVmH'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\main.py", line 55, in main
    ret = cmd.do_run()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\command\base.py", line 50, in do_run
    return self.run()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\command\add.py", line 21, in run
    self.repo.add(
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\utils\collections.py", line 128, in inner
    result = func(*ba.args, **ba.kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\repo\add.py", line 187, in add
    stage.commit()
  File "C:\Users\cbrinker\AppData\Roaming\Python\Python38\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "C:\Users\cbrinker\AppData\Roaming\Python\Python38\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\stage\__init__.py", line 507, in commit
    out.commit(filter_info=filter_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\output.py", line 576, in commit
    objects.save(self.odb, obj)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\objects\__init__.py", line 39, in save
    future.result()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\objects\db\base.py", line 78, in add
    self.fs.move(path_info, cache_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\fs\local.py", line 97, in move
    move(from_info, to_info)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\site-packages\dvc\utils\fs.py", line 112, in move
    shutil.move(src, tmp)
  File "c:\users\cbrinker\anaconda3\envs\dvc-test\lib\shutil.py", line 812, in move
    os.unlink(src)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '<path_to_my_dvc_project>\\images\\Thumbs.db'
------------------------------------------------------------
2021-08-04 13:30:27,027 DEBUG: Version info for developers:
DVC version: 2.5.4 (pip)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
        http (requests = 2.25.1),
        https (requests = 2.25.1)
Cache types:
Cache directory: ('unknown', 'none')
Caches: local
Remotes: local, local, local, local, local
Workspace directory: ('unknown', 'none')
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-08-04 13:30:27,038 DEBUG: Analytics is enabled.
2021-08-04 13:30:27,041 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\cbrinker\\AppData\\Local\\Temp\\tmpctiymb6u']'
2021-08-04 13:30:27,088 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\cbrinker\\AppData\\Local\\Temp\\tmpctiymb6u']'

@efiop
Copy link
Contributor

efiop commented Sep 21, 2021

Hm, still not sure what's going on there and not able to reproduce 🙁 Closing as stale for now, we'll definitely improve rollback though in #6387

@efiop efiop closed this as completed Sep 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting response we are waiting for your reply, please respond! :) P: windows Related to the Platform: Windows
Projects
None yet
Development

No branches or pull requests

3 participants