Skip to content

Conversation

@pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Apr 13, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • πŸ“– If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

push/gc behavior changes from #3600

  • status: (also affects push/pull) when checking .dir checksum exists on the remote, if the .dir checksum exists, presume that checksums for all files contained in that directory also exist on the remote
  • push: push .dir checksums after file checksums. If any file in the directory fails to be uploaded, do not upload the .dir file
  • gc: remove .dir checksums before file checksums

pmrowla added 4 commits April 13, 2020 17:04
- `used_cache()`/`get_used_cache()` in repo/stage/output now return
  tuples of (dir_cache, file_cache) instead of one flat/merged cache
- affects all commands which use `cache_exists()` (remote status)
@pmrowla pmrowla requested review from efiop, pared and skshetry April 13, 2020 11:17
)
logger.warning(msg)
return NamedCache()
return None, NamedCache()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense to push dir into NamedCache? So that NamedCache could contain both dirs and files(e.g. `cache["local"].dirs)? Just an idea, not sure if it would be nicer looking in the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NamedCache supports nesting cache items (for .dir checksums) after 02aaf42, the end result is a bit cleaner now

@efiop
Copy link
Contributor

efiop commented Apr 14, 2020

@pmrowla Please check the tests, travis is failing.

@pmrowla
Copy link
Contributor Author

pmrowla commented Apr 14, 2020

@efiop test issue is resolved

@efiop
Copy link
Contributor

efiop commented Apr 14, 2020

@pmrowla Check DS please too, e.g. this https://deepsource.io/gh/iterative/dvc/run/b2b2c666-0af4-4fb4-aed2-0210d2946dfc/python/PYL-R1704/ doesn't look good.

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor test mocker decision to be made.

@efiop efiop requested review from pared and skshetry April 14, 2020 12:36
Co-Authored-By: Saugat Pachhai <suagatchhetri@outlook.com>
@pmrowla pmrowla self-assigned this Apr 14, 2020
@pmrowla pmrowla added the performance improvement over resource / time consuming tasks label Apr 14, 2020
@efiop efiop merged commit 7275843 into treeverse:master Apr 14, 2020
@pmrowla pmrowla deleted the push-gc-dir branch April 15, 2020 04:29
)
return dict(dir_status, **file_status)

def _status(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit late to the party :) excellent PR overall. will mention a few things that I noticed reviewing that will might help later somehow.

this function definitely wants some split I would say... extract presentation logic out for example, and the that checks dir and excludes it? or may be some other refactoring ... but it's too long and complicated now

overall "internal-client" functions should be easy to read, even if it feels that extraction does not make much sense (e.g. helpers that are used only in a single place) think how will developers read this. It's easier to read story-like main function and go into details if needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance improvement over resource / time consuming tasks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants