This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Comparison with DVC #2

Closed
silverdna opened this issue Sep 3, 2018 · 2 comments

Comments

@silverdna

Hello!

First of all, thank you for your contribution to the community! I’ve just found out about this, and it seems to be a nice project that is growing!

You are probably familiar with dvc (https://github.com/iterative/dvc).

I’ve been investigating it in order to include it in my ML pipeline. Can you briefly explain how, or whether, Lazydata differs from dvc, and any advantages and disadvantages? I understand that some functionality may not be implemented yet purely due to time constraints or similar. I’m more interested in knowing whether there are any differences in terms of paradigm.

P.S. If you have a different channel for these kinds of questions, please let me know.

Thank you very much!

@rstojnic
Owner

rstojnic commented Sep 3, 2018

Hi Andre!

Sure, so at the moment the main differences are in a couple of design decisions:

  1. In DVC there is an extra .dvc file for every data file you have. In lazydata all file metadata is stored in a single file, lazydata.yml.
  2. DVC uses the same basic paradigm as git-lfs: data files are tightly coupled to the repository, and after doing git pull you would normally run dvc pull, which pulls all the data files. In lazydata, you normally download files "lazily", i.e. calling track() on a file that is tracked but missing from the local copy will download it. The idea is to let you simply track all of your data files without worrying about someone else having to download them, because they'll only ever download a file if they need it.
  3. The main interface of lazydata is programmatic (i.e. from within Python) and I'll continue developing it in that direction. The main interface of dvc and git-lfs is command-line.
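
To make point 2 a bit more concrete, here is a minimal sketch of the lazy-download idea. It is not lazydata's actual implementation: `lazy_track` and `remote_dir` are hypothetical names, and a local directory stands in for remote storage. The only behaviour it illustrates is that a missing file is fetched at the moment it is requested, and a file already present is left alone.

```python
import shutil
import tempfile
from pathlib import Path


def lazy_track(path, remote_dir):
    """Return `path` as a string, fetching the file first if it is missing.

    Hypothetical helper for illustration only: `remote_dir` is a local
    directory standing in for remote storage.
    """
    local = Path(path)
    if not local.exists():
        # The file is known but absent from the local copy, so fetch it now.
        source = Path(remote_dir) / local.name
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(source, local)
    return str(local)


def demo():
    """Fetch one file lazily; report whether it existed first and what it contains."""
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        remote.mkdir()
        (remote / "big.csv").write_text("a,b\n1,2\n")

        # The project directory starts out without the data file.
        local_path = Path(tmp) / "project" / "data" / "big.csv"
        existed_before = local_path.exists()

        # Requesting the path triggers the copy; the result can go straight to open().
        resolved = lazy_track(local_path, remote)
        content = Path(resolved).read_text()
        return existed_before, content
```

Because the function returns the path, the call can be wrapped directly around wherever the file is read, so a collaborator who never runs that code never downloads the file.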

Of course, DVC and git-lfs have more features at the moment, but I expect the differences in these design decisions will remain the same. Hope this answers the question!

@silverdna
Author

Thank you very much for your reply, @rstojnic. It makes sense, and your point 2 is very appealing.

I’ll make sure to give Lazydata a try soon :)
