diff --git a/src/Documentation/sidebar.json b/src/Documentation/sidebar.json
index 7138e05c6c..6a7caa903c 100644
--- a/src/Documentation/sidebar.json
+++ b/src/Documentation/sidebar.json
@@ -119,6 +119,33 @@
         "label": "Managing External Data",
         "slug": "managing-external-data"
       },
+      {
+        "label": "Data Sharing",
+        "slug": "data-sharing",
+        "source": "data-sharing/index.md",
+        "children": [
+          {
+            "label": "Remote DVC Storage",
+            "slug": "remote-storage"
+          },
+          {
+            "label": "Shared Development Server",
+            "slug": "shared-server"
+          },
+          {
+            "label": "Mounted DVC Storage",
+            "slug": "mounted-storage"
+          },
+          {
+            "label": "Mounted DVC Cache",
+            "slug": "mounted-cache"
+          },
+          {
+            "label": "Synced DVC Storage",
+            "slug": "synced-storage"
+          }
+        ]
+      },
       {
         "label": "Contributing",
         "slug": "contributing",
diff --git a/static/docs/user-guide/data-sharing/index.md b/static/docs/user-guide/data-sharing/index.md
new file mode 100644
index 0000000000..56346d4769
--- /dev/null
+++ b/static/docs/user-guide/data-sharing/index.md
@@ -0,0 +1,52 @@
+# Data Sharing and Collaboration with DVC
+
+Like Git, DVC facilitates collaboration and data sharing in a distributed
+environment. It makes it easy to consistently get all your data files and
+directories to any machine, along with the source code.
+
+![](/static/img/model-sharing-digram.png)
+
+There are several ways to set up data sharing with DVC. We will discuss the
+most common scenarios.
+
+- [Sharing Data Through a Remote DVC Storage](/doc/user-guide/data-sharing/remote-storage)
+
+  This is the recommended and most common case of data sharing. Here we set up
+  a [remote storage](/doc/command-reference/remote) on a data storage provider
+  to store data files online, where others can reach them. Currently DVC
+  supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, SSH,
+  HDFS, and other remote locations, and the list is constantly growing.
+
+- [Using Local Storage on a Shared Development Server](/doc/user-guide/data-sharing/shared-server)
+
+  Some teams may prefer to use a single shared machine for running their
+  experiments. This gives them better resource utilization, such as the ability
+  to use multiple GPUs. In this case we can use a local data storage, which
+  allows the team to store and share data very efficiently: on a deduplicating
+  filesystem there is no duplication of data files, and transfers are
+  practically instantaneous.
+
+- [Sharing Data Through a Mounted DVC Storage](/doc/user-guide/data-sharing/mounted-storage)
+
+  If the data storage server (or provider) uses a protocol that DVC does not
+  support yet, but allows us to mount a remote directory on the local
+  filesystem, we can still set up data sharing with DVC. This case might be
+  useful, for example, when the data files are located on a network-attached
+  storage (NAS) and can be accessed through protocols like NFS, Samba, SSHFS,
+  etc.
+
+- [Sharing Data Through a Mounted DVC Cache](/doc/user-guide/data-sharing/mounted-cache)
+
+  This case is similar to the Mounted DVC Storage (mentioned above), but instead
+  of mounting the DVC storage from the server, we directly mount the cache
+  directory (`.dvc/cache/`). If all the users do this, they effectively use the
+  same cache directory (mounted from the NAS server). So, if one of them adds
+  something to the cache, it automatically appears in the cache of all the
+  others.
+
+- [Sharing Data Through a Synchronized DVC Storage](/doc/user-guide/data-sharing/synced-storage)
+
+  Some cloud data storage providers are not supported by DVC yet, but we can
+  still use them to share data with the help of DVC. If it is possible to
+  synchronize a local directory with a remote one (which almost all storage
+  providers support), we can set up DVC data sharing on top of that
+  synchronization.
diff --git a/static/docs/user-guide/data-sharing/mounted-cache.md b/static/docs/user-guide/data-sharing/mounted-cache.md
new file mode 100644
index 0000000000..f12afda41e
--- /dev/null
+++ b/static/docs/user-guide/data-sharing/mounted-cache.md
@@ -0,0 +1,144 @@
+# Sharing Data Through a Mounted Cache
+
+We have already seen how to share data through a
+[mounted DVC storage](/doc/user-guide/data-sharing/mounted-storage). In that
+case we have a copy of the data on the DVC storage and at least one copy in each
+user project, since deduplication does not work across filesystems.
+
+However, data management can be optimized further with a shared cache. Instead
+of mounting the DVC storage from the server, we directly mount the cache
+directory (`.dvc/cache/`). If all the users do this, they effectively use the
+same cache directory (mounted from the NAS server), so anything one of them adds
+to the cache immediately appears in the cache of all the others. As a result,
+`dvc push` and `dvc pull` are not needed to share the data; `dvc checkout` is
+sufficient.
+
+> **❗ Caution:** Deleting data from the cache also makes it disappear from the
+> cache of the other users. So, be careful with `dvc gc` (which removes from the
+> cache any data not used in your current workspace, including data that other
+> users may still need), and consult the other users of the project before
+> running it.
+
+The optimization in data management comes from using the _symlink_ cache type.
+You can find more details about it on the
+[Large Dataset Optimization](/doc/user-guide/large-dataset-optimization) page.
+
+## Mounted Cache Example
+
+In this example we will see how to share data with the help of a cache directory
+that is mounted through SSHFS. We are using an SSHFS example because it is easy
+to network-mount a directory with SSHFS. However, once you understand how it
+works, it should be easy to implement it for other types of network-mounted
+storage (like NFS, Samba, etc.).
+
+> For more detailed instructions check out this
+> [interactive example](https://katacoda.com/dvc/courses/examples/mounted-cache).
+
+
+### Prerequisites: Set up the server
+
+We have to configure the following on the SSH server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the shared DVC cache.
+- Create a bare Git repository (for example, `/srv/project.git/`) and an empty
+  directory for the DVC cache (for example, `/srv/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
+
+ +
+
+### Set up each user
+
+To access the SSH server without a password, each user should generate an SSH
+key pair and set up their SSH configuration.
+
+Let's assume that each user can access the server without a password using the
+private SSH key `~/.ssh/dvc-server`, and has added lines like these to
+`~/.ssh/config`:
+
+```
+Host dvc-server
+  HostName host01
+  User user1
+  IdentityFile ~/.ssh/dvc-server
+  IdentitiesOnly yes
+```
+
+Here `dvc-server` is the name (alias) that we use for our server, `host01` can
+be the IP address or the FQDN of the server, and `user1` is the username of the
+first user on the server.
+
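The `~/.ssh/config` entry above can also be added by a script, which is handy when setting up several users. The following is a minimal sketch, assuming a Bash-compatible shell; `add_ssh_host` is a hypothetical helper, not part of DVC or OpenSSH. It appends the `Host` block only if that alias is not defined yet, so running it twice is harmless.

```shell
# Hypothetical helper (not part of DVC or OpenSSH): append a Host block for
# the DVC server to an SSH config file, unless the alias is already defined.
# Usage: add_ssh_host ALIAS HOSTNAME USER KEYFILE [CONFIG_FILE]
add_ssh_host() {
  local alias_name=$1 host=$2 user=$3 key=$4
  local config=${5:-$HOME/.ssh/config}

  mkdir -p "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"  # OpenSSH rejects config files writable by others

  # Nothing to do if a "Host <alias>" line already exists.
  if grep -qE "^Host[[:space:]]+$alias_name\$" "$config"; then
    return 0
  fi

  cat >> "$config" <<EOF

Host $alias_name
  HostName $host
  User $user
  IdentityFile $key
  IdentitiesOnly yes
EOF
}
```

For example, `add_ssh_host dvc-server host01 user1 '~/.ssh/dvc-server'` writes the same entry as shown above; the single quotes keep `~` literal, and `ssh` expands it when it reads the file.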
+
+### Mount the DVC cache
+
+With SSHFS (and the SSH configuration from the section above), we can mount the
+remote directory on `.dvc/cache/` of the project like this:
+
+```dvc
+$ mkdir -p ~/project/.dvc/cache
+$ sshfs \
+  dvc-server:/srv/project.cache/ \
+  ~/project/.dvc/cache/
+```
+
+### Optimize data management
+
+Since the cache directory is located on a mounted filesystem, we cannot use the
+_reflink_ optimization for data management (reflinks, like hardlinks, only work
+within a single filesystem). However, we can use _symlinks_, which work across
+filesystems:
+
+```dvc
+$ dvc config cache.type 'reflink,symlink,hardlink,copy'
+$ dvc config cache.protected true
+```
+
+The configuration file `.dvc/config` should now look like this:
+
+```ini
+[cache]
+type = "reflink,symlink,hardlink,copy"
+protected = true
+```
+
+This configuration is the same for all the users, so we can add it to Git in
+order to share it with them:
+
+```dvc
+$ git add .dvc/config
+$ git commit -m "Use symlinks if reflinks are not available"
+$ git push
+```
+
+### Sharing data
+
+When we add data to the project with `dvc add` or `dvc run`, DVC-files are
+created, the data is stored in `.dvc/cache/`, and it is linked (with a symlink)
+from the workspace.
+
+We can share the DVC-files with:
+
+```dvc
+$ git push
+```
+
+In order to receive the changes, the other users run:
+
+```dvc
+$ git pull
+$ dvc checkout
+```
+
+Notice that there is no need to use `dvc push` and `dvc pull` for sharing the
+data, because all the collaborating users effectively share the same DVC cache
+directory. As soon as one of them saves a file in the cache, it is immediately
+available to all the others for `dvc checkout`. All they need to do is
+synchronize their DVC-files (with `git push` and `git pull`).
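This workflow relies on the SSHFS mount being present. If the mount has dropped (for example after a reboot), `.dvc/cache/` is just an empty local directory: `dvc checkout` cannot find the data, and `dvc add` would store data where the other users cannot see it. Below is a minimal sketch of a guard, assuming Linux (it reads `/proc/mounts`); `is_mounted`, `update_workspace` and the `MOUNTS_FILE` override are hypothetical names, not part of DVC.

```shell
# Hypothetical helpers (not part of DVC), assuming Linux. MOUNTS_FILE
# defaults to /proc/mounts and can be overridden, e.g. for testing.

# is_mounted DIR: succeed if DIR is a mount point. Field 2 of each mounts
# entry is the mount point as an absolute, symlink-free path (paths with
# spaces are escaped there and are not handled by this sketch).
is_mounted() {
  local dir
  dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
  awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' \
    "${MOUNTS_FILE:-/proc/mounts}"
}

# update_workspace [PROJECT]: refuse to run unless the shared cache is
# mounted, then get the latest DVC-files and check out the data.
update_workspace() {
  local project=${1:-.}
  if ! is_mounted "$project/.dvc/cache"; then
    echo "error: $project/.dvc/cache is not mounted (run sshfs first)" >&2
    return 1
  fi
  (cd "$project" && git pull && dvc checkout)
}
```

Running `update_workspace ~/project` then replaces the manual `git pull` and `dvc checkout` pair, and fails loudly instead of silently working against an unmounted cache.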
diff --git a/static/docs/user-guide/data-sharing/mounted-storage.md b/static/docs/user-guide/data-sharing/mounted-storage.md
new file mode 100644
index 0000000000..1e9901b1f4
--- /dev/null
+++ b/static/docs/user-guide/data-sharing/mounted-storage.md
@@ -0,0 +1,149 @@
+# Sharing Data Through a Mounted DVC Storage
+
+If the data storage server (or provider) uses a protocol that DVC does not
+support yet, but allows us to mount a remote directory on the local filesystem,
+we can still set up data sharing with DVC.
+
+> This case might be useful when the data files are located on a
+> network-attached storage (NAS), for example, and can be accessed through
+> protocols like NFS, Samba, SSHFS, etc.
+
+The solution is very similar to that of a
+[Shared Development Server](/doc/user-guide/data-sharing/shared-server): we use
+a local DVC storage, which is actually located on the mounted directory.
+Whenever we push data to our mounted storage, it becomes immediately available
+on the mounted storage of each user. So, the data sharing workflow is the
+normal one, with `dvc push` and `dvc pull`.
+
+> Unlike the Shared Development Server case, the local DVC storage and the
+> project cannot be on the same filesystem (because the DVC storage is on a
+> mounted remote directory), so deduplication does not help here. We have a
+> copy of the data on the DVC storage, and at least one copy in each user
+> project.
+
+## Mounted Storage Example
+
+In this example we will see how to share data with the help of a storage
+directory that is mounted through SSHFS.
+
+> Normally we don't need to do this, since we can
+> [use an SSH remote storage](https://katacoda.com/dvc/courses/examples/ssh-storage)
+> directly. But we are using it just as an example, since it is easy to
+> network-mount a directory with SSHFS. Once you understand how it works, it
+> should be easy to implement it for other types of mounted storage (like NFS,
+> Samba, etc.).
+
+ +> For more detailed instructions check out this +> [interactive example](https://katacoda.com/dvc/courses/examples/mounted-storage). + +
+
+### Prerequisite: Set up the server
+
+We have to configure the following on the SSH server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the DVC storage.
+- Create a bare Git repository (for example, `/srv/project.git/`) and an empty
+  directory for the DVC storage (for example, `/srv/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
+
+ +
+
+### Prerequisite: Set up each user
+
+To access the SSH server without a password, each user should generate an SSH
+key pair and set up their SSH configuration.
+
+Let's assume that each user can access the server without a password using the
+private SSH key `~/.ssh/dvc-server`, and has added lines like these to
+`~/.ssh/config`:
+
+```
+Host dvc-server
+  HostName host01
+  User user1
+  IdentityFile ~/.ssh/dvc-server
+  IdentitiesOnly yes
+```
+
+Here `dvc-server` is the name (alias) that we use for our server, `host01` can
+be the IP address or the FQDN of the server, and `user1` is the username of the
+first user on the server.
+
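If a user has no key pair yet, the standard OpenSSH tools can create one and install it on the server. Here is a minimal sketch: `ssh-keygen` and `ssh-copy-id` are real OpenSSH commands, while the wrapper `setup_ssh_key`, the Ed25519 key type and the empty passphrase are assumptions for illustration (an empty passphrase is convenient, but it leaves the private key unprotected).

```shell
# Hypothetical helper (not part of DVC): create the key file if it does not
# exist yet, then install its public key on the server for USER@HOST.
# Usage: setup_ssh_key USER HOST [KEYFILE]
setup_ssh_key() {
  local user=$1 host=$2
  local key=${3:-$HOME/.ssh/dvc-server}

  if [ ! -f "$key" ]; then
    mkdir -p "$(dirname "$key")"
    # -N "" sets an empty passphrase (an assumption; adjust to your policy).
    ssh-keygen -t ed25519 -N "" -C "dvc-server access" -f "$key" || return 1
  fi
  # Appends the public key to ~/.ssh/authorized_keys on the server;
  # this step asks for the account password one last time.
  ssh-copy-id -i "$key.pub" "$user@$host"
}
```

For example, `setup_ssh_key user1 host01` for the first user, after which the `~/.ssh/config` entry above lets `ssh dvc-server` log in without a password.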
+ +
+
+### Prerequisite: Mount the remote storage directory
+
+With SSHFS (and the SSH configuration from the section above), we can mount the
+remote directory on the server to a local one (let's say `$HOME/project.cache`)
+like this:
+
+```dvc
+$ mkdir -p $HOME/project.cache
+$ sshfs \
+  dvc-server:/srv/project.cache \
+  $HOME/project.cache
+```
+
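Mounting by hand has to be repeated after every reboot or dropped connection, so a small mount/unmount pair can help. Here is a minimal sketch, assuming Linux with SSHFS and FUSE installed: `-o reconnect` is a standard SSHFS option and `fusermount -u` is the usual FUSE unmount command (on FUSE 3 systems it may be named `fusermount3`), while the function names, the `STORAGE_*` variables and the `MOUNTS_FILE` override are hypothetical.

```shell
# Hypothetical helpers (not part of DVC), assuming Linux with SSHFS and FUSE.
# STORAGE_LOCAL must be an absolute path without symlinks, because it is
# compared verbatim with the mount points listed in MOUNTS_FILE.
STORAGE_REMOTE=${STORAGE_REMOTE:-dvc-server:/srv/project.cache}
STORAGE_LOCAL=${STORAGE_LOCAL:-$HOME/project.cache}

storage_mounted() {
  awk -v d="$STORAGE_LOCAL" '$2 == d { found = 1 } END { exit !found }' \
    "${MOUNTS_FILE:-/proc/mounts}"
}

mount_storage() {
  if storage_mounted; then
    echo "already mounted: $STORAGE_LOCAL"
    return 0
  fi
  mkdir -p "$STORAGE_LOCAL"
  # -o reconnect: let SSHFS re-establish the connection if it drops.
  sshfs -o reconnect "$STORAGE_REMOTE" "$STORAGE_LOCAL"
}

umount_storage() {
  storage_mounted || return 0  # nothing to unmount
  fusermount -u "$STORAGE_LOCAL"
}
```

Calling `mount_storage` before `dvc push` or `dvc pull` is then safe to repeat: it mounts only when the directory is not mounted yet.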
+
+### Set the DVC storage
+
+We can set up the project to use `$HOME/project.cache` as
+[local DVC storage](/doc/user-guide/external-data/local#local-dvc-storage) by
+adding a _default remote_ like this:
+
+```dvc
+$ dvc remote add --local --default \
+  mounted-storage $HOME/project.cache
+
+$ dvc remote list --local
+mounted-storage /home/username/project.cache
+```
+
+Note that this configuration is specific to each user, so we have used the
+`--local` option in order to save it in `.dvc/config.local`, which is ignored by
+Git.
+
+This configuration file should now contain something like this:
+
+```
+['remote "mounted-storage"']
+url = /home/username/project.cache
+[core]
+remote = mounted-storage
+```
+
+### Sharing data
+
+When we add data to the project with `dvc add` or `dvc run`, DVC-files are
+created and the data is stored in `.dvc/cache/`. We can upload DVC-files to the
+Git server with `git push`, and upload the cached files to the DVC storage with
+`dvc push`:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+The command `dvc push` copies the cached files from `.dvc/cache/` to
+`$HOME/project.cache/`. Since this is a mounted directory, the files are
+actually written to the server, and they immediately become available on the
+mounted directories of the other users.
+
+The other users can receive the DVC-files and the cached data like this:
+
+```dvc
+$ git pull
+$ dvc pull
+```
diff --git a/static/docs/user-guide/data-sharing/remote-storage.md b/static/docs/user-guide/data-sharing/remote-storage.md
new file mode 100644
index 0000000000..b409c27237
--- /dev/null
+++ b/static/docs/user-guide/data-sharing/remote-storage.md
@@ -0,0 +1,194 @@
+# Sharing Data Through a Remote DVC Storage
+
+We can set up a _default_ [remote storage](/doc/user-guide/external-data) on a
+data storage provider, where we upload the cached data files so that the other
+users can access them.
+
+> Currently DVC supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob
+> Storage, SSH, HDFS, and other remote storage types/providers, and the list is
+> constantly growing.
+
+We can share data using `git push` (to upload DVC-files) and `dvc push` (to
+upload cached data files). The other users can use `git pull` followed by
+`dvc pull` to receive them.
+
+> This is the recommended and most common way to share data.
+
+## Example: S3 Remote Storage
+
+As an example, let's take a look at how we could set up an
+[Amazon S3 remote storage](/doc/user-guide/external-data/amazon) for a DVC
+project, and share data through it.
+
+
+### Prerequisite: Create an S3 bucket
+
+If you don't already have one available in your S3 account, follow the
+instructions in
+[Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).
+As an advanced alternative, you may use the
+[`aws s3 mb`](https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html)
+command instead.
+
+
+### Set the DVC storage
+
+To set up an S3 DVC storage, we create a _default_ remote like this:
+
+```dvc
+$ dvc remote add --default s3storage s3://mybucket/myproject
+Setting 's3storage' as a default remote.
+
+$ dvc remote list
+s3storage s3://mybucket/myproject
+```
+
+This command adds lines like these to `.dvc/config`:
+
+```
+['remote "s3storage"']
+url = s3://mybucket/myproject
+[core]
+remote = s3storage
+```
+
+This configuration is the same for all the users, so let's commit it to Git:
+
+```dvc
+$ git add .dvc/config
+$ git commit -m 'Setup S3 storage'
+$ git push
+```
+
+### Sharing data
+
+When we add data to the project with `dvc add` or `dvc run`, DVC-files are
+created and the data is stored in `.dvc/cache/`. We can upload DVC-files to the
+Git server with `git push`, and upload the cached files to the remote storage
+with `dvc push`:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+The other users can receive the DVC-files and the cached data files like this:
+
+```dvc
+$ git pull
+$ dvc pull
+```
+
+## Example: SSH Remote Storage
+
+As another example, let's see how to set up an
+[SSH remote storage](/doc/user-guide/external-data/ssh) for a project and share
+data through it.
+
+> For more detailed instructions check out this
+> [interactive example](https://katacoda.com/dvc/courses/examples/ssh-storage).
+
+In this example we will assume a central data storage server that can be
+accessed through SSH by two different users.
+
+> For the sake of this example, the central Git repository is also located on
+> this server, but in general it can be anywhere; it doesn't have to be on the
+> same server as the DVC data storage.
+
+
+### Prerequisite: Set up the SSH server
+
+Usually we need to configure the following on an SSH server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the DVC storage.
+- Create a bare Git repository (for example, `/srv/project.git/`) and an empty
+  directory for the DVC storage (for example, `/srv/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
+
+ +
+
+### Prerequisite: Set up each user
+
+To access the SSH server without a password, each user should generate an SSH
+key pair and set up their SSH configuration.
+
+Let's assume that each user can access the server without a password using the
+private SSH key `~/.ssh/dvc-server`, and has added lines like these to
+`~/.ssh/config`:
+
+```
+Host dvc-server
+  HostName host01
+  User user1
+  IdentityFile ~/.ssh/dvc-server
+  IdentitiesOnly yes
+```
+
+Here `dvc-server` is the name (alias) that we use for our server, `host01` can
+be the IP address or the FQDN of the server, and `user1` is the username of the
+first user on the server.
+
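Before adding the SSH remote, it can be worth confirming that the passwordless login really works and that the storage directory is writable, since the rest of this example assumes key-based access. A minimal sketch using the standard OpenSSH option `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password; the function name `check_ssh_access` and its default arguments are assumptions for illustration.

```shell
# Hypothetical helper (not part of DVC): verify that ALIAS (from
# ~/.ssh/config) accepts key-based login and that DIR is writable there.
# Usage: check_ssh_access [ALIAS] [DIR]
check_ssh_access() {
  local alias_name=${1:-dvc-server} dir=${2:-/srv/project.cache}
  # BatchMode=yes: never prompt; fail immediately if a password is needed.
  if ssh -o BatchMode=yes "$alias_name" "test -w '$dir'"; then
    echo "ok: $alias_name can write to $dir"
  else
    echo "error: no key-based access to $alias_name, or $dir not writable" >&2
    return 1
  fi
}
```

If `check_ssh_access` reports an error, fix the key or the group permissions on the server before running `dvc remote add`.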
+
+### Set the DVC storage
+
+We can set up the project to use the
+[SSH remote storage](/doc/user-guide/external-data/ssh) by adding a _default
+remote_ like this:
+
+```dvc
+$ dvc remote add --default \
+  ssh-storage ssh://dvc-server:/srv/project.cache
+Setting 'ssh-storage' as a default remote.
+
+$ dvc remote list
+ssh-storage ssh://dvc-server:/srv/project.cache
+```
+
+The configuration file `.dvc/config` should now look like this:
+
+```
+['remote "ssh-storage"']
+url = ssh://dvc-server:/srv/project.cache
+[core]
+remote = ssh-storage
+```
+
+This configuration is the same for all the users, so we can add it to Git in
+order to share it with them:
+
+```dvc
+$ git add .dvc/config
+$ git commit -m 'Add an SSH remote storage'
+$ git push
+```
+
+### Sharing data
+
+When we add data to the project with `dvc add` or `dvc run`, DVC-files are
+created and the data is stored in `.dvc/cache/`. We can upload DVC-files to the
+Git server with `git push`, and upload the cached files to the remote storage
+with `dvc push`:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+The other users can receive the DVC-files and the cached data like this:
+
+```dvc
+$ git pull
+$ dvc pull
+```
diff --git a/static/docs/user-guide/data-sharing/shared-server.md b/static/docs/user-guide/data-sharing/shared-server.md
new file mode 100644
index 0000000000..c6a01669f4
--- /dev/null
+++ b/static/docs/user-guide/data-sharing/shared-server.md
@@ -0,0 +1,125 @@
+# Local Storage on a Shared Development Server
+
+Some teams may prefer to use a single shared machine for running their
+experiments. This gives them better resource utilization, such as the ability
+to use multiple GPUs.
+
+With DVC, we can easily set up a
+[local data storage](/doc/user-guide/external-data/local#local-dvc-storage) on
+the shared server. To share data we use the normal DVC workflow of `dvc push`
+(for sending cached data to the local DVC storage) and `dvc pull` (for
+retrieving it from the DVC storage).
+
+> For the best performance in this workflow, we should make sure that the
+> (local) DVC storage and all the user projects are located on the same
+> deduplicating filesystem. In this case DVC can automatically use _reflink
+> copies_, which ensures minimal disk space usage and practically
+> instantaneous data transfers.
+
+## Shared Server Example
+
+Let's see an example of how two different users on the same host can share data
+with the help of a
+[local DVC storage](/doc/user-guide/external-data/local#local-dvc-storage). Both
+users and the data storage are located on the same machine, and no remote
+server or storage is involved.
+
+> For more detailed instructions check out this
+> [interactive example](https://katacoda.com/dvc/courses/examples/shared-server).
+
+
+### Prerequisite: Set up the server
+
+We need to configure the following on the server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the DVC storage.
+- Create a bare Git repository (for example, `/var/local/data/project.git/`)
+  and an empty directory for the DVC storage (for example,
+  `/var/local/data/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
+
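The server-side steps above can be scripted. Below is a minimal sketch, assuming Linux with Git installed and root access for the account steps; the function names, the group name and the paths are hypothetical. `git init --bare --shared=group` asks Git to keep the repository group-writable, and the setgid bit (`chmod g+s`) on the directories makes new files and subdirectories inherit their group.

```shell
# Hypothetical sketch (not part of DVC), assuming Linux and Git.

# Run as root: create the group (if needed) and add the users to it.
# Users must log in again before the new group membership takes effect.
add_users_to_group() {
  local group=$1 user
  shift
  getent group "$group" >/dev/null || groupadd "$group" || return 1
  for user in "$@"; do
    usermod -a -G "$group" "$user" || return 1
  done
}

# Create the bare Git repository and the DVC storage directory, owned by
# GROUP and writable by its members.
create_shared_dirs() {
  local data=$1 group=$2
  mkdir -p "$data/project.cache" || return 1
  # --shared=group: Git keeps the repository group-writable on later pushes.
  git init --quiet --bare --shared=group "$data/project.git" || return 1
  chgrp -R "$group" "$data/project.git" "$data/project.cache"
  chmod -R g+rwX "$data/project.git" "$data/project.cache"
  # setgid directories: new files and subdirectories inherit the group.
  find "$data/project.git" "$data/project.cache" -type d -exec chmod g+s {} +
}
```

For example, as root: `add_users_to_group dvc-users user1 user2`, followed by `create_shared_dirs /var/local/data dvc-users`.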
+
+### Set the DVC storage
+
+We can set up the project to use the
+[local DVC storage](/doc/user-guide/external-data/local#local-dvc-storage) by
+adding a _default remote_, like this:
+
+```dvc
+$ export DATA=/var/local/data
+$ dvc remote add --default local-storage $DATA/project.cache
+Setting 'local-storage' as a default remote.
+
+$ dvc remote list
+local-storage /var/local/data/project.cache
+```
+
+The configuration file `.dvc/config` should now look like this:
+
+```
+['remote "local-storage"']
+url = /var/local/data/project.cache
+[core]
+remote = local-storage
+```
+
+Since this configuration is the same for all the users, we can commit it to Git:
+
+```dvc
+$ git add .dvc/config
+$ git commit -m "Setup local DVC storage"
+$ git push
+```
+
+### Sharing data
+
+Data sharing among the different users is done the normal way, with `dvc push`
+and `dvc pull`, except that in this case the _local DVC storage_ acts as the
+intermediary between the users, instead of a remote one.
+
+When we add data to the project with `dvc add` or `dvc run`, DVC-files are
+created and the data is stored in `.dvc/cache/`. We can upload DVC-files to the
+Git server with `git push`, and upload the cached files to the DVC storage with
+`dvc push`:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+The other users can receive the DVC-files and the cached data like this:
+
+```dvc
+$ git pull
+$ dvc pull
+```
+
+
+### Data sharing optimizations
+
+If all the user projects and the local DVC storage are located on the same
+_deduplicating_ filesystem, no further work is needed: copying data around is
+done instantly and without increasing the disk usage.
+
+If they are not on the same filesystem, or if the filesystem does not support
+deduplication of data, then some optimizations are needed to make things
+efficient. These optimizations may include:
+
+1. Creating, formatting and mounting a deduplicating filesystem (like XFS or
+   Btrfs).
+2. Locating the DVC storage and all the user projects on this filesystem.
+3. Adding symbolic links from the home directories of the users to their
+   projects (which are located on the optimized filesystem).
+
+For more detailed instructions check out this
+[interactive example](https://katacoda.com/dvc/courses/examples/shared-server).
+
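To find out whether a directory already sits on a filesystem where reflink copies work, one can simply try to make one. Below is a minimal sketch, assuming GNU coreutils, whose `cp --reflink=always` fails when the filesystem cannot share data blocks; the function names, image path and size are hypothetical. The second function outlines steps 1 and 2 above with a loop-mounted XFS image created by `mkfs.xfs -m reflink=1`; it needs root and is meant as an outline rather than something to run blindly.

```shell
# Hypothetical helpers (not part of DVC).

# supports_reflink DIR: succeed if a reflink copy works inside DIR.
# Needs GNU cp; any failure (including a non-GNU cp) is reported as "no".
supports_reflink() {
  local dir=$1 probe status
  probe=$(mktemp "$dir/.reflink-probe.XXXXXX") || return 1
  echo probe > "$probe"
  if cp --reflink=always "$probe" "$probe.copy" 2>/dev/null; then
    status=0
  else
    status=1
  fi
  rm -f "$probe" "$probe.copy"
  return "$status"
}

# make_reflink_fs IMAGE MOUNTPOINT [SIZE] (run as root): create an XFS
# image with reflink support and mount it through a loop device.
make_reflink_fs() {
  local image=$1 mnt=$2 size=${3:-20G}
  truncate -s "$size" "$image" || return 1  # sparse file, grows on demand
  mkfs.xfs -m reflink=1 "$image" || return 1
  mkdir -p "$mnt"
  mount -o loop "$image" "$mnt"
}
```

For example, `supports_reflink /var/local/data && echo "reflinks work here"` tells whether the DVC storage and the projects can already share data blocks in that directory.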
diff --git a/static/docs/user-guide/data-sharing/synced-storage.md b/static/docs/user-guide/data-sharing/synced-storage.md
new file mode 100644
index 0000000000..e880553c21
--- /dev/null
+++ b/static/docs/user-guide/data-sharing/synced-storage.md
@@ -0,0 +1,198 @@
+# Sharing Data Through a Synced DVC Storage
+
+Some cloud data storage providers are not supported by DVC yet (for example,
+some of the ones supported by [rclone](https://rclone.org/)). But this does not
+mean that we cannot use them to share data with the help of DVC. If it is
+possible to synchronize a local directory with a remote one (which almost all
+storage providers support), we can still set up DVC data sharing.
+
+This setup is similar to that of a mounted storage, except that the
+synchronization of the data does not happen transparently. We first run
+`dvc push` to send data to the local DVC storage, then synchronize the local DVC
+storage with the central one. To receive data, we first copy new data from the
+central DVC storage into the local one, then run `dvc pull` to get it from the
+local DVC storage into the project.
+
+## Synced Storage Example
+
+In this example we will see how to achieve this with the help of an SSH storage
+and `rsync`.
+
+> SSH is one of the storage types that DVC already supports, so normally we
+> don't need to do this. But we are using it just as an example, since SSH makes
+> it easy to synchronize with a remote directory. Once you understand how it
+> works, it should be easy to implement it for other storage types.
+
+ +> For more detailed instructions check out this +> [interactive example](https://katacoda.com/dvc/courses/examples/synced-storage). + +
+
+### Prerequisite: Set up the server
+
+We have to configure the following on the SSH server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the DVC storage.
+- Create a bare Git repository (for example, `/srv/project.git/`) and an empty
+  directory for the DVC storage (for example, `/srv/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
+
+ +
+
+### Prerequisite: Set up each user
+
+To access the SSH server without a password, each user should generate an SSH
+key pair and set up their SSH configuration.
+
+Let's assume that each user can access the server without a password using the
+private SSH key `~/.ssh/dvc-server`, and has added lines like these to
+`~/.ssh/config`:
+
+```
+Host dvc-server
+  HostName host01
+  User user1
+  IdentityFile ~/.ssh/dvc-server
+  IdentitiesOnly yes
+```
+
+Here `dvc-server` is the name (alias) that we use for our server, `host01` can
+be the IP address or the FQDN of the server, and `user1` is the username of the
+first user on the server.
+
+
+### Set the DVC storage
+
+We will use a local directory as the default storage of the project, like this:
+
+```dvc
+$ mkdir -p $HOME/project.cache
+$ dvc remote add --local --default \
+  synced-storage $HOME/project.cache
+
+$ dvc remote list --local
+synced-storage /home/username/project.cache
+```
+
+Note that this configuration is specific to each user, so we have used the
+`--local` option in order to save it in `.dvc/config.local`, which is ignored by
+Git. This configuration file should now contain something like this:
+
+```
+['remote "synced-storage"']
+url = /home/username/project.cache
+[core]
+remote = synced-storage
+```
+
+### Sharing data
+
+When we add data to the project with `dvc add` or `dvc run`, DVC-files are
+created and the data is stored in `.dvc/cache/`. We can upload DVC-files to the
+Git server with `git push`, and upload the cached files to the local DVC storage
+with `dvc push`:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+The command `dvc push` copies the cached files from `.dvc/cache/` to
+`$HOME/project.cache/`. In order to send the data to the server, we also have
+to synchronize the local DVC storage with the remote one. With `rsync` (and the
+SSH configuration from the previous sections) it can be as simple as this:
+
+```dvc
+$ rsync -r -P \
+  $HOME/project.cache/ \
+  dvc-server:/srv/project.cache/
+```
+
+To get the cached files into their local DVC storage, the other users first
+synchronize it with a command like this:
+
+```dvc
+$ rsync -r -P \
+  dvc-server:/srv/project.cache/ \
+  $HOME/project.cache/
+```
+
+Then they can receive the DVC-files and the cached data like this:
+
+```dvc
+$ git pull
+$ dvc pull
+```
+
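The same pattern should work with any tool that can synchronize a directory with your provider. As a sketch, assuming [rclone](https://rclone.org/) is installed and a remote named `myremote` (a hypothetical name) has been configured with `rclone config`, the two `rsync` commands could be replaced with `rclone copy`. `copy` is used rather than `rclone sync` on purpose: `sync` deletes destination files that are missing from the source, which could remove data other users have already uploaded (the `rsync` commands above avoid `--delete` for the same reason). The function and variable names are hypothetical.

```shell
# Hypothetical helpers (not part of DVC), assuming rclone is installed and a
# remote called "myremote" was configured with `rclone config`.
LOCAL_STORAGE=${LOCAL_STORAGE:-$HOME/project.cache}
CENTRAL_STORAGE=${CENTRAL_STORAGE:-myremote:project.cache}

# Run after `dvc push`: upload new cache files to the central storage.
# `rclone copy` never deletes files on the destination, unlike `rclone sync`.
upload_storage() {
  rclone copy "$LOCAL_STORAGE" "$CENTRAL_STORAGE"
}

# Run before `dvc pull`: download cache files pushed by the other users.
download_storage() {
  rclone copy "$CENTRAL_STORAGE" "$LOCAL_STORAGE"
}
```

With these, sharing becomes `dvc push && upload_storage`, and receiving becomes `download_storage && dvc pull`.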
+
+### Optimization: Deduplicate the storage
+
+For each cached file there is a copy in the workspace, a copy in
+`.dvc/cache/`, and another copy in `$HOME/project.cache/` (besides the copy on
+the remote storage).
+
+If you have a deduplicating filesystem (like XFS or Btrfs), this is fine,
+because making copies of the same file does not actually increase the disk
+usage. If not, you can create a deduplicating filesystem inside a file, mount
+it through a loop device, and move the project and the caches there.
+
+For more detailed instructions check out the
+[interactive example](https://katacoda.com/dvc/courses/examples/synced-storage).
+
+ +
+
+### Optimization: Automate synchronization steps
+
+Notice that after every `dvc push` we also have to run `rsync`, and before
+every `dvc pull` we also have to run `rsync`. This can be automated and
+simplified by defining aliases or functions in `~/.bashrc`, which might look
+like these:
+
+```dvc
+push() {
+  set -x
+  # Stop at the first failing step instead of syncing after a failed push.
+  git push && dvc push &&
+    rsync -rP ~/project.cache/ dvc-server:/srv/project.cache/
+  set +x
+}
+
+pull() {
+  set -x
+  git pull &&
+    rsync -rP dvc-server:/srv/project.cache/ ~/project.cache/ &&
+    dvc pull
+  set +x
+}
+```
+
+Then, to share code changes and data you just run:
+
+```dvc
+$ push
+```
+
+And to receive code changes and data you just run:
+
+```dvc
+$ pull
+```
+
+Another way to make the synchronization transparent to the users is to set up
+cron jobs that periodically synchronize the local DVC storage with the central
+one.
+
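For the cron approach mentioned above, here is a minimal sketch that installs an entry in the current user's crontab, assuming the `rsync` commands and paths used on this page; the 15-minute schedule, the `CRON_LINE` variable and the function name are assumptions. It keeps existing crontab lines and adds the entry only once. A cron job runs without your interactive environment, so the SSH key must be usable without typing a passphrase; if a transfer may take longer than the interval, wrapping the command with `flock` can prevent overlapping runs.

```shell
# Hypothetical helper (not part of DVC): add a crontab entry that uploads the
# local DVC storage to the central one and then downloads from it, every 15
# minutes. $HOME is left unexpanded here; cron's shell expands it at run time.
CRON_LINE='*/15 * * * * rsync -rq $HOME/project.cache/ dvc-server:/srv/project.cache/ && rsync -rq dvc-server:/srv/project.cache/ $HOME/project.cache/'

install_sync_cron() {
  local current
  current=$(crontab -l 2>/dev/null) || current=""  # no crontab yet
  case "$current" in
    *"$CRON_LINE"*) return 0 ;;  # already installed
  esac
  if [ -n "$current" ]; then
    printf '%s\n%s\n' "$current" "$CRON_LINE"
  else
    printf '%s\n' "$CRON_LINE"
  fi | crontab -
}
```

After running `install_sync_cron` once, `crontab -l` should show the new entry below any existing ones.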
diff --git a/static/img/user-guide/data-sharing/mounted-cache.png b/static/img/user-guide/data-sharing/mounted-cache.png
new file mode 100644
index 0000000000..a67db9778c
Binary files /dev/null and b/static/img/user-guide/data-sharing/mounted-cache.png differ
diff --git a/static/img/user-guide/data-sharing/mounted-cache.uxf b/static/img/user-guide/data-sharing/mounted-cache.uxf
new file mode 100644
index 0000000000..2d2d3bbb28
--- /dev/null
+++ b/static/img/user-guide/data-sharing/mounted-cache.uxf
@@ -0,0 +1,225 @@
+[UMLet diagram source; XML markup lost in extraction. Elements: host1 with
+user1/project/ (.git/, .dvc/cache/), host2 with user2/project/ (.git/,
+.dvc/cache/), /srv/project.git/ linked to both .git/ directories, and
+/srv/project.cache/ linked to both .dvc/cache/ directories.]
diff --git a/static/img/user-guide/data-sharing/mounted-storage.png b/static/img/user-guide/data-sharing/mounted-storage.png
new file mode 100644
index 0000000000..27f69591d4
Binary files /dev/null and b/static/img/user-guide/data-sharing/mounted-storage.png differ
diff --git a/static/img/user-guide/data-sharing/mounted-storage.uxf b/static/img/user-guide/data-sharing/mounted-storage.uxf
new file mode 100644
index 0000000000..bac85c30e8
--- /dev/null
+++ b/static/img/user-guide/data-sharing/mounted-storage.uxf
@@ -0,0 +1,279 @@
+[UMLet diagram source; XML markup lost in extraction (truncated). Elements:
+/srv/project.cache/ and /srv/project.git/, host2 with user2/project/ (.git/,
+.dvc/cache/), and user1/project/.]
+ + + + UMLClass + + 80 + 310 + 130 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 + + + + UMLClass + + 80 + 270 + 130 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 + + + + UMLClass + + 30 + 170 + 210 + 190 + + *host1* +halign=left + + + + UMLClass + + 40 + 200 + 190 + 30 + + halign=left +*user1/project.cache/* +bg=#9802f5 +lw=0 + + + + Relation + + 40 + 220 + 60 + 120 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 10.0;10.0;10.0;100.0;40.0;100.0 + + + UMLClass + + 290 + 200 + 190 + 30 + + halign=left +*user2/project.cache/* +bg=#9802f5 +lw=0 + + + + Relation + + 340 + 80 + 70 + 140 + + lt=<<<<<->>>>> +lw=4 +fg=#9802f5 + 10.0;10.0;50.0;10.0;50.0;120.0 + + diff --git a/static/img/user-guide/data-sharing/shared-server.png b/static/img/user-guide/data-sharing/shared-server.png new file mode 100644 index 0000000000..9c3039472c Binary files /dev/null and b/static/img/user-guide/data-sharing/shared-server.png differ diff --git a/static/img/user-guide/data-sharing/shared-server.uxf b/static/img/user-guide/data-sharing/shared-server.uxf new file mode 100644 index 0000000000..4eee3cdc42 --- /dev/null +++ b/static/img/user-guide/data-sharing/shared-server.uxf @@ -0,0 +1,297 @@ + + + // Uncomment the following line to change the fontsize and font: +// fontsize=14 +fontfamily=Monospaced //possible: SansSerif,Serif,Monospaced + +////////////////////////////////////////////////////////////////////////////////////////////// +// Welcome to UMLet! +// +// Double-click on elements to add them to the diagram, or to copy them +// Edit elements by modifying the text in this panel +// Hold Ctrl to select multiple elements +// Use Ctrl+mouse to select via lasso +// +// Use +/- or Ctrl+mouse wheel to zoom +// Drag a whole relation at its central square icon +// +// Press Ctrl+C to copy the whole diagram to the system clipboard (then just paste it to, eg, Word) +// Edit the files in the "palettes" directory to create your own element palettes +// +// Select "Custom Elements > New..." 
to create new element types +////////////////////////////////////////////////////////////////////////////////////////////// + + +// This text will be stored with each diagram; use it for notes. + 10 + + UMLClass + + 200 + 30 + 250 + 350 + + halign=left +*/var/local/data (XFS)* +lt=.. + + + + + UMLClass + + 250 + 60 + 150 + 30 + + halign=left +* project.cache/* +bg=#9802f5 +lw=0 + + + + UMLClass + + 250 + 100 + 150 + 30 + + halign=left +* project.git/* +bg=#fc5e03 +lw=0 + + + + UMLClass + + 250 + 140 + 150 + 110 + + halign=left +*user1-project/* +lt=.. + + + + UMLClass + + 270 + 170 + 110 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 + + + + UMLClass + + 270 + 210 + 110 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 + + + + UMLClass + + 250 + 260 + 150 + 110 + + halign=left +*user2-project/* +lt=.. + + + + UMLClass + + 270 + 290 + 110 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 + + + + UMLClass + + 270 + 330 + 110 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 + + + + Relation + + 370 + 110 + 70 + 90 + + lt=<<<->>> +lw=1.5 +fg=#fc5e03 + 30.0;10.0;50.0;10.0;50.0;70.0;10.0;70.0 + + + Relation + + 370 + 100 + 80 + 220 + + lt=<<<->>> +lw=1.5 +fg=#fc5e03 + 30.0;10.0;60.0;10.0;60.0;200.0;10.0;200.0 + + + Relation + + 220 + 70 + 70 + 170 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 30.0;10.0;10.0;10.0;10.0;150.0;50.0;150.0 + + + Relation + + 210 + 60 + 80 + 300 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 40.0;10.0;10.0;10.0;10.0;280.0;60.0;280.0 + + + UMLClass + + 30 + 30 + 150 + 350 + + halign=left +*/home* +lt=.. + + + + UMLClass + + 50 + 110 + 110 + 90 + + halign=left +*user1/* +lt=.. +group=5 + + + + UMLClass + + 60 + 150 + 80 + 30 + + *project/* +lt=.. +lw=2 +group=5 + + + + UMLClass + + 50 + 230 + 110 + 90 + + halign=left +*user2/* +lt=.. +group=6 + + + + UMLClass + + 60 + 270 + 80 + 30 + + *project/* +lt=.. +lw=2 +group=6 + + + + Relation + + 130 + 140 + 140 + 40 + + lt=<. +lw=1.5 + + 120.0;20.0;10.0;20.0 + + + Relation + + 130 + 270 + 140 + 30 + + lt=<. 
+lw=1.5 + 120.0;10.0;10.0;10.0 + + diff --git a/static/img/user-guide/data-sharing/ssh-storage.png b/static/img/user-guide/data-sharing/ssh-storage.png new file mode 100644 index 0000000000..a48b334114 Binary files /dev/null and b/static/img/user-guide/data-sharing/ssh-storage.png differ diff --git a/static/img/user-guide/data-sharing/ssh-storage.uxf b/static/img/user-guide/data-sharing/ssh-storage.uxf new file mode 100644 index 0000000000..549e468f8b --- /dev/null +++ b/static/img/user-guide/data-sharing/ssh-storage.uxf @@ -0,0 +1,233 @@ + + + // Uncomment the following line to change the fontsize and font: +// fontsize=14 +fontfamily=Monospaced //possible: SansSerif,Serif,Monospaced + +////////////////////////////////////////////////////////////////////////////////////////////// +// Welcome to UMLet! +// +// Double-click on elements to add them to the diagram, or to copy them +// Edit elements by modifying the text in this panel +// Hold Ctrl to select multiple elements +// Use Ctrl+mouse to select via lasso +// +// Use +/- or Ctrl+mouse wheel to zoom +// Drag a whole relation at its central square icon +// +// Press Ctrl+C to copy the whole diagram to the system clipboard (then just paste it to, eg, Word) +// Edit the files in the "palettes" directory to create your own element palettes +// +// Select "Custom Elements > New..." to create new element types +////////////////////////////////////////////////////////////////////////////////////////////// + + +// This text will be stored with each diagram; use it for notes. 
+ 10 + + UMLClass + + 170 + 60 + 170 + 30 + + halign=left +*/srv/project.cache/ * +bg=#9802f5 +lw=0 + + + + UMLClass + + 170 + 100 + 170 + 30 + + halign=left +*/srv/project.git/ * +bg=#fc5e03 +lw=0 + + + + Relation + + 250 + 120 + 70 + 140 + + lt=<<<->>> +lw=1.5 +fg=#fc5e03 + 10.0;10.0;10.0;120.0;50.0;120.0 + + + Relation + + 190 + 120 + 70 + 140 + + lt=<<<->>> +lw=1.5 +fg=#fc5e03 + 50.0;10.0;50.0;120.0;10.0;120.0 + + + Relation + + 330 + 70 + 140 + 230 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 10.0;10.0;120.0;10.0;120.0;210.0;80.0;210.0 + + + Relation + + 40 + 70 + 150 + 230 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 130.0;10.0;10.0;10.0;10.0;210.0;50.0;210.0 + + + UMLClass + + 280 + 200 + 150 + 110 + + halign=left +*user2/project/* +lt=.. +group=9 + + + + UMLClass + + 300 + 270 + 110 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 +group=9 + + + + UMLClass + + 300 + 230 + 110 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 +group=9 + + + + UMLUseCase + + 130 + 40 + 250 + 110 + + lt=.. + + + + + UMLClass + + 270 + 170 + 170 + 150 + + *host2* +halign=left +group=9 + + + + UMLClass + + 70 + 200 + 150 + 110 + + halign=left +*user1/project/* +lt=.. 
+group=10 + + + + UMLClass + + 90 + 270 + 110 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 +group=10 + + + + UMLClass + + 90 + 230 + 110 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 +group=10 + + + + UMLClass + + 60 + 170 + 170 + 150 + + *host1* +halign=left +group=10 + + + diff --git a/static/img/user-guide/data-sharing/synced-storage.png b/static/img/user-guide/data-sharing/synced-storage.png new file mode 100644 index 0000000000..c5f4ea66c2 Binary files /dev/null and b/static/img/user-guide/data-sharing/synced-storage.png differ diff --git a/static/img/user-guide/data-sharing/synced-storage.uxf b/static/img/user-guide/data-sharing/synced-storage.uxf new file mode 100644 index 0000000000..5a444876c6 --- /dev/null +++ b/static/img/user-guide/data-sharing/synced-storage.uxf @@ -0,0 +1,279 @@ + + + // Uncomment the following line to change the fontsize and font: +// fontsize=14 +fontfamily=Monospaced //possible: SansSerif,Serif,Monospaced + +////////////////////////////////////////////////////////////////////////////////////////////// +// Welcome to UMLet! +// +// Double-click on elements to add them to the diagram, or to copy them +// Edit elements by modifying the text in this panel +// Hold Ctrl to select multiple elements +// Use Ctrl+mouse to select via lasso +// +// Use +/- or Ctrl+mouse wheel to zoom +// Drag a whole relation at its central square icon +// +// Press Ctrl+C to copy the whole diagram to the system clipboard (then just paste it to, eg, Word) +// Edit the files in the "palettes" directory to create your own element palettes +// +// Select "Custom Elements > New..." to create new element types +////////////////////////////////////////////////////////////////////////////////////////////// + + +// This text will be stored with each diagram; use it for notes. 
+ 10 + + UMLClass + + 190 + 70 + 170 + 30 + + halign=left +*/srv/project.cache/ * +bg=#9802f5 +lw=0 + + + + UMLClass + + 190 + 110 + 170 + 30 + + halign=left +*/srv/project.git/ * +bg=#fc5e03 +lw=0 + + + + Relation + + 270 + 130 + 70 + 180 + + lt=<<<->>> +lw=1.5 +fg=#fc5e03 + 10.0;10.0;10.0;160.0;50.0;160.0 + + + Relation + + 210 + 130 + 70 + 180 + + lt=<<<->>> +lw=1.5 +fg=#fc5e03 + 50.0;10.0;50.0;160.0;10.0;160.0 + + + Relation + + 430 + 230 + 60 + 120 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 40.0;10.0;40.0;100.0;10.0;100.0 + + + Relation + + 140 + 70 + 70 + 160 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 50.0;10.0;10.0;10.0;10.0;140.0 + + + UMLClass + + 300 + 250 + 180 + 110 + + halign=left +*user2/project/* +lt=.. + + + + UMLClass + + 320 + 320 + 120 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 + + + + UMLClass + + 320 + 280 + 120 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 + + + + UMLUseCase + + 160 + 50 + 230 + 110 + + lt=.. + + + + + UMLClass + + 290 + 180 + 200 + 190 + + *host2* +halign=left + + + + UMLClass + + 60 + 250 + 180 + 110 + + halign=left +* user1/project/* +lt=.. + + + + UMLClass + + 100 + 320 + 120 + 30 + + halign=left +*.dvc/cache/* +bg=#9802f5 +lw=0 + + + + UMLClass + + 100 + 280 + 120 + 30 + + halign=left +*.git/* +bg=#fc5e03 +lw=0 + + + + UMLClass + + 40 + 180 + 210 + 190 + + *host1* +halign=left + + + + UMLClass + + 60 + 210 + 180 + 30 + + halign=left +*user1/project.cache/* +bg=#9802f5 +lw=0 + + + + Relation + + 60 + 230 + 60 + 120 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 10.0;10.0;10.0;100.0;40.0;100.0 + + + UMLClass + + 300 + 210 + 180 + 30 + + halign=left +*user2/project.cache/* +bg=#9802f5 +lw=0 + + + + Relation + + 350 + 70 + 70 + 160 + + lt=<<<->>> +lw=1.5 +fg=#9802f5 + 10.0;10.0;50.0;10.0;50.0;140.0 + +