diff --git a/content/docs/command-reference/cache/dir.md b/content/docs/command-reference/cache/dir.md
index d2da878f14..9f2cc9e751 100644
--- a/content/docs/command-reference/cache/dir.md
+++ b/content/docs/command-reference/cache/dir.md
@@ -1,7 +1,7 @@
# cache dir
Set/unset the cache directory location intuitively (compared to
-using `dvc config cache`).
+using `dvc config cache`), or show the currently configured value.
## Synopsis
@@ -19,11 +19,13 @@ positional arguments:
Helper to set the `cache.dir` configuration option. (See
[cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).)
-Unlike doing so with `dvc config cache`, this command transform paths (`value`)
-that are provided relative to the current working directory into paths
+Unlike doing so with `dvc config cache`, `dvc cache dir` transforms paths
+(`value`) that are provided relative to the current working directory into paths
**relative to the config file location**. However, if the `value` provided is an
-absolute path, then it's preserved as it is. If no path is provided, it prints
-the path for current cache directory.
+absolute path, then it's preserved as it is.
+
+If no path `value` is provided, this command prints the path of the current
+cache directory.
## Options
diff --git a/content/docs/command-reference/check-ignore.md b/content/docs/command-reference/check-ignore.md
index 21e7f3d74c..60ad9f2889 100644
--- a/content/docs/command-reference/check-ignore.md
+++ b/content/docs/command-reference/check-ignore.md
@@ -119,7 +119,7 @@ file1
file2
```
-It can also be used as a component of a POSIX pipe:
+It can also be used as part of a POSIX pipe:
```dvc
cat file_list | dvc check-ignore --stdin
diff --git a/content/docs/command-reference/unfreeze.md b/content/docs/command-reference/unfreeze.md
index d1a784dc58..0002bb4045 100644
--- a/content/docs/command-reference/unfreeze.md
+++ b/content/docs/command-reference/unfreeze.md
@@ -10,7 +10,6 @@ usage: dvc unfreeze [-h] [-q | -v] targets [targets ...]
positional arguments:
targets Stages or .dvc files to unfreeze
- (see also `dvc freeze`).
```
## Description
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 6794fbff8f..efc7899783 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -100,8 +100,8 @@
"slug": "how-to",
"source": false,
"children": [
- "add-output-to-stage",
"undo-adding-data",
+ "add-deps-or-outs-to-a-stage",
"update-tracked-files"
]
},
diff --git a/content/docs/start/index.md b/content/docs/start/index.md
index a2f465ca94..060edd3e9a 100644
--- a/content/docs/start/index.md
+++ b/content/docs/start/index.md
@@ -49,13 +49,13 @@ Changes to be committed:
$ git commit -m "Initialize DVC"
```
-DVC features can be grouped into layers. We'll explore them one by one in the
-next few sections:
+DVC features can be grouped into functional components. We'll explore them one
+by one in the next few sections:
-- [**Data versioning**](/doc/start/data-versioning) is the core part of DVC for
- large files, datasets, machine learning models versioning and efficient
- sharing. We'll show how to use a regular Git workflow, without storing large
- files with Git. Think "Git for data".
+- [**Data versioning**](/doc/start/data-versioning) is the base layer of DVC for
+ large files, datasets, and machine learning models. It looks like a regular
+ Git workflow, but without storing large files in the repo (think "Git for
+ data"). Data is stored separately, which allows for efficient sharing.
- [**Data access**](/doc/start/data-access) shows how to use data artifacts from
outside of the project and how to import data artifacts from another DVC
diff --git a/content/docs/use-cases/versioned-storage.md b/content/docs/use-cases/versioned-storage.md
new file mode 100644
index 0000000000..3ced05eacf
--- /dev/null
+++ b/content/docs/use-cases/versioned-storage.md
@@ -0,0 +1,13 @@
+# Versioned storage
+
+What if we could **combine data and ML model versioning features with large file
+storage** solutions like traditional hard drives, NAS, or cloud services such as
+Amazon S3 and Google Drive? DVC brings together the best of both worlds by
+providing easy synchronization between the data cache and
+on-premises or cloud storage for sharing.
+
+![](/img/model-versioning-diagram.png) _DVC's hybrid versioned storage_
+
+> Note that [remote storage](/doc/command-reference/remote) is optional in DVC:
+> no server setup or special services are needed, just the `dvc` command-line
+> tool.
diff --git a/content/docs/user-guide/dvc-files-and-directories.md b/content/docs/user-guide/dvc-files-and-directories.md
index 09e2ff8ee7..7398c5903a 100644
--- a/content/docs/user-guide/dvc-files-and-directories.md
+++ b/content/docs/user-guide/dvc-files-and-directories.md
@@ -282,22 +282,29 @@ Full parameters (key and value) are listed separately under
## Structure of the cache directory
-There are two ways in which the data is stored in cache: As a
-single file (eg. `data.csv`), or a directory of files.
+The DVC cache is a
+[content-addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage),
+which adds a layer of indirection between code and data.
-For the first case, we calculate the file hash, a 32 characters long string
-(usually MD5). The first two characters are used to name the directory inside
-`.dvc/cache`, and the rest become the file name of the cached file. For example,
-if a data file `Posts.xml.zip` has a hash value of
-`ec1d2935f811b77cc49b031b999cbf17`, its path in the cache will be
-`.dvc/cache/ec/1d2935f811b77cc49b031b999cbf17`.
+There are two ways in which the data is cached: as a single file
+(e.g. `data.csv`) or as a directory.
+
+### For files
+
+DVC calculates the file hash, a 32-character string (usually MD5). The first
+two characters are used to name the directory inside `.dvc/cache`, and the rest
+become the file name of the cached file. For example, if a data file
+`Posts.xml.zip` has a hash value of `ec1d2935f811b77cc49b031b999cbf17`, its path
+in the cache will be `.dvc/cache/ec/1d2935f811b77cc49b031b999cbf17`.
> Note that file hashes are calculated from file contents only. 2 or more files
> with different names but the same contents can exist in the workspace and be
> tracked by DVC, but only one copy is stored in the cache. This helps avoid
> data duplication in cache and remotes.
-For the second case, let us consider a directory with 2 images.
+### For directories
+
+Let's imagine [adding](/doc/command-reference/add) a directory with 2 images:
```dvc
$ tree data/images/
@@ -308,21 +315,10 @@ data/images/
$ dvc add data/images
```
-When running `dvc add` on this directory of images, a `data/images.dvc`
-[DVC-file](/doc/user-guide/dvc-files-and-directories) is created, containing the
-hash value of the directory:
-
-```yaml
-outs:
- - md5: 196a322c107c2572335158503c64bfba.dir
- path: data/images
-```
-
-The directory in cache is stored as a JSON file (with `.dir` file extension)
-describing it's contents, along with the files it contains in cache, like this:
+The directory entry is stored in the cache as a JSON file with a `.dir`
+extension, along with the files it contains, like this:
```dvc
-$ tree .dvc/cache
.dvc/cache/
├── 19
│ └── 6a322c107c2572335158503c64bfba.dir
@@ -332,11 +328,9 @@ $ tree .dvc/cache
└── 0b40427ee0998e9802335d98f08cd98f
```
-The cache file with `.dir` extension is a special text file that contains the
-mapping of files in the `data/` directory (as a JSON array), along with their
-hash values. The other two cache files are the files inside `data/`.
-
-A typical `.dir` cache file looks like this:
+This `.dir` file contains the mapping of files in `data/images` (as a JSON
+array), including their hash values. That's how DVC knows that the other two
+cached files belong in the directory:
```dvc
$ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir
diff --git a/content/docs/user-guide/how-to/add-deps-or-outs-to-a-stage.md b/content/docs/user-guide/how-to/add-deps-or-outs-to-a-stage.md
new file mode 100644
index 0000000000..bc495a6f3a
--- /dev/null
+++ b/content/docs/user-guide/how-to/add-deps-or-outs-to-a-stage.md
@@ -0,0 +1,56 @@
+# Add Deps or Outs to a Stage
+
+There are situations where we have executed a stage (either by writing
+`dvc.yaml` manually and using `dvc repro`, or with `dvc run`), but later notice
+that some of its dependencies or outputs are missing from `dvc.yaml`:
+
+- Files or directories in the workspace that the stage depends on are missing
+  from the `deps` field.
+
+- Output files or directories that the stage creates, which are already in the
+  workspace, are missing from the `outs` field.
+
+Follow the steps below to add existing files or directories as
+dependencies or outputs to a stage without re-executing it.
+Re-running the stage is unnecessary, and can be expensive or
+time-consuming.
+
+We start with an example `prepare` stage, which has a single dependency and
+output. To add the missing dependency `data/data.csv` and output `data/validate`
+to this stage, we can edit `dvc.yaml` like this:
+
+```git
+ stages:
+ prepare:
+ cmd: python src/prepare.py
+ deps:
++ - data/data.csv
+ - src/prepare.py
+ outs:
+ - data/train
++ - data/validate
+```
+
+> Note that you can also use `dvc run` with the `-f` and `--no-exec` options to
+> add the missing dependency and output to the stage:
+>
+> ```dvc
+> $ dvc run -f --no-exec \
+>           -n prepare \
+>           -d data/data.csv \
+>           -d src/prepare.py \
+>           -o data/train \
+>           -o data/validate \
+>           python src/prepare.py
+> ```
+>
+> `-f` overwrites the stage in `dvc.yaml`, while `--no-exec` updates the stage
+> without executing it.
+
+Finally, we need to run `dvc commit` to save the newly specified output(s) to
+the cache (and to update the hash values of `deps` and `outs` in
+`dvc.lock`):
+
+```dvc
+$ dvc commit
+```
diff --git a/content/docs/user-guide/how-to/add-output-to-stage.md b/content/docs/user-guide/how-to/add-output-to-stage.md
deleted file mode 100644
index 79e3ef292d..0000000000
--- a/content/docs/user-guide/how-to/add-output-to-stage.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# Add Output to Stage
-
-There are situations where we have executed a stage (either by writing
-`dvc.yaml` manually and using `dvc repro`, or with `dvc run`), but later notice
-that some of the output files or directories it creates, which are already in
-the workspace, are missing from `dvc.yaml` (`outs` field). Follow
-the steps below to add existing files or directories as outputs to
-a stage without re-executing it again, which can be expensive/time-consuming,
-and is unnecessary.
-
-We start with an example `prepare`, which has a single output. To add a missing
-output `data/validate` to this stage, we can edit `dvc.yaml` like this:
-
-```git
- stages:
- prepare:
- cmd: python src/prepare.py
- deps:
- - src/prepare.py
- outs:
- - data/train
-+ - data/validate
-```
-
-> Note that you can also use `dvc run` with the `-f` and `--no-exec` options to
-> add another output to the stage:
->
-> ```dvc
-> $ dvc run -f --no-exec \
-> -n prepare \
-> -d src/prepare.py \
-> -o data/train \
-> -o data/validate \
-> python src/prepare.py
-> ```
->
-> `-f` overwrites the stage in `dvc.yaml`, while `--no-exec` updates the stage
-> without executing it.
-
-Finally, we need to run `dvc commit` to save the newly specified output(s) to
-the cache (and to update the corresponding hash values in
-`dvc.lock`):
-
-```dvc
-$ dvc commit
-```
diff --git a/content/docs/user-guide/how-to/undo-adding-data.md b/content/docs/user-guide/how-to/undo-adding-data.md
index d749f8fc5c..e5a59d747c 100644
--- a/content/docs/user-guide/how-to/undo-adding-data.md
+++ b/content/docs/user-guide/how-to/undo-adding-data.md
@@ -3,8 +3,8 @@
There are situations where you want to stop tracking data added previously.
Follow the steps listed here to undo `dvc add`.
-Let's first add a data file into an example project using
-`dvc add`, which creates a `.dvc` file to track the data:
+Let's first add a data file to an example project. This creates a `.dvc` file
+to track the data:
```dvc
$ dvc add data.csv
@@ -12,32 +12,24 @@ $ ls
data.csv data.csv.dvc
```
-> Note, if you are using `symlink` or `hardlink` as
-> [link type](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
-> for DVC cache, you will have to unprotect the tracked file first
-> (see `dvc unprotect`):
->
-> ```dvc
-> $ dvc unprotect data.csv
-> ```
+> Note that if you're using `symlink` or `hardlink` as the project's
+> [link type](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache),
+> you'll have to unprotect the tracked file first (see `dvc unprotect`).
-Now let's reverse `dvc add` by removing the corresponding `.dvc` file and
-`.gitignore` entry using `dvc remove`:
+Now let's reverse that with `dvc remove`, which removes the `.dvc` file (and the
+corresponding `.gitignore` entry). After this, the data file is no longer
+tracked by DVC:
```dvc
$ dvc remove data.csv.dvc
-```
-
-Data file `data.csv` is now no longer being tracked by DVC.
-```dvc
$ git status
Untracked files:
data.csv
```
You can run `dvc gc` with the `-w` option to remove the data that isn't
-referenced in the current workspace from the cache:
+referenced in the current workspace from the cache:
```dvc
$ dvc gc -w
diff --git a/content/docs/user-guide/how-to/update-tracked-files.md b/content/docs/user-guide/how-to/update-tracked-files.md
index e74a1d06b6..554974d263 100644
--- a/content/docs/user-guide/how-to/update-tracked-files.md
+++ b/content/docs/user-guide/how-to/update-tracked-files.md
@@ -1,4 +1,4 @@
-# Updating Tracked Files
+# Update Tracked Files
Due to the way DVC handles linking between the data files between the
cache and their counterparts in the workspace (refer
diff --git a/content/docs/user-guide/related-technologies.md b/content/docs/user-guide/related-technologies.md
index 3a7d72c704..9be7a4a93f 100644
--- a/content/docs/user-guide/related-technologies.md
+++ b/content/docs/user-guide/related-technologies.md
@@ -82,8 +82,9 @@ _Luigi_, etc.
## Experiment management software
-- DVC uses Git as the underlying layer for data, pipelines, an experiment
- versioning, instead of a custom web application.
+- DVC uses Git as the underlying version control layer for data, pipelines, and
+  experiments. Data versions exist as metadata in Git rather than in external
+  databases or APIs, so no additional services are required.
- DVC doesn't need to run any services. There's no GUI as a result, but we
expect some GUI services will be created on top of DVC.