Commit

Modified some sentences and the tooltip for clarity
iesahin committed Feb 26, 2021
1 parent 5674991 commit 0b099f6
Showing 1 changed file with 20 additions and 20 deletions.
40 changes: 20 additions & 20 deletions content/docs/user-guide/basic-concepts/workspace.md
@@ -2,39 +2,39 @@
name: Workspace
match: [workspace]
tooltip: >-
-Directory containing all your project files e.g. raw datasets, source code, ML
-models, etc. Typically, it's also a Git repository. It will contain your DVC
+The directory containing all your project files, e.g., the raw data, source
+code, ML models. Typically, it's also a Git repository. It contains your DVC
project.
---

# Workspace

-A data science project consists of data obtained from many different sources.
-This data may be split into multiple files or directories or (as the project
-structure needs) have different versions for different requirements. e.g. A
+A data science project can consist of data obtained from many distinct sources.
+These may be split into multiple files or directories or (as the project
+structure needs) have different versions for different requirements, e.g., a
smaller / simplified version might be required in prototyping for faster
feedback and shorter training times. A single workspace to manage all artifacts
of a project is desirable, although versioning needs and managing dependencies
-make it increasingly complex.
+make it increasingly difficult.

DVC allows a single directory to contain all your project artifacts. The
-workspace is the directory containing _visible_ part of your
-<abbr>project</abbr> e.g. raw data, source code, model files. You can have
-multiple versions of data, models and other kinds of artifacts within the
-workspace and limit your focus to a subset of these, then record your progress
-in a commit and move along your data and model history. DVC provides a _machine
-learning file system_ to manipulate your data and models using its commands. No
-need to keep different artifact versions in different directories, under
-different names, without any semantic relation among them. Instead DVC is able
-to link your data and models and can show the evolution, progress and
-interrelationships.
+workspace is the directory containing the _visible_ part of your
+<abbr>project</abbr>, e.g., the raw data, source code, model files. You can have
+multiple versions of data, models, and other kinds of artifacts within the
+workspace and limit your focus to a subset of these. You can record your
+progress in a commit and analyze your data and model history. DVC provides a
+_machine learning file system_ to manipulate your data and models using its
+commands. No need to rename your models for minor changes, save cleaned up data
+in different directories or save tens of different renamed files for training
+programs. DVC can keep track of all of these in a single directory called the
+workspace.

-Files and directories in the workspace can be added to DVC (`dvc add`) or they
+Files and directories in the workspace can be added to DVC (`dvc add`), or they
can be downloaded from external sources (`dvc get`, `dvc import`,
`dvc import-url`). Changes to the data, notebooks, models, and any related
-machine learning artifact can be tracked (`dvc commit`) and their content can be
-synchronized (`dvc checkout`). Tracked data can be removed (`dvc remove`) from
-the workspace.
+machine learning artifact can be tracked (`dvc commit`), and their content can
+be synchronized (`dvc checkout`). Tracked data can be removed (`dvc remove`)
+from the workspace.

DVC supports all typical operations of a versioned data file system through its
commands. Behind the scene these operations use <abbr>metafiles</abbr> like the