Commit

Merge pull request #1263 from pachyderm/git-docs
Git docs
dwhitena committed Jan 16, 2017
2 parents 5ff576b + 6024432 commit db8cafb
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions doc/pachyderm_file_system.md
@@ -24,16 +24,16 @@ Doing this on big data sets gets interesting, but having a simple underlying int

 ## Versioning

-PFS is very Git-like. A data set is compromised of many `Files`, which constitutes a `Repo`.
+Interactions with PFS are very Git-like. Your data, which is made up of one or more `Files`, is versioned in a data repository, or `Repo`.

-In PFS you version your data with `Commits`. By versioning your data, you can:
+With PFS, you version your data by making `Commits` of data into `Repos`. By versioning your data, you can:

 - reproduce any input or output for your processing, which in turn enables ...
 - collaborating with your peers on a data set

 [Reproducibility and Collaboration](https://pachyderm.io/dsbor.html) are things we care a lot about.

-We store each commit only as the data that changed from the prior commit. This is a concept borrowed from Git. Storing your data this way also allows us to enable [Incrementality](https://pachyderm.io/dsbor.html).
+We store each commit as only the data that changed from the prior commit, which is where PFS differs from Git. Storing your data this way allows us to enable [Incrementality](https://pachyderm.io/dsbor.html) and keeps PFS space efficient.

 ## Files vs Blocks

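The Versioning text in this diff says each commit is stored as only the data that changed from the prior commit. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not Pachyderm's API or storage format: `DeltaRepo`, `commit`, `checkout`, and `stored_bytes` are hypothetical names, and deltas are tracked per whole file rather than per block.

```python
from typing import Dict, List, Optional


class DeltaRepo:
    """Toy repo (hypothetical, not PFS) that stores each commit as a delta."""

    def __init__(self) -> None:
        # One dict per commit: path -> new content, or None if the file was deleted.
        self._deltas: List[Dict[str, Optional[bytes]]] = []

    def commit(self, files: Dict[str, bytes]) -> int:
        """Take the full desired state and store only what differs from the prior commit."""
        previous = self.checkout(len(self._deltas) - 1) if self._deltas else {}
        # Keep files that are new or whose content changed.
        delta: Dict[str, Optional[bytes]] = {
            path: data for path, data in files.items() if previous.get(path) != data
        }
        # Record deletions explicitly so they can be replayed later.
        for path in previous:
            if path not in files:
                delta[path] = None
        self._deltas.append(delta)
        return len(self._deltas) - 1

    def checkout(self, commit_id: int) -> Dict[str, bytes]:
        """Rebuild the full state at commit_id by replaying deltas in order."""
        state: Dict[str, bytes] = {}
        for delta in self._deltas[: commit_id + 1]:
            for path, data in delta.items():
                if data is None:
                    state.pop(path, None)
                else:
                    state[path] = data
        return state

    def stored_bytes(self) -> int:
        """Total bytes actually kept across all deltas."""
        return sum(
            len(data)
            for delta in self._deltas
            for data in delta.values()
            if data is not None
        )


repo = DeltaRepo()
first = repo.commit({"a.csv": b"1,2,3", "b.csv": b"4,5,6"})
second = repo.commit({"a.csv": b"1,2,3", "b.csv": b"4,5,6,7"})
# Only b.csv changed, so the second commit stores b.csv alone;
# both commits can still be reconstructed in full with checkout().
```

In this sketch the unchanged `a.csv` is stored once, which is the space-efficiency point the added line makes. The trade-off is that reading an old commit means replaying every delta up to it.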
