Commit

Update data_management.md
fixing broken shuffle link
Nick-Harvey committed Dec 6, 2018
1 parent 0e47783 commit e3add0e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion doc/managing_pachyderm/data_management.md
@@ -26,7 +26,7 @@ ln -s /pfs/input/log.txt /pfs/out/logs/log.txt

Under the hood, Pachyderm is smart enough to recognize that the output file simply symlinks to a file that already exists in Pachyderm, and therefore skips the upload altogether.

- Note that if your shuffling pipeline only needs the names of the input files but not their content, you can use [`lazy input`](http://pachyderm.readthedocs.io/en/latest/reference/pipeline_spec.html#atom-input). That way, your shuffling pipeline can skip both the download and the upload. An example for this type of shuffle pipeline is [here](https://github.com/pachyderm/pachyderm/tree/master/doc/examples/lazy_shuffle)
+ Note that if your shuffling pipeline only needs the names of the input files but not their content, you can use [`lazy input`](http://pachyderm.readthedocs.io/en/latest/reference/pipeline_spec.html#atom-input). That way, your shuffling pipeline can skip both the download and the upload. An example for this type of shuffle pipeline is [here](https://github.com/pachyderm/pachyderm/tree/master/doc/examples/shuffle)
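
As a rough illustration of the symlink-based shuffle described in the changed paragraph, the sketch below links each input file into an output subdirectory without ever opening it. The function name `shuffle_by_symlink` and the rule of grouping by file extension are assumptions made for this example; they are not taken from the Pachyderm docs or the linked example pipeline.

```python
import os


def shuffle_by_symlink(input_dir, output_dir):
    """Symlink every file under input_dir into output_dir/<extension>/<name>.

    Hypothetical helper: only file paths are used, file contents are never read.
    """
    links = []
    for root, _dirs, files in os.walk(input_dir):
        for name in sorted(files):
            src = os.path.join(root, name)
            # Assumed grouping rule: bucket by extension ("txt", "csv", ...).
            ext = os.path.splitext(name)[1].lstrip(".") or "noext"
            dest_dir = os.path.join(output_dir, ext)
            os.makedirs(dest_dir, exist_ok=True)
            dest = os.path.join(dest_dir, name)
            # Raises FileExistsError if two inputs share a name and extension.
            os.symlink(src, dest)
            links.append(dest)
    return links
```

Inside a pipeline container this would presumably be called as `shuffle_by_symlink("/pfs/input", "/pfs/out")`, mirroring the `ln -s` command in the hunk header.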

## Garbage collection
