clarify external outputs section #143

Open
dmpetrov opened this issue Dec 22, 2018 · 11 comments

Comments

@dmpetrov
Member

commented Dec 22, 2018

  • First, https://dvc.org/doc/user-guide/external-outputs describes cache configuration, but not outputs. All the examples describe how to reconfigure the cache. Let's rename it to something like Cache Reconfiguration or Setup Cache and change the section's introductory paragraph accordingly.

  • Local cache reconfiguration is missing, e.g. dvc config cache.dir /mnt/cache.
    It is also not explained why we use cache.{s3, gs, ssh, hdfs}, but for the local cache we use cache.dir rather than cache.local: cache.dir is a shortcut for cache.local.

  • Consolidate everything related to Managing External Data (including https://dvc.org/doc/user-guide/external-dependencies) under this document, adding an index.md file that gives an overview of the different approaches and features DVC provides to support it.
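A minimal sketch of the cache setups the second bullet asks the docs to cover, assuming the DVC 0.x command syntax current when this issue was filed. The remote name `s3cache` and the bucket path `s3://mybucket/cache` are hypothetical placeholders.

```shell
# Local cache: point DVC's cache at a directory outside the project.
dvc config cache.dir /mnt/cache

# External (S3) cache: first define a remote, then reference it by name.
# "s3cache" and "s3://mybucket/cache" are hypothetical placeholders.
dvc remote add s3cache s3://mybucket/cache
dvc config cache.s3 s3cache
```

This shows the asymmetry the bullet points out: the remote-backed cache options (`cache.s3`, `cache.gs`, `cache.ssh`, `cache.hdfs`) take a remote name, while `cache.dir` takes a path directly.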

@dmpetrov dmpetrov added the question label Dec 22, 2018

@efiop

Member

commented Dec 22, 2018

I think the External outputs name is suitable, since the article explains both cache configuration (as a prerequisite for outputs) and the outputs themselves. You can see dvc run commands showing how to use external outputs.

We do need to better explain the external output case for a local file outside of the DVC project. Also, cache.dir is a shortcut for cache.local; that needs a better explanation as well.

@shcheklein shcheklein changed the title Misleading section name: External Outputs clarify external outputs section Mar 25, 2019

@jorgeorpinel

Collaborator

commented Aug 17, 2019

This issue is a little old, and I see the doc is now called "Managing External Data", but from what I'm reading in the description, this problem is still pending an update, right? Should I also rename the URL so that it matches external-data instead of external-outputs?

@efiop

Member

commented Aug 17, 2019

I somehow missed when the renaming took place and I'm not sure about the reasons. To me it makes things more confusing, since external data is more about dvc add, while external outputs are about dvc run, which is described there as well. Maybe it would be worth splitting that article into two: external-data and external-output?

@shcheklein

Member

commented Aug 20, 2019

@efiop @jorgeorpinel the idea was to consolidate everything related to "external data management" under this new section, plus add an index.md that gives an overview of the different approaches and features DVC provides to support it. That was just the first step, but it's still way better than having just "external outputs" (as a top-level User Guide section name), which is not descriptive at all, especially for the dvc add case.

@jorgeorpinel

Collaborator

commented Aug 20, 2019

Makes sense. Maybe we should update the title and description of the issue to detail that idea? Then we'll just prioritize accordingly. I could address this after #425, for example.

@shcheklein

Member

commented Aug 20, 2019

@jorgeorpinel yep, feel free to update accordingly... Please check whether there are other tickets already related to this.

@jorgeorpinel

Collaborator

commented Aug 20, 2019

@shcheklein ok I updated the description, does it look good?

@shcheklein

Member

commented Aug 20, 2019

@jorgeorpinel looks good!

@jorgeorpinel

Collaborator

commented Aug 20, 2019

Question:

Consolidate everything related to Managing External Data

What about external data sources? I.e. DVC remotes, dvc import and dvc import-url commands, or even more general concepts like "remote location" and "external data source". (from #566) Should we mention any of those in this new consolidated doc, or keep separate?

@shcheklein

Member

commented Aug 22, 2019

@jorgeorpinel I think we can definitely mention them to provide a full picture of the different options for connecting data from different sources.

@jorgeorpinel

Collaborator

commented Aug 22, 2019

So basically #566 is a duplicate of this issue, meaning the consolidated document described here could solve both issues, right? I'll merge them if so.
