Skip to content

Conversation

@skshetry
Copy link
Collaborator

@skshetry skshetry commented Jan 15, 2020

Closes #2994.

From the product requirements, it deviates on the following issue:

  1. For now, it does not support showing URLs for multiple paths.
    So, we can only do dvc get <url> <path> --show-url for a single path, so as
    to make it compatible with dvc get.

  2. It also does not support specifying remote at the moment, so as to not introduce new flags. (get: --show-url does not support specifying remotes #3183)

  3. It does not work with the file inside of a directory (i.e. dir/file). Limitation of dvc.api.get_url(). Requires adding directory granularity support. (api: get_url does not support links for granular file  #3180)

    ➜ dvc get . dir/file --show-url
    ERROR: failed to show url - unable to find DVC-file with output 'dir/file'
  4. How to show path/URL to the directory? Currently, it only shows .dir file. (api: get_url() returns path to .dir for directory #3182)

    ➜ dvc get . dir --show-url
    ../tmp.fKEyzBMWge/12/11325135bf45fe5f67efa975110a57.dir

    Same is the case with dvc get (cc @jorgeorpinel for docs regarding dvc.api.get_url()).

  5. It does not work with non-dvc repos (i.e. git only repos). So, the following won't work (api/get: cannot show url for git only repos #3181):

    ➜ dvc get https://github.com/schacon/cowsay install.sh --show-url 
    ERROR: failed to show url - URL 'https://github.com/schacon/cowsay' is not a dvc repository.
  6. Custom revision does not work for the local repo.
    Not fixing at the moment. api/get: local repo with custom rev is not supported #3179 tracks this.

    ➜ dvc get . foo --rev=HEAD@{5.days.ago} # works
    ➜ dvc get . foo --rev=HEAD@{5.days.ago} --show-url
    ERROR: unexpected error - Custom revision is not supported for local repo

Example Usage

➜ dvc get . foo --show-url
../tmp.uJVvn5ZaLY/d3/b07384d113edec49eaa6238ad5ff00
➜ dvc get https://github.com/iterative/dataset-registry.git get-started/data.xml --rev=HEAD@{5.days.ago} --show-url
https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355

EDIT: Made changes to format from <path> <url> previously to just <url>.

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
    cmd ref: document get --show-url option dvc.org#930

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏


TODO

Edgecases

@skshetry skshetry self-assigned this Jan 15, 2020
@skshetry
Copy link
Collaborator Author

Some subtle difference that I noticed with --show-url flags vis-a-vis without flags:

  1. get has path as a positional argument. I don't want to change it to have nargs=*. Is it okay to have get <url> <single_path> --show-url?

  2. dvc get . foo --rev=HEAD~ works, but, dvc get . foo --rev=HEAD~ --show-url does not work because of assertion in _make_repo().

So, this is currently failing:

def test_get_url_with_rev(tmp_dir, erepo_dir):
    with erepo_dir.chdir():
        erepo_dir.dvc_gen("foo", "foo")
        assert main(["get", ".", "foo", "--rev", "HEAD", "--show-url"]) == 0

@skshetry

This comment has been minimized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess good question here is whether or not --show-url should also become a part of repo.get. Hm... For now, we can leave it as is, since api has get_url anyways. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to refactor get_url and make that part of the internals, but, I left it for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, we could consider splitting it into _get_url and _show_url here, so that we could test them separately without invoking main each time. Kinda like we do in pipeline show.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored it a bit to make it like pipeline show, but, I didn't see any tests where we directly make use of Cmd* commands (we do check for isinstance in few places though). pipeline show tests also calls main.

@skshetry skshetry marked this pull request as ready for review January 15, 2020 15:18
@skshetry skshetry changed the title [WIP] get: implement --show-url to display only url/path to remote get: implement --show-url to display only url/path to remote Jan 15, 2020
@skshetry skshetry requested review from a user, efiop and pared January 15, 2020 15:30
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry , looks good!

I don't like the idea of having --show-url instead of a separate command.
Specially because get is pretty ambiguous 😓 (get what?)
It could be like get --url, but colliding two verbs like get + --show feels weird.

I know that you didn't make the decision, just advocating for something else) (cc: @shcheklein)

Is get --show-url | wget the same as get?

@shcheklein
Copy link
Member

shcheklein commented Jan 15, 2020

@MrOutis I think the short version --url will be even more confusing. get --url - feels like we provide an URL to get from or something. The whole thing is not ideal, but I still don't feel we should add a global command just for this. Also, like @jorgeorpinel mentioned we can move this to the future dvc list - it probably makes even more sense than dvc get.

@ghost
Copy link

ghost commented Jan 15, 2020

@shcheklein , indeed! let's roll with --show-url and then move it to list

@jorgeorpinel
Copy link
Contributor

Is get --show-url | wget the same as get?

Only for HTTP remotes.

@ghost
Copy link

ghost commented Jan 16, 2020

Only for HTTP remotes.

So, if I use it for S3 it won't give me a usable URL?

@efiop
Copy link
Contributor

efiop commented Jan 16, 2020

@MrOutis It will, but not usable with wget 🙂 I think

dvc get .. --show-url | dvc get-url

would work though 😄 Checkmate, wget!

@skshetry
Copy link
Collaborator Author

skshetry commented Jan 16, 2020

dvc get .. --show-url | dvc get-url

@efiop, do we support reading from stdin? Couldn't find any, and above command does not work, because we don't read from stdin (and, I need to change format of the output).

Workaround for now is following:

dvc get .. --show-url | xargs dvc get-url

Is get --show-url | wget the same as get?

The above also does not seem to work unless we use xargs or wget -i -.

But, I have to change the output from <path> <url> to just <url>, as it works for only one path at the moment.

@efiop
Copy link
Contributor

efiop commented Jan 16, 2020

Good point @skshetry ! So wget behaves the same, let's keep the same behavior in get-url for now as well 🙂

@skshetry
Copy link
Collaborator Author

Good point @skshetry ! So wget behaves the same, let's keep the same behavior in get-url for now as well

You mean, introducing - arg? Or, leave as it is (like, plain wget)?

@efiop
Copy link
Contributor

efiop commented Jan 16, 2020

@skshetry I meant just leaving it as is for now, indeed 🙂

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments under tests

@skshetry skshetry requested review from a user, efiop and pared January 16, 2020 13:45
remote_obj = _repo.cloud.get_remote(remote)
return str(remote_obj.checksum_to_path_info(out.checksum))
except NotDvcRepoError:
raise UrlNotDvcRepoError(repo)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise this same error in _make_repo in a separate PR. Just keeping it here for now.

On windows, `str(exc_info)` is returning escaped path, eg:
'<ExceptionInfo UrlNotDvcRepoError("URL \'C:\\\\Users\\\\travis\\\\AppData\\\\Local\\\\Temp\\\\pytest-of-travis\\\\pytest-0\\\\popen-gw2\\\\test_get_url_git_only_repo_thr0\' is not a dvc repository.") tblen=2>'

This commit just replaces the double slashes with single one. Bit of a hack, I know.
@skshetry skshetry requested a review from Suor January 16, 2020 16:28
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions, not required, besides that LGTM.

Co-Authored-By: Paweł Redzyński <pawelredzynski@gmail.com>
@efiop efiop requested a review from pared January 17, 2020 07:41
get_parser.add_argument(
"--show-url",
action="store_true",
help="Returns path/url to the location in remote for specified path",
Copy link
Contributor

@efiop efiop Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use the description from https://github.com/iterative/dvc/pull/3130/files

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, looks like it is still up for discussion, so i'm sure we will modify it while working on the docs. Let's keep it as is for now 👍

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I would change this description, its confusing. Sorry, didn't notice before merge...

This comment was marked as resolved.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Please see one minor comment above.

@efiop
Copy link
Contributor

efiop commented Jan 17, 2020

@skshetry Also, need to not forget about docs 🙂

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🎉

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry sorry didn't get to review this before merge but there are some language improvements from my side, tied to iterative/dvc.org/pull/936. Thanks

)
logger.info(url)
except DvcException:
logger.exception("failed to show url")
Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization: "Failed to show URL" Maybe end in period . ?
Depends on whether it's printed as part of a larger message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel, the lowercase seems to be used everywhere in the codebase. And, because this is a logger.exception, our formatter handles adding extra information from raised exception besides this particular logging message. Notice the messages on following examples:

➜ dvc get https://github.com/schacon/cowsay install.sh --show-url 
ERROR: failed to show url - URL 'https://github.com/schacon/cowsay' is not a dvc repository.
➜ dvc get . dir/file --show-url
ERROR: failed to show url - unable to find DVC-file with output 'dir/file'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed to show url - URL it looks strange though :) First it proves that the lowercase seems to be used everywhere is not true (and it should not be, typically, acronyms and initialisms are written in all capital letters to distinguish them from ordinary words. - NASA, URL, DVC, etc). Second, you have a repetition here. Repetition is probably fine if reasons vary and you can't actually control them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein, it's not something that I add, but, I'm utilizing the fact that something will be there. So, it's more like "(what failed) - (the reason)".

Copy link
Collaborator Author

@skshetry skshetry Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, yes, regarding lowercase, I just looked into it and it's not everywhere for other cases of logging, but for exceptions, lowercase is used everywhere.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this is a logger.exception, our formatter handles adding extra information

Thanks, good to know this.

lowercase seems to be used everywhere is not true...

➕ 1️⃣ No need to use bad grammar on purpose because of existing bad grammar.

failed to show url - URL it looks strange though :) ...

it's more like "(what failed) - (the reason)".

So then the reason part in this case should start directly with http://... – but that's not part of this PR...

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason part in this case should start directly with http://... – but that's not part of this PR

OK I changed this in ec0c39a#diff-3b2745249f7139953ab4b45b382e69e9 but this whole conversation kind of opened a can of worms and that PR (#3220) grew quite a bit 😅 Hoping core team doesn't flip out.

but for exceptions, lowercase is used everywhere

Thanks for the tip. I may review some of these too 😋

get_parser.add_argument(
"--show-url",
action="store_true",
help="Returns path/url to the location in remote for specified path",

This comment was marked as resolved.

@jorgeorpinel
Copy link
Contributor

UPDATE: I'm addressing my own feedback in #3220.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get: show url to cache

5 participants