repo: add API and CLI for reading artifacts #9770

Merged (8 commits) on Sep 25, 2023

Conversation

pmrowla
Contributor

@pmrowla pmrowla commented Jul 27, 2023

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Will close: #9100

  • Adds GTO 1.0 as a dependency
  • Adds api.artifacts_show() for getting a relative path + Git revision for a named artifact version, which can then be used in the existing DVC API methods
  • Adds dvc artifacts get <url> <artifact> for downloading named artifacts
    • Studio URL download is preferred over DVC remote download:
      1. Try to download from studio without doing any DVC operations at all (so the DVC repo will not be loaded/cloned at all)
      2. If that fails, the DVC repo will be loaded and we will then try to download from studio using the repo's studio config section
      3. If that fails we will try to download from the DVC remote

API:

>>> import dvc.api
>>> dvc.api.artifacts_show(
...     "text-classification",
...     repo="https://github.com/iterative/example-get-started.git",
... )
{'rev': '068a1974dfc36104d0a716ac19502d506aa14fe9', 'path': 'model.pkl'}
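
As a rough illustration of how the returned mapping might feed the existing API methods, the sketch below passes `path` and `rev` on to `dvc.api.read`. The helper name `read_artifact` and its injectable `show`/`read` parameters are hypothetical, added only so the flow can be exercised without network access; they are not part of this PR.

```python
from typing import Callable, Optional


def read_artifact(
    name: str,
    repo: str,
    show: Optional[Callable[..., dict]] = None,
    read: Optional[Callable[..., bytes]] = None,
) -> bytes:
    """Resolve a named artifact, then read it at the resolved Git revision.

    `show` and `read` default to dvc.api.artifacts_show and dvc.api.read.
    They are parameters here only so the sketch can be tested offline.
    """
    if show is None or read is None:
        import dvc.api  # imported lazily; only needed for the real calls

        show = show or dvc.api.artifacts_show
        read = read or dvc.api.read

    # e.g. {'rev': '068a1974...', 'path': 'model.pkl'}
    info = show(name, repo=repo)
    return read(info["path"], repo=repo, rev=info["rev"], mode="rb")
```

With DVC installed, `read_artifact("text-classification", "https://github.com/iterative/example-get-started.git")` would be expected to return the model bytes, assuming the remote is reachable.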

CLI:
asciicast

Docs PR: iterative/dvc.org#4809

@pmrowla pmrowla self-assigned this Jul 27, 2023
dvc/api/artifacts.py Outdated Show resolved Hide resolved
Comment on lines +29 to +22
name (str): name of the artifact to open.
version (str, optional): version of the artifact to open. Defaults to
the latest version.
stage (str, optional): name of the model registry stage.
Contributor

I think all these options are actually mutually exclusive. Would be good to explicitly handle that early in the code.

I am a little confused about the GTO code as it just silently overrides tags in:

https://github.com/iterative/gto/blob/c82563d988ea927d9cd0275bb0c2f288dd73e0b6/gto/tag.py#L158C13-L163

Contributor Author

@pmrowla pmrowla Jul 27, 2023

Yeah, it seems like they are supposed to be mutually exclusive, but I wasn't sure either since GTO doesn't check for it.

CC: @aguschin
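
One way the early check suggested in this thread could look is sketched below. The function name `check_version_or_stage` is hypothetical, and this is not necessarily the validation DVC or GTO ended up with; it only shows rejecting the ambiguous combination before any lookup happens.

```python
from typing import Optional


def check_version_or_stage(version: Optional[str], stage: Optional[str]) -> None:
    """Reject calls that pass both an artifact version and a registry stage."""
    if version is not None and stage is not None:
        raise ValueError("'version' and 'stage' are mutually exclusive")
```

Passing neither argument stays valid, which matches the documented default of falling back to the latest version.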

@pmrowla
Contributor Author

pmrowla commented Jul 27, 2023

@dberenbaum CI tests haven't been updated for this yet, but otherwise the PR can be installed/tested manually.

Other thoughts - it seems confusing that we use artifacts everywhere in DVC and models everywhere in Studio/Model Registry

Comment on lines +66 to +75
get_parser.add_argument(
"--rev",
nargs="?",
help="Artifact version",
metavar="<version>",
)
Contributor Author

We can't use --version since it conflicts with the global DVC version flag, so I went with --rev for now.
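
For reference, the argument definition above can be tried in isolation. The standalone parser and its `prog` name below are assumptions made for the sketch; in DVC the argument belongs to the `artifacts get` subcommand parser. The point is that the value lands in `args.rev` while the help text still presents it as `<version>`.

```python
import argparse

# Hypothetical standalone parser; only the --rev definition comes from the diff.
parser = argparse.ArgumentParser(prog="artifacts-get-sketch")
parser.add_argument("name", help="Artifact name")
parser.add_argument(
    "--rev",
    nargs="?",
    help="Artifact version",
    metavar="<version>",
)

args = parser.parse_args(["text-classification", "--rev", "v1.0.0"])
print(args.rev)
```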

@codecov

codecov bot commented Jul 28, 2023

Codecov Report

Patch coverage is 56.50% of modified lines.

Files Changed Coverage
dvc/cli/parser.py ø
dvc/utils/__init__.py ø
dvc/api/artifacts.py 27.27%
dvc/repo/artifacts.py 41.98%
dvc/commands/artifacts.py 67.50%
dvc/api/__init__.py 100.00%
dvc/exceptions.py 100.00%
dvc/fs/__init__.py 100.00%
tests/func/artifacts/test_artifacts.py 100.00%

@dberenbaum
Contributor

Other thoughts - it seems confusing that we use artifacts everywhere in DVC and models everywhere in Studio/Model Registry

Yes, it's confusing but I think it is somewhat explainable. An artifact can be any path and may contain metadata like type, labels, etc. One type of artifact is a model, and the primary use for artifact metadata is in the model registry. There is a similar distinction in wandb models. I don't think we want to say that this metadata can only apply to models, but the primary use case is specific to models.

@dberenbaum
Contributor

@pmrowla Do you want me to wait before reviewing?

@pmrowla pmrowla force-pushed the artifact-get branch 6 times, most recently from 963d3bd to 755cc5f Compare August 11, 2023 03:22
@pmrowla pmrowla changed the title [WIP] repo: add API and CLI for reading artifacts repo: add API and CLI for reading artifacts Aug 11, 2023
@pmrowla
Contributor Author

pmrowla commented Aug 11, 2023

@dberenbaum this is ready for review/testing

@pmrowla pmrowla force-pushed the artifact-get branch 2 times, most recently from 4c9cfff to 54823b2 Compare August 11, 2023 16:21
@pmrowla
Contributor Author

pmrowla commented Aug 11, 2023

PR has been updated to prefer downloading with studio over DVC remote.

  1. If DVC_STUDIO_TOKEN is set in the environment, we will try to download from studio without doing any DVC operations at all (so the DVC repo will not be loaded/cloned at all)
  2. If that fails or if the token is not set in the environment, the DVC repo will be loaded and we will then try to download from studio using the repo's studio config section
  3. If that fails we will try to download from the DVC remote
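
The ordering above can be sketched as a small fallback chain. Everything here is illustrative: `get_artifact` and the three callables are hypothetical stand-ins for the real download paths, and only the `DVC_STUDIO_TOKEN` variable name comes from this comment.

```python
import os
from typing import Callable, Mapping, Optional


def get_artifact(
    from_studio_env: Callable[[], str],
    from_studio_config: Callable[[], str],
    from_dvc_remote: Callable[[], str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Try each download source in the order listed above."""
    env = os.environ if env is None else env

    attempts = []
    if env.get("DVC_STUDIO_TOKEN"):
        # Step 1: only attempted when the token is set in the environment.
        attempts.append(from_studio_env)
    # Steps 2 and 3 both need the DVC repo to be loaded first.
    attempts += [from_studio_config, from_dvc_remote]

    last_error: Optional[Exception] = None
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:  # the sketch treats any failure as "try the next source"
            last_error = exc
    raise RuntimeError("all download sources failed") from last_error
```

Passing `env` explicitly keeps the sketch testable; a real implementation would presumably catch narrower exception types than `Exception`.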

@dberenbaum
Contributor

I'm getting a long pause after cloning the repo followed by an error (the signed urls should fail here but the artifact should be found, and not sure why it takes so long):

Screen.Recording.2023-08-11.at.12.46.35.PM.mov

@pmrowla
Contributor Author

pmrowla commented Aug 11, 2023

@dberenbaum ah, I forgot about needing support for both the studio/gto and DVC style addressing. The PR currently only supports the studio/gto names (I think it should find the artifact if you use results/train:pool-segmentation).

Regarding the long pause, there are no progress callbacks for the studio API calls and there's no support for progress in GTO, so that's something we'd have to address in a follow-up. Once we get URLs from studio, though, the HTTP download will have progress bars like any other file transfer in DVC.

@dberenbaum
Contributor

Some thoughts on the long pause:

  1. I don't have a studio token set anywhere here, so can we skip even trying it?
  2. There's no long pause before cloning, so is it actually about the studio api or is it something else?

@pmrowla
Contributor Author

pmrowla commented Aug 11, 2023

Some thoughts on the long pause:

  1. I don't have a studio token set anywhere here, so can we skip even trying it?
  2. There's no long pause before cloning, so is it actually about the studio api or is it something else?

If DVC_STUDIO_TOKEN isn't set in your env, it skips trying to use the studio API before cloning (which is why there's no pause before cloning). After the clone, it tries to access the studio API using your studio config, or the dvc-studio-client defaults if you don't have a config. The dvc-studio-client behavior for this probably needs to be smarter about exiting early here.

@dberenbaum
Contributor

Can we print a result showing what's been downloaded and the path to it? Since the path is not part of the command, it's not obvious where to find it.

@dberenbaum
Contributor

Trying a different repo and can't get past this error:

$ dvc artifacts get -vv git@github.com:iterative/lstm_seq2seq.git results:best
2023-08-11 13:39:24,195 DEBUG: v3.14.1.dev9+g54823b259, CPython 3.11.4 on macOS-13.4.1-arm64-arm-64bit
2023-08-11 13:39:24,196 DEBUG: command: /Users/dave/micromamba/envs/dvc/bin/dvc artifacts get -vv git@github.com:iterative/lstm_seq2seq.git results:best
2023-08-11 13:39:24,196 TRACE: Namespace(quiet=0, verbose=2, cprofile=False, cprofile_dump=None, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, pdb=False, instrument=False, instrument_open=False, show_stack=False, cd='.', cmd='get', url='git@github.com:iterative/lstm_seq2seq.git', name='results:best', rev=None, stage=None, out=None, jobs=None, force=False, config=None, func=<class 'dvc.commands.artifacts.CmdArtifactsGet'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2023-08-11 13:39:24,387 DEBUG: Creating external repo git@github.com:iterative/lstm_seq2seq.git@None
2023-08-11 13:39:24,388 DEBUG: erepo: git clone 'git@github.com:iterative/lstm_seq2seq.git' to a temporary dir
2023-08-11 13:39:28,706 DEBUG: Trying to download artifact 'results:best' via studio
2023-08-11 13:39:28,706 DEBUG: Trying to download artifact 'results:best' via DVC
2023-08-11 13:39:28,777 TRACE: switching fs to revision 77bd34d
2023-08-11 13:39:28,799 TRACE: Context during resolution of stage download:
{'model': {'batch_size': 512, 'latent_dim': 8, 'duration': '00:00:30:00', 'max_epochs': 5, 'optim': {'lr': 0.001}}, 'data_path': 'fra.txt', 'num_samples': 10000, 'seed': 423}
2023-08-11 13:39:28,832 TRACE: Context during resolution of stage train:
{'model': {'batch_size': 512, 'latent_dim': 8, 'duration': '00:00:30:00', 'max_epochs': 5, 'optim': {'lr': 0.001}}, 'data_path': 'fra.txt', 'num_samples': 10000, 'seed': 423}
2023-08-11 13:39:28,833 TRACE:    53.45 ms in collecting stages from /
2023-08-11 13:39:28,833 TRACE:     1.92 mks in collecting stages from /.github
2023-08-11 13:39:28,833 TRACE:     1.75 mks in collecting stages from /.github/workflows
2023-08-11 13:39:28,833 TRACE:     3.08 mks in collecting stages from /conf
2023-08-11 13:39:28,833 TRACE:     3.50 mks in collecting stages from /conf/model
2023-08-11 13:39:28,834 TRACE:   862.58 mks in collecting stages from /results
2023-08-11 13:39:28,917 DEBUG: failed to load ('model',) from storage local (/var/folders/24/99_tf1xj3vx8k1k_jkdmnhq00000gn/T/tmpjnauvqowdvc-cache/files/md5) - [Errno 2] No such file or directory: '/var/folders/24/99_tf1xj3vx8k1k_jkdmnhq00000gn/T/tmpjnauvqowdvc-cache/files/md5/70/06e6dfdcbce27a2362214995586475.dir'
Traceback (most recent call last):
  File "/Users/dave/micromamba/envs/dvc/lib/python3.11/site-packages/dvc_data/index/index.py", line 545, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/Users/dave/micromamba/envs/dvc/lib/python3.11/site-packages/dvc_data/index/index.py", line 475, in _load_from_object_storage
    obj = Tree.load(
          ^^^^^^^^^^
  File "/Users/dave/micromamba/envs/dvc/lib/python3.11/site-packages/dvc_data/hashfile/tree.py", line 191, in load
    with obj.fs.open(obj.path, "r") as fobj:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc-objects/src/dvc_objects/fs/base.py", line 222, in open
    return self.fs.open(path, mode=mode, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc-objects/src/dvc_objects/fs/local.py", line 134, in open
    return open(path, mode=mode, encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/24/99_tf1xj3vx8k1k_jkdmnhq00000gn/T/tmpjnauvqowdvc-cache/files/md5/70/06e6dfdcbce27a2362214995586475.dir'

@dberenbaum
Contributor

dberenbaum commented Aug 11, 2023

Here's a simple example that uses the aws sandbox: https://github.com/dberenbaum/example-get-started/blob/main/dvc.yaml. Both the studio rest api and dvc get work fine on this example.

With a studio token it works but still clones the repo, which is unexpected:

$ dvc artifacts get -vv git@github.com:dberenbaum/example-get-started.git text-classification
2023-08-11 13:42:29,325 DEBUG: v3.14.1.dev9+g54823b259, CPython 3.11.4 on macOS-13.4.1-arm64-arm-64bit
2023-08-11 13:42:29,325 DEBUG: command: /Users/dave/micromamba/envs/dvc/bin/dvc artifacts get -vv git@github.com:dberenbaum/example-get-started.git text-classification
2023-08-11 13:42:29,325 TRACE: Namespace(quiet=0, verbose=2, cprofile=False, cprofile_dump=None, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, pdb=False, instrument=False, instrument_open=False, show_stack=False, cd='.', cmd='get', url='git@github.com:dberenbaum/example-get-started.git', name='text-classification', rev=None, stage=None, out=None, jobs=None, force=False, config=None, func=<class 'dvc.commands.artifacts.CmdArtifactsGet'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2023-08-11 13:42:29,571 DEBUG: Creating external repo git@github.com:dberenbaum/example-get-started.git@None
2023-08-11 13:42:29,571 DEBUG: erepo: git clone 'git@github.com:dberenbaum/example-get-started.git' to a temporary dir
2023-08-11 13:42:33,673 DEBUG: Trying to download artifact 'text-classification' via studio
2023-08-11 13:42:35,054 DEBUG: Analytics is disabled.

Without a studio token it fails completely:

$ dvc artifacts get -vv git@github.com:dberenbaum/example-get-started.git text-classification
2023-08-11 13:48:05,792 DEBUG: v3.14.1.dev9+g54823b259, CPython 3.11.4 on macOS-13.4.1-arm64-arm-64bit
2023-08-11 13:48:05,792 DEBUG: command: /Users/dave/micromamba/envs/dvc/bin/dvc artifacts get -vv git@github.com:dberenbaum/example-get-started.git text-classification
2023-08-11 13:48:05,793 TRACE: Namespace(quiet=0, verbose=2, cprofile=False, cprofile_dump=None, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, pdb=False, instrument=False, instrument_open=False, show_stack=False, cd='.', cmd='get', url='git@github.com:dberenbaum/example-get-started.git', name='text-classification', rev=None, stage=None, out=None, jobs=None, force=False, config=None, func=<class 'dvc.commands.artifacts.CmdArtifactsGet'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2023-08-11 13:48:05,990 DEBUG: Creating external repo git@github.com:dberenbaum/example-get-started.git@None
2023-08-11 13:48:05,990 DEBUG: erepo: git clone 'git@github.com:dberenbaum/example-get-started.git' to a temporary dir
2023-08-11 13:48:09,708 DEBUG: Trying to download artifact 'text-classification' via studio
2023-08-11 13:48:09,708 DEBUG: Trying to download artifact 'text-classification' via DVC
2023-08-11 13:48:09,806 ERROR: failed to get 'text-classification' from 'git@github.com:dberenbaum/example-get-started.git' - Unable to find artifact 'text-classification': No studio config
Traceback (most recent call last):
  File "/Users/dave/Code/dvc/dvc/repo/artifacts.py", line 268, in get
    return cls._download_studio(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/repo/artifacts.py", line 191, in _download_studio
    for path, url in get_download_uris(
                     ^^^^^^^^^^^^^^^^^^
  File "/Users/dave/micromamba/envs/dvc/lib/python3.11/site-packages/dvc_studio_client/model_registry.py", line 39, in get_download_uris
    raise ValueError("No studio config")
ValueError: No studio config

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/dave/Code/dvc/dvc/commands/artifacts.py", line 18, in run
    Artifacts.get(
  File "/Users/dave/Code/dvc/dvc/repo/artifacts.py", line 292, in get
    raise exc from saved_exc
  File "/Users/dave/Code/dvc/dvc/repo/artifacts.py", line 282, in get
    return repo.artifacts.download(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/repo/artifacts.py", line 153, in download
    rev = self.get_rev(name, version=version, stage=stage)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/repo/artifacts.py", line 119, in get_rev
    raise ArtifactNotFoundError(name, version=version, stage=stage)
dvc.exceptions.ArtifactNotFoundError: Unable to find artifact 'text-classification'

@pmrowla
Contributor Author

pmrowla commented Aug 25, 2023

Cloning and pulling from the remote is working for me, but the path it prints isn't right:

I'm still seeing an unexpected error when trying it with wrong/no credentials:

Both of these should be resolved now

Since there are two separate sets of exceptions (one for the Studio case and one for the DVC remote case), the logging is a bit clunky, but for now it looks like:

$ dvc artifacts get git@github.com:iterative/lstm_seq2seq.git results:best
ERROR: Failed to download artifact 'results:best' via Studio - No studio config
ERROR: failed to get 'results:best' from 'git@github.com:iterative/lstm_seq2seq.git' - Failed to download artifact 'results:best' via DVC remote: failed to load directory ('model',): Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden
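
A minimal sketch of how two error paths can produce output shaped like the above: the first failure is logged immediately, and the second is raised with the first attached as its cause (the earlier traceback in this thread shows a `raise exc from saved_exc`). The exception classes and the `download` function are hypothetical, not DVC's actual types.

```python
import logging
from typing import Callable

logger = logging.getLogger("artifacts_sketch")


class StudioDownloadError(Exception):
    """Hypothetical error for the Studio download path."""


class RemoteDownloadError(Exception):
    """Hypothetical error for the DVC remote download path."""


def download(
    name: str,
    via_studio: Callable[[str], str],
    via_remote: Callable[[str], str],
) -> str:
    try:
        return via_studio(name)
    except StudioDownloadError as exc:
        # Report the Studio failure right away, then fall through to the remote.
        logger.error("Failed to download artifact '%s' via Studio - %s", name, exc)
        saved_exc = exc

    try:
        return via_remote(name)
    except RemoteDownloadError as exc:
        # Keep the Studio failure attached as the cause of the final error.
        raise exc from saved_exc
```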

@aguschin
Contributor

@dberenbaum there was an example here https://github.com/iterative/dvc.org/pull/4681/files#diff-ad0fba6815b9afc00db151a1c167f308681cd0faaf07cc402bf087756296c4c2R58

just checked - it works

@dberenbaum
Contributor

Thanks @aguschin! I'm able to get that curl command to work, but not to download the model:

$ curl "https://studio.iterative.ai/api/model-registry/get-download-uris?repo=git@github.com:iterative/demo-bank-customer-churn.git&name=randomforest-model&version=v2.0.0" --header "Authorization:token ***"
{".mlem/model/clf-model":"https://sandbox-datasets-iterative.s3.amazonaws.com/bank-customer-churn/4c/65eb98e1f9b9cae801f1a3f124f8ae?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIATZNUPOWBHJ4AE55H%2F20230825%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230825T144821Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEH8aCXVzLWVhc3QtMiJIMEYCIQDnSDv%2BAZd1kn4Ir5KUTcO7wjEAJOHyFwrXFxAEkvdbYgIhAMKqBSmMkOrv3qbvzfPiT1hitvvnNad7dqfix4o%2BzYwMKvcECEgQAxoMMjYwNzYwODkyODAyIgxue92VDAAxClYsAqoq1ARvFH8YevdIWvwfflwNS05yQqSdNdjTewTiEPyqNxsXeH%2BNy4uPE39CPnwM8jEGZZAGsaCFUpRoQ6i%2BVx6CHr4GExaHqWTRYLQILPA7NSi7ITQg1cfm4tW%2BX8lO2DXYT3uUaPWI3P7LVJmv%2BLCUzzWw2vt34nsNioH%2FytTMKUwPIE4t8Pv6LbHhBOQrQYfhM1EnkBlI7zYXnlwldzDrJHUbP7ydc%2FOx3cD1BvfMF5wDZZVtWCUHI7kt3uZ%2FGPcZvleYHLOuTGe9qa7dOniAY%2FANiAYrNUfeX19UrZQK%2B8Hr6ft%2FlIOvcgWeOfKvlTvUiEFgqWLCiPr9Olwuc%2FwG1gyXBNx9DVgJF6ARq5%2B38ufF69HKon1Jp2duzWEn9332EFLS43k6BEiAIcfOnmBK3o2hnyfRXxVb9IYm9pumqD8mWtPNK0dP9FcyiOW9ItEVFXx5AXYBJNrm5Vk4sjHN5bZe7TMoxU1B3%2BIQSHS9dcCKIdRGPYhyXSma8pZX44b6Mh9grWLP%2F%2F5k1mkI1IhSAC%2BFhP%2FDU8WRjVWdY0l91ChRXn20qyIsONm4FOxjTMpFzqBvNFEksqIO20gyDDeIQMMD9nyPCHwTvLqUjUegCM1Sa9FLWsxnoVYP2J1XTiaWSQDsL9cTlUVdO1HxGSYY4w%2ByQP67Eqd4I4yM4T7TKPDWh8MnCfdVhxe7P0kVCT4sAt04uUEiLrz0DwHA%2FtsPEHWXdnCkV8NMo3UFy%2FYagd8tbAPMxe4raGeFdmZlHzu1QpZW5aWs68fufNs6okRHevNG%2BUXVujC0%2FqKnBjqZAVblJbmkOvTNbDEzua48Fmfih3CAhb9L1PQyEMSe9M2PR520fQcif9FXDjWR3s7dZgLauEdBJsB%2F8BYU1EgJSsjd0v3%2BUTdfWfhsADV6b3zobAJkjPwqiRNq3jcd5MPyW%2FEQYfrHAHBME%2B%2BvZ9NfVfRXy%2Bh395e7vDuQb06rEblZXIoPsVEpEk8oNFShTfZHqSc13PRwkq9Rew%3D%3D&X-Amz-Signature=a6c9b2ea7cb452ae4c4d17df69a753d0b84d0e4cc38c66beaa33c99765ba9fb4"}%

$ wget "https://sandbox-datasets-iterative.s3.amazonaws.com/bank-customer-churn/4c/65eb98e1f9b9cae801f1a3f124f8ae?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIATZNUPOWBHJ4AE55H%2F20230825%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230825T144821Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEH8aCXVzLWVhc3QtMiJIMEYCIQDnSDv%2BAZd1kn4Ir5KUTcO7wjEAJOHyFwrXFxAEkvdbYgIhAMKqBSmMkOrv3qbvzfPiT1hitvvnNad7dqfix4o%2BzYwMKvcECEgQAxoMMjYwNzYwODkyODAyIgxue92VDAAxClYsAqoq1ARvFH8YevdIWvwfflwNS05yQqSdNdjTewTiEPyqNxsXeH%2BNy4uPE39CPnwM8jEGZZAGsaCFUpRoQ6i%2BVx6CHr4GExaHqWTRYLQILPA7NSi7ITQg1cfm4tW%2BX8lO2DXYT3uUaPWI3P7LVJmv%2BLCUzzWw2vt34nsNioH%2FytTMKUwPIE4t8Pv6LbHhBOQrQYfhM1EnkBlI7zYXnlwldzDrJHUbP7ydc%2FOx3cD1BvfMF5wDZZVtWCUHI7kt3uZ%2FGPcZvleYHLOuTGe9qa7dOniAY%2FANiAYrNUfeX19UrZQK%2B8Hr6ft%2FlIOvcgWeOfKvlTvUiEFgqWLCiPr9Olwuc%2FwG1gyXBNx9DVgJF6ARq5%2B38ufF69HKon1Jp2duzWEn9332EFLS43k6BEiAIcfOnmBK3o2hnyfRXxVb9IYm9pumqD8mWtPNK0dP9FcyiOW9ItEVFXx5AXYBJNrm5Vk4sjHN5bZe7TMoxU1B3%2BIQSHS9dcCKIdRGPYhyXSma8pZX44b6Mh9grWLP%2F%2F5k1mkI1IhSAC%2BFhP%2FDU8WRjVWdY0l91ChRXn20qyIsONm4FOxjTMpFzqBvNFEksqIO20gyDDeIQMMD9nyPCHwTvLqUjUegCM1Sa9FLWsxnoVYP2J1XTiaWSQDsL9cTlUVdO1HxGSYY4w%2ByQP67Eqd4I4yM4T7TKPDWh8MnCfdVhxe7P0kVCT4sAt04uUEiLrz0DwHA%2FtsPEHWXdnCkV8NMo3UFy%2FYagd8tbAPMxe4raGeFdmZlHzu1QpZW5aWs68fufNs6okRHevNG%2BUXVujC0%2FqKnBjqZAVblJbmkOvTNbDEzua48Fmfih3CAhb9L1PQyEMSe9M2PR520fQcif9FXDjWR3s7dZgLauEdBJsB%2F8BYU1EgJSsjd0v3%2BUTdfWfhsADV6b3zobAJkjPwqiRNq3jcd5MPyW%2FEQYfrHAHBME%2B%2BvZ9NfVfRXy%2Bh395e7vDuQb06rEblZXIoPsVEpEk8oNFShTfZHqSc13PRwkq9Rew%3D%3D&X-Amz-Signature=a6c9b2ea7cb452ae4c4d17df69a753d0b84d0e4cc38c66beaa33c99765ba9fb4"
The destination name is too long (1543), reducing to 236
--2023-08-25 10:48:42--  https://sandbox-datasets-iterative.s3.amazonaws.com/bank-customer-churn/4c/65eb98e1f9b9cae801f1a3f124f8ae?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIATZNUPOWBHJ4AE55H%2F20230825%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230825T144821Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEH8aCXVzLWVhc3QtMiJIMEYCIQDnSDv%2BAZd1kn4Ir5KUTcO7wjEAJOHyFwrXFxAEkvdbYgIhAMKqBSmMkOrv3qbvzfPiT1hitvvnNad7dqfix4o%2BzYwMKvcECEgQAxoMMjYwNzYwODkyODAyIgxue92VDAAxClYsAqoq1ARvFH8YevdIWvwfflwNS05yQqSdNdjTewTiEPyqNxsXeH%2BNy4uPE39CPnwM8jEGZZAGsaCFUpRoQ6i%2BVx6CHr4GExaHqWTRYLQILPA7NSi7ITQg1cfm4tW%2BX8lO2DXYT3uUaPWI3P7LVJmv%2BLCUzzWw2vt34nsNioH%2FytTMKUwPIE4t8Pv6LbHhBOQrQYfhM1EnkBlI7zYXnlwldzDrJHUbP7ydc%2FOx3cD1BvfMF5wDZZVtWCUHI7kt3uZ%2FGPcZvleYHLOuTGe9qa7dOniAY%2FANiAYrNUfeX19UrZQK%2B8Hr6ft%2FlIOvcgWeOfKvlTvUiEFgqWLCiPr9Olwuc%2FwG1gyXBNx9DVgJF6ARq5%2B38ufF69HKon1Jp2duzWEn9332EFLS43k6BEiAIcfOnmBK3o2hnyfRXxVb9IYm9pumqD8mWtPNK0dP9FcyiOW9ItEVFXx5AXYBJNrm5Vk4sjHN5bZe7TMoxU1B3%2BIQSHS9dcCKIdRGPYhyXSma8pZX44b6Mh9grWLP%2F%2F5k1mkI1IhSAC%2BFhP%2FDU8WRjVWdY0l91ChRXn20qyIsONm4FOxjTMpFzqBvNFEksqIO20gyDDeIQMMD9nyPCHwTvLqUjUegCM1Sa9FLWsxnoVYP2J1XTiaWSQDsL9cTlUVdO1HxGSYY4w%2ByQP67Eqd4I4yM4T7TKPDWh8MnCfdVhxe7P0kVCT4sAt04uUEiLrz0DwHA%2FtsPEHWXdnCkV8NMo3UFy%2FYagd8tbAPMxe4raGeFdmZlHzu1QpZW5aWs68fufNs6okRHevNG%2BUXVujC0%2FqKnBjqZAVblJbmkOvTNbDEzua48Fmfih3CAhb9L1PQyEMSe9M2PR520fQcif9FXDjWR3s7dZgLauEdBJsB%2F8BYU1EgJSsjd0v3%2BUTdfWfhsADV6b3zobAJkjPwqiRNq3jcd5MPyW%2FEQYfrHAHBME%2B%2BvZ9NfVfRXy%2Bh395e7vDuQb06rEblZXIoPsVEpEk8oNFShTfZHqSc13PRwkq9Rew%3D%3D&X-Amz-Signature=a6c9b2ea7cb452ae4c4d17df69a753d0b84d0e4cc38c66beaa33c99765ba9fb4
Resolving sandbox-datasets-iterative.s3.amazonaws.com (sandbox-datasets-iterative.s3.amazonaws.com)... 52.216.241.12, 52.217.111.92, 54.231.169.217, ...
Connecting to sandbox-datasets-iterative.s3.amazonaws.com (sandbox-datasets-iterative.s3.amazonaws.com)|52.216.241.12|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-08-25 10:48:42 ERROR 403: Forbidden.

Should we open a Studio issue?
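
For anyone reproducing this from Python rather than curl, the request can be assembled as below. The endpoint, query parameters, and `Authorization: token ...` header are taken from the curl command above; the function name `build_download_uris_request` is hypothetical. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

STUDIO_ENDPOINT = "https://studio.iterative.ai/api/model-registry/get-download-uris"


def build_download_uris_request(repo: str, name: str, version: str, token: str) -> Request:
    """Build (but do not send) the same GET request as the curl command."""
    # urlencode percent-escapes the '@' and ':' in an SSH-style repo URL.
    query = urlencode({"repo": repo, "name": name, "version": version})
    return Request(
        f"{STUDIO_ENDPOINT}?{query}",
        headers={"Authorization": f"token {token}"},
    )
```

Passing the result to `urllib.request.urlopen` and decoding the JSON body should give the same path-to-signed-URL mapping shown in the curl output, assuming a valid token.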

@dberenbaum
Contributor

@pmrowla Basic scenarios starting to look good!

--stage still isn't working for me though:

$ AWS_PROFILE=iterative-sandbox dvc artifacts get -f git@github.com:iterative/lstm_seq2seq.git results:best --stage=dev
ERROR: Failed to download artifact 'results:best' via Studio - No studio config
ERROR: failed to get 'results:best' from 'git@github.com:iterative/lstm_seq2seq.git' - Failed to download artifact 'results:best' via DVC remote: Unable to find artifact 'results:best @ dev'

I'm blocked on QA for the Studio token download by the comments above, but otherwise LGTM.

Not a blocker (it could be in a follow up), but since we have --config, could we also add --remote and --remote-config (those may have been added to dvc get after you started on this)?

@pmrowla
Contributor Author

pmrowla commented Aug 28, 2023

--stage still isn't working for me though:

This should be fixed now

Not a blocker (it could be in a follow up), but since we have --config, could we also add --remote and --remote-config (those may have been added to dvc get after you started on this)?

This is also supported now

Contributor

@dberenbaum dberenbaum left a comment

Nice work @pmrowla! I'm ready to merge, although we need to track a couple follow-up issues:

  1. Check the studio download functionality after https://github.com/iterative/studio/issues/7383 is resolved.
  2. Reduce time it takes to clone the repo (dvc get/artifacts get: repo clone is slow #9880)

@dberenbaum
Contributor

@iterative/dvc Could someone please review?

@dberenbaum
Contributor

ping @iterative/dvc for review

@dberenbaum dberenbaum enabled auto-merge (rebase) September 22, 2023 13:50
@skshetry skshetry enabled auto-merge (squash) September 25, 2023 14:36
@skshetry skshetry merged commit 12b5725 into iterative:main Sep 25, 2023
21 checks passed
daavoo added a commit to iterative/dvclive that referenced this pull request Sep 25, 2023
The function we were using from `dvc.repo.artifacts` was dropped in iterative/dvc#9770

deps: bump  "dvc>3.22.1"
daavoo added a commit to iterative/dvclive that referenced this pull request Sep 26, 2023
* log_artifact: Use name validation from GTO.

The function we were using from `dvc.repo.artifacts` was dropped in iterative/dvc#9770

deps: bump  "dvc>3.22.1"

* deps: Add `gto`.
@efiop efiop added the feature is a feature label Sep 26, 2023
Labels
feature is a feature
Development

Successfully merging this pull request may close these issues.

Download models (type: model) with dvc get
7 participants