Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement additional CLI repo functionality #1671

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
2 changes: 2 additions & 0 deletions src/huggingface_hub/commands/huggingface_cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from huggingface_hub.commands.download import DownloadCommand
from huggingface_hub.commands.env import EnvironmentCommand
from huggingface_hub.commands.lfs import LfsCommands
from huggingface_hub.commands.repo import RepoCommands
from huggingface_hub.commands.scan_cache import ScanCacheCommand
from huggingface_hub.commands.upload import UploadCommand
from huggingface_hub.commands.user import UserCommands
Expand All @@ -36,6 +37,7 @@ def main():
LfsCommands.register_subcommand(commands_parser)
ScanCacheCommand.register_subcommand(commands_parser)
DeleteCacheCommand.register_subcommand(commands_parser)
RepoCommands.register_subcommand(commands_parser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this line up between UserCommand and UploadCommand? Looks more important than the lfs and cache commands IMO


# Let's go
args = parser.parse_args()
Expand Down
310 changes: 310 additions & 0 deletions src/huggingface_hub/commands/repo.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,310 @@
# coding=utf-8
# Copyright 2023-present, the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains commands to perform repo management with the CLI.

Usage Examples:
# Create a new repository on huggingface.co
huggingface-cli repo create my-cool-model

# List repositories on huggingface.co
huggingface-cli repo list

# Delete a repository on huggingface.co
huggingface-cli repo delete my-cool-model

# Toggle the visibility of a repository to private
huggingface-cli repo toggle my-cool-model private
"""
import subprocess
from argparse import Namespace, _SubParsersAction

from requests.exceptions import HTTPError

from huggingface_hub.commands import BaseHuggingfaceCLICommand
from huggingface_hub.constants import (
REPO_TYPES,
REPO_TYPES_URL_PREFIXES,
SPACES_SDK_TYPES,
)
from huggingface_hub.hf_api import HfApi

from ..utils import HfFolder
from ._cli_utils import ANSI


class RepoCommands(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction):
repo_parser = parser.add_parser(
"repo",
help=(
"{create, ls-files, list, delete, toggle visibility} commands to interact with your huggingface.co"
" repos."
),
)
repo_subparsers = repo_parser.add_subparsers(help="huggingface.co repos related commands")
repo_create_parser = repo_subparsers.add_parser("create", help="Create a new repo on huggingface.co")
repo_create_parser.add_argument(
"name",
type=str,
help="Name for your repo. Will be namespaced under your username to build the repo id.",
)
Comment on lines +59 to +63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would renamed it repo_id for consistency with the create_repo method.

repo_create_parser.add_argument(
"--type",
choices=["model", "dataset", "space"],
default="model",
help='Optional: type: set to "dataset" or "space" if creating a dataset or space, default is model.',
)
Comment on lines +64 to +69
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename it --repo-type for consistency as well (same for other parsers)

repo_create_parser.add_argument("--organization", type=str, help="Optional: organization namespace.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Organization is a bit of a legacy argument. Let's remove it as it is now possible to directly set a repo_id.

repo_create_parser.add_argument(
"--space_sdk",
type=str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type=str,
type=choices["docker", "gradio", "static" or "streamlit"],

help='Optional: Hugging Face Spaces SDK type. Required when --type is set to "space".',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help='Optional: Hugging Face Spaces SDK type. Required when --type is set to "space".',
help='Choice of SDK to use if --repo-type="space"',

No need to say it's optional (-- args are basically all optional). Same for the other descriptions :)

choices=SPACES_SDK_TYPES,
)
repo_create_parser.add_argument(
"-y",
"--yes",
action="store_true",
help="Optional: answer Yes to the prompt",
)
repo_create_parser.set_defaults(func=lambda args: RepoCreateCommand(args))
repo_list_parser = repo_subparsers.add_parser("list", help="List repos on huggingface.co")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
repo_list_parser = repo_subparsers.add_parser("list", help="List repos on huggingface.co")
# huggingface-cli repo list
repo_list_parser = repo_subparsers.add_parser("list", help="List repos on huggingface.co")

(nit) I'd add newlines between parsers for code readability
(same for other parser below)

repo_list_parser.add_argument(
"--type",
choices=["model", "dataset", "space"],
default="model",
help="Type of repos to list, default is model.",
)
repo_list_parser.add_argument(
"--author",
type=str,
help="Optional: author namespace.",
)
repo_list_parser.add_argument(
"--search",
type=str,
help="Optional: A string that will be contained in the returned repo ids.",
)
repo_list_parser.set_defaults(func=lambda args: RepoListCommand(args))
repo_delete_parser = repo_subparsers.add_parser("delete", help="Delete a repo on huggingface.co")
repo_delete_parser.add_argument(
"name",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repo_id (see above)

type=str,
help="Name of your repo to delete.",
)
repo_delete_parser.add_argument(
"--type",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--repo-type (see above)

choices=["model", "dataset", "space"],
default="model",
help="Type of repos to delete, default is model.",
)
repo_delete_parser.add_argument("--organization", type=str, help="Optional: organization namespace.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed (see above)

repo_delete_parser.add_argument(
"-y",
"--yes",
action="store_true",
help="Optional: answer Yes to the prompt",
)
repo_delete_parser.set_defaults(func=lambda args: RepoDeleteCommand(args))
repo_toggle_parser = repo_subparsers.add_parser(
"toggle", help="Toggle a repo on huggingface.co private or public"
)
repo_toggle_parser.add_argument(
"name",
type=str,
help="Name of your repo to toggle.",
)
repo_toggle_parser.add_argument(
"visibility",
choices=["public", "private"],
default="public",
help="Visibility to change repo to, default is public.",
)
repo_toggle_parser.add_argument(
"--type",
choices=["model", "dataset", "space"],
default="model",
help="Type of repo to toggle, default is model.",
)
repo_toggle_parser.add_argument("--organization", type=str, help="Optional: organization namespace.")
repo_toggle_parser.add_argument(
"-y",
"--yes",
action="store_true",
help="Optional: answer Yes to the prompt",
)
repo_toggle_parser.set_defaults(func=lambda args: RepoToggleCommand(args))


class BaseRepoCommand:
def __init__(self, args: Namespace):
self.args = args
self._api = HfApi()
self.token = HfFolder.get_token()
if self.token is None:
print("Not logged in")
exit(1)
try:
stdout = subprocess.check_output(["git", "--version"]).decode("utf-8")
print(ANSI.gray(stdout.strip()))
except FileNotFoundError:
print("Looks like you do not have git installed, please install.")
Comment on lines +152 to +164
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be legacy code. What is does is quite outdated:

  • I'd rather set the actual (typed) attributes in the other classes rather than self.args that is unprecise
  • Initialize HfApi with no arguments => not very useful
  • Check if token exist => not always needed, for example token is not required to list repo from the Hub
  • check git version => seems very legacy. No need of .git to make a HTTP call.

I'd just delete this parent class and define the other classes normally. Please let me know if I missed something.



class RepoCreateCommand(BaseRepoCommand):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class RepoCreateCommand(BaseRepoCommand):
class RepoCreateCommand(BaseRepoCommand):
def __init__(self, args: Namespace):
self.repo_id: str = args.repo_id
self.repo_type: Optional[str] = args.repo_type
self.space_sdk: Optional[str] = args.space_sdk
self.private: bool = args.private
self.token: Optional[str] = args.token

Here is how I would do the __init__. By explicitly settings to values here, you still don't validate their type. But in the subsequent code (e.g. when actually calling create_repo) you will have an extra insurance that if __init__ went well then the rest is fine as well (thanks to mypy static check).

In addition to that, I've updated a bit the arguments. I think we should support --token for all commands (already the case in upload/download) as some users might use the CLI in scripts in which they don't want to save a token on the machine (might be a CI machine running in the cloud for example). Also supporting an optional --private option (as store_true) would be a nice addition.

Finally, I'm not sure we should have this -y or --yes option. Let's assume that if a user wants to create a repo, then let's create it. I know it is already existing code but I don't expect it to be used a lot.

def run(self):
try:
stdout = subprocess.check_output(["git-lfs", "--version"]).decode("utf-8")
print(ANSI.gray(stdout.strip()))
except FileNotFoundError:
print(
ANSI.red(
"Looks like you do not have git-lfs installed, please install."
" You can install from https://git-lfs.github.com/."
" Then run `git lfs install` (you only have to do this once)."
)
)
print("")
Comment on lines +169 to +180
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy code.


user = self._api.whoami(self.token)["name"]
namespace = self.args.organization if self.args.organization is not None else user

repo_id = f"{namespace}/{self.args.name}"

if self.args.type not in REPO_TYPES:
print("Invalid repo --type")
exit(1)

if self.args.type in REPO_TYPES_URL_PREFIXES:
prefixed_repo_id = REPO_TYPES_URL_PREFIXES[self.args.type] + repo_id
else:
prefixed_repo_id = repo_id

print(f"You are about to create {ANSI.bold(prefixed_repo_id)}")

if not self.args.yes:
choice = input("Proceed? [Y/n] ").lower()
if not (choice == "" or choice == "y" or choice == "yes"):
print("Abort")
exit()
Comment on lines +182 to +202
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Almost?) all of this logic can be removed I think 😈

try:
url = self._api.create_repo(
repo_id=repo_id,
token=self.token,
repo_type=self.args.type,
space_sdk=self.args.space_sdk,
)
except HTTPError as e:
print(e)
print(ANSI.red(e.response.text))
exit(1)
Comment on lines +210 to +213
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not catch the error for now. I think it's a good idea to try/except + use ANSI.red on error message but if we do that, we should do it for all commands at once.

print("\nYour repo now lives at:")
print(f" {ANSI.bold(url)}")
print("\nYou can clone it locally with the command below, and commit/push as usual.")
print(f"\n git clone {url}")
print("")
Comment on lines +214 to +218
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("\nYour repo now lives at:")
print(f" {ANSI.bold(url)}")
print("\nYou can clone it locally with the command below, and commit/push as usual.")
print(f"\n git clone {url}")
print("")
print("\nYour repo now lives at:")
print(f" {ANSI.bold(url)}")
print("\nYou can clone it locally with the command below, and commit/push as usual.")
print(f"\n git clone {url}")
print("")

That's very legacy as well. Who wants to clone models nowadays? 😄



class RepoListCommand(BaseRepoCommand):
def run(self):
self.type = self.args.type
self.author = self.args.author
self.search = self.args.search
Comment on lines +223 to +225
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be moved to __init__ and typed.

try:
if self.type is None or self.type == "model":
repos = self._api.list_models(token=self.token, author=self.author, search=self.search)
elif self.type == "dataset":
repos = self._api.list_datasets(token=self.token, author=self.author, search=self.search)
elif self.type == "space":
repos = self._api.list_spaces(token=self.token, author=self.author, search=self.search)
except HTTPError as e:
print(e)
print(ANSI.red(e.response.text))
exit(1)
print("\nYour repos:")
for repo in repos:
print(f" {ANSI.bold(repo.id)}")
print("")
Comment on lines +237 to +240
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("\nYour repos:")
for repo in repos:
print(f" {ANSI.bold(repo.id)}")
print("")
for repo in repos:
print(repo.id)

I would go straight to the point => only print the ids. The advantage of not being verbose is that the output can be piped to another command directly.

For example:

huggingface-cli repo list --author=Wauplin | wc -l

=> counts the number of models on the Hub for author "Wauplin".

This is much harder to do when there's more colors/formatting.



class RepoDeleteCommand(BaseRepoCommand):
def run(self):
user = self._api.whoami(self.token)["name"]
namespace = self.args.organization if self.args.organization is not None else user

repo_id = f"{namespace}/{self.args.name}"

if self.args.type not in REPO_TYPES:
print("Invalid repo --type")
exit(1)

if self.args.type in REPO_TYPES_URL_PREFIXES:
prefixed_repo_id = REPO_TYPES_URL_PREFIXES[self.args.type] + repo_id
else:
prefixed_repo_id = repo_id

print(f"You are about to delete {ANSI.bold(prefixed_repo_id)}")

if not self.args.yes:
choice = input("Proceed? [Y/n] ").lower()
if not (choice == "" or choice == "y" or choice == "yes"):
print("Abort")
exit()
try:
self._api.delete_repo(repo_id=repo_id, token=self.token, repo_type=self.args.type)
except HTTPError as e:
print(e)
print(ANSI.red(e.response.text))
exit(1)
print("\nYour repo has been deleted.")
print("")


class RepoToggleCommand(BaseRepoCommand):
def run(self):
self.private = self.args.private
user = self._api.whoami(self.token)["name"]
namespace = self.args.organization if self.args.organization is not None else user
Comment on lines +276 to +280
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking of this... maybe huggingface-cli repo toggle should be huggingface-cli repo edit [] --visibility public / private like in GitHub gh repo edit?

That's a good question. I would personally be in favor of

huggingface-cli repo set-public Wauplin/my-cool-model
huggingface-cli repo set-private Wauplin/my-cool-dataset --repo-type="dataset"

This way set-public and set-private are explicit commands. Both commands can share the same logic I think. Looking at gh CLI is a good idea to take inspiration from. In this special case I still think we'll have less possible settings to play so being explicit is fine.


repo_id = f"{namespace}/{self.args.name}"
Copy link
Contributor

@Wauplin Wauplin Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a whoami to get the namespace (if not provided) by the user is really useful here as it cannot be implicitly used (in opposition to create_repo/delete_repo)! So good to do it this way :)


if self.args.type not in REPO_TYPES:
print("Invalid repo --type")
exit(1)

if self.args.type in REPO_TYPES_URL_PREFIXES:
prefixed_repo_id = REPO_TYPES_URL_PREFIXES[self.args.type] + repo_id
else:
prefixed_repo_id = repo_id

print(f"You are about to toggle {ANSI.bold(prefixed_repo_id)} to {self.private}")

if not self.args.yes:
choice = input("Proceed? [Y/n] ").lower()
if not (choice == "" or choice == "y" or choice == "yes"):
print("Abort")
exit()
self.privateBool = False if self.private == "public" else True
try:
self._api.update_repo_visibility(
repo_id=repo_id, private=self.privateBool, token=self.token, repo_type=self.args.type
)
except HTTPError as e:
print(e)
print(ANSI.red(e.response.text))
exit(1)
print(f"\nYour repo has been toggled to {self.private}.")
print("")