API deprecate positional args in file_download and hf_api #745

Merged
merged 21 commits into from Mar 29, 2022

adrinjalali
Contributor

Fixes #732

This PR deprecates passing positional args to most functions and methods in file_download.py and hf_api.py.

Things to discuss:

  • whether we want to make all parameters kwarg only or leave some as positional
  • make more parts of the API kwarg only

cc @julien-c @LysandreJik @osanseviero

Question: do we have a place to put changelog/release logs? How do we handle those now?

@muellerzr
Contributor

I'd think some of this shouldn't be kwargs-only, such as list_models and list_datasets, where we've introduced the ModelFilter and DatasetFilter, as well as the filtering by CO2, since that doesn't mimic the REST API.

@osanseviero
Member

Question: do we have a place to put changelog/release logs? How do we handle those now?

When we make a new release we add release notes to https://github.com/huggingface/huggingface_hub/releases

@adrinjalali
Contributor Author

When we make a new release we add release notes to https://github.com/huggingface/huggingface_hub/releases

But how do we keep track of the changelog? To me it makes sense to have a place where each PR that changes a user-facing section adds the relevant entry to the changelog. It can then also be rendered in the documentation.

@adrinjalali
Contributor Author

I'd think some of this shouldn't be kwargs-only, such as list_models and list_datasets, where we've introduced the ModelFilter and DatasetFilter, as well as the filtering by CO2, since that doesn't mimic the REST API.

I'm not sure where the CO2 filtering is, @muellerzr. Could you please comment where it needs to change?

@muellerzr
Contributor

My other concern with deprecating the positional arguments is making it harder for the user to know what to pass in. Part of the goal with all these functions is that users in an IDE can rely on tab-completion to know what to pass in, and what is available (which is why we have ModelSearchArguments and DatasetSearchArguments). If we completely remove those positional arguments, we should make sure that users know upfront what they can pass in.

@adrinjalali
Contributor Author

@muellerzr that parameter is the fourth argument. I don't think mimicking the REST API is important here, as people using the library won't necessarily even know what the REST API looks like. So I would rather keep that as keyword-only.

@adrinjalali
Contributor Author

My other concern with deprecating the positional arguments is making it harder for the user to know what to pass in.

When we decided to do this in sklearn, we had the same concerns. But in the end, after testing with IDEs, we didn't see any major issues, and the benefits outweighed the downsides.

@muellerzr
Contributor

muellerzr commented Mar 7, 2022

@adrinjalali could you outline the benefits vs the downsides for me? It'd help a lot with getting me comfortable with the idea 😄

(I can quickly think of a few including code maintainability as the API changes, since everything is a kwarg. But say for instance a kwarg to be passed changed. How would you make sure there isn't technical debt with the docstring and write a check for it outside of just good reviewers)

@adrinjalali
Contributor Author

So here's the SLEP (scikit-learn enhancement proposal) discussing the exact same issue: https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html with links to original discussions. I think almost everything applies here too.

(I can quickly think of a few including code maintainability as the API changes, since everything is a kwarg. But say for instance a kwarg to be passed changed. How would you make sure there isn't technical debt with the docstring and write a check for it outside of just good reviewers)

We can have docstring common tests which make sure every PR updates the docstrings of the methods/functions/classes they're changing as well:

(note that those tests are numpydoc specific, we'd need to write them for the google style docs)
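One way such a check could look for Google-style docstrings, as a minimal sketch: compare the parameters in a function's signature against the names listed under `Args:`. The helper `undocumented_params` and the sample function `example_call` are hypothetical, and the regular expression is a crude heuristic (a description line that happens to start with `word (` would be miscounted as a documented parameter).

```python
import inspect
import re


def undocumented_params(func):
    """Return signature parameters that the ``Args:`` section does not list."""
    doc = inspect.getdoc(func) or ""
    # Keep only the text after the "Args:" header.
    _, _, args_section = doc.partition("Args:")
    # Google-style entries start a line with "name (type...):".
    documented = set(re.findall(r"^\s*(\w+) \(", args_section, flags=re.MULTILINE))
    params = [p for p in inspect.signature(func).parameters if p not in ("self", "cls")]
    return [p for p in params if p not in documented]


def example_call(token=None, timeout=None):
    """
    Hypothetical function with an incomplete docstring.

    Args:
        token (``str``, `optional`):
            Token used for authentication.
    """


# `timeout` is in the signature but not in the docstring.
missing = undocumented_params(example_call)
```

A common test could loop over the public functions of a module and fail whenever this list is non-empty.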

Member

@LysandreJik LysandreJik left a comment

I'm enthusiastic about the change from positional + keyword argument to forced keyword arguments. I'm a bit less enthusiastic about converting all arguments (including current positional-only arguments) to forced-keyword arguments.

For example, I think this should be acceptable:

from huggingface_hub import hf_hub_url

hf_hub_url('bert-base-cased', 'config.json')

Whereas this should not:

  from huggingface_hub import hf_hub_url

- hf_hub_url('bert-base-cased', 'config.json', 'subfolder')  # unclear
+ hf_hub_url('bert-base-cased', 'config.json', subfolder='subfolder')  # clearer
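The split described here maps onto Python's bare `*` marker: parameters before it can still be passed positionally, while those after it must be named. A minimal sketch, assuming a hypothetical `build_url` helper (the name, the URL layout, and the `example.invalid` host are illustrative and are not `hf_hub_url`'s actual implementation):

```python
from typing import Optional


def build_url(repo_id: str, filename: str, *, subfolder: Optional[str] = None) -> str:
    """Hypothetical stand-in: two positional parameters, the rest keyword-only."""
    if subfolder:
        filename = f"{subfolder}/{filename}"
    return f"https://example.invalid/{repo_id}/resolve/main/{filename}"


# Accepted: the first two arguments are unambiguous when passed positionally.
build_url("bert-base-cased", "config.json")

# Accepted: the optional argument is named at the call site.
build_url("bert-base-cased", "config.json", subfolder="subfolder")

# Rejected: a third positional argument raises TypeError.
try:
    build_url("bert-base-cased", "config.json", "subfolder")
    rejected = False
except TypeError:
    rejected = True
```

Going straight to a keyword-only signature like this is a hard break for existing callers, which is why the PR goes through a deprecation warning first.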

@adrinjalali
Contributor Author

I'm a bit less enthusiastic about converting all arguments (including current positional-only arguments) to forced-keyword arguments.

I agree. Changed a few of them, but in the hf_api.py we kinda need to have all of them kwarg only since we're deprecating the first two (name, org) in a few of those methods.

@adrinjalali
Contributor Author

I can't run test_snapshot_download locally due to this error (I have logged in with huggingface-cli login), and I'm not sure how to fix it.

$ pytest -Werror::FutureWarning -x -vv tests/test_snapshot_download.py
============================================================================ test session starts ============================================================================
platform linux -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0 -- /home/adrin/miniforge3/envs/hf-sklearn/bin/python
cachedir: .pytest_cache
rootdir: /home/adrin/Projects/hf/hub, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0
collected 8 items                                                                                                                                                           

tests/test_snapshot_download.py::SnapshotDownloadTests::test_download_model FAILED                                                                                    [ 12%]

================================================================================= FAILURES ==================================================================================
_________________________________________________________________ SnapshotDownloadTests.test_download_model _________________________________________________________________

self = <huggingface_hub.hf_api.HfApi object at 0x7f91f63c7cd0>
token = 'XzfBoMLHCZMbsyafTEYrVFpZgghKKBIBONswDgQQmiBuSyLJRqbYKugFfcEJKGNFipOxvskWDqoAlfVrVfIkkIrWRfBmcBZiqbDDdfSlABRbhiiNlScmGlOWEWbGHfCR'

    def whoami(self, token: Optional[str] = None) -> Dict:
        """
        Call HF API to know "whoami".
    
        Args:
            token (``str``, `optional`):
                Hugging Face token. Will default to the locally saved token if not provided.
        """
        if token is None:
            token = HfFolder.get_token()
        if token is None:
            raise ValueError(
                "You need to pass a valid `token` or login by using `huggingface-cli login`"
            )
    
        path = f"{self.endpoint}/api/whoami-v2"
        r = requests.get(path, headers={"authorization": f"Bearer {token}"})
        try:
>           r.raise_for_status()

src/huggingface_hub/hf_api.py:461: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Response [401]>

    def raise_for_status(self):
        """Raises :class:`HTTPError`, if one occurred."""
    
        http_error_msg = ''
        if isinstance(self.reason, bytes):
            # We attempt to decode utf-8 first because some servers
            # choose to localize their reason strings. If the string
            # isn't utf-8, we fall back to iso-8859-1 for all other
            # encodings. (See PR #3538)
            try:
                reason = self.reason.decode('utf-8')
            except UnicodeDecodeError:
                reason = self.reason.decode('iso-8859-1')
        else:
            reason = self.reason
    
        if 400 <= self.status_code < 500:
            http_error_msg = u'%s Client Error: %s for url: %s' % (self.status_code, reason, self.url)
    
        elif 500 <= self.status_code < 600:
            http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)
    
        if http_error_msg:
>           raise HTTPError(http_error_msg, response=self)
E           requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/whoami-v2

../../../miniforge3/envs/hf-sklearn/lib/python3.9/site-packages/requests/models.py:960: HTTPError

The above exception was the direct cause of the following exception:

args = (<tests.test_snapshot_download.SnapshotDownloadTests testMethod=test_download_model>,), kwargs = {}, retry_count = 1

    def decorator(*args, **kwargs):
        retry_count = 1
        while retry_count < number_of_tries:
            try:
>               return function(*args, **kwargs)

tests/testing_utils.py:195: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <tests.test_snapshot_download.SnapshotDownloadTests testMethod=test_download_model>

    @retry_endpoint
    def setUp(self) -> None:
        if os.path.exists(REPO_NAME):
            shutil.rmtree(REPO_NAME, onerror=set_write_permission_and_retry)
        logger.info(f"Does {REPO_NAME} exist: {os.path.exists(REPO_NAME)}")
>       repo = Repository(
            REPO_NAME,
            clone_from=f"{USER}/{REPO_NAME}",
            use_auth_token=self._token,
            git_user="ci",
            git_email="ci@dummy.com",
        )

tests/test_snapshot_download.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <huggingface_hub.repository.Repository object at 0x7f91f63c7b20>, local_dir = 'dummy-hf-hub-16466637872524'
clone_from = '__DUMMY_TRANSFORMERS_USER__/dummy-hf-hub-16466637872524', repo_type = None
use_auth_token = 'XzfBoMLHCZMbsyafTEYrVFpZgghKKBIBONswDgQQmiBuSyLJRqbYKugFfcEJKGNFipOxvskWDqoAlfVrVfIkkIrWRfBmcBZiqbDDdfSlABRbhiiNlScmGlOWEWbGHfCR', git_user = 'ci'
git_email = 'ci@dummy.com', revision = None, private = False, skip_lfs_files = False

    def __init__(
        self,
        local_dir: str,
        clone_from: Optional[str] = None,
        repo_type: Optional[str] = None,
        use_auth_token: Union[bool, str] = True,
        git_user: Optional[str] = None,
        git_email: Optional[str] = None,
        revision: Optional[str] = None,
        private: bool = False,
        skip_lfs_files: bool = False,
    ):
        """
        Instantiate a local clone of a git repo.
    
        If specifying a `clone_from`:
        will clone an existing remote repository, for instance one
        that was previously created using ``HfApi().create_repo(name=repo_name)``.
        ``Repository`` uses the local git credentials by default, but if required, the ``huggingface_token``
        as well as the git ``user`` and the ``email`` can be explicitly specified.
        If `clone_from` is used, and the repository is being instantiated into a non-empty directory,
        e.g. a directory with your trained model files, it will automatically merge them.
    
        Args:
            local_dir (``str``):
                path (e.g. ``'my_trained_model/'``) to the local directory, where the ``Repository`` will be initalized.
            clone_from (``str``, `optional`):
                repository url (e.g. ``'https://huggingface.co/philschmid/playground-tests'``).
            repo_type (``str``, `optional`):
                To set when creating a repo: et to "dataset" or "space" if creating a dataset or space, default is model.
            use_auth_token (``str`` or ``bool``, `optional`, defaults to ``True``):
                huggingface_token can be extract from ``HfApi().login(username, password)`` and is used to authenticate against the hub
                (useful from Google Colab for instance).
            git_user (``str``, `optional`):
                will override the ``git config user.name`` for committing and pushing files to the hub.
            git_email (``str``, `optional`):
                will override the ``git config user.email`` for committing and pushing files to the hub.
            revision (``str``, `optional`):
                Revision to checkout after initializing the repository. If the revision doesn't exist, a
                branch will be created with that revision name from the default branch's current HEAD.
            private (``bool``, `optional`, defaults to ``False``):
                whether the repository is private or not.
            skip_lfs_files (``bool``, `optional`, defaults to ``False``):
                whether to skip git-LFS files or not.
        """
    
        os.makedirs(local_dir, exist_ok=True)
        self.local_dir = os.path.join(os.getcwd(), local_dir)
        self.repo_type = repo_type
        self.command_queue = []
        self.private = private
        self.skip_lfs_files = skip_lfs_files
    
        self.check_git_versions()
    
        if isinstance(use_auth_token, str):
            self.huggingface_token = use_auth_token
        elif use_auth_token:
            self.huggingface_token = HfFolder.get_token()
        else:
            self.huggingface_token = None
    
        if clone_from is not None:
>           self.clone_from(repo_url=clone_from)

src/huggingface_hub/repository.py:422: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <huggingface_hub.repository.Repository object at 0x7f91f63c7b20>, repo_url = 'https://huggingface.co/', use_auth_token = None

    def clone_from(self, repo_url: str, use_auth_token: Union[bool, str, None] = None):
        """
        Clone from a remote. If the folder already exists, will try to clone the repository within it.
    
        If this folder is a git repository with linked history, will try to update the repository.
        """
        token = use_auth_token if use_auth_token is not None else self.huggingface_token
        if token is None and self.private:
            raise ValueError(
                "Couldn't load Hugging Face Authorization Token. Credentials are required to work with private repositories."
                " Please login in using `huggingface-cli login` or provide your token manually with the `use_auth_token` key."
            )
        api = HfApi()
    
        if "huggingface.co" in repo_url or (
            "http" not in repo_url and len(repo_url.split("/")) <= 2
        ):
            repo_type, namespace, repo_id = repo_type_and_id_from_hf_id(repo_url)
    
            if repo_type is not None:
                self.repo_type = repo_type
    
            repo_url = ENDPOINT + "/"
    
            if self.repo_type in REPO_TYPES_URL_PREFIXES:
                repo_url += REPO_TYPES_URL_PREFIXES[self.repo_type]
    
            if token is not None:
>               whoami_info = api.whoami(token)

src/huggingface_hub/repository.py:536: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <huggingface_hub.hf_api.HfApi object at 0x7f91f63c7cd0>
token = 'XzfBoMLHCZMbsyafTEYrVFpZgghKKBIBONswDgQQmiBuSyLJRqbYKugFfcEJKGNFipOxvskWDqoAlfVrVfIkkIrWRfBmcBZiqbDDdfSlABRbhiiNlScmGlOWEWbGHfCR'

    def whoami(self, token: Optional[str] = None) -> Dict:
        """
        Call HF API to know "whoami".
    
        Args:
            token (``str``, `optional`):
                Hugging Face token. Will default to the locally saved token if not provided.
        """
        if token is None:
            token = HfFolder.get_token()
        if token is None:
            raise ValueError(
                "You need to pass a valid `token` or login by using `huggingface-cli login`"
            )
    
        path = f"{self.endpoint}/api/whoami-v2"
        r = requests.get(path, headers={"authorization": f"Bearer {token}"})
        try:
            r.raise_for_status()
        except HTTPError as e:
>           raise HTTPError(
                "Invalid user token. If you didn't pass a user token, make sure you are properly logged in by "
                "executing `huggingface-cli login`, and if you did pass a user token, double-check it's correct."
            ) from e
E           requests.exceptions.HTTPError: Invalid user token. If you didn't pass a user token, make sure you are properly logged in by executing `huggingface-cli login`, and if you did pass a user token, double-check it's correct.

src/huggingface_hub/hf_api.py:463: HTTPError

During handling of the above exception, another exception occurred:

args = (<tests.test_snapshot_download.SnapshotDownloadTests testMethod=test_download_model>,), kwargs = {}, retry_count = 1

    def decorator(*args, **kwargs):
        retry_count = 1
        while retry_count < number_of_tries:
            try:
                return function(*args, **kwargs)
            except HTTPError as e:
>               if e.response.status_code == 504:
E               AttributeError: 'NoneType' object has no attribute 'status_code'

tests/testing_utils.py:197: AttributeError
---------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------
ERROR    root:hf_api.py:432 HfApi.login: This method is deprecated in favor of `set_access_token`.
========================================================================== short test summary info ==========================================================================
FAILED tests/test_snapshot_download.py::SnapshotDownloadTests::test_download_model - AttributeError: 'NoneType' object has no attribute 'status_code'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================= 1 failed in 1.75s =============================================================================

@LysandreJik
Member

Could you try running it with HUGGINGFACE_CO_STAGING=1 set as an environment variable?

@adrinjalali
Contributor Author

Could you try running it with HUGGINGFACE_CO_STAGING=1 set as an environment variable?

Thanks, that fixes the issue.

Member

@LysandreJik LysandreJik left a comment

Thanks for iterating! I've left a few comments. My remark regarding positional vs. keyword arguments wasn't only about hf_hub_url but about all the positional/keyword arguments that you have changed.

It's especially the case for create_repo and delete_repo, for example.

src/huggingface_hub/file_download.py
@@ -229,6 +234,7 @@ def _raise_if_offline_mode_is_enabled(msg: Optional[str] = None):


def _request_with_retry(
Member

Should this have a @_deprecate_positional_args decorator?

Contributor Author

Since this is private API, I don't think it needs to follow our deprecation policy here. Private API can change at any time w/o any guarantees for the user.

Comment on lines 955 to 959
@_deprecate_positional_args
def list_repo_files(
    self,
    *,
    repo_id: str,
Member

There are a few others below I can't do with suggestions, so will stop there for suggestions on this kind of reordering.

@LysandreJik
Member

Regarding the changelog question: for releases we use the "Auto-generate release notes" button.

Then we manually add a description/example of the commits that had an impact on the user-facing API. See v0.4.0 for example.

@osanseviero osanseviero self-requested a review March 9, 2022 10:34
@adrinjalali
Contributor Author

I'd wait for #733 to be merged before fixing this one.

@adrinjalali
Contributor Author

@LysandreJik I think I've addressed your comments. Let me know if there are other places where you'd like to see positional args which are not included now.

Member

@LysandreJik LysandreJik left a comment

Looks great! Only left a few comments, and @osanseviero I'd be happy for you to take a second look at the ones I tagged you on as I'm hovering between two possibilities.

Member

@osanseviero osanseviero left a comment

Looks nice! Thank you! Approving pending @LysandreJik approval. I left some minor comments

]
args_msg = ", ".join(args_msg)
warnings.warn(
f"Pass {args_msg} as keyword args. From version "
Member

Nit

get_full_repo_name("repo_name") will give warning

Pass model_id=repo_name as keyword args.

but expected would be

Pass model_id="repo_name" as keyword args.

Contributor Author

I think it can get quite complicated if we actually try to handle many different data types, but I'll single out strings here then.
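A rough sketch of how such a decorator could build the warning while quoting string values only. It follows the general pattern of the scikit-learn approach mentioned earlier in the thread and is not the code merged in this PR; `deprecate_positional_args` and `full_repo_name` are hypothetical names, and the message leaves out the version suffix.

```python
import functools
import inspect
import warnings


def deprecate_positional_args(func):
    """Warn when keyword-only parameters are passed positionally (sketch)."""
    sig = inspect.signature(func)
    positional_names = []
    kwonly_names = []
    for name, param in sig.parameters.items():
        if param.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD:
            positional_names.append(name)
        elif param.kind == inspect.Parameter.KEYWORD_ONLY:
            kwonly_names.append(name)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = len(args) - len(positional_names)
        if extra <= 0:
            return func(*args, **kwargs)
        # Pair each surplus positional value with a keyword-only name.
        pairs = zip(kwonly_names[:extra], args[-extra:])
        # Quote strings so the hint can be pasted back into code.
        args_msg = ", ".join(
            f'{name}="{value}"' if isinstance(value, str) else f"{name}={value}"
            for name, value in pairs
        )
        warnings.warn(f"Pass {args_msg} as keyword args.", FutureWarning)
        # Re-bind every positional value by name and call with keywords only.
        kwargs.update(zip(sig.parameters, args))
        return func(**kwargs)

    return wrapper


@deprecate_positional_args
def full_repo_name(*, model_id, organization=None):
    # Hypothetical function; everything after the bare `*` is keyword-only.
    return f"{organization or 'user'}/{model_id}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    full_repo_name("repo_name")  # positional use still works, but warns

print(caught[0].message)  # Pass model_id="repo_name" as keyword args.
```

Non-string values fall back to their plain `str()` form, which keeps the special-casing limited to the one type discussed here.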

@adrinjalali
Contributor Author

I think all comments should be addressed now. @LysandreJik please merge if you're happy with this. (The CI failure seems unrelated)

@adrinjalali
Contributor Author

Opened #785 for the failing CI.

Member

@LysandreJik LysandreJik left a comment

Should be good to go for me! I'll let you merge in case you want to rebase/merge main to see if all tests pass; otherwise, feel free to merge at your convenience. Thanks for working on it @adrinjalali!

@adrinjalali adrinjalali merged commit c2ffc48 into huggingface:main Mar 29, 2022
@adrinjalali adrinjalali deleted the kwargs branch March 29, 2022 13:48
Development

Successfully merging this pull request may close these issues.

Deprecate passing positional args to most of the public API