
Releases: huggingface/huggingface_hub

Patch release v0.10.1

11 Oct 07:48

Hot-fix to force utf-8 encoding in modelcards. See #1102 and skops-dev/skops#162 (comment) for context.

Full Changelog: v0.10.0...v0.10.1

v0.10.0: Modelcards, cache management and more

28 Sep 07:26

Modelcards

Contribution from @nateraw to integrate the work done on Modelcards and DatasetCards (from nateraw/modelcards) directly in huggingface_hub.

>>> from huggingface_hub import ModelCard

>>> card = ModelCard.load('nateraw/vit-base-beans')
>>> card.data.to_dict()
{'language': 'en', 'license': 'apache-2.0', 'tags': ['generated_from_trainer', 'image-classification'],...}
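A model card is a README whose YAML front matter carries the metadata shown above. As a rough illustration of what loading a card involves, here is a minimal, hypothetical sketch of splitting that front matter from the Markdown body (this is not the library's actual parser):

```python
# Minimal sketch: split a model card's YAML front matter from its Markdown body.
# Illustration only -- not huggingface_hub's actual parsing code.

def split_front_matter(text: str):
    """Return (front_matter, body) for a card starting with a '---' block."""
    if not text.startswith("---\n"):
        return "", text
    # Front matter is delimited by the first two '---' lines.
    end = text.find("\n---\n", 4)
    if end == -1:
        return "", text
    return text[4:end], text[end + 5:]

card = """---
language: en
license: apache-2.0
---
# My model

Hello.
"""
meta, body = split_front_matter(card)
```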


Cache management (huggingface-cli scan-cache and huggingface-cli delete-cache)

New commands in huggingface-cli to scan and delete parts of the cache. The goal is to manage the cache-system the same way across all dependent libraries that use huggingface_hub. Only the new cache-system format is supported.

➜ huggingface-cli scan-cache
REPO ID                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
--------------------------- --------- ------------ -------- ------------- ------------- ------------------- -------------------------------------------------------------------------
glue                        dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
google/fleurs               dataset          64.9M        6 1 week ago    1 week ago    refs/pr/1, main     /home/wauplin/.cache/
(...)

Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Got 1 warning(s) while scanning. Use -vvv to print details.
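Conceptually, scanning the cache means walking the cache directory and aggregating per-repo sizes. Here is a rough pure-Python sketch of that idea (the layout and names below are illustrative; the real entry points are `huggingface-cli scan-cache` and the `scan_cache_dir` helper, which also read refs, revisions, and more):

```python
import os
import tempfile

def repo_sizes(cache_dir: str):
    """Roughly what a cache scan aggregates: total bytes per top-level repo
    folder. Illustrative sketch only -- the real scan also collects refs,
    revisions, last-accessed times, etc."""
    sizes = {}
    for repo in os.listdir(cache_dir):
        total = 0
        for root, _dirs, files in os.walk(os.path.join(cache_dir, repo)):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        sizes[repo] = total
    return sizes

# Build a fake one-repo cache and scan it.
with tempfile.TemporaryDirectory() as cache:
    repo = os.path.join(cache, "models--bert-base-uncased")
    os.makedirs(os.path.join(repo, "blobs"))
    with open(os.path.join(repo, "blobs", "abc"), "wb") as f:
        f.write(b"x" * 1024)
    sizes = repo_sizes(cache)
```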


Better error handling (and other HTTP-related improvements)

HTTP calls to the Hub have been harmonized to behave the same across the library.

Major differences are:

  • Unified way to handle HTTP errors using hf_raise_for_status (more informative error message)
  • Auth token is always sent by default when a user is logged in (see documentation).
  • Package versions are sent in the user-agent header for telemetry (python, huggingface_hub, tensorflow, torch,...). This was already the case for hf_hub_download.
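The idea behind hf_raise_for_status is to wrap the usual raise-for-status check so that server-provided context (such as a request id) ends up in the error message. Here is a hypothetical sketch of that pattern using a dummy response object rather than the real requests/huggingface_hub types:

```python
class HfHubHTTPError(Exception):
    """Illustrative stand-in for the library's HTTP error type."""

def raise_for_status_with_context(response):
    """Sketch of the hf_raise_for_status idea: on a 4xx/5xx status, raise an
    error enriched with the request id and server message.
    Function and header names here are hypothetical."""
    if response.status_code < 400:
        return
    request_id = response.headers.get("x-request-id", "unknown")
    raise HfHubHTTPError(
        f"HTTP {response.status_code} (request id: {request_id}): {response.text}"
    )

class FakeResponse:
    def __init__(self, status_code, headers, text):
        self.status_code, self.headers, self.text = status_code, headers, text

try:
    raise_for_status_with_context(
        FakeResponse(401, {"x-request-id": "abc123"}, "Unauthorized")
    )
except HfHubHTTPError as err:
    message = str(err)
```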

Related commits

  • Always send the cached token when user is logged in by @Wauplin in #1064
  • Add user agent to all requests with huggingface_hub version (and other) by @Wauplin in #1075
  • [Repository] Add better error message by @patrickvonplaten in #993
  • Clearer HTTP error messages in huggingface_hub by @Wauplin in #1019
  • Handle backoff on HTTP 503 error when pushing repeatedly by @Wauplin in #1038
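The backoff-on-503 fix above retries a failing call with growing sleep intervals. A generic exponential-backoff sketch of that technique (names and parameters are illustrative, not the library's internal helper):

```python
import time

def with_backoff(fn, retries=4, base_delay=0.01, retry_on=(503,)):
    """Call fn(); if its status code is in retry_on, sleep
    base_delay * 2**attempt and try again. Illustrative sketch of
    backoff-on-503, not huggingface_hub's actual code."""
    for attempt in range(retries):
        status = fn()
        if status not in retry_on:
            return status
        time.sleep(base_delay * (2 ** attempt))
    return status

# A fake upload that fails twice with 503, then succeeds.
calls = []
def flaky_upload():
    calls.append(1)
    return 503 if len(calls) < 3 else 200

result = with_backoff(flaky_upload)
```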

Breaking changes

  1. For consistency, the return type of create_commit has been modified. This is a breaking change, but we hope the return type of this method was never used (quite recent and niche output type).
  • Return more information in create_commit output by @Wauplin in #1066
  2. Since repo_id is now validated using @validate_hf_hub_args (see below), a breaking change can occur if repo_id was previously misused. An HFValidationError is now raised if repo_id is not valid.

Miscellaneous improvements

Add support for autocomplete

http-based push_to_hub_fastai

  • Add changes for push_to_hub_fastai to use the new http-based approach. by @nandwalritik in #1040

Check if a file is cached

  • try_to_load_from_cache returns cached non-existence by @sgugger in #1039

Get file metadata (commit hash, etag, location) without downloading

  • Add get_hf_file_metadata to fetch metadata from the Hub by @Wauplin in #1058

Validate arguments using @validate_hf_hub_args

  • Add validator for repo id + decorator to validate arguments in huggingface_hub by @Wauplin in #1029
  • Remove repo_id validation in hf_hub_url and hf_hub_download by @Wauplin in #1031

⚠️ This is a breaking change if repo_id was previously misused ⚠️


Documentation updates

Deprecations

  • ENH Deprecate clone_from behavior by @merveenoyan in #952
  • 🗑 Deprecate token in read-only methods of HfApi in favor of use_auth_token by @SBrandeis in #928
  • Remove legacy helper 'install_lfs_in_userspace' by @Wauplin in #1059
  • 1055 deprecate private and repo type in repository class by @Wauplin in #1057

Bugfixes & small improvements

  • Consider empty subfolder as None in hf_hub_url and hf_hub_download by @Wauplin in #1021
  • enable http request retry under proxy by @MrZhengXin in #1022
  • Add securityStatus to ModelInfo object with default value None. by @Wauplin in #1026
  • 👽️ Add size parameter for lfsFiles when committing on the hub by @coyotte508 in #1048
  • Use /models/ path for api call to update settings by @Wauplin in #1049
  • Globally set git credential.helper to store in google colab by @Wauplin in #1053
  • FIX notebook login by @Wauplin in #1073

Windows-specific bug fixes

  • Fix default cache on windows by @thomwolf in #1069
  • Degraded but fully working cache-system when symlinks are not supported by @Wauplin in #1067
  • Check symlinks support per directory instead of globally by @Wauplin in #1077

Patch release v0.9.1

25 Aug 15:41

Hot-fix error message on gated repositories (#1015).

Context: https://huggingface.co/CompVis/stable-diffusion-v1-4 has been widely shared in recent days, but since it is a gated repo, many users were confused by the authentication error they received. The error message is now more detailed.

Full Changelog: v0.9.0...v0.9.1

v0.9.0: Community API and new `push_to_hub` mixins

23 Aug 12:22

Community API

Huge work to programmatically interact with the community tab, thanks to @SBrandeis !
It is now possible to:

  • Manage discussions (create_discussion, create_pull_request, merge_pull_request, change_discussion_status, rename_discussion)
  • Comment on them (comment_discussion, edit_discussion_comment)
  • List them (get_repo_discussions, get_discussion_details)

See full documentation for more details.

HTTP-based push_to_hub mixins

The push_to_hub mixin and push_to_hub_keras have been refactored to leverage the HTTP endpoint. This means pushing to the Hub no longer requires first downloading the repo locally. The previous git-based version will remain supported until v0.12.

Miscellaneous API improvements

  • parent_commit argument for create_commit and related functions by @SBrandeis in #916
  • Add a helpful error message when commit_message is empty in create_commit by @sgugger in #962
  • ✨ create_commit: more user-friendly errors on HTTP 400 by @SBrandeis in #963
  • ✨ Add files_metadata option to repo_info by @SBrandeis in #951
  • Add list_spaces to HfApi by @cakiki in #889

Miscellaneous helpers (advanced)

Filter which files to upload in upload_folder

  • Allowlist and denylist when uploading a folder by @Wauplin in #994
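Allowlist/denylist filtering in upload_folder boils down to pattern matching over relative paths. A sketch of that logic using fnmatch (the parameter names here are illustrative; see PR #994 for the real ones):

```python
from fnmatch import fnmatch

def filter_paths(paths, allow_patterns=None, ignore_patterns=None):
    """Keep paths matching at least one allow pattern (when given) and no
    ignore pattern. Sketch of allowlist/denylist upload filtering; the
    function and parameter names are illustrative."""
    kept = []
    for path in paths:
        if allow_patterns and not any(fnmatch(path, p) for p in allow_patterns):
            continue  # not on the allowlist
        if ignore_patterns and any(fnmatch(path, p) for p in ignore_patterns):
            continue  # explicitly denied
        kept.append(path)
    return kept

files = ["model.bin", "logs/run1.txt", "README.md"]
selected = filter_paths(
    files, allow_patterns=["*.bin", "*.md"], ignore_patterns=["README*"]
)
```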

Non-existence of files in a repo is now cached

  • Cache non-existence of files or completeness of repo by @sgugger in #986

Progress bars can be globally disabled via the HF_HUB_DISABLE_PROGRESS_BARS env variable or using disable_progress_bars/enable_progress_bars helpers.

  • Add helpers to disable progress bars globally + tests by @Wauplin in #987
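The helpers take precedence over the environment variable: the env var sets the default, and an explicit enable/disable call wins. A sketch of that precedence logic (not the library's exact implementation):

```python
import os

_progress_bars_enabled = None  # None: fall back to the environment variable

def disable_progress_bars():
    global _progress_bars_enabled
    _progress_bars_enabled = False

def enable_progress_bars():
    global _progress_bars_enabled
    _progress_bars_enabled = True

def are_progress_bars_disabled():
    """Explicit enable/disable wins; otherwise HF_HUB_DISABLE_PROGRESS_BARS
    decides. Sketch of the precedence rule, not the exact implementation."""
    if _progress_bars_enabled is not None:
        return not _progress_bars_enabled
    return os.environ.get("HF_HUB_DISABLE_PROGRESS_BARS", "0") == "1"

os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
default_disabled = are_progress_bars_disabled()   # env var applies
enable_progress_bars()
after_enable = are_progress_bars_disabled()       # explicit call wins
```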

Use try_to_load_from_cache to check if a file is locally cached

Documentation updates

Bugfixes & small improvements

Internal

v0.8.1: lazy loading, git-aware cache file layout, new create_commit

15 Jun 15:53

Git-aware cache file layout

v0.8.1 introduces a new way of caching files from the Hugging Face Hub, used by two methods: snapshot_download and hf_hub_download.
The new approach is extensively documented in the Documenting files guide and we recommend checking it out to get a better understanding of how caching works.
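The layout keeps one deduplicated blob store per repo, plus per-revision snapshot directories of symlinks and a refs/ directory mapping branch names to revisions. A small sketch creating that shape on disk (directory names follow the documented convention; the revision hash and file contents here are made up):

```python
import os
import tempfile

# Sketch of the git-aware cache layout: blobs/ holds content-addressed files,
# snapshots/<revision>/ holds symlinks into blobs/, refs/ maps names to revisions.
with tempfile.TemporaryDirectory() as cache:
    repo = os.path.join(cache, "models--bert-base-uncased")
    os.makedirs(os.path.join(repo, "blobs"))
    os.makedirs(os.path.join(repo, "snapshots", "abc123"))  # fake revision hash
    os.makedirs(os.path.join(repo, "refs"))

    blob = os.path.join(repo, "blobs", "deadbeef")  # content-addressed blob
    with open(blob, "w") as f:
        f.write('{"hidden_size": 768}')

    # The snapshot exposes the blob under its real filename via a symlink.
    link = os.path.join(repo, "snapshots", "abc123", "config.json")
    os.symlink(blob, link)

    # 'main' resolves to the revision directory.
    with open(os.path.join(repo, "refs", "main"), "w") as f:
        f.write("abc123")

    with open(link) as f:
        config_via_snapshot = f.read()
```

Because two revisions sharing a file symlink to the same blob, the file is stored on disk only once.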

New create_commit API

A new create_commit API allows users to upload and delete several files at once using HTTP-based methods. You can read more about it in this guide. The following convenience methods were also introduced:

  • upload_folder: Allows uploading a local directory to a repo.
  • delete_file: Allows deleting a single file from a repo.

upload_file now uses create_commit under the hood.

create_commit also allows creating pull requests with a create_pr=True flag.

None of the methods rely on Git locally.
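create_commit takes a list of operation objects and applies them as one commit. A hypothetical sketch of that batching shape, applied to an in-memory "repo" (the real operation classes are CommitOperationAdd/CommitOperationDelete; the classes and function below are illustrative stand-ins):

```python
from dataclasses import dataclass

# Sketch of the create_commit idea: describe all changes as operation
# objects, then apply them atomically. Purely illustrative types.

@dataclass
class AddOp:
    path: str
    content: bytes

@dataclass
class DeleteOp:
    path: str

def apply_commit(repo: dict, operations):
    """Apply all operations on a copy, so a failure leaves repo untouched."""
    staged = dict(repo)
    for op in operations:
        if isinstance(op, AddOp):
            staged[op.path] = op.content
        else:
            del staged[op.path]  # raises KeyError if the path is missing
    return staged

repo = {"old.txt": b"bye"}
new_repo = apply_commit(
    repo, [AddOp("README.md", b"# hi"), DeleteOp("old.txt")]
)
```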

Lazy loading

All modules will now be lazy-loaded. This should drastically reduce the time it takes to import huggingface_hub as it will no longer load all soft dependencies.

Improvements and bugfixes

v0.7.0: Repocard metadata

30 May 12:18

Repocard metadata

This release adds a metadata_update function that allows the user to update the metadata in a repository on the Hub. The function accepts a dict with metadata (following the same pattern as the YAML in the README); the model-index field gets special merge handling, shown in the examples below.

Examples:

Starting from

existing_results = [{
    'dataset': {'name': 'IMDb', 'type': 'imdb'},
    'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
     'task': {'name': 'Text Classification', 'type': 'text-classification'}
}]

1. Overwrite existing metric value in existing result

new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["value"] = 0.999
_update_metadata_model_index(existing_results, new_results, overwrite=True)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.999}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

2. Add new metric to existing result

new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["name"] = "Recall"
new_results[0]["metrics"][0]["type"] = "recall"
_update_metadata_model_index(existing_results, new_results)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995},
              {'name': 'Recall', 'type': 'recall', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

3. Add new result

new_results = deepcopy(existing_results)
new_results[0]["dataset"] = {'name': 'IMDb-2', 'type': 'imdb_2'}
_update_metadata_model_index(existing_results, new_results)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}},
 {'dataset': {'name': 'IMDb-2', 'type': 'imdb_2'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
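The three examples follow one rule: results matching on task + dataset are merged metric by metric (with overwrite controlling conflicts), while non-matching results are appended. A pure-Python sketch of that merge rule (the helper name here is hypothetical, not the library's private function):

```python
from copy import deepcopy

def merge_model_index(existing, new, overwrite=False):
    """Sketch of the model-index merge rule: results with the same task and
    dataset merge their metrics; other results are appended.
    Illustrative helper, not huggingface_hub's implementation."""
    merged = deepcopy(existing)
    for new_res in new:
        match = next(
            (r for r in merged
             if r["task"] == new_res["task"] and r["dataset"] == new_res["dataset"]),
            None,
        )
        if match is None:
            merged.append(deepcopy(new_res))  # new result: append
            continue
        for metric in new_res["metrics"]:
            existing_metric = next(
                (m for m in match["metrics"] if m["type"] == metric["type"]), None
            )
            if existing_metric is None:
                match["metrics"].append(deepcopy(metric))  # new metric: add
            elif overwrite:
                existing_metric["value"] = metric["value"]  # conflict: overwrite

    return merged

existing = [{
    "dataset": {"name": "IMDb", "type": "imdb"},
    "metrics": [{"name": "Accuracy", "type": "accuracy", "value": 0.995}],
    "task": {"name": "Text Classification", "type": "text-classification"},
}]
new = deepcopy(existing)
new[0]["metrics"][0]["value"] = 0.999
merged = merge_model_index(existing, new, overwrite=True)
```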

Improvements and bug fixes

v0.6.0: fastai support, binary file support, skip LFS files when pushing to the hub

09 May 20:11

Disclaimer: this release initially advertised support for #844. That feature did not actually make it into this release and will ship in v0.7.

fastai support

v0.6.0 introduces downstream (download) and upstream (upload) support for the fastai library. It supports fastai versions 2.4 and above.
The integration is detailed in the following blog.

  • Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions by @omarespejel in #678

Automatic binary file tracking in Repository

Binary files are now rejected by default by the Hub. v0.6.0 introduces automatic binary file tracking through the auto_lfs_track argument of the Repository.git_add method. It also introduces the Repository.auto_track_binary_files method which can be used independently of other methods.

skip_lfs_files is now added to mixins

The parameter skip_lfs_files is now added to the different mixins. It enables pushing files to the Hub without first downloading the files above 10MB. This should dramatically reduce the time needed when updating a modelcard, a configuration file, and other small files.

  • ✨ add skip_lfs_files to mixins' push_to_hub by @nateraw in #858

Keras support improvement

The support for Keras models is greatly improved through several additions:

  • The save_pretrained_keras method now accepts a list of tags that will automatically be added to the repository.
  • Download statistics are now available on Keras models

Bugfixes and improvements

v0.5.1: Patch release

07 Apr 19:10

This is a patch release fixing a breaking backward compatibility issue.

Linked PR: #822

v0.5.0: Reference documentation, Keras improvements, stabilizing the API

07 Apr 19:09

Documentation

Version v0.5.0 is the first version to feature an API reference. It is still a work in progress, with some features lacking, some images not rendering, and a documentation reorg coming up, but it should already provide significantly simpler access to the huggingface_hub API.

The documentation is visible here.

Model & datasets list improvements

The list_models and list_datasets methods have been improved in several ways.

List private models

These two methods now accept the token keyword to specify your token. Specifying the token will include your private models and datasets in the returned list.

  • Support list_models and list_datasets with token arg by @muellerzr in #638

Modelcard metadata

These two methods now accept the cardData boolean argument. If set to True, the modelcard metadata will also be returned when using these two methods.

  • Include cardData in list_models and list_datasets by @muellerzr in #639

Filtering by carbon emissions

The list_models method now also accepts an emissions_thresholds parameter to filter models by carbon emissions.

Keras improvements

The Keras serialization and upload methods have been worked on to provide better support for models:

  • All parameters are now included in the saved model when using push_to_hub_keras
  • log_dir parameter for TensorBoard logs, which will automatically spawn a TensorBoard instance on the Hub.
  • Automatic model card

Contributing guide

A contributing guide is now available for the huggingface_hub repository. For any and all information related to contributing to the repository, please check it out!

Read more about it here: CONTRIBUTING.md.

Pre-commit hooks

The huggingface_hub GitHub repository has several checks to ensure that the code respects code quality standards. Opt-in pre-commit hooks have been added in order to make it simpler for contributors to leverage them.

Read more about it in the aforementioned CONTRIBUTING guide.

Renaming and transferring repositories

Repositories can now be renamed and transferred programmatically using move_repo.

  • Allow renaming and transferring repos programmatically by @osanseviero in #704

Breaking changes & deprecation

⛔ The following methods have now been removed following a deprecation cycle

list_repos_objs

The list_repos_objs and the accompanying CLI utility huggingface-cli repo ls-files have been removed.
The same can be done using the model_info and dataset_info methods.

  • Remove deprecated list_repos_objs and huggingface-cli repo ls-files by @julien-c in #702

Python 3.6

Python 3.6 support is now dropped as the version reached end of life. Installing huggingface_hub with Python 3.6 will result in version v0.4.0 being installed.

⚠️ Items below are now deprecated and will be removed in a future version

  • API deprecate positional args in file_download and hf_api by @adrinjalali in #745
  • MNT deprecate name and organization in favor of repo_id by @adrinjalali in #733

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.5.0

v0.4.0: Tag listing, Namespace Objects, Model Filter

26 Jan 18:30

Tag listing

This release introduces the ability to fetch all available tags for models or datasets, returned as a nested namespace object, for example:

>>> from huggingface_hub import HfApi

>>> api = HfApi() 
>>> tags = api.get_model_tags()
>>> print(tags)
Available Attributes:
 * benchmark
 * language_creators
 * languages
 * licenses
 * multilinguality
 * size_categories
 * task_categories
 * task_ids

>>> print(tags.benchmark)
Available Attributes:
 * raft
 * superb
 * test

Namespace objects

With the goal of adding more tab-completion to the library, this release introduces two objects:

  • DatasetSearchArguments
  • ModelSearchArguments

These two AttributeDictionary objects contain all the valid information we can extract from a model as tab-complete parameters. We also include the author_or_organization and the dataset_name (or model_name), obtained through careful string splitting.
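An AttributeDictionary is a dict whose keys are also readable as attributes, which is what makes tab-completion work in a REPL. A minimal sketch of the idea (not the library's exact class):

```python
class AttributeDictionary(dict):
    """Dict whose string keys are also readable as attributes, enabling
    tab-completion in a REPL. Minimal sketch of the idea."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Expose keys to dir()/tab-completion alongside normal attributes.
        return list(super().__dir__()) + [k for k in self if isinstance(k, str)]

args = AttributeDictionary({"pytorch": "pytorch", "tensorflow": "tf"})
framework = args.pytorch  # same as args["pytorch"]
```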

Model Filter

This release introduces a new way to search the hub: the ModelFilter class.

At its simplest, it lets the user specify what they want to search for, such as:

f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")

From there, they can pass this filter to the new list_models_by_filter function in HfApi to search with it:

models = api.list_models_by_filter(f)

The API may then be used for complex queries:

args = ModelSearchArguments()
f = ModelFilter(framework=[args.library.pytorch, args.library.TensorFlow], model_name="bert", tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification])

api.list_models_by_filter(f)

Ignoring filenames in snapshot_download

This release introduces a way to limit the files fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git, while skipping certain files according to their filenames.

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.4.0