Merged
4 changes: 4 additions & 0 deletions docs/hub/_toctree.yml
@@ -108,6 +108,8 @@
title: Widget Examples
- local: models-inference
title: Inference API docs
- local: models-download-stats
title: Models Download Stats
- local: models-faq
title: Frequently Asked Questions
- local: models-advanced
@@ -149,6 +151,8 @@
sections:
- local: datasets-viewer-configure
title: Configure the Dataset Viewer
- local: datasets-download-stats
title: Datasets Download Stats
- local: datasets-data-files-configuration
title: Data files Configuration
sections:
8 changes: 8 additions & 0 deletions docs/hub/datasets-download-stats.md
@@ -0,0 +1,8 @@
# Datasets Download Stats

## How are download stats generated for datasets?

The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. No information is sent from the user, and no additional calls are made for this: the count is done server-side as the Hub serves files for download. This means that:

* The download count is the same regardless of whether the data is stored directly in the Hub repo or the repository has a script that loads the data from an external source.
* If a user manually downloads the data using tools like `wget` or the Hub's user interface (UI), those downloads will not be included in the download count.
2 changes: 2 additions & 0 deletions docs/hub/index.md
@@ -31,6 +31,7 @@ The Hugging Face Hub is a platform with over 350k models, 75k datasets, and 150k
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-tasks">Tasks</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-widgets">Widgets</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-inference">Inference API</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-download-stats">Download Stats</a>
</div>

<div class="group flex flex-col space-y-2 rounded-xl border border-red-100 bg-gradient-to-br from-red-50 dark:bg-none px-6 py-4 transition-colors hover:shadow dark:border-red-700">
@@ -44,6 +45,7 @@ The Hugging Face Hub is a platform with over 350k models, 75k datasets, and 150k
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-downloading">Downloading Datasets</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-libraries">Libraries</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-viewer">Dataset Viewer</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-download-stats">Download Stats</a>
<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-data-files-configuration">Data files Configuration</a>
</div>

157 changes: 157 additions & 0 deletions docs/hub/models-download-stats.md
@@ -0,0 +1,157 @@
# Models Download Stats

## How are download stats generated for models?

Counting the number of downloads for models is not a trivial task, as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files for download counting. No information is sent from the user, and no additional calls are made for this. The count is done server-side as the Hub serves files for download.

Every HTTP request to these files, including `GET` and `HEAD`, is counted as a download. When no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on the library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
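As a rough illustration of the counting rule described above, the following Python sketch tallies requests that hit a query file. It is not the Hub's implementation: `QUERY_FILES`, `count_downloads`, and the `(method, path)` request shape are hypothetical names chosen for this example, and falling back to `config.json` for an unlisted library is an assumption.

```python
# Assumption: a small, illustrative subset of the library -> query file mapping.
QUERY_FILES = {
    "adapter-transformers": "adapter_config.json",
    "asteroid": "pytorch_model.bin",
}
DEFAULT_QUERY_FILE = "config.json"  # used when no library is specified


def count_downloads(requests, library=None):
    """Count GET/HEAD requests that hit the query file for `library`.

    `requests` is an iterable of (method, path) tuples. This shape is an
    illustrative assumption, not the Hub's actual log format.
    """
    query_file = QUERY_FILES.get(library, DEFAULT_QUERY_FILE)
    return sum(
        1
        for method, path in requests
        if method in {"GET", "HEAD"} and path == query_file
    )


# A sharded model: fetching it touches several files, but only requests
# to the query file contribute to the count.
log = [
    ("HEAD", "config.json"),
    ("GET", "config.json"),
    ("GET", "pytorch_model-00001-of-00002.bin"),
    ("GET", "pytorch_model-00002-of-00002.bin"),
]
print(count_downloads(log))  # 2: the HEAD and the GET on config.json
```

Note that the weight shards never enter the tally, which is how a single query file avoids counting one sharded model several times.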

## Which are the query files for different libraries?

By default, the Hub looks at `config.json`, `config.yaml`, `hyperparams.yaml`, and `meta.yaml`. For the following set of libraries, there are specific query files:

```js
{
  "adapter-transformers": {
    filter: [
      {
        term: { path: "adapter_config.json" },
      },
    ],
  },
  "asteroid": {
    filter: [
      {
        term: { path: "pytorch_model.bin" },
      },
    ],
  },
  "flair": {
    filter: [
      {
        term: { path: "pytorch_model.bin" },
      },
    ],
  },
  "keras": {
    filter: [
      {
        term: { path: "saved_model.pb" },
      },
    ],
  },
  "ml-agents": {
    filter: [
      {
        wildcard: { path: "*.onnx" },
      },
    ],
  },
  "nemo": {
    filter: [
      {
        wildcard: { path: "*.nemo" },
      },
    ],
  },
  "open_clip": {
    filter: [
      {
        wildcard: { path: "*pytorch_model.bin" },
      },
    ],
  },
  "sample-factory": {
    filter: [
      {
        term: { path: "cfg.json" },
      },
    ],
  },
  "paddlenlp": {
    filter: [
      {
        term: { path: "model_config.json" },
      },
    ],
  },
  "speechbrain": {
    filter: [
      {
        term: { path: "hyperparams.yaml" },
      },
    ],
  },
  "sklearn": {
    filter: [
      {
        term: { path: "sklearn_model.joblib" },
      },
    ],
  },
  "spacy": {
    filter: [
      {
        wildcard: { path: "*.whl" },
      },
    ],
  },
  "stanza": {
    filter: [
      {
        term: { path: "models/default.zip" },
      },
    ],
  },
  "stable-baselines3": {
    filter: [
      {
        wildcard: { path: "*.zip" },
      },
    ],
  },
  "timm": {
    filter: [
      {
        terms: { path: ["pytorch_model.bin", "model.safetensors"] },
      },
    ],
  },
  "diffusers": {
    /// Filter out nested safetensors and pickle weights to avoid double counting downloads from the diffusers lib
    must_not: [
      {
        wildcard: { path: "*/*.safetensors" },
      },
      {
        wildcard: { path: "*/*.bin" },
      },
    ],
    /// Include documents that match at least one of the following rules
    should: [
      /// Downloaded from diffusers lib
      {
        term: { path: "model_index.json" },
      },
      /// Direct downloads (LoRa, Auto1111 and others)
      {
        wildcard: { path: "*.safetensors" },
      },
      {
        wildcard: { path: "*.ckpt" },
      },
      {
        wildcard: { path: "*.bin" },
      },
    ],
    minimum_should_match: 1,
  },
  "peft": {
    filter: [
      {
        term: { path: "adapter_config.json" },
      },
    ],
  }
}
```
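To make the rule syntax above more concrete, here is a minimal Python sketch that evaluates a rule against a single file path. It assumes the rules follow Elasticsearch-style boolean query semantics (`filter` clauses must all match, `must_not` clauses must not match, and at least `minimum_should_match` of the `should` clauses must match), which the listing suggests but does not state. The helper names `clause_matches` and `rule_matches` are hypothetical, and `fnmatchcase` stands in for the wildcard matcher.

```python
from fnmatch import fnmatchcase


def clause_matches(clause, path):
    """Evaluate one `term`, `terms`, or `wildcard` clause against a file path."""
    if "term" in clause:
        return path == clause["term"]["path"]
    if "terms" in clause:
        return path in clause["terms"]["path"]
    if "wildcard" in clause:
        # fnmatchcase's `*` matches any run of characters, including "/".
        return fnmatchcase(path, clause["wildcard"]["path"])
    raise ValueError(f"unsupported clause: {clause!r}")


def rule_matches(rule, path):
    """Apply must_not / filter / should semantics to a single path.

    Simplification (assumption): minimum_should_match defaults to 0 here.
    """
    if any(clause_matches(c, path) for c in rule.get("must_not", [])):
        return False
    if not all(clause_matches(c, path) for c in rule.get("filter", [])):
        return False
    hits = sum(clause_matches(c, path) for c in rule.get("should", []))
    return hits >= rule.get("minimum_should_match", 0)


# The "diffusers" and "timm" rules from the listing, written as Python dicts.
DIFFUSERS = {
    "must_not": [
        {"wildcard": {"path": "*/*.safetensors"}},
        {"wildcard": {"path": "*/*.bin"}},
    ],
    "should": [
        {"term": {"path": "model_index.json"}},
        {"wildcard": {"path": "*.safetensors"}},
        {"wildcard": {"path": "*.ckpt"}},
        {"wildcard": {"path": "*.bin"}},
    ],
    "minimum_should_match": 1,
}
TIMM = {"filter": [{"terms": {"path": ["pytorch_model.bin", "model.safetensors"]}}]}

for candidate in ["model_index.json", "unet/weights.safetensors", "lora.safetensors"]:
    print(candidate, rule_matches(DIFFUSERS, candidate))
```

Under these assumptions, the nested `unet/weights.safetensors` (a made-up path) is rejected by `must_not`, while the top-level `model_index.json` and `lora.safetensors` are counted, which matches the intent stated in the rule's comments.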
4 changes: 2 additions & 2 deletions docs/hub/models-faq.md
@@ -1,4 +1,4 @@
-# Frequently Asked Questions
+# Models Frequently Asked Questions

## How can I see what dataset was used to train the model?

@@ -42,4 +42,4 @@ If the model card includes a link to a paper on arXiv, the Hugging Face Hub will
<img class="hidden dark:block" width="300" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datasets-arxiv-dark.png"/>
</div>

-Read more about paper pages [here](./paper-pages).
+Read more about paper pages [here](./paper-pages).