From 0aaaf81d4f977093c0c038232cbeee97e0e858b1 Mon Sep 17 00:00:00 2001
From: osanseviero <osanseviero@gmail.com>
Date: Fri, 17 Nov 2023 16:34:07 +0100
Subject: [PATCH 1/8] Add docs on download stats

---
 docs/hub/_toctree.yml    | 2 ++
 docs/hub/datasets-faq.md | 8 ++++++++
 docs/hub/models-faq.md   | 8 ++++++--
 3 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 docs/hub/datasets-faq.md
diff --git a/docs/hub/_toctree.yml b/docs/hub/_toctree.yml
index ad3406ade..d7d9727ab 100644
--- a/docs/hub/_toctree.yml
+++ b/docs/hub/_toctree.yml
@@ -156,6 +156,8 @@
         title: File names and splits
       - local: datasets-manual-configuration
         title: Manual Configuration
+  - local: datasets-faq
+    title: Frequently Asked Questions
 - local: spaces
   title: Spaces
   isExpanded: true
diff --git a/docs/hub/datasets-faq.md b/docs/hub/datasets-faq.md
new file mode 100644
index 000000000..ee9216c68
--- /dev/null
+++ b/docs/hub/datasets-faq.md
@@ -0,0 +1,8 @@
+# Datasets Frequently Asked Questions
+
+## How are download stats generated for datasets?
+
+The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. This means that:
+
+* Whether the data is directly stored on the Hub repo or if the repository has a script to load the data from an external source, the download count is not impacted.
+* If a user manually downloads the data using tools like `wget` or through the Hub's user interface (UI), those downloads will not be included in the download count.
\ No newline at end of file
diff --git a/docs/hub/models-faq.md b/docs/hub/models-faq.md
index 890da38a5..32258c094 100644
--- a/docs/hub/models-faq.md
+++ b/docs/hub/models-faq.md
@@ -1,4 +1,4 @@
-# Frequently Asked Questions
+# Models Frequently Asked Questions
 
 ## How can I see what dataset was used to train the model?
 
@@ -42,4 +42,8 @@ If the model card includes a link to a paper on arXiv, the Hugging Face Hub will
 <img class="hidden dark:block" width="300" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datasets-arxiv-dark.png"/>
 </div>
 
-Read more about paper pages [here](./paper-pages).
\ No newline at end of file
+Read more about paper pages [here](./paper-pages).
+
+## How are download stats generated for models?
+
+Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. Every `GET` request to these files will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.

From 2b638fbda68f5c6f6ac52db5117dce527ab6cb30 Mon Sep 17 00:00:00 2001
From: osanseviero <osanseviero@gmail.com>
Date: Fri, 17 Nov 2023 16:59:07 +0100
Subject: [PATCH 2/8] Destroy the wall

---
 docs/hub/models-faq.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/hub/models-faq.md b/docs/hub/models-faq.md
index 32258c094..0a9fbc247 100644
--- a/docs/hub/models-faq.md
+++ b/docs/hub/models-faq.md
@@ -46,4 +46,6 @@ Read more about paper pages [here](./paper-pages).
 
 ## How are download stats generated for models?
 
-Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. Every `GET` request to these files will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
+Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
+
+Every `GET` request to these files will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.

From b21de0af9f0930384d8f2379172567cb2aa34421 Mon Sep 17 00:00:00 2001
From: Omar Sanseviero <osanseviero@gmail.com>
Date: Fri, 17 Nov 2023 19:13:19 +0100
Subject: [PATCH 3/8] Apply suggestions from code review

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
---
 docs/hub/datasets-faq.md | 4 ++--
 docs/hub/models-faq.md   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/hub/datasets-faq.md b/docs/hub/datasets-faq.md
index ee9216c68..b7ef7e65f 100644
--- a/docs/hub/datasets-faq.md
+++ b/docs/hub/datasets-faq.md
@@ -4,5 +4,5 @@
 
 The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. This means that:
 
-* Whether the data is directly stored on the Hub repo or if the repository has a script to load the data from an external source, the download count is not impacted.
-* If a user manually downloads the data using tools like `wget` or through the Hub's user interface (UI), those downloads will not be included in the download count.
\ No newline at end of file
+* The download count is the same regardless of whether the data is directly stored on the Hub repo or if the repository has a script to load the data from an external source.
+* If a user manually downloads the data using tools like `wget` or the Hub's user interface (UI), those downloads will not be included in the download count.
\ No newline at end of file
diff --git a/docs/hub/models-faq.md b/docs/hub/models-faq.md
index 0a9fbc247..49b2db4d0 100644
--- a/docs/hub/models-faq.md
+++ b/docs/hub/models-faq.md
@@ -46,6 +46,6 @@ Read more about paper pages [here](./paper-pages).
 
 ## How are download stats generated for models?
 
-Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
+Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
 
-Every `GET` request to these files will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
+Every HTTP request to these files, including `GET` and `HEAD` will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.

From 747ca2b9f8b801c5d1dff79c5e79db7fa3b0226b Mon Sep 17 00:00:00 2001
From: osanseviero <osanseviero@gmail.com>
Date: Fri, 17 Nov 2023 19:19:55 +0100
Subject: [PATCH 4/8] Move to their own sections

---
 docs/hub/_toctree.yml                                    | 6 ++++--
 docs/hub/{datasets-faq.md => datasets-download-stats.md} | 2 +-
 docs/hub/models-download-stats.md                        | 7 +++++++
 docs/hub/models-faq.md                                   | 6 ------
 4 files changed, 12 insertions(+), 9 deletions(-)
 rename docs/hub/{datasets-faq.md => datasets-download-stats.md} (94%)
 create mode 100644 docs/hub/models-download-stats.md

diff --git a/docs/hub/_toctree.yml b/docs/hub/_toctree.yml
index d7d9727ab..24069f59c 100644
--- a/docs/hub/_toctree.yml
+++ b/docs/hub/_toctree.yml
@@ -108,6 +108,8 @@
         title: Widget Examples
   - local: models-inference
     title: Inference API docs
+  - local: models-download-stats
+    title: Models Download Stats
   - local: models-faq
     title: Frequently Asked Questions
   - local: models-advanced
@@ -156,8 +158,8 @@
         title: File names and splits
       - local: datasets-manual-configuration
         title: Manual Configuration
-  - local: datasets-faq
-    title: Frequently Asked Questions
+  - local: datasets-download-stats
+    title: Datasets Download Stats
 - local: spaces
   title: Spaces
   isExpanded: true
diff --git a/docs/hub/datasets-faq.md b/docs/hub/datasets-download-stats.md
similarity index 94%
rename from docs/hub/datasets-faq.md
rename to docs/hub/datasets-download-stats.md
index b7ef7e65f..84fc46cde 100644
--- a/docs/hub/datasets-faq.md
+++ b/docs/hub/datasets-download-stats.md
@@ -1,4 +1,4 @@
-# Datasets Frequently Asked Questions
+# Datasets Download Stats
 
 ## How are download stats generated for datasets?
 
diff --git a/docs/hub/models-download-stats.md b/docs/hub/models-download-stats.md
new file mode 100644
index 000000000..63b1e02f6
--- /dev/null
+++ b/docs/hub/models-download-stats.md
@@ -0,0 +1,7 @@
+# Models Download Stats
+
+## How are download stats generated for models?
+
+Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
+
+Every HTTP request to these files, including `GET` and `HEAD` will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
diff --git a/docs/hub/models-faq.md b/docs/hub/models-faq.md
index 49b2db4d0..0ec34bf63 100644
--- a/docs/hub/models-faq.md
+++ b/docs/hub/models-faq.md
@@ -43,9 +43,3 @@ If the model card includes a link to a paper on arXiv, the Hugging Face Hub will
 </div>
 
 Read more about paper pages [here](./paper-pages).
-
-## How are download stats generated for models?
-
-Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
-
-Every HTTP request to these files, including `GET` and `HEAD` will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.

From 209404356478f4f577de67281e20047202fdf916 Mon Sep 17 00:00:00 2001
From: osanseviero <osanseviero@gmail.com>
Date: Fri, 17 Nov 2023 19:39:59 +0100
Subject: [PATCH 5/8] Open source all the way

---
 docs/hub/models-download-stats.md | 150 ++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)
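
Note for reviewers: the rule table added in this patch reads like a set of Elasticsearch-style boolean filters (`term`, `terms`, `wildcard`, `must_not`, `should`). That reading is an assumption; nothing in this patch states how the Hub evaluates the rules. Under that assumption, here is a minimal Python sketch of how a requested path could be checked against a rule. `RULES`, `clause_matches`, and `is_query_file` are hypothetical names, and only three of the libraries are copied over.

```python
import fnmatch

# Hypothetical re-encoding of three entries from the rule table in this patch.
RULES = {
    "peft": {"filter": [{"term": {"path": "adapter_config.json"}}]},
    "timm": {
        "filter": [
            {"terms": {"path": ["pytorch_model.bin", "model.safetensors"]}}
        ]
    },
    "diffusers": {
        "must_not": [
            {"wildcard": {"path": "*/*.safetensors"}},
            {"wildcard": {"path": "*/*.bin"}},
        ],
        "should": [
            {"term": {"path": "model_index.json"}},
            {"wildcard": {"path": "*.safetensors"}},
            {"wildcard": {"path": "*.ckpt"}},
            {"wildcard": {"path": "*.bin"}},
        ],
        "minimum_should_match": 1,
    },
}


def clause_matches(clause, path):
    """Evaluate one term / terms / wildcard clause against a file path."""
    kind, body = next(iter(clause.items()))
    expected = body["path"]
    if kind == "term":
        return path == expected
    if kind == "terms":
        return path in expected
    if kind == "wildcard":
        # Assumption: "*" matches any run of characters, "/" included.
        return fnmatch.fnmatchcase(path, expected)
    raise ValueError(f"unsupported clause type: {kind}")


def is_query_file(library, path):
    """Return True if a request for `path` would count under this sketch."""
    rule = RULES[library]
    # Any must_not hit rejects the path outright.
    if any(clause_matches(c, path) for c in rule.get("must_not", [])):
        return False
    # Every filter clause has to match.
    if not all(clause_matches(c, path) for c in rule.get("filter", [])):
        return False
    # At least `minimum_should_match` of the should clauses have to match.
    hits = sum(clause_matches(c, path) for c in rule.get("should", []))
    return hits >= rule.get("minimum_should_match", 0)
```

In this sketch, `is_query_file("diffusers", "unet/diffusion_pytorch_model.safetensors")` is rejected by `must_not`, while a top-level file such as `lora.safetensors` (a made-up name) is accepted by `should`. That mirrors the double-counting guard described in the comments of the diffusers entry.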

diff --git a/docs/hub/models-download-stats.md b/docs/hub/models-download-stats.md
index 63b1e02f6..003f1afe6 100644
--- a/docs/hub/models-download-stats.md
+++ b/docs/hub/models-download-stats.md
@@ -5,3 +5,153 @@
 Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
 
 Every HTTP request to these files, including `GET` and `HEAD` will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
+
+## Which are the query files for different libraries?
+
+By default, the Hub looks at `config.json`, `config.yaml`, `hyperparams.yaml`, and `meta.yaml`. For the following libraries, there are specific query files:
+
+```js
+{
+    "adapter-transformers": {
+        filter: [
+            {
+                term: { path: "adapter_config.json" },
+            },
+        ],
+    },
+    "asteroid": {
+        filter: [
+            {
+                term: { path: "pytorch_model.bin" },
+            },
+        ],
+    },
+    "flair": {
+        filter: [
+            {
+                term: { path: "pytorch_model.bin" },
+            },
+        ],
+    },
+    "keras": {
+        filter: [
+            {
+                term: { path: "saved_model.pb" },
+            },
+        ],
+    },
+    "ml-agents": {
+        filter: [
+            {
+                wildcard: { path: "*.onnx" },
+            },
+        ],
+    },
+    "nemo": {
+        filter: [
+            {
+                wildcard: { path: "*.nemo" },
+            },
+        ],
+    },
+    "open_clip": {
+        filter: [
+            {
+                wildcard: { path: "*pytorch_model.bin" },
+            },
+        ],
+    },
+    "sample-factory": {
+        filter: [
+            {
+                term: { path: "cfg.json" },
+            },
+        ],
+    },
+    "paddlenlp": {
+        filter: [
+            {
+                term: { path: "model_config.json" },
+            },
+        ],
+    },
+    "speechbrain": {
+        filter: [
+            {
+                term: { path: "hyperparams.yaml" },
+            },
+        ],
+    },
+    "sklearn": {
+        filter: [
+            {
+                term: { path: "sklearn_model.joblib" },
+            },
+        ],
+    },
+    "spacy": {
+        filter: [
+            {
+                wildcard: { path: "*.whl" },
+            },
+        ],
+    },
+    "stanza": {
+        filter: [
+            {
+                term: { path: "models/default.zip" },
+            },
+        ],
+    },
+    "stable-baselines3": {
+        filter: [
+            {
+                wildcard: { path: "*.zip" },
+            },
+        ],
+    },
+    "timm": {
+        filter: [
+            {
+                terms: { path: ["pytorch_model.bin", "model.safetensors"] },
+            },
+        ],
+    },
+    "diffusers": {
+        /// Filter out nested safetensors and pickle weights to avoid double counting downloads from the diffusers lib
+        must_not: [
+            {
+                wildcard: { path: "*/*.safetensors" },
+            },
+            {
+                wildcard: { path: "*/*.bin" },
+            },
+        ],
+        /// Include documents that match at least one of the following rules
+        should: [
+            /// Downloaded from diffusers lib
+            {
+                term: { path: "model_index.json" },
+            },
+            /// Direct downloads (LoRA, Auto1111 and others)
+            {
+                wildcard: { path: "*.safetensors" },
+            },
+            {
+                wildcard: { path: "*.ckpt" },
+            },
+            {
+                wildcard: { path: "*.bin" },
+            },
+        ],
+        minimum_should_match: 1,
+    },
+    "peft": {
+        filter: [
+            {
+                term: { path: "adapter_config.json" },
+            },
+        ],
+    }
+}
+```
\ No newline at end of file

From 7975404994e9efec6831df7a6367e92a6ca79746 Mon Sep 17 00:00:00 2001
From: osanseviero <osanseviero@gmail.com>
Date: Fri, 17 Nov 2023 19:45:57 +0100
Subject: [PATCH 6/8] Update order and add to index

---
 docs/hub/_toctree.yml | 4 ++--
 docs/hub/index.md     | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/hub/_toctree.yml b/docs/hub/_toctree.yml
index 24069f59c..c9ba0ef02 100644
--- a/docs/hub/_toctree.yml
+++ b/docs/hub/_toctree.yml
@@ -151,6 +151,8 @@
     sections:
       - local: datasets-viewer-configure
         title: Configure the Dataset Viewer
+  - local: datasets-download-stats
+    title: Datasets Download Stats
   - local: datasets-data-files-configuration
     title: Data files Configuration
     sections:
@@ -158,8 +160,6 @@
         title: File names and splits
       - local: datasets-manual-configuration
         title: Manual Configuration
-  - local: datasets-download-stats
-    title: Datasets Download Stats
 - local: spaces
   title: Spaces
   isExpanded: true
diff --git a/docs/hub/index.md b/docs/hub/index.md
index ec13e8612..d7f0a8570 100644
--- a/docs/hub/index.md
+++ b/docs/hub/index.md
@@ -31,6 +31,7 @@ The Hugging Face Hub is a platform with over 350k models, 75k datasets, and 150k
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-tasks">Tasks</a>
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-widgets">Widgets</a>
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-inference">Inference API</a>
+<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./models-download-stats">Download Stats</a>
 </div>
 
 <div class="group flex flex-col space-y-2 rounded-xl border border-red-100 bg-gradient-to-br from-red-50 dark:bg-none px-6 py-4 transition-colors hover:shadow dark:border-red-700">
@@ -44,6 +45,7 @@ The Hugging Face Hub is a platform with over 350k models, 75k datasets, and 150k
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-downloading">Downloading Datasets</a>
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-libraries">Libraries</a>
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-viewer">Dataset Viewer</a>
+<a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-download-stats">Download Stats</a>
 <a class="!no-underline hover:opacity-60 transform transition-colors hover:translate-x-px" href="./datasets-data-files-configuration">Data files Configuration</a>
 </div>
 

From 26e58098a0d8daaf1ef3625c4a9439a1fd7c90f2 Mon Sep 17 00:00:00 2001
From: Omar Sanseviero <osanseviero@gmail.com>
Date: Mon, 20 Nov 2023 09:08:58 +0100
Subject: [PATCH 7/8] Update datasets-download-stats.md

---
 docs/hub/datasets-download-stats.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/hub/datasets-download-stats.md b/docs/hub/datasets-download-stats.md
index 84fc46cde..8f3c3a5db 100644
--- a/docs/hub/datasets-download-stats.md
+++ b/docs/hub/datasets-download-stats.md
@@ -2,7 +2,7 @@
 
 ## How are download stats generated for datasets?
 
-The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. This means that:
+The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. No information is sent from the user, and no additional calls are made for this: the count is done server-side as the Hub serves files for download. This means that:
 
 * The download count is the same regardless of whether the data is directly stored on the Hub repo or if the repository has a script to load the data from an external source.
-* If a user manually downloads the data using tools like `wget` or the Hub's user interface (UI), those downloads will not be included in the download count.
\ No newline at end of file
+* If a user manually downloads the data using tools like `wget` or the Hub's user interface (UI), those downloads will not be included in the download count.

From 5f1705047e32ee9b931d5cf980a63470b3f72d52 Mon Sep 17 00:00:00 2001
From: Omar Sanseviero <osanseviero@gmail.com>
Date: Mon, 20 Nov 2023 09:09:27 +0100
Subject: [PATCH 8/8] Update models-download-stats.md

---
 docs/hub/models-download-stats.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
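
Note for reviewers: a small Python sketch of the counting rule described in this file, in case it helps when reading the wording. It assumes the default query file `config.json` and a made-up request log of `(method, path)` pairs; `count_downloads` and the shard file names are hypothetical and are not the Hub's implementation.

```python
# Methods the docs say are counted when they hit a query file.
COUNTED_METHODS = {"GET", "HEAD"}
DEFAULT_QUERY_FILE = "config.json"


def count_downloads(requests, query_file=DEFAULT_QUERY_FILE):
    """Count requests that target the query file with a counted method."""
    return sum(
        1
        for method, path in requests
        if method in COUNTED_METHODS and path == query_file
    )


# One pull of a sharded model: several files fetched, one query-file hit.
sharded_pull = [
    ("GET", "config.json"),
    ("GET", "model-00001-of-00002.safetensors"),
    ("GET", "model-00002-of-00002.safetensors"),
]

# A HEAD followed by a GET on the query file: both requests are counted.
head_then_get = [
    ("HEAD", "config.json"),
    ("GET", "config.json"),
]
```

Under these assumptions the sharded pull counts once even though three files were fetched, while the `HEAD` plus `GET` pair counts twice, since every request to a query file is counted.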

diff --git a/docs/hub/models-download-stats.md b/docs/hub/models-download-stats.md
index 003f1afe6..4acfa9785 100644
--- a/docs/hub/models-download-stats.md
+++ b/docs/hub/models-download-stats.md
@@ -2,9 +2,9 @@
 
 ## How are download stats generated for models?
 
-Counting the number of downloads for models is not a trivial task as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models), and different formats depending on the library. To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. 
+Counting the number of downloads for models is not a trivial task, as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models) and different formats depending on the library. To avoid double counting (e.g., counting a single download of a model as multiple downloads), the Hub counts downloads using a set of query files. No information is sent from the user, and no additional calls are made for this. The count is done server-side as the Hub serves files for download.
 
-Every HTTP request to these files, including `GET` and `HEAD` will be counted as a download. By default, when no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
+Every HTTP request to these files, including `GET` and `HEAD`, will be counted as a download. When no library is specified, the Hub uses `config.json` as the default query file. Otherwise, the query file depends on the library, and the Hub might examine files such as `pytorch_model.bin` and `adapter_config.json`.
 
 ## Which are the query files for different libraries?
 
@@ -154,4 +154,4 @@ By default, the Hub looks at `config.json`, `config.yaml`, `hyperparams.yaml`, a
         ],
     }
 }
-```
\ No newline at end of file
+```