From c5b86f6b46ce6999be5b615c07974e2983d382f8 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Wed, 12 Nov 2025 15:00:42 +0100
Subject: [PATCH 1/2] Datasets Download Stats: tiny tweak

---
 docs/hub/datasets-download-stats.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/datasets-download-stats.md b/docs/hub/datasets-download-stats.md
index 20d5a3693..f6ba29b31 100644
--- a/docs/hub/datasets-download-stats.md
+++ b/docs/hub/datasets-download-stats.md
@@ -2,7 +2,7 @@
 
 ## How are downloads counted for datasets?
 
-Counting the number of downloads for datasets is not a trivial task, as a single dataset repository might contain multiple files, from multiple subsets and splits (e.g. train/validation/test) and sometimes with many files in a single split. To solve this issue and avoid counting one person's download multiple times, we treat all files downloaded by a user (based on their IP address) within a 5-minute window as a single dataset download. This counting happens automatically on our servers when files are downloaded (through GET or HEAD requests), with no need to collect any user information or make additional calls.
+Counting the number of downloads for datasets is not a trivial task, as a single dataset repository might contain multiple files, from multiple subsets and splits (e.g. train/validation/test) and sometimes with many files in a single split. To solve this issue and avoid counting one person's download multiple times, we treat all files downloaded by a user (based on their IP address) within a 5-minute window in a given repository as a single dataset download. This counting happens automatically on our servers when files are downloaded (through GET or HEAD requests), with no need to collect any user information or make additional calls.
 
 ## Before September 2024
 

From 45826f967274157470a89a5b8f4c5828e54e0e67 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Wed, 12 Nov 2025 15:02:31 +0100
Subject: [PATCH 2/2] Update datasets-download-stats.md

---
 docs/hub/datasets-download-stats.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/datasets-download-stats.md b/docs/hub/datasets-download-stats.md
index f6ba29b31..5b7941052 100644
--- a/docs/hub/datasets-download-stats.md
+++ b/docs/hub/datasets-download-stats.md
@@ -6,7 +6,7 @@ Counting the number of downloads for datasets is not a trivial task, as a single
 
 ## Before September 2024
 
-The Hub used to provide download stats only for the datasets loadable via the `datasets` library. To determine the number of downloads, the Hub previously counted every time `load_dataset` was called in Python, excluding Hugging Face's CI tooling on GitHub. No information was sent from the user, and no additional calls were made for this. The count was done server-side as we served files for downloads. This means that:
+The Hub used to provide download stats only for the datasets loadable via the `datasets` library. To determine the number of downloads, the Hub previously counted every time `load_dataset` was called in Python, excluding Hugging Face's CI tooling on GitHub. No information was sent from the user, and no additional calls were made for this. The count was done server-side as we served files for downloads. This meant that:
 
 * The download count was the same regardless of whether the data is directly stored on the Hub repo or if the repository has a [script](/docs/datasets/dataset_script) to load the data from an external source.
 * If a user manually downloaded the data using tools like `wget` or the Hub's user interface (UI), those downloads were not included in the download count.
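
The counting rule touched by the first hunk (one download per IP address, per repository, per 5-minute window, with GET and HEAD requests both counted) can be illustrated with a minimal Python sketch. It is not the Hub's server-side implementation, which the docs do not describe. `FileRequest` and `count_dataset_downloads` are hypothetical names. The sketch also assumes that a window opens at the first request from an (IP, repository) pair, rather than using fixed clock-aligned buckets.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # the 5-minute window described in the docs


@dataclass(frozen=True)
class FileRequest:
    """One GET or HEAD request for a file in a dataset repo (hypothetical log record)."""
    ip: str
    repo_id: str
    timestamp: float  # seconds since epoch


def count_dataset_downloads(requests: list[FileRequest]) -> dict[str, int]:
    """Count downloads per repository, collapsing an IP's file requests into one.

    Assumption (not from the docs): a window opens at the first request from an
    (ip, repo_id) pair and lasts WINDOW_SECONDS; the first request after it
    expires opens a new window and counts as a new download.
    """
    window_start: dict[tuple[str, str], float] = {}
    downloads: dict[str, int] = {}
    for req in sorted(requests, key=lambda r: r.timestamp):
        key = (req.ip, req.repo_id)
        start = window_start.get(key)
        if start is None or req.timestamp - start >= WINDOW_SECONDS:
            # New window for this IP in this repository: count one download.
            window_start[key] = req.timestamp
            downloads[req.repo_id] = downloads.get(req.repo_id, 0) + 1
    return downloads


reqs = [
    FileRequest("1.2.3.4", "user/ds", 0),      # train split
    FileRequest("1.2.3.4", "user/ds", 30),     # test split, same window: not counted
    FileRequest("1.2.3.4", "user/other", 60),  # different repository: counted separately
    FileRequest("1.2.3.4", "user/ds", 400),    # window expired: new download
    FileRequest("5.6.7.8", "user/ds", 10),     # different IP: new download
]
print(count_dataset_downloads(reqs))  # {'user/ds': 3, 'user/other': 1}
```

Keying on the (IP, repository) pair rather than on the IP alone is what the "in a given repository" wording adds. Without it, the request to `user/other` would fall inside the same window as the `user/ds` requests and would not be counted.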