Parse safetensors metadata #1855

Wauplin · 2023-11-22T17:31:29Z

Related to #1832 cc @LysandreJik @Narsil

There are 2 methods:

parse_safetensors_file_metadata => parses a safetensors file on the Hub => the "real" method that parses safetensors
get_safetensors_metadata => takes a repo and parses all safetensors files (if sharded with a model.safetensors.index.json file) => more focused towards transformers architecture (i.e. opinionated)

I tried to follow the TypeScript implementation more or less, with similar error handling.
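For readers unfamiliar with what these methods parse: a safetensors file starts with an 8-byte little-endian unsigned integer giving the header size, followed by that many bytes of UTF-8 JSON describing each tensor, plus an optional `__metadata__` entry. The sketch below works on in-memory bytes only and is not the `huggingface_hub` implementation; `build_header` and `parse_header` are hypothetical names chosen for illustration.

```python
import json
import struct
from collections import Counter
from math import prod
from typing import Dict, Optional, Tuple


def build_header(tensors: Dict[str, dict], metadata: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a safetensors-style header: 8-byte little-endian length + JSON."""
    header = dict(tensors)
    if metadata is not None:
        header["__metadata__"] = metadata
    payload = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(payload)) + payload


def parse_header(blob: bytes) -> Tuple[Dict[str, str], Dict[str, dict], Dict[str, int]]:
    """Return (metadata, tensors, parameter count per dtype) from the start of a file."""
    if len(blob) < 8:
        raise ValueError("too short to contain the 8-byte header length")
    (size,) = struct.unpack("<Q", blob[:8])
    if len(blob) < 8 + size:
        raise ValueError("header is truncated")
    header = json.loads(blob[8 : 8 + size].decode("utf-8"))
    metadata = header.pop("__metadata__", {})  # the key is optional
    counts: Counter = Counter()
    for info in header.values():
        # prod([]) == 1, so a scalar tensor (shape []) counts as one parameter
        counts[info["dtype"]] += prod(info["shape"])
    return metadata, header, dict(counts)


# Illustrative tensors only; no real model is described here.
blob = build_header(
    {
        "weight": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]},
        "scale": {"dtype": "F16", "shape": [], "data_offsets": [24, 26]},
    },
    metadata={"format": "pt"},
)
metadata, tensors, parameter_count = parse_header(blob)
print(metadata, sorted(tensors), parameter_count)
```

Treating a missing `__metadata__` key as an empty dict and a `[]` shape as one parameter mirrors the kind of edge cases a parser for this format has to handle.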

HuggingFaceDocBuilderDev · 2023-11-22T17:38:29Z

The documentation is not available anymore as the PR was closed or merged.

src/huggingface_hub/__init__.py

julien-c · 2023-11-23T12:19:25Z

src/huggingface_hub/hf_api.py

+
+        To parse metadata from a single safetensors file, use [`get_safetensors_metadata`].
+
+        For more details regarding the safetensors format, check out https://huggingface.co/docs/safetensors/index#format.


you'll be able to hyper-link to this implem from the safetensors doc too BTW (like i had done for the JS implem)

more generally let's always make sure we cross-link stuff as much as possible

LysandreJik

Thanks @Wauplin ! API looks great and works great.

LysandreJik · 2023-11-24T08:35:12Z

src/huggingface_hub/hf_api.py

+        )
+        _headers = self._build_hf_headers(token=token)
+
+        # 1. Fetch first 100kb


Is this true? If I'm not mistaken @Narsil had told me 1MB but I didn't try it firsthand

Narsil mentioned 100kb to me as a good starting point.

Out of curiosity, I ran a quick empirical study. I parsed 2700 files from the top 1000 models tagged as safetensors-compatible on the Hub, sorted by downloads. Out of these 2700 files:

the maximum metadata header is 365kb

3.2% have a metadata header >= 100kb

4.1% have a metadata header >= 75kb

7.5% have a metadata header >= 50kb

18% have a metadata header >= 25kb

Given these numbers, 100kb looks like a good threshold. We could even lower it to 75kb, but it's not worth it.
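The trade-off discussed here can be sketched as a two-step read: fetch a fixed first chunk, read the header size from its first 8 bytes, and issue a second ranged read only when the header does not fit. This is a minimal sketch under stated assumptions, not the code from this PR: `fetch_range` is a hypothetical callable standing in for an HTTP `Range` request, "100kb" is taken as 100,000 bytes, and the blobs are synthetic.

```python
import json
import struct
from typing import Callable, List, Tuple

FIRST_CHUNK = 100_000  # assumption: "100kb" taken as 100,000 bytes


def read_header(fetch_range: Callable[[int, int], bytes]) -> dict:
    """Read the JSON header using at most two ranged reads.

    `fetch_range(start, end)` is a hypothetical callable returning bytes
    start..end inclusive, like an HTTP `Range: bytes=start-end` request.
    """
    first = fetch_range(0, FIRST_CHUNK - 1)
    if len(first) < 8:
        raise ValueError("too short to contain the 8-byte header length")
    (size,) = struct.unpack("<Q", first[:8])
    if 8 + size <= len(first):
        raw = first[8 : 8 + size]  # header fits in the first chunk
    else:
        raw = fetch_range(8, 8 + size - 1)  # second read for the full header
        if len(raw) != size:
            raise ValueError("header is truncated")
    return json.loads(raw.decode("utf-8"))


def in_memory_fetcher(blob: bytes) -> Tuple[Callable[[int, int], bytes], List[Tuple[int, int]]]:
    """Serve ranges from a bytes object and record every requested range."""
    calls: List[Tuple[int, int]] = []

    def fetch_range(start: int, end: int) -> bytes:
        calls.append((start, end))
        return blob[start : end + 1]

    return fetch_range, calls


def make_blob(n_tensors: int) -> bytes:
    """Build a synthetic file: header length, JSON header, then zeroed tensor data."""
    header = {
        f"layer.{i}": {"dtype": "F32", "shape": [1], "data_offsets": [4 * i, 4 * i + 4]}
        for i in range(n_tensors)
    }
    payload = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(payload)) + payload + bytes(4 * n_tensors)


small_fetch, small_calls = in_memory_fetcher(make_blob(10))    # header well under the chunk size
large_fetch, large_calls = in_memory_fetcher(make_blob(3000))  # header larger than the chunk size
small_header = read_header(small_fetch)
large_header = read_header(large_fetch)
print(len(small_calls), len(large_calls))
```

With the figures quoted above, about 3.2% of the sampled files (those with a header of 100kb or more) would take the second branch; the rest are parsed from a single read.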

Wauplin · 2023-11-24T15:06:03Z

Thanks for the reviews @julien-c and @LysandreJik!

I'll keep in mind to update the hub docs once this is released (#1855 (comment))

Wauplin added 2 commits November 22, 2023 18:20

first draft

35a289d

make quality

9327aee

julien-c reviewed Nov 22, 2023

src/huggingface_hub/__init__.py (outdated, resolved)

Wauplin added 3 commits November 23, 2023 12:09

docs

ee5e289

tests + better errors

cba5c15

Merge branch 'main' into 1832-parse-safetensors-data

7a52356

Wauplin marked this pull request as ready for review November 23, 2023 12:08

Wauplin requested review from julien-c and LysandreJik November 23, 2023 12:08

julien-c approved these changes Nov 23, 2023

add parameter count

ed20690

LysandreJik approved these changes Nov 24, 2023

Wauplin added 7 commits November 24, 2023 12:21

Merge branch 'main' into 1832-parse-safetensors-data

59f1674

Use file_exists instead of list_repo_files

a4f4d02

fix optional __metadata__ + fix __metadata__ type + fix scalar tensors

2216a1c

fix file_exists and repo_exists on gated repos

014506a

add comment

1dd3b10

Merge branch 'main' into 1832-parse-safetensors-data

cea550e

more robust

dcd1e11

Wauplin merged commit 8f7e04e into main Nov 24, 2023
14 of 16 checks passed

Wauplin deleted the 1832-parse-safetensors-data branch November 24, 2023 15:06

Wauplin mentioned this pull request Dec 15, 2023

Safetensors metadata remote reader #1832

Closed

This was referenced Jan 4, 2024

Document huggingface_hub.get_safetensors_metadata huggingface/safetensors#417

Merged

Fix URL in get_safetensors_metadata docstring #1951

Merged
