bug(@huggingface/hub): fileDownloadInfo return an etag for LFS file which seems weird #1023

@axel7083

I was experimenting with the fileDownloadInfo method from the @huggingface/hub package, trying to add caching capabilities (related #908).

export async function fileDownloadInfo(

I compared its behaviour with the Python library github.com/huggingface/huggingface_hub, which has a get_hf_file_metadata function doing something similar (it might not have the same goal, but the two clearly have a lot in common).

My problem is that I have doubts about how the JS function obtains the file download info, and specifically the etag. Let's compare.

JS vs PY request

First difference, the JS function is using the GET method for the request.

const resp = await (params.fetch ?? fetch)(url, {
	method: "GET",
	headers: {
		...(params.credentials && {
			Authorization: `Bearer ${accessToken}`,
		}),
		Range: "bytes=0-0",
	},
});

In comparison, the python library is using the HEAD method for the same url. See ref on huggingface/huggingface_hub#src/huggingface_hub/file_download.py#L1297

We can note the following differences between the JS and PY requests:

| Options | JS | PY |
| --- | --- | --- |
| follow redirect | ✔️ | 🟥 |
| method | GET | HEAD |
| headers | Range: "bytes=0-0" | Accept-Encoding: "identity" |
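To make the comparison concrete, here is a minimal sketch of what a PY-style request could look like in TypeScript. `headWithoutRedirect` and `FetchLike` are hypothetical names, not part of @huggingface/hub. The sketch assumes a server-side runtime such as Node 18+: in a browser, `redirect: "manual"` yields an opaque response whose headers are not readable, and `Accept-Encoding` cannot be set by script.

```typescript
// Hypothetical helper, not part of @huggingface/hub.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

export function headWithoutRedirect(
	url: string,
	accessToken?: string,
	fetchImpl: FetchLike = fetch,
): Promise<Response> {
	const headers: Record<string, string> = {
		// Same header the PY request sends, per the comparison above.
		"Accept-Encoding": "identity",
	};
	if (accessToken) {
		headers.Authorization = `Bearer ${accessToken}`;
	}
	return fetchImpl(url, {
		method: "HEAD",
		// Return the 30X response itself instead of following it.
		redirect: "manual",
		headers,
	});
}
```

Passing `fetchImpl` mirrors the existing `params.fetch ?? fetch` pattern, so a custom fetch can still be injected.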

Etag value comparison

To see what difference these requests make, let's take the same file and compare the output of fileDownloadInfo (JS) and get_hf_file_metadata (PY).

const info = await fileDownloadInfo({
	repo: {
		name: "bert-base-uncased",
		type: "model",
	},
	path: "tf_model.h5",
	revision: "dd4bc8b21efa05ec961e3efc4ee5e3832a3679c7",
});
assert.strictEqual(info?.etag, '"41a0e56472bad33498744818c8b1ef2c-64"');

result = get_hf_file_metadata("https://huggingface.co/google-bert/bert-base-uncased/resolve/dd4bc8b21efa05ec961e3efc4ee5e3832a3679c7/tf_model.h5")
assert result.etag == "a7a17d6d844b5de815ccab5f42cad6d24496db3850a2a43d8258221018ce87d2"

Response HEADERS

For non-LFS files, there is no difference between the response headers seen by the Python and JS libraries.

The PY library is getting the etag using the following logic

response.headers["X-Linked-Etag"] or response.headers["etag"]

The JS library only reads response.headers["etag"]. The important part is the redirection: because the JS request follows the redirect to the resolved download link, its response never contains X-Linked-Etag, which in our example only appears on the 30X response for an LFS file.
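As a rough illustration, the PY selection logic could be expressed in TypeScript as below. `pickEtag` is a hypothetical helper, not an existing @huggingface/hub function. Stripping the `W/` prefix and the surrounding quotes is an assumption on my side, based on the PY result above being unquoted while the JS one is quoted.

```typescript
// Hypothetical helper, not part of @huggingface/hub.
export function pickEtag(headers: Headers): string | null {
	// Prefer X-Linked-Etag (present on the 30X response for LFS files),
	// then fall back to the regular ETag. `||` mirrors Python's `or`.
	const raw = headers.get("X-Linked-Etag") || headers.get("ETag");
	if (!raw) {
		return null;
	}
	// Assumption: drop a weak-validator prefix and surrounding quotes.
	return raw.replace(/^W\//, "").replace(/^"|"$/g, "");
}
```

This only gives the LFS etag if it is applied to the response received before the redirect is followed, since that is the only response carrying X-Linked-Etag.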
