Description
I was experimenting with the fileDownloadInfo method from the @huggingface/hub package, trying to add caching capabilities (related to #908).
export async function fileDownloadInfo(
I was comparing its behaviour with that of the Python library github.com/huggingface/huggingface_hub, which has a get_hf_file_metadata function that does something similar (the goal might not be identical, but the two have clear similarities).
To explain my problem: I have some doubts about the way the JS function retrieves the file download info, and specifically the etag. Let's compare.
JS vs PY request
First difference: the JS function uses the GET method for the request.
huggingface.js/packages/hub/src/lib/file-download-info.ts
Lines 50 to 58 in 400ea89
const resp = await (params.fetch ?? fetch)(url, {
  method: "GET",
  headers: {
    ...(params.credentials && {
      Authorization: `Bearer ${accessToken}`,
    }),
    Range: "bytes=0-0",
  },
});
In comparison, the Python library uses the HEAD method for the same URL. See huggingface/huggingface_hub#src/huggingface_hub/file_download.py#L1297
We can note the following differences between the JS and PY requests:
options | JS | PY |
---|---|---|
follow redirect | ✔️ | 🟥 |
method | GET | HEAD |
headers | Range: "bytes=0-0" | Accept-Encoding: "identity" |
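
To make the comparison concrete, here is a minimal TypeScript sketch of the two sets of request options listed in the table above. The `MetadataRequestInit` interface and the `buildMetadataRequestInit` helper are hypothetical names introduced only for illustration; neither exists in @huggingface/hub or huggingface_hub, and the "py" branch is just a translation of the table into fetch-style options.

```typescript
// Hypothetical shape, declared locally so the sketch does not depend on DOM typings.
interface MetadataRequestInit {
  method: "GET" | "HEAD";
  redirect: "follow" | "manual";
  headers: Record<string, string>;
}

// Builds the request options for either style, following the table above.
// `accessToken` is optional; when absent, no Authorization header is sent.
function buildMetadataRequestInit(
  style: "js" | "py",
  accessToken?: string
): MetadataRequestInit {
  const auth: Record<string, string> = accessToken
    ? { Authorization: `Bearer ${accessToken}` }
    : {};
  if (style === "js") {
    // Current JS behaviour: ranged GET, redirects followed.
    return { method: "GET", redirect: "follow", headers: { ...auth, Range: "bytes=0-0" } };
  }
  // PY-like behaviour: HEAD, redirects not followed automatically.
  return { method: "HEAD", redirect: "manual", headers: { ...auth, "Accept-Encoding": "identity" } };
}
```

Note that how `redirect: "manual"` exposes the 30X response differs between runtimes (browsers return an opaque redirect response), so this is only a sketch of the intent, not a drop-in change.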
Etag value comparison
To see what difference the requests make, let's take the same file and compare the output of fileDownloadInfo (JS) and get_hf_file_metadata (PY):
const info = await fileDownloadInfo({
  repo: {
    name: "bert-base-uncased",
    type: "model",
  },
  path: "tf_model.h5",
  revision: "dd4bc8b21efa05ec961e3efc4ee5e3832a3679c7",
});
assert.strictEqual(info?.etag, '"41a0e56472bad33498744818c8b1ef2c-64"');

result = get_hf_file_metadata("https://huggingface.co/google-bert/bert-base-uncased/resolve/dd4bc8b21efa05ec961e3efc4ee5e3832a3679c7/tf_model.h5")
assert result.etag == "a7a17d6d844b5de815ccab5f42cad6d24496db3850a2a43d8258221018ce87d2"
Response HEADERS
For non-LFS files, there are no differences between the headers of the Python and JS responses.
The PY library gets the etag using the following logic:
response.headers["X-Linked-Etag"] or response.headers["etag"]
The JS library, in contrast, only reads response.headers["etag"].
The important part is the redirection: because the JS request follows redirects, its final response never contains X-Linked-Etag. In our example, that header only appears on the 30X response returned for an LFS file, while the download link is being resolved.
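
For illustration, the Python fallback quoted above could be mirrored in TypeScript roughly as follows. This is a sketch under assumptions: `HeaderReader`, `headersFrom`, and `pickEtag` are hypothetical names, not part of @huggingface/hub, and the function only helps if it is given the headers of the non-followed 30X response, since that is where X-Linked-Etag appears.

```typescript
// Minimal read-only view of response headers; the standard Headers class satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical stand-in for Headers with case-insensitive lookup, used for the example below.
function headersFrom(record: Record<string, string>): HeaderReader {
  const lowered = new Map<string, string>();
  for (const [key, value] of Object.entries(record)) {
    lowered.set(key.toLowerCase(), value);
  }
  return { get: (name) => lowered.get(name.toLowerCase()) ?? null };
}

// Mirrors `headers["X-Linked-Etag"] or headers["etag"]`:
// prefer the linked etag, fall back to the plain etag, else null.
function pickEtag(headers: HeaderReader): string | null {
  return headers.get("X-Linked-Etag") || headers.get("ETag") || null;
}

// Example with the two values observed above (assumed to arrive in these headers).
const redirectHeaders = headersFrom({
  "X-Linked-Etag": "a7a17d6d844b5de815ccab5f42cad6d24496db3850a2a43d8258221018ce87d2",
  ETag: '"41a0e56472bad33498744818c8b1ef2c-64"',
});
const chosenEtag = pickEtag(redirectHeaders);
```

With both headers present, `chosenEtag` is the X-Linked-Etag value; on a response that only carries ETag, the function falls back to that value.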