axel7083
Contributor

@axel7083 axel7083 commented Nov 15, 2024

Description

This follows the discussion in #1024: a HEAD request does not return the same etag that the Python library uses to populate the cache directory.

This PR adds the `pathsInfo` function, which returns path information, including the LFS oid (or etag) if the file is an LFS pointer.
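For illustration, here is a minimal sketch of how a caller might pick the cache etag from one returned entry. The `PathEntry` shape and the `etagFor` helper are assumptions made for this example and may not match the library's actual types; the sketch only assumes that an entry carries a git `oid` and, for LFS pointers, an `lfs.oid`.

```ts
// Hypothetical shape for illustration; the real return type may differ.
interface PathEntry {
	path: string;
	oid: string; // git blob oid
	lfs?: { oid: string }; // assumed to be present only for LFS pointers
}

// Prefer the LFS oid when the file is an LFS pointer, else use the git oid.
function etagFor(entry: PathEntry): string {
	return entry.lfs?.oid ?? entry.oid;
}
```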

As suggested by @coyotte508 in #1024 (review)

Related issues

Fixes #1023 (provides an alternative method to `fileDownloadInfo`).

Tests

  • unit tests have been added

Member

@coyotte508 coyotte508 left a comment

Thank you!

You should export the function in lib/index.ts

@axel7083 axel7083 requested a review from coyotte508 November 15, 2024 11:24
@coyotte508 coyotte508 merged commit 8fc1f6e into huggingface:main Nov 15, 2024
4 checks passed
@julien-c
Member

great contrib @axel7083

coyotte508 added a commit that referenced this pull request Nov 18, 2024
## Description

This follows #1031, which added a `pathsInfo` method that can return the
etag/commitHash for a given file. This makes it possible to be compliant
with the `_hf_hub_download_to_cache_dir`[^1] method from the Python library.

[^1]:
[huggingface_hub/file_download.py#L882](https://github.com/huggingface/huggingface_hub/blob/c547c839dbbe0163e3ca422d017daad7c7f9361f/src/huggingface_hub/file_download.py#L882)
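
As a rough sketch of the cache layout such a download targets: the Python library stores file content under `blobs/<etag>` and exposes it per revision under `snapshots/<commitHash>/<path>`. The `cachePaths` helper and its parameter names below are hypothetical and only illustrate that layout; they are not taken from either library.

```ts
import * as path from "node:path";

// Hypothetical helper for illustration; not part of the library.
function cachePaths(repoFolder: string, commitHash: string, etag: string, filePath: string) {
	return {
		// File content is stored once, keyed by its etag.
		blob: path.join(repoFolder, "blobs", etag),
		// The per-revision entry that refers to the blob.
		pointer: path.join(repoFolder, "snapshots", commitHash, filePath),
	};
}
```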

## Potential issue

The JS implementation does not handle the .lock files as the Python
library does. Could this be a problem if both the JS and Python functions
are used?

The JS library could provide a basic implementation of the lock file
mechanism used by the Python library, if this is a hard requirement.
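
A minimal sketch of what such a basic lock could look like, assuming an exclusive-create lock file is acceptable. The `tryAcquireLock` helper is hypothetical, and this sketch does not claim to interoperate with the locking used by the Python library.

```ts
import * as fs from "node:fs";

// Hypothetical helper: tries to take a lock by creating `lockPath` exclusively.
// Returns a release function on success, or null if the lock is already held.
function tryAcquireLock(lockPath: string): (() => void) | null {
	try {
		// The "wx" flag makes the open fail with EEXIST if the file already exists.
		const fd = fs.openSync(lockPath, "wx");
		return () => {
			fs.closeSync(fd);
			fs.rmSync(lockPath, { force: true });
		};
	} catch (err) {
		if ((err as NodeJS.ErrnoException).code === "EEXIST") return null;
		throw err;
	}
}
```

A caller would take the lock before writing a blob, skip or retry when `null` is returned, and call the release function once the write has finished.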

## Testing

I wrote tests for the existing `downloadFile` function (no change to the
implementation) and for the newly added `downloadFileToCacheDir`.

- [x] unit tests have been added

---------

Co-authored-by: Eliott C. <coyotte508@gmail.com>
Co-authored-by: Eliott C. <coyotte508@protonmail.com>
coyotte508 added a commit that referenced this pull request Nov 19, 2024
## Description

We can now create a `snapshotDownload` method similar to the
`snapshot_download` of the Python library[^1]. It clones a repository
(model, space, or dataset) to the cache (only the cache is supported for now).

[^1]:
https://huggingface.co/docs/huggingface_hub/en/guides/download#download-an-entire-repository

## Related issues/PR

With the amazing help of @coyotte508, we were able to merge the following
changes:

- #1034
- #1031
- #999

Together, these changes allow this PR to provide a Python-compliant clone
of a Hugging Face repository in the cache directory.

## Testing

- [x] unit tests are covering the new feature

**Manually**

```ts
await snapshotDownload({
	repo: {
		name: 'OuteAI/OuteTTS-0.1-350M',
		type: 'model',
	},
});
```

Verify using the `huggingface-cli` tool (Python):
```
$: huggingface-cli scan-cache
REPO ID                             REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED     LAST_MODIFIED     REFS LOCAL PATH                                                                         
----------------------------------- --------- ------------ -------- ----------------- ----------------- ---- ---------------------------------------------------------------------------------- 
OuteAI/OuteTTS-0.1-350M             model           731.6M       14 5 minutes ago     5 minutes ago     main /home/axel7083/.cache/huggingface/hub/models--OuteAI--OuteTTS-0.1-350M
```
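
The local path in the output above ends with a folder name built from the repository type and name. A small sketch of that naming scheme follows; `repoFolderName` is a hypothetical helper written for this example, not a function confirmed to exist in the JS library.

```ts
type RepoType = "model" | "dataset" | "space";

// Hypothetical helper: pluralize the repo type and join it with the
// "/"-separated parts of the repo name using "--".
function repoFolderName(name: string, type: RepoType): string {
	return [`${type}s`, ...name.split("/")].join("--");
}
```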

---------

Co-authored-by: Eliott C. <coyotte508@protonmail.com>
Development

Successfully merging this pull request may close these issues.

bug(@huggingface/hub): fileDownloadInfo return an etag for LFS file which seems weird

3 participants