Downloaded datasets do not cache at $HF_HOME #5546

Closed
ErfanMoosaviMonazzah opened this issue Feb 18, 2023 · 1 comment

@ErfanMoosaviMonazzah

Describe the bug

The Hugging Face course (https://huggingface.co/course/chapter3/2?fw=pt) says that if we set HF_HOME, downloaded datasets will be cached at the specified location, but they are not. Models downloaded by checkpoint name are cached under HF_HOME, but datasets are still cached at ~/.cache/huggingface/datasets.

Steps to reproduce the bug

Run the following code:

from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets

It downloads the dataset and stores it at ~/.cache/huggingface/datasets.

Expected behavior

The dataset should be cached under HF_HOME.

Environment info

python 3.10.6
Kubuntu 22.04
HF_HOME located on a separate partition

@lhoestq
Member

lhoestq commented Feb 21, 2023

Hi ! Can you make sure you set HF_HOME before importing datasets ?

Then you can print the following to check which cache paths were resolved:

import datasets

print(datasets.config.HF_CACHE_HOME)
print(datasets.config.HF_DATASETS_CACHE)
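
For reference, here is a minimal sketch of the check suggested above, assuming the variable is set from inside the Python process rather than in the shell. The `cache_root` path is a hypothetical location chosen for illustration, and the `try`/`except` is only there so the snippet still runs when `datasets` is not installed. As far as I understand, `datasets` reads `HF_HOME` once at import time, so the assignment has to come before the import.

```python
import os
import tempfile

# Hypothetical cache location, for illustration only; use your own path.
cache_root = os.path.join(tempfile.gettempdir(), "hf_home_demo")

# Set HF_HOME *before* importing datasets, since the config module
# reads the environment when it is first imported.
os.environ["HF_HOME"] = cache_root

try:
    import datasets
except ImportError:
    datasets = None  # library not installed in this environment

if datasets is not None:
    # Both paths should now sit under cache_root, unless another
    # variable such as HF_DATASETS_CACHE is also set in the environment.
    print(datasets.config.HF_CACHE_HOME)
    print(datasets.config.HF_DATASETS_CACHE)
```

If the printed paths still point to ~/.cache/huggingface, the variable was most likely set after `datasets` had already been imported (for example in an earlier notebook cell), or exported in a different shell than the one running Python.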
