
Where does the pre-trained BERT model get cached on my system by default? #2323

Closed
13Ashu opened this issue Dec 26, 2019 · 9 comments

@13Ashu

13Ashu commented Dec 26, 2019

❓ Questions & Help

I used model_class.from_pretrained('bert-base-uncased') to download and use the model. The next time I run this command, it picks up the model from the cache. But when I look in the cache, I see several files over 400M with long random names. How do I know which one is the bert-base-uncased or distilbert-base-uncased model? Maybe I am looking in the wrong place.

@shashankMadan-designEsthetics

AFAIK, the cache folder is hidden. You can download the files manually and save them to a location of your choice. The two files to download are config.json and the .bin weights file; you can then load them through from_pretrained. For example, to instantiate BERT: BertForMaskedLM.from_pretrained('/Users/<Your location>/<your folder name>')
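Building on that, here is a minimal sketch (the directory path and helper name are my own, for illustration) for checking that a local folder already holds the files from_pretrained() needs before pointing it there:

```python
import os

def missing_model_files(model_dir, required=("config.json", "pytorch_model.bin")):
    """Return which of the required files are not yet present in model_dir."""
    return [f for f in required if not os.path.exists(os.path.join(model_dir, f))]

# Hypothetical location where you saved the downloaded files.
model_dir = os.path.expanduser("~/models/bert-base-uncased")
print("missing:", missing_model_files(model_dir))

# Once nothing is missing:
# from transformers import BertForMaskedLM
# model = BertForMaskedLM.from_pretrained(model_dir)
```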

@aaugustin
Contributor

Each file in the cache comes with a .json file describing what's inside.

This isn't part of transformers' public API and may change at any time in the future.

Anyway, here's how you can locate a specific file:

$ cd ~/.cache/torch/transformers
$ grep /bert-base-uncased *.json
26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.json:{"etag": "\"64800d5d8528ce344256daf115d4965e\"", "url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt"}
4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.bf3b9ea126d8c0001ee8a1e8b92229871d06d36d8808208cc2449280da87785c.json:{"etag": "\"74d4f96fdabdd865cbdbe905cd46c1f1\"", "url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json"}
d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5.json:{"etag": "\"41a0e56472bad33498744818c8b1ef2c-64\"", "url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5"}

Here, bert-base-uncased-tf_model.h5 is cached as d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5.
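Since each cached blob has a .json sidecar recording its source URL, the grep above can also be done in Python. A sketch, assuming the pre-v4 cache layout shown above (the function name is mine):

```python
import json
import os

def index_cache(cache_dir):
    """Map each source URL to the cache filename of its blob, using the
    <hash>.json sidecars that record {"etag": ..., "url": ...}."""
    mapping = {}
    for name in os.listdir(cache_dir):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(cache_dir, name)) as f:
            meta = json.load(f)
        url = meta.get("url")
        if url:
            mapping[url] = name[: -len(".json")]
    return mapping

# Example: list every cached file that came from a bert-base-uncased URL.
cache_dir = os.path.expanduser("~/.cache/torch/transformers")
if os.path.isdir(cache_dir):
    for url, blob in index_cache(cache_dir).items():
        if "bert-base-uncased" in url:
            print(blob, "<-", url)
```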

@aaugustin
Contributor

The discussion in #2157 could be useful too.

@Mayar2009

Hi!
What if I use Colab? Then how can I find the cache file? @aaugustin

@kaniblu

kaniblu commented May 1, 2020

For anyone who landed here wondering whether the cache directory can be changed globally: set the PYTORCH_TRANSFORMERS_CACHE environment variable in the shell before running the Python interpreter.
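This can also be done from within Python, as long as it happens before transformers is imported, since the cache location is read at import time (the directory below is an arbitrary choice for illustration):

```python
import os

# Must be set before `import transformers`; the library reads the
# cache location when the module is first imported.
os.environ["PYTORCH_TRANSFORMERS_CACHE"] = os.path.expanduser("~/my_transformers_cache")

# import transformers  # would now cache downloads under ~/my_transformers_cache
print(os.environ["PYTORCH_TRANSFORMERS_CACHE"])
```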

@attardi

attardi commented Jun 6, 2020

You can find it the same way transformers does:

from transformers.file_utils import hf_bucket_url, cached_path

pretrained_model_name = 'DeepPavlov/rubert-base-cased'
# Build the download URL for the weights file...
archive_file = hf_bucket_url(
    pretrained_model_name,
    filename='pytorch_model.bin',
    use_cdn=True,
)
# ...then resolve it to its location in the local cache (downloading if needed).
resolved_archive_file = cached_path(archive_file)

@persunde

For me, Hugging Face changed the default cache folder to:

~/.cache/huggingface/transformers
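So depending on the transformers version, the default cache may be in either place. A small sketch that checks both (the helper name is mine; it assumes no cache-related environment variable overrides the defaults):

```python
import os

DEFAULT_CACHE_DIRS = [
    "~/.cache/huggingface/transformers",  # newer default
    "~/.cache/torch/transformers",        # older default
]

def find_cache_dir(candidates=DEFAULT_CACHE_DIRS):
    """Return the first default cache directory that exists, else None."""
    for path in candidates:
        full = os.path.expanduser(path)
        if os.path.isdir(full):
            return full
    return None

print(find_cache_dir())
```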

@johntiger1

(quoting @attardi's snippet above)

Thank you, this worked for me!

Note that I had to remove the use_cdn option. Additionally, it does not seem to tell you where vocab.txt and the other files are located.

@Phobia-Cosmos
Copy link

(quoting @johntiger1's comment above)

Note that hf_bucket_url has since been removed, so this no longer works. See ImportError: cannot import name 'hf_bucket_url' from 'transformers.file_utils' #22390.
