[model weights caching] model upload doesn't check model weights hash #6916

Closed
stas00 opened this issue Sep 2, 2020 · 15 comments · Fixed by #8324

stas00 (Contributor) commented Sep 2, 2020

I re-uploaded model weights via transformers-cli upload and noticed that when I tried to use them, they didn't get re-downloaded; the library continued to use the cached version.

The problem seems to come from the fact that the other uploaded files haven't changed, only the model weights.

I double checked that the md5sum of the old weights file is different from the new one.

I re-uploaded the whole folder using:

transformers-cli upload fsmt-wmt19-en-de

If I hunt down the cached files (not an easy task) and delete them, it does re-download the new version.

If I diff the original cached weights file against the updated one (which gets re-downloaded once I move the original cached file out of the way), they aren't the same:

Binary files before/d97352d9f1f96ee4c6055f203812035b4597258a837db1f4f0803a2932cc3071.53ce64c7097bfcd85418af04a21b4a897c78c8440de3af078e577727ad9de3a0 and after/d97352d9f1f96ee4c6055f203812035b4597258a837db1f4f0803a2932cc3071.53ce64c7097bfcd85418af04a21b4a897c78c8440de3af078e577727ad9de3a0 differ
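
(A minimal aside on where those long filenames come from, assuming the naming scheme transformers used at the time: a cache entry is named sha256 of the download URL, a dot, then sha256 of the ETag. The URL and ETag below are illustrative only.)

import hashlib

# Sketch of the cache naming scheme (transformers circa v3.x):
# sha256(url) + "." + sha256(etag), stored under the cache dir
# (e.g. ~/.cache/torch/transformers at the time).
def cache_filename(url: str, etag: str) -> str:
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()
    etag_hash = hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return f"{url_hash}.{etag_hash}"

# Illustrative values, just to show the shape of the resulting filename.
print(cache_filename(
    "https://cdn.huggingface.co/stas/fsmt-wmt19-en-de/pytorch_model.bin",
    '"example-etag"',
))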

Could we please include the model weights file in the hash calculation?

Thank you.

LysandreJik (Member) commented:

I can confirm it was previously checking the model weights and re-downloading if the weights had been changed. Investigating.

LysandreJik (Member) commented:

This is due to the CDN caching files with a 24-hour delay. After 24 hours it should download your file, but if you want it now you can use the use_cdn flag and set it to False. You can see the documentation for this here.
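
(A minimal sketch of that call, reusing stas00's model name from above:)

from transformers import AutoModel

# Bypass the Cloudfront CDN and fetch directly from S3, so a freshly
# re-uploaded pytorch_model.bin is picked up right away.
model = AutoModel.from_pretrained("stas/fsmt-wmt19-en-de", use_cdn=False)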

stas00 (Contributor, Author) commented Sep 4, 2020

Thank you for the hint, @LysandreJik. So from_pretrained(mname, use_cdn=False).

But that might be tricky for end users who won't know that the code base has changed and that the model weights they get are out of sync with it.

Is there a way to signal CDN to invalidate the cache for some files? It could then be done from the upload util.

stas00 (Contributor, Author) commented Sep 4, 2020

FWIW, I wrote a one-liner to force a cache update for the 4 models I'm working on at the moment.

PYTHONPATH="src" python -c 'from transformers import AutoModel; [AutoModel.from_pretrained("stas/fsmt-wmt19-"+p, use_cdn=False) for p in ["en-ru","ru-en","en-de","de-en"]]'

I now have that in my script, so I don't need to think about it.

stas00 (Contributor, Author) commented Sep 4, 2020

@LysandreJik, unfortunately this doesn't solve the issue.

AutoModel.from_pretrained(mname, use_cdn=False)

It indeed forces a download of the recently updated model, but then if this flag is no longer used in the application, it still downloads the CDN-cached version and ends up using the wrong one.

So, basically, this results in 2 copies (different hashes) sitting in the cache dir.

And normal usage without use_cdn=False looks up the old version and not the new one (so things like run_eval.py still use the old one).

Thanks.

julien-c (Member) commented Sep 4, 2020

Can you run AutoModel.from_pretrained(mname, use_cdn=False) in a debugger and check whether the downloaded URL is a https://cdn.huggingface.co or a https://s3.amazonaws.com/models.huggingface.co URL?
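
(For reference, a minimal sketch of checking the resolved URLs without a debugger, assuming the hf_bucket_url helper in transformers.file_utils from that era:)

from transformers.file_utils import hf_bucket_url

# With the CDN the weights URL should resolve to https://cdn.huggingface.co/...,
# without it to https://s3.amazonaws.com/models.huggingface.co/...
print(hf_bucket_url("stas/fsmt-wmt19-en-de", filename="pytorch_model.bin", use_cdn=True))
print(hf_bucket_url("stas/fsmt-wmt19-en-de", filename="pytorch_model.bin", use_cdn=False))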

stas00 (Contributor, Author) commented Sep 4, 2020

I can do that, but I already checked that it downloads the updated model with use_cdn=False. But then if you run it again without use_cdn=False, it ignores the new download and uses the old model again (if I delete the cached version, running without use_cdn=False re-downloads the old CDN version).

julien-c (Member) commented Sep 4, 2020

Oh yeah, OK, I see. Can you run run_eval.py on a local folder path then?

stas00 (Contributor, Author) commented Sep 4, 2020

Can you run run_eval.py on a local folder path then?

Yes. Except others can't as they don't have my local copy.

e.g. @sshleifer wants to eval my PR #6940, but now has to wait till tomorrow for the CDN cache to expire (or hack around it).

Last night I uploaded an experimental model, which proved to be invalid. I thought I had re-downloaded it fine since it was working OK, and made a PR, except I was actually testing against the stale cached version, which happened to be a good one.

stas00 (Contributor, Author) commented Sep 6, 2020

Can we please re-open this ticket? It hasn't been resolved.

sshleifer reopened this Sep 6, 2020
julien-c (Member) commented Sep 7, 2020

Can we add a --no_cdn boolean flag to run_eval.py that would then call AutoModelForSeq2SeqLM.from_pretrained(use_cdn=False)?

In our dev workflow we mostly don't use the CDN while the files are still in flux. Cloudfront invalidation comes with its own set of issues, so it's better to view the CDN as a means to distribute permanent files. (For this reason we don't serve config.json files from Cloudfront.)
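
(A minimal sketch of what such a flag could look like; the wiring below is hypothetical, not the actual run_eval.py code:)

import argparse

from transformers import AutoModelForSeq2SeqLM

parser = argparse.ArgumentParser()
parser.add_argument("model_name")
parser.add_argument("--no_cdn", action="store_true",
                    help="fetch model weights from S3 instead of the Cloudfront CDN")
args = parser.parse_args()

# Hypothetical wiring: map --no_cdn onto the existing use_cdn kwarg.
model = AutoModelForSeq2SeqLM.from_pretrained(args.model_name, use_cdn=not args.no_cdn)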

stas00 (Contributor, Author) commented Sep 7, 2020

Can we add a --no_cdn boolean flag to run_eval.py that would then call AutoModelForSeq2SeqLM.from_pretrained(use_cdn=False)?

It could be done, but I have a feeling there will then be others.

Perhaps an alternative solution would be to introduce an env var that transparently overrides the CDN in any situation without needing to change every script? TRANSFORMERS_USE_CDN=False?
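
(Something along these lines, purely as a sketch of the proposal; TRANSFORMERS_USE_CDN and the helper below are hypothetical:)

import os

from transformers import AutoModel

# Hypothetical helper: honor a TRANSFORMERS_USE_CDN env var, defaulting to True.
def use_cdn_from_env() -> bool:
    return os.environ.get("TRANSFORMERS_USE_CDN", "true").lower() not in ("0", "false", "no")

model = AutoModel.from_pretrained("stas/fsmt-wmt19-en-de", use_cdn=use_cdn_from_env())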

In our dev workflow we mostly don't use the CDN while the files are still in flux. Cloudfront invalidation comes with its own set of issues, so it's better to view the CDN as a means to distribute permanent files. (For this reason we don't serve config.json files from Cloudfront.)

Understood!

How do you let others test the model files? By putting them on Dropbox or somewhere and sharing the link?

julien-c (Member) commented Sep 8, 2020

No, just S3 links!


stale bot commented Nov 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 7, 2020
stas00 (Contributor, Author) commented Nov 8, 2020

#8324 should resolve this.

stas00 closed this as completed Nov 8, 2020
julien-c linked a pull request Nov 8, 2020 that will close this issue