Collaborator

@pmeier pmeier commented Oct 28, 2021

@facebook-github-bot
Contributor

facebook-github-bot commented Oct 28, 2021

💊 CI failures summary and remediations

As of commit 4f3ddb1 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Contributor

@datumbox datumbox left a comment

I know it's WIP; I just added a couple of FYI comments. Feel free to ignore them if it's too early to address them.

@pmeier pmeier marked this pull request as ready for review October 28, 2021 15:01
@pmeier pmeier requested a review from datumbox October 28, 2021 15:01
Contributor

@datumbox datumbox left a comment

LGTM, thanks a lot @pmeier.

@NicolasHug could you also have a look, as you are more familiar with CircleCI?

@pmeier pmeier requested a review from datumbox November 26, 2021 09:14
@pmeier pmeier marked this pull request as ready for review November 26, 2021 09:20
@pmeier pmeier changed the title from "enable caching of model weights for prototype CI" to "Download model weights in parallel for prototype CI" Nov 26, 2021
for line in file:
    model_urls.update(MODEL_URL_PATTERN.findall(line))

print("\n".join(sorted(model_urls)))
Contributor

This approach is a bit hacky, though admittedly you can't do much else prior to compiling torchvision. If TorchVision were compiled, we could rely on the upcoming registration mechanism to get all available models and weights and then fetch their URLs. Since that is not possible for speed reasons, we might be forced to do something like this. The good thing is that even if this fails, the weights will still be downloaded properly one by one later.
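
For readers following along, here is a minimal sketch of what scanning the sources for weight URLs could look like without importing the compiled package. It is not the script from this PR: the regular expression, the function name `collect_model_urls`, and the choice to scan every `.py` file under a root directory are assumptions made for illustration.

```python
import pathlib
import re

# Assumed pattern for illustration: weight files hosted under
# download.pytorch.org/models and ending in ".pth".
MODEL_URL_PATTERN = re.compile(r"https://download[.]pytorch[.]org/models/.*?[.]pth")


def collect_model_urls(root):
    """Return the sorted, de-duplicated weight URLs found in .py files under root.

    The files are read as plain text, so nothing has to be compiled or imported.
    """
    model_urls = set()
    for path in pathlib.Path(root).glob("**/*.py"):
        with open(path, "r", encoding="utf-8") as file:
            for line in file:
                # findall returns every non-overlapping match on the line.
                model_urls.update(MODEL_URL_PATTERN.findall(line))
    return sorted(model_urls)
```

Because the scan is purely textual, it can miss URLs that are assembled at runtime, which is why falling back to the regular one-by-one download matters.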

Collaborator Author

Seeing how fast the download is, we could also do it in the foreground after the installation. Still much faster than downloading sequentially during the tests. Both variants are fine by me. You pick.
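
To make the comparison with sequential downloads concrete, here is a rough sketch of a parallel download using a thread pool. It is an illustration under assumptions, not the code merged in this PR: the helper names `download_one` and `download_all`, the `max_workers` default, the target directory, and the injectable `fetch` parameter are all hypothetical.

```python
import concurrent.futures
import pathlib
import urllib.request


def download_one(url, target_dir):
    """Download a single URL into target_dir and return the local path."""
    target = pathlib.Path(target_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target


def download_all(urls, target_dir, max_workers=8, fetch=download_one):
    """Download all URLs concurrently.

    Failures are collected rather than raised, so a later sequential
    download can still pick up whatever is missing.
    """
    pathlib.Path(target_dir).mkdir(parents=True, exist_ok=True)
    downloaded, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be reported.
        futures = {pool.submit(fetch, url, target_dir): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                downloaded.append(future.result())
            except Exception:
                failed.append(url)
    return downloaded, failed
```

Threads suit this workload because the time is spent waiting on the network rather than on the CPU. The `fetch` parameter exists only so the sketch can be exercised without network access.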

Contributor

13 seconds sounds pretty good to me. It's your call. I don't mind either way.

@pmeier pmeier requested a review from datumbox November 26, 2021 13:01
Contributor

@datumbox datumbox left a comment

LGTM, thanks for the awesome work @pmeier.

As discussed offline, this solution will be the basis for revamping our testing strategy for models. We will need to make things cross-platform, support CPU/GPU runs, etc., but that's something we can do in a follow-up PR.

@pmeier pmeier merged commit 29f38f1 into pytorch:main Nov 26, 2021
@pmeier pmeier deleted the cache-model-weights branch November 26, 2021 21:58
facebook-github-bot pushed a commit that referenced this pull request Nov 30, 2021
Summary:
* enable caching of model weights for prototype CI

* syntax

* syntax

* make cache dir dynamic

* increase verbosity

* fix

* use larger CI machine

* revert debug output

* [DEBUG] test env var usage in save_cache

* retry

* use checksum for caching

* remove env vars because expansion is not working

* syntax

* cleanup

* base caching on model-urls

* relax regex

* cleanup skips

* cleanup

* fix skipping logic

* improve step name

* benchmark without caching

* benchmark with external download

* debug

* fix manual download location

* debug again

* download weights in the background

* try parallel download

* add missing import

* use correct decorator

* up resource_class

* fix wording

* enable stdout passthrough to see download during test

* remove linebreak

* move checkout up

* cleanup

* debug failing test

* temp fix

* fix

* cleanup

* fix regex

* remove explicit install of numpy

Reviewed By: NicolasHug

Differential Revision: D32694305

fbshipit-source-id: 96a9ac5af170ca491edcedf0affdc338481befb8