Download model weights in parallel for prototype CI #4772
Conversation
💊 CI failures summary and remediations — as of commit 4f3ddb1 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the internal Dr. CI Users group.)
I know it's a WIP; I just added a couple of FYI comments. Feel free to ignore them if it's too early to address.
LGTM, thanks a lot @pmeier.
@NicolasHug could you also have a look, as you are more familiar with CircleCI?
```python
for line in file:
    model_urls.update(MODEL_URL_PATTERN.findall(line))

print("\n".join(sorted(model_urls)))
```
This approach is a bit hacky, though admittedly you can't do much else prior to compiling torchvision. If torchvision were compiled, we could rely on the upcoming registration mechanism to get all available models and weights and then fetch their URLs. Since this is not possible for speed reasons, we might be forced to do something like this. The good thing is that even if this fails, we will still download the weights properly one by one later.
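For context, a self-contained sketch of what such a pre-compile URL scrape could look like. The `MODEL_URL_PATTERN` regex and the demo tree below are illustrative assumptions, not the exact code from this PR:

```python
import pathlib
import re
import tempfile

# Hypothetical pattern; the exact regex used in the PR may differ.
MODEL_URL_PATTERN = re.compile(r"https://download\.pytorch\.org/models/[^\s'\"]+")

def collect_model_urls(root):
    """Scan a source tree for checkpoint URLs without importing torchvision."""
    model_urls = set()
    for path in pathlib.Path(root).rglob("*.py"):
        with open(path, encoding="utf-8") as file:
            for line in file:
                model_urls.update(MODEL_URL_PATTERN.findall(line))
    return model_urls

# Demo on a throwaway tree instead of a real torchvision checkout:
with tempfile.TemporaryDirectory() as root:
    sample = pathlib.Path(root) / "resnet.py"
    sample.write_text(
        'model_urls = {"resnet18": "https://download.pytorch.org/models/resnet18-f37072fd.pth"}\n'
    )
    print(sorted(collect_model_urls(root)))
    # -> ['https://download.pytorch.org/models/resnet18-f37072fd.pth']
```

Because it only greps source text, this works before `pip install -e .` finishes, which is the whole point of doing it in CI setup.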
Seeing how fast the download is, we could also do it in the foreground after the installation. Still much faster than downloading sequentially during the tests. Both variants are fine by me. You pick.
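For reference, the parallel download could look roughly like the sketch below. The function name, cache layout, and worker count are assumptions for illustration, not the actual CI script:

```python
import concurrent.futures
import os
import urllib.request

def download_weights(urls, cache_dir, max_workers=8):
    """Fetch checkpoint files in parallel, skipping files already cached."""
    os.makedirs(cache_dir, exist_ok=True)

    def fetch(url):
        # Key the file by URL basename, mirroring torch.hub's convention.
        target = os.path.join(cache_dir, os.path.basename(url))
        if not os.path.exists(target):  # already present from a cache restore
            urllib.request.urlretrieve(url, target)
        return target

    # The work is I/O-bound, so a thread pool is enough; no processes needed.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Run in the background during installation, this overlaps the downloads with the build; run in the foreground afterwards, it is still far faster than downloading sequentially inside the tests.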
13 seconds sounds pretty good to me. It's your call. I don't mind either way.
LGTM, thanks for the awesome work @pmeier.
As discussed offline, this solution will be the basis for revamping our testing strategy for models. We will need to make things cross-platform, support CPU/GPU runs, etc., but that's something we can do in a follow-up PR.
Summary:
* enable caching of model weights for prototype CI
* syntax
* syntax
* make cache dir dynamic
* increase verbosity
* fix
* use larget CI machine
* revert debug output
* [DEBUG] test env var usage in save_cache
* retry
* use checksum for caching
* remove env vars because expansion is not working
* syntax
* cleanup
* base caching on model-urls
* relax regex
* cleanup skips
* cleanup
* fix skipping logic
* improve step name
* benchmark without caching
* benchmark with external download
* debug
* fix manual download location
* debug again
* download weights in the background
* try parallel download
* add missing import
* use correct decoractor
* up resource_class
* fix wording
* enable stdout passthrough to see download during test
* remove linebreak
* move checkout up
* cleanup
* debug failing test
* temp fix
* fix
* cleanup
* fix regex
* remove explicit install of numpy

Reviewed By: NicolasHug

Differential Revision: D32694305

fbshipit-source-id: 96a9ac5af170ca491edcedf0affdc338481befb8
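The "use checksum for caching" and "base caching on model-urls" steps in the summary boil down to deriving a stable cache key from the extracted URL list, so the CI cache invalidates exactly when a weight URL changes. A minimal sketch of that idea (the key format here is made up, not CircleCI's):

```python
import hashlib

def cache_key(model_urls):
    """Hash the sorted URL list: same URLs -> same key, any added, removed,
    or changed URL -> new key, forcing a fresh cache population."""
    digest = hashlib.sha256("\n".join(sorted(model_urls)).encode()).hexdigest()
    return f"model-weights-v1-{digest[:16]}"
```

Sorting first makes the key independent of the order in which URLs were scraped from the source tree.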
cc @datumbox @pmeier @seemethere @bjuncek