wandb: Network error (SSLError), entering retry loop. #1227
Comments
I don't think it is related to software installed for all users on |
@kiristern It looks like this might be down to network speed, but in any case, as per the docs, it definitely hurts training time. The following workaround might be more relevant when working with large dataset(s): set the mode to 'offline', as suggested in the docs, in the call at Line 74 in 9bb4e7f
which means it would look something like:
This would direct all the wandb logs into a dir. After the training is complete, you could sync it to view them on the dashboard by executing the |
I guess it doesn't make sense to include this in the codebase, at least for now, since in most cases we'd like to keep the live mode (which is the default) so that updates show up on the wandb dashboard as training progresses.
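The code from the comment above did not survive in this copy of the thread, so the following is only a minimal sketch of the offline workaround, not the ivadomed change itself. It assumes the `mode`, `project` and `dir` keyword arguments of `wandb.init`; the helper names, the project name and the output directory are hypothetical.

```python
def offline_init_kwargs(project, log_dir):
    """Build keyword arguments for wandb.init() that keep all logging local."""
    # mode="offline" makes wandb write run data to disk instead of
    # contacting the server during training.
    return {"project": project, "dir": log_dir, "mode": "offline"}


def start_offline_run(project, log_dir):
    """Start a wandb run in offline mode (requires the wandb package)."""
    import wandb  # imported lazily so the helper above works without wandb

    return wandb.init(**offline_init_kwargs(project, log_dir))


# Hypothetical project name and output directory, for illustration only.
kwargs = offline_init_kwargs("my-ivadomed-run", "../results")
print(kwargs)
```

After training, wandb's command-line `wandb sync` command can upload an offline run directory to the dashboard. Whether that is the exact command the comment refers to is an assumption, since the original text is cut off.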
Thanks for the suggestion @kanishk16, I'm no longer getting the error message after setting 'mode' = 'offline'.
Still, if you want the live mode (which is useful), we need to figure out what is wrong in your config. I don't think it's a network issue because I'm using the same computer and I don't experience this issue.
#1253 could indirectly help.
@kiristern I presumed that you already tried what Julien suggested but I still wanted to confirm, does the error persist upon installing |
@kanishk16 thanks for following up... Yes, I did try (also working in a conda env) and it seemed to be working fine for my last training (I was going to comment), but I just started getting the same error again (for another project)! :(
AH! i just remembered about there being problems with calling data stored on |
Issue description
The message "wandb: Network error (SSLError), entering retry loop." interferes with training.
Current behavior
The training still runs and I can see the metrics in the wandb dashboard, along with messages such as "wandb: Network error resolved after 0:06:24.504729, resuming normal operation". However, I think it really slows down the training as this occurs very frequently. From the wandb debug.log there is:
Caused by SSLError(SSLError(1, '[SSL: KRB5_S_TKT_NYV] unexpected eof while reading (_ssl.c:1091)
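One way to check how much time the retry loop actually accounts for is to add up the durations that wandb reports in its "resolved after" messages. The sketch below assumes the console output has been captured as text and that every duration uses the H:MM:SS.ffffff form seen above; the function name is hypothetical, and the sample string simply reuses the two messages quoted in this issue.

```python
import re
from datetime import timedelta

# Matches e.g. "Network error resolved after 0:06:24.504729"
RESOLVED = re.compile(
    r"Network error resolved after (\d+):(\d{2}):(\d{2})(?:\.(\d{1,6}))?"
)


def total_outage(log_text):
    """Return (number of resolved outages, summed outage duration)."""
    count = 0
    total = timedelta()
    for match in RESOLVED.finditer(log_text):
        hours, minutes, seconds, fraction = match.groups()
        # Pad the fractional part to six digits so it reads as microseconds.
        micros = int(fraction.ljust(6, "0")) if fraction else 0
        total += timedelta(
            hours=int(hours),
            minutes=int(minutes),
            seconds=int(seconds),
            microseconds=micros,
        )
        count += 1
    return count, total


sample = (
    "wandb: Network error (SSLError), entering retry loop.\n"
    "wandb: Network error resolved after 0:06:24.504729, resuming normal operation\n"
)
count, total = total_outage(sample)
print(count, total)
```

The total is only the time wandb spent retrying. It does not show by itself how much the training loop was delayed.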
wandb support said (April 2022):
(But I don't think I have permission to do so on the NeuroPoly servers.)
Expected behavior
Training runs without interruption.
Steps to reproduce
Running normal training:
ivadomed --train -c config_Mod3DUnet_ax.json --path-data ../data/ --path-output ../results/
with bavaria-quebec preprocessed data.
config file
Environment
System description
NeuroPoly server, Rosenberg,
Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-53-generic x86_64)
Installed packages
On branch mhb/1213-fix-3d-data-augmentation from PR 1222
Output of pip freeze