Files Disappear When Still Running #1045

Open
chuktuk opened this issue Mar 4, 2020 · 7 comments
@chuktuk

chuktuk commented Mar 4, 2020

Bug report for Colab: http://colab.research.google.com/.

For questions about colab usage, please use stackoverflow.

  • Describe the current behavior:
    I'm training a Flair NLP model. This is an RNN that requires a lot of time and memory, so I'm using the paid Pro version. I tried the free version, but it timed out before the model could finish training.

When training is complete, a file containing the best model is output. I saw this file created during model training; however, after a few hours, the file area only says 'Connecting to a runtime to enable file browsing.' It will be very annoying and a waste of money if I can't retrieve the best model file after all this time.

  • Describe the expected behavior:
    The files area should not time out when the model is still running! If it does, then it should be intuitive to recover the files.

  • The web browser you are using (Chrome, Firefox, Safari, etc.):
    Chrome

  • Link to self-contained notebook that reproduces this issue
    (click the Share button, then Get Shareable Link):
    https://colab.research.google.com/drive/1nzQ9jBGlokVswHTWS-6prQEjdPmkQ0Sk

@chuktuk

chuktuk commented Mar 4, 2020

When this happens, the button that usually has the RAM and CPU usage bars just says 'Busy'.

@tnovikoff

tnovikoff commented Mar 4, 2020

It sounds like you may be running into the resource limits (maximum VM lifetimes or idle timeouts) of Colab Pro.

Information about resource limits in Colab Pro can be found in the Colab Pro FAQ at http://colab.research.google.com/signup

Information about resource limits in Colab in general can be found in the main Colab FAQ at
https://research.google.com/colaboratory/faq.html

Information for getting the most out of Colab Pro can be found at
https://colab.research.google.com/notebooks/pro.ipynb

One specific approach that might work in your situation is saving your model file out to Google Drive while the connection is active.
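
As a rough illustration of that suggestion, here is a minimal sketch of copying a finished model file into a mounted Drive folder. The helper name `copy_to_drive`, the file name `best-model.pt`, and the folder paths are assumptions for illustration only; the actual output path depends on how training was configured, and the Drive mount point may differ in your runtime.

```python
import shutil
from pathlib import Path


def copy_to_drive(src_file, drive_dir):
    """Copy src_file into drive_dir (created if missing) and return the new path."""
    drive_dir = Path(drive_dir)
    drive_dir.mkdir(parents=True, exist_ok=True)
    dest = drive_dir / Path(src_file).name
    # Copy under a temporary name first, then rename, so a disconnect in the
    # middle of the copy does not leave a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    shutil.copyfile(src_file, tmp)
    tmp.replace(dest)
    return dest


# Assumed usage inside a Colab notebook (paths are hypothetical):
# from google.colab import drive
# drive.mount('/content/drive')
# copy_to_drive('/content/output/best-model.pt',
#               '/content/drive/MyDrive/flair-models')
```

A simpler variant, if the training library lets you choose its output directory, is to point that directory at the mounted Drive folder directly so the file never lives only on the VM disk.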

More generally, this might be a good question for StackOverflow, where other users may be able to provide other ideas for what to do in this situation.

Best of luck!

@chuktuk

chuktuk commented Mar 4, 2020

I couldn't find anything on Stack Overflow regarding this type of issue. I found this link on Medium, https://medium.com/@ml_kid/how-to-save-our-model-to-google-drive-and-reuse-it-2c1028058cb2, about saving to Google Drive; however, it doesn't explain HOW to save after every epoch. I'm using Flair on top of PyTorch, and it creates the file automatically. I also don't see how I could restart in the middle if it crashed after an epoch. Unless I find a solution to this, it looks like the paid version doesn't have enough resources for my needs.
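
For readers with the same question, the general pattern of saving after every epoch and resuming later can be sketched as below. This is a library-agnostic illustration, not Flair's own mechanism: `train`, `load_state`, `save_state`, and `run_epoch` are hypothetical names, and the checkpoint here stores only the epoch counter and best score as JSON. A real PyTorch run would also need to save the model and optimizer state (for example with `torch.save`), and Flair's trainer may offer its own checkpoint option depending on the installed version.

```python
import json
from pathlib import Path


def load_state(ckpt_path):
    """Return the saved training state, or a fresh one if no checkpoint exists."""
    p = Path(ckpt_path)
    if p.exists():
        return json.loads(p.read_text())
    return {"epoch": 0, "best_score": None}


def save_state(ckpt_path, state):
    """Write the state under a temporary name, then rename it into place."""
    p = Path(ckpt_path)
    p.parent.mkdir(parents=True, exist_ok=True)
    tmp = p.with_name(p.name + ".part")
    tmp.write_text(json.dumps(state))
    tmp.replace(p)


def train(total_epochs, ckpt_path, run_epoch):
    """Run epochs up to total_epochs, skipping those already checkpointed.

    run_epoch(epoch) is a hypothetical callback that trains one epoch and
    returns a validation score (higher is better).
    """
    state = load_state(ckpt_path)
    for epoch in range(state["epoch"] + 1, total_epochs + 1):
        score = run_epoch(epoch)
        if state["best_score"] is None or score > state["best_score"]:
            state["best_score"] = score
        state["epoch"] = epoch
        # Checkpoint after every epoch; ckpt_path would sit on mounted Drive.
        save_state(ckpt_path, state)
    return state
```

If the checkpoint path is on the mounted Drive folder, a new runtime can call `train` again with the same arguments and continue from the last completed epoch instead of starting over.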

@tnovikoff

You could try asking your question on StackOverflow. :-) Best of luck either way.

@dexter2406

When this happens, the button that usually has the RAM and CPU usage bars just says 'Busy'.

This happens to me too. Have you found a solution yet?

@chuktuk

chuktuk commented May 7, 2021 via email

@stmer1

stmer1 commented May 23, 2021

I had this problem too, though in my case I found that if I hit Runtime > Interrupt execution, everything magically reappeared. I did not try restarting, since I decided it had probably trained enough and I wanted to make sure to grab what it had done so far. Not the ideal situation, but at least I did not lose everything.

Good luck.
