[BUG] OSError: [Errno 38] Function not implemented #190

Open
irinushirka opened this issue Mar 11, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@irinushirka

Hi! I'm trying to train tf_efficientdet_d3 with your code on custom coco-like data. I'm training on Google Colab.
!python3 train.py '/content/drive/MyDrive/dataset' --model tf_efficientdet_d3 --num-classes 60 --pretrained -b 1 --save-images --log-interval 100 --epochs 10 --output '/content/drive/MyDrive/Colab Notebooks/PyTorch training/'
After the first epoch, I ran into this error:

Traceback (most recent call last):
  File "train.py", line 656, in <module>
    main()
  File "train.py", line 435, in main
    best_metric, best_epoch = saver.save_checkpoint(epoch=epoch, metric=eval_metrics[eval_metric])
  File "/usr/local/lib/python3.7/dist-packages/timm-0.4.5-py3.7.egg/timm/utils/checkpoint_saver.py", line 78, in save_checkpoint
    os.link(last_save_path, save_path)
OSError: [Errno 38] Function not implemented: '/content/drive/MyDrive/Colab Notebooks/PyTorch training/train/20210311-161812-tf_efficientdet_d3/last.pth.tar' -> '/content/drive/MyDrive/Colab Notebooks/PyTorch training/train/20210311-161812-tf_efficientdet_d3/checkpoint-0.pth.tar'

Something went wrong while saving the checkpoint. I'd be grateful for a solution to this problem or any tips that might help me solve it. Thanks!

@irinushirka irinushirka added the bug Something isn't working label Mar 11, 2021
@rwightman
Owner

@irinushirka The Google Drive mount in Colab isn't a normal filesystem. It's a FUSE filesystem on top of cloud storage, and it doesn't support hard links, which the saver relies on for robust checkpoint saving (crash recovery). I'm aware of the problem but don't currently have a solution.

Within the next few weeks to a month, I plan to support saving into Google storage buckets.
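
For readers who want to see what the hard-link limitation means in practice, here is a minimal sketch of a link-with-copy-fallback helper. It is not timm's implementation and not a fix the maintainer proposed; the name `link_or_copy` is hypothetical, and the set of error codes treated as "links unsupported" is an assumption. Errno 38 from the traceback corresponds to `errno.ENOSYS` on Linux.

```python
import errno
import os
import shutil

# Error codes assumed to mean "this filesystem cannot create hard links".
# ENOSYS is the errno 38 seen in the traceback on Linux.
_LINK_UNSUPPORTED = (errno.ENOSYS, errno.EPERM, errno.EXDEV, errno.EOPNOTSUPP)


def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to a plain copy if links are unsupported.

    Returns "link" or "copy" so the caller can tell which path was taken.
    """
    # os.link fails if dst already exists, so remove any stale file first.
    if os.path.exists(dst):
        os.unlink(dst)
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        if exc.errno not in _LINK_UNSUPPORTED:
            raise  # unrelated failure (missing src, no space, ...)
        # A copy uses extra space and is not atomic, but it works on FUSE mounts.
        shutil.copyfile(src, dst)
        return "copy"
```

Called with the two paths from the traceback (`last.pth.tar` as `src`, `checkpoint-0.pth.tar` as `dst`), a helper like this would presumably take the copy path on the Drive mount. The trade-off is that a copy gives up the cheap, crash-safe duplication that hard links provide.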

@dmatos2012

Hi! This may not be a permanent solution, but to get things working temporarily so you can look at your results, you can change your output_dir to
output_dir = "/content/output" and checkpoints will be saved on Colab's local disk. The downside is that you have to download them manually before your session ends, otherwise you lose the checkpoint forever. I did that and it has worked fine so far.
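
The manual download step in this workaround could be scripted. The sketch below copies checkpoint files from the local output directory to a mounted Drive folder using plain file copies, which need no hard links. The function name `sync_checkpoints` and the Drive destination in the usage note are hypothetical; the `.pth.tar` suffix is taken from the file names in the traceback.

```python
import os
import shutil


def sync_checkpoints(local_dir, drive_dir, suffix=".pth.tar"):
    """Copy every checkpoint file under local_dir into drive_dir.

    The sub-directory layout is preserved. Returns the relative paths copied.
    """
    copied = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            if not name.endswith(suffix):
                continue
            src = os.path.join(root, name)
            rel = os.path.relpath(src, local_dir)
            dst = os.path.join(drive_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            # Plain content copy: works on filesystems without hard links.
            shutil.copyfile(src, dst)
            copied.append(rel)
    return copied
```

In a Colab cell run after training (or periodically), this might be called as `sync_checkpoints("/content/output", "/content/drive/MyDrive/checkpoints")`, where the second path is only an example. With the command from the original post, the local directory would presumably be set by passing `--output '/content/output'` to train.py.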

@mehshankhan

Thanks, man, you saved my day. It actually works.

@gallegi

gallegi commented Jan 17, 2022

Absurd as it is, we still need to copy the weights to Google Drive manually for now.
