About gpu-resume #111

Closed
MunJeongHyeon opened this issue Oct 13, 2021 · 7 comments
Labels
DONE QUESTION Further information is requested

Comments

@MunJeongHyeon

❓ Questions & Help

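I have a question about gpu-resume. Training will continue from the checkpoint I specify. Suppose I originally set the number of epochs to 10 and training was interrupted at epoch 5. Should I set epoch to 10 again, or should I work out the remaining epochs and set that instead?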

Details

@upskyy
Member

upskyy commented Oct 13, 2021

Just set it to the max epoch! In your example, that would be 10, haha!
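To illustrate the answer above: a minimal sketch, assuming the trainer treats the max epoch as the absolute total and restores the epoch counter from the checkpoint. The function name `epochs_to_run` is hypothetical and not part of this project.

```python
def epochs_to_run(max_epochs: int, completed_epochs: int) -> range:
    """Return the epoch indices still to run after resuming.

    Assumption: `max_epochs` is the total for the whole run, not the
    number of additional epochs, and `completed_epochs` comes from the
    checkpoint being resumed.
    """
    if max_epochs < 0 or completed_epochs < 0:
        raise ValueError("epoch counts must be non-negative")
    # If the checkpoint is already at or past max_epochs, nothing is left.
    return range(completed_epochs, max_epochs)


# Interrupted after 5 of 10 epochs: keep max epoch at 10, not 5.
remaining = list(epochs_to_run(max_epochs=10, completed_epochs=5))
print(remaining)
```

Under the same assumption, setting the max epoch to the remaining count (5 here) would leave nothing to train, since the restored counter is already at 5.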

@MunJeongHyeon
Author

Hello. When I ran the LAS model, the error below occurred at epoch 122. Even when I try to resume training from epoch 122, the same error keeps recurring, so I am leaving a question here.
Screenshot from 2021-10-18 21-29-29

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

@upskyy
Member

upskyy commented Oct 18, 2021

It looks like the checkpoint file is corrupted. Could you try resuming from an earlier checkpoint? pytorch/#31620
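One way to find an earlier checkpoint that is still readable: a rough sketch, assuming the checkpoints were written in PyTorch's default zip-based format (the default for `torch.save` since PyTorch 1.6), so a truncated file fails a zip check. The helper name `latest_readable_checkpoint` and the `*.ckpt` pattern are assumptions for illustration, not part of this project.

```python
import zipfile
from pathlib import Path
from typing import Optional


def latest_readable_checkpoint(ckpt_dir: str, pattern: str = "*.ckpt") -> Optional[Path]:
    """Return the newest checkpoint in `ckpt_dir` that is a readable zip.

    This only checks that the zip central directory can be found, which
    is what the "failed finding central directory" error complains about.
    It does not verify that the tensors inside are intact.
    """
    candidates = sorted(
        Path(ckpt_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    for path in candidates:
        if zipfile.is_zipfile(str(path)):
            return path
    return None  # no readable checkpoint found
```

The returned path can then be passed as the resume checkpoint instead of the broken epoch-122 file.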

@MunJeongHyeon
Author

Yes, thank you. That solved it.

During hydra_eval, it seems that get_class_name in util.py is not being imported properly!
Screenshot from 2021-10-21 00-41-38

@sooftware
Member

Thank you, I'll fix it.

@sooftware
Member

cc. @upskyy

@sooftware sooftware added BUG Something isn't working QUESTION Further information is requested labels Nov 30, 2021
@upskyy
Member

upskyy commented Apr 23, 2022

related issues #86

@upskyy upskyy closed this as completed Apr 23, 2022
@upskyy upskyy added DONE and removed BUG Something isn't working labels Apr 23, 2022
3 participants