Out of application memory when training the model in terminal #205

Closed
YichengZou626 opened this issue Oct 5, 2022 · 6 comments

Comments

@YichengZou626

Hello, I was trying to train the model with my own data, but I got this error. How should I resolve this problem?
[Screenshot: Screen Shot 2022-10-05 at 1 03 54 PM]

@YichengZou626
Author

I also tried training on the school server, but got this error message:
[Screenshot: Screen Shot 2022-10-08 at 12 08 44 PM]

@YichengZou626
Author

I set the running memory to 300 GB. Is that still not enough for training?

@wasserth
Collaborator

It definitely should not take 300 GB of memory, so something must be wrong. Maybe the PyTorch installation is broken. You should also try to get APEX working so that training runs with fp16. Maybe your input data is too big.
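
A rough way to act on two of the checks above (PyTorch installation, input size) is sketched below. It is not part of the project's code: the function names `estimate_bytes`, `to_gib`, and `describe_torch`, the example volume shape, and the subject count are all illustrative assumptions. It only estimates the raw size of the arrays once loaded, not what training actually allocates, so treat the numbers as a sanity check.

```python
import math


def estimate_bytes(shape, bytes_per_value):
    """Return the size in bytes of a dense array with the given shape."""
    return math.prod(shape) * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 1024**3


def describe_torch():
    """Report basic facts about the local PyTorch install, if there is one."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    # Hypothetical 4D volume shape and subject count, for illustration only;
    # substitute the real dimensions of your own data.
    shape = (145, 174, 145, 9)
    n_subjects = 6
    for label, width in (("fp32", 4), ("fp16", 2)):
        total = n_subjects * estimate_bytes(shape, width)
        print(f"{label}: {to_gib(total):.2f} GiB for {n_subjects} subjects")
    print(describe_torch())
```

If the estimate is a few GiB or less while the process still runs out of memory, the input size is probably not the cause, and the installation or the data loading deserves a closer look.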

@YichengZou626
Author

Got it! The code has been running for 10 hours and is still training. I am only using 6 HCP subjects for training. Is it normal for it to take this long?
[Screenshot: Screen Shot 2022-10-12 at 1 35 32 PM]

@wasserth
Collaborator

Yes, that's normal. Training can take 2-3 days.

@YichengZou626
Author

Got it! Thanks
