
Check failed: error == cudaSuccess (2 vs. 0) out of memory #28

Closed
chiwing4 opened this issue Dec 16, 2018 · 22 comments
Assignees
Labels
help wanted Extra attention is needed type:Bug Something isn't working

Comments


chiwing4 commented Dec 16, 2018

K4YT3X Edit: Temporary Solution

The issue is caused by waifu2x-caffe not having sufficient memory. A temporary solution to this problem is to reduce the number of threads used.

Original Issue

Hi, the program fails with the following exception when I use --gpu to enlarge.
I believe my machine has enough memory to run it.
Is there any way to fix it?
Many thanks. This is a useful program.

```
[+] INFO: Reading video information
[+] INFO: Framerate: 59.94005994005994
[+] INFO: Starting to upscale extracted images
2018-12-16 22:27:58.707744 [+] INFO: [upscaler] Thread 3 started
2018-12-16 22:27:58.707744 [+] INFO: [upscaler] Thread 4 started
2018-12-16 22:27:58.708742 [+] INFO: [upscaler] Thread 0 started
2018-12-16 22:27:58.708742 [+] INFO: [upscaler] Thread 1 started
2018-12-16 22:27:58.709739 [+] INFO: [upscaler] Thread 2 started
Could not create log file: File exists
COULD NOT CREATE LOGFILE '20181216-222802.16176'!
F1216 22:28:02.183454 12168 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Could not create log file: File exists
COULD NOT CREATE LOGFILE '20181216-222802.15444'!
F1216 22:28:02.189438 10304 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Could not create log file: File exists
COULD NOT CREATE LOGFILE '20181216-222802.17140'!
F1216 22:28:02.238307  1736 syncedmem.cpp:78] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Could not create log file: File exists
COULD NOT CREATE LOGFILE '20181216-222802.22424'!
F1216 22:28:02.290169 10188 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Could not create log file: File exists
COULD NOT CREATE LOGFILE '20181216-222803.18460'!
F1216 22:28:03.546811  1888 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
2018-12-16 22:28:03.889894 [+] INFO: [upscaler] Thread 2 exiting
2018-12-16 22:28:04.882241 [+] INFO: [upscaler] Thread 1 exiting
2018-12-16 22:28:04.925128 [+] INFO: [upscaler] Thread 0 exiting
2018-12-16 22:28:04.962030 [+] INFO: [upscaler] Thread 4 exiting
2018-12-16 22:28:05.021869 [+] INFO: [upscaler] Thread 3 exiting
[+] INFO: Upscaling completed
```

chiwing4 commented Dec 16, 2018

For your reference, just in case.
my command:

```
py video2x.py -v testvid.mp4 -o new.mp4 --width 3840 --height 2160 --gpu
```

Using Python 3.7.1.
Specs:

  • 32GB RAM
  • RTX 2080ti
  • running in a disk with 450+GB space

Thanks

Owner

k4yt3x commented Dec 17, 2018

Thank you for the issue.

This looks weird. I saw the issue this morning but haven't located the problem yet. Just know that I've seen it and am working on it.

There's a big update coming soon. This issue might have to be fixed after that update, which is already in progress.

@k4yt3x k4yt3x self-assigned this Dec 17, 2018
@k4yt3x k4yt3x added the type:Bug Something isn't working label Dec 17, 2018
chiwing4 (Author) commented

Thanks for your quick reply!
I look forward to the new update.


asimonf commented Dec 17, 2018

Just hit this right now. Hopefully you'll be able to figure it out.

WSADKeysGaming commented

This also happens to me, even after reducing the number of threads from 5 to 4, because I have insufficient memory available.

Owner

k4yt3x commented Dec 17, 2018

Could you please monitor the system to make sure that there's sufficient memory available for both the system RAM and the GPU RAM?

Btw I stuffed your output in a code block to make it easier to read.

@k4yt3x k4yt3x added the help wanted Extra attention is needed label Dec 17, 2018
Owner

k4yt3x commented Dec 17, 2018

I was just reading some other similar errors, but I'm still not sure how to fix this problem on my side.
I still have quite a few questions:

  • Does CUDNN work?
  • Does single thread work?

Some other caffe-library-related issues as a reference:


asimonf commented Dec 17, 2018

I followed your suggestions and it turns out that my system is running out of memory. I had to limit the thread count to 2 in my particular test to get it to run. My laptop has a GTX 965M with only 4 GB of VRAM. System memory was more than enough, with over 11 GB of RAM free.

Owner

k4yt3x commented Dec 17, 2018

@asimonf so that is indeed the problem. My program is able to monitor system memory, suggest how many threads should be used, and warn the user when insufficient memory is available. However, it has no capability to monitor GPU memory, which is why the failure shows up in @chiwing4's output as a CUDA "out of memory" error.

I remember looking into that already, but I didn't find an elegant solution for monitoring CUDA GPU memory. I will keep looking into it.

Therefore, for now, if this issue arises, reduce the number of threads.


asimonf commented Dec 17, 2018

My current testing with your file shows around 1.4 GB of memory per thread. I don't know if memory usage grows with output size, but if it does, it might be even more (I'm testing output at 1440x1080). I'll test a bit more. A good suggestion could be to limit threads assuming a maximum usage of 2 GB per thread?
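As a rough sanity check, these numbers line up with a simple per-thread VRAM budget. A sketch (the 2 GB figure is the conservative cap suggested above, not a measured constant):

```python
# asimonf's GTX 965M has 4 GB of VRAM; with a conservative 2 GB-per-thread
# budget, the 2-thread ceiling he observed falls out of simple division.
vram_mb = 4096         # total VRAM on the GTX 965M
per_thread_mb = 2048   # suggested worst-case usage per upscaler thread
max_threads = vram_mb // per_thread_mb
print(max_threads)  # 2
```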

Owner

k4yt3x commented Dec 17, 2018

Maybe it's possible. I'll have to look into whether waifu2x-caffe has any options for that.

Author

chiwing4 commented Dec 18, 2018

OK, I can confirm this is a GPU out-of-memory problem.

GPU memory usage

  • when idle : 1.9GB
  • with 1 thread: 6GB
  • with 2 threads: 9.6GB
  • with 3 threads: out of memory

My testing shows ~4 GB of VRAM per thread.

Output to 1080p and 4K has the same memory usage.
I thought the 2080 Ti could handle this well. lol
https://imgur.com/a/ALTNPpR

chiwing4 (Author) commented

I just enlarged testvid.mp4 (240 frames, 320 x 240) to 1080p and 4K.
While VRAM usage is the same, processing time differs as expected.
With 2 threads:

  • out 1080p : 465 seconds
  • out 4K : 1778 seconds

4K needs nearly 4× the computation time of 1080p. (Interesting: 4K has 4 times as many pixels as 1080p.)
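The timing tracks the pixel count almost exactly; a quick check with the numbers from this comment:

```python
# 4K has exactly 4x the pixels of 1080p, which matches the ~3.8x
# difference in upscale time measured above (465 s vs 1778 s).
pixel_ratio = (3840 * 2160) / (1920 * 1080)   # exactly 4.0
time_ratio = 1778 / 465                       # ~3.82
print(pixel_ratio, round(time_ratio, 2))
```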


asimonf commented Dec 18, 2018

Interesting. I thought it would be much faster. I tried an OpenCL version on my home computer (it has a Vega 56 card) and it took about that long for the 1080p one. I might not be remembering accurately, so I'll try testing again tonight.

Owner

k4yt3x commented Dec 18, 2018

@asimonf I thought waifu2x only supports CUDA? Maybe I'm wrong on that. I don't have an AMD card to test things out.


asimonf commented Dec 18, 2018

Yes, but there's a C++ rewrite that is compatible with both CUDA and OpenCL. I haven't tested enough to see if the quality is similar; I just tested speed yesterday. It might not be the same (but it does use the same NN models, so it can't be that different). Here's the link: https://github.com/DeadSix27/waifu2x-converter-cpp.

I doubt it's faster than the original waifu2x on nvidia cards, but I don't know.

Owner

k4yt3x commented Dec 18, 2018

That's cool. It's fascinating how many derivations there are from the original waifu2x.

I feel like this thread is beginning to be like a forum.


asimonf commented Dec 18, 2018

So I just tested it on the Vega 56 and it took 432 seconds to process all 240 frames with upscaling and denoising to 1440x1080. I can't really compare it 100% to what waifu2x-caffe does with video2x, but apparently it isn't that slow compared to the 2080 Ti if (and this is a big if) they are generating similar-quality images.

dealvidit commented

So with 2 threads, this works for me too. But I have two 1080 Tis, so can I spread the load across both GPUs?

Owner

k4yt3x commented Dec 25, 2018

@dealvidit spreading the load is unlikely to be something this program can control. It's more up to either waifu2x or caffe2.

Owner

k4yt3x commented Feb 26, 2019

I'm making some progress. I'm considering using the GPUtil library to monitor GPU memory usage to solve this issue.
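A minimal sketch of that idea. The helper name and the per-thread/reserve figures below are assumptions based on chiwing4's ~4 GB/thread measurement, not video2x's actual code; GPUtil is a third-party package whose GPU objects expose free memory in MB via `memoryFree`:

```python
def usable_threads(free_mb, per_thread_mb=4096, reserve_mb=1024):
    """Estimate how many upscaler threads fit in free GPU memory.

    Assumes each thread needs roughly per_thread_mb of VRAM (chiwing4
    measured ~4 GB/thread) and keeps reserve_mb of headroom for the driver.
    """
    return max(0, int((free_mb - reserve_mb) // per_thread_mb))

try:
    import GPUtil  # third-party: pip install gputil
    gpus = GPUtil.getGPUs()
    if gpus:
        print(usable_threads(gpus[0].memoryFree))
except ImportError:
    pass  # GPUtil not installed; fall back to a user-specified thread count
```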

Owner

k4yt3x commented Feb 26, 2019

I have just pushed update 2.4.2. If you have an Nvidia GPU and CUDA drivers installed, the new version will read the output of nvidia-smi.exe to determine usable GPU memory prior to upscaling.
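For reference, nvidia-smi can emit machine-readable output that is easy to parse. A sketch of the general approach (this is not the actual 2.4.2 code, and the helper names are made up):

```python
import subprocess

def parse_free_mib(smi_output):
    """Parse 'memory.free' values (one line per GPU, in MiB) from
    nvidia-smi output produced with --format=csv,noheader,nounits."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]

def query_free_mib():
    # nvidia-smi supports structured queries, avoiding scraping the table view.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_mib(out)
```

For example, `parse_free_mib("9216\n11264\n")` yields one free-memory reading per GPU.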

It doesn't have support for AMD GPUs, but AMD GPUs must use waifu2x-converter-cpp, so I'm not sure whether the same problem occurs with that driver. If something similar happens there, please open a new issue and I'll see if I can fix it.

@k4yt3x k4yt3x closed this as completed Feb 26, 2019
5 participants