Failure on first epoch #4
Comments
@nhalsteadvt hmm, could you try upgrading your torchvision?
@lucidrains I thought 1.7.1 was the latest version after checking back here
@nhalsteadvt what is your current CUDA version? I'm running 10.2
@lucidrains I believe I'm running 11.2. I'll try to reinstall PyTorch with CUDA 10.2 / make 10.2 (which I have installed) the active version.
@nhalsteadvt ohh sorry, actually I am running 11.1, so it should be fine!
Verified working with CUDA 10.1 and PyTorch for CUDA 10.1 as well.
@nhalsteadvt Still experiencing issues?
@enricoros Yeah, it's a different error now, about CUDA running out of memory. I thought 15.8 GB of usable RAM was enough, but it seems something else is wrong:

"RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.08 GiB already allocated; 1.16 MiB free; 4.18 GiB reserved in total by PyTorch)"

This stuff really isn't my strong suit, but it looks like I don't have something configured right to use my GPU. I have 70 GB of storage space if that means anything.
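One thing worth noticing in that message: the GPU itself reports only 6.00 GiB total capacity, so system RAM and disk space don't enter into it. As a minimal sketch (the helper name and format handling are mine, not part of this project), the figures can be pulled out of the message programmatically to see where the VRAM went:

```python
import re

# Hypothetical helper: extract the memory figures from a PyTorch CUDA OOM
# message. The regex matches the message format quoted in this thread.
def parse_cuda_oom(msg):
    """Return a dict mapping each labeled figure in the message to MiB."""
    fields = {}
    for value, unit, label in re.findall(
        r"([\d.]+)\s*(GiB|MiB)\s*(total capacity|already allocated|free|reserved)",
        msg,
    ):
        mib = float(value) * (1024 if unit == "GiB" else 1)
        fields[label] = mib
    return fields

msg = ("CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; "
       "6.00 GiB total capacity; 4.08 GiB already allocated; "
       "1.16 MiB free; 4.18 GiB reserved in total by PyTorch)")
print(parse_cuda_oom(msg))
```

With the message above, nearly all of the 6 GiB card is already allocated or reserved by PyTorch, which is why even a 20 MiB allocation fails.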
@nhalsteadvt It depends on the memory on the video card. With 8 GB of video memory (RTX 2070) I can run size=128 and size=256 images with no problem, but you need more memory for size=512 (it stops after hundreds of iterations). What video card do you have? As an alternative, you can run this project using the "simplified notebook" linked on the home page, where the cards are NVIDIA T4s on Google Cloud.
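As a rough rule of thumb (my own assumption, not a measurement from this project), activation memory grows with the pixel count, so doubling the image size roughly quadruples the VRAM needed. That back-of-envelope scaling is enough to see why size=512 fails on an 8 GB card that handles size=256 fine:

```python
# Rough sketch: scale an observed VRAM figure by the ratio of pixel counts.
# The 2 GiB baseline below is an illustrative assumption, not a measurement.
def scaled_vram_gib(baseline_gib, baseline_size, target_size):
    """Scale a VRAM estimate quadratically with image side length."""
    return baseline_gib * (target_size / baseline_size) ** 2

# If size=256 were observed to use ~2 GiB, size=512 would need ~4x that,
# pushing an 8 GiB card to its limit once other allocations are counted.
print(scaled_vram_gib(2.0, 256, 512))  # -> 8.0
```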
@enricoros I've been using the notebook a bit, so that's cool. Task Manager says I have 7.9 GB of shared memory between my Intel and Nvidia graphics cards. However, the DirectX Diagnostic Tool says I have 8095 MB (8.095 GB) of shared memory.

Edit: the error now says "RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`"
@nhalsteadvt For the Task Manager stats, look at the "Dedicated GPU memory" value. The shared GPU memory doesn't mean much (mine shows 32 GB shared, and I don't know where that's coming from); it's the dedicated memory that counts. For example, when running the code right now, I see "Dedicated GPU memory: 7.4/8.0GB", as roughly 90% of the GPU memory is allocated for this operation.

As for the CUDA errors: you should make sure that the CUDA installed on your system matches what PyTorch expects. For instance, I don't have the latest CUDA; I have a stable one (10.2 on Windows), which can be downloaded here: https://developer.nvidia.com/cuda-10.2-download-archive. Then, when downloading PyTorch, I select the same combo (Windows, CUDA 10.2) on the website. Finally, I even download the cuDNN that matches the CUDA version here: https://developer.nvidia.com/rdp/cudnn-download#a-collapse805-102 (selecting 10.2). Yeah, it ain't pretty to get a system working nicely.
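The matching check described above can be sketched as a simple version comparison. In a real session you would compare the CUDA build PyTorch ships with (`torch.version.cuda`) against what the system toolkit reports (e.g. via `nvcc --version`); the version strings below are illustrative assumptions:

```python
# Hypothetical helper: check that the system CUDA toolkit and the CUDA
# build of the installed PyTorch wheel agree on major.minor version,
# as suggested in this thread (e.g. both "10.2").
def cuda_versions_match(system_cuda, torch_cuda):
    """Compare major.minor CUDA versions, ignoring any patch level."""
    trim = lambda v: tuple(v.split(".")[:2])
    return trim(system_cuda) == trim(torch_cuda)

print(cuda_versions_match("10.2", "10.2"))  # True  -- matched install
print(cuda_versions_match("11.2", "10.2"))  # False -- the mismatch above
```

A mismatch like the 11.2-system / 10.2-wheel combination discussed earlier in the thread is a common source of cuBLAS execution failures.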
It opens the folder where the picture should be saved, but this error shows up immediately:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

torch version: 1.7.1
torch.cuda.is_available() == True

What am I missing?