"cudamat.cudamat.CUDAMatException: CUBLAS error." occurs when running the multimodal_dbm example #69

Open
Demoscai opened this issue Jul 25, 2014 · 8 comments

@Demoscai

Hi @nitishsrivastava,
I have a problem when running the multimodal_dbm example:
Train Step: 0
Traceback (most recent call last):
File "/home/meitu299/deepnet/deepnet/trainer.py", line 60, in
main()
File "/home/meitu299/deepnet/deepnet/trainer.py", line 54, in main
model.Train()
File "/home/meitu299/deepnet/deepnet/neuralnet.py", line 631, in Train
self.GetTrainBatch()
File "/home/meitu299/deepnet/deepnet/neuralnet.py", line 524, in GetTrainBatch
self.GetBatch(self.train_data_handler)
File "/home/meitu299/deepnet/deepnet/dbm.py", line 264, in GetBatch
super(DBM, self).GetBatch(handler=handler)
File "/home/meitu299/deepnet/deepnet/neuralnet.py", line 512, in GetBatch
data_list = handler.Get()
File "/home/meitu299/deepnet/deepnet/datahandler.py", line 627, in Get
batch = self.gpu_cache.Get(self.batchsize, get_last_piece=self.get_last_piece)
File "/home/meitu299/deepnet/deepnet/datahandler.py", line 396, in Get
self.LoadData()
File "/home/meitu299/deepnet/deepnet/datahandler.py", line 327, in LoadData
self.data[i] = cm.CUDAMatrix(mat)
File "/home/meitu299/deepnet/cudamat/cudamat.py", line 195, in __init__
raise generate_exception(err_code)
cudamat.cudamat.CUDAMatException: CUBLAS error.

My machine has 8 GB of RAM and a 3 GB GPU, with CUDA 6.0.
I followed your INSTALL instructions, but this always happens.
Could you tell me how to resolve it? Is it a bug?
Thanks.

@cbalint13

Try reducing the batch size from 128 to 100.
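
A minimal sketch of the suggested edit, assuming the example's trainer .pbtxt exposes a batchsize field as the deepnet example configs do (check the exact field name in your own file):

  # Hypothetical fragment of the trainer .pbtxt for this example.
  # The field name 'batchsize' is assumed from the deepnet example configs.
  batchsize: 100  # was 128; smaller batches need less GPU memory per step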

@Demoscai

I have tried reducing the batch size to 50, but it doesn't work.

@jormansa

Try setting the "gpu_memory" value in your .pbtxt file to "2G" or "2.5G".
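
A sketch of the kind of edit meant here, using the field names mentioned in this thread (your .pbtxt may spell them differently, so check before editing):

  # Hypothetical fragment of the .pbtxt; field names are taken from this
  # thread rather than verified against the proto definition.
  gpu_memory: "2G"    # keep well below the card's physical 3G
  main_memory: "4G"

Leaving a margin below the card's physical memory matters because the CUDA context and CUBLAS workspaces take GPU memory on top of whatever deepnet reserves for its own buffers.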

@Demoscai

Thanks, that fixed it.

@tengshaofeng

Thanks in advance.
I have a similar problem.
When I run the ff example, I set the steps from 1000000 to 10000, the batchsize from 100 to 10, the gpu_memory from 2G to 0.1G, and the main_memory from 4G to 0.7G.
But when I reach step 499, it still fails like this:

File "/home/tbq/Downloads/deepnet-master/deepnet/softmax_layer.py", line 65, in GetLoss
perf.correct_preds = temp.sum()
File "/home/tbq/Downloads/deepnet-master/cudamat/cudamat.py", line 720, in sum
return vdot(self, CUDAMatrix.ones.slice(0, self.shape[0]*self.shape[1]))
File "/home/tbq/Downloads/deepnet-master/cudamat/cudamat.py", line 1650, in vdot
raise generate_exception(err_code.value)
cudamat.cudamat.CUDAMatException: CUBLAS error.

My machine has 1 GB of RAM and 256 MB of GPU memory, with CUDA 5.5.

When I try the dbm and rbm examples, the same problem also occurs.
I want to know whether my CPU and GPU simply don't meet the requirements.
Thanks.

@tengshaofeng

Sorry, English is not my mother tongue. In addition, gcc is 4.6.3.

@jnhwkim

jnhwkim commented May 11, 2015

In my case, I decreased gpu_mem to 1G in run_all_dbn.sh, even though my GPU has 4 GB of memory (NVIDIA GeForce GTX 780M, 4096 MB).
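
For anyone looking for where that setting lives, here is a sketch of the kind of change, assuming run_all_dbn.sh defines gpu_mem and main_mem variables that get substituted into the example's .pbtxt configs; the variable names follow this comment rather than a verified copy of the script:

  # Hypothetical excerpt of run_all_dbn.sh; variable names follow the comment above.
  gpu_mem=1G    # reserve less than the card's physical memory (4 GB here)
  main_mem=4G   # host-side cache size; adjust to your RAM
  # The setup step run by the script presumably substitutes these values
  # into the generated .pbtxt files.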

@chaojiewang94

Thank you all. Reducing the gpu_mem really helps and the code starts to work, but at the end of training the first layer the bug happens again. Is the gpu_mem still too large?
And what would happen if I reduced the gpu_mem further?

Thanks a lot to anyone who can help.
